Re18: Paper reading notes on GCI: Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis
2022-07-30 09:53:00 【The gods are silent】
Paper title: Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis
arXiv: https://arxiv.org/abs/2104.09420
NAACL (ACL Anthology): https://aclanthology.org/2021.naacl-main.155/ (the page includes an official explainer video)
Official GitHub project: xxxiaol/GCI: Code for Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis.
This is a NAACL 2021 paper; the authors are from Peking University.
The paper applies causal inference to legal text: it builds a causal graph from the data to support decision making. Experiments on the similar charge disambiguation task show that the paradigm works: injecting causal knowledge into neural networks improves model performance and adds interpretability, especially in few-shot settings.
Causal inference is thus also applied to a classification task.
Similar charge disambiguation: multi-class classification in which the label set is a set of similar charges (the input is the fact description text).
The paper addresses two difficulties: ① factors associated with the prediction result are extracted without supervision, so they are noisy; ② traditional causal inference models have to be combined with modern neural network architectures.
1. Background
1.1 Causal inference
The independent variable is the treatment and the dependent variable is the outcome; a change applied to the independent variable is an intervention. Causal inference estimates whether, and by how much, the outcome changes when the treatment is perturbed.
Confounder: a variable that affects both the independent and the dependent variable.
The treated group consists of the samples whose treatment equals 1; the others form the untreated group.
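To make these terms concrete, here is a minimal Python sketch on made-up toy data. The records and the helper names `naive_difference` and `adjusted_ate` are illustrative assumptions, not taken from the paper. It compares the raw treated-minus-untreated difference with an estimate that holds a binary confounder fixed by stratifying on it.

```python
from collections import defaultdict

# Hypothetical toy records: (confounder, treatment, outcome), all binary.
records = (
    [(1, 1, 1)] * 30 + [(1, 1, 0)] * 10 +  # confounder=1, treated
    [(1, 0, 1)] * 5 + [(1, 0, 0)] * 5 +    # confounder=1, untreated
    [(0, 1, 1)] * 3 + [(0, 1, 0)] * 7 +    # confounder=0, treated
    [(0, 0, 1)] * 4 + [(0, 0, 0)] * 36     # confounder=0, untreated
)

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Treated mean minus untreated mean, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    untreated = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(untreated)

def adjusted_ate(rows):
    """Average the within-stratum effects, weighted by stratum size."""
    strata = defaultdict(list)
    for c, t, y in rows:
        strata[c].append((t, y))
    ate = 0.0
    for group in strata.values():
        treated = [y for t, y in group if t == 1]
        untreated = [y for t, y in group if t == 0]
        ate += (mean(treated) - mean(untreated)) * len(group) / len(rows)
    return ate

naive = naive_difference(records)   # inflated by the confounder
adjusted = adjusted_ate(records)    # confounder held fixed
```

On this toy data the naive difference is larger than the adjusted one, because the confounder raises both the chance of treatment and the outcome.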
1.2 Causal graph
Factors (all factors in this paper are binary variables) and charges are the nodes; causal relations are the edges.
Traditional approaches that combine causal inference with text are relatively simple: they treat the text as a single node, without distinguishing aspects such as events.
1.3 PAG
2. Model
The framework automatically builds a causal graph from fact description text and uses causal inference to assist legal decision making. The paper evaluates the framework on the similar charge disambiguation task.
① Keyword extraction (YAKE, plus IDF to score how important a word is to a charge) identifies the key factors in the fact descriptions.
② Similar key factors are clustered into groups, and each group is treated as one node. (The nodes of the graph are the groups and the charges.)
③ A causal discovery algorithm that is robust to unobserved variables, Greedy Fast Causal Inference (GFCI), constructs the causal graph. Robustness matters because unsupervised extraction may miss keywords, which leaves confounders unobserved during causal discovery. The output is a Partial Ancestral Graph (PAG); as the appendix shows, the algorithm can indicate hidden factors. Two constraints are imposed: 1. charge nodes may not have outgoing edges; 2. the direction of causality is restricted by the chronological order in the case (the event description text). Causal graphs are then sampled from the PAG.
④ The causal strength of each edge is estimated to reduce the impact of unreliable edges, keeping confounders fixed. The measure is the Average Treatment Effect (ATE), estimated with Propensity Score Matching (PSM), which builds pairs of similar samples across the treated and untreated groups.
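Step ④ can be sketched roughly as follows. This is a standard-library illustration of propensity score matching on toy data, not the paper's implementation: the logistic regression fitted by gradient descent, the 1-nearest-neighbour matching with replacement, and the names `fit_propensity`, `propensity`, and `psm_ate` are all assumptions made for the example.

```python
import math

def propensity(w, b, x):
    """P(treatment = 1 | covariates x) under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

def fit_propensity(X, t, lr=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, ti in zip(X, t):
            err = propensity(w, b, x) - ti
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def psm_ate(X, t, y):
    """Match every unit to the closest-scoring unit of the opposite group."""
    w, b = fit_propensity(X, t)
    scores = [propensity(w, b, x) for x in X]
    treated = [i for i, ti in enumerate(t) if ti == 1]
    untreated = [i for i, ti in enumerate(t) if ti == 0]
    effects = []
    for i in range(len(X)):
        pool = untreated if t[i] == 1 else treated
        j = min(pool, key=lambda k: abs(scores[k] - scores[i]))
        # Each pair contributes (treated outcome) - (untreated outcome).
        effects.append(y[i] - y[j] if t[i] == 1 else y[j] - y[i])
    return sum(effects) / len(effects)

# Hypothetical data: one binary confounder c, and outcome = treatment OR c.
X, t, y = [], [], []
for c, treat, count in [(1, 1, 40), (1, 0, 10), (0, 1, 10), (0, 0, 40)]:
    for _ in range(count):
        X.append((float(c),))
        t.append(treat)
        y.append(1 if (treat or c) else 0)

ate = psm_ate(X, t, y)
naive = (sum(yi for yi, ti in zip(y, t) if ti) / sum(t)
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - sum(t)))
```

Because matched pairs share the same propensity score, the confounder is balanced within each pair, and the matched estimate is lower than the naive difference on this toy data.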
Causal knowledge is integrated into the neural network in two ways: ① constraining the NN attention weights with the causal strength (through an additional loss term); ② running an RNN over the causal chains extracted from the causal graph.
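As a rough illustration of option ①, the sketch below shows one plausible shape for such a loss term. It is an assumption for illustration only and does not reproduce the paper's exact formulation: the attention mass falling on each factor's tokens is pulled toward that factor's normalized causal strength with a squared-error penalty. The function name `attention_constraint_loss` and the example values are hypothetical.

```python
def attention_constraint_loss(attention, token_factor, strength):
    """Squared-error penalty between attention mass and causal strength.

    attention:    attention weight per token (assumed to sum to 1)
    token_factor: factor id per token, or None if the token is in no factor
    strength:     factor id -> estimated causal strength toward the charge
    """
    total = sum(abs(s) for s in strength.values())
    if total == 0:
        return 0.0
    # Attention mass collected by each factor.
    mass = {factor: 0.0 for factor in strength}
    for weight, factor in zip(attention, token_factor):
        if factor in mass:
            mass[factor] += weight
    # Target distribution: causal strengths normalized to sum to 1.
    return sum((mass[f] - abs(s) / total) ** 2 for f, s in strength.items())

# Hypothetical example: three tokens, two of which belong to factors.
loss = attention_constraint_loss(
    attention=[0.5, 0.3, 0.2],
    token_factor=["a", None, "b"],
    strength={"a": 0.6, "b": 0.2},
)
```

In training, a term like this would be added to the classification loss with a weighting coefficient, so the model is nudged toward attending to factors with strong causal links to the charge.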
3. Experiments
3.1 Dataset
The paper uses the CAIL dataset.
3.2 Results
What the results show: (1) the constructed causal graph is reasonable; (2) the model can capture subtle differences in text, especially when there is very little training data.
Baseline used to demonstrate the advantage of causality: GCI-co, a correlation-based graph. For pairs of factors whose Pearson correlation coefficient exceeds 0.5, an edge is drawn from the factor that more often appears earlier in the text to the other factor.
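The GCI-co rule, as I read it from the description above, can be sketched like this. The case lists, factor names, and the function `correlation_graph` are hypothetical, and edges into the charge node are left out of the sketch.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def correlation_graph(cases, threshold=0.5):
    """cases: one list of factors per case, in order of appearance in the text."""
    factors = sorted({f for case in cases for f in case})
    edges = []
    for a, b in combinations(factors, 2):
        present_a = [int(a in case) for case in cases]
        present_b = [int(b in case) for case in cases]
        if pearson(present_a, present_b) <= threshold:
            continue
        both = [case for case in cases if a in case and b in case]
        a_first = sum(case.index(a) < case.index(b) for case in both)
        b_first = len(both) - a_first
        # The edge points from the factor that usually appears earlier.
        edges.append((a, b) if a_first >= b_first else (b, a))
    return edges

# Hypothetical cases, each reduced to its ordered factors.
cases = [
    ["quarrel", "injury"],
    ["quarrel", "injury"],
    ["injury", "quarrel"],
    ["theft"],
    ["theft"],
    [],
]
edges = correlation_graph(cases)
```

The graph built this way only encodes co-occurrence and text order, which is what makes it a useful contrast to the causal graph.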
The paper explains why paradigms such as multi-task learning and pretrained models are not used, and why previous work incorporating causal inference is not used as a baseline: it fails to capture causal relations within the text.
Model performance is compared across different training set sizes.
Every experiment is run with 3 random seeds, and average accuracy (Acc) and macro-F1 are reported.
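For reference, the two metrics and the averaging over seeds can be computed as in the sketch below. The labels and predictions are made up, and the helper names are my own.

```python
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

def average_over_seeds(runs):
    """runs: one (gold, pred) pair per random seed."""
    accs = [accuracy(g, p) for g, p in runs]
    f1s = [macro_f1(g, p) for g, p in runs]
    return sum(accs) / len(accs), sum(f1s) / len(f1s)

# Hypothetical predictions from three seeds on the same four test cases.
gold = ["A", "A", "B", "B"]
runs = [
    (gold, ["A", "B", "B", "B"]),
    (gold, ["A", "A", "B", "B"]),
    (gold, ["B", "B", "A", "A"]),
]
mean_acc, mean_f1 = average_over_seeds(runs)
```

Macro-F1 weights every charge equally, so it is more sensitive than accuracy to mistakes on rare charges.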
3.3 Causal graph quality analysis
Sensitivity analysis of the causal graph, used to check the robustness of the causal discovery process:
- Random Confounder
- Placebo Treatment
- Subset of Data
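The three checks can be illustrated on a single effect estimate, as in the sketch below. This is a simplification under stated assumptions: the paper applies the checks to its causal graphs, whereas here a simple stratified ATE estimator on synthetic data stands in, and `stratified_ate` and `make_data` are hypothetical helpers.

```python
import random

def stratified_ate(rows):
    """rows: (covariates tuple, treatment, outcome); stratify on the covariates."""
    strata = {}
    for x, t, y in rows:
        strata.setdefault(x, {0: [], 1: []})[t].append(y)
    weighted, total = 0.0, 0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            continue  # no treated/untreated overlap in this stratum
        size = len(groups[0]) + len(groups[1])
        effect = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        weighted += effect * size
        total += size
    return weighted / total if total else 0.0

def make_data(rng, n=2000):
    """Synthetic data with one confounder c and a true effect of 0.3."""
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if c else 0.2))
        y = int(rng.random() < 0.1 + 0.3 * t + 0.4 * c)
        rows.append(((c,), t, y))
    return rows

rng = random.Random(0)
rows = make_data(rng)
base = stratified_ate(rows)

# Random confounder: an extra random covariate should barely move the estimate.
with_random = stratified_ate([(x + (rng.randint(0, 1),), t, y) for x, t, y in rows])
# Placebo treatment: a random treatment should show an effect near zero.
placebo = stratified_ate([(x, rng.randint(0, 1), y) for x, t, y in rows])
# Subset of data: the estimate on a random half should stay close to the base.
subset = stratified_ate(rng.sample(rows, len(rows) // 2))
```

An estimate that shifts a lot under the first or third check, or stays far from zero under the second, would suggest the estimated relation is not reliable.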
3.4 Human evaluation: inspecting attention
3.5 Discussion
Issues discussed: coarse granularity due to clustering, negation semantics, pronoun resolution, and intent.
3.6 Gender fairness issues arising from data imbalance
False Positive Equality Difference (FPED) and False Negative Equality Difference (FNED)
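Both metrics sum, over demographic groups, the absolute gap between the overall error rate and the group's error rate: FPED uses the false positive rate and FNED the false negative rate. The sketch below is a binary-label illustration on made-up data; how the paper applies the metrics to multi-class charges is not shown, and the function names are my own.

```python
def error_rates(gold, pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    tn = sum(g == 0 and p == 0 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr

def equality_differences(gold, pred, groups):
    """FPED and FNED: summed |overall rate - group rate| over all groups."""
    fpr, fnr = error_rates(gold, pred)
    fped = fned = 0.0
    for z in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == z]
        fpr_z, fnr_z = error_rates([gold[i] for i in idx], [pred[i] for i in idx])
        fped += abs(fpr - fpr_z)
        fned += abs(fnr - fnr_z)
    return fped, fned

# Hypothetical predictions for two groups of four cases each.
gold = [0, 0, 1, 1, 0, 0, 1, 1]
pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["m"] * 4 + ["f"] * 4
fped, fned = equality_differences(gold, pred, groups)
```

Both values are 0 when every group has the same error rates as the whole test set, so lower is fairer.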