Re11: Reading the paper EPM: Legal Judgment Prediction via Event Extraction with Constraints
2022-07-28 17:02:00 【The gods were silent】
The gods were silent - personal CSDN Blog Directory
Paper title: Legal Judgment Prediction via Event Extraction with Constraints
Official ACL download link: https://aclanthology.org/2022.acl-long.48/
Official GitHub project: WAPAY/EPM
This is an ACL 2022 paper; the authors are from Nanjing University.
The paper addresses legal judgment prediction (LJP) on the CAIL dataset: given the fact description of a case as input, predict the law article, the charge, and the term of penalty. It is a multi-task, multi-class classification problem. The paper uses constraints (penalty terms added to the loss function) to exploit the relationships among the three subtasks.
The intermediate task is event extraction: the extracted event information is used to help predict the judgment.
1. Background & Motivation
The paper argues that earlier LJP models mispredict for two reasons: they fail to locate the key event information that determines the judgment, and they do not exploit the cross-task consistency constraints among the LJP subtasks (that is, a given law article can only correspond to certain charges and terms of penalty). It therefore proposes EPM, an event-based prediction model with constraints, to address both problems.
A law article consists of two parts: an event pattern and a judgment/penalty. The paper argues that once the event information in a case is extracted, the correct judgment can be predicted.
① Extract events to assist the LJP task (the claim is that earlier models failed because they mispredicted the events).
② Constraints on the event output and between subtasks (a penalty is added when certain conditions are not met: certain event roles are required to appear, event types must correspond, and a given law article restricts the allowed range of charges and terms of penalty. The list of specific constraints is given in the code).
2. Problem definition and model introduction
2.1 Define hierarchical events
(This differs from the traditional definition of events in the legal domain; the trigger types and argument roles are defined so that they are useful for the LJP task.)
Legal events are defined according to the law articles. Because the law is hierarchical, the events defined from it are hierarchical too.
Fine-grained events:
event trigger: marks the occurrence of an event and corresponds to a specific event (for example, the event Robbery corresponds to the trigger type Trigger-Rob)
event role: the type of an event argument (my feeling is that, by analogy, a role is a class and an argument is an instance)
Token labeling task paradigm: each token is labeled with a subordinate trigger type (if the token is part of a trigger) or a subordinate role type (if the token is part of an argument)
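To make the token labeling paradigm concrete, here is a minimal sketch of how a label sequence could be grouped into trigger and argument spans. Only `Trigger-Rob` comes from the paper; the role names `Criminal`, `Victim`, and `Property`, the `O` label for tokens outside any event, and the English example sentence are assumptions made for illustration (CAIL itself is Chinese).

```python
from typing import List, Tuple


def extract_spans(labels: List[str], outside: str = "O") -> List[Tuple[int, int, str]]:
    """Group consecutive tokens that share a label into (start, end, label) spans.

    `end` is exclusive. Tokens labelled `outside` belong to no span.
    """
    spans = []
    start = None
    for i, label in enumerate(labels):
        # Close the currently open span as soon as the label changes.
        if start is not None and label != labels[start]:
            spans.append((start, i, labels[start]))
            start = None
        # Open a new span on any token that is part of a trigger or argument.
        if start is None and label != outside:
            start = i
    if start is not None:
        spans.append((start, len(labels), labels[start]))
    return spans


# Hypothetical example: one label per token.
tokens = ["A", "robbed", "B", "of", "a", "phone"]
labels = ["Criminal", "Trigger-Rob", "Victim", "O", "Property", "Property"]
spans = extract_spans(labels)
print(spans)
```

One limitation of labeling without begin/inside markers is that two adjacent arguments with the same role would be merged into a single span.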
2.2 EPM
I swear this is the most magical model I have seen this year. It stacks far too many pieces on top of each other!
Joint training: ① extract events; ② do multi-task classification with the event features (attention is computed with a text feature, and both the event output constraints and the cross-task constraints are applied).
Baseline version / full EPM model (a Switch classifier selects between models; see the experiment section below)
Baseline version: attention is computed between the representation of the fact description text (context features) and the representations of the law articles (article embeddings), followed by the 3 classification tasks
EPM version: the representation of each extracted event is concatenated with the representations of its tokens to form the event features, which replace the context features of the baseline

Token representation layer: a pre-trained Legal BERT model produces the token representations of the fact description text
Max pooling over the token representations of the fact description gives the context feature (used in the baseline; in EPM it is replaced by the event features introduced later)
Using the semantic information of the law articles: the token representation layer encodes the text of each article, max pooling gives one representation per article, and attention is then computed between these article representations and the context feature:
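As a rough illustration of the two steps just described, the sketch below max-pools token vectors into a context feature and one embedding per article, then attends from the context feature over the article embeddings. The exact attention form used in the paper is not reproduced in this post, so plain dot-product attention is an assumption here, and the tiny 2-dimensional vectors are made up.

```python
import math
from typing import List

Vector = List[float]


def max_pool(token_vectors: List[Vector]) -> Vector:
    """Element-wise maximum over a sequence of token vectors."""
    return [max(column) for column in zip(*token_vectors)]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention: softmax(query . key) weighted sum of the keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


# Made-up token vectors standing in for Legal BERT outputs.
fact_tokens = [[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]]
article_tokens = [
    [[1.0, 0.0], [0.5, 0.1]],  # tokens of a first article
    [[0.0, 1.0], [0.2, 0.8]],  # tokens of a second article
]

context_feature = max_pool(fact_tokens)
article_embeddings = [max_pool(tokens) for tokens in article_tokens]
attended = attend(context_feature, article_embeddings)
print(context_feature, article_embeddings, attended)
```

In EPM the `context_feature` argument would be swapped for the event features, while the article side stays the same.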
Legal judgment prediction layer: one linear classifier per subtask
Hierarchical event extraction layer: ① superordinate module: computes the correlation score between the representation vector of each token in the fact description and the superordinate types/roles ② subordinate module: computes the probability distribution over subordinate types/roles using the hierarchical information
- Each superordinate type/role has a trainable vector that represents its semantic features; a fully connected layer computes the correlation score between each token and each superordinate type/role:

- Softmax over these scores gives each token's superordinate type/role feature (a weighted sum, i.e., a soft representation)

- Predict the probability that a token belongs to each subordinate type/role. The input feature is the concatenation of the token representation and the superordinate type/role feature:

- A CRF generates the highest-scoring sequence of types/roles:

- Generate the event features from the predicted type sequence: for each span, concatenate its text representation (obtained by max pooling over the token representations) with the subordinate type/role embedding to get the span representation; then max pool over all span representations to get the final event features
- Use the event features in place of the context feature from the baseline
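Two of the steps in the list above are easy to sketch in isolation: the soft superordinate feature (a softmax-weighted sum of the trainable type vectors) and the event feature built from spans. This is only a sketch under assumptions: the correlation score is replaced by a plain dot product instead of the fully connected layer, the CRF and the subordinate classifier are left out, and all vectors and the `Victim` role name are invented for the example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(vectors: List[Vector]) -> Vector:
    """Element-wise maximum over a list of equally sized vectors."""
    return [max(column) for column in zip(*vectors)]


def superordinate_feature(token: Vector, type_vectors: List[Vector]) -> Vector:
    """Soft superordinate feature: softmax-weighted sum of the type vectors.

    The score is a dot product here (an assumption standing in for the
    fully connected layer mentioned in the notes).
    """
    scores = [sum(t * v for t, v in zip(token, vec)) for vec in type_vectors]
    weights = softmax(scores)
    dim = len(type_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, type_vectors)) for d in range(dim)]


def event_feature(
    token_vectors: List[Vector],
    spans: List[Tuple[int, int, str]],
    label_embeddings: Dict[str, Vector],
) -> Vector:
    """Max-pool each span, concatenate its label embedding, then max-pool all spans."""
    span_reprs = []
    for start, end, label in spans:
        text_repr = max_pool(token_vectors[start:end])
        span_reprs.append(text_repr + label_embeddings[label])  # list concat
    return max_pool(span_reprs)


soft = superordinate_feature([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

token_vectors = [[0.2, 0.1], [0.9, 0.3], [0.4, 0.8]]
spans = [(0, 1, "Trigger-Rob"), (1, 3, "Victim")]
label_embeddings = {"Trigger-Rob": [1.0, 0.0], "Victim": [0.0, 1.0]}
feature = event_feature(token_vectors, spans, label_embeddings)
print(soft, feature)
```

The resulting `feature` has the token dimension plus the label-embedding dimension, which is what gets passed on in place of the baseline context feature.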
Loss functions in the training stage:
The loss function of each of the 3 subtasks is cross entropy.
Loss function of event extraction and the total loss function (the total shown here does not yet include the penalty from the event output constraints):
Event output constraints: a penalty is added if a required trigger or role is missing; given a trigger type, specific roles must appear
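The idea behind the event output constraints can be illustrated with a simple violation counter. Note that this is not the paper's formulation: a penalty used in training has to be computed on predicted probabilities so that it stays differentiable, whereas the sketch below works on a discrete predicted label sequence. The table `REQUIRED_ROLES` and the role names are hypothetical; the weight 0.1 is the value of the penalty hyperparameter reported in the experimental setup.

```python
from typing import Dict, List, Set

# Hypothetical constraint table: roles that must accompany each trigger type.
REQUIRED_ROLES: Dict[str, Set[str]] = {
    "Trigger-Rob": {"Criminal", "Victim"},
}


def constraint_penalty(
    predicted_labels: List[str],
    required_roles: Dict[str, Set[str]],
    trigger_prefix: str = "Trigger-",
    lambda_p: float = 0.1,
) -> float:
    """Count constraint violations in a label sequence and scale them by lambda_p."""
    present = set(predicted_labels)
    triggers = {label for label in present if label.startswith(trigger_prefix)}
    violations = 0
    # A trigger must appear somewhere in the fact description.
    if not triggers:
        violations += 1
    # Given a trigger type, its required roles must appear as well.
    for trigger in triggers:
        violations += len(required_roles.get(trigger, set()) - present)
    return lambda_p * violations


ok = constraint_penalty(["Criminal", "Trigger-Rob", "Victim", "O"], REQUIRED_ROLES)
missing_role = constraint_penalty(["Criminal", "Trigger-Rob", "O"], REQUIRED_ROLES)
no_trigger = constraint_penalty(["O", "O"], REQUIRED_ROLES)
print(ok, missing_role, no_trigger)
```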

Consistency constraints between multiple tasks: the predicted law article restricts the allowed range of charges and terms of penalty. (During training, if the article prediction is correct, a mask is added to the loss function. I discussed this with a junior labmate, and our feeling was that for a single-label classification task the mask is useless during training. However, the ablation study shows that the article subtask is affected as well, so the mask must have some effect at training time. In addition, the loss function in the original paper contains two consecutive summation signs, while the other loss formulas are written for a single sample, so I suspected a typo. I asked the author about this. The explanation should be: ① label smoothing is applied, so the mask always has an effect; ② the data itself is noisy, so the gold labels in the training set are not necessarily right, which means that when y is 1 the mask is not necessarily 1. So the mask is always added.)
The author's reply:
Accordingly, the code was also changed from one-hot labels to label smoothing.
How the mask is applied: directly set the probability of every class that is not allowed to be output to 0:
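A minimal sketch of that masking step, assuming the charge probabilities are held in a dictionary and that a lookup table lists which charges each law article allows. The table `ALLOWED_CHARGES`, the article names, and the probabilities are all made up; the real constraint list lives in the authors' code.

```python
from typing import Dict, Set

# Hypothetical article-to-charge table (the real one is in the authors' code).
ALLOWED_CHARGES: Dict[str, Set[str]] = {
    "article_a": {"robbery"},
    "article_b": {"theft", "fraud"},
}


def apply_mask(probs: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Set the probability of every class outside `allowed` to 0."""
    return {label: (p if label in allowed else 0.0) for label, p in probs.items()}


charge_probs = {"robbery": 0.35, "theft": 0.45, "fraud": 0.20}
predicted_article = "article_a"

masked = apply_mask(charge_probs, ALLOWED_CHARGES[predicted_article])
predicted_charge = max(masked, key=masked.get)
print(masked, predicted_charge)
```

Without the mask the most probable charge would be `theft`; with the mask derived from the predicted article, only charges consistent with that article remain candidates.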
3. Experiments
3.1 Datasets
CAIL (two datasets: big and small)
New dataset LJP-E (manually annotated event information for cases covering 15 charges)
3.2 Baselines
① Baseline: the EPM model with event extraction and the constraints removed, and with the event features replaced by the context features of the fact description text.
③ EPM: the model without the event components (i.e., the baseline model in ①) is first pre-trained on the training set of the original CAIL dataset, and the model is then trained on the training set of LJP-E, the dataset annotated with event information. (Question: is this a trick? It feels as if the model gains extra fact-description information, which seems unfair.)
3.3 Experimental setup
Hyperparameter settings: the maximum input length of Legal-BERT is 512, the optimizer is Adam, the learning rate is $10^{-4}$, the batch size is 32, and the number of warmup steps is 3000. Models are trained for at most 20 epochs, and for each subtask the checkpoint with the highest Macro-F1 on the validation set is saved (CAIL-big has no validation set, so the validation set of CAIL-small is used directly). On the LJP-E dataset, each experiment is run 5 times and the average result is reported.
The four λ weights in the total loss function listed in ② are 0.5, 0.5, 0.4, and 0.2; the hyperparameter $\lambda_p$ of the event output constraint penalty is 0.1
Experiments were run on 2 Tesla V100 GPUs.
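For readers who want to see how the four λ weights enter the objective, here is a small sketch of a weighted sum of four losses. There are four losses in total (three subtasks plus event extraction), but this post does not say which weight belongs to which loss, so the order used below is a placeholder assumption, as are the loss values.

```python
from typing import Sequence


def total_loss(
    losses: Sequence[float],
    weights: Sequence[float] = (0.5, 0.5, 0.4, 0.2),
) -> float:
    """Weighted sum of the individual losses.

    The pairing of weights and losses is an assumption; check the paper
    and the released code for the actual assignment.
    """
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss")
    return sum(w * loss for w, loss in zip(weights, losses))


# Made-up loss values for four loss terms, in the assumed order.
example = total_loss([1.0, 2.0, 0.5, 3.0])
print(example)
```

The event output constraint penalty, scaled by its own hyperparameter of 0.1, is added on top of this sum.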
3.4 Main experimental results
Metrics used to evaluate the models: Accuracy (Acc), Macro-Precision (MP), Macro-Recall (MR) and Macro-F1 (F1)
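As a reminder of what these four metrics measure, the sketch below computes them from gold and predicted labels. It assumes the common convention that Macro-F1 is the unweighted mean of the per-class F1 scores (some papers instead derive it from MP and MR), and that a class with no predictions or no gold instances gets a score of 0. The labels are toy values.

```python
from typing import Dict, List


def macro_metrics(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "Acc": sum(1 for g, p in zip(gold, pred) if g == p) / len(gold),
        "MP": sum(precisions) / n,
        "MR": sum(recalls) / n,
        "F1": sum(f1s) / n,
    }


gold = ["a", "a", "b", "c"]
pred = ["a", "b", "b", "c"]
metrics = macro_metrics(gold, pred)
print(metrics)
```

Because every class counts equally in the macro averages, rare charges weigh as much as frequent ones, which is why these metrics are reported alongside accuracy.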
(In the result tables, gold or @G means that the gold event labels, rather than predicted events, are used to produce the results.)
Experimental results on LJP-E:
Experimental results on CAIL:
Because the LJP-E dataset covers only 15 types of cases, a Legal BERT is first trained on the CAIL training set, using the [CLS] token representation to predict whether a case belongs to one of these 15 types (this classifier is called Switch; batch size 32, trained for 20 epochs, Adam optimizer, learning rate 0.0001; its accuracy is 89.82% on CAIL-big and 85.32% on CAIL-small). If the case does belong to one of them, the fine-tuned EPM makes the prediction; if not, the EPM from before fine-tuning (the pre-trained EPM in ③) is used.
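The routing logic of Switch is simple enough to sketch with plain callables. Everything below is a stand-in: the keyword-based `switch` replaces the Legal BERT classifier, and the two lambdas replace the fine-tuned and pre-trained EPM models, so only the control flow reflects what the notes describe.

```python
from typing import Callable

Predictor = Callable[[str], str]


def route(
    fact_description: str,
    switch: Callable[[str], bool],
    fine_tuned_epm: Predictor,
    pretrained_epm: Predictor,
) -> str:
    """Use the fine-tuned EPM when Switch says the case is one of the 15 types."""
    if switch(fact_description):
        return fine_tuned_epm(fact_description)
    return pretrained_epm(fact_description)


# Hypothetical stand-ins for the three trained models.
switch = lambda text: "robbed" in text
fine_tuned_epm = lambda text: "fine-tuned EPM prediction"
pretrained_epm = lambda text: "pre-trained EPM prediction"

in_scope = route("A robbed B of a phone", switch, fine_tuned_epm, pretrained_epm)
out_of_scope = route("A forged a contract", switch, fine_tuned_epm, pretrained_epm)
print(in_scope, out_of_scope)
```

The same function covers the "SOTA + EPM" setting if `pretrained_epm` is replaced by the original SOTA model.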
Besides directly comparing EPM with the SOTA models:
- Add EPM to a SOTA model for comparison (if Switch predicts that the case belongs to one of the 15 types in LJP-E, the fine-tuned EPM classifies it; otherwise the original model does) (I find it odd to simply bolt the two together)
- Modify the TOPJUDGE model (reported as TOPJUDGE+Event): switch the CNN encoder to an LSTM and replace the original fact description representation with the event features. This performs worse than TOPJUDGE+EPM, which suggests that using EPM directly as a black box works better.


Event extraction metrics:
3.5 Model analysis
3.5.1 Ablation Study
- Remove the event components
- Remove the event output constraints (absolute constraint → CSTR1, event-based consistency constraint → CSTR2)
- Remove the constraints between subtasks (article-charge constraint → DEP1, article-term constraint → DEP2)
- Remove the superordinate types (the model directly predicts the token's superordinate features)
- Treat event extraction as an auxiliary task (sharing the encoder with the LJP task)


4. Code reproduction
I asked the author :
The author replied :
I’m right here waiting for you!