Re8: Paper reading: Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity
2022-07-26 16:19:00 【The gods were silent】
The gods were silent - personal CSDN Blog Directory
Paper title: Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity
arXiv link: https://arxiv.org/abs/2007.03225
Official ACM link: https://dl.acm.org/doi/abs/10.1145/3397271.3401191 (the page also hosts the video of the SIGIR presentation; all images in this post, other than those taken from the paper itself, come from that video)
This is a SIGIR 2020 paper on measuring the similarity of Indian legal case documents, a task in legal information retrieval (Legal IR). The data and code are not public.
Legal information retrieval covers recommendation systems and precedent search. Its goal is to find, among a large number of precedents, the cases that match a given scenario or case.
The paper follows the graph-based similarity paradigm and is the first work to bring legal knowledge (statute information) into legal document similarity: case documents and statutes (together with the statute hierarchy) are put into one heterogeneous graph, node representations are learned with the graph embedding methods node2vec [1] and metapath2vec [2], and the cosine similarity between two document nodes is taken as the similarity of the two documents.
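The last step, scoring two documents by the cosine similarity of their node embeddings, can be sketched as follows. This is a minimal illustration in plain Python; the function name and the toy vectors are my own assumptions, not the paper's code (which is not public).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings of two case documents
# (the paper uses 128 dimensions).
doc_a = [0.2, 0.1, 0.7]
doc_b = [0.1, 0.0, 0.9]
score = cosine_similarity(doc_a, doc_b)
```

Cosine similarity ignores vector length, so two documents whose embeddings point in the same direction get a score of 1 regardless of scale.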
In addition, the paper proposes combining text-based and graph-based methods (by taking the maximum or the average of their scores) so that they complement each other and give better results.
The Pearson correlation coefficient between the computed similarities and the expert-annotated scores is used as the evaluation metric.
I'm particularly speechless about this paper. Its own scores are not even as good as the text-only baseline from 2017, and it only pushes the metric up by building a voting-style hybrid with that text-based baseline. I don't know what to say! If this kind of paper can get into SIGIR, then I want to publish at SIGIR too!
This paper is one of the important references of LeSICiN [3], a paper on identifying statutes from Indian legal texts. Content that already appears in my LeSICiN post is only sketched here; please refer to that post.
1. Background & Motivation
Supervised learning is not applicable to legal document similarity, because there is not enough labeled data.
The similarity of legal documents has no strict definition and relies mainly on expert judgment. It also needs to be interpretable.
Two kinds of methods are commonly used for legal document similarity: text-based and network-based.
Text-based: Measuring Similarity among Legal Court Case Documents
Network-based: [4] and [5]
Hybrid: Finding Similar Legal Judgements under Common Law System
Principle of the network-based models: documents that cite the same statutes or precedents are similar, and so are documents that cite different statutes or precedents but sit in similar positions in the network structure.
There are two sources of legal knowledge in the common law system: written statutes and precedents.

Previous network-based similarity methods only consider the citation relations among case documents, that is, the precedent citation network (PCNet). This loses an important source of legal knowledge: the hierarchy of statutes.
Measures previously used on PCNet to compute legal document similarity:
- Bibliographic Coupling [4]: the Jaccard similarity index between the sets of precedent citations (out-citations) of two documents
- Co-citation [4]: like Bibliographic Coupling, but computed on in-citations
- Dispersion [5]: measures how similar the out-neighbours (out-citation documents) are, that is, whether they fall in the same community/cluster. NetworkX implements it in its latest version: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.dispersion.html (I haven't read it; I'll look into it if necessary)
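The first two measures above can be sketched in a few lines of plain Python. The edge list and document names below are hypothetical toy data, and the helper names are my own; this only illustrates the Jaccard-based definitions, not the original authors' implementation.

```python
def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def out_citations(doc, edges):
    """Documents cited by `doc` (edges are (citing, cited) pairs)."""
    return {cited for citing, cited in edges if citing == doc}


def in_citations(doc, edges):
    """Documents that cite `doc`."""
    return {citing for citing, cited in edges if cited == doc}


def bibliographic_coupling(x, y, edges):
    return jaccard(out_citations(x, edges), out_citations(y, edges))


def co_citation(x, y, edges):
    return jaccard(in_citations(x, edges), in_citations(y, edges))


# Hypothetical toy precedent citation network.
edges = [
    ("d1", "p1"), ("d1", "p2"),
    ("d2", "p2"), ("d2", "p3"),
    ("d3", "d1"), ("d3", "d2"),
    ("d4", "d1"),
]
bc = bibliographic_coupling("d1", "d2", edges)  # share p2 out of {p1, p2, p3}
cc = co_citation("d1", "d2", edges)             # share d3 out of {d3, d4}
```

Here d1 and d2 share one of three cited precedents and one of two citing documents, so the two measures give 1/3 and 1/2 respectively.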
2. Constructing the Hier-SPCNet graph
Full name: Hierarchical Statute and Precedent Citation Network

Nodes:
Case documents
Statutes (5 types)
Relations:
A document cites a statute (at any level of the hierarchy)
Documents cite each other
Statutes point to each other
Statutes are related hierarchically (similar to LeSICiN, but slightly different): Act → Part → Chapter → Topic → Section/Article (not every Act has all levels)
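To make the node and edge types concrete, here is a small sketch of how such a heterogeneous graph could be stored. Everything in it (the `HeteroGraph` class, the edge labels "hierarchy" and "cites", the node ids, and the choice to store edges as undirected) is an assumption for illustration; the paper's code is not public.

```python
from collections import defaultdict

# Statute levels named in the text, from top to bottom.
HIERARCHY = ["act", "part", "chapter", "topic", "section"]


class HeteroGraph:
    """Tiny typed graph: every node has a type, every edge has a label."""

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(set)   # stored as undirected (assumption)
        self.edge_type = {}

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v, etype):
        for n in (u, v):
            if n not in self.node_type:
                raise KeyError(f"unknown node: {n}")
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.edge_type[frozenset((u, v))] = etype


def add_statute_path(graph, path):
    """Add one top-down chain of (node_id, level) pairs; levels may be skipped."""
    for node, level in path:
        if level not in HIERARCHY:
            raise ValueError(f"unknown statute level: {level}")
        graph.add_node(node, level)
    for (parent, _), (child, _) in zip(path, path[1:]):
        graph.add_edge(parent, child, "hierarchy")


g = HeteroGraph()
# Hypothetical Act with no Part or Topic level.
add_statute_path(g, [("ActX", "act"), ("ActX/Ch1", "chapter"), ("ActX/Ch1/S10", "section")])
g.add_node("doc1", "case")
g.add_node("doc2", "case")
g.add_edge("doc1", "ActX/Ch1/S10", "cites")  # a document may cite any level
g.add_edge("doc2", "ActX/Ch1", "cites")
g.add_edge("doc1", "doc2", "cites")          # precedent citation
```

With this layout, doc1 and doc2 are connected both directly and through the statute hierarchy (Section → Chapter), which is the extra signal PCNet alone does not have.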
3. Node representations
node2vec [1]: generates node representations from random walks (biased between BFS-like and DFS-like exploration).
The implementation used is aditya-grover/node2vec, with 128 dimensions; all other hyperparameters keep their default values.
Because node2vec assumes a homogeneous network, Hier-SPCNet is also treated as a homogeneous graph here.
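The walk-generation part of node2vec can be sketched as below. This is a plain-Python illustration of the second-order biased walk with the return parameter p and in-out parameter q on an unweighted graph; it is not the aditya-grover/node2vec code, and the toy graph and function name are assumptions. Node types are ignored, matching the homogeneous treatment described above. The walks would then be fed to a skip-gram model to obtain embeddings, which is not shown.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One biased random walk of up to `length` nodes on an unweighted graph."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: no previous node yet, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close (BFS-like)
            else:
                weights.append(1.0 / q)   # move outward (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph mixing case documents and statute sections.
adj = {
    "doc1": {"doc2", "secA"},
    "doc2": {"doc1", "secA", "secB"},
    "doc3": {"secB"},
    "secA": {"doc1", "doc2"},
    "secB": {"doc2", "doc3"},
}
walk = node2vec_walk(adj, "doc1", length=10, p=1.0, q=0.5, rng=random.Random(0))
```

A small q makes the walk wander away from its starting point, while a small p makes it backtrack more often; with p = q = 1 it reduces to a uniform random walk.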
metapath2vec [2]: based on user-defined metapaths
The paper defines 14 metapaths that start from a document node and reflect the structure of the legal system. Being neighbours along a metapath implies some kind of similarity.
The implementation used is stellargraph (from PyPI)
As in LeSICiN, the metapaths in Hier-SPCNet all start from the same type of node. But this paper only needs document representations.
The most outrageous thing is that the paper omits the definitions of 10 of the metapaths. I'm speechless!
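A metapath-guided random walk, the sampling step behind metapath2vec, can be sketched like this. The metapath case → section → case used below is a hypothetical example (two documents citing the same section), not necessarily one of the paper's 14; the function and the toy graph are likewise assumptions rather than the stellargraph API.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng=random):
    """Random walk whose node types follow `metapath`, repeated cyclically."""
    if metapath[0] != metapath[-1]:
        raise ValueError("metapath must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the first metapath type")
    cycle = metapath[:-1]  # the last type coincides with the first of the next round
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = sorted(n for n in adj[walk[-1]] if node_type[n] == wanted)
        if not candidates:
            break  # no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy graph: three case documents and two statute sections.
node_type = {"doc1": "case", "doc2": "case", "doc3": "case",
             "secA": "section", "secB": "section"}
adj = {
    "doc1": {"secA", "doc3"},
    "doc2": {"secA", "secB"},
    "doc3": {"secB", "doc1"},
    "secA": {"doc1", "doc2"},
    "secB": {"doc2", "doc3"},
}
walk = metapath_walk(adj, node_type, "doc1", ["case", "section", "case"],
                     length=7, rng=random.Random(0))
walk_types = [node_type[n] for n in walk]
```

Unlike the node2vec walk, the doc1 to doc3 citation edge is never followed here, because the metapath requires a section after every case node.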
4. Experiments
4.1 Dataset
The data are Supreme Court of India cases, crawled from Thomson Reuters Westlaw India; only publicly available data are used.
Citations are extracted from the text with regular-expression patterns such as < [section or article number] of the [Act] >
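A pattern of the shape < [section or article number] of the [Act] > could look like the sketch below. The exact expressions used in the paper are not given, so this regex, the list of statute-name endings (Act, Code, Constitution), and the sample sentence are all assumptions for illustration.

```python
import re

# Assumed pattern: "Section 302 of the Indian Penal Code",
# "Article 14 of the Constitution", and similar.
CITATION_RE = re.compile(
    r"\b(Section|Article)\s+"        # kind of provision
    r"(\d+[A-Z]?)\s+of\s+the\s+"     # provision number, e.g. 302 or 120B
    r"((?:[A-Z][A-Za-z]*\s+)*?"      # capitalised words of the statute name
    r"(?:Act|Code|Constitution))\b"  # assumed endings of a statute name
)


def extract_statute_citations(text):
    """Return (kind, number, statute_name) tuples found in `text`."""
    return CITATION_RE.findall(text)


sample = (
    "The accused was convicted under Section 302 of the Indian Penal Code. "
    "Counsel relied on Article 14 of the Constitution and "
    "Section 120B of the Indian Penal Code."
)
citations = extract_statute_citations(sample)
```

Real judgments use many more citation styles (abbreviations, "u/s", ranges of sections), so a single pattern like this would miss a good share of citations.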
There are 1806 case documents and 128 Acts (together with their hierarchies; each Act is cited by at least one document). Hier-SPCNet has 22566 nodes and 31309 edges in total. PCNet has the same 1806 case document nodes and 542 citation edges.
Document similarity labels were annotated by experts for 100 document pairs (details omitted).
4.2 Main experimental results
The evaluation metric is the Pearson correlation coefficient.
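For reference, the Pearson correlation between model scores and expert scores can be computed as in this minimal sketch. The function name and the sample numbers are made up for illustration and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)


# Hypothetical scores for four document pairs.
model_scores = [0.10, 0.40, 0.35, 0.80]
expert_scores = [0.0, 0.5, 0.3, 1.0]
r = pearson(model_scores, expert_scores)
```

A value close to 1 means the model ranks and spaces the document pairs much like the experts do; the metric is insensitive to the scale of either score.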
The co-citation values are the same on PCNet and Hier-SPCNet because the in-citations are identical (statutes do not cite documents).
Other analysis is omitted.

"Average" and "max" mean taking the average or the maximum of the two similarity scores.
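The hybrid scoring described above amounts to very little code. A sketch, with a hypothetical function name and made-up scores:

```python
def hybrid_similarity(text_sim, graph_sim, mode="average"):
    """Combine a text-based and a graph-based similarity score."""
    if mode == "average":
        return (text_sim + graph_sim) / 2
    if mode == "max":
        return max(text_sim, graph_sim)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical scores for one document pair.
avg_score = hybrid_similarity(0.5, 0.75, mode="average")
max_score = hybrid_similarity(0.5, 0.75, mode="max")
```

This only makes sense when both scores live on a comparable scale (for example, both are cosine similarities); otherwise one method would dominate the maximum.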
5. Code reproduction
To be added once my server is ready.