Re13: Reading the paper Gender and Racial Stereotype Detection in Legal Opinion Word Embeddings
2022-07-28 16:55:00 【The gods were silent】
Paper title: Gender and Racial Stereotype Detection in Legal Opinion Word Embeddings
arXiv link: https://arxiv.org/abs/2203.13369
Official AAAI preprint link: https://www.aaai.org/AAAI22Papers/AISI-10870.MatthewsS.pdf
Official video: https://aaai-2022.virtualchair.net/poster_aisi10870 (the author speaks English very fast, and the audio pops)
This is a 2022 AAAI paper on machine learning fairness. It detects the gender and racial stereotypes (stereotype, bias) embedded in word embeddings trained on U.S. civil-law judicial opinions (legal opinions).
The paper focuses on historical and representation bias. Its experiments show that the historical factor (the age of the cases used to train the word embeddings) is not the main influence.
Detecting fairness problems feels a bit odd to me: there is more manual work than machine work.
1. Background
Implicit Association Test (IAT): measures the reaction time of human participants when they classify target words (e.g. flowers or insects) together with attribute terms (e.g. pleasant or unpleasant).
Word Embedding Association Test (WEAT): measures the association between groups of target words (e.g. male and female) and attribute terms (e.g. positive or negative sentiment), for example whether the embeddings of male-related words are closer to the embeddings of positive sentiment words. Concretely, it measures the similarity (association, computed as cosine similarity) between the embeddings of two sets of target words (e.g. typical male or female names) and two sets of attribute terms (e.g. pleasant (love, peace) or unpleasant (ugly, hatred)).
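To make the WEAT description concrete, here is a minimal sketch of the commonly used WEAT effect size. The 2-D toy vectors are made up for illustration, and using the sample standard deviation (ddof=1) is a convention I am assuming, not something taken from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y,
    divided by the standard deviation over all targets in X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    pooled_std = np.std(s_x + s_y, ddof=1)  # assumption: sample std
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)

# Made-up 2-D "embeddings": X leans toward A, Y leans toward B.
X = [[1.0, 0.1], [1.0, 0.2]]   # e.g. one group of target names
Y = [[0.1, 1.0], [0.2, 1.0]]   # e.g. the other group of target names
A = [[1.0, 0.0]]               # e.g. pleasant attribute words
B = [[0.0, 1.0]]               # e.g. unpleasant attribute words

print(weat_effect_size(X, Y, A, B))
```

A positive value means X is more associated with A (and Y with B); swapping X and Y flips the sign.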
Bias categories: historical, representation, measurement, aggregation, evaluation, and deployment biases.
2. Difficulties and corresponding solutions
- Legal text is more formal and frequently uses regular personal pronouns. Given names, surnames, and personal pronouns may all carry embedded bias, so checking only given names would miss the other biases. → Use surnames associated with race.
- Women are underrepresented among legal professionals, which may cause gender-occupational stereotypes.
- Open-domain sentiment lexicons cannot be used directly in the legal domain. → The WEAT attribute lists use a general lexicon plus domain-specific and expanded word lists. (Some characteristic words are chosen as seed terms, and the word embeddings are then used to generate the expanded word lists. A positive word is one whose vector has a high cosine similarity with the difference vector between the existing positive words and the existing negative words; negative words are the opposite. The lists are then manually reviewed, and words with obvious racial or gender characteristics are deleted.)
- The IAT mainly considers whether attributes are positive or negative, but in legal matters the impact on the outcome matters more. → Use terms that describe the outcome of a legal opinion, such as grant or deny, to measure whether the result is positive or negative.
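The word-list expansion described above can be sketched as follows. This is only my reading of the idea: the direction is taken as the mean of the positive seeds minus the mean of the negative seeds, and the vocabulary, vectors, and the function name `expand_word_lists` are all hypothetical. The manual review step still has to happen afterwards.

```python
import numpy as np

def expand_word_lists(embeddings, pos_seeds, neg_seeds, k=2):
    """Rank non-seed words by cosine similarity to the
    (mean positive seed - mean negative seed) direction.
    Returns (positive candidates, negative candidates), k words each."""
    pos_mean = np.mean([embeddings[w] for w in pos_seeds], axis=0)
    neg_mean = np.mean([embeddings[w] for w in neg_seeds], axis=0)
    direction = pos_mean - neg_mean
    direction = direction / np.linalg.norm(direction)

    seeds = set(pos_seeds) | set(neg_seeds)
    scores = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        vec = np.asarray(vec, dtype=float)
        scores[word] = float(vec @ direction / np.linalg.norm(vec))

    ranked = sorted(scores, key=scores.get, reverse=True)
    positives = ranked[:k]            # most similar to the direction
    negatives = ranked[-k:][::-1]     # most dissimilar, strongest first
    return positives, negatives

# Made-up 2-D vectors purely for illustration.
toy_embeddings = {
    "good": [1.0, 0.0], "bad": [-1.0, 0.0],
    "granted": [0.9, 0.1], "affirmed": [0.8, 0.3],
    "denied": [-0.9, 0.1], "dismissed": [-0.7, 0.4],
    "court": [0.0, 1.0],
}
pos_candidates, neg_candidates = expand_word_lists(
    toy_embeddings, pos_seeds=["good"], neg_seeds=["bad"], k=2
)
print(pos_candidates, neg_candidates)
```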
Pipeline: idiomatic phrase extraction → train skip-gram word2vec word embeddings (separately on the full corpus and on sub-corpora split by time period or by legal topic) → run WEAT for gender and race.
Gender: given names and other typical gender-denoting words such as pronouns
Race: surnames
Optimizations:
- Idiomatic phrase extraction: to keep the n-gram dictionaries from growing too large, only phrases whose words co-occur frequently are considered. Normalized Point-wise Mutual Information (NPMI) is used to select the phrases added to the dictionary.
- Surnames may coincide with company names:
  - Title-case the surnames to target proper nouns.
  - During idiomatic phrase extraction, exclude some names that do not refer to people.
  - Use centroid-based filtering to remove multi-sense words. (Compute the embeddings of all surnames, compute the cosine similarity between each surname and the centroid of all surnames, and delete the 20% of surnames with the lowest similarity.) (Given names are handled in a similar way.)
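The centroid-based filtering step can be sketched like this. The toy vectors and the helper name `centroid_filter` are hypothetical, and whether the vectors are length-normalized before averaging is my assumption (here they are not).

```python
import numpy as np

def centroid_filter(embeddings, words, drop_fraction=0.2):
    """Drop the `drop_fraction` of words whose vectors are least
    cosine-similar to the centroid of all the given words."""
    vecs = np.array([embeddings[w] for w in words], dtype=float)
    centroid = vecs.mean(axis=0)  # assumption: plain mean of raw vectors
    sims = vecs @ centroid / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(centroid)
    )
    n_drop = int(round(len(words) * drop_fraction))
    lowest = np.argsort(sims)[:n_drop]        # indices of least similar words
    dropped = {words[i] for i in lowest}
    return [w for w in words if w not in dropped]

# Made-up vectors: "ford" behaves like a company name and sits far
# from the other surnames.
toy_surname_vecs = {
    "smith": [1.0, 0.1], "jones": [0.9, 0.2], "garcia": [1.0, 0.0],
    "nguyen": [0.95, 0.15], "ford": [0.1, 1.0],
}
kept_surnames = centroid_filter(toy_surname_vecs, list(toy_surname_vecs))
print(kept_surnames)
```

With five words and a 20% drop rate, exactly one word (the outlier) is removed.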
Experimental setup:
- The NPMI threshold in the phrase extraction phase is 0.5.
- The word embedding dimension is 300, the minimum word frequency is 30, the sampling threshold is $10^{-4}$, the learning rate is 0.05, the window size is 10, and the number of negative samples is 10.
- For WEAT, the standard deviation is also computed (by sub-sampling the word lists with a simple bootstrapping procedure).
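For reference, the NPMI criterion used in phrase extraction can be sketched with the standard definition, NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)), which ranges from -1 to 1. How the paper estimates the probabilities is not stated in these notes, so dividing all counts by one shared total is my assumption, and the counts below are made up.

```python
import math

def npmi(count_xy, count_x, count_y, total):
    """Normalized pointwise mutual information of a word pair.
    Assumes 0 < count_xy < total so both logarithms are defined."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)

def select_phrases(bigram_counts, unigram_counts, total, threshold=0.5):
    """Keep bigrams whose NPMI reaches the threshold (0.5 in the setup)."""
    return [
        pair for pair, c_xy in bigram_counts.items()
        if npmi(c_xy, unigram_counts[pair[0]], unigram_counts[pair[1]], total)
        >= threshold
    ]

# Made-up counts for illustration.
toy_unigrams = {"summary": 10, "judgment": 10, "the": 50, "court": 20}
toy_bigrams = {("summary", "judgment"): 9, ("the", "court"): 10}
selected_phrases = select_phrases(toy_bigrams, toy_unigrams, total=100)
print(selected_phrases)
```

In this toy data "the court" co-occurs exactly as often as chance predicts, so its NPMI is about 0 and it is not kept, while "summary judgment" scores well above 0.5.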


- Discrimination was more severe earlier in American history, so the temporal effect is controlled for, but the unfairness still remains.
Approach: split the corpus by time period, train word embeddings on each period, and run WEAT on each.
- Gender stereotypes: the tests are run with different target words.

- Different legal topics: the corpus is split by topic. (To prevent low-frequency effects, attribute words that occur fewer than 30 times are deleted.)
3. Code reproduction
The paper does not release public code, but it does not look hard to reproduce (as long as the dataset can be obtained). When my server is ready and I have time, I will write one!
4. Other practices related to fairness
LeSICiN and ECHR both mask named entities to reduce demographic bias.