Debiasing word embeddings | On word embeddings and bias removal
2022-07-01 17:42:00 【LolitaAnn】
This article shares my notes. The content mainly comes from Andrew Ng's deep learning course. [1]
The existence of stereotypes
Word embeddings have a very important impact on how well our models generalize, so we should also make sure they are not affected by unintended forms of bias, such as sexism, racism, religious discrimination, and so on.
Of course, I think the word "discrimination" is a little strong. Here we can understand it as stereotyping.
For example:
My father is a doctor , My mother is _______ .
My father is a company employee , My mother is _______ .
Boys like _______ . Girls like _______ .
The first blank is likely to be filled with "a nurse". The second blank is likely to be "a housewife". The third blank is likely to be "Transformers". The fourth blank is likely to be "Barbie dolls".
What is this? This is the so-called gender stereotype. These stereotypes are related to socio-economic status.
Learning algorithms themselves are not stereotyped, but the text written by humans is. And word embeddings can learn these stereotypes "very well".
So we need to modify the learning algorithm to minimize or, ideally, eliminate these unintended types of bias.
Over many decades, over many centuries, I think humanity has made progress in reducing these types of bias. And I think maybe fortunately AI, I think we actually have better ideas for quickly reducing the bias in AI than for quickly reducing the bias in the human race. Although I think we are by no means done for AI as well, and there is still a lot of research and hard work to be done to reduce these types of biases in our learning algorithms.
Removing stereotypes from word embeddings
This section follows the method of arXiv:1607.06520 [2].
It consists of the following three steps:
- Identify bias direction.
- Neutralize: For every word that is not definitional, project to get rid of bias.
- Equalize pairs.
Suppose we now have a trained word embedding.
As before, it uses 300-dimensional features, which we map onto a two-dimensional plane. The distribution of the words on that plane is shown in the figure.

1. Identify the bias direction
To find the main direction of a stereotype between two words, we use a technique mentioned earlier when discussing word embedding features: subtract the two vectors to obtain the main dimension along which they differ.
After subtracting such word pairs, you will find that their differences lie mainly along the gender dimension.
Then take the average of these differences.
We can get the following result :
This gives us the main direction of the stereotype bias. We can then also identify the directions that are unrelated to this particular bias.
Note: in this example we treat the bias direction "gender" as a one-dimensional space, and the remaining, unrelated directions as a 299-dimensional subspace. This is a simplification compared with the original paper. For details, see the references provided at the end of the article.
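The averaging step above can be sketched in a few lines of Python. This is only a minimal illustration of the simplified approach described here, not the procedure from the paper [2]. The function name `bias_direction` and the 3-dimensional toy vectors are made up for the example; real embeddings would have 300 dimensions.

```python
import numpy as np

def bias_direction(embeddings, pairs):
    """Average the difference vectors of gendered word pairs, then normalize."""
    diffs = [embeddings[a] - embeddings[b] for a, b in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

# Toy embeddings with invented values, for illustration only.
toy = {
    "girl": np.array([1.0, 0.2, 0.1]),
    "boy": np.array([-1.0, 0.2, 0.1]),
    "grandmother": np.array([0.9, -0.3, 0.5]),
    "grandfather": np.array([-1.1, -0.3, 0.5]),
}

g = bias_direction(toy, [("girl", "boy"), ("grandmother", "grandfather")])
print(g)  # unit vector along the first axis for these toy values
```

In this toy data the two pairs differ only in the first coordinate, so the averaged direction points along that axis, which plays the role of the "gender" dimension.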
2. Neutralize
Some words are inherently gendered, but others should be gender-neutral and treated fairly.
Gender-specific words include grandmother and grandfather. Words with no inherent gender include nurse and doctor. Words of the latter kind should be neutralized, that is, their horizontal distance along the bias direction should be reduced by projecting it out.
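As a rough sketch of what "projecting out" means, the component of a word vector along the bias direction can be subtracted from the vector. The helper name `neutralize`, the direction `g`, and the toy vector for nurse are assumptions made for this example.

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of e that lies along the bias direction g."""
    g = g / np.linalg.norm(g)      # make sure g is a unit vector
    return e - np.dot(e, g) * g    # subtract the projection onto g

# Assumed bias direction and an invented vector for "nurse".
g = np.array([1.0, 0.0, 0.0])
nurse = np.array([0.6, 0.5, -0.2])

nurse_debiased = neutralize(nurse, g)
print(nurse_debiased)              # component along g is now zero
```

After this step the vector keeps all of its other features and only loses its position along the bias axis.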

3. Equalize pairs
The second step dealt with gender-neutral words. So what can go wrong with gender-specific words?

We can clearly see this in the figure above. The word nurse is noticeably closer to girl than to boy. So in text generation, a mention of nurse makes girl more likely to appear. We therefore need to equalize these distances through computation.
After the computation, the gendered words are shifted so that every gender-neutral word is at the same distance from both words of a gendered pair.
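One simple way to get this equal-distance property is sketched below. It is a simplified variant written for illustration and not necessarily the exact formula used in the paper [2]: both words of the pair are given the same component outside the bias direction, and mirrored components along it. The function name `equalize` and all vector values are invented.

```python
import numpy as np

def equalize(a, b, g):
    """Make a and b symmetric about the subspace orthogonal to g."""
    g = g / np.linalg.norm(g)
    mu = (a + b) / 2.0
    mu_orth = mu - np.dot(mu, g) * g       # shared part outside the bias axis
    half_gap = np.dot(a - b, g) / 2.0      # half the separation along the bias axis
    return mu_orth + half_gap * g, mu_orth - half_gap * g

# Assumed bias direction and toy vectors; nurse is taken to be already neutralized.
g = np.array([1.0, 0.0, 0.0])
girl = np.array([0.8, 0.3, 0.1])
boy = np.array([-1.2, 0.1, 0.3])
nurse = np.array([0.0, 0.5, -0.2])

girl_eq, boy_eq = equalize(girl, boy, g)
d_girl = np.linalg.norm(nurse - girl_eq)
d_boy = np.linalg.norm(nurse - boy_eq)
print(d_girl, d_boy)                       # the two distances match
```

Because nurse has no component along `g` and the equalized pair differs only in the sign of that component, the two distances come out equal. The same holds for any other neutralized word.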
