Debiasing word embeddings | A brief look at word embeddings and bias removal
2022-07-01 17:42:00 【LolitaAnn】
This article shares my notes. The content mainly comes from Andrew Ng's deep learning course. [1]
The existence of stereotypes
Word embeddings have a big impact on how well our models generalize, so we should also make sure they are free of undesirable forms of bias, such as sexism, racism, and religious discrimination.
Of course, I think the word "discrimination" is a bit strong. Here we can think of it as stereotyping.
For example:
My father is a doctor, my mother is _______.
My father is a company employee, my mother is _______.
Boys like _______. Girls like _______.
The first blank is likely to be filled with "nurse", the second with "housewife", the third with "Transformers", and the fourth with "Barbie dolls".
What is this? It is what we call a gender stereotype. Stereotypes like these are related to socio-economic status.
Learning algorithms have no stereotypes of their own, but text written by humans does, and word embeddings learn these stereotypes "very well".
So we need to modify the learning algorithm to reduce, or ideally eliminate, these undesirable types of bias.
Over many decades, over many centuries, I think humanity has made progress in reducing these types of bias. And I think maybe fortunately for AI, I think we actually have better ideas for quickly reducing the bias in AI than for quickly reducing the bias in the human race. Although I think we are by no means done for AI as well, and there is still a lot of research and hard work to be done to reduce these types of biases in our learning algorithms.
Removing stereotypes from word embeddings
This uses the method from arXiv:1607.06520 [2].
It consists of three main steps:
- Identify bias direction.
- Neutralize: For every word that is not definitional, project to get rid of bias.
- Equalize pairs.
Suppose we already have a trained word embedding.
As before, it uses 300-dimensional features, which we then map onto a two-dimensional plane. The distribution of the words on the plane is shown in the figure.

1. Identify the bias direction
To find the main direction of the stereotype between two words, we use the approach mentioned earlier when discussing word embedding features: subtract one vector from the other to get the main dimension along which they differ.
After subtracting the vectors of such word pairs, you will find that they differ mainly along the gender dimension.
Then take the average of these differences.
We get the following result:
This gives us the main direction of the stereotype bias. We can also find a direction that is unrelated to this particular bias.
Note: in this case, we treat the bias direction, "gender", as a one-dimensional subspace, and the remaining unrelated directions as a 299-dimensional subspace. This is simplified compared with the original paper. For details, see the references at the end of the article.
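The subtract-and-average step can be sketched in a few lines of plain Python. This is only a minimal illustration: the word pairs (he/she, male/female), the dictionary name `toy`, and the 3-dimensional vectors are made up for the example and are not taken from a real embedding. As noted above, the original paper uses a more elaborate procedure than a simple average.

```python
import math


def bias_direction(pairs):
    """Average the difference vectors of (a, b) pairs and scale to unit length."""
    dim = len(pairs[0][0])
    mean = [0.0] * dim
    for a, b in pairs:
        for i in range(dim):
            mean[i] += (a[i] - b[i]) / len(pairs)
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


# Made-up 3-dimensional vectors (assumption); a real embedding would have e.g. 300 dimensions.
toy = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "male": [0.8, -0.3, 0.5],
    "female": [-0.8, -0.3, 0.5],
}

g = bias_direction([(toy["he"], toy["she"]), (toy["male"], toy["female"])])
print(g)  # in this toy data the direction lies along the first axis
```

In this toy data the pairs differ only in their first coordinate, so the estimated direction points along that axis. With real embeddings the differences are noisier, which is why several pairs are averaged.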
2. Neutralize
Some words are inherently gendered, but other words should be gender-neutral.
Gender-specific words include grandmother and grandfather. Words with no inherent gender include nurse and doctor. Words of the latter kind should be neutralized, that is, their horizontal distance along the bias direction should be reduced by projecting out that component.
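Neutralizing a word amounts to removing its component along the bias direction, which is the standard vector projection. A minimal sketch, assuming a bias direction `g` and a made-up vector for nurse (both are illustrative values, not real embeddings):

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def neutralize(e, g):
    """Remove from e its component along the bias direction g."""
    scale = dot(e, g) / dot(g, g)
    return [x - scale * y for x, y in zip(e, g)]


g = [1.0, 0.0, 0.0]       # assumed bias direction (toy value)
nurse = [-0.6, 0.5, 0.3]  # made-up vector that leans toward one side of g

nurse_neutral = neutralize(nurse, g)
print(nurse_neutral)
print(dot(nurse_neutral, g))  # component along g after neutralizing
```

After this step the word has no remaining component along `g`, while its coordinates in the other directions are unchanged.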

3. Equalize
The second step dealt with gender-neutral words. So what is the problem with gender-specific words?

We can see this clearly in the figure above. The word nurse is noticeably closer to girl than to boy. So when generating text, a mention of nurse makes girl more likely to appear. We therefore need to equalize the distances through calculation.
After the calculation, the points are shifted so that a gender-neutral word is equally distant from both words of a gendered pair.
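One way to do this calculation is sketched below. It follows my understanding of the equalize step in arXiv:1607.06520 [2]: both words of a pair keep the same part orthogonal to the bias direction and get mirrored parts along it. The sketch assumes roughly unit-length embeddings, and all vectors are made-up toy values.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(v, g):
    """Component of v along the bias direction g."""
    scale = dot(v, g) / dot(g, g)
    return [scale * x for x in g]


def equalize(w1, w2, g):
    """Make w1 and w2 differ only along g, symmetrically around their mean."""
    mu = [(a + b) / 2 for a, b in zip(w1, w2)]
    mu_b = project(mu, g)
    mu_orth = [m - b for m, b in zip(mu, mu_b)]
    # Assumes the squared norm of mu_orth is below 1 (roughly unit-length embeddings).
    scale = math.sqrt(1.0 - dot(mu_orth, mu_orth))
    result = []
    for w in (w1, w2):
        diff = [p - b for p, b in zip(project(w, g), mu_b)]
        norm = math.sqrt(dot(diff, diff))
        result.append([m + scale * d / norm for m, d in zip(mu_orth, diff)])
    return result[0], result[1]


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


g = [1.0, 0.0, 0.0]              # assumed bias direction (toy value)
girl = [-0.6, 0.6, 0.2]          # made-up vectors
boy = [0.8, 0.4, 0.2]
nurse_neutral = [0.0, 0.5, 0.3]  # a word that has already been neutralized

girl_eq, boy_eq = equalize(girl, boy, g)
print(distance(nurse_neutral, girl_eq), distance(nurse_neutral, boy_eq))
```

Because the neutralized word has no component along `g` and the equalized pair is symmetric around it, the two printed distances come out the same.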
