Negative sampling

2022-07-05 22:56:00 Ice cream and Mousse Cake

1. Noise contrastive estimation (NCE)

In a language model, the probability of a word w from the whole vocabulary V given a context c is usually computed with a softmax:
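$$p_\theta(w \mid c) = \frac{\exp\big(s_\theta(w, c)\big)}{\sum_{w' \in V} \exp\big(s_\theta(w', c)\big)}$$
where s_θ(w, c) is the model's score for word w in context c, and the denominator sums over the entire vocabulary V.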
Because the softmax denominator must be summed over every word in V, it is very expensive to compute. To avoid this cost, NCE turns the softmax parameter-estimation problem into a binary classification problem whose two classes are real samples and noise samples:
Positive samples: generated by the empirical (i.e., true) distribution, labeled D=0;
Negative samples: noise generated by q(w), labeled D=1.

Draw one sample from the true distribution and k noise samples from q(w); across all these samples (real + noise), w denotes the word to be predicted.
The joint distribution of (d, w) is then:
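$$p(d, w \mid c) = \begin{cases} \dfrac{1}{k+1}\,\tilde{p}(w \mid c), & d = 0 \\[4pt] \dfrac{k}{k+1}\,q(w), & d = 1 \end{cases}$$
where p̃(w | c) is the empirical (true) distribution and q(w) is the noise distribution.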
Omitting some intermediate steps and assumptions (see the derivation links under Sources), the final result is:
$$p(D=0 \mid c, w) = \frac{\tilde{p}(w \mid c)}{\tilde{p}(w \mid c) + k\,q(w)}$$
$$p(D=1 \mid c, w) = \frac{k\,q(w)}{\tilde{p}(w \mid c) + k\,q(w)}$$
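For training, NCE replaces p̃(w | c) with the model's unnormalized score exp(s_θ(w, c)), usually treating the normalizer Z(c) as 1, and minimizes the negative log-likelihood of labeling the real word D=0 and the k noise words D=1. The sketch below computes that loss for one context under those assumptions; the function names are illustrative, scores are plain floats, and the labels follow this post's convention (D=0 real, D=1 noise). It works in log space to avoid overflow.

```python
import math


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


def nce_loss(score_true, scores_noise, q_true, q_noise):
    """NCE loss for one real word and k noise words (k = len(scores_noise)).

    Assumes Z(c) is fixed to 1, so exp(score) is used directly as the model
    probability. Labels follow the post: D=0 = real word, D=1 = noise word.
    """
    k = len(scores_noise)

    def log_p_real(s, q):   # log P(D=0 | c, w) = log(u / (u + k q)), u = exp(s)
        return s - log_add_exp(s, math.log(k * q))

    def log_p_noise(s, q):  # log P(D=1 | c, w) = log(k q / (u + k q))
        return math.log(k * q) - log_add_exp(s, math.log(k * q))

    loss = -log_p_real(score_true, q_true)
    for s, q in zip(scores_noise, q_noise):
        loss -= log_p_noise(s, q)
    return loss


# Uniform noise over a 10-word vocabulary (q = 0.1) and k = 5 noise words.
print(nce_loss(2.0, [-1.0] * 5, 0.1, [0.1] * 5))
```

A higher score on the real word and lower scores on the noise words both reduce the loss, which is exactly the binary-classification view described above.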

Sources:
Derivation (Zhihu): https://zhuanlan.zhihu.com/p/76568362
Derivation details (CSDN): https://blog.csdn.net/weixin_40901056/article/details/88568344

2. Negative sampling (Negative Sampling)

(1) Negative sampling is a variant of NCE that defines the probability differently:

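$$p(D=0 \mid c, w) = \frac{u_\theta(w, c)}{u_\theta(w, c) + 1} = \sigma\big(s_\theta(w, c)\big), \qquad p(D=1 \mid c, w) = \frac{1}{u_\theta(w, c) + 1}$$
where u_θ(w, c) = exp(s_θ(w, c)) and σ is the logistic sigmoid; compared with NCE, the term k·q(w) is simply replaced by 1.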
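Because P(D=0 | c, w) reduces to a sigmoid of the score, the negative-sampling loss for one real word and k sampled negatives is a sum of log-sigmoid terms. A minimal sketch, assuming plain float scores stand in for s_θ(w, c) and keeping this post's labels (D=0 real, D=1 negative); the function names are illustrative.

```python
import math


def log_sigmoid(x):
    """Numerically stable log σ(x)."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def neg_loss(score_real, scores_neg):
    """-log P(D=0 | real word) - Σ log P(D=1 | negative), with P(D=0) = σ(s).

    Since 1 - σ(s) = σ(-s), each negative contributes -log σ(-s).
    """
    loss = -log_sigmoid(score_real)
    for s in scores_neg:
        loss -= log_sigmoid(-s)
    return loss


print(neg_loss(2.0, [-1.0, 0.5, -3.0]))
```

Unlike the NCE sketch, no noise probability q(w) appears in the loss; q only decides which negatives get drawn.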

(2) Issues with negative sampling
Negative sampling is a compromise for tasks with too many classes.
Points to consider in practice, especially when results fall short of expectations: sampling may introduce bias; negatives that lie far from the decision boundary carry little value; when the distribution fluctuates strongly and the noise is high, good negatives are hard to select; and the negative samples themselves may carry useful signal.
For prediction tasks, negative sampling can skew the predicted values. In click-through-rate (CTR) estimation, for example, if the original CTR is 0.01 and negative sampling changes the positive-to-negative ratio to 1:9, the average CTR of the sampled data becomes 0.1. If a task really needs negative sampling, the predictions must be corrected accordingly.
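The correction can be sketched as follows, assuming negatives were kept at a uniform rate r (here 9/99, which turns a 1:99 ratio into 1:9). Keeping only a fraction r of the negatives multiplies the odds p / (1 - p) by 1/r, so multiplying the predicted odds by r restores the original scale. The function names are illustrative.

```python
def sampled_ctr(n_pos, n_neg, neg_keep_rate):
    """CTR observed after keeping each negative with probability neg_keep_rate."""
    return n_pos / (n_pos + n_neg * neg_keep_rate)


def correct_ctr(p_sampled, neg_keep_rate):
    """Map a probability learned on down-sampled data back to the original scale.

    Down-sampling negatives by rate r multiplies the odds p / (1 - p) by 1 / r,
    so the original odds are r * p_sampled / (1 - p_sampled).
    """
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_keep_rate)


# The example above: CTR 0.01 (1:99), negatives down-sampled to a 1:9 ratio.
rate = 9 / 99                              # keep 9 of every 99 negatives
p = sampled_ctr(1_000, 99_000, rate)       # about 0.1 on the sampled data
print(round(p, 4), round(correct_ctr(p, rate), 4))  # prints: 0.1 0.01
```

The same correction can be written in logit space as logit(p) = logit(p_sampled) + ln r, which is convenient when the model outputs logits.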
When sample information is highly redundant, negative sampling can speed up training on the same hardware with only a limited effect on model quality, which matters a great deal on a limited budget.
When negative samples do not faithfully reflect users' real intent, negative sampling may even help: in some scenarios, items that most users probably never actually saw are still collected as negative samples.

(3) Negative sampling methods

Common approaches include static negative sampling, hard negative sampling, adversarial negative sampling, graph-based negative sampling, and negative sampling enhanced with additional information.

① Static negative sampling (Static Negative Sampling)

Known examples are chosen as negatives from the set of non-interacted items. Assigning different weights to different examples defines a distribution over negatives, from which we then sample. When the probability of each sample being drawn as a negative does not change during training, the strategy is called static negative sampling.
Among static methods, the simplest and most widely used is random negative sampling (RNS, Random Negative Sampling), also called uniform negative sampling (Uniform Negative Sampling). RNS draws negatives uniformly at random from the candidate set. In studies that do not focus on negative sampling, researchers generally use RNS as the basic sampler so that comparisons with baselines stay fair.
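A minimal sketch of RNS, assuming the candidate set is simply every item the user has not interacted with; the function name and signature are illustrative.

```python
import random


def uniform_negatives(all_items, positives, n, seed=0):
    """Random negative sampling (RNS): every non-interacted item is equally likely."""
    positives = set(positives)
    candidates = [item for item in all_items if item not in positives]
    rng = random.Random(seed)
    # Without replacement; requires n <= len(candidates).
    return rng.sample(candidates, n)


print(uniform_negatives(range(10), positives={1, 4}, n=3))
```

Because the sampling distribution never depends on the model, this is static in the sense defined above.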
For each positive example, different negatives have different effects. A heuristic strategy for the negative distribution is popularity-biased negative sampling (PNS, Popularity-biased Negative Sampling). **Popularity can be reflected by frequency or degree**, and the probability of a sample being selected as a negative is proportional to a power of its popularity.
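A minimal sketch of PNS, assuming popularity is measured by interaction frequency and using alpha = 0.75 (the exponent word2vec applies to its unigram noise distribution) as an illustrative default; the function names and the (user, item) interaction format are hypothetical. Sampling here is with replacement.

```python
import random
from collections import Counter


def popularity_probs(counts, alpha=0.75):
    """P(item) proportional to popularity ** alpha (popularity = interaction count)."""
    weights = {item: c ** alpha for item, c in counts.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}


def popularity_negatives(interactions, positives, n, alpha=0.75, seed=0):
    """Draw n negatives (with replacement) from the popularity-biased distribution."""
    counts = Counter(item for _user, item in interactions)
    for p in positives:  # exclude the user's own positives before normalizing
        counts.pop(p, None)
    probs = popularity_probs(counts, alpha)
    items = sorted(probs)
    rng = random.Random(seed)
    return rng.choices(items, weights=[probs[i] for i in items], k=n)


print(popularity_probs({"a": 16, "b": 1}, alpha=0.5))  # prints {'a': 0.8, 'b': 0.2}
interactions = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
print(popularity_negatives(interactions, positives={"a"}, n=5))
```

alpha = 1 samples exactly by frequency and alpha = 0 recovers uniform sampling, so the exponent interpolates between PNS and RNS. Items with no recorded interactions never appear in the counts, so they are never drawn.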

② Hard negative sampling (Hard Negative Sampling)
What makes a negative "hard" depends on the model: negatives that the model misclassifies, or that receive high predicted scores, are more useful for improving the model.
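A minimal sketch of one common way to mine hard negatives, assuming a hypothetical `score_fn(user, item)` that returns the current model's score: draw a random pool of non-positive candidates, then keep the ones the model scores highest, i.e. the ones it is most likely to confuse with positives. Since the scores change as the model trains, the sampling distribution changes too, unlike the static methods above.

```python
import random


def hard_negatives(score_fn, user, candidates, positives, pool_size=10, n=2, seed=0):
    """Pick the n highest-scoring negatives from a random pool of non-positive candidates."""
    positives = set(positives)
    eligible = [item for item in candidates if item not in positives]
    rng = random.Random(seed)
    pool = rng.sample(eligible, min(pool_size, len(eligible)))
    # "Hard" = the current model scores them high, so they look like positives.
    pool.sort(key=lambda item: score_fn(user, item), reverse=True)
    return pool[:n]


def toy_score(user, item):
    """Hypothetical model whose score for an item is just the item id."""
    return float(item)


print(hard_negatives(toy_score, "u1", range(20), positives={19}, pool_size=19))  # prints [18, 17]
```

The pool size trades off hardness against cost: a larger pool finds harder negatives but needs more model evaluations per positive.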

③ Adversarial negative sampling (Adversarial Sampling)
Adversarial negative sampling methods usually select negatives with a GAN-based framework.
**GAN:** a generator produces samples, a discriminator judges whether they are real or fake, the result is fed back and the generator regenerates, to determine …
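A toy sketch of the generator side only, assuming the discriminator is frozen and reduced to fixed, hypothetical scores `disc_scores` (higher = harder to tell apart from a positive). The generator keeps a softmax over candidate items, samples one as a negative, and is updated with a REINFORCE policy gradient, a common way to train a generator over discrete items. A full GAN-based sampler would also train the discriminator in alternation; that part is omitted here.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_adversarial_sampler(disc_scores, steps=2000, lr=0.1, seed=0):
    """Learn a sampling distribution over candidate negatives for one user.

    disc_scores[i] is a fixed (hypothetical) discriminator score used as reward.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(disc_scores)
    baseline = 0.0  # running mean reward, reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(logits)), weights=probs, k=1)[0]  # sample a negative
        reward = disc_scores[i]
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE: d log p_i / d logit_j = 1[j == i] - p_j
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


probs = train_adversarial_sampler([0.1, 0.2, 0.9, 0.3, 0.05])
print([round(p, 3) for p in probs])
```

Over training, the generator shifts probability toward the item the discriminator finds hardest, so it ends up proposing hard negatives without an explicit top-k rule.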

④ Graph-based negative sampling (Graph-based Sampling)
Graph-based negative sampling additionally incorporates the structural information of examples on a graph.
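One simple way to use graph structure, shown as an illustrative assumption rather than a specific published method: in a user-item bipartite graph, items exactly three hops from a user (user → item → another user → item) are items liked by similar users that this user has not interacted with, which makes them plausible, relatively hard negatives. The adjacency-dict format and the function name are hypothetical.

```python
import random
from collections import deque


def khop_negatives(graph, user, hops=3, n=2, seed=0):
    """Sample up to n nodes whose shortest-path distance from `user` is exactly `hops`.

    In a bipartite user-item graph, odd `hops` from a user land on items; the
    user's own items are at distance 1, so they are never returned for hops=3.
    """
    dist = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:  # no need to expand beyond the target distance
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    candidates = sorted(node for node, d in dist.items() if d == hops)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


# Undirected user-item graph stored as an adjacency dict.
graph = {
    "u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i3", "i4"],
    "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2", "u3"], "i4": ["u3"],
}
print(khop_negatives(graph, "u1"))  # prints ['i3']
```

Here i3 is chosen because u2 shares i2 with u1; i4 is five hops away and i1, i2 are u1's own positives.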

⑤ Negative sampling with additional information (Additional Data Enhanced Sampling)
Some works use social-network connections, users' geographic locations, product category information, and additional interaction data, such as products that users viewed but did not click (viewed but non-clicked) and products they clicked but did not buy (clicked but non-purchased), to improve the selection of negatives.
In industrial recommendation scenarios, different behaviors (browsing, clicking, adding to cart, purchasing) are key to modeling user preferences.
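A sketch of how behavior data could be turned into typed negatives for a purchase-prediction task. The (item, action) event format, the `LEVELS` ordering, and the function name are assumptions; the post does not say how such negatives should be weighted or mixed with randomly sampled ones.

```python
# Hypothetical ordering of behaviors from weakest to strongest.
LEVELS = {"view": 1, "click": 2, "cart": 3, "buy": 4}


def behavior_negatives(events):
    """Split one user's non-purchased items into negatives by strongest behavior."""
    best = {}
    for item, action in events:
        best[item] = max(best.get(item, 0), LEVELS[action])
    negatives = {"viewed_non_clicked": [], "clicked_non_purchased": []}
    for item in sorted(best):
        if best[item] == LEVELS["view"]:
            negatives["viewed_non_clicked"].append(item)
        elif best[item] < LEVELS["buy"]:  # clicked (possibly carted) but not bought
            negatives["clicked_non_purchased"].append(item)
    return negatives


events = [("a", "view"), ("b", "view"), ("b", "click"), ("c", "click"),
          ("c", "buy"), ("d", "view"), ("d", "click"), ("d", "cart")]
print(behavior_negatives(events))
# prints {'viewed_non_clicked': ['a'], 'clicked_non_purchased': ['b', 'd']}
```

Clicked-but-not-purchased items are closer to positives than viewed-but-not-clicked ones, so they can serve as a harder tier of negatives.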

(4) Incorporating curriculum learning (Curriculum Learning)

(5) Negative sampling ratio

Negative sampling methods mainly aim to improve the quality of negative samples, whereas the negative sampling ratio determines how many negatives are used.
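A minimal sketch of how the ratio controls the number of negatives: for each positive (user, item) pair, draw `ratio` uniformly sampled non-interacted items as negatives. The function name and the 1/0 label convention (1 = positive, unlike the D=0/D=1 labels in the NCE section) are assumptions.

```python
import random


def build_training_pairs(positives_by_user, all_items, ratio=4, seed=0):
    """Return (user, item, label) rows with `ratio` negatives per positive."""
    rng = random.Random(seed)
    rows = []
    for user, pos_items in sorted(positives_by_user.items()):
        pos_set = set(pos_items)
        candidates = [i for i in all_items if i not in pos_set]
        for item in pos_items:
            rows.append((user, item, 1))
            for neg in rng.sample(candidates, ratio):  # requires ratio <= len(candidates)
                rows.append((user, neg, 0))
    return rows


rows = build_training_pairs({"u1": [1, 2]}, range(10), ratio=3)
print(len(rows))  # prints 8: 2 positives and 6 negatives
```

With ratio k, positives make up 1 / (1 + k) of the rows, which is also the average label the model sees and the reason sampled predictions need the CTR correction discussed earlier.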

Main references:
The problem of negative sampling
Negative sampling method

I still find parts of the NCE derivation in Section 1 hard to follow; this can be improved later if needed.

Summary:

In essence, negative sampling assigns weight to selected negative examples. It is an idea: data that do not contain what we are looking for are still valuable. Applying it in practice requires further knowledge.

These are my study notes; if anything is wrong, please point it out. Thank you!

Copyright notice

This article was written by [Ice cream and Mousse Cake]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202140349119065.html