Negative sampling
2022-07-05 22:56:00 【Ice cream and Mousse Cake】
1. Noise contrastive estimation (NCE)
In a language model, the probability of predicting a word w from the whole vocabulary V given a context c is usually written in softmax form:
$$p(w \mid c) = \frac{\exp(s(w, c))}{\sum_{w' \in V} \exp(s(w', c))}$$
where s(w, c) denotes the model's score for word w in context c. The sum in the denominator runs over every word in V, which is what makes the computation expensive.
To avoid this heavy computation, the idea of NCE is to turn the softmax parameter-estimation problem into binary classification. The two classes are real samples and noise samples:
Positive sample: generated by the empirical distribution (that is, the real distribution), with label D=0.
Negative sample: noise, generated from q(w), with label D=1.
One sample is drawn from the positive distribution and k samples are drawn from the noise distribution. Within the combined sample (real plus noise), w is the target word to predict.
The joint probability distribution of (d, w) then follows from this sampling scheme.
Some intermediate calculations and assumptions are omitted here; the full derivation and the final result are in the sources below.
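For readers who want the omitted step in compact form, the standard NCE construction can be sketched as below. The notation is an assumption and is not taken from the original post: \tilde p(w | c) is the empirical distribution, p_\theta(w | c) is the model, q(w) is the noise distribution, and the two classes are written "real" and "noise" instead of numeric labels.

```latex
% Joint distribution when 1 real sample and k noise samples are mixed
P(\text{real}, w \mid c)  = \frac{1}{k+1}\,\tilde p(w \mid c), \qquad
P(\text{noise}, w \mid c) = \frac{k}{k+1}\,q(w)

% Posterior of the class given the word (Bayes' rule), with the model
% p_\theta substituted for the empirical distribution
P(\text{real} \mid w, c)  = \frac{p_\theta(w \mid c)}{p_\theta(w \mid c) + k\,q(w)}, \qquad
P(\text{noise} \mid w, c) = \frac{k\,q(w)}{p_\theta(w \mid c) + k\,q(w)}

% Binary-classification objective maximised by NCE
J(\theta) = \mathbb{E}_{w \sim \tilde p(\cdot \mid c)}\big[\log P(\text{real} \mid w, c)\big]
          + k\,\mathbb{E}_{\bar w \sim q}\big[\log P(\text{noise} \mid \bar w, c)\big]
```

The point of the construction is that the objective only needs the model evaluated at one real word and k noise words, not a sum over all of V.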
Sources:
Derivation (Zhihu): https://zhuanlan.zhihu.com/p/76568362
Derivation and further details (CSDN): https://blog.csdn.net/weixin_40901056/article/details/88568344
2. Negative sampling (Negative Sampling)
(1) Negative sampling is a variant of NCE. The two differ in how the classification probability is defined.
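A minimal sketch of that difference, under the common formulation in which the model's unnormalised probability is exp(score) and its normaliser is fixed to 1. The function names, the example score, and the values of k and q(w) are hypothetical and chosen only for illustration.

```python
import math

def nce_posterior(score, k, q_w):
    """P(real | w, c) under NCE, treating exp(score) as the model probability."""
    u = math.exp(score)
    return u / (u + k * q_w)

def neg_sampling_posterior(score):
    """P(real | w, c) under negative sampling: the sigmoid of the score."""
    u = math.exp(score)
    return u / (u + 1.0)

score = 0.3           # hypothetical model score s(w, c)
vocab_size = 10_000   # hypothetical vocabulary size |V|

print(nce_posterior(score, k=5, q_w=1e-3))
print(neg_sampling_posterior(score))
# The two coincide when k * q(w) == 1, for example k = |V| with a
# uniform noise distribution q(w) = 1 / |V|.
print(nce_posterior(score, k=vocab_size, q_w=1.0 / vocab_size))
```

In other words, negative sampling drops the k·q(w) term from the denominator and replaces it with the constant 1, which is simpler to compute but no longer ties the objective to the chosen noise distribution in the same way.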
(2) Problems with negative sampling
Negative sampling is a compromise for problems with too many categories.
Points to consider in practice include: the sample may be biased; negatives that lie too far from the decision boundary carry little value; when the distribution fluctuates strongly and the noise is high, good negatives are hard to select; and negative sampling may itself bring a positive gain. These are worth checking when results fall short of expectations.
For example, in a prediction task negative sampling may skew the output. Take click-through-rate estimation: if the original click-through rate is 0.01 and negative sampling brings the ratio of positive to negative samples to 1:9, the average click-through rate of the sampled data becomes 0.1. If negative sampling is necessary for such a task, the predictions must be corrected afterwards.
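One common way to make that correction is to rescale the predicted odds by the fraction of negatives that was kept. The sketch below assumes negatives were dropped uniformly at random; the function name `calibrate` and the parameter `keep_rate` are illustrative and not from the original post.

```python
def calibrate(p_sampled, keep_rate):
    """Map a probability predicted on negative-downsampled data back to the original scale.

    keep_rate is the fraction of negatives kept during sampling. Dropping
    negatives multiplies the odds of a click by 1 / keep_rate, so the
    correction divides that factor back out.
    """
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)

# The example from the text: an original CTR of 0.01 means 1 positive per
# 99 negatives. Sampling down to 1:9 keeps 9 of every 99 negatives.
keep_rate = 9 / 99
print(calibrate(0.1, keep_rate))  # approximately 0.01
```

With `keep_rate = 1.0` (no sampling) the function returns its input unchanged, which is a quick sanity check on the formula.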
When sample information is highly redundant, negative sampling can speed up training on the same machine resources with very limited impact on quality, which matters a great deal on a limited budget.
When the collected negatives do not reflect users' real intent, negative sampling may also bring a benefit. For example, in some scenarios most users may never actually see an item, yet it is still recorded as a negative sample.
(3) Negative sampling methods
Static negative sampling, hard negative sampling, adversarial negative sampling, graph-based negative sampling, and negative sampling that introduces additional information.
① Static negative sampling (Static Negative Sampling)
Negative examples are chosen from the set of items the user has not interacted with. Different examples are given different weights, and negatives are then drawn according to the resulting distribution. When the probability that each sample is drawn as a negative does not change during training, the strategy is called static negative sampling.
Among static methods, the simplest and most widely used is random negative sampling (RNS, Random Negative Sampling), also known as uniform negative sampling (Uniform Negative Sampling). RNS picks a negative example uniformly at random from the candidate set. In studies that do not focus on negative sampling, researchers generally use RNS as the basic sampling method so that comparisons with baselines are fair.
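A minimal sketch of RNS for a single user, using only the standard library. The item ids, the interacted set, and the function name are hypothetical; `n_neg` may be 1 to match the one-negative-per-positive case described above.

```python
import random

def random_negative_sampling(all_items, interacted, n_neg, rng):
    """Draw n_neg distinct negatives uniformly from the items the user has not interacted with."""
    candidates = sorted(set(all_items) - set(interacted))
    return rng.sample(candidates, n_neg)

rng = random.Random(0)    # fixed seed so the sketch is repeatable
all_items = range(10)     # hypothetical item ids 0..9
interacted = {1, 4, 7}    # hypothetical positives for one user

negatives = random_negative_sampling(all_items, interacted, n_neg=3, rng=rng)
print(negatives)
```

Because the sampling probabilities depend only on the candidate set and not on the model, they stay fixed throughout training, which is what makes the method static.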
For a given positive example, different negatives have different effects. One heuristic choice of negative distribution is popularity-biased negative sampling (PNS, Popularity-biased Negative Sampling). **Popularity can be reflected by frequency or degree**; the probability that a sample is selected as a negative example is proportional to a power of its popularity.
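The proportionality can be sketched as follows, measuring popularity by interaction frequency. The exponent 0.75 is only a default here (it is the value popularised by word2vec), and the interaction log and function names are hypothetical.

```python
import random
from collections import Counter

def popularity_weights(interactions, power=0.75):
    """Return {item: probability}, with probability proportional to count ** power."""
    counts = Counter(item for _, item in interactions)
    weights = {item: c ** power for item, c in counts.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

def sample_popular_negatives(probs, exclude, n_neg, rng):
    """Draw n_neg negatives (with replacement) by popularity, skipping the user's positives."""
    items = [i for i in probs if i not in exclude]
    weights = [probs[i] for i in items]
    return rng.choices(items, weights=weights, k=n_neg)

# Hypothetical (user, item) interaction log: "a" is the most popular item.
log = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u2", "c")]
probs = popularity_weights(log)
rng = random.Random(0)

print(probs)
# For user u3, whose only positive is "a", negatives come from "b" and "c".
print(sample_popular_negatives(probs, exclude={"a"}, n_neg=4, rng=rng))
```

A power below 1 flattens the distribution, so very popular items are still favoured as negatives but do not dominate the sample entirely.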
② Hard negative sampling (Hard Negative Sampling)
How "hard" a hard negative is depends on the model: examples that the model misclassifies, or negatives that receive high prediction scores, are the ones most relevant to improving the model.
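A minimal sketch of the "high prediction score" criterion: pick the non-positive items that the current model ranks highest. The scores and names below are hypothetical, and a real system would recompute the scores as the model is updated.

```python
def hardest_negatives(scores, positives, n_neg):
    """Return the n_neg non-positive items with the highest model scores."""
    candidates = [(item, s) for item, s in scores.items() if item not in positives]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:n_neg]]

# Hypothetical scores the current model assigns to items for one user.
scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.6, "e": 0.1}
positives = {"a"}

print(hardest_negatives(scores, positives, n_neg=2))  # ['c', 'd']
```

Unlike the static methods, the selected negatives change whenever the model's scores change, so the sampling distribution follows the training process.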
③ Adversarial negative sampling (Adversarial Sampling)
Adversarial negative sampling methods are usually based on a GAN to select negative examples.
**GAN:** generate at random, judge real or fake, send back and regenerate, judge again …
④ Graph-based negative sampling (Graph-based Sampling)
Graph-based negative sampling methods additionally make use of the structural information of examples on the graph.
⑤ Negative sampling with additional information (Additional Data Enhanced Sampling)
Some work uses connections in social networks, users' geographic location, product category information, and additional interaction data, such as products a user viewed but did not click (viewed but non-clicked) and products a user clicked but did not buy (clicked but non-purchased), to improve the selection of negative examples.
In industrial recommendation scenarios, different behaviors (for example browsing, clicking, adding to cart, and buying) are key to modeling user preferences.
(4) Incorporating curriculum learning (Curriculum Learning)
(5) Negative sampling ratio
Negative sampling methods mainly aim to improve the quality of negative samples, whereas the negative sampling ratio determines how many negatives are used.
Main references:
The problem of negative sampling
Negative sampling method
I still do not fully understand the detailed NCE derivation in part 1; it can be expanded later if needed.
Summary:
Negative sampling essentially assigns a certain weight to some negative examples. It is a way of thinking: data that does not contain the thing being studied is also valuable. More knowledge is needed for concrete operation and application.
These are study notes. If anything is wrong, please point it out. Thank you!