Negative sampling
2022-07-05 22:56:00 【Ice cream and Mousse Cake】
1. Noise contrastive estimation (NCE)
In a language model, the probability of a word w from the whole vocabulary V given a context c is usually written in softmax form:
p(w | c) = exp(s(w, c)) / Σ_{w' ∈ V} exp(s(w', c)), where s(w, c) denotes the model's score for w in context c.
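To make the cost of the softmax concrete, here is a minimal sketch. The scores s(w, c) and the toy vocabulary are hypothetical; the point is only that the normalizer has to visit every word in V for each prediction.

```python
import math

def softmax_probability(scores, target):
    """Return p(target | c) from a dict of scores s(w, c), one per word in V."""
    # Subtract the maximum score for numerical stability.
    max_score = max(scores.values())
    # The normalizer loops over the whole vocabulary: this is the expensive part.
    normalizer = sum(math.exp(s - max_score) for s in scores.values())
    return math.exp(scores[target] - max_score) / normalizer

# Hypothetical scores for a toy vocabulary of four words.
toy_scores = {"cat": 2.0, "dog": 1.0, "car": 0.0, "tree": -1.0}
probabilities = {w: softmax_probability(toy_scores, w) for w in toy_scores}
```

With a real vocabulary of hundreds of thousands of words, the sum in the normalizer dominates the training cost.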
To avoid this heavy computation, the idea of NCE is to turn the softmax parameter-estimation problem into a binary classification problem. The two classes are real samples and noise samples:
Positive samples: generated by the empirical distribution (that is, the real distribution), with label D=0.
Negative samples: noise generated from q(w), with label D=1.
For each sample drawn from the real distribution, k noise samples are drawn from the noise distribution; w denotes the target word to predict among all samples (real samples plus noise samples).
The joint probability distribution of (d, w) is then as follows:
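With one real sample for every k noise samples, a sample is real with prior probability 1/(k+1) and noise with prior probability k/(k+1). Under that reading, and keeping the labels used above (d = 0 real, d = 1 noise), the joint distribution and the posterior obtained from it by Bayes' rule could be written as below. The symbol \tilde{p}(w \mid c) for the empirical distribution is notation assumed here.

```latex
p(d, w \mid c) =
\begin{cases}
\dfrac{1}{k+1}\,\tilde{p}(w \mid c), & d = 0 \ \text{(real sample)} \\[6pt]
\dfrac{k}{k+1}\,q(w), & d = 1 \ \text{(noise sample)}
\end{cases}

p(d = 0 \mid c, w) = \frac{\tilde{p}(w \mid c)}{\tilde{p}(w \mid c) + k\,q(w)},
\qquad
p(d = 1 \mid c, w) = \frac{k\,q(w)}{\tilde{p}(w \mid c) + k\,q(w)}
```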
Some intermediate steps and assumptions are omitted here (see the links under the sources below for the full process and the final result).
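The binary-classification objective that NCE arrives at can be sketched roughly as follows. This is a minimal sketch, not the derivation from the linked sources: it assumes exp(s(w, c)) is treated as if it were already normalized (the usual NCE simplification), and all function and parameter names are hypothetical.

```python
import math

def nce_loss(score_real, noise_scores, q_real, q_noise, k):
    """NCE loss for one real word and its k noise words.

    score_real   -- model score s(w, c) of the real word
    noise_scores -- model scores of the k noise words
    q_real       -- noise probability q(w) of the real word
    q_noise      -- noise probabilities of the k noise words
    """
    def prob_real(score, q):
        # Posterior that a word came from the real distribution:
        # u / (u + k * q), with u = exp(score) assumed to be normalized.
        u = math.exp(score)
        return u / (u + k * q)

    # The real word should be classified as real ...
    loss = -math.log(prob_real(score_real, q_real))
    # ... and every noise word should be classified as noise.
    for score, q in zip(noise_scores, q_noise):
        loss -= math.log(1.0 - prob_real(score, q))
    return loss

# A model that scores the real word high and the noise words low gets a lower loss.
good = nce_loss(0.0, [-5.0, -5.0], 0.1, [0.1, 0.1], k=2)
bad = nce_loss(-5.0, [0.0, 0.0], 0.1, [0.1, 0.1], k=2)
```

Only the real word and the k sampled noise words are scored, so the cost per training example no longer depends on the size of V.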
Sources:
Derivation (Zhihu): https://zhuanlan.zhihu.com/p/76568362
Derivation and related notes (CSDN): https://blog.csdn.net/weixin_40901056/article/details/88568344
2. Negative sampling (Negative Sampling)
(1) Negative sampling is a variant of NCE; the probability is defined differently:
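In the usual formulation, NCE models the probability that a word is real as u / (u + k q(w)), while negative sampling replaces the term k q(w) by 1, so the probability becomes a sigmoid of the model score. A sketch, with u_\theta(w, c) = \exp(s_\theta(w, c)) as assumed notation and the labels used above (D = 0 real, D = 1 noise):

```latex
p(D = 0 \mid c, w) = \frac{u_\theta(w, c)}{u_\theta(w, c) + 1} = \sigma\bigl(s_\theta(w, c)\bigr),
\qquad
p(D = 1 \mid c, w) = \frac{1}{u_\theta(w, c) + 1}
```

The two definitions coincide only when k q(w) = 1, for example when k = |V| and q is uniform.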
(2) Problems with negative sampling
Negative sampling is a compromise for the case of too many categories.
Points to consider in practice: it may introduce bias; negative samples that lie far from the decision boundary carry little value; when the distribution fluctuates strongly and the noise is high, good negative samples are hard to select; and negative sampling may itself bring gains. These are worth checking when results fall short of expectations.
For example, in a prediction task negative sampling can skew the predictions. In click-through rate estimation, suppose the click-through rate of the original samples is 0.01 and negative sampling changes the ratio of positive to negative samples to 1:9; the average click-through rate of the resulting samples is then 0.1. If negative sampling is required for such a task, the predictions must be corrected accordingly.
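One commonly used correction rescales the predicted odds by the fraction of negatives that were kept. The sketch below applies it to the numbers in the example; the function name and the `keep_rate` parameter are hypothetical, and it assumes all positives are kept and negatives are dropped uniformly at random.

```python
def calibrate(p_sampled, keep_rate):
    """Map a probability predicted on down-sampled data back to the original scale.

    p_sampled -- probability predicted by a model trained on the sampled data
    keep_rate -- fraction of negative samples kept (all positives are kept)
    """
    # Dropping negatives inflates the odds by 1 / keep_rate; undo that.
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)

# A click-through rate of 0.01 means 1 positive per 99 negatives;
# a 1:9 ratio after sampling keeps 9 of every 99 negatives.
keep_rate = 9 / 99
corrected = calibrate(0.1, keep_rate)  # close to the original 0.01
```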
When sample information is highly redundant, negative sampling speeds up training on the same machine resources with very limited impact on quality, which matters a great deal under a limited budget.
When negative samples do not reliably reflect users' real intent, negative sampling may bring gains; for example, in some scenarios most users may never actually see an item, yet it is still collected as a negative sample.
(3) Negative sampling methods
Static negative sampling, hard negative sampling, adversarial negative sampling, graph-based negative sampling, and negative sampling with additional information.
① Static negative sampling (Static Negative Sampling)
Known examples are chosen from the non-interacted set as negative examples; by assigning different weights to different examples, one can then sample according to a negative-example distribution. When the probability that each example is sampled as a negative does not change during training, the strategy is called static negative sampling.
Among static methods, the simplest and most widely used is random negative sampling (RNS, Random Negative Sampling), also known as uniform negative sampling (Uniform Negative Sampling). RNS picks a negative example uniformly at random from the candidate set. In studies whose focus is not negative sampling, researchers generally use RNS as the basic sampling method so that comparisons with baselines are fair.
For a given positive example, different negative examples have different effects. One heuristic for the negative-example distribution is popularity-biased negative sampling (PNS, Popularity-biased Negative Sampling). Popularity can be measured by frequency or by degree, and the probability that an example is selected as a negative is proportional to a power of its popularity.
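The two static strategies above can be sketched side by side. The item names and interaction counts are hypothetical, and the exponent 0.75 is an assumed default (the value popularized by word2vec), not something the text prescribes.

```python
import random
from collections import Counter

def uniform_negative(candidates, rng):
    """RNS: every non-interacted item is equally likely to be drawn."""
    return rng.choice(candidates)

def popularity_negative(candidates, popularity, rng, power=0.75):
    """PNS: sampling probability proportional to popularity ** power."""
    weights = [popularity[item] ** power for item in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical interaction counts; the user has already interacted with "a".
popularity = {"a": 50, "b": 100, "c": 10, "d": 1}
interacted = {"a"}
candidates = [item for item in popularity if item not in interacted]

rng = random.Random(0)
draws = Counter(popularity_negative(candidates, popularity, rng) for _ in range(10_000))
```

Both samplers are static in the sense used above: the weights depend only on the data, not on the current state of the model.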
② Hard negative sampling (Hard Negative Sampling)
How "hard" a hard negative is depends on the model: examples that are misclassified, or negative examples with higher predicted scores, are the ones most relevant to improving the model.
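A minimal sketch of the "higher predicted score" criterion: among the items that are not positives, keep the ones the current model scores highest. The scores and the helper name are hypothetical; in practice the scores would come from the model being trained and would change as training proceeds.

```python
def hardest_negatives(scores, positives, n):
    """Return the n non-positive items with the highest predicted scores."""
    negatives = [item for item in scores if item not in positives]
    return sorted(negatives, key=lambda item: scores[item], reverse=True)[:n]

# Hypothetical model scores for one user; "a" is the known positive.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.7, "e": 0.1}
positives = {"a"}
hard = hardest_negatives(scores, positives, n=2)  # ["b", "d"]
```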
③ Adversarial negative sampling (Adversarial Sampling)
Adversarial negative sampling methods are usually based on a GAN to select negative examples.
**GAN:** generate at random, judge right or wrong, feed the result back and generate again, judge again …
④ Graph-based negative sampling (Graph-based Sampling)
Graph-based negative sampling methods additionally exploit the structural information of examples on the graph.
⑤ Negative sampling with additional information (Additional Data Enhanced Sampling)
Some work uses connections in social networks, users' geographic location, product category information, and additional interaction data, for example products that users viewed but did not click (viewed but non-clicked) and products that users clicked but did not buy (clicked but non-purchased), to improve the selection of negative examples.
In industrial recommendation scenarios, different behaviors (for example browsing, clicking, adding to cart, buying) are key to modeling user preferences.
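As a small illustration of using behavior data, the two kinds of extra negatives mentioned above can be derived with set differences. The item sets and the function name are hypothetical, and how such negatives are weighted against ordinary ones is left open here.

```python
def behavior_negatives(viewed, clicked, purchased):
    """Split one user's items into the two behavior-based negative pools."""
    viewed_not_clicked = viewed - clicked        # candidates for click prediction
    clicked_not_purchased = clicked - purchased  # candidates for purchase prediction
    return viewed_not_clicked, clicked_not_purchased

# Hypothetical behavior logs for one user.
viewed = {"a", "b", "c", "d"}
clicked = {"b", "c"}
purchased = {"c"}
view_negatives, click_negatives = behavior_negatives(viewed, clicked, purchased)
```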
(4) Incorporating curriculum learning (Curriculum Learning)
(5) Negative sampling ratio
Negative sampling methods mainly aim to improve the quality of negative samples, whereas the negative sampling ratio determines the number of negative examples.
Main references:
The problem of negative sampling
Negative sampling method
I still do not fully understand the detailed derivation of NCE in the first part; it can be expanded later if needed.
Summary:
Negative sampling in effect assigns a certain weight to some negative examples. The underlying idea is that data which does not contain the content under investigation is also valuable. Concrete operation and application require more knowledge.
These are study notes; if anything is wrong, please point it out. Thank you!