Negative sampling
2022-07-05 22:56:00 【Ice cream and Mousse Cake】
1. Noise contrastive estimation (NCE)
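In a language model, the probability of predicting a word w from the whole vocabulary V given a context c is usually written in softmax form:

$$p_\theta(w \mid c) = \frac{\exp\big(s_\theta(w, c)\big)}{\sum_{w' \in V} \exp\big(s_\theta(w', c)\big)}$$

where $s_\theta(w, c)$ is the model's score for word w in context c. The denominator sums over every word in V, which is expensive when V is large.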
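To avoid this heavy computation, NCE turns the softmax parameter-estimation problem into a binary classification problem. The two classes are real samples and noise samples:
Positive samples: generated by the empirical (i.e., true) distribution, label D=1;
Negative samples: noise generated by q(w), label D=0.
For each context c, one sample is drawn from the true distribution and k noise samples are drawn from the noise distribution q(w); among these 1+k samples, w denotes the word to be predicted.
The joint distribution of (d, w) is then:

$$p(d, w \mid c) = \begin{cases} \dfrac{1}{1+k}\,\tilde{p}(w \mid c), & d = 1 \\[4pt] \dfrac{k}{1+k}\,q(w), & d = 0 \end{cases}$$

where $\tilde{p}(w \mid c)$ is the empirical distribution. By Bayes' rule, the posterior probability that w is a real sample is $p(D=1 \mid c, w) = \tilde{p}(w \mid c) / \big(\tilde{p}(w \mid c) + k\,q(w)\big)$. Omitting some intermediate steps and assumptions (see the derivation links under Sources), $\tilde{p}$ is replaced by the model's unnormalized score, with the normalizer treated as a constant (commonly fixed to 1), giving:

$$p(D = 1 \mid c, w) = \frac{u_\theta(w, c)}{u_\theta(w, c) + k\,q(w)}, \qquad u_\theta(w, c) = \exp\big(s_\theta(w, c)\big)$$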
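A minimal Python sketch of the per-context NCE loss, assuming the normalizer is fixed to 1 so the probability that w is a real (not noise) sample is u / (u + k·q(w)) with u = exp(s). The function names `nce_prob_real` and `nce_loss` are hypothetical; in practice the scores come from a model and q(w) from a chosen noise distribution.

```python
import math


def nce_prob_real(score: float, q_w: float, k: int) -> float:
    """Posterior that w is a real sample: u / (u + k*q(w)), u = exp(score), Z(c) fixed to 1."""
    u = math.exp(score)
    return u / (u + k * q_w)


def nce_loss(pos_score: float, pos_q: float,
             noise_scores: list[float], noise_qs: list[float]) -> float:
    """Binary cross-entropy of the NCE task for one context c: the true word
    should be classified as real and each of the k noise words as noise."""
    k = len(noise_scores)
    loss = -math.log(nce_prob_real(pos_score, pos_q, k))
    for s, q in zip(noise_scores, noise_qs):
        loss -= math.log(1.0 - nce_prob_real(s, q, k))
    return loss


# One true word and 5 noise words, all with score 0 and q(w) = 0.2,
# so k*q(w) = 1 and every posterior is 0.5.
print(nce_loss(0.0, 0.2, [0.0] * 5, [0.2] * 5))  # 6 * ln 2, about 4.1589
```

Only k noise words are scored per context, instead of summing over all of V as the softmax denominator does.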

Sources:
Derivation (Zhihu): https://zhuanlan.zhihu.com/p/76568362
Derivation details (CSDN): https://blog.csdn.net/weixin_40901056/article/details/88568344
2. Negative sampling
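(1) Negative sampling is a variant of NCE that defines the probability differently:

$$p(D = 1 \mid c, w) = \frac{u_\theta(w, c)}{u_\theta(w, c) + 1} = \sigma\big(s_\theta(w, c)\big)$$

where $u_\theta(w, c) = \exp\big(s_\theta(w, c)\big)$ is the unnormalized score of word w in context c and σ is the logistic sigmoid. In other words, it is NCE with $k\,q(w)$ replaced by 1.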
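The best-known use of this form is skip-gram with negative sampling in word2vec, where the real sample should get σ(s) close to 1 and each sampled negative σ(−s) close to 1. Below is a minimal pure-Python sketch, assuming the score is the dot product of a context vector and a word vector; `neg_sampling_loss` and the toy vectors are illustrative only.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(ctx_vec, pos_vec, neg_vecs) -> float:
    """-log sigma(s(w, c)) - sum_i log sigma(-s(w_i, c)), where s is a dot product,
    w is the true word and w_i are the sampled negative words."""
    loss = -log_sigmoid(dot(ctx_vec, pos_vec))
    for v in neg_vecs:
        loss -= log_sigmoid(-dot(ctx_vec, v))
    return loss


# The loss is small when the true word scores high and the negatives score low.
ctx = [1.0, 0.5]
print(neg_sampling_loss(ctx, [2.0, 1.0], [[-1.0, -1.0], [-2.0, 0.0]]))
```

Unlike NCE, no q(w) term appears in the loss; the noise distribution only decides which negatives are drawn.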

(2) Issues with negative sampling
Negative sampling is a compromise for problems with too many classes.
Points to consider in practice: sampling may introduce bias; negatives that lie far from the decision boundary carry little value; when the distribution fluctuates strongly and the noise is strong, good negatives are hard to select; and negative sampling can itself bring gains. These are worth considering when results fall short of expectations.
For prediction tasks, negative sampling can skew the predicted values. In click-through rate (CTR) estimation, for example, if the original CTR is 0.01 and negative sampling brings the positive-to-negative ratio to 1:9, the average CTR of the training samples becomes 0.1. If a task requires negative sampling, the predictions must be corrected accordingly.
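One commonly used correction: keeping only a fraction r of the negatives multiplies the odds p/(1−p) by 1/r, so a prediction p' from the model trained on sampled data can be mapped back with p = p' / (p' + (1 − p') / r). A minimal sketch using the numbers above; `calibrate_ctr` is a hypothetical name.

```python
def calibrate_ctr(p_sampled: float, neg_keep_rate: float) -> float:
    """Undo negative downsampling.

    Keeping only a fraction `neg_keep_rate` of negatives multiplies the odds
    p/(1-p) by 1/neg_keep_rate, so the odds are multiplied back by neg_keep_rate.
    """
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_keep_rate)


# Original data: CTR 0.01, i.e. 1 positive per 99 negatives.
# Downsampling to 1:9 keeps 9/99 = 1/11 of the negatives, so the sampled CTR is 0.1.
keep_rate = 9 / 99
print(round(calibrate_ctr(0.1, keep_rate), 6))  # 0.01
```

With `neg_keep_rate = 1.0` (no sampling) the function returns its input unchanged.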
When sample information is highly redundant, negative sampling can speed up training on the same hardware with very limited impact on model quality, which matters when the budget is limited.
When negatives do not reflect users' true intent, negative sampling may help. For example, in some scenarios most users never actually see certain items, yet those impressions are still collected as negatives.
(3) Negative sampling methods
Common approaches are static negative sampling, hard negative sampling, adversarial negative sampling, graph-based negative sampling, and negative sampling enhanced with additional information.
① Static negative sampling (Static Negative Sampling)
Negatives are chosen from the set of items the user has not interacted with, treating these known examples as negatives; assigning different weights to different items then defines a negative distribution to sample from. When the probability that each item is sampled as a negative does not change during training, the strategy is called static negative sampling.
Among static methods, the simplest and most widely used is random negative sampling (RNS, Random Negative Sampling), also called uniform negative sampling (Uniform Negative Sampling). RNS draws negatives uniformly at random from the candidate set. In studies that do not focus on negative sampling, researchers generally use RNS as the default so that comparisons with baselines are fair.
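A minimal sketch of RNS, assuming items are hashable ids and the user's interaction history is available as a set; the function name is hypothetical. Sampling is with replacement and the distribution never changes during training, which is what makes it static.

```python
import random


def random_negative_sampling(all_items, interacted, n_neg, rng=None):
    """Uniformly sample n_neg negatives (with replacement) from items the user
    has not interacted with."""
    rng = rng or random.Random()
    candidates = [item for item in all_items if item not in interacted]
    if not candidates:
        raise ValueError("no non-interacted items to sample from")
    return [rng.choice(candidates) for _ in range(n_neg)]


items = list(range(10))
user_history = {1, 3, 5}
print(random_negative_sampling(items, user_history, 4, rng=random.Random(0)))
```

Building the candidate list costs O(|items|) per call; for large catalogs, drawing from the full catalog and rejecting interacted items is a common alternative.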
For each positive, different negatives have different effects. One heuristic for the negative distribution is popularity-biased negative sampling (PNS, Popularity-biased Negative Sampling). **Popularity can be measured by frequency or by degree**, and the probability that an item is sampled as a negative is proportional to a power of its popularity.
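A minimal PNS sketch, assuming popularity is given as a mapping from item to its frequency or degree. The exponent `alpha` is an assumed parameter: 0.75 is the value word2vec uses for its unigram noise distribution, and `alpha = 0` reduces to uniform sampling.

```python
import random
from collections import Counter


def popularity_negative_sampling(item_counts, interacted, n_neg, alpha=0.75, rng=None):
    """Sample negatives with probability proportional to popularity ** alpha.

    item_counts: mapping item -> popularity (e.g. interaction frequency or degree).
    interacted: items the user has interacted with, excluded from sampling.
    """
    rng = rng or random.Random()
    candidates = [item for item in item_counts if item not in interacted]
    if not candidates:
        raise ValueError("no non-interacted items to sample from")
    weights = [item_counts[item] ** alpha for item in candidates]
    return rng.choices(candidates, weights=weights, k=n_neg)


counts = Counter({"a": 100, "b": 10, "c": 1, "d": 50})
sample = popularity_negative_sampling(counts, interacted={"d"}, n_neg=1000, rng=random.Random(0))
print(Counter(sample))  # "d" is excluded; "a" should be drawn most often
```

An `alpha` below 1 flattens the distribution, so popular items are favored without completely crowding out rare ones.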
② Hard negative sampling (Hard Negative Sampling)
What makes a negative "hard" depends on the model: examples the model misclassifies, or negatives to which it assigns high prediction scores, are more useful for improving the model.
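A minimal sketch of one simple hard-negative strategy: draw a random candidate pool, score it with the current model, and keep the highest-scoring negatives. Because the scores change as the model trains, the sampling distribution changes too, unlike static sampling. `score_fn`, `pool_size` and the function name are assumptions for illustration.

```python
import heapq
import random


def hard_negative_sampling(score_fn, user, candidates, n_neg, pool_size, rng=None):
    """Return the n_neg candidates with the highest model score from a random pool.

    score_fn(user, item) -> float is the current model's prediction; candidates
    should already exclude items the user has interacted with.
    """
    rng = rng or random.Random()
    candidates = list(candidates)
    pool = rng.sample(candidates, min(pool_size, len(candidates)))
    return heapq.nlargest(n_neg, pool, key=lambda item: score_fn(user, item))


def toy_score(user, item):
    # Toy model: the score is just the item id, so larger ids are "harder" negatives.
    return float(item)


print(hard_negative_sampling(toy_score, "u1", list(range(100)), n_neg=3, pool_size=100))
# [99, 98, 97]
```

The pool size trades off cost against hardness: scoring the whole catalog finds the hardest negatives but is expensive and more likely to pick unlabeled positives.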
③ Adversarial negative sampling (Adversarial Sampling)
Adversarial negative sampling methods usually select negatives with a GAN.
**GAN:** a generator produces samples, a discriminator judges whether they are real or fake, and this feedback is sent back so the generator produces new samples …
④ Graph-based negative sampling (Graph-based Sampling)
Graph-based negative sampling further incorporates the structural information of examples on a graph.
⑤ Negative sampling with additional information (Additional Data Enhanced Sampling)
Some work uses social-network connections, users' geographic locations, product category information, and additional interaction data, such as items users viewed but did not click (viewed but non-clicked) and items users clicked but did not buy (clicked but non-purchased), to improve the selection of negatives.
In industrial recommendation scenarios, different behaviors (e.g., browsing, clicking, adding to cart, purchasing) are key to modeling user preferences.
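As an illustration of using extra behavior data, the sketch below mixes two sources: with probability `p_exposed` it draws from items the user viewed but did not click, otherwise it falls back to uniform sampling over the catalog. The mixing scheme, the function name and `p_exposed` are hypothetical, a sketch of the idea rather than any specific published method.

```python
import random


def exposure_aware_negative_sampling(viewed_not_clicked, all_items, positives,
                                     n_neg, p_exposed=0.5, rng=None):
    """Mix 'viewed but non-clicked' negatives with uniformly sampled ones.

    viewed_not_clicked: items shown to the user that were not clicked.
    all_items: full catalog; positives: items the user interacted with.
    """
    rng = rng or random.Random()
    exposed = [i for i in viewed_not_clicked if i not in positives]
    others = [i for i in all_items if i not in positives]
    negatives = []
    for _ in range(n_neg):
        if exposed and rng.random() < p_exposed:
            negatives.append(rng.choice(exposed))
        else:
            negatives.append(rng.choice(others))
    return negatives


catalog = list(range(20))
print(exposure_aware_negative_sampling([2, 4, 6], catalog, {0, 1}, 5, rng=random.Random(0)))
```

Exposed-but-ignored items are stronger evidence of dislike than random unseen items, while the uniform fallback keeps coverage of the rest of the catalog.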
(4) Combining negative sampling with curriculum learning (Curriculum Learning)
(5) Negative sampling ratio
Negative sampling methods mainly aim to improve the quality of negatives, whereas the negative sampling ratio determines their number.
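A minimal sketch of where the ratio enters training-data construction: for each positive (user, item) pair, `neg_ratio` negatives are drawn, here uniformly as a placeholder for any of the methods above. The function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def build_training_samples(positive_pairs, all_items, neg_ratio, rng=None):
    """Return (user, item, label) triples with neg_ratio negatives per positive.

    Negatives are drawn uniformly from items the user never interacted with;
    swapping in another sampler changes *which* negatives are used, while
    neg_ratio controls *how many*.
    """
    rng = rng or random.Random()
    history = defaultdict(set)
    for user, item in positive_pairs:
        history[user].add(item)
    samples = []
    for user, item in positive_pairs:
        samples.append((user, item, 1))
        candidates = [i for i in all_items if i not in history[user]]
        for neg in rng.choices(candidates, k=neg_ratio):
            samples.append((user, neg, 0))
    return samples


pairs = [("u1", 1), ("u1", 2), ("u2", 3)]
data = build_training_samples(pairs, list(range(10)), neg_ratio=4, rng=random.Random(0))
print(len(data))  # 3 positives + 3 * 4 negatives = 15
```

As the CTR example in (2) shows, the chosen ratio also shifts the label distribution, so predictions may need recalibration.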
Main references:
The problem of negative sampling
Negative sampling method
I still find the detailed NCE derivation in Part 1 somewhat hard to follow; it can be expanded later if needed.
Summary:
In essence, negative sampling assigns weights to selected negative examples. The underlying idea is that data which do not contain the target of interest are still valuable. Applying it in practice requires further knowledge.
These are my study notes; if anything is wrong, please point it out. Thank you!