[Tricks] WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach
2022-07-02 07:22:00 【lwgkzl】
Executive summary
This article introduces three small tricks for doing sentence embedding with BERT:
- Use the average of all token embeddings as the sentence representation, rather than only the representation at the [CLS] position.
- Superimpose (average) the sentence vectors from multiple BERT layers, rather than using only the last layer.
- When judging sentence similarity with cosine similarity, apply a Whitening operation to normalize the distribution of the sentence embedding vectors, which yields better sentence representations.
Model
The first two points do not involve any changes to the model, so only the third point, the Whitening operation, is briefly introduced here.
Starting point: cosine similarity is a meaningful measure of vector similarity only when the vectors are expressed in an orthonormal basis; if the basis vectors change, the meaning of each coordinate of the vector changes with them. The sentence vectors extracted from BERT, however, may not live in a coordinate system built on such an orthonormal basis.
Solution: normalize every vector into the coordinate system of a common orthonormal basis. A reasonable assumption is that a good set of sentence vectors produced by a pre-trained language model should be distributed fairly uniformly in every direction of the coordinate system, i.e. should be isotropic. Based on this assumption, we can normalize all sentence vectors to make them isotropic. A feasible approach is to transform the distribution of the sentence vectors toward a standard normal distribution, because the standard normal distribution is isotropic (a mathematical fact).
Practice:
The derivation is shown in a screenshot from Su Shen's blog: link
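The screenshot itself is not reproduced here. As a hedged reconstruction based on the code in the Code section below (and on the usual BERT-whitening formulation), the transform can be written as follows, where x_1, ..., x_N are the sentence vectors of the corpus:

\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\Sigma = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^{\top}(x_i - \mu)

\Sigma = U \Lambda U^{\top} \ (\text{SVD of the covariance}), \qquad
W = U \Lambda^{-1/2}, \qquad
\tilde{x}_i = (x_i - \mu)\, W

The transformed vectors \tilde{x}_i then have zero mean and (approximately) identity covariance. The code below drops the 1/N factor in the covariance, which only rescales all vectors by a constant and does not affect cosine similarity.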
Experiments and conclusions
- Using the average of all token embeddings as the sentence representation works better than using only the representation at the [CLS] position.
- Superimposing (averaging) the vectors from layers 1, 2, and 12 of BERT gives the best results (a sketch of extracting such sentence vectors follows this list).


- The Whitening operation is effective for most pre-trained language models.
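Neither the paper nor this article fixes a single extraction recipe, so the following is only a minimal sketch of one way to obtain such sentence vectors, assuming the Hugging Face transformers library and bert-base-uncased; the function name get_sentence_embeddings and the exact pooling details are illustrative, not part of the original article.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def get_sentence_embeddings(sentences, layers=(1, 2, 12)):
    # Tokenize a batch of sentences with padding
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (num_layers + 1) tensors, each (bs, seq_len, emb_dim);
    # index 0 is the embedding layer, index i is transformer layer i
    hidden_states = outputs.hidden_states
    # Superimpose (average) the selected layers
    layer_avg = torch.stack([hidden_states[i] for i in layers], dim=0).mean(dim=0)
    # Average over tokens, ignoring padding positions
    mask = inputs["attention_mask"].unsqueeze(-1).float()   # (bs, seq_len, 1)
    return (layer_avg * mask).sum(dim=1) / mask.sum(dim=1)  # (bs, emb_dim)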
Code
import torch

def whitening_torch_final(embeddings):
    # embeddings: (batch_size, emb_dim) sentence vectors produced by BERT
    mu = torch.mean(embeddings, dim=0, keepdim=True)
    # Unnormalized covariance of the centered vectors (emb_dim * emb_dim);
    # for torch >= 1.10, torch.cov(embeddings.t()) gives the same matrix up to a 1/(N-1) factor
    cov = torch.mm((embeddings - mu).t(), embeddings - mu)
    # SVD of the symmetric covariance: cov = u @ diag(s) @ vt
    u, s, vt = torch.svd(cov)
    # Whitening matrix W = U * diag(1 / sqrt(s))
    W = torch.mm(u, torch.diag(1 / torch.sqrt(s)))
    # Center the embeddings and map them into the whitened coordinate system
    embeddings = torch.mm(embeddings - mu, W)
    return embeddings
The vectors produced by the BERT encoder are passed to whitening_torch_final, which performs the whitening operation.
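As a quick sanity check (not from the original article), the function can be run on random vectors standing in for BERT sentence embeddings; in practice the sentence vectors of the whole corpus should be whitened together, since the statistics need many samples to be stable.

# Toy check with random stand-in vectors; replace with the (num_sentences, emb_dim)
# matrix of real sentence embeddings for the whole corpus
embeddings = torch.randn(1000, 768)
whitened = whitening_torch_final(embeddings)
print(whitened.shape)                              # torch.Size([1000, 768])
# Coordinates are decorrelated after whitening: whitened.t() @ whitened ≈ identity
print(torch.mm(whitened.t(), whitened)[:3, :3])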
Optimization
According to Su Shen's blog, keeping only the top N singular values (and their directions) from the SVD can further improve the results. Since only the top N components are kept, the principle is similar to PCA: it amounts to an extra dimensionality-reduction step on the sentence vectors.
Change the code to:
def whitening_torch_final(embeddings, keep_dim=256):
    mu = torch.mean(embeddings, dim=0, keepdim=True)                 # 1 * emb_dim
    cov = torch.mm((embeddings - mu).t(), embeddings - mu)           # emb_dim * emb_dim
    u, s, vt = torch.svd(cov)            # u: emb_dim * emb_dim, s: emb_dim (descending)
    W = torch.mm(u, torch.diag(1 / torch.sqrt(s)))                   # W: emb_dim * emb_dim
    # Keep only the first keep_dim columns (largest singular values): a PCA-like truncation
    embeddings = torch.mm(embeddings - mu, W[:, :keep_dim])
    return embeddings                                                # bs * keep_dim
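Again as an illustrative check (the 1000 random vectors are only a stand-in for real corpus embeddings):

embeddings = torch.randn(1000, 768)
reduced = whitening_torch_final(embeddings, keep_dim=256)
print(reduced.shape)   # torch.Size([1000, 256])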