[Tricks] WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach
2022-07-02 07:22:00 【lwgkzl】
Executive summary
This article introduces three tricks for producing sentence embeddings with BERT:
- Use the average of all token embeddings as the sentence representation, instead of only the embedding at the [CLS] position.
- Combine sentence vectors from multiple BERT layers, instead of using only the last layer.
- When measuring sentence similarity with cosine similarity, apply a whitening operation to normalize the distribution of the sentence embedding vectors, which yields better sentence representations.
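As a minimal sketch of the first trick, mean pooling over all token embeddings (excluding padding via the attention mask) might look like the following; the tensor shapes and the mask convention are assumptions matching the usual BERT output layout, not code from the paper:

```python
import torch

def mean_pooling(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, hidden) - all token vectors from BERT
    # attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens
    return summed / counts                           # (batch, hidden)
```

This replaces the common practice of taking `last_hidden_state[:, 0]` (the [CLS] position) as the sentence vector.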
Model
The first two tricks do not change the model, so only the third, the whitening operation, is briefly introduced here.
Motivation: cosine similarity is a meaningful measure of vector similarity only when the vectors are expressed in an orthonormal basis; under a different basis, the individual coordinates of a vector mean different things. The sentence vectors extracted from BERT, however, may not live in a coordinate system built on such an orthonormal basis.
Solution: transform all vectors into a coordinate system with a common orthonormal basis. One conjecture is that sentence vectors produced by a pre-trained language model should be distributed roughly uniformly in every direction of the space, i.e. be isotropic. Based on this conjecture, we can normalize all sentence vectors to make them isotropic. A feasible approach is to transform the distribution of sentence vectors toward a standard normal distribution, since the standard normal distribution is isotropic (a mathematical fact).
Practice:
Screenshot from Su Shen's blog: link
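To make this concrete, here is a small self-contained sketch (with synthetic, deliberately anisotropic vectors standing in for sentence embeddings) showing that after the whitening transform the empirical covariance becomes approximately the identity matrix, i.e. the vectors become isotropic:

```python
import torch

torch.manual_seed(0)
# Synthetic anisotropic "sentence vectors": each dimension has a different scale
x = torch.randn(1000, 16) * torch.linspace(0.5, 5.0, 16)

mu = x.mean(dim=0, keepdim=True)
cov = (x - mu).t() @ (x - mu) / (x.shape[0] - 1)  # empirical covariance
u, s, _ = torch.svd(cov)
W = u @ torch.diag(1.0 / torch.sqrt(s))           # whitening matrix

z = (x - mu) @ W                                  # whitened vectors
cov_z = z.t() @ z / (z.shape[0] - 1)              # covariance is now ~ identity
```

Because the covariance of the whitened vectors is (approximately) the identity, no direction dominates, which is exactly the isotropy the conjecture asks for.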
Experiments and conclusions
- Using the average of all token embeddings as the sentence representation outperforms using only the embedding at the [CLS] position.
- Combining the vectors from layers 1, 2, and 12 of BERT gives the best results.
- The whitening operation is effective for most pre-trained language models.
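A sketch of the layer-combination trick, assuming the HuggingFace convention where `output_hidden_states=True` returns a tuple of 13 tensors (the embedding output at index 0, then one tensor per transformer layer, so layer i sits at index i); the shapes below are illustrative:

```python
import torch

def combine_layers(hidden_states, layers=(1, 2, 12)):
    # hidden_states: tuple of (batch, seq_len, hidden) tensors;
    # index 0 = embedding output, indices 1..12 = transformer layers
    stacked = torch.stack([hidden_states[i] for i in layers])  # (n_layers, b, s, h)
    per_token = stacked.mean(dim=0)   # average the selected layers per token
    return per_token.mean(dim=1)      # then mean-pool over tokens -> (b, h)
```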
Code
def whitening_torch_final(embeddings):
    # embeddings: (batch_size, emb_dim) sentence vectors
    mu = torch.mean(embeddings, dim=0, keepdim=True)
    cov = torch.mm((embeddings - mu).t(), embeddings - mu)
    # On torch >= 1.10 the (scaled) covariance can equivalently be computed as:
    # cov = torch.cov(embeddings.t())
    u, s, vt = torch.svd(cov)
    W = torch.mm(u, torch.diag(1 / torch.sqrt(s)))
    embeddings = torch.mm(embeddings - mu, W)
    return embeddings
The vectors produced by the BERT encoder are passed to whitening_torch_final to perform the whitening operation.
Optimization
According to Su Shen's blog, keeping only the top-N eigenvalues from the SVD can further improve results. Moreover, since only the top-N components are kept, this is similar in principle to PCA: it is equivalent to also performing a dimensionality reduction on the sentence vectors.
The code becomes:
def whitening_torch_final(embeddings, keep_dim=256):
    mu = torch.mean(embeddings, dim=0, keepdim=True)
    # torch >= 1.10; rows of the input are variables, hence the transpose
    cov = torch.cov(embeddings.t())                  # emb_dim * emb_dim
    u, s, vt = torch.svd(cov)                        # u: emb_dim * emb_dim, s: emb_dim
    W = torch.mm(u, torch.diag(1 / torch.sqrt(s)))   # W: emb_dim * emb_dim
    embeddings = torch.mm(embeddings - mu, W[:, :keep_dim])  # truncation
    return embeddings                                # bs * keep_dim
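As a quick sanity check of the truncated variant, the following self-contained sketch verifies the output shape; the function is re-derived here with a manual covariance so the snippet does not require torch >= 1.10, and the function name and input sizes are illustrative:

```python
import torch

def whitening_truncated(embeddings, keep_dim=8):
    # Same operation as above, covariance computed manually
    mu = torch.mean(embeddings, dim=0, keepdim=True)
    cov = torch.mm((embeddings - mu).t(), embeddings - mu) / (embeddings.shape[0] - 1)
    u, s, vt = torch.svd(cov)
    W = torch.mm(u, torch.diag(1.0 / torch.sqrt(s)))
    return torch.mm(embeddings - mu, W[:, :keep_dim])  # (bs, keep_dim)

x = torch.randn(100, 16)    # 100 fake sentence vectors of dimension 16
out = whitening_truncated(x, keep_dim=8)
print(out.shape)            # torch.Size([100, 8])
```

Truncating the columns of W keeps the directions with the largest eigenvalues, which is why the step behaves like PCA dimensionality reduction.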