SimCSE: Contrastive Learning in NLP
2022-07-06 08:53:00 【InfoQ】
Paper overview
- Contrastive learning in the "twilight of the gods" era
- Contrastive learning in the "arms race" era
- Use a pre-trained BERT directly to obtain sentence vectors, either the [CLS] vector or the average of the token vectors.
- BERT-flow^[On the Sentence Embeddings from Pre-trained Language Models], which uses a flow model to calibrate BERT's sentence vectors.
- BERT-whitening^[Whitening Sentence Representations for Better Semantics and Faster Retrieval], which encodes all sentences with a pre-trained BERT to get a sentence-vector matrix, then applies a linear transformation so that the vectors have zero mean and an identity covariance matrix (see the sketch after this list).
- Sentence-BERT (SBERT)^[Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks], which obtains the vectors of two sentences through a siamese BERT network and trains them with supervision.
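A minimal sketch of the whitening transform described above, assuming the sentence vectors have already been collected into a NumPy array; the function and variable names (`compute_whitening`, `kernel`, `bias`) are illustrative, not the original implementation.

```python
import numpy as np

def compute_whitening(embeddings: np.ndarray):
    """Fit a whitening transform: zero mean, identity covariance.

    embeddings: (n_sentences, dim) matrix of sentence vectors.
    Returns (kernel, bias) so that (embeddings + bias) @ kernel is whitened.
    """
    mu = embeddings.mean(axis=0, keepdims=True)   # (1, dim) mean vector
    cov = np.cov((embeddings - mu).T)             # (dim, dim) covariance matrix
    u, s, _ = np.linalg.svd(cov)                  # eigendecomposition via SVD
    kernel = u @ np.diag(1.0 / np.sqrt(s))        # maps the covariance to the identity
    bias = -mu
    return kernel, bias

def apply_whitening(embeddings: np.ndarray, kernel: np.ndarray, bias: np.ndarray):
    return (embeddings + bias) @ kernel
```

The BERT-whitening paper additionally truncates the kernel to reduce the embedding dimension; the sketch above keeps the full dimension.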

What is dropout
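Dropout randomly zeroes a fraction of activations during training. A minimal illustration of the property SimCSE relies on: with dropout active, encoding the same input twice yields two different vectors, which can serve as a positive pair. The toy encoder below merely stands in for BERT and its internal dropout; all names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in encoder: in SimCSE this is BERT, whose layers already contain dropout.
encoder = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.1))
encoder.train()  # keep dropout active

x = torch.randn(1, 8)
h1 = encoder(x)  # first forward pass, one dropout mask
h2 = encoder(x)  # second forward pass, a different dropout mask

# The two "views" of the same input differ only because of dropout noise.
print(torch.equal(h1, h2))  # almost surely False
```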

Comparing dropout with other data augmentation methods

Different dropout rates
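The SimCSE paper ablates the dropout probability and finds that the BERT default of p = 0.1 works best. As a sketch of how the rate would be changed when loading the encoder with Hugging Face `transformers` (these are the standard BERT config fields, assumed here for illustration, not SimCSE-specific code):

```python
from transformers import AutoConfig, AutoModel

# Override the feed-forward and attention dropout probabilities of BERT.
config = AutoConfig.from_pretrained(
    "bert-base-uncased",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)
encoder = AutoModel.from_pretrained("bert-base-uncased", config=config)
```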

Evaluation metrics for contrastive learning
- alignment measures the distance between positive pairs; the smaller it is, the closer the positive-pair vectors are and the better the contrastive representation. It is computed as: $$\ell_{\text{align}} \triangleq \mathbb{E}_{(x, x^{+}) \sim p_{\text{pos}}}\left\|f(x)-f\left(x^{+}\right)\right\|^{2}$$
- uniformity measures how uniformly all sentence vectors are distributed; the smaller it is, the more uniform the distribution and the better the contrastive representation. It is computed as: $$\ell_{\text{uniform}} \triangleq \log \mathbb{E}_{x, y \stackrel{i.i.d.}{\sim} p_{\text{data}}} e^{-2\|f(x)-f(y)\|^{2}}$$ A numerical sketch of both metrics follows this list.
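A minimal numerical sketch of the two metrics above over a batch of L2-normalized embeddings; the function names and the assumption that `z1`/`z2` hold paired positive embeddings row by row are illustrative.

```python
import torch
import torch.nn.functional as F

def align_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Alignment: mean squared distance between positive pairs (matching rows of z1, z2)."""
    return (z1 - z2).norm(dim=1).pow(2).mean()

def uniform_loss(z: torch.Tensor) -> torch.Tensor:
    """Uniformity: log of the average Gaussian potential over all pairs in the batch."""
    return torch.pdist(z, p=2).pow(2).mul(-2).exp().mean().log()

# Toy batch: 128 positive pairs of 64-dimensional embeddings on the unit sphere.
z1 = F.normalize(torch.randn(128, 64), dim=1)
z2 = F.normalize(torch.randn(128, 64), dim=1)
print(align_loss(z1, z2).item(), uniform_loss(z1).item())
```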

Unsupervised
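In unsupervised SimCSE, each sentence $x_i$ is fed to the encoder twice with different dropout masks, giving embeddings $h_i$ and $h_i^{+}$ that form a positive pair, while the other sentences in the batch serve as negatives. A sketch of the objective, following the notation of the SimCSE paper ($\operatorname{sim}$ is cosine similarity, $\tau$ the temperature, $N$ the batch size):

$$\ell_{i}=-\log \frac{e^{\operatorname{sim}\left(h_{i}, h_{i}^{+}\right) / \tau}}{\sum_{j=1}^{N} e^{\operatorname{sim}\left(h_{i}, h_{j}^{+}\right) / \tau}}$$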

Supervised
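Supervised SimCSE builds triplets $(x_i, x_i^{+}, x_i^{-})$ from NLI data, taking entailment hypotheses as positives and contradiction hypotheses as hard negatives. A sketch of the objective, again following the paper's notation:

$$\ell_{i}=-\log \frac{e^{\operatorname{sim}\left(h_{i}, h_{i}^{+}\right) / \tau}}{\sum_{j=1}^{N}\left(e^{\operatorname{sim}\left(h_{i}, h_{j}^{+}\right) / \tau}+e^{\operatorname{sim}\left(h_{i}, h_{j}^{-}\right) / \tau}\right)}$$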


Results

Code practice
```bash
pip install simcse
```

```python
from simcse import SimCSE

# Any released SimCSE checkpoint name can be passed here (supervised or unsupervised variants).
model = SimCSE("princeton-nlp/sup-simcse-bert-base-uncased")

# Encode a single sentence into an embedding.
embeddings = model.encode("A woman is reading.")
```

```python
sentences_a = ['A woman is reading.', 'A man is playing a guitar.']
sentences_b = ['He plays guitar.', 'A woman is making a photo.']

# Pairwise similarities between the two lists of sentences.
similarities = model.similarity(sentences_a, sentences_b)

# Similarity between two individual sentences.
similarities1 = model.similarity('A woman is reading.', 'A man is playing a guitar.')
similarities2 = model.similarity('He plays guitar.', 'A man is playing a guitar.')
```

```python
# Build a similarity-search index over a set of sentences, then query it.
sentences = ['A woman is reading.', 'A man is playing a guitar.']
model.build_index(sentences)
results = model.search("He plays guitar.")
# The query "He plays guitar." retrieves 'A man is playing a guitar.' from the index.
```
