SimCSE: Contrastive Learning in NLP
2022-07-06 08:53:00 【InfoQ】
Paper overview
- Contrastive learning in the "twilight of the gods" era
- Contrastive learning in the "arms race" period
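- Use a pre-trained BERT directly to obtain sentence vectors: either the vector at the [CLS] position or the average of the token vectors.
- BERT-flow^[On the Sentence Embeddings from Pre-trained Language Models], which mainly uses a flow model to calibrate BERT's vectors.
- BERT-whitening^[Whitening Sentence Representations for Better Semantics and Faster Retrieval], which encodes all sentences with a pre-trained BERT to obtain a matrix of sentence vectors, then applies a linear transformation so that the matrix has zero mean and an identity covariance matrix.
- Sentence-BERT (SBERT)^[Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks], which obtains the vectors of two sentences through a Siamese BERT network and trains them with supervision. The structure of SBERT is shown in the figure below.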
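The first option in the list above, taking either the [CLS] vector or the mean of the token vectors, can be sketched in a few lines. This is a minimal illustration on hand-written toy numbers rather than real BERT output; the helper names `cls_pooling` and `mean_pooling` and the `attention_mask` argument are assumptions made for the example.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_vectors: List[Vector]) -> Vector:
    """Use the vector at the first ([CLS]) position as the sentence vector."""
    return list(token_vectors[0])


def mean_pooling(token_vectors: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for d in range(dim):
                total[d] += vec[d]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy encoder output (hypothetical values): 4 positions x 3 dimensions.
toy_tokens = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],  # padding, ignored by mean pooling
]
toy_mask = [1, 1, 1, 0]

cls_vec = cls_pooling(toy_tokens)
mean_vec = mean_pooling(toy_tokens, toy_mask)
```

The two strategies generally give different sentence vectors for the same input, which is why the choice of pooling is usually reported alongside results.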

What is dropout
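As a rough illustration, dropout randomly zeroes part of a vector during training, so two passes over the same input see different masks. The sketch below uses the common "inverted" formulation in plain Python; the rate of 0.1 and the helper name `dropout` are assumptions chosen for the example, not values taken from this article.

```python
import random
from typing import List


def dropout(vector: List[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each entry with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * scale for x in vector]


rng = random.Random(0)   # fixed seed so the run is repeatable
hidden = [1.0] * 1000    # stand-in for a hidden representation

# Two passes over the same input draw two independent masks.
view_a = dropout(hidden, rate=0.1, rng=rng)
view_b = dropout(hidden, rate=0.1, rng=rng)

dropped_a = sum(1 for x in view_a if x == 0.0)
mean_a = sum(view_a) / len(view_a)
```

Because the masks differ, `view_a` and `view_b` are two slightly different versions of the same input, which is what lets dropout act as a form of data augmentation.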

Comparing dropout with other data augmentation methods

Different dropout rates

Evaluation metrics for contrastive learning
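- alignment measures the distance between all positive pairs. The smaller the alignment, the closer the vectors of positive samples are and the better the contrastive learning works. It is computed as: $$\ell_{\text{align}} \triangleq \mathbb{E}_{(x, x^{+}) \sim p_{\text{pos}}} \|f(x)-f(x^{+})\|^{2}$$
- uniformity measures how evenly all sentence vectors are distributed. The smaller it is, the more uniform the distribution and the better the contrastive learning works. It is computed as: $$\ell_{\text{uniform}} \triangleq \log \mathbb{E}_{x, y \stackrel{i.i.d.}{\sim} p_{\text{data}}} e^{-2\|f(x)-f(y)\|^{2}}$$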
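Both metrics are easy to estimate on a finite set of vectors. The sketch below is one possible plain-Python estimate; it assumes the vectors are L2-normalized before distances are taken and approximates the expectation in uniformity by averaging over all distinct pairs. The function names are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(positive_pairs: Sequence[Tuple[Vector, Vector]]) -> float:
    """Mean squared distance between the normalized vectors of each positive pair."""
    dists = [
        squared_distance(l2_normalize(a), l2_normalize(b))
        for a, b in positive_pairs
    ]
    return sum(dists) / len(dists)


def uniformity(vectors: Sequence[Vector]) -> float:
    """Log of the mean of exp(-2 * squared distance) over all distinct pairs."""
    normalized = [l2_normalize(v) for v in vectors]
    terms = [
        math.exp(-2.0 * squared_distance(a, b))
        for a, b in combinations(normalized, 2)
    ]
    return math.log(sum(terms) / len(terms))


# Toy data: vectors spread around the unit circle vs. vectors bunched together.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
clustered = [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01], [1.0, 0.02]]

spread_uniformity = uniformity(spread)
clustered_uniformity = uniformity(clustered)
```

On this toy data the spread-out set gets the lower (better) uniformity value, and a positive pair that points in the same direction gets an alignment of 0.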

Unsupervised
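A minimal sketch of an in-batch contrastive objective of the kind usually associated with unsupervised SimCSE: each sentence is encoded twice, the two views of the same sentence form the positive pair, and the other sentences in the batch serve as negatives. The cosine similarity, the temperature of 0.05, and the function names are assumptions made for this example, and the toy vectors stand in for real encoder output.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def in_batch_contrastive_loss(
    view_a: Sequence[Vector],
    view_b: Sequence[Vector],
    temperature: float = 0.05,  # assumed value for illustration
) -> float:
    """Average cross-entropy where row i of view_a should match row i of
    view_b, and every other row of view_b acts as a negative."""
    losses: List[float] = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two sentences, each encoded twice (hypothetical vectors).
first_pass = [[1.0, 0.0], [0.0, 1.0]]
second_pass = [[0.9, 0.1], [0.1, 0.9]]

matched_loss = in_batch_contrastive_loss(first_pass, second_pass)
mismatched_loss = in_batch_contrastive_loss(first_pass, second_pass[::-1])
```

The loss is close to zero when each vector is most similar to its own second view, and grows when the pairing is shuffled.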

Supervised


Results

Code practice
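pip install simcse
from simcse import SimCSE
# SimCSE("<model name>"): put the name of the version you want to load here
model = SimCSE("princeton-nlp/sup-simcse-bert-base-uncased")
# Encode a single sentence into its embedding
embeddings = model.encode("A woman is reading.")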

# Pairwise similarities between two lists of sentences
sentences_a = ['A woman is reading.', 'A man is playing a guitar.']
sentences_b = ['He plays guitar.', 'A woman is making a photo.']
similarities = model.similarity(sentences_a, sentences_b)
# Similarity between two individual sentences
similarities1 = model.similarity('A woman is reading.', 'A man is playing a guitar.')
similarities2 = model.similarity('He plays guitar.', 'A man is playing a guitar.')

# Build an index over a set of sentences, then search it with a query
sentences = ['A woman is reading.', 'A man is playing a guitar.']
model.build_index(sentences)
results = model.search("He plays guitar.")
He plays guitar.
A man is playing a guitar.
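To make the index-and-search step above more concrete, here is a brute-force sketch of semantic search over precomputed vectors. It is not the `simcse` implementation: the hand-made 2-d vectors stand in for real embeddings, and the `search` helper with its `threshold` and `top_k` parameters is hypothetical, assuming cosine similarity as the score.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def search(
    query_vec: Vector,
    index: Dict[str, Vector],
    threshold: float = 0.6,  # hypothetical cut-off for this sketch
    top_k: int = 5,          # hypothetical result limit for this sketch
) -> List[Tuple[str, float]]:
    """Score every indexed sentence against the query, keep those at or above
    `threshold`, and return the best `top_k` sorted by descending score."""
    scored = [(sentence, cosine(query_vec, vec)) for sentence, vec in index.items()]
    hits = [(sentence, score) for sentence, score in scored if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_k]


# Hand-made 2-d vectors stand in for real sentence embeddings.
toy_index = {
    "A woman is reading.": [0.1, 1.0],
    "A man is playing a guitar.": [1.0, 0.2],
}
query = [0.9, 0.1]  # pretend embedding of "He plays guitar."

results = search(query, toy_index)
```

With these toy vectors only the guitar sentence passes the threshold, so the query "He plays guitar." retrieves "A man is playing a guitar.".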
