Reading the paper "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks"
2022-07-23 10:44:00 【jst100】
Paper link: https://arxiv.org/abs/1908.10084
Paper content
BERT and RoBERTa achieve good results on many tasks, but for sentence-pair tasks they require both sentences to be fed into the network together, which leads to high computational cost. The authors therefore propose Sentence-BERT (SBERT), which uses siamese and triplet network architectures to produce semantically meaningful sentence embeddings that can then be compared with cosine similarity. Because each sentence is encoded independently, sentences can be encoded in parallel, which greatly reduces the processing time for sentence-pair tasks.
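To make the cost argument concrete, here is a small sketch that counts forward passes for the two approaches. The function names and the collection size of 10,000 are illustrative assumptions, not taken from the post: feeding every unordered pair jointly needs n(n-1)/2 passes, while encoding each sentence once needs only n.

```python
def pair_input_passes(n: int) -> int:
    """Forward passes when every unordered sentence pair is fed in jointly."""
    return n * (n - 1) // 2


def independent_encoding_passes(n: int) -> int:
    """Forward passes when each sentence is encoded once on its own."""
    return n


n = 10_000  # hypothetical collection size, chosen only for illustration
print(pair_input_passes(n))            # 49995000
print(independent_encoding_passes(n))  # 10000
```

After the n embeddings are computed, comparing them only requires cheap cosine similarity computations rather than further passes through the network.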
Paper model
SBERT adds a pooling operation on top of the BERT output. The authors tried three pooling strategies: [CLS], MEAN-strategy, and MAX-strategy. In their experiments, MEAN-strategy worked best.
The authors define three structures, each with its own objective function.
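The three pooling strategies can be sketched roughly as follows. This is a minimal illustration on plain Python lists, assuming the token vectors are already available as rows of a matrix; the function names are hypothetical, and padding masks (which a real implementation would need) are ignored.

```python
from typing import List

Matrix = List[List[float]]  # one row per token, one column per hidden dimension


def cls_pooling(token_vectors: Matrix) -> List[float]:
    """Use the first token's vector (the [CLS] position) as the embedding."""
    return list(token_vectors[0])


def mean_pooling(token_vectors: Matrix) -> List[float]:
    """Average every hidden dimension over all tokens."""
    n_tokens = len(token_vectors)
    return [sum(column) / n_tokens for column in zip(*token_vectors)]


def max_pooling(token_vectors: Matrix) -> List[float]:
    """Take the maximum of every hidden dimension over all tokens."""
    return [max(column) for column in zip(*token_vectors)]


# Toy input: 3 tokens with 2 hidden dimensions each.
tokens = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
print(cls_pooling(tokens))   # [1.0, 4.0]
print(mean_pooling(tokens))  # [2.0, 2.0]
print(max_pooling(tokens))   # [3.0, 4.0]
```

All three turn a variable-length sequence of token vectors into one fixed-size sentence vector, which is what makes the later cosine comparison possible.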
Classification Objective Function:
To better capture the interaction between the two sentences, the classification structure concatenates the sentence embeddings u and v with their element-wise difference |u - v| and passes the result through a trainable weight matrix followed by softmax.
The loss is the standard cross-entropy loss. The authors also tested several other ways of combining the two embeddings (feature fusion) and report the comparison in the paper.
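A minimal sketch of the classification objective, assuming the form o = softmax(W_t (u, v, |u - v|)) with a cross-entropy loss on top. The helper names, the toy embeddings, and the all-zero weight matrix are assumptions made for illustration; in training, the weights would be learned.

```python
import math
from typing import List


def fuse(u: List[float], v: List[float]) -> List[float]:
    """Concatenate u, v and the element-wise absolute difference |u - v|."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(u: List[float], v: List[float],
             weights: List[List[float]]) -> List[float]:
    """weights has one row per class; each row has 3 * len(u) entries."""
    features = fuse(u, v)
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[label])


u, v = [1.0, 0.0], [0.0, 1.0]             # toy sentence embeddings
weights = [[0.0] * 6 for _ in range(3)]   # untrained placeholder, 3 classes
probs = classify(u, v, weights)
print(fuse(u, v))  # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(probs, cross_entropy(probs, label=0))
```

With all-zero weights the classifier is uniform over the three classes, so the loss equals log(3); training would move it below that.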
Regression Objective Function:
This structure simply computes the cosine similarity between the two sentence embeddings. The authors also tried Euclidean distance and Manhattan distance, but cosine similarity worked best. Mean squared error is used as the loss function.
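The regression objective can be sketched as cosine similarity followed by mean squared error against gold similarity scores. The toy embeddings and target scores below are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (both assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mse_loss(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error between predicted and gold similarity scores."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)


# Two toy sentence pairs with hypothetical gold similarity scores.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [1.0, 0.5]
predicted = [cosine_similarity(u, v) for u, v in pairs]
print(predicted)                  # [1.0, 0.0]
print(mse_loss(predicted, gold))  # 0.125
```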
Triplet Objective Function: triplet network. Given an anchor sentence a, a positive sentence p, and a negative sentence n, training pulls a and p as close together as possible and pushes a and n as far apart as possible. The loss is max(||s_a - s_p|| - ||s_a - s_n|| + ε, 0), where s_x is the embedding of sentence x and ε is a margin. The authors simply use Euclidean distance as the distance metric.
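A minimal sketch of the triplet objective with Euclidean distance, assuming the usual hinge form max(d(a, p) - d(a, n) + margin, 0). The default margin of 1.0 and the toy vectors are assumptions for illustration.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Zero once the positive is closer to the anchor than the negative
    by at least `margin`; otherwise grows linearly with the violation."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
near, far = [0.0, 1.0], [3.0, 4.0]  # distances 1 and 5 from the anchor

print(triplet_loss(anchor, positive=near, negative=far))  # 0.0
print(triplet_loss(anchor, positive=far, negative=near))  # 5.0
```

The first call already satisfies the margin (1 - 5 + 1 < 0), so it contributes no loss; the second has the positive farther away than the negative and is penalized.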
Paper summary
I will not go into the experiments and training details (Wikipedia, sentence-pair classification tasks). To me, the most valuable parts of this paper are the way the two sentence embeddings are made to interact in the classification network, and the use of cosine similarity and other distance measures in the regression network, which may carry over to other tasks. In addition, because the two sentences share the same encoder and are encoded independently, they can be encoded in parallel, which makes full use of the hardware and saves time.