BERT: a bidirectional encoder based on the Transformer
2022-07-28 06:23:00 【A tavern on the mountain】
BERT (Bidirectional Encoder Representations from Transformers): a deep bidirectional Transformer encoder model that uses both left and right context.
Feature-based model: ELMo, based on RNNs
Fine-tuning-based model: GPT, a unidirectional language model
1. MLM: Masked Language Model (cloze-style, self-supervised)
For example, the sentence "I am a little boy, she is a beautiful girl." becomes the MLM input "I __ a __ boy, __ is a beautiful girl.", and the model has to fill in the blanks.
2. BERT framework:
BERT works in two steps, pre-training and fine-tuning. In the first step, the model is trained on unlabeled data across different pre-training tasks. In the second step, the model is initialized with the pre-trained parameters and fine-tuned on labeled data.
Task 1: Masked LM. 15% of the tokens are selected for prediction; of those, 80% are replaced with [MASK], 10% are replaced with a random token, and 10% are left unchanged.
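The 15% / 80% / 10% / 10% rule can be sketched in a few lines of Python. This is only a minimal illustration, not BERT's actual preprocessing code: the function name `mask_tokens`, the toy vocabulary, and the choice to work on whole words instead of WordPiece tokens are all assumptions made for the example, and it does not protect special tokens such as [CLS] from being masked.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Return (corrupted, labels). labels[i] is None where nothing is predicted.

    Hypothetical helper for illustration; works on plain word lists.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        # Each token is selected for prediction with probability select_prob.
        if rng.random() >= select_prob:
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            corrupted.append(MASK)               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: random token
        else:
            corrupted.append(tok)                # 10%: keep unchanged
    return corrupted, labels

if __name__ == "__main__":
    words = "i am a little boy she is a beautiful girl".split()
    toy_vocab = ["cat", "tree", "run", "blue"]  # made-up vocabulary
    print(mask_tokens(words, toy_vocab, rng=random.Random(0)))
```

Only the selected positions carry a label, so the loss is computed on those positions and ignored everywhere else.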
Task 2: Next sentence prediction (NSP). In 50% of the training pairs the second sentence really follows the first; in the other 50% it is a random sentence.
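A rough sketch of how such sentence pairs could be built is shown below. The function name `make_nsp_pairs` and the label convention (1 for a true next sentence, 0 for a random one) are assumptions for this example. As a simplification, the random sentence is drawn from the same list; a real pipeline would normally sample it from a different document.

```python
import random

def make_nsp_pairs(sentences, rng=None):
    """Build (sentence_a, sentence_b, label) triples from an ordered sentence list.

    Hypothetical helper: label 1 means sentence_b really follows sentence_a,
    label 0 means sentence_b was picked at random.
    """
    if len(sentences) < 3:
        raise ValueError("need at least 3 sentences to sample a random negative")
    if rng is None:
        rng = random.Random()
    pairs = []
    for i in range(len(sentences) - 1):
        first = sentences[i]
        if rng.random() < 0.5:
            # 50%: keep the true next sentence.
            pairs.append((first, sentences[i + 1], 1))
        else:
            # 50%: pick any sentence except the current one and the true next one.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((first, rng.choice(candidates), 0))
    return pairs

if __name__ == "__main__":
    doc = ["I went out.", "It was raining.", "I took an umbrella.", "Then I came home."]
    for pair in make_nsp_pairs(doc, rng=random.Random(0)):
        print(pair)
```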
The first token of every sequence is the special classification token [CLS]. The final hidden state corresponding to [CLS] serves as the representation of the whole sequence and is used for classification tasks; in other words, the learned features are aggregated into this [CLS] token.
Training on a large amount of unlabeled data does not necessarily give worse results than training on a small labeled dataset.

The input consists of three parts. Token embeddings encode the words and carry semantic information. Segment embeddings encode the segment, indicating which sentence a token belongs to. Position embeddings encode position information.
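To make the three parts concrete, here is a small sketch in plain Python. It follows the original BERT paper in two details the text above does not spell out: sentences are separated by a [SEP] token, and the three embeddings are added element-wise. The function names `build_input` and `embed`, the tiny embedding size, and the random lookup tables are illustrative assumptions, not a real implementation.

```python
import random

def build_input(tokens_a, tokens_b):
    """Lay out a sentence pair as [CLS] A [SEP] B [SEP] with segment and position ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers sentence B and its [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

def make_table(num_rows, dim, rng):
    """Random lookup table standing in for a learned embedding matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

def embed(token_ids, segment_ids, position_ids, tok_table, seg_table, pos_table):
    """Input vector per position = token + segment + position embedding."""
    return [
        [t + s + p for t, s, p in zip(tok_table[ti], seg_table[si], pos_table[pi])]
        for ti, si, pi in zip(token_ids, segment_ids, position_ids)
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    tokens, seg, pos = build_input(["my", "dog"], ["he", "barks"])
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}  # toy vocabulary
    token_ids = [vocab[t] for t in tokens]
    dim = 4  # toy embedding size
    vectors = embed(
        token_ids, seg, pos,
        make_table(len(vocab), dim, rng),
        make_table(2, dim, rng),
        make_table(len(tokens), dim, rng),
    )
    print(tokens)
    print(seg)
    print(len(vectors), len(vectors[0]))
```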
3. Loss function
- The first part comes from the word-level classification task of the masked LM, using cross-entropy loss
- The second part comes from the sentence-level classification task (NSP), also using cross-entropy loss
- The overall objective is to minimize the sum of the two losses
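The combined objective listed above can be written out as a short sketch. The function names are hypothetical, and averaging the MLM loss over the masked positions is an assumption made here; the text only states that the two cross-entropy losses are summed.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one prediction: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def pretraining_loss(mlm_logits, mlm_targets, nsp_logits, nsp_target):
    """Total loss = MLM loss (mean over masked positions, an assumption) + NSP loss."""
    mlm_terms = [cross_entropy(l, t) for l, t in zip(mlm_logits, mlm_targets)]
    mlm_loss = sum(mlm_terms) / len(mlm_terms)
    nsp_loss = cross_entropy(nsp_logits, nsp_target)
    return mlm_loss + nsp_loss

if __name__ == "__main__":
    # Two masked positions over a toy 3-word vocabulary, plus one 2-way NSP decision.
    mlm_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    mlm_targets = [0, 2]
    print(pretraining_loss(mlm_logits, mlm_targets, nsp_logits=[1.5, -0.5], nsp_target=0))
```

Because both terms are plain classification losses, a single backward pass over their sum updates the shared encoder for both tasks at once.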
4. Summary
By learning these two tasks jointly, the representations BERT learns contain both token-level information and sentence-level semantic information.
BERT shows that training on a large amount of unlabeled data does not necessarily give worse results than training on a small amount of labeled data.
BERT is pre-trained with self-supervision on two tasks (MLM and NSP), which gives the encoder strong feature-extraction ability; the encoder is then transferred to downstream NLP tasks.
https://www.bilibili.com/video/BV1PL411M7eQ?spm_id_from=333.999.0.0