Introduction to BERT and ViT
2022-06-29 04:27:00 【Binary artificial intelligence】
A brief introduction to BERT and ViT
BERT (Bidirectional Encoder Representations from Transformers) is a language model, while ViT (Vision Transformer) is a vision model. Both are built on the Transformer encoder.
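As a rough illustration of this shared backbone, here is a minimal PyTorch sketch (PyTorch is this post's choice for the examples, not something the original text prescribes; the 12-layer / 768-dimension / 12-head sizes are those of BERT-Base and ViT-Base):

```python
import torch
import torch.nn as nn

# Both BERT-Base and ViT-Base stack 12 Transformer encoder layers
# with hidden size 768 and 12 attention heads.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                   dim_feedforward=3072, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=12)

x = torch.randn(1, 16, 768)  # a batch with 16 input vectors (word or patch embeddings)
out = encoder(x)             # contextualized representations, same shape
print(out.shape)             # torch.Size([1, 16, 768])
```

The same encoder serves both models; the difference lies in how the input vectors are produced (word embeddings for BERT, patch embeddings for ViT).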
BERT
BERT takes the word vectors of a text as input and outputs semantic representations of that text. Once pre-trained, BERT can be used for a wide range of language processing tasks.
Pre-training BERT:
(1) Task 1: Masked Language Model (MLM)
Content: fill in the blanks.
Purpose: train the model to learn deep bidirectional representations of sentences, i.e., a word in a sentence can be inferred both from the context to its left and from the context to its right.
Method:
Randomly replace some tokens in a sentence with [MASK].
Then feed the sentence into BERT to obtain its representation.

Finally, the representation at the [MASK] position is fed into a multi-class linear classifier to predict the word that fills the blank.
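As a concrete illustration, the sketch below queries a pre-trained masked language model through the HuggingFace transformers library (using that library and the bert-base-uncased checkpoint is an assumption of this example, not part of the original post):

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Locate the [MASK] position and take the highest-scoring vocabulary entry there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))  # expected: "paris"
```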

(2) Task 2: Next Sentence Prediction (NSP)
Content: judge whether two sentences follow one another.
Purpose: train the model to understand the relationship between sentences.
Method:

Two sentences (for example, "Wake up!" and "You don't have a sister") are separated by a [SEP] token and fed into BERT together with a leading [CLS] token. The representation at the [CLS] position is then passed to a binary classifier that judges whether the two sentences are consecutive. Note that since BERT is full of self-attention inside, [CLS] could be placed anywhere in the sequence and would still gather information from all the other inputs.
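A minimal NSP sketch with the same assumed library (in this API, label 0 means "sentence B really follows sentence A"):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# The tokenizer packs the pair as: [CLS] sentence_a [SEP] sentence_b [SEP]
inputs = tokenizer("Wake up!", "You don't have a sister.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): [is-next, is-random]

print(logits.softmax(dim=-1))  # probability that the two sentences are consecutive
```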
Using pre-trained BERT:
(1) Input a sentence, output a category.

The linear classifier is trained from scratch, while BERT's parameters are fine-tuned.
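One way this is commonly wired up (a sketch under the same library assumption; the class and variable names are mine):

```python
import torch.nn as nn
from transformers import BertModel

class SentenceClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")           # fine-tuned
        self.head = nn.Linear(self.bert.config.hidden_size, num_classes)     # from scratch

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = out.last_hidden_state[:, 0]  # representation at the [CLS] position
        return self.head(cls_repr)              # (batch, num_classes)
```

Training this module end-to-end updates both the freshly initialized head and BERT's pre-trained weights, which is the fine-tuning described above.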
(2) Input a sentence and classify each word in it (for example, as verb, noun, pronoun, etc.).
Likewise, the linear classifier is trained from scratch, while BERT's parameters are fine-tuned.
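Correspondingly, a per-word classifier simply applies the linear layer at every output position instead of only at [CLS] (same assumptions as the previous sketch):

```python
import torch.nn as nn
from transformers import BertModel

class TokenClassifier(nn.Module):
    def __init__(self, num_tags):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.bert.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # (batch, seq_len, num_tags): one label per token
```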
(3) Input two sentences, output a category (for example, judging whether a certain relationship holds between the two sentences).
(4) Reading comprehension: input a document and a query, and output the answer to the query.
The output is a pair (s, e), meaning the answer is the span from the s-th word up to and including the e-th word of the document; for example, (17, 19) denotes words 17 through 19.


Two trainable parameter vectors are each dot-multiplied with the representations of the document tokens, and a Softmax over the resulting scores picks the most probable position: one vector selects the start of the span, the other the end (see the sketch below).
(Figures omitted: the start position is predicted with the orange parameter vector, the end position with the blue parameter vector.)
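A minimal sketch of this span-prediction head (shapes and names are illustrative; in practice the two vectors are trained jointly with BERT):

```python
import torch
import torch.nn as nn

hidden_size, doc_len = 768, 40
doc_repr = torch.randn(1, doc_len, hidden_size)     # BERT outputs for the document tokens

start_vec = nn.Parameter(torch.randn(hidden_size))  # the "orange" start vector
end_vec = nn.Parameter(torch.randn(hidden_size))    # the "blue" end vector

start_scores = doc_repr @ start_vec  # dot product at every position -> (1, doc_len)
end_scores = doc_repr @ end_vec

s = start_scores.softmax(dim=-1).argmax(dim=-1)  # most probable start position
e = end_scores.softmax(dim=-1).argmax(dim=-1)    # most probable end position
print(s.item(), e.item())                        # the predicted answer span
```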
ViT

Like BERT, ViT uses the Transformer encoder, but because it deals with image data, the input requires some special handling: ViT splits the input picture into patches and vectorizes them, so that the same encoder designed for word vectors can be reused. The steps are listed below, with a combined code sketch at the end of the section.
(1) Divide the image into a sequence of small patches; each patch plays the role of a word in a sentence.

(2) Flatten each patch into a vector and map it with a linear projection matrix.

(3) As with BERT's [CLS], ViT prepends a learnable class token (marked * in the paper's figure), and then adds position information to each vector.

(4) Feed the sequence into the Transformer encoder.


(5) Finally, classify, in the same way as BERT.

Note that ViT's pre-training task is also classification.
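Putting the five steps together, here is a minimal ViT-style forward pass in PyTorch (a sketch with ViT-Base-like sizes; the strided convolution is a common trick that performs patch splitting, flattening, and linear projection in one shot, and all names here are mine):

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=768,
                 depth=12, heads=12, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # steps (1)+(2): split into patches, flatten, project linearly
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # step (3): learnable class token and position embeddings
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # step (4): the Transformer encoder
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # step (5): classification head on the class-token output
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images):                        # (B, 3, 224, 224)
        x = self.patch_embed(images)                  # (B, dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)              # (B, 196, dim) patch vectors
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                     # classify from the class token

logits = TinyViT()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```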
[1] Li Hongyi, Machine Learning, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML19.html
[2] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/pdf/1810.04805v2.pdf
[3] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, https://arxiv.org/abs/2010.11929