Introduction to BERT and ViT
2022-06-29 04:27:00 【Binary artificial intelligence】
A brief introduction to BERT and ViT
BERT (Bidirectional Encoder Representations from Transformers) is a language model, while ViT (Vision Transformer) is a vision model. Both are built on the Transformer encoder.
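As a rough illustration of this shared backbone, here is a minimal PyTorch sketch (PyTorch is this post's choice for the examples, not something the original text prescribes; the 12-layer / 768-dimension / 12-head sizes are those of BERT-Base and ViT-Base):

```python
import torch
import torch.nn as nn

# Both BERT-Base and ViT-Base stack 12 Transformer encoder layers
# with hidden size 768 and 12 attention heads.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                   dim_feedforward=3072, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=12)

x = torch.randn(1, 16, 768)  # a batch with 16 input vectors (word or patch embeddings)
out = encoder(x)             # contextualized representations, same shape
print(out.shape)             # torch.Size([1, 16, 768])
```

The same encoder serves both models; the difference lies in how the input vectors are produced (word embeddings for BERT, patch embeddings for ViT).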
BERT
BERT takes the word vectors of a text as input and outputs semantic representations of that text. Once pre-trained, BERT can be used for a wide range of language processing tasks.
Pre-training BERT:
(1) Task 1: Masked Language Model (MLM)
Content: fill in the blanks.
Purpose: train the model to learn deep bidirectional representations of sentences, i.e., a word in a sentence can be inferred both from the context to its left and from the context to its right.
Method:
Randomly replace some tokens in a sentence with [MASK].
Then feed the sentence into BERT to obtain its representation.

Finally, the representation at the [MASK] position is fed into a multi-class linear classifier to predict the word that fills the blank.
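As a concrete illustration, the sketch below queries a pre-trained masked language model through the HuggingFace transformers library (using that library and the bert-base-uncased checkpoint is an assumption of this example, not part of the original post):

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Locate the [MASK] position and take the highest-scoring vocabulary entry there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))  # expected: "paris"
```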

(2) Task 2: Next Sentence Prediction (NSP)
Content: judge whether two sentences follow one another.
Purpose: train the model to understand the relationship between sentences.
Method:

Two sentences (for example, "Wake up!" and "You don't have a sister") are separated by a [SEP] token and fed into BERT together with a leading [CLS] token. The representation at the [CLS] position is then passed to a binary classifier that judges whether the two sentences are consecutive. Note that since BERT is full of self-attention inside, [CLS] could be placed anywhere in the sequence and would still gather information from all the other inputs.
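A minimal NSP sketch with the same assumed library (in this API, label 0 means "sentence B really follows sentence A"):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# The tokenizer packs the pair as: [CLS] sentence_a [SEP] sentence_b [SEP]
inputs = tokenizer("Wake up!", "You don't have a sister.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): [is-next, is-random]

print(logits.softmax(dim=-1))  # probability that the two sentences are consecutive
```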
Using pre-trained BERT:
(1) Input a sentence, output a category.

The linear classifier is trained from scratch, while BERT's parameters are fine-tuned.
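One way this is commonly wired up (a sketch under the same library assumption; the class and variable names are mine):

```python
import torch.nn as nn
from transformers import BertModel

class SentenceClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")           # fine-tuned
        self.head = nn.Linear(self.bert.config.hidden_size, num_classes)     # from scratch

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = out.last_hidden_state[:, 0]  # representation at the [CLS] position
        return self.head(cls_repr)              # (batch, num_classes)
```

Training this module end-to-end updates both the freshly initialized head and BERT's pre-trained weights, which is the fine-tuning described above.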
(2) Input a sentence and classify each word in it (for example, as verb, noun, pronoun, etc.).
Likewise, the linear classifier is trained from scratch, while BERT's parameters are fine-tuned.
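Correspondingly, a per-word classifier simply applies the linear layer at every output position instead of only at [CLS] (same assumptions as the previous sketch):

```python
import torch.nn as nn
from transformers import BertModel

class TokenClassifier(nn.Module):
    def __init__(self, num_tags):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.bert.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # (batch, seq_len, num_tags): one label per token
```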
(3) Input two sentences, output a category (for example, judging whether a certain relationship holds between the two sentences).
(4) Reading comprehension: input a document and a query, and output the answer to the query.
The output is a pair (s, e), meaning the answer is the span from the s-th word up to and including the e-th word of the document; for example, (17, 19) denotes words 17 through 19.


Two trainable parameter vectors are each dot-multiplied with the representations of the document tokens, and a Softmax over the resulting scores picks the most probable position: one vector selects the start of the span, the other the end (see the sketch below).
(Figures omitted: the start position is predicted with the orange parameter vector, the end position with the blue parameter vector.)
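A minimal sketch of this span-prediction head (shapes and names are illustrative; in practice the two vectors are trained jointly with BERT):

```python
import torch
import torch.nn as nn

hidden_size, doc_len = 768, 40
doc_repr = torch.randn(1, doc_len, hidden_size)     # BERT outputs for the document tokens

start_vec = nn.Parameter(torch.randn(hidden_size))  # the "orange" start vector
end_vec = nn.Parameter(torch.randn(hidden_size))    # the "blue" end vector

start_scores = doc_repr @ start_vec  # dot product at every position -> (1, doc_len)
end_scores = doc_repr @ end_vec

s = start_scores.softmax(dim=-1).argmax(dim=-1)  # most probable start position
e = end_scores.softmax(dim=-1).argmax(dim=-1)    # most probable end position
print(s.item(), e.item())                        # the predicted answer span
```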
ViT

Like BERT, ViT uses the Transformer encoder, but because it deals with image data, the input requires some special handling: ViT splits the input picture into patches and vectorizes them, so that the same encoder designed for word vectors can be reused. The steps are listed below, with a combined code sketch at the end of the section.
(1) Divide the image into a sequence of small patches; each patch plays the role of a word in a sentence.

(2) Flatten each patch into a vector and map it with a linear projection matrix.

(3) As with BERT's [CLS], ViT prepends a learnable class token (marked * in the paper's figure), and then adds position information to each vector.

(4) Feed the sequence into the Transformer encoder.


(5) Finally, classify, in the same way as BERT.

Note that ViT's pre-training task is also classification.
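Putting the five steps together, here is a minimal ViT-style forward pass in PyTorch (a sketch with ViT-Base-like sizes; the strided convolution is a common trick that performs patch splitting, flattening, and linear projection in one shot, and all names here are mine):

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=768,
                 depth=12, heads=12, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # steps (1)+(2): split into patches, flatten, project linearly
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # step (3): learnable class token and position embeddings
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # step (4): the Transformer encoder
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # step (5): classification head on the class-token output
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images):                        # (B, 3, 224, 224)
        x = self.patch_embed(images)                  # (B, dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)              # (B, 196, dim) patch vectors
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                     # classify from the class token

logits = TinyViT()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```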
[1] Li Hongyi, Machine Learning, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML19.html
[2] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/pdf/1810.04805v2.pdf
[3] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, https://arxiv.org/abs/2010.11929