Figuring Out How GPT-3 Works
2022-07-07 07:48:00 [Chief prisoner]
An illustrated look at how GPT-3 works
The hype around GPT-3 caused an uproar in the tech world. Huge language models (like GPT-3) are beginning to surprise us with their abilities. While most businesses cannot yet safely put these models in front of customers, they are showing sparks of cleverness that are certain to accelerate automation and the development of intelligent computer systems. Let's strip away GPT-3's mysterious aura and learn how it is trained and how it works.
A trained language model generates text.
We can optionally pass some text to it as input, which influences its output.
The output is generated from what the model "learned" while scanning vast amounts of text during training.

Training is the process of exposing the model to a large amount of text. That process has already been completed; all the experiments you see today come from the already-trained model. It is estimated to have taken 355 GPU-years and cost $4.6 million.

A dataset of 300 billion tokens of text was used to generate training samples for the model. For example, these are three training samples generated from the single sentence at the top.
You can see how a window slides across all the text and produces many samples.
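A minimal sketch (in Python, with whitespace-split words standing in for real tokens) of how sliding a window over the text yields (context, next-token) training pairs:

```python
# Toy illustration: slide a fixed-size window over a token sequence to
# produce (context, next_token) training examples. Splitting on spaces is a
# stand-in for GPT-3's real byte-pair-encoding tokenizer.
tokens = "a robot must obey the orders given it by human beings".split()

window_size = 4  # GPT-3's real context window is 2048 tokens
examples = []
for i in range(len(tokens) - window_size):
    context = tokens[i : i + window_size]   # the features shown to the model
    next_token = tokens[i + window_size]    # the label it must predict
    examples.append((context, next_token))

for context, next_token in examples[:3]:
    print(context, "->", next_token)
```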

The model is presented with an example. We show it only the features and ask it to predict the next word.
The model's prediction will be wrong. We calculate the error in its prediction and update the model so that it makes a better prediction next time.
Repeat this millions of times.
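Here is a highly simplified sketch of that predict / measure error / update loop in PyTorch. The tiny embedding-plus-linear model and the random dataset are placeholders of my own; this is not GPT-3's architecture or OpenAI's training code, but the loop has the same shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the real model: an embedding plus a linear layer that
# predicts the next token from the last token of the context.
vocab_size, d_model = 100, 16
embed = nn.Embedding(vocab_size, d_model)
predict = nn.Linear(d_model, vocab_size)
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(predict.parameters()), lr=1e-3
)

# A fake dataset of (context, next_token) pairs, like the sliding-window samples above.
dataset = [
    (torch.randint(0, vocab_size, (4,)), torch.randint(0, vocab_size, (1,)))
    for _ in range(100)
]

for context, next_token in dataset:               # GPT-3 repeats this millions of times
    logits = predict(embed(context)[-1])          # the model's guess for the next token
    loss = F.cross_entropy(logits.unsqueeze(0), next_token)  # how wrong was the guess?
    loss.backward()                               # compute how to adjust each parameter
    optimizer.step()                              # nudge the parameters toward better predictions
    optimizer.zero_grad()
```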

Now, let's look at these same steps in more detail.
GPT-3 actually generates its output one token at a time (for now, assume a token is a word).
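A sketch of that one-token-at-a-time (autoregressive) loop. The tiny vocabulary and the random-scoring "model" are stand-ins of my own; the point is only that each generated token is appended to the input and fed back in. GPT-3 can also sample (temperature, top-p) instead of always taking the single highest-scoring token.

```python
import numpy as np

vocab = ["<sep>", "Okay", "human", "robot", "must", "obey", "."]

def toy_model(token_ids):
    # Stand-in for a full forward pass: return one score per vocabulary token.
    return np.random.randn(len(vocab))

def generate(prompt_ids, max_new_tokens=5):
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = toy_model(tokens)           # one forward pass of the whole model
        next_id = int(scores.argmax())       # pick the highest-scoring token
        tokens.append(next_id)               # feed it back in as part of the input
    return " ".join(vocab[i] for i in tokens)

print(generate([1, 2]))                      # start from "Okay human"
```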

Please note: this is a description of how GPT-3 works, not a discussion of what is novel about it (which is mainly the ridiculously large scale). The architecture is a Transformer decoder model based on this paper: https://arxiv.org/pdf/1801.10198.pdf
GPT-3 is enormous. It encodes what it learns from training in 175 billion numbers (called parameters). These numbers are used to compute which token to generate each time.
The untrained model starts with random parameters. Training finds the values that lead to better predictions.

These numbers are part of hundreds of matrices inside the model. Prediction is mostly a lot of matrix multiplication.
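To make "prediction is mostly matrix multiplication" concrete, here is a tiny numpy sketch. The shapes are shrunk so it runs instantly; GPT-3 uses 12288-dimensional vectors and hundreds of such matrices, and its actual layers are more involved than this.

```python
import numpy as np

# A token's vector flows through a few weight matrices (the "parameters").
# Training is what finds good values for W1, W2, and W_vocab.
d_model, d_hidden, vocab_size = 8, 32, 100

token_vector = np.random.randn(d_model)          # representation of one token
W1 = np.random.randn(d_model, d_hidden)          # learned parameters...
W2 = np.random.randn(d_hidden, d_model)          # ...found during training
W_vocab = np.random.randn(d_model, vocab_size)   # projection back onto the vocabulary

hidden = np.maximum(0, token_vector @ W1)        # matrix multiply + ReLU
output = hidden @ W2                             # more matrix multiplication
scores = output @ W_vocab                        # one score per vocabulary token
print(scores.argmax())                           # index of the predicted token
```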
The Intro to AI on YouTube shows a simple ML model with just one parameter. It is a good starting point before unpacking this 175B-parameter monster.
To see how these parameters are distributed and used, we need to open up the model and look inside.
GPT-3 is 2048 tokens wide. That is its "context window". That means it has 2048 tracks along which tokens are processed.
Let's follow the purple track. How does the system process the word "robotics" and produce "A"?
Steps:
- Convert the word into a vector (a list of numbers) that represents it
- Compute the prediction
- Convert the resulting vector back into a word
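A small numpy sketch of the first and last step: a token id becomes a vector by looking up a row of an embedding matrix, and the model's final vector becomes a word by scoring it against every row of that matrix. The sizes are toy values; GPT-3 uses a vocabulary of roughly 50,000 tokens and 12288-dimensional vectors.

```python
import numpy as np

vocab_size, d_model = 50, 8
embedding = np.random.randn(vocab_size, d_model)  # one row (vector) per token

token_id = 7
token_vector = embedding[token_id]          # step 1: word -> vector

# ... step 2 is where the decoder layers would transform this vector ...
result_vector = token_vector                # placeholder for the layers' output

scores = result_vector @ embedding.T        # step 3: vector -> a score for every word
predicted_id = int(scores.argmax())         # the highest-scoring token wins
print(predicted_id)
```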

GPT-3's important calculations happen inside its stack of 96 Transformer decoder layers.
See all those layers? This is the "depth" in "deep learning".
Each of these layers has its own 1.8B parameters with which to do its calculations. That is where the "magic" happens. Here is a high-level view of that process:
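A structural sketch of that stack. The DecoderLayer below is a do-nothing stand-in (the real layers contain masked self-attention and a feed-forward network); it only shows the token vectors flowing through 96 layers, one after another.

```python
import numpy as np

class DecoderLayer:
    """Stand-in for a real Transformer decoder layer (~1.8B parameters each)."""
    def __call__(self, x):
        return x  # placeholder for self-attention + feed-forward

class GPT3Stack:
    def __init__(self, num_layers=96):
        self.layers = [DecoderLayer() for _ in range(num_layers)]

    def __call__(self, token_vectors):
        x = token_vectors
        for layer in self.layers:   # the "depth" in deep learning
            x = layer(x)            # each layer refines every token's representation
        return x

stack = GPT3Stack()
out = stack(np.zeros((2048, 12288)))  # 2048 token positions, 12288-dimensional vectors
print(out.shape)
```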

You can see a detailed description of everything inside a decoder in the article The Illustrated GPT2.
The difference in GPT-3 is the alternating dense and sparse self-attention layers.
This is an X-ray of an input and response ("Okay human") inside GPT-3. Notice how every token flows through the entire layer stack. We don't care about the output of the first tokens. Once the input is complete, we start caring about the output. We feed every generated word back into the model.

In the React code-generation example, the description would be the input prompt (shown in green), along with, I believe, a few description=>code examples. The React code would then be generated like the pink tokens here, token after token.
My assumption is that the priming examples and the description are supplied as input, with a specific token separating the examples from the results. That is then fed into the model.
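A sketch of how such a prompt might be assembled, following that assumption. The `description:`/`code:` labels, the example snippets, and the `###` separator are all illustrative guesses, not the demo's actual prompt format:

```python
# Assemble a few-shot prompt: a couple of description=>code examples, then the
# new description. The whole string is fed to the model, which continues it
# token by token with the generated code.
examples = [
    ("a button that says 'hello'", "<button>hello</button>"),
    ("a red heading that says 'hi'", "<h1 style={{color: 'red'}}>hi</h1>"),
]
new_description = "a todo list with three items"

parts = [f"description: {desc}\ncode: {code}" for desc, code in examples]
parts.append(f"description: {new_description}\ncode:")
prompt = "\n###\n".join(parts)   # '###' stands in for the separator token

print(prompt)
```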

What impresses me is that it already works like this, because fine-tuning for GPT-3 has yet to be rolled out. The possibilities will be even more amazing.
Fine-tuning actually updates the model's weights to make the model better at a certain task.
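In loop form, fine-tuning looks just like the pre-training loop sketched earlier, except it starts from the already-trained weights and runs on a small task-specific dataset. The `pretrained_model` and `task_dataset` names below are placeholders, not a real API:

```python
import torch
import torch.nn.functional as F

# Same next-token loop as pre-training, but on task-specific data and starting
# from trained weights. `pretrained_model` and `task_dataset` are assumed
# placeholders standing in for a loaded model and a small labeled dataset.
optimizer = torch.optim.Adam(pretrained_model.parameters(), lr=1e-5)  # gentle learning rate

for context, next_token in task_dataset:
    loss = F.cross_entropy(pretrained_model(context), next_token)
    loss.backward()
    optimizer.step()       # this step is what actually updates (fine-tunes) the weights
    optimizer.zero_grad()
```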
