The Illustrated GPT-3: How It Works
The GPT-3 hype has caused an uproar in tech circles. The capabilities of massive language models (like GPT-3) are starting to surprise us. While most businesses can't yet safely put these models in front of customers, they are showing sparks of cleverness that are sure to accelerate automation and advance intelligent computer systems. Let's strip away GPT-3's aura of mystery and learn how it is trained and how it works.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
That output is generated from what the model "learned" while scanning vast amounts of text during its training.
Training is the process of exposing the model to lots of text. That process has been completed; all the experiments you see now use that one trained model. It is estimated to have taken 355 GPU-years and cost $4.6 million.
A dataset of 300 billion tokens of text was used to generate training examples for the model. For example, these are three training examples generated from the single sentence at the top.
You can see how a window slides across all the text to produce many examples.
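To make the sliding window concrete, here is a minimal sketch in Python. The sentence, window size, and representation are toy stand-ins chosen for illustration, not the actual data pipeline:

```python
# Slide a fixed-size window over tokenized text to produce
# (features, label) training examples, as described above.
tokens = ("second law of robotics a robot must obey the orders "
          "given it by human beings").split()

window_size = 4          # GPT-3's real context window is 2048 tokens
examples = []
for i in range(len(tokens) - window_size):
    features = tokens[i : i + window_size]   # what the model is shown
    label = tokens[i + window_size]          # the next word it must predict
    examples.append((features, label))

for features, label in examples[:3]:
    print(features, "=>", label)
```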
The model is presented with an example. We show it only the features and ask it to predict the next word.
The model's prediction will be wrong. We calculate the error in its prediction and update the model so that it makes a better prediction next time.
Repeat this millions of times.
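In code, that cycle looks roughly like the following. This is a hedged sketch using PyTorch with a tiny stand-in model; it mirrors the predict, measure error, update loop, not GPT-3's actual architecture or training code:

```python
import torch
import torch.nn.functional as F

# Tiny stand-in next-word model: embed the context tokens, flatten,
# and score every word in the vocabulary. NOT GPT-3's architecture.
vocab_size, embed_dim, context = 100, 32, 8
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, embed_dim),
    torch.nn.Flatten(),
    torch.nn.Linear(context * embed_dim, vocab_size),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

features = torch.randint(0, vocab_size, (1, context))  # the context tokens
label = torch.randint(0, vocab_size, (1,))             # the true next token

for step in range(1000):                   # "repeat millions of times"
    logits = model(features)               # the model's prediction
    loss = F.cross_entropy(logits, label)  # how wrong was it?
    optimizer.zero_grad()
    loss.backward()                        # how should each parameter change?
    optimizer.step()                       # update the model
```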
Now let's look at these same steps in a bit more detail.
GPT-3 actually generates output one token at a time (for now, let's assume a token is a word).
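That generation loop can be sketched in a few lines; `model` here is a hypothetical callable that maps a token sequence to one predicted next token:

```python
# One-token-at-a-time generation: each predicted token is appended to
# the input before predicting the next one.
def generate(model, prompt_tokens, n_tokens):
    tokens = list(prompt_tokens)
    for _ in range(n_tokens):
        next_token = model(tokens)   # predict a single next token
        tokens.append(next_token)    # it becomes input for the next step
    return tokens
```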
Please note: this is a description of how GPT-3 works, not a discussion of what is novel about it (which is mainly its ridiculous scale). The architecture is a transformer decoder model based on this paper: https://arxiv.org/pdf/1801.10198.pdf
GPT-3 is enormous. It encodes what it learns from training in 175 billion numbers (called parameters). These numbers are used to calculate which token to generate at each run.
An untrained model starts with random parameters. Training finds values that lead to better predictions.
These numbers are part of hundreds of matrices inside the model. Prediction is mostly a lot of matrix multiplication.
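In miniature, that looks like the sketch below; the three small random matrices stand in for the hundreds of much larger ones in the real model:

```python
import numpy as np

# Prediction as a chain of matrix multiplications, in miniature.
dim = 4
matrices = [np.random.randn(dim, dim) for _ in range(3)]

x = np.random.randn(dim)   # a vector representing the input
for W in matrices:
    x = x @ W              # most of the computation is steps like this
print(x)
```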
In my Intro to AI video on YouTube, I showed a simple ML model with a single parameter. That's a good starting point for unpacking this 175B-parameter monster.
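In the same spirit, here is a one-parameter model in plain Python, fit with the predict/error/update loop from earlier (the data and learning rate are made up for illustration):

```python
# One parameter w, predicting y = w * x. Training nudges w to shrink
# the error: the same loop described above, at the smallest scale.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy data where y = 2x

w = 0.5     # a single parameter, starting at an arbitrary value
lr = 0.01   # learning rate
for epoch in range(200):
    for x, y in data:
        error = w * x - y
        w -= lr * error * x   # gradient descent on the squared error
print(w)   # approaches 2.0; GPT-3 does this with 175 billion parameters
```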
To shed light on how these parameters are distributed and used, we need to open up the model and look inside.
GPT-3 is 2048 tokens wide. That is its "context window". It has 2048 tracks along which tokens are processed.
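In code, respecting the context window is as simple as the sketch below; the keep-the-most-recent-tokens truncation strategy is an assumption for illustration:

```python
# Only CONTEXT_WINDOW tokens fit on the "tracks" at once.
CONTEXT_WINDOW = 2048

def fit_to_window(token_ids):
    return token_ids[-CONTEXT_WINDOW:]   # keep at most the last 2048 tokens
```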
Let's follow the purple track. How does the system process the word "robotics" and produce "A"?
High-level steps (sketched in code after this list):
- Convert the word to a vector (a list of numbers) representing it
- Compute the prediction
- Convert the resulting vector back into a word
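Here are those three steps with toy numbers. The vocabulary, the sizes, and the `tanh` standing in for the middle step are all assumptions for illustration:

```python
import numpy as np

vocab = ["a", "robot", "must", "obey", "orders", "robotics", "A"]
dim = 8
rng = np.random.default_rng(0)
W_embed = rng.normal(size=(len(vocab), dim))     # word -> vector table
W_unembed = rng.normal(size=(dim, len(vocab)))   # vector -> word scores

x = W_embed[vocab.index("robotics")]  # 1. convert the word to a vector
h = np.tanh(x)                        # 2. compute a prediction (the real model
                                      #    runs 96 decoder layers here)
scores = h @ W_unembed                # 3. convert the result vector to scores
print(vocab[int(np.argmax(scores))])  # the highest-scoring word wins
```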
GPT-3's important calculations happen inside its stack of 96 transformer decoder layers.
See all these layers? This is the "depth" in "deep learning".
Each of these layers has its own 1.8B parameters (roughly 175B divided by 96) with which to make its calculations. That is where the "magic" happens. Here is a high-level view of the process:
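To make the depth tangible, here is a toy sketch of one vector passing through a 96-layer stack. The layer body is invented for illustration and is far simpler than a real transformer decoder layer:

```python
import numpy as np

n_layers, dim = 96, 4
layers = [np.random.randn(dim, dim) * 0.1 for _ in range(n_layers)]

def toy_decoder_layer(x, W):
    return x + np.tanh(x @ W)   # residual connection around a toy transform

x = np.random.randn(dim)
for W in layers:                # this stacking is the "depth" in deep learning
    x = toy_decoder_layer(x, W)
print(x)
```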
You can find a detailed description of everything inside the decoder in the post The Illustrated GPT2.
The difference in GPT-3 is the alternating dense and sparse self-attention layers.
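To show the flavor of that difference, the sketch below contrasts a dense causal attention mask with a sparse banded one. The band width is arbitrary, and GPT-3's actual sparse patterns are more involved than this:

```python
import numpy as np

n, band = 8, 3
dense = np.tril(np.ones((n, n)))                  # attend to all earlier tokens
sparse = dense - np.tril(np.ones((n, n)), -band)  # attend to a recent band only
print(dense)
print(sparse)
```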
This is an X-ray of an input and response ("Okay human") within GPT-3. Notice how every token flows through the entire layer stack. We don't care about the outputs for the first words. Once the input is done, we start caring about the output, feeding each generated word back into the model.
In the React code-generation example, the description would be the input prompt (shown in green), along with, I believe, a few description=>code examples. The React code would then be generated like the pink tokens here, token after token.
My assumption is that the priming examples and the description are appended as input, with specific tokens separating the examples from the results. That is then fed into the model.
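Here is that assumption sketched as prompt construction. The `###` separator and the example texts are stand-ins I made up; the actual special tokens are not documented in the material above:

```python
examples = [
    ("a button that says hello", "<Button>hello</Button>"),
    ("a red heading", "<h1 style={{color: 'red'}}>Heading</h1>"),
]
description = "a login form"

prompt = ""
for desc, code in examples:
    prompt += f"description: {desc}\ncode: {code}\n###\n"
prompt += f"description: {description}\ncode:"  # the model continues from here
print(prompt)
```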
It's impressive that it works this way. And just wait until fine-tuning is rolled out for GPT-3; the possibilities will be even more amazing.
Fine-tuning actually updates the model's weights to make the model better at a certain task.
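As a hedged sketch, fine-tuning is the same predict/error/update loop run on task-specific examples, starting from the trained weights (the `Linear` layer below stands in for a pretrained model you would actually load):

```python
import torch
import torch.nn.functional as F

vocab_size = 100
model = torch.nn.Linear(vocab_size, vocab_size)  # stand-in "pretrained" model

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # gentle updates
task_x = torch.randn(16, vocab_size)           # toy task-specific batch
task_y = torch.randint(0, vocab_size, (16,))

for step in range(100):
    loss = F.cross_entropy(model(task_x), task_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # the weights themselves change, unlike with prompting
```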