Figuring Out How GPT-3 Works
2022-07-07 07:48:00 【Chief prisoner】
The Illustrated GPT-3: How It Works
The hype around GPT-3 caused an uproar in the tech world. Massive language models (like GPT-3) are starting to surprise us with their capabilities. Although most businesses can't yet safely put these models in front of customers, they are showing sparks of cleverness that are sure to accelerate automation and advance intelligent computer systems. Let's strip away GPT-3's aura of mystery and learn how it is trained and how it works.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
The output is generated from what the model "learned" by scanning vast amounts of text during its training.
Training is the process of exposing the model to lots of text. That process has been completed. All the experiments you see now come from that one trained model. It is estimated to have cost 355 GPU-years and $4.6 million.
A dataset of 300 billion tokens of text was used to generate training examples for the model. For example, these are three training examples generated from the one sentence at the top.
You can see how a window slides across all the text and generates many examples.
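To make the sliding window concrete, here is a minimal Python sketch of how training examples could be generated this way. The window size, the example sentence, and the function name are illustrative choices, not GPT-3's actual preprocessing code.

```python
# A minimal sketch of sliding-window training-example generation.
# Window size, sentence, and names are illustrative assumptions.

def make_training_examples(tokens, window_size=8):
    """Slide a window over the token stream; each example's features
    are a span of tokens and its label is the token that follows."""
    examples = []
    for i in range(len(tokens) - window_size):
        features = tokens[i : i + window_size]
        label = tokens[i + window_size]
        examples.append((features, label))
    return examples

tokens = "Second law of robotics : a robot must obey the orders given it".split()
for features, label in make_training_examples(tokens)[:3]:
    print(features, "->", label)
```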
The model is presented with an example. We only show it the features and ask it to predict the next word.
The model's prediction will be wrong. We calculate the error in its prediction and update the model, so that next time it makes a better prediction.
Repeat this millions of times.
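Here is a toy sketch of that predict, measure-error, update loop, using a one-parameter linear model in place of a transformer. The data, the learning rate, and the model itself are made up purely for illustration.

```python
# A toy sketch of the predict -> measure error -> update loop,
# with a single-parameter linear model standing in for a transformer.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal()            # untrained model: one random parameter

# toy "features -> label" pairs (the real model sees token windows)
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs               # the pattern we want the model to learn

learning_rate = 0.01
for step in range(1000):    # GPT-3 does the analogous loop millions of times
    predictions = w * xs                 # predict
    error = predictions - ys             # measure how wrong we were
    gradient = np.mean(error * xs)       # slope of the error w.r.t. w
    w -= learning_rate * gradient        # nudge the parameter to reduce error

print(w)  # converges toward 2.0
```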
Now let's look at these same steps in more detail.
GPT-3 actually generates output one token at a time (for now, think of a token as a word).
Please note: this is a description of how GPT-3 works, not a discussion of what is novel about it (which is mainly its ridiculously large scale). The architecture is a Transformer decoder model based on this paper: https://arxiv.org/pdf/1801.10198.pdf
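A minimal sketch of what "one token at a time" means in code. The `predict_next_token` stub stands in for the entire model; it is a hypothetical placeholder, not GPT-3's real interface.

```python
# A minimal sketch of one-token-at-a-time (autoregressive) generation.
# `predict_next_token` is a hypothetical stand-in for the whole model.

def predict_next_token(tokens):
    # A real model would run the token sequence through all its layers;
    # this stub just echoes a canned continuation for illustration.
    canned = {"Okay": "human", "human": "."}
    return canned.get(tokens[-1], "<end>")

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)  # one token per step
        if next_token == "<end>":
            break
        tokens.append(next_token)                # feed it back in as input
    return tokens

print(generate(["Okay"]))  # ['Okay', 'human', '.']
```

The key design point is the feedback loop: each generated token becomes part of the input for the next step.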
GPT-3 is enormous. It encodes what it learned from training in 175 billion numbers (called parameters). These numbers are used to calculate which token to generate at each run.
An untrained model starts with random parameters. Training searches for parameter values that lead to better predictions.
These numbers are part of hundreds of matrices inside the model. Prediction is mostly a lot of matrix multiplication.
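To make "prediction is mostly matrix multiplication" concrete, here is a tiny sketch: a token's vector multiplied by one learned matrix to produce a score for every vocabulary word. The dimensions and weights are made up; GPT-3's matrices are vastly larger, and there are hundreds of them.

```python
# A tiny sketch of "prediction is mostly matrix multiplication".
# Dimensions and weights are invented for illustration.
import numpy as np

vocab_size, embed_dim = 5, 3

token_vector = np.array([0.2, -1.0, 0.5])         # a token's vector
weight_matrix = np.random.default_rng(0).normal(  # one learned matrix
    size=(embed_dim, vocab_size))

logits = token_vector @ weight_matrix                  # matrix multiplication
probabilities = np.exp(logits) / np.exp(logits).sum()  # softmax
print(probabilities)  # one score per vocabulary word
```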
The AI intro video on YouTube shows a simple ML model with a single parameter. A good start before unpacking this 175-billion-parameter monster.
To shed light on how these parameters are distributed and used, we need to open up the model and look inside.
GPT-3 is 2048 tokens wide. That is its "context window". That means it has 2048 tracks along which tokens are processed.
Let's follow the purple track. How does the system process the word "robotics" and produce "A"?
Steps (sketched in code below):
- Convert the word to a vector (a list of numbers) representing it
- Compute the prediction
- Convert the resulting vector back into a word
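A minimal sketch of those three steps with a toy four-word vocabulary. The embedding values are random and the middle "compute" step is a stand-in; in GPT-3, that middle step is the full stack of decoder layers described next.

```python
# A minimal sketch of the three steps, with a toy vocabulary.
# Embeddings and the "compute" step are placeholder assumptions.
import numpy as np

vocab = ["robotics", "must", "obey", "A"]
embeddings = np.random.default_rng(0).normal(size=(len(vocab), 3))

def word_to_vector(word):                 # step 1: word -> vector
    return embeddings[vocab.index(word)]

def compute_prediction(vector):           # step 2: the model's computation
    return vector @ np.ones((3, 3))       # stand-in for the decoder stack

def vector_to_word(vector):               # step 3: vector -> word
    scores = embeddings @ vector          # compare against every word's vector
    return vocab[int(np.argmax(scores))]

print(vector_to_word(compute_prediction(word_to_vector("robotics"))))
```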
GPT-3's impressive calculations happen inside its stack of 96 Transformer decoder layers.
See all these layers? This is the "depth" in "deep learning".
Each of these layers has its own 1.8B parameters with which to do its calculations. That is where the "magic" happens. This is a high-level view of the process:
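A sketch of what a stack of layers means: the token's vector passes through each layer in turn, and each layer has its own weights. The `decoder_layer` function here is a placeholder computation, not a real self-attention plus feed-forward block.

```python
# A sketch of a vector flowing through a stack of decoder layers.
# `decoder_layer` is a placeholder; a real layer is self-attention
# plus a feed-forward network with its own ~1.8B parameters.
import numpy as np

def decoder_layer(vector, layer_weights):
    return np.tanh(vector @ layer_weights)   # placeholder computation

rng = np.random.default_rng(0)
layers = [rng.normal(size=(3, 3)) for _ in range(96)]  # 96 layers of weights

vector = rng.normal(size=3)
for weights in layers:          # the token's vector passes through each layer
    vector = decoder_layer(vector, weights)
print(vector)
```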
You can see a detailed description of everything inside the decoder in the post The Illustrated GPT2.
The difference in GPT-3 is the alternating dense and sparse self-attention layers.
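As an assumption for illustration only (the exact pattern isn't spelled out here), the alternation could look as simple as:

```python
# Hypothetical alternation of dense and sparse self-attention across
# the 96 layers; the exact pattern is an assumption, not GPT-3's
# published configuration.
attention_types = ["dense" if i % 2 == 0 else "sparse" for i in range(96)]
print(attention_types[:6])  # ['dense', 'sparse', 'dense', 'sparse', ...]
```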
This is an X-ray of an input and response ("Okay human") inside GPT-3. Notice how every token flows through the entire layer stack. We don't care about the output of the first tokens. When the input is done, we start caring about the output. We feed every output word back into the model.
In the React code-generation example, the description would be the input prompt (shown in green), along with, I believe, a few description=>code examples. The React code would then be generated token after token, like the pink tokens here.
My assumption is that the priming examples and the description are given as input, with a specific token separating the examples from the result. Then everything is fed into the model.
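Here is a sketch of how such a prompt might be assembled under that assumption. The "=>" separator and the description/code pairs are hypothetical, mirroring the guess above.

```python
# A hypothetical prompt assembly: priming examples, a separator
# token, then the new description for the model to continue from.
examples = [
    ("a red button that says stop", "<button style='color:red'>Stop</button>"),
    ("a blue header that says hello", "<h1 style='color:blue'>Hello</h1>"),
]
new_description = "a green link that says go"

prompt = ""
for description, code in examples:
    prompt += f"{description} => {code}\n"   # description => code pairs
prompt += f"{new_description} =>"            # the model continues from here
print(prompt)
```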
It blows my mind that this works like this. And just wait until fine-tuning is rolled out for GPT-3; the possibilities will be even more amazing.
Fine-tuning actually updates the model's weights to make the model perform better at a certain task.
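A toy sketch of that idea: start from already-trained weights and keep running the same update loop on task-specific data. The model and data reuse the same hypothetical one-parameter setup as the training sketch earlier.

```python
# A sketch of fine-tuning: continue updating trained weights on
# task-specific data. Model, data, and step are placeholders.
import numpy as np

def training_step(w, x, y, lr=0.01):
    error = w * x - y                 # same predict-and-correct loop as before
    return w - lr * np.mean(error * x)

w = 2.0                               # pretend this is a trained weight
task_x = np.array([1.0, 2.0, 3.0])
task_y = 2.5 * task_x                 # the new task's desired behavior

for _ in range(500):                  # fine-tuning: more updates, new data
    w = training_step(w, task_x, task_y)
print(w)  # the weight has shifted toward the task (about 2.5)
```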