Understanding how GPT3 works
2022-07-07 07:48:00 【Chief prisoner】
An illustrated look at how GPT3 works
The hype around GPT3 has caused an uproar in the tech world. Large language models (such as GPT3) are starting to surprise us with their abilities. While most businesses cannot yet safely put these models in front of customers, the models are showing sparks of cleverness that are sure to accelerate automation and advance intelligent computer systems. Let's remove the aura of mystery around GPT3 and learn how it is trained and how it works.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
That output is generated from what the model "learned" during training, when it scanned vast amounts of text.
Training is the process of exposing the model to large amounts of text. That process has already been completed. All the experiments you see now come from that one trained model. Training is estimated to have taken 355 GPU-years and cost $4.6 million.
A dataset of 300 billion tokens of text is used to generate training examples for the model. For example, these are three training examples generated from the one sentence at the top.
You can see how sliding a window across all the text produces many examples.
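The sliding window can be sketched in a few lines of Python. This is only an illustration: the function name `make_training_examples`, the `max_context` parameter, the sample sentence, and the one-token-per-word split are all assumptions, not details of how GPT3's training data was actually prepared.

```python
def make_training_examples(tokens, max_context=None):
    """Slide over a token list and return (context, next_token) pairs."""
    examples = []
    for i in range(1, len(tokens)):
        # Optionally cap how many previous tokens the context may hold.
        start = 0 if max_context is None else max(0, i - max_context)
        examples.append((tokens[start:i], tokens[i]))
    return examples


# Assumption: one token per whitespace-separated word, as the article
# itself assumes for simplicity.
sentence = "a robot must obey the orders given it"
tokens = sentence.split()

for context, target in make_training_examples(tokens)[:3]:
    print(context, "->", target)
```

Each pair holds the features (the words seen so far) and the label (the word that follows), so a single sentence already yields several examples.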
The model is presented with an example. We show it only the features and ask it to predict the next word.
The model's prediction will be wrong. We calculate the error in its prediction and update the model so that it makes a better prediction next time.
Repeat millions of times.
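The predict, measure the error, update cycle can be sketched with a deliberately tiny model. The code below is not GPT3's architecture: it is a toy model that predicts the next word from the previous word only, trained with plain gradient descent. The names `train_bigram` and `predict_next`, the learning rate, and the zero initialization (used instead of random values so the run is reproducible) are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(tokens, steps=200, lr=0.5):
    """Train a toy next-word model; returns vocab, weights, per-step loss."""
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    # Parameters: one row of scores per previous word.
    weights = [[0.0] * len(vocab) for _ in vocab]
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    losses = []
    for _ in range(steps):
        total_loss = 0.0
        for prev, target in pairs:
            probs = softmax(weights[prev])        # 1. predict
            total_loss += -math.log(probs[target])  # 2. measure the error
            for j in range(len(vocab)):           # 3. update the parameters
                grad = probs[j] - (1.0 if j == target else 0.0)
                weights[prev][j] -= lr * grad
        losses.append(total_loss / len(pairs))
    return vocab, weights, losses


def predict_next(vocab, weights, word):
    """Return the most likely next word after `word`."""
    row = weights[vocab.index(word)]
    return vocab[row.index(max(row))]


vocab, weights, losses = train_bigram("a robot must obey a robot".split())
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(predict_next(vocab, weights, "a"))
```

The loss falls as the loop repeats, which is the same idea at toy scale: each wrong prediction nudges the parameters toward a better one.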
Now let's look at these same steps in more detail.
GPT3 actually generates output one token at a time (for now, assume a token is a word).
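Generating one token at a time amounts to a simple loop: ask the model for the next token, append it, and ask again. In the sketch below, `generate`, `toy_next_token`, the lookup table, and the `<end>` marker are hypothetical stand-ins; a real model would compute the next token from the whole context rather than look it up.

```python
def generate(next_token_fn, prompt_tokens, max_new_tokens, stop_token=None):
    """Repeatedly ask for the next token and append it to the sequence."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        new_token = next_token_fn(tokens)
        if new_token == stop_token:
            break
        tokens.append(new_token)  # the output becomes part of the next input
    return tokens


# Hypothetical stand-in for the model: it only looks at the last token.
TOY_TABLE = {"robot": "must", "must": "obey", "obey": "orders"}


def toy_next_token(tokens):
    return TOY_TABLE.get(tokens[-1], "<end>")


print(generate(toy_next_token, ["a", "robot"], 5, stop_token="<end>"))
```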
Please note: this is a description of how GPT-3 works, not a discussion of what is novel about it (which is mainly its ridiculously large scale). The architecture is a Transformer decoder model based on this paper: https://arxiv.org/pdf/1801.10198.pdf
GPT3 is huge. It encodes what it learns from training in 175 billion numbers (called parameters). These numbers are used to calculate which token to generate at each step.
An untrained model starts with random parameters. Training finds values that lead to better predictions.
These numbers are part of hundreds of matrices inside the model. Prediction is mostly a lot of matrix multiplication.
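To make "prediction is mostly matrix multiplication" concrete, here is a minimal sketch of two small weight matrices applied to an input vector. The matrices `W1` and `W2`, their sizes, and the ReLU step between them are made-up illustrations, not values or layer shapes taken from GPT3.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Zero out negative entries, a common step between matrix multiplies."""
    return [max(0.0, x) for x in vector]


# Made-up parameters: in a real model these numbers are learned in training.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]  # 3 x 2
W2 = [[1.0, 0.0, 1.0], [0.0, 2.0, -1.0]]     # 2 x 3

x = [2.0, 1.0]
hidden = relu(matvec(W1, x))
output = matvec(W2, hidden)
print(hidden, output)
```

Every entry of `W1` and `W2` is a parameter; GPT3 does the same kind of arithmetic with far larger matrices.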
The Intro to AI video on YouTube shows a simple ML model with one parameter. It is a good starting point for unpacking this 175B monster.
To shed light on how these parameters are distributed and used, we need to open the model and look inside.
GPT3 is 2048 tokens wide. That is its "context window". It means it has 2048 tracks along which tokens are processed.
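One practical consequence of a fixed context window is that longer inputs have to be cut down to fit. The sketch below keeps the most recent tokens; that policy and the helper name `fit_to_context` are assumptions for illustration, since the text above states only the window size.

```python
CONTEXT_WINDOW = 2048  # number of token positions ("tracks") stated above


def fit_to_context(tokens, window=CONTEXT_WINDOW):
    """Keep at most `window` tokens, dropping the oldest ones first."""
    return tokens[max(0, len(tokens) - window):]


long_input = list(range(3000))  # stand-in for 3000 token ids
kept = fit_to_context(long_input)
print(len(kept), kept[0], kept[-1])
```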
Let's follow the purple track. How does the system process the word "robotics" and produce "A"?
High-level steps:
- Convert the word to a vector (a list of numbers) that represents it
- Compute the prediction
- Convert the resulting vector to a word
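The three steps above can be sketched end to end with a toy vocabulary. Everything here is invented for illustration: the two-dimensional `EMBEDDINGS`, the single 2 x 2 matrix standing in for the whole layer stack, and the dot-product lookup in `unembed`. Real embeddings are learned and have thousands of dimensions.

```python
# Hypothetical 2-dimensional embeddings; real ones are learned in training.
EMBEDDINGS = {
    "robotics": [1.0, 0.0],
    "a": [0.0, 1.0],
    "robot": [0.5, 0.5],
}


def embed(word):
    """Step 1: convert the word to the vector that represents it."""
    return EMBEDDINGS[word]


def compute(vector):
    """Step 2: stand-in for the layer stack, here one 2 x 2 matrix."""
    weights = [[0.0, 1.0], [1.0, 0.0]]
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def unembed(vector):
    """Step 3: pick the word whose embedding best matches the vector."""
    scores = {
        word: sum(e * v for e, v in zip(emb, vector))
        for word, emb in EMBEDDINGS.items()
    }
    return max(scores, key=scores.get)


print(unembed(compute(embed("robotics"))))
```

With these made-up numbers, the input "robotics" comes out as "a", mirroring the example on the purple track.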
The important calculations of GPT3 happen inside its stack of 96 Transformer decoder layers.
See all these layers? This is the "depth" in "deep learning".
Each of these layers has its own 1.8B parameters for its calculations. That is where the "magic" happens. This is a high-level view of that process:
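The per-layer figure can be checked with back-of-the-envelope arithmetic. The sketch assumes a hidden size of 12288 (the value reported for the 175B model, not stated in this article) and the common approximation of 12 x d_model^2 weights per layer, which ignores biases, layer norms, and the embedding matrices.

```python
D_MODEL = 12288  # assumption: hidden size reported for the 175B model
N_LAYERS = 96    # number of decoder layers stated above


def params_per_layer(d_model):
    """Approximate weight count of one Transformer decoder layer."""
    attention = 4 * d_model * d_model     # query, key, value, output matrices
    feed_forward = 8 * d_model * d_model  # two matrices of size d x 4d
    return attention + feed_forward


per_layer = params_per_layer(D_MODEL)
total = N_LAYERS * per_layer
print(f"{per_layer / 1e9:.2f}B per layer, {total / 1e9:.1f}B in {N_LAYERS} layers")
```

Under these assumptions the estimate lands at roughly 1.8B per layer and about 174B across the stack, with the remainder of the 175B coming from parts the approximation leaves out.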
You can find a detailed explanation of everything inside the decoder in the article The Illustrated GPT2.
The difference in GPT3 is its alternating dense and sparse self-attention layers.
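One way to picture the difference between dense and sparse self-attention is through the mask that says which earlier positions each token may look at. The sketch below is an assumption-heavy illustration: the text above says only that the layers alternate, so the banded (local window) pattern, the band width, and the choice of which layers are dense are all hypothetical.

```python
def dense_causal_mask(n):
    """Position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def banded_causal_mask(n, band):
    """Position i may attend only to the last `band` positions up to i."""
    return [[i - band < j <= i for j in range(n)] for i in range(n)]


def layer_masks(n_layers, n, band):
    """Alternate the two patterns layer by layer (assumed: even = dense)."""
    return [
        dense_causal_mask(n) if layer % 2 == 0 else banded_causal_mask(n, band)
        for layer in range(n_layers)
    ]


for row in banded_causal_mask(4, 2):
    print(["x" if allowed else "." for allowed in row])
```

The sparse mask allows fewer positions per token, which cuts the amount of attention computation in those layers.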
This is an X-ray of an input and response ("Okay human") inside GPT3. Notice how every token flows through the entire layer stack. We don't care about the output for the first words. Once the input is finished, we start caring about the output. We feed every generated word back into the model.
In the React code generation example, the description would be the input prompt (shown in green), in addition to, I believe, a couple of description=>code examples. The React code would then be generated like the pink tokens here, one token after another.
My assumption is that the priming examples and the description are appended together as input, with specific tokens separating the examples from the results, and then fed into the model.
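The assumed prompt layout can be sketched as plain string assembly. The function name `build_prompt`, the "description:" and "code:" labels, the "###" separator, and the sample pairs are all hypothetical; the actual format used in the React demo is not known.

```python
def build_prompt(examples, new_description, separator="\n###\n"):
    """Join description=>code examples, then leave the last code slot open."""
    parts = [f"description: {desc}\ncode: {code}" for desc, code in examples]
    parts.append(f"description: {new_description}\ncode:")
    return separator.join(parts)


# Hypothetical priming examples.
examples = [
    ("a red button that says stop", "<button style={{color: 'red'}}>stop</button>"),
    ("a heading that says hello", "<h1>hello</h1>"),
]

prompt = build_prompt(examples, "a link that says home")
print(prompt)
```

The prompt ends on an open "code:" slot, so the model's most natural continuation is the code for the new description, generated one token at a time.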
It is impressive that this works as it does, because fine-tuning has not even been rolled out for GPT3 yet. Once it is, the possibilities will be even more amazing.
Fine-tuning actually updates the model's weights to make the model perform better at a certain task.
reference :