Figuring out how GPT-3 works
2022-07-07 07:48:00 【Chief prisoner】
The Illustrated GPT-3: How It Works
The GPT-3 hype caused an uproar in tech circles. The capabilities of large language models like GPT-3 have begun to surprise us. Although most companies cannot yet safely put these models in front of customers, they are showing sparks of cleverness that will certainly accelerate automation and the development of intelligent computer systems. Let's strip away GPT-3's mysterious aura and learn how it is trained and how it works.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
The output is generated from what the model "learned" while scanning vast amounts of text during training.
Training is the process of exposing the model to a large amount of text. That process has already been completed; all the experiments you see now come from the trained model. It was estimated to take 355 GPU-years and cost $4.6 million.
A dataset of 300 billion text tokens was used to generate training samples for the model. For example, these are three training samples generated from the single sentence at the top.
You will see how a window slides across all the text and generates many samples.
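To make the sliding window concrete, here is a minimal Python sketch (not GPT-3's actual data pipeline; the window size and the toy token list are made up) that turns a token sequence into (context, next-token) training samples:

```python
# Toy illustration: slide a window over tokenized text to create
# (context, next_token) training samples. The window size and tokens
# are invented for demonstration; GPT-3 uses far longer contexts.
tokens = ["second", "law", "of", "robotics", ":", "a", "robot", "must", "obey"]
window_size = 4

examples = []
for i in range(len(tokens) - window_size):
    context = tokens[i : i + window_size]   # the features shown to the model
    next_token = tokens[i + window_size]    # the label it must predict
    examples.append((context, next_token))

for context, next_token in examples:
    print(context, "->", next_token)
```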
The model is shown an example. We show it only the features (the context) and ask it to predict the next word.
The model's prediction will be wrong. We compute the error in its prediction and update the model so that it predicts better next time.
This is repeated millions of times.
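As a very rough sketch of this predict-measure-update loop, here is a toy next-word model with a handful of parameters, trained by gradient descent. It is nothing like GPT-3's architecture; only the loop itself is the point:

```python
import numpy as np

# Toy stand-in for "predict, measure error, update": a bigram model
# whose parameters are a single logit matrix. GPT-3 runs conceptually
# the same loop, just with 175B parameters and a Transformer in between.
vocab = ["a", "robot", "must", "obey"]
idx = {w: i for i, w in enumerate(vocab)}
pairs = [("a", "robot"), ("robot", "must"), ("must", "obey")]  # (input word, next word)

rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), len(vocab)))  # untrained: random parameters
lr = 0.5

for step in range(1000):                        # "repeat millions of times" in the real thing
    for prev, nxt in pairs:
        logits = W[idx[prev]]                   # predict scores for the next word
        probs = np.exp(logits) / np.exp(logits).sum()
        grad = probs.copy()
        grad[idx[nxt]] -= 1.0                   # error of the prediction (cross-entropy gradient)
        W[idx[prev]] -= lr * grad               # update the parameters

print(vocab[int(np.argmax(W[idx["a"]]))])       # should now print "robot"
```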
Now, let's look at these same steps in more detail.
GPT-3 actually generates its output one token at a time (for now, assume a token is a word).
Please note: this is a description of how GPT-3 works, not a discussion of what is novel about it (which is mainly its ridiculous scale). The architecture is a Transformer decoder model based on this paper: https://arxiv.org/pdf/1801.10198.pdf
GPT-3 is enormous. It encodes what it learned from training in 175 billion numbers (called parameters). These numbers are used to compute which token to generate each time it runs.
The untrained model starts with random parameters. Training searches for values that lead to better predictions.
These numbers are part of hundreds of matrices inside the model. Prediction is mostly a great deal of matrix multiplication.
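A hedged illustration of "prediction is mostly matrix multiplication", with tiny invented dimensions rather than GPT-3's real ones:

```python
import numpy as np

# Invented, tiny dimensions just to show the shape of the computation:
# an input vector flows through a few weight matrices and comes out
# as scores over the vocabulary.
d_model, vocab_size = 8, 50
rng = np.random.default_rng(1)

x = rng.normal(size=(d_model,))           # vector for the current token
W1 = rng.normal(size=(d_model, d_model))  # the "parameters" live in matrices like these
W2 = rng.normal(size=(d_model, vocab_size))

hidden = np.tanh(x @ W1)                  # one matrix multiplication...
scores = hidden @ W2                      # ...then another, producing one score per vocab entry
print(scores.shape)                       # (50,)
```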
The Intro to AI video on YouTube shows a simple ML model with a single parameter. That is a good starting point before unpacking this 175-billion-parameter monster.
To clarify how these parameters are distributed and used, we need to open up the model and look inside.
GPT-3 is 2048 tokens wide. That is its "context window": it has 2048 tracks along which tokens are processed.
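A small sketch of what the 2048-token context window means in practice; the whitespace "tokenizer" here is purely for illustration:

```python
CONTEXT_WINDOW = 2048  # GPT-3's context size in tokens

def fit_to_window(tokens, window=CONTEXT_WINDOW):
    # Keep only the most recent `window` tokens; anything earlier
    # simply cannot influence the next prediction.
    return tokens[-window:]

prompt = "a robot may not injure a human being".split()  # toy whitespace "tokenizer"
print(len(fit_to_window(prompt)))  # 8, well within the 2048-token window
```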
Let's follow the purple track. How does the system process the word "robotics" and produce "A"?
Steps (a minimal code sketch of these steps follows the list):
- Convert the word into a vector (a list of numbers) that represents it
- Compute the prediction
- Convert the resulting vector back into a word
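Here is that pipeline as a toy Python sketch. The vocabulary, the dimensions, and the placeholder "compute" step are all invented for illustration; in the real model the middle step is the 96-layer decoder stack described next.

```python
import numpy as np

# Toy end-to-end pass matching the three steps above.
vocab = ["robotics", "A", "robot", "must"]
d_model = 6
rng = np.random.default_rng(2)

embeddings = rng.normal(size=(len(vocab), d_model))  # one vector per word
W_out = rng.normal(size=(d_model, len(vocab)))

# 1) word -> vector
x = embeddings[vocab.index("robotics")]

# 2) compute the prediction (placeholder for the decoder stack)
h = np.tanh(x)

# 3) vector -> word: score every vocabulary entry and take the best
scores = h @ W_out
print(vocab[int(np.argmax(scores))])
```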
GPT-3's important calculations happen inside its stack of 96 Transformer decoder layers.
See all these layers? This is the "depth" in "deep learning".
Each of these layers has its own 1.8 billion parameters with which to do its calculations. That is where the "magic" happens. This is a high-level view of the process:
A detailed description of everything inside the decoder can be found in the article The Illustrated GPT2.
Where GPT-3 differs is in its alternating dense and sparse self-attention layers.
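As a rough, non-authoritative sketch of that organization, here is a stubbed-out stack in which every other layer is flagged as sparse. The layer bodies are placeholders; only the stacking and the alternation pattern are the point:

```python
# Placeholder view of the decoder stack: 96 layers, alternating between
# dense and sparse self-attention. The layer internals are stubs.
N_LAYERS = 96

def decoder_layer(x, sparse_attention):
    # A real layer applies self-attention plus a feed-forward network to x;
    # here we just pass the activations through unchanged.
    return x

def decoder_stack(x):
    for i in range(N_LAYERS):
        x = decoder_layer(x, sparse_attention=(i % 2 == 1))  # alternate dense/sparse
    return x

print(decoder_stack([0.1, 0.2]))  # activations go in, activations come out
```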
This is an X-ray of an input and a response ("Okay human") in GPT-3. Notice how every token flows through the entire layer stack. We don't care about the output for the first words; only once the input is complete do we start caring about the output. Each output word is then fed back into the model.
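A minimal sketch of that feedback loop, with `model_next_token` as a hypothetical stand-in for a full forward pass through the stack:

```python
# Sketch of the autoregressive loop: each newly produced token is
# appended to the input and fed back through the whole stack.
def model_next_token(tokens):
    return "..."  # in reality: embeddings -> 96 decoder layers -> vocab scores -> best token

def generate(prompt_tokens, n_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        next_token = model_next_token(tokens[-2048:])  # only the context window is visible
        tokens.append(next_token)                      # feed the output back in as input
    return tokens

print(generate(["Okay", "human"], 3))
```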
In the React code generation example, the description would be the input prompt (shown in green), along with, I believe, a few description=>code examples. The React code would then be generated like the pink tokens here, one token after another.
My assumption is that the priming examples and the description are provided as input, with specific tokens separating the examples from the results, and that this is then fed into the model.
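Since this paragraph is explicitly an assumption, here is one hypothetical way such a prompt could be laid out; the separator string and the example pairs are invented:

```python
# Hypothetical prompt layout for the React-generation example:
# a few description=>code pairs, a separator token, then the new
# description for the model to continue from.
SEP = "<|sep|>"

examples = [
    ("a button that says hello", "<button>hello</button>"),
    ("a red heading", "<h1 style={{color: 'red'}}>heading</h1>"),
]
new_description = "a todo list app"

prompt = ""
for description, code in examples:
    prompt += description + SEP + code + "\n"
prompt += new_description + SEP   # the model continues from here, token by token
print(prompt)
```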
What impresses me is that it already works this way. And just wait until fine-tuning is available for GPT-3; the possibilities will be even more amazing.
Fine-tuning actually updates the model's weights to make the model perform better on a particular task.
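As a minimal sketch of the idea (toy sizes and invented names, nothing resembling GPT-3's actual procedure): fine-tuning starts from the already-trained weights and keeps taking gradient steps, but on task-specific examples.

```python
import numpy as np

# Fine-tuning sketch: continue gradient descent from pretrained weights
# using task-specific (input, target) pairs, so the weights themselves change.
def gradient_step(W, x, target, lr=0.1):
    logits = x @ W
    probs = np.exp(logits) / np.exp(logits).sum()
    probs[target] -= 1.0                  # cross-entropy gradient w.r.t. the logits
    return W - lr * np.outer(x, probs)    # updated weights

pretrained_W = np.random.default_rng(3).normal(size=(4, 3))    # stand-in for trained weights
task_data = [(np.array([1.0, 0.0, 0.0, 0.0]), 2)]              # one toy (input, target) pair

fine_tuned_W = pretrained_W.copy()
for _ in range(50):
    for x, target in task_data:
        fine_tuned_W = gradient_step(fine_tuned_W, x, target)
```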