ICML 2022 | Exploring the best architecture and training method for language models
2022-07-05 15:10:00 【Zhiyuan community】
This article introduces two papers published at ICML 2022, written mainly by researchers from Google. Both are practical, analysis-oriented papers. Rather than proposing a new model, as most papers do, each one examines the architectures and training methods of existing NLP language models, explores their strengths and weaknesses in different scenarios, and summarizes empirical rules.
The main experimental conclusions of the two papers are summarized first:
1. The first paper finds that although the encoder-decoder architecture dominates machine translation, a properly designed language model (LM) can match the traditional encoder-decoder architecture on machine translation tasks when the model has many parameters. The LM also performs better in zero-shot scenarios and on low-resource language translation, and on high-resource language translation it produces fewer off-target translations (output in the wrong language).
2. The second paper finds that, without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks. With multitask prompted finetuning, an encoder-decoder architecture trained with masked language modeling has the best zero-shot performance.
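To make the terms in the two conclusions concrete, the toy sketch below contrasts the attention pattern of a causal decoder with the fully visible pattern of an encoder, and a full language modeling training example with a masked language modeling one. It is an illustration only, not code from either paper: the function names, the `<X>` sentinel token, and the single-span corruption form of masked language modeling are all assumptions made for this example.

```python
# Toy illustration only; names and the "<X>" sentinel are hypothetical.

def causal_mask(n):
    """Causal decoder: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder side of an encoder-decoder: every position sees all positions."""
    return [[1] * n for _ in range(n)]


def full_lm_example(tokens):
    """Full language modeling: predict each token from the tokens before it."""
    return tokens[:-1], tokens[1:]


def masked_lm_example(tokens, start, length, sentinel="<X>"):
    """Masked language modeling (assumed single-span corruption form):
    replace one span with a sentinel and ask the model to reconstruct it."""
    inputs = tokens[:start] + [sentinel] + tokens[start + length:]
    targets = [sentinel] + tokens[start:start + length]
    return inputs, targets


tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(causal_mask(3))
print(bidirectional_mask(3))
print(full_lm_example(tokens))
print(masked_lm_example(tokens, start=2, length=2))
```

In the causal case every token is a prediction target, while in the masked case only the corrupted span is predicted and the rest of the input is visible in both directions.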
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?