ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models
2022-07-05 15:10:00 【Zhiyuan community】
This article introduces two papers published at ICML 2022, with researchers mainly from Google. Both are highly practical analysis papers. Rather than proposing a new model, as most papers do, each one examines the existing architectures and training methods of NLP language models, explores their strengths and weaknesses in different scenarios, and distills empirical rules.
The main experimental conclusions of the two papers are summarized first:
1. The first paper finds that although the encoder-decoder architecture is the absolute mainstream in machine translation, a well-designed language model (LM) can match the traditional encoder-decoder architecture on machine translation tasks when the model is large. In addition, LMs perform better in zero-shot scenarios and on low-resource-language translation, and on high-resource-language translation they have the further advantage of fewer off-target outputs (translations into the wrong target language).
2. The second paper finds that without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks, whereas with multitask prompted finetuning, an encoder-decoder architecture trained with masked language modeling gives the best zero-shot performance.
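To make the terms in the second conclusion concrete, here is a minimal sketch of the two attention patterns and the two training objectives being contrasted. It is an illustration under simplifying assumptions, not code from either paper: the function names, the `<mask>` token, and the toy sentence are all hypothetical, and the masked objective is shown in its simplest single-token form.

```python
from typing import List, Sequence, Tuple


def causal_mask(n: int) -> List[List[int]]:
    """Causal decoder: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder side of an encoder-decoder: every position sees every position."""
    return [[1] * n for _ in range(n)]


def full_lm_example(tokens: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Full language modeling: predict each next token from the tokens before it."""
    return list(tokens[:-1]), list(tokens[1:])


def masked_lm_example(
    tokens: Sequence[str],
    masked_positions: Sequence[int],
    mask_token: str = "<mask>",  # hypothetical placeholder token
) -> Tuple[List[str], List[str]]:
    """Simplified masked language modeling: hide chosen tokens, then predict them."""
    hidden = set(masked_positions)
    inputs = [mask_token if i in hidden else t for i, t in enumerate(tokens)]
    targets = [tokens[i] for i in sorted(hidden)]
    return inputs, targets


if __name__ == "__main__":
    toy = ["the", "cat", "sat", "down"]  # toy sentence, for illustration only
    print(causal_mask(3))
    print(bidirectional_mask(3))
    print(full_lm_example(toy))
    print(masked_lm_example(toy, [1, 3]))
```

In the causal setting every token is a prediction target, while in the masked setting only the hidden tokens are, which is one intuitive way to see that the two objectives provide different training signals.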
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?