ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models
2022-07-05 15:10:00 【Zhiyuan community】
This article introduces two papers published at ICML 2022, whose authors are mainly from Google. Both are practical, analytical papers. Unlike typical papers that propose a new model, both examine the architectures and training methods of existing NLP language models, explore their strengths and weaknesses in different scenarios, and summarize empirical rules.
The main experimental conclusions of the two papers are summarized first:
1. The first paper finds that although the encoder-decoder architecture is the dominant choice for machine translation, a properly designed language model (LM) can match the traditional encoder-decoder architecture on machine translation tasks when the model has a large number of parameters. The LM also performs better in zero-shot scenarios and on low-resource language translation, and on high-resource language translation it has the added benefit of fewer off-target translations.
2. The second paper finds that without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks, whereas with multitask prompt finetuning, an encoder-decoder architecture trained with masked language modeling has the best zero-shot performance.
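The architectural difference behind these conclusions can be pictured through attention visibility. The following is a minimal sketch, not code from either paper: the function names `causal_mask` and `bidirectional_mask` are hypothetical, and the masks use 1 for "may attend" and 0 for "blocked". A causal decoder LM lets each position see only itself and earlier positions, while the encoder side of an encoder-decoder model lets every input position see the whole input.

```python
def causal_mask(n):
    """Attention mask for a causal decoder LM (illustrative helper).

    Row i is the query position; column j is the key position.
    Position i may attend only to positions 0..i.
    """
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Attention mask for the encoder of an encoder-decoder model
    (illustrative helper): every position may attend to every position.
    """
    return [[1] * n for _ in range(n)]


def show(mask):
    """Render a mask as rows of space-separated 0/1 values."""
    return "\n".join(" ".join(str(v) for v in row) for row in mask)


if __name__ == "__main__":
    print("causal decoder:")
    print(show(causal_mask(4)))
    print("encoder (bidirectional):")
    print(show(bidirectional_mask(4)))
```

The lower-triangular pattern is what lets a decoder-only model be trained on every position at once without seeing future tokens; the fully visible pattern is what gives an encoder access to the complete source sentence.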
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
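The two pretraining objectives compared in the second paper, full language modeling and masked language modeling, differ in how training pairs are built from a token sequence. The sketch below is a simplified illustration under stated assumptions, not the papers' actual recipe: the `<mask>` token, the function names, and the `mask_prob=0.15` default are all hypothetical choices, and masking is done per token for brevity.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, chosen for illustration


def full_lm_example(tokens):
    """Full language modeling: predict each next token.

    The input is the sequence without its last token and the target
    is the same sequence shifted left by one position.
    """
    return tokens[:-1], tokens[1:]


def masked_lm_example(tokens, mask_prob=0.15, seed=0):
    """Masked language modeling (simplified, per-token masking).

    Each token is hidden with probability mask_prob (an assumed value).
    Returns the corrupted input and a dict mapping each masked
    position to the original token the model must recover.
    """
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[i] = tok
        else:
            inputs.append(tok)
    return inputs, targets


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(full_lm_example(sentence))
    print(masked_lm_example(sentence, mask_prob=0.3))
```

In the first case every position contributes a prediction target; in the second only the hidden positions do, but the model can use context on both sides of each gap.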