ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models
2022-07-05 14:48:00 [BAAI Community]
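This article introduces two papers published at ICML 2022, both written mainly by researchers at Google. Both are highly practical analysis papers. Unlike typical papers that propose a new model, they examine existing NLP language model architectures and training methods, compare their strengths and weaknesses across different settings, and distill empirical rules from the results.
First, here is a summary of the main experimental conclusions of the two papers: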
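1. The first paper finds that, although the encoder-decoder architecture dominates machine translation, a carefully designed language model (LM) can match the traditional encoder-decoder architecture on machine translation once the parameter count is large. The LM also performs better in zero-shot settings and on low-resource language pairs, and on high-resource language pairs it produces fewer off-target translations (output in the wrong language).
2. The second paper finds that, without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks; with multitask prompted finetuning, an encoder-decoder architecture trained with masked language modeling achieves the best zero-shot performance.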
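To make the "language model for machine translation" setup in conclusion 1 concrete, here is a minimal Python sketch, assuming the common decoder-only formulation in which the source sentence, a separator token and the target sentence are concatenated into a single sequence. It shows two attention-mask choices such a model can use: a purely causal mask, and a prefix-LM mask in which the source part is attended to bidirectionally. The function names `build_lm_mt_example` and `attention_mask`, the `<sep>` token, and the choice to compute the loss only on target tokens are illustrative assumptions, not the paper's exact configuration.

```python
from typing import List, Tuple


def build_lm_mt_example(src: List[str], tgt: List[str], sep: str = "<sep>") -> Tuple[List[str], List[int]]:
    """Concatenate source and target into one decoder-only sequence (hypothetical helper).

    Returns the token sequence and a loss mask where 1 marks tokens that count
    as prediction targets. Assumption: only target tokens are scored.
    """
    tokens = src + [sep] + tgt
    loss_mask = [0] * (len(src) + 1) + [1] * len(tgt)
    return tokens, loss_mask


def attention_mask(seq_len: int, prefix_len: int = 0) -> List[List[int]]:
    """mask[i][j] == 1 means position i may attend to position j.

    prefix_len == 0 gives a purely causal mask; prefix_len > 0 lets the first
    prefix_len positions (source + separator) attend to each other in both
    directions, as in a prefix LM. Later positions stay causal.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal_ok = j <= i
            in_prefix = i < prefix_len and j < prefix_len
            row.append(1 if (causal_ok or in_prefix) else 0)
        mask.append(row)
    return mask


# Usage: a toy German-to-English pair.
tokens, loss_mask = build_lm_mt_example(["Guten", "Tag"], ["Good", "day"])
# tokens    -> ['Guten', 'Tag', '<sep>', 'Good', 'day']
# loss_mask -> [0, 0, 0, 1, 1]
causal = attention_mask(len(tokens))
prefix = attention_mask(len(tokens), prefix_len=3)
# causal[0] -> [1, 0, 0, 0, 0]  (first source token sees only itself)
# prefix[0] -> [1, 1, 1, 0, 0]  (first source token sees the whole source)
```

The only difference between the two variants is whether source positions can look ahead within the source; target positions are causal in both, so generation works the same way at inference time.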
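The pretraining objectives compared in conclusion 2 differ in how training examples are built. The sketch below is a hypothetical illustration, not either paper's code. It contrasts full language modeling, where every position predicts the next token, with masked language modeling implemented here in the style of T5 span corruption: masked spans are replaced by sentinel tokens in the input and recovered in the target. For determinism the masked spans are passed in explicitly instead of being sampled. The function names and the `<extra_id_k>` sentinel naming are assumptions.

```python
from typing import List, Tuple


def full_lm_example(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Full language modeling: each input position predicts the next token."""
    return tokens[:-1], tokens[1:]


def span_corruption_example(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Masked language modeling as T5-style span corruption (illustrative).

    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    Each span is replaced by a sentinel in the input; the target lists each
    sentinel followed by the tokens it hides.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        inputs.extend(tokens[cursor:start])  # keep unmasked tokens before the span
        inputs.append(sentinel)              # replace the span with a sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the model must reconstruct these
        cursor = end
    inputs.extend(tokens[cursor:])           # unmasked tail
    return inputs, targets


sentence = "the cat sat on the mat".split()
lm_in, lm_out = full_lm_example(sentence)
# lm_in  -> ['the', 'cat', 'sat', 'on', 'the']
# lm_out -> ['cat', 'sat', 'on', 'the', 'mat']
mlm_in, mlm_out = span_corruption_example(sentence, [(1, 2), (4, 6)])
# mlm_in  -> ['the', '<extra_id_0>', 'sat', 'on', '<extra_id_1>']
# mlm_out -> ['<extra_id_0>', 'cat', '<extra_id_1>', 'the', 'mat']
```

Under full language modeling every token serves as a prediction target, whereas span corruption only asks the model to predict the hidden spans.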
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?