ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models
2022-07-05 14:48:00 【智源社区】
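This article introduces two papers published at ICML 2022, both written mainly by researchers at Google. Both are practice-oriented analysis papers. Unlike typical papers that propose a new model, they take existing NLP language model architectures and training methods, examine their strengths and weaknesses in different settings, and distill empirical rules from the results.
The main experimental conclusions of the two papers are summarized first: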
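1. The first paper finds that although encoder-decoder models dominate machine translation, at large parameter counts a properly designed language model (LM) can match the traditional encoder-decoder architecture on machine translation. Moreover, in the zero-shot setting the LM performs better on low-resource language translation, and on high-resource language translation it also produces fewer off-target outputs.
2. The second paper finds that without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks; with multitask prompt finetuning, an encoder-decoder architecture trained with masked language modeling gives the best zero-shot performance.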
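To make the terms in the two conclusions concrete, here is a minimal sketch of the attention patterns and training examples involved. It is illustrative only and is not taken from either paper: the function names are hypothetical, and the masked language modeling example assumes a span-corruption form (a span replaced by a sentinel token), which the text above does not specify.

```python
def causal_mask(n):
    """Causal decoder: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder side of an encoder-decoder: every position sees every other."""
    return [[True] * n for _ in range(n)]


def full_lm_example(tokens):
    """Full language modeling: predict each token from the ones before it.

    Returns (inputs, targets), where targets are the inputs shifted by one.
    """
    return tokens[:-1], tokens[1:]


def masked_lm_example(tokens, span, sentinel="<X>"):
    """Masked language modeling, assumed here to be span corruption.

    The tokens in [start, end) are replaced by a sentinel in the input,
    and the target is the sentinel followed by the removed tokens.
    """
    start, end = span
    inputs = tokens[:start] + [sentinel] + tokens[end:]
    targets = [sentinel] + tokens[start:end]
    return inputs, targets


if __name__ == "__main__":
    toks = ["the", "cat", "sat", "down"]
    print(causal_mask(3))
    print(full_lm_example(toks))
    print(masked_lm_example(toks, (1, 3)))
```

In this sketch the causal mask is lower-triangular, so every token is a prediction target under full language modeling, while the masked objective trains only on the removed span.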
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?