ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models
2022-07-05 14:48:00 [BAAI Community (智源社区)]
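This article introduces two papers published at ICML 2022, whose authors are mostly from Google. Both are practical, analysis-oriented papers. Most papers propose a new model. These two instead take existing NLP language-model architectures and training methods, compare their strengths and weaknesses across different settings, and distill empirical rules of thumb.
Here I first summarize the main experimental conclusions of the two papers: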
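1. The first paper finds that encoder-decoder models dominate machine translation, but a well-designed language model (LM) can match a conventional encoder-decoder on translation once the model has enough parameters. LMs also do better in zero-shot settings and on low-resource languages. On high-resource languages, they produce fewer off-target translations, meaning output in the wrong language.
2. The second paper finds that without finetuning, a causal decoder LM trained with a full language modeling objective gives the best zero-shot performance. With multitask prompted finetuning, an encoder-decoder trained with masked language modeling gives the best zero-shot performance.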
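Both conclusions depend on two things: how each architecture lets tokens attend to one another, and what the training objective asks the model to predict. The minimal Python sketch below illustrates both ideas under simplifying assumptions. It is not the papers' actual implementation.

- The helper names (`causal_mask`, `encoder_decoder_masks`, `full_lm_pairs`, `span_corruption`) are hypothetical.
- Masks are plain 0/1 lists, where 1 means "may attend".
- The masked-LM example uses a simplified T5-style span corruption, one common form of masked language modeling, with caller-chosen spans instead of random sampling.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: mask[i][j] == 1 means query position i may attend to key j.
Mask = List[List[int]]


def causal_mask(n: int) -> Mask:
    """Causal decoder self-attention: position i sees positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> Mask:
    """Encoder self-attention: every position sees every position."""
    return [[1] * n for _ in range(n)]


def encoder_decoder_masks(n_src: int, n_tgt: int) -> Dict[str, Mask]:
    """Encoder-decoder: bidirectional encoder, causal decoder, and
    cross-attention from every decoder position to the whole encoder output."""
    return {
        "encoder_self": bidirectional_mask(n_src),
        "decoder_self": causal_mask(n_tgt),
        "cross": [[1] * n_src for _ in range(n_tgt)],
    }


def full_lm_pairs(tokens: Sequence[str]) -> List[Tuple[List[str], str]]:
    """Full language modeling: every token (after the first) is predicted from its prefix."""
    return [(list(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


def span_corruption(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Simplified T5-style masked LM: each [start, end) span is replaced by a
    sentinel in the input; the target lists each sentinel followed by the tokens
    it hid. Spans are assumed sorted and non-overlapping."""
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        inputs += list(tokens[pos:start]) + [sentinel]
        targets += [sentinel] + list(tokens[start:end])
        pos = end
    inputs += list(tokens[pos:])
    return inputs, targets


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    words = ["the", "cat", "sat", "on", "the", "mat"]
    print(full_lm_pairs(words)[:2])
    print(span_corruption(words, [(1, 2), (4, 5)]))
```

The two objectives differ in where the loss lands. Full language modeling puts a prediction target at every position. Span corruption puts targets only on the hidden spans. T5's own implementation also samples spans randomly and appends a final sentinel to the target, which this sketch leaves out.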
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
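The first paper asks whether a single language model can do the translation job of a separate encoder and decoder. One common way to train a decoder-only LM on translation is sketched below. This setup is an assumption for illustration, not taken from the paper:

- Concatenate the source sentence, a separator token, and the target sentence into one sequence.
- Train on next-token prediction, but count the loss only where the model predicts target-side tokens.

The helper `build_lm_translation_example` and the `<sep>`/`<eos>` tokens are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_lm_translation_example(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    sep: str = "<sep>",  # hypothetical separator token
    eos: str = "<eos>",  # hypothetical end-of-sequence token
) -> Tuple[List[str], List[str], List[int]]:
    """Turn a (source, target) pair into one decoder-only LM training example.

    Returns the input tokens, the next-token targets, and a loss mask that is 1
    only where the model predicts a target-side token (including eos).
    """
    seq = list(src_tokens) + [sep] + list(tgt_tokens) + [eos]
    inputs = seq[:-1]   # the model reads tokens 0..n-2
    targets = seq[1:]   # and predicts tokens 1..n-1
    first_tgt = len(src_tokens) + 1  # index in seq of the first target token
    # Position i predicts seq[i + 1]; keep the loss only for target-side tokens.
    loss_mask = [1 if i + 1 >= first_tgt else 0 for i in range(len(inputs))]
    return inputs, targets, loss_mask


if __name__ == "__main__":
    inputs, targets, mask = build_lm_translation_example(["Hallo", "Welt"], ["Hello", "world"])
    # Only target-side predictions contribute to the loss.
    print([t for t, m in zip(targets, mask) if m])  # ['Hello', 'world', '<eos>']
```

At inference time, the model would receive the source followed by the separator and keep generating until it emits the end token. A prefix-LM variant would additionally let source positions attend to each other bidirectionally. That changes the attention mask, not this data layout.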