ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models
2022-07-05 14:48:00 [智源社区 (BAAI Community)]
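This article introduces two papers published at ICML 2022, both with authors mainly from Google. Both are practice-oriented analysis papers. Unlike typical papers that propose a new model, they take the architectures and training methods of existing NLP language models, examine their strengths and weaknesses in different settings, and distill empirical rules from the results.
The main experimental conclusions of the two papers are summarized first:
1. The first paper finds that, although encoder-decoder models dominate machine translation, a carefully designed language model (LM) can match the conventional encoder-decoder architecture on translation once the parameter count is large enough. In the zero-shot setting, the LM also performs better on low-resource languages and produces fewer off-target translations on high-resource languages.
2. The second paper finds that, without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks; with multitask prompt finetuning, an encoder-decoder architecture trained with masked language modeling gives the best zero-shot performance.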
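The difference between a causal decoder and an encoder can be illustrated by which positions each token is allowed to attend to. The following is a minimal sketch of that general idea only; it is not code from either paper, and the function names `causal_mask`, `bidirectional_mask`, and `show` are hypothetical helpers introduced for illustration.

```python
def causal_mask(n):
    """Causal decoder: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder side of an encoder-decoder: every position sees every position."""
    return [[True] * n for _ in range(n)]


def show(mask):
    """Render a mask as text: 'x' = attention allowed, '.' = blocked."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print("causal decoder:")
    print(show(causal_mask(4)))
    print("bidirectional encoder:")
    print(show(bidirectional_mask(4)))
```

In the causal case each row only has access to the tokens up to and including its own position, which is what allows next-token (full language modeling) training; the bidirectional case gives every input token access to the whole input.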
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?