ICML 2022 | Exploring the best architecture and training method for language models
2022-07-05 15:10:00 【Zhiyuan community】
This article introduces two papers published at ICML 2022, with researchers mainly from Google. Both are practical, analysis-driven papers. Rather than proposing a new model, as most papers do, they examine the architectures and training methods of existing NLP language models, explore their strengths and weaknesses in different scenarios, and summarize empirical rules.
The main experimental conclusions of the two papers are summarized first:
1. The first paper finds that, although the encoder-decoder architecture is the absolute mainstream in machine translation, a properly designed language model (LM) can match the traditional encoder-decoder architecture on machine translation when the model is large enough. LMs also perform better in zero-shot scenarios and on low-resource language translation, and on high-resource language translation they have the added benefit of fewer off-target translations (outputs in the wrong target language).
2. The second paper finds that, without finetuning, a causal decoder LM architecture trained with full language modeling performs best on zero-shot tasks; with multitask prompt finetuning, an encoder-decoder architecture trained with masked language modeling has the best zero-shot performance.
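To make the terms in these conclusions concrete, here is a minimal Python sketch of the difference between causal and bidirectional attention, and between full language modeling and masked language modeling. It is an illustration only, not code from either paper: the function names, the single-span masking, and the `<X>` sentinel token are all assumptions chosen for the example.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    A causal decoder only looks at the current and earlier positions.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """An encoder lets every position attend to every other position."""
    return [[True] * n for _ in range(n)]


def full_lm_pairs(tokens):
    """Full language modeling: predict each token from the prefix before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_pair(tokens, start, length, sentinel="<X>"):
    """Masked language modeling, simplified to one hidden span (assumption).

    The span is replaced by a sentinel in the input, and the target is the
    sentinel followed by the hidden tokens.
    """
    inputs = tokens[:start] + [sentinel] + tokens[start + length:]
    targets = [sentinel] + tokens[start:start + length]
    return inputs, targets


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "down"]
    print(causal_mask(3))
    print(full_lm_pairs(sentence))
    print(masked_lm_pair(sentence, start=1, length=2))
```

In this sketch, full language modeling yields one training signal per position, while the masked objective only asks the model to reconstruct the hidden span from context on both sides.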
Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Link: https://arxiv.org/abs/2202.00528
Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?