
ICML 2022 | Exploring the Best Architectures and Training Methods for Language Models

2022-07-05 15:10:00 Zhiyuan community

This article introduces two papers published at ICML 2022, written mainly by researchers from Google. Both are highly practical analysis papers: unlike the model innovations common in most papers, they examine the architectures and training methods of existing NLP language models, exploring their strengths and weaknesses in different scenarios and distilling empirical rules.

 

The main experimental conclusions of the two papers are summarized first:

 

1. The first paper finds that although the encoder-decoder architecture has become the absolute mainstream for machine translation, when the model is large enough, a properly designed language model (LM) can match a traditional encoder-decoder architecture on machine translation tasks. Moreover, the LM performs better in zero-shot scenarios and on low-resource language pairs, and produces fewer off-target translations on high-resource language pairs (see the first sketch below).

 

2. The second paper finds that, without any finetuning, a causal decoder-only LM architecture trained with a full language modeling objective performs best on zero-shot tasks; with multitask prompted finetuning, an encoder-decoder architecture trained with a masked language modeling objective achieves the best zero-shot performance (see the second sketch below).
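
To make conclusion 1 concrete, here is a minimal sketch of how the same translation pair is presented to the two architectures. This is not the paper's code; the token handling and special markers such as `<sep>` and `<eos>` are illustrative assumptions.

```python
# Minimal sketch (illustrative only): how one translation pair is fed
# to an encoder-decoder model vs. a decoder-only language model.
# The <sep>/<eos> markers and loss-masking choice are assumptions.

def format_for_encoder_decoder(src_tokens, tgt_tokens):
    """Encoder-decoder: source and target stay separate sequences.
    The encoder reads the source; the decoder predicts the target."""
    encoder_input = src_tokens
    decoder_target = tgt_tokens + ["<eos>"]
    return encoder_input, decoder_target

def format_for_language_model(src_tokens, tgt_tokens):
    """Decoder-only LM: source and target are concatenated into one
    sequence and modeled left-to-right; here the loss is restricted
    to the target positions (a common choice, not the only one)."""
    sequence = src_tokens + ["<sep>"] + tgt_tokens + ["<eos>"]
    # 0 = no loss (source + separator), 1 = loss (target + <eos>).
    loss_mask = [0] * (len(src_tokens) + 1) + [1] * (len(tgt_tokens) + 1)
    return sequence, loss_mask

src = ["Wie", "geht", "es", "dir", "?"]
tgt = ["How", "are", "you", "?"]
print(format_for_encoder_decoder(src, tgt))
print(format_for_language_model(src, tgt))
```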
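
For conclusion 2, the following sketch contrasts the two pretraining objectives the second paper compares. It is a deliberately simplified illustration: the `<mask>`/`<pad>` handling is an assumption, and real masked-LM setups (e.g. T5-style span corruption) are more involved.

```python
# Minimal sketch (illustrative only): training pairs produced by
# full language modeling vs. a simplified masked language modeling.
import random

def full_language_modeling_example(tokens):
    """Full LM: predict every token from all preceding tokens,
    i.e. inputs and targets are the sequence shifted by one."""
    inputs  = tokens[:-1]
    targets = tokens[1:]
    return inputs, targets

def masked_language_modeling_example(tokens, mask_rate=0.15, seed=0):
    """Masked LM (simplified): replace a fraction of tokens with a
    <mask> placeholder; the model reconstructs the originals."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append("<mask>")
            targets.append(tok)       # predict the original token
        else:
            inputs.append(tok)
            targets.append("<pad>")   # no loss at unmasked positions
    return inputs, targets

toks = "the quick brown fox jumps over the lazy dog".split()
print(full_language_modeling_example(toks))
print(masked_language_modeling_example(toks))
```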

 

Paper 1: Examining Scaling and Transfer of Language Model Architectures for Machine Translation

Link: https://arxiv.org/abs/2202.00528

 

Paper 2: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Link: https://arxiv.org/abs/2204.05832


Copyright notice: This article was created by [Zhiyuan community]. Please include a link to the original when reposting:
https://yzsam.com/2022/186/202207051448474419.html