当前位置：网站首页>Acl2021 best paper released, from ByteDance

Acl2021 best paper released, from ByteDance

2022-07-27 09:59:00 【51CTO】

️ Click on the above , Send you dry goods every day ️

ACL2021 The best paper of was published today , It's from ByteDance Artificial Intelligence Laboratory 「Vocabulary Learning via Optimal Transport for Neural Machine Translation」.

The experience of this paper is quite bumpy , It was finished at the beginning ICLR2021 after , Only got 4,4,3,3 The score （ Full marks 10 branch ）, The probability is to be rejected . Later, it was also carried out rebuttal, But because I want to switch ACL2021, So I withdrew my manuscript . After repeated retouching and modification , Improved methods and experiments , In the end in ACL2021 Got high marks on , And was rated as the best paper .

「ACL2021 Paper receiving list ：」
https://2021.aclweb.org/program/accept

「 Address of thesis ：」
https://arxiv.org/abs/2012.15671

「 Source code address ：」
https://github.com/Jingjing-NLP/VOLT

ByteDance artificial intelligence laboratory has achieved quite a lot this year , Previously, the industry's first open source NLP Model training and reasoning whole process acceleration engine LightSeq：https://github.com/bytedance/lightseq

It's open source TensorFlow Version of Transformer Training library NeurST：https://github.com/bytedance/neurst

Because the scores of the two meetings differ greatly , Zhihushang also immediately had a heated discussion , Several authors came out for a detailed interpretation , Let's move on to the answers of the two original authors .

「 Question link ：」
https://www.zhihu.com/question/470224094

Know user @WAZWY

https://www.zhihu.com/question/470224094/answer/1980448588

I am this. paper One of the authors of , A colleague in the company group just sent me a link to this question , I'm shocked that someone should pay so much attention to us paper, Hand speed is so fast , Thank you very much , The code is still being sorted , Welcome to use after finishing , I hope everyone can try VOLT, There must still be many deficiencies , Also welcome to give us more comments .

First of all, congratulations @ Xu Jingjing , It's not easy ！！！

Next, answer this question ： About from ICLR To ACL Conversion of , The situation was like this , We're casting ICLR When , Spend too much time on experiments , stay writing I don't spend enough time on it , Whole paper To put it plainly ,Intuition Didn't say , And some important experiments have not been supplemented . As a result, you can see , I think this is an important issue lesson, You are also welcome to compare our two versions of the paper ...

Take Away： But he that doeth good , Don't ask future . You should still work hard 360 Do a good job in all aspects , Do a solid job , Instead of finding a suitable ddl Just go to submit, Now? arxiv It's so convenient , Be satisfied with yourself arxiv that will do .

PS： Why withdraw the manuscript ICLR

This question is not accurate , We actually did rebuttal Of ,ICLR Of reviewer Gave very good advice , We respect and absorb . at that time ACL Have policies ICLR If you don't withdraw the manuscript within the specified time, you can't submit ACL, because open review It's against ACL The rules of . We specially wrote a letter to ask PC Confirmed , I withdrew my manuscript . But later on ACL Made policy adjustments very humanized , This is a later story .

PSS： Welcome to pay attention to another article by ICLR Rejected , Then it was ACL High scores paper：GLAT：Glancing Transformer for Non-Autoregressive Neural Machine Translation. at that time ICLR submission Here it is ：Non-iterative Parallel Text Generation via Glancing TransformerGLAT This paper Also very confident , It's a little bit RUSH, It leads to poor writing . In fact, the effect is very good ,

GLAT In our ByteDance internal volcano translation has been online ,Tiktok Part of the translation traffic on is GLAT serve Of . The bigger the data ,GLAT The better , We use it GLAT Participated in this year's WMT Translation evaluation , Big language German -> English （ be limited to ）, And English -> German （ Unrestricted ） In the competition ,GLAT Take it in both directions BLEU score First , Fully explain parallelism （ Non autoregressive ） The generative model is not necessarily worse than the autoregressive model , Maybe even better , Welcome to follow up ！

=======================

In the blink of an eye 5 After the first answer ： I personally disagree with the anonymous answer above ” Explain that no matter what work peer review Just touch the lottery “, Reviewed twice review The quality is very high , say review The answer is to touch the lottery ticket. At first glance, I haven't read the paper and review, A little irresponsible and misleading , Make some junior Of the students have a wrong understanding of the contribution ！ I hope to read the paper a little .

Know user @ Xu Jingjing

https://www.zhihu.com/question/470224094/answer/1980633745

Thank you for your attention to this work , I am Xu Jingjing, one of the authors of this work , He is also an ordinary melon eater in the natural language processing circle , I just didn't expect to eat my own melon this time orz. Here I would like to briefly share with you my answers to this question and the experience and lessons I learned in this submission .

First of all , The most important lesson I learned is to write things clearly . There is one saying. , We ICLR That work is really not well written . The feedback of the review is mainly in the following aspects ： The experiment is not enough , The introduction of the method is not clear enough , And there's no direct evidence of motivation . The following points , We are ACL A lot of improvements have been made to the versions . We added a lot of follow-up experiments , Writing also starts all over again , Over and over again to see if the logic is reasonable , Whether the experiment is rigorous and sufficient, etc , The whole process is very painful . So then we got ACL I'm very excited about the accreditation of , After all, a lot of hard work has finally paid off .

second , Don't rush to contribute . After we finished our work , I think it's quite interesting , To catch up with ICLR The deadline for , It was written in a hurry , There are various problems , The result is ICLR Our reviewer taught us how to be a man . One thing I learned after this submission is to be fully prepared before submitting , Otherwise, it will bring unnecessary pressure to the review and be taught by the review every minute .

Third , Negative opinions are not negatives , But an important source of progress . In fact, there are many precedents of rejected papers with high scores , For example, the best paper Lottery Ticket Hypothesis ,pre-training Originator ELMO,LayerNorm,KD wait . I don't want to say that our work can be comparable with them （ Of course, we also want to do work that can be really useful , These jobs have always been our role models ）, But I want you to look at this problem objectively . Many people may think that negative opinions are negatives of work , Actually, from another angle , Negative opinions are also an important force for our progress ～ Although it's very stressful to be talked about by everyone this time , But we are also very happy to let you think about negative opinions . When everyone's paper is rejected , Think about it Hinton All the papers were rejected , Will you become more confident !

Fourth ：NLP Conference papers are not necessarily better than ML Poor conference papers . There are many excellent papers in NLP Also got a high profit at the meeting , such as BERT,ELMO wait .ML There are also some forgotten jobs in the meeting . Recently, the number of papers in major conferences has indeed become more and more , Some of the most debilitating papers were accepted , But on the other hand , well paper Also changed more .NLP Your meeting is right NLP More attention ,ML The meeting of paid more attention to the algorithm . What we did at that time was the study of vocabulary , Maybe for ML People are a small problem , But for NLP Field , It's really something you use every day , May also recognize our work more .

Last , Make a small advertisement , In this work, we study the problem of vocabulary learning , Also found some interesting conclusions , We plan to open source the code in the near future , Welcome to try it out at that time ～ A big man said that research is a long-term thing , No matter how many honors you get in the short term , What matters is whether the things you make can stay . We also very much hope to do this kind of work ～

If you have any comments and suggestions on this work , Or confusion about revising the paper , Welcome to join me on wechat ：xujingjingpku

Finally, refute another article about NAS The problem of , We were NAS I was the first to vote NeurIPS, The submission time is 2020 year 5 month 27 Number , I missed and then voted ICLR, Recently accepted .without training The article is on arxiv The time is 2020 year 6 month 8 Number , So strictly speaking, it's simultaneous work ～

- END -

I am a godweiyang, Department of computer science, East China Normal University , Bytes to beat AI Lab NLP Algorithm engineer , Qiuzhao won three big internet factories in Shanghai ssp offer, The main research direction is machine translation 、 Syntactic parsing 、 Model compression and acceleration . The biggest characteristic is a good temper 、 thole , If you have any questions, you can always consult me , Whether it's technical or life .

Official account back office reply 【 push 】

You can send your resume through my twitter code , With my wechat, I can check the progress at any time 、 Ask questions .

Official account back office reply 【 Add group 】

You can join my technical exchange group

ACL2021 The best paper comes out , From byte skipping _ Deep learning