Tsinghua University and ByteDance have jointly proposed DA-Transformer, which removes the reliance of traditional parallel generation models on knowledge distillation. On translation tasks it substantially outperforms all previous parallel generation models, with gains of up to 4.57 BLEU. It is also the first parallel model to match and even exceed the performance of the autoregressive Transformer, by up to 0.6 BLEU, while reducing decoding latency by a factor of 7.









