
360, Tsinghua | Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and a Vision-Language Framework

2022-06-10 11:21:00 Zhiyuan community

Title: Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework

Authors: Chunyu Xie, Heng Cai, Xiangyang Ji, Yafeng Deng, et al.

Abstract: Vision-language pre-training (VLP) on large-scale datasets has shown excellent performance on various downstream tasks. A complete and fair benchmark, i.e., a large-scale pre-training dataset together with diverse downstream tasks, is crucial for VLP. While many benchmarks exist for English corpora, building a rich benchmark for VLP in other languages, such as Chinese, remains an open problem. To this end, the authors build a large-scale Chinese cross-modal benchmark, called Zero, for the fair comparison of VLP models, and release two pre-training datasets and five fine-tuning datasets for downstream tasks. In addition, they propose a novel pre-training framework for cross-modal learning, namely pre-ranking + ranking. Specifically, global contrastive pre-ranking is applied to learn individual representations of images and texts, which are then fused in a fine-grained ranking manner through an image-text cross encoder and a text-image cross encoder. To further enhance the model's capability, the authors propose a two-way distillation strategy consisting of target-guided distillation and feature-guided distillation, and name the resulting framework R2D2. It achieves state-of-the-art performance on four public cross-modal datasets and the five downstream datasets. When pre-trained on the 250-million-sample dataset and evaluated on the zero-shot tasks of Flickr30k-CN, COCO-CN, and MUGE, R2D2 improves mean recall by 4.7%, 5.4%, and 6.3% respectively over the previous state of the art.
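The global contrastive pre-ranking step described above can be sketched as a CLIP-style symmetric contrastive (InfoNCE) objective over a batch of image and text embeddings. The following is a minimal NumPy approximation under that assumption; the paper's actual implementation details (loss weighting, temperature value, cross-GPU negative gathering) may differ, and `image_emb` / `text_emb` are hypothetical per-batch embedding matrices.

```python
import numpy as np

def global_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over matched image/text embedding batches.

    Matched pairs share the same row index; every other row in the batch
    serves as a negative. Returns the average of the image->text and
    text->image cross-entropy losses.
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (B, B) similarity matrix

    def diag_cross_entropy(l):
        # Softmax cross-entropy where the correct class is the diagonal.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return (diag_cross_entropy(logits) + diag_cross_entropy(logits.T)) / 2
```

Perfectly aligned embeddings drive the diagonal of the similarity matrix toward its maximum, so the loss approaches zero, while unrelated embeddings yield a loss near log(batch size).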

Code: https://github.com/yuxie11/R2D2

Paper: https://arxiv.org/pdf/2205.03860v3.pdf


Copyright notice
This article was created by [Zhiyuan community]. Please include a link to the original when reprinting.
https://yzsam.com/2022/161/202206101109468439.html