[Multimodal] UNIMO
2022-06-23 04:35:00 【joyce_ peng】
I. UNIMO
1. Advantage: The training data includes text-only corpora, image-only collections, and image-text pairs, so pre-training is not limited to image-text pairs.

2. Strategies and models
(1) Text rewriting (Text Rewriting): To strengthen the semantic alignment between images and text at multiple granularities, the text description of an image is rewritten at the sentence level, the phrase level, and the word level.
At the sentence level, back translation is used to obtain multiple positive texts for each image: a machine translation model translates the caption into several other languages and then back again, which yields differently worded sentences that preserve the original meaning.
In addition, because natural language is made of discrete symbols, TF-IDF similarity retrieval can find sentences that share many words with the caption but differ in meaning. These serve as hard negative samples for the image.
At the phrase and word levels, the text is first parsed into a scene graph, and then objects (object), attributes (attribute), relations (relation), and their combinations are randomly replaced to produce hard negatives at these two granularities.
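
The two kinds of hard negatives described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the whitespace tokenizer, the smoothed TF-IDF weighting, the hand-built "scene graph" dictionary, and the names `retrieve_hard_negatives` and `replace_scene_element` are all hypothetical, and a real system would use a proper scene graph parser and a much larger corpus.

```python
import math
import random
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: token -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n_docs = len(tokenized)
    # Document frequency: in how many sentences each token occurs.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            tok: (cnt / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, cnt in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_hard_negatives(caption, corpus, k=1):
    """Return the k corpus sentences with the highest TF-IDF similarity."""
    vectors = tfidf_vectors([caption] + corpus)
    query, candidates = vectors[0], vectors[1:]
    ranked = sorted(zip(corpus, candidates),
                    key=lambda pair: cosine(query, pair[1]), reverse=True)
    return [sent for sent, _ in ranked if sent != caption][:k]


def replace_scene_element(graph, vocab, rng):
    """Swap one element of a toy scene graph for another word of its type."""
    kind = rng.choice(sorted(graph))
    alternatives = [w for w in vocab[kind] if w != graph[kind]]
    corrupted = dict(graph)
    corrupted[kind] = rng.choice(alternatives)
    return corrupted


caption = "a brown dog chasing a red ball"
corpus = [
    "a brown dog sleeping near a red ball",
    "two people riding bikes on a street",
    "a plate of pasta with tomato sauce",
]
# Sentence-level hard negative: high word overlap, different meaning.
hard_negatives = retrieve_hard_negatives(caption, corpus, k=1)

# Phrase/word-level hard negative: a hand-written stand-in for a parsed graph.
graph = {"attribute": "brown", "object": "dog", "relation": "chasing"}
vocab = {
    "attribute": ["brown", "white", "black"],
    "object": ["dog", "cat", "horse"],
    "relation": ["chasing", "holding", "watching"],
}
corrupted = replace_scene_element(graph, vocab, random.Random(0))
print(hard_negatives)
print(corrupted)
```

Lexical retrieval is used for the sentence-level case precisely because it ignores meaning: the top hit shares most of its words with the caption, which is what makes it a hard rather than an easy negative.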

(2) Image and text retrieval (Image and Text Retrieval): To incorporate more single-modal knowledge into cross-modal learning, each image-text pair is further enriched with background knowledge retrieved from large-scale single-modal data. The retrieved items form weakly correlated pairs with the other modality of the image-text pair and take part in contrastive learning.
(3) Visual and textual learning 
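
The rewritten and retrieved samples from strategies (1) and (2) feed a contrastive objective. The post does not give the formula, so the following is a generic InfoNCE-style sketch rather than UNIMO's exact loss: the similarity scores, the temperature value, and the function names are assumptions chosen for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def contrastive_loss(pos_scores, neg_scores, temperature=0.1):
    """-log( sum over positives / sum over positives and negatives ).

    Each score is an assumed image-text similarity, for example a cosine
    similarity between an image embedding and a text embedding.
    """
    pos = [s / temperature for s in pos_scores]
    everything = pos + [s / temperature for s in neg_scores]
    return logsumexp(everything) - logsumexp(pos)


# Two positives (e.g. the original caption and a back-translated one).
positives = [0.9, 0.8]
# Easy negatives score low; hard negatives score close to the positives.
loss_easy = contrastive_loss(positives, [0.2, 0.1])
loss_hard = contrastive_loss(positives, [0.85, 0.7])
print(loss_easy, loss_hard)
```

Hard negatives produce a larger loss than easy ones for the same positives, which is why samples that look similar but mean something different give the model a stronger training signal.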

3. Experiments
Pre-training data: the text corpora include Wikipedia, BookCorpus, OpenWebText, and similar corpora; the image data consists of 300K images crawled from the Internet; the multimodal image-text pair data includes COCO Caption, Visual Genome, Conceptual Caption, and SBU Caption.
Downstream tasks cover both multimodal tasks, such as visual question answering, image caption generation, and visual inference, and text-only tasks, such as text classification, text summarization, and question generation.
The results on multimodal tasks are strong: UNIMO reaches state-of-the-art results on all major tasks, with an especially large advantage on retrieval tasks. The case studies also suggest that UNIMO is better at accurately understanding and capturing details.
Table 1: Evaluation results on multimodal downstream tasks. Table 2: Evaluation results on single-modal downstream tasks.
As shown in Table 1, the authors compare UNIMO with multimodal pre-trained models such as ViLBERT, VLP, UNITER, Oscar, Villa, and ERNIE-ViL, and UNIMO achieves the best results overall. As shown in Table 2, on language understanding and generation tasks UNIMO performs better than or on par with pre-trained models such as BERT, RoBERTa, XLNet, and UniLM. UNIMO therefore achieves the best results on multimodal tasks while also performing well on single-modal tasks, which supports the value of the unified-modal architecture.
Reference: https://mp.weixin.qq.com/s/7NYe59gKu6-js32tfy4xBw