Can MAE work on multimodal data too? Berkeley & Google propose M3AE, applying MAE to image and text data! The optimal text masking rate reaches 75%, far higher than BERT's 15%
2022-06-25 01:29:00 [Zhiyuan Community]
This article presents the paper "Multimodal Masked Autoencoders Learn Transferable Representations", which asks: can MAE work on multimodal data too? UC Berkeley & Google propose M3AE, which performs MAE on image and text data. The optimal text masking rate reaches 75%, significantly higher than BERT's 15%.
Details are as follows:

Paper link: https://arxiv.org/abs/2205.14204
Building scalable models that learn from diverse, multimodal data remains an open challenge. For vision-language data, the dominant approach is based on contrastive learning objectives, which train a separate encoder for each modality. Although effective, contrastive methods introduce sampling bias depending on the data used, which can hurt downstream performance. Moreover, these methods are limited to paired image-text data and cannot leverage the widely available unpaired data.
In this paper, the authors study a scheme that trains a large multimodal model purely via masked token prediction, without modality-specific encoders or contrastive learning, and show that it can learn representations that transfer to downstream tasks. They propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE), which learns a unified encoder for visual and language data through masked token prediction.
The authors conduct an empirical study of M3AE trained on large-scale image-text datasets and find that M3AE learns transferable representations that generalize well to downstream tasks. Because the two data modalities are trained jointly, M3AE benefits from a higher text masking rate (50-90%), in contrast to BERT's standard masking rate of 15%. The authors also provide qualitative analysis showing that the learned representations combine meaningful information from both image and language. Finally, they demonstrate M3AE's scalability to larger model sizes and longer training, and its flexibility to train on both paired image-text data and unpaired data.
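To make the core idea concrete, here is a minimal NumPy sketch (not the authors' implementation; the toy shapes and function name are assumptions for illustration) of MAE-style joint masking: image-patch embeddings and text-token embeddings are concatenated into a single sequence, a high fraction of positions is randomly dropped, and only the visible tokens would be fed to the unified encoder while the masked positions are targets for reconstruction.

```python
import numpy as np

def mask_multimodal_sequence(image_patches, text_tokens, mask_ratio=0.75, seed=0):
    """Illustrative sketch: concatenate image-patch embeddings and
    text-token embeddings into one sequence, then randomly drop
    `mask_ratio` of the positions. In an MAE-style model, only the
    visible tokens are fed to the unified encoder; the masked
    positions are reconstructed by a lightweight decoder."""
    rng = np.random.default_rng(seed)
    seq = np.concatenate([image_patches, text_tokens], axis=0)  # (L, D)
    L = seq.shape[0]
    num_keep = int(round(L * (1.0 - mask_ratio)))
    perm = rng.permutation(L)
    keep_idx = np.sort(perm[:num_keep])       # visible positions
    masked = np.ones(L, dtype=bool)
    masked[keep_idx] = False                  # True = position to reconstruct
    return seq[keep_idx], keep_idx, masked

# Toy shapes: 196 image patches (a 14x14 grid) plus 32 text tokens, dim 16.
visible, keep_idx, masked = mask_multimodal_sequence(
    np.zeros((196, 16)), np.ones((32, 16)), mask_ratio=0.75)
print(visible.shape)  # (57, 16): only 25% of the 228 positions stay visible
```

Raising `mask_ratio` from 0.15 (BERT-style) to 0.75 sharply shrinks what the encoder sees, which is what makes the reconstruction task hard enough to be a useful pretraining signal in this joint setting.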