Google proposes the super pre-training model CoCa: fine-tuned Top-1 accuracy on ImageNet reaches 91%, with SOTA on multiple downstream tasks!
2022-06-10 13:02:00 【Zhiyuan Community】
This article shares the paper "CoCa: Contrastive Captioners are Image-Text Foundation Models". Google Research proposes a powerful pre-training model, CoCa, whose fine-tuned Top-1 accuracy on ImageNet reaches 91%, with SOTA results on multiple downstream tasks!
The details are as follows:

Paper link: https://arxiv.org/abs/2205.01917
Exploring large-scale pre-trained foundation models is of great significance in computer vision, because these models can be quickly transferred to many downstream tasks. This paper proposes the Contrastive Captioner (CoCa), which pre-trains an image-text encoder-decoder foundation model jointly with a contrastive loss and a captioning loss, thereby absorbing the advantages of both contrastive approaches like CLIP and generative methods like SimVLM. Unlike a standard encoder-decoder Transformer, in which all decoder layers attend to the encoder outputs, CoCa omits cross-attention in the first half of the decoder layers to encode unimodal text representations, and cascades the remaining decoder layers, which cross-attend to the image encoder, to produce multimodal image-text representations.
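The decoder split described above can be sketched in a few lines. This is an illustrative toy, not the official implementation: the "layers" below are stand-in functions (real CoCa layers are Transformer blocks), and all names are assumptions. It only shows the routing logic: the first half of the decoder ignores image features (self-attention only, yielding the unimodal text representation used by the contrastive loss), while the second half consumes them (cross-attention, yielding the multimodal representation used by the captioning loss).

```python
def make_unimodal_layer():
    # Stand-in for a self-attention-only Transformer layer (no cross-attention).
    return lambda text, image=None: [t + 1 for t in text]

def make_multimodal_layer():
    # Stand-in for a Transformer layer that cross-attends to image features.
    return lambda text, image: [t + i for t, i in zip(text, image)]

def coca_decode(text, image, n_layers=4):
    """Route text tokens through the decoder: bottom half unimodal, top half multimodal.

    Returns (unimodal_repr, multimodal_repr) -- in CoCa, the former feeds the
    contrastive loss against the image embedding, the latter feeds the
    captioning (autoregressive LM) loss.
    """
    n_uni = n_layers // 2  # first half of the layers skip cross-attention
    layers = ([make_unimodal_layer() for _ in range(n_uni)]
              + [make_multimodal_layer() for _ in range(n_layers - n_uni)])
    h, unimodal = text, None
    for i, layer in enumerate(layers):
        h = layer(h, image)
        if i == n_uni - 1:
            unimodal = h  # snapshot the unimodal text representation
    return unimodal, h
```

With toy inputs, `coca_decode([0, 0], [1, 2])` returns the unimodal representation `[2, 2]` (two +1 layers) and the multimodal representation `[4, 6]` (two further layers that each add the image features). The paper's total objective then combines both heads, schematically `L = λ_con * L_contrastive + λ_cap * L_captioning`.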
