Google proposes CoCa, a powerful pretrained model: 91% fine-tuned Top-1 accuracy on ImageNet and SOTA on multiple downstream tasks
2022-06-10 12:39:00 [智源社区 (BAAI Community)]
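This post covers the paper "CoCa: Contrastive Captioners are Image-Text Foundation Models". In it, Google Research proposes CoCa, a pretrained model that reaches 91% Top-1 accuracy on ImageNet after fine-tuning and achieves state-of-the-art results on multiple downstream tasks.
Details are as follows: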

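Exploring large-scale pretrained foundation models is of significant interest in computer vision, because these models can be transferred quickly to many downstream tasks. The paper presents the Contrastive Captioner (CoCa), which pretrains an image-text encoder-decoder foundation model jointly with a contrastive loss and a captioning loss, and so combines the strengths of contrastive approaches such as CLIP and generative approaches such as SimVLM. In a standard encoder-decoder Transformer, every decoder layer attends to the encoder outputs. CoCa instead omits cross-attention in the first half of the decoder layers, which encode unimodal text representations, and cascades the remaining decoder layers, which cross-attend to the image encoder, to produce multimodal image-text representations.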
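The two ideas in the paragraph above, the split decoder and the joint objective, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the loss weights `w_con` and `w_cap`, and the handling of an odd layer count are all assumptions made for the example, and the contrastive term omits the temperature scaling that such losses normally use.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def decoder_layout(n_layers):
    """First half of the decoder skips cross-attention (unimodal text);
    the rest cross-attends to the image encoder (multimodal).
    Assumption: with an odd count, the extra layer is multimodal."""
    half = n_layers // 2
    return ["unimodal"] * half + ["multimodal"] * (n_layers - half)

def contrastive_loss(sim):
    """Symmetric image-to-text and text-to-image cross-entropy over an
    N x N similarity matrix; matched pairs sit on the diagonal."""
    n = len(sim)
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return i2t + t2i

def captioning_loss(token_logits, targets):
    """Mean next-token cross-entropy for the caption decoder."""
    per_token = [log_softmax(l)[t] for l, t in zip(token_logits, targets)]
    return -sum(per_token) / len(targets)

def coca_loss(sim, token_logits, targets, w_con=1.0, w_cap=1.0):
    """Weighted sum of both objectives; the weights are hypothetical."""
    return (w_con * contrastive_loss(sim)
            + w_cap * captioning_loss(token_logits, targets))

if __name__ == "__main__":
    print(decoder_layout(4))
    sim = [[5.0, 0.0], [0.0, 5.0]]     # toy image-text similarities
    logits = [[2.0, 0.0], [0.0, 2.0]]  # toy per-token vocabulary logits
    print(coca_loss(sim, logits, [0, 1]))
```

In a real model the similarity matrix would come from the image embedding and the unimodal text embedding, while the token logits would come from the multimodal decoder layers, so a single forward pass can feed both loss terms.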
