CVPR 2022 | Greatly reducing the manual annotation required for zero-shot learning with class semantic embeddings rich in visual information (source code available)
2022-07-04 14:03:00 【Computer Vision Research Institute】
Author: Edison_G
Researchers from Beijing University of Posts and Telecommunications, the Max Planck Institute, and other institutions have proposed a class embedding mining network. It improves the completeness of class embeddings in visual space and helps knowledge transfer between classes in zero-shot learning.
Reposted from Jiqizhixin (Synced)
Zero-shot learning aims to imitate the human reasoning process: using knowledge of seen classes, it recognizes unseen classes for which there are no training samples. A class embedding is a vector that describes the semantics and visual features of a class. It enables knowledge transfer between classes and therefore plays an irreplaceable role in zero-shot learning.
Illustration of zero-shot classification
As shown in the figure above, attributes can be shared by different classes, which promotes knowledge transfer between classes, so attributes are the most widely used class embedding. They are also widely used as auxiliary information in other computer vision tasks, such as face recognition, fine-grained classification, and fashion trend forecasting.
However, attribute annotation requires a great deal of manual effort and expert knowledge, which limits the extension of zero-shot learning to new datasets. In addition, because of the limits of human cognition, annotated attributes cannot cover the whole visual space. Some discriminative features in the image are therefore not captured by attributes, which degrades zero-shot learning performance.
To address these problems, researchers from Beijing University of Posts and Telecommunications, the Max Planck Institute, and other institutions have proposed a class embedding mining network, the Visually-Grounded Semantic Embedding Network (VGSE). The paper mainly answers two questions: (1) how to automatically discover class embeddings with semantic and visual characteristics from images of seen classes; (2) how to predict class embeddings for unseen classes without any training samples.
Paper link: https://arxiv.org/abs/2203.10444
Code link : https://github.com/wenjiaXu/VGSE
To fully mine the visual features shared among different classes, the VGSE model clusters a large number of local image patches by visual similarity to form attribute clusters, summarizing the visual features shared by instances of different classes from the low-level features of the image. In addition, the VGSE model introduces a class relation module, which learns class relationships with the help of a small amount of external knowledge, transfers knowledge from source classes to target classes, and predicts class embeddings for target classes that have no training images. Compared with other attributes mined automatically from text corpora, the VGSE model obtains very competitive results on the CUB, SUN, and AWA2 zero-shot classification datasets. As shown in the figure below, the method can discover visual features that are complementary to manually annotated attributes, improving the completeness of class embeddings in visual space and helping knowledge transfer between classes in zero-shot learning. The paper has been accepted to CVPR 2022.
Class embedding mining model
The pipeline of the class embedding mining model VGSE is shown below. The model consists of two main modules: (1) the Patch Clustering (PC) module takes the training dataset as input and groups image patches into different clusters; (2) the Class Relation (CR) module predicts the semantic embeddings of unseen classes.
VGSE model structure
Patch clustering module
Attributes usually appear in local regions of an image, for example animal body parts or the shape and texture of objects in a scene. The paper therefore proposes to discover visual attribute clusters by clustering local image patches. To obtain patches that each cover a complete semantic region (for example, an animal's head), the patch clustering module uses the unsupervised compact watershed segmentation algorithm [4] to divide the image into regularly shaped regions, and then clusters the patches by visual similarity.
The patch clustering module is a differentiable deep neural network. Given an image patch, the network first extracts its features, and a clustering layer then predicts the probability that the patch belongs to each attribute cluster.
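The step from features to cluster probabilities can be illustrated with a minimal NumPy sketch. It assumes the clustering layer is a single linear layer followed by a softmax; the function name `cluster_probabilities`, the feature size, and the number of clusters are hypothetical choices for illustration, not taken from the VGSE code.

```python
import numpy as np

def cluster_probabilities(features, weights, bias):
    """Softmax over linear cluster scores; one row per image patch (slice)."""
    logits = features @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # 4 patches, 8-dim features (assumed sizes)
weights = rng.normal(size=(8, 3))   # 3 attribute clusters (assumed)
bias = np.zeros(3)
probs = cluster_probabilities(features, weights, bias)
```

Each row of `probs` is a distribution over attribute clusters for one patch, which is the quantity the later averaging steps operate on.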
The clustering network is trained with a clustering loss based on visual similarity, which forces each image patch and the patches in its set of similar patches to be assigned to the same attribute cluster.
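One common way to write such a loss is to reward agreement between the cluster distribution of a patch and those of its visually similar neighbors. The sketch below assumes a negative log dot-product form; this form and the name `clustering_loss` are assumptions for illustration, and the exact loss used in the paper may differ.

```python
import numpy as np

def clustering_loss(p_anchor, p_neighbors, eps=1e-12):
    """Mean of -log <p_anchor, p_neighbor> over the neighbor set."""
    agreement = p_neighbors @ p_anchor  # one dot product per neighbor
    return float(-np.log(agreement + eps).mean())

anchor = np.array([0.9, 0.05, 0.05])
# Neighbors that mostly agree with the anchor's cluster assignment.
agreeing = np.array([[0.8, 0.1, 0.1], [0.85, 0.1, 0.05]])
# Neighbors assigned to other clusters.
disagreeing = np.array([[0.1, 0.8, 0.1], [0.05, 0.05, 0.9]])

loss_agree = clustering_loss(anchor, agreeing)
loss_disagree = clustering_loss(anchor, disagreeing)
```

The loss is small when a patch and its neighbors put their probability mass on the same cluster, and large otherwise.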
To make the class embeddings more discriminative, so that they can capture the salient differences between classes, the paper adds class-discriminative information: a fully connected layer is learned that maps the prediction for each image to class probabilities, and the model is trained with a cross-entropy loss.
The paper aims to learn attribute clusters that are shared between classes and thus promote knowledge transfer, so attribute clusters are encouraged to carry semantic relations between classes. To this end, a fully connected layer S is learned that maps the embedding of each image to the semantic label of its class (here, the w2v vector of the class name). The model is then trained with a regression loss to strengthen the semantic relatedness of the class embeddings.
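The two auxiliary objectives described above can be sketched in NumPy as follows. The sketch assumes bias-free linear layers and a squared-error regression loss; `W_cls`, `class_loss`, and `semantic_loss` are hypothetical names, while `S` follows the layer name used in the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def class_loss(embeddings, W_cls, labels):
    """Cross-entropy between predicted class probabilities and true labels."""
    probs = softmax(embeddings @ W_cls)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def semantic_loss(embeddings, S, class_w2v, labels):
    """Squared error between mapped embeddings and the w2v vector of each class."""
    predicted = embeddings @ S
    return float(((predicted - class_w2v[labels]) ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
embeddings = rng.random((5, 3))      # 5 images, 3 attribute clusters (assumed)
W_cls = rng.normal(size=(3, 4))      # 4 seen classes (assumed)
S = rng.normal(size=(3, 6))          # 6-dim stand-in for w2v vectors
class_w2v = rng.normal(size=(4, 6))
labels = np.array([0, 1, 2, 3, 0])

total = class_loss(embeddings, W_cls, labels) + semantic_loss(embeddings, S, class_w2v, labels)
```

In practice the two terms would be weighted and added to the clustering loss; the weighting is not given in this article, so the plain sum here is only a placeholder.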
Finally, the embedding of a complete image is computed by averaging the embeddings of all patches in the image, and the embedding of a class is obtained by averaging the embeddings of all images of that class.
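The two averaging steps can be written directly in NumPy. This is a minimal sketch that assumes every class has at least one image; the function names are illustrative.

```python
import numpy as np

def image_embedding(patch_embeddings):
    """Average the embeddings of all patches (slices) of one image."""
    return patch_embeddings.mean(axis=0)

def class_embeddings(image_embs, labels, num_classes):
    """Average the image embeddings of each class (assumes no empty class)."""
    out = np.zeros((num_classes, image_embs.shape[1]))
    for c in range(num_classes):
        out[c] = image_embs[labels == c].mean(axis=0)
    return out

# Three toy images with 2-dim patch embeddings.
img_a = image_embedding(np.array([[1.0, 0.0], [0.0, 1.0]]))
img_b = image_embedding(np.array([[1.0, 0.0], [1.0, 0.0]]))
img_c = image_embedding(np.array([[0.0, 1.0]]))

image_embs = np.stack([img_a, img_b, img_c])
labels = np.array([0, 0, 1])  # images a and b belong to class 0, c to class 1
class_embs = class_embeddings(image_embs, labels, num_classes=2)
```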
Class relation module
The embeddings of seen classes can be predicted by the patch clustering module. In practice, however, there are many unseen classes whose class embeddings cannot be predicted from images. Semantically related classes usually share some attributes: for example, pandas and zebras share the "black and white" attribute, and both elk and bulls have horns. This module therefore learns the semantic similarity between seen and unseen classes and predicts the embeddings of unseen classes from semantically related seen classes. Any external semantic knowledge, such as w2v or GloVe class semantic embeddings or manually annotated attributes, can be used to learn the relationship between two classes. The class relation module is illustrated below using w2v as an example.
Given the w2v semantic labels of the seen classes and the w2v semantic label of an unseen class, the module learns a similarity mapping whose entries represent the similarity between the target (unseen) class and each source (seen) class. The similarity mapping is learned by solving an optimization problem.
The attribute values of the target class are then the weighted sum of the attribute values of all source classes.
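A simplified version of this idea is sketched below: the unseen class's w2v vector is approximated as a weighted combination of the seen classes' w2v vectors, and the same weights are applied to the seen class embeddings. The sketch uses an unconstrained ridge-regularized least-squares problem as a stand-in; the constraints of the paper's actual optimization problem are not given in this article, and the function names and the `ridge` parameter are assumptions.

```python
import numpy as np

def similarity_weights(seen_w2v, unseen_w2v, ridge=1e-3):
    """Solve min_r ||unseen_w2v - r @ seen_w2v||^2 + ridge * ||r||^2 in closed form."""
    gram = seen_w2v @ seen_w2v.T + ridge * np.eye(seen_w2v.shape[0])
    return np.linalg.solve(gram, seen_w2v @ unseen_w2v)

def predict_unseen_embedding(seen_embeddings, r):
    """Weighted sum of the seen class embeddings."""
    return r @ seen_embeddings

# Toy example: 3 seen classes with orthogonal 3-dim stand-ins for w2v vectors.
seen_w2v = np.eye(3)
unseen_w2v = np.array([0.5, 0.5, 0.0])  # semantically between classes 0 and 1
seen_embeddings = np.array([[2.0, 0.0], [0.0, 2.0], [4.0, 4.0]])

r = similarity_weights(seen_w2v, unseen_w2v, ridge=0.0)
unseen_embedding = predict_unseen_embedding(seen_embeddings, r)
```

Here the unseen class inherits its embedding equally from the two seen classes it is semantically closest to, and nothing from the unrelated third class.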
Experimental results
The paper verifies the effectiveness of the proposed method on three widely used zero-shot classification datasets (CUB, AWA2, SUN).
The figure below shows attribute clusters learned on the AWA2 dataset. The embeddings of 10,000 image patches are projected into 2D space with t-SNE. Several attribute clusters are sampled (dots of the same color belong to the same cluster), and image patches from these clusters are marked in the figure.
Visualization of the mined attribute clusters
The figure illustrates several points. First, image patches in the same cluster tend to group together and convey consistent visual information, which indicates that the image embeddings are discriminative. In addition, almost all attribute clusters contain image patches from multiple classes. For example, stripes from different animals differ slightly in color but have similar texture. This shows that the learned class embeddings contain information shared between classes. Another interesting observation is that the proposed model can discover visual attributes overlooked by human annotation, which improves the visual completeness of manually annotated attributes.
Table 1 shows the performance of the class embedding VGSE-SMO and of class w2v vectors on the three datasets. To test the two class embeddings, they are evaluated with f-VAEGAN-D2 [5] and other models. The results show that the proposed class embedding greatly outperforms w2v vectors.
Table 2 compares, on the zero-shot classification task, the proposed class embedding with several other methods that mine attributes from text corpora. The results show that the proposed method, using only w2v vectors, outperforms the other methods that use online corpora.
As mentioned earlier, the proposed class relation module can use a variety of external semantic knowledge to learn class similarity. Table 4 shows the effect of using different kinds of semantic knowledge.
The paper also evaluates the semantic consistency and visual consistency of the class embeddings through a user study. 50 attribute clusters are randomly selected, and 30 images are shown for each. Users are first asked to view example images of an attribute cluster and then answer questions that measure the quality of the cluster.
User study interface
The results show that in 88.5% and 87.0% of cases, users consider that the attribute clusters mined by this method convey consistent visual and semantic information, respectively.
Summary
To reduce the manual annotation required for zero-shot learning and to improve the semantic and visual completeness of class embeddings, the paper presents VGSE, an automatic class embedding mining network that uses the visual similarity of image patches to discover class embeddings. Results on three datasets show that the proposed class embeddings effectively improve the quality of semantic embeddings and can mine fine-grained attributes that are difficult for humans to annotate. Besides their role in zero-shot learning, the proposed class embeddings may also offer new ideas for other attribute-related research.
References:
[1] Al-Halah, Ziad, and Rainer Stiefelhagen. "Automatic discovery, association estimation and learning of semantic attributes for a thousand categories." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[2] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Proceedings of the Advances in neural information processing systems. 2013.
[3] Wang, Xiaolong, Yufei Ye, and Abhinav Gupta. "Zero-shot recognition via semantic embeddings and knowledge graphs." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[4] Neubert, Peer, and Peter Protzel. "Compact watershed and preemptive slic: On improving trade-offs of superpixel segmentation algorithms." Proceedings of the IEEE International Conference on Pattern Recognition. 2014.
[5] Xian, Yongqin, et al. "f-vaegan-d2: A feature generating framework for any-shot learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.