Reading guide
In this issue, we bring you 3 papers on self-supervised learning, two of which were co-authored by Yann LeCun, the father of convolutional networks.
On large-scale machine vision training tasks, the results of self-supervised learning (SSL) are becoming harder and harder to distinguish from those of supervised methods.
Self-supervised learning uses auxiliary (pretext) tasks to mine supervisory signals from large-scale unlabeled data; by constructing this supervision, the network is trained to learn representations that are valuable for downstream tasks.
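To make the idea of a pretext task concrete, here is a minimal PyTorch sketch of one classic example, rotation prediction (our illustrative choice, not a task used by the papers below): the "label" is the rotation we ourselves applied to the image, so the supervision comes from the data itself.

```python
import torch

def rotation_pretext_batch(images: torch.Tensor):
    """Turn an unlabeled image batch into a supervised one: rotate each
    image by a random multiple of 90 degrees and use the rotation index
    as the label.

    images: float tensor of shape (N, C, H, W).
    """
    labels = torch.randint(0, 4, (images.size(0),))  # 0/1/2/3 -> 0/90/180/270 degrees
    rotated = torch.stack([
        torch.rot90(img, k=int(k), dims=(1, 2))      # rotate the spatial dims
        for img, k in zip(images, labels)
    ])
    return rotated, labels
```

Any classifier trained to predict `labels` from `rotated` learns image representations without a single human annotation.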
This article focuses on self-supervised learning and shares 3 papers, in the hope of deepening everyone's understanding of the topic.
Barlow Twins: SSL Based on Redundancy Reduction
Title:
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Authors:
Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stephane Deny
A very useful approach in self-supervised learning is to learn embedding vectors that are invariant to distortions of the input sample.
However, this approach has an unavoidable problem: the trivial constant solution. Most current methods rely on careful implementation details to keep such trivial constant solutions from appearing.
This paper proposes an objective function that measures the cross-correlation matrix between the outputs of two identical networks (fed with distorted versions of a sample) and drives it as close as possible to the identity matrix, thereby avoiding collapse.
This makes the embedding vectors of the distorted versions of a sample similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins.
Schematic diagram of Barlow Twins
Barlow Twins requires neither large batches nor any asymmetry between the network twins (such as a predictor network or gradient stopping), thanks to its very high-dimensional output vectors.
The Barlow Twins loss function is:

L_BT = Σ_i (1 − C_ii)² + λ Σ_i Σ_{j≠i} C_ij²

where λ is a positive constant that trades off the importance of the first (invariance) term against the second (redundancy-reduction) term, and C is the cross-correlation matrix computed along the batch dimension between the outputs of the two identical networks:

C_ij = Σ_b z^A_{b,i} z^B_{b,j} / ( √(Σ_b (z^A_{b,i})²) · √(Σ_b (z^B_{b,j})²) )

Here b indexes the samples in a batch and i, j index the dimensions of the network outputs; C is therefore a square matrix whose size equals the dimensionality of the network output, with entries between −1 and 1.
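A minimal PyTorch sketch of this loss, assuming `z_a` and `z_b` are the (batch, dim) embeddings of the two distorted views (the variable names are ours; λ = 5e-3 is the value reported in the paper):

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambd: float = 5e-3):
    """z_a, z_b: (batch, dim) embeddings of two distorted views of the same batch."""
    n = z_a.size(0)
    # Standardize each dimension along the batch so that C lies in [-1, 1].
    z_a = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
    z_b = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)
    c = (z_a.T @ z_b) / n                             # cross-correlation matrix, (dim, dim)
    on_diag = (1.0 - torch.diagonal(c)).pow(2).sum()  # invariance term: pushes C_ii -> 1
    off = c.clone()
    off.fill_diagonal_(0.0)
    off_diag = off.pow(2).sum()                       # redundancy reduction: pushes C_ij -> 0
    return on_diag + lambd * off_diag
```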
On ImageNet, Barlow Twins outperforms all previous methods on semi-supervised classification in the low-data regime, matches the state of the art on ImageNet classification with a linear classifier head, and does likewise on transfer tasks for classification and object detection.
Semi-supervised learning on ImageNet with 1% and 10% of the training examples; bold indicates the best result
The experiments show that, compared with other methods, Barlow Twins performs slightly better (with 1% of the data) or on par (with 10% of the data).
Read the full paper: Barlow Twins: Self-Supervised Learning via Redundancy Reduction
VICReg: Variance-Invariance-Covariance Regularization
Title:
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Authors:
Adrien Bardes, Jean Ponce, Yann LeCun
Self-supervised methods for image representation learning are generally based on maximizing the agreement between embedding vectors of different views of the same image. A trivial solution arises when the encoder outputs a constant vector.
This collapse problem is usually avoided through implicit biases in the learning architecture, often without a clear reason or explanation for why they work.
In this paper, the authors introduce VICReg (short for Variance-Invariance-Covariance Regularization), which applies a simple regularization term to the variance of the embeddings along each dimension, and can therefore avoid collapse explicitly.
VICReg combines this variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks.
Moreover, experiments show that incorporating the new variance term into other methods helps stabilize training and improves performance.
Schematic diagram of VICReg
Given a batch of images I, X and X' denote two different views, which are encoded into the representations Y and Y'. The representations are fed to an expander that produces the embeddings Z and Z'.
The distance between two embeddings of the same image is minimized; the variance of each embedding variable within a batch is kept above a threshold; and the covariance between pairs of embedding variables within a batch is driven toward zero, decorrelating these variables (a code sketch of these three terms appears below).
Although the two branches neither require the same architecture nor need to share weights, in most experiments they are Siamese networks with shared weights: the encoder is a ResNet-50 backbone with a 2048-dimensional output, and the expander consists of 3 fully-connected layers of size 8192.
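To show how the pieces fit together, here is a simplified PyTorch sketch of the expander and the three loss terms (our own rendering, not the paper's code; the BatchNorm/ReLU composition of the expander and the 25/25/1 loss weights are assumptions based on commonly reported settings):

```python
import torch
import torch.nn.functional as F

def make_expander(in_dim: int = 2048, hidden: int = 8192) -> torch.nn.Sequential:
    """Expander: 3 fully-connected layers of size 8192."""
    return torch.nn.Sequential(
        torch.nn.Linear(in_dim, hidden), torch.nn.BatchNorm1d(hidden), torch.nn.ReLU(),
        torch.nn.Linear(hidden, hidden), torch.nn.BatchNorm1d(hidden), torch.nn.ReLU(),
        torch.nn.Linear(hidden, hidden),
    )

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """z_a, z_b: (batch, dim) expander outputs for two views of the same images."""
    n, d = z_a.shape
    # Invariance: the two embeddings of the same image should match.
    sim = F.mse_loss(z_a, z_b)
    # Variance: a hinge keeps the std of every dimension above 1, preventing collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    var = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()
    # Covariance: off-diagonal covariances are pushed to zero, decorrelating dimensions.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.T @ za) / (n - 1)
    cov_b = (zb.T @ zb) / (n - 1)
    cov_a.fill_diagonal_(0.0)
    cov_b.fill_diagonal_(0.0)
    cov = cov_a.pow(2).sum() / d + cov_b.pow(2).sum() / d
    return sim_w * sim + var_w * var + cov_w * cov
```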
Performance comparison of different methods on ImageNet; the top 3 self-supervised methods are underlined
To evaluate the representations of a ResNet-50 backbone pretrained with VICReg:
1. linear classification on top of the frozen representations on ImageNet;
2. semi-supervised classification on top of fine-tuned representations, using 1% and 10% of the ImageNet samples.
The table reports Top-1 and Top-5 accuracy (unit: %).
Read the full paper: VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
iBOT: Image BERT Pre-Training with Online Tokenizer
Title:
iBOT: Image BERT Pre-Training with Online Tokenizer
Authors:
Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
The success of Transformer models in NLP largely stems from the masked language modeling (MLM) pretext task, in which text is first tokenized into semantically meaningful pieces.
In this paper, the authors study its visual counterpart, masked image modeling (MIM), and propose a self-supervised framework called iBOT.
iBOT performs masked prediction with an online tokenizer. Specifically, the authors apply self-distillation to the masked patch tokens, taking the teacher network as the online tokenizer, together with self-distillation on the class token to acquire visual semantics.
The online tokenizer is learned jointly with the MIM objective, which also does away with the multi-stage training pipeline in which the tokenizer must be pretrained beforehand.
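The following heavily simplified PyTorch sketch illustrates the idea (our own rendering, not the official implementation; the model interfaces, temperatures, and the equal weighting of the two losses are assumptions):

```python
import torch
import torch.nn.functional as F

def self_distill(student_logits, teacher_logits, temp_s=0.1, temp_t=0.04):
    """Cross-entropy between the sharpened teacher distribution and the student's."""
    t = F.softmax(teacher_logits / temp_t, dim=-1).detach()
    s = F.log_softmax(student_logits / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

def ibot_losses(student, teacher, img, img_masked, mask):
    """One simplified training step. `student` and `teacher` are assumed to be
    ViTs returning ([CLS] logits, per-patch logits); `mask` is a boolean
    (batch, n_patches) tensor marking the patches blanked out in `img_masked`.
    The teacher sees the intact image and acts as the online tokenizer."""
    s_cls, s_patch = student(img_masked)
    with torch.no_grad():
        t_cls, t_patch = teacher(img)  # teacher: EMA copy of the student, no gradients
    loss_mim = self_distill(s_patch[mask], t_patch[mask])  # MIM on masked patch tokens
    loss_cls = self_distill(s_cls, t_cls)                  # class-token self-distillation
    return loss_mim + loss_cls
```

After each step, the teacher's weights would be updated as an exponential moving average of the student's, so the tokenizer improves online as training progresses.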
Overview of the iBOT framework: masked image modeling performed with the help of an online tokenizer
iBOT performs outstandingly, achieving state-of-the-art results on downstream tasks covering classification, object detection, instance segmentation, and semantic segmentation.
Table 2: fine-tuning on ImageNet-1K. Table 3: fine-tuning on ImageNet-1K after pre-training on ImageNet-22K
The experimental results show that iBOT reaches 82.3% linear probing accuracy and 87.8% fine-tuning accuracy on ImageNet-1K.
Read the full paper: iBOT: Image BERT Pre-Training with Online Tokenizer
DocArray: A Data Structure for Unstructured Data
One of the many challenges of self-supervised learning is performing representation learning on large amounts of unlabeled data.
With the rapid development of Internet technology, the amount of unstructured data has grown at an unprecedented rate, and its forms now go beyond text, audio, and images to video and even 3D meshes.
DocArray can greatly simplify the processing and use of such unstructured data.
DocArray is an extensible data structure, well suited to deep learning tasks and designed mainly for the transmission of nested, unstructured data; the supported data types include text, image, audio, video, 3D mesh, and more.
Comparison with other data structures; in the original table, the three marks indicate full support, partial support, and no support respectively
With DocArray, deep learning engineers can use a Pythonic API to efficiently process, embed, search, recommend, store, and transfer data.
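As a quick taste of that API, here is a minimal sketch using the pre-2.0 DocArray `Document`/`DocumentArray` interface (the image path is a placeholder):

```python
from docarray import Document, DocumentArray

# Wrap heterogeneous, unstructured items in one nested, transferable structure.
da = DocumentArray([
    Document(text='hello world'),
    Document(uri='photo.jpg'),  # placeholder local file; content is loaded on demand
])

da.summary()  # print an overview of the array and its contents
```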
That's all for this issue on self-supervised learning papers. What other papers, tutorials, or tool recommendations would you like to see? Leave us a message in the background of our official account; we will plan each week's posts based on your messages.