Reading guide
In this issue, we bring you 3 papers on self-supervised learning, two of which were co-authored by Yann LeCun, the father of convolutional networks.
On large-scale machine vision training tasks, the results of self-supervised learning (SSL) are becoming harder and harder to distinguish from those of supervised methods.
Self-supervised learning uses auxiliary (pretext) tasks to mine supervisory signals from large-scale unlabeled data; by constructing this supervision, the network is trained to learn representations that are valuable for downstream tasks.
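To make the idea of a pretext task concrete, here is a minimal PyTorch sketch of one classic example, rotation prediction (our illustrative choice, not a task used by the papers below): the "label" is the rotation we ourselves applied to the image, so the supervision comes from the data itself.

```python
import torch

def rotation_pretext_batch(images: torch.Tensor):
    """Turn an unlabeled image batch into a supervised one: rotate each
    image by a random multiple of 90 degrees and use the rotation index
    as the label.

    images: float tensor of shape (N, C, H, W).
    """
    labels = torch.randint(0, 4, (images.size(0),))  # 0/1/2/3 -> 0/90/180/270 degrees
    rotated = torch.stack([
        torch.rot90(img, k=int(k), dims=(1, 2))      # rotate the spatial dims
        for img, k in zip(images, labels)
    ])
    return rotated, labels
```

Any classifier trained to predict `labels` from `rotated` learns image representations without a single human annotation.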
This article focuses on self-supervised learning and shares 3 papers, in the hope of deepening everyone's understanding of the topic.
Barlow Twins: SSL Based on Redundancy Reduction
Title:
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Authors:
Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stephane Deny
A very useful approach in self-supervised learning is to learn embedding vectors that are invariant to distortions of the input sample.
However, this approach has an unavoidable problem: the trivial constant solution. Most current methods rely on careful implementation details to keep such trivial constant solutions from appearing.
This paper proposes an objective function that measures the cross-correlation matrix between the outputs of two identical networks (fed with distorted versions of a sample) and drives it as close as possible to the identity matrix, thereby avoiding collapse.
This makes the embedding vectors of the distorted versions of a sample similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins.
Schematic diagram of Barlow Twins
Barlow Twins requires neither large batches nor any asymmetry between the network twins (such as a predictor network or gradient stopping), thanks to its very high-dimensional output vectors.
The Barlow Twins loss function is:

L_BT = Σ_i (1 − C_ii)² + λ Σ_i Σ_{j≠i} C_ij²

where λ is a positive constant that trades off the importance of the first (invariance) term against the second (redundancy-reduction) term, and C is the cross-correlation matrix computed along the batch dimension between the outputs of the two identical networks:

C_ij = Σ_b z^A_{b,i} z^B_{b,j} / ( √(Σ_b (z^A_{b,i})²) · √(Σ_b (z^B_{b,j})²) )

Here b indexes the samples in a batch and i, j index the dimensions of the network outputs; C is therefore a square matrix whose size equals the dimensionality of the network output, with entries between −1 and 1.
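A minimal PyTorch sketch of this loss, assuming `z_a` and `z_b` are the (batch, dim) embeddings of the two distorted views (the variable names are ours; λ = 5e-3 is the value reported in the paper):

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambd: float = 5e-3):
    """z_a, z_b: (batch, dim) embeddings of two distorted views of the same batch."""
    n = z_a.size(0)
    # Standardize each dimension along the batch so that C lies in [-1, 1].
    z_a = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
    z_b = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)
    c = (z_a.T @ z_b) / n                             # cross-correlation matrix, (dim, dim)
    on_diag = (1.0 - torch.diagonal(c)).pow(2).sum()  # invariance term: pushes C_ii -> 1
    off = c.clone()
    off.fill_diagonal_(0.0)
    off_diag = off.pow(2).sum()                       # redundancy reduction: pushes C_ij -> 0
    return on_diag + lambd * off_diag
```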
On ImageNet, Barlow Twins outperforms all previous methods on semi-supervised classification in the low-data regime, matches the state of the art on ImageNet classification with a linear classifier head, and does likewise on transfer tasks for classification and object detection.
Semi-supervised learning on ImageNet with 1% and 10% of the training examples; bold indicates the best result
The experiments show that, compared with other methods, Barlow Twins performs slightly better (with 1% of the data) or on par (with 10% of the data).
Read the full paper: Barlow Twins: Self-Supervised Learning via Redundancy Reduction
VICReg: Variance-Invariance-Covariance Regularization
Title:
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Authors:
Adrien Bardes, Jean Ponce, Yann LeCun
Self-supervised methods for image representation learning are generally based on maximizing the agreement between embedding vectors of different views of the same image. A trivial solution arises when the encoder outputs a constant vector.
This collapse problem is usually avoided through implicit biases in the learning architecture, often without a clear reason or explanation for why they work.
In this paper, the authors introduce VICReg (short for Variance-Invariance-Covariance Regularization), which applies a simple regularization term to the variance of the embeddings along each dimension, and can therefore avoid collapse explicitly.
VICReg combines this variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks.
Moreover, experiments show that incorporating the new variance term into other methods helps stabilize training and improves performance.
Schematic diagram of VICReg
Given a batch of images I, X and X' denote two different views, which are encoded into the representations Y and Y'. The representations are fed to an expander that produces the embeddings Z and Z'.
The distance between two embeddings of the same image is minimized; the variance of each embedding variable within a batch is kept above a threshold; and the covariance between pairs of embedding variables within a batch is driven toward zero, decorrelating these variables (a code sketch of these three terms appears below).
Although the two branches neither require the same architecture nor need to share weights, in most experiments they are Siamese networks with shared weights: the encoder is a ResNet-50 backbone with a 2048-dimensional output, and the expander consists of 3 fully-connected layers of size 8192.
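To show how the pieces fit together, here is a simplified PyTorch sketch of the expander and the three loss terms (our own rendering, not the paper's code; the BatchNorm/ReLU composition of the expander and the 25/25/1 loss weights are assumptions based on commonly reported settings):

```python
import torch
import torch.nn.functional as F

def make_expander(in_dim: int = 2048, hidden: int = 8192) -> torch.nn.Sequential:
    """Expander: 3 fully-connected layers of size 8192."""
    return torch.nn.Sequential(
        torch.nn.Linear(in_dim, hidden), torch.nn.BatchNorm1d(hidden), torch.nn.ReLU(),
        torch.nn.Linear(hidden, hidden), torch.nn.BatchNorm1d(hidden), torch.nn.ReLU(),
        torch.nn.Linear(hidden, hidden),
    )

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """z_a, z_b: (batch, dim) expander outputs for two views of the same images."""
    n, d = z_a.shape
    # Invariance: the two embeddings of the same image should match.
    sim = F.mse_loss(z_a, z_b)
    # Variance: a hinge keeps the std of every dimension above 1, preventing collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    var = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()
    # Covariance: off-diagonal covariances are pushed to zero, decorrelating dimensions.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.T @ za) / (n - 1)
    cov_b = (zb.T @ zb) / (n - 1)
    cov_a.fill_diagonal_(0.0)
    cov_b.fill_diagonal_(0.0)
    cov = cov_a.pow(2).sum() / d + cov_b.pow(2).sum() / d
    return sim_w * sim + var_w * var + cov_w * cov
```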
Performance comparison of different methods on ImageNet; the top 3 self-supervised methods are underlined
To evaluate the representations of a ResNet-50 backbone pretrained with VICReg:
1. linear classification on top of the frozen representations on ImageNet;
2. semi-supervised classification on top of fine-tuned representations, using 1% and 10% of the ImageNet samples.
The table reports Top-1 and Top-5 accuracy (unit: %).
Read the full paper: VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
iBOT: Image BERT Pre-Training with Online Tokenizer
Title:
iBOT: Image BERT Pre-Training with Online Tokenizer
Authors:
Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
The success of Transformer models in NLP largely stems from the masked language modeling (MLM) pretext task, in which text is first tokenized into semantically meaningful pieces.
In this paper, the authors study its visual counterpart, masked image modeling (MIM), and propose a self-supervised framework called iBOT.
iBOT performs masked prediction with an online tokenizer. Specifically, the authors apply self-distillation to the masked patch tokens, taking the teacher network as the online tokenizer, together with self-distillation on the class token to acquire visual semantics.
The online tokenizer is learned jointly with the MIM objective, which also does away with the multi-stage training pipeline in which the tokenizer must be pretrained beforehand.
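The following heavily simplified PyTorch sketch illustrates the idea (our own rendering, not the official implementation; the model interfaces, temperatures, and the equal weighting of the two losses are assumptions):

```python
import torch
import torch.nn.functional as F

def self_distill(student_logits, teacher_logits, temp_s=0.1, temp_t=0.04):
    """Cross-entropy between the sharpened teacher distribution and the student's."""
    t = F.softmax(teacher_logits / temp_t, dim=-1).detach()
    s = F.log_softmax(student_logits / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

def ibot_losses(student, teacher, img, img_masked, mask):
    """One simplified training step. `student` and `teacher` are assumed to be
    ViTs returning ([CLS] logits, per-patch logits); `mask` is a boolean
    (batch, n_patches) tensor marking the patches blanked out in `img_masked`.
    The teacher sees the intact image and acts as the online tokenizer."""
    s_cls, s_patch = student(img_masked)
    with torch.no_grad():
        t_cls, t_patch = teacher(img)  # teacher: EMA copy of the student, no gradients
    loss_mim = self_distill(s_patch[mask], t_patch[mask])  # MIM on masked patch tokens
    loss_cls = self_distill(s_cls, t_cls)                  # class-token self-distillation
    return loss_mim + loss_cls
```

After each step, the teacher's weights would be updated as an exponential moving average of the student's, so the tokenizer improves online as training progresses.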
Overview of the iBOT framework: masked image modeling performed with the help of an online tokenizer
iBOT performs outstandingly, achieving state-of-the-art results on downstream tasks covering classification, object detection, instance segmentation, and semantic segmentation.
Table 2: fine-tuning on ImageNet-1K. Table 3: fine-tuning on ImageNet-1K after pre-training on ImageNet-22K
The experimental results show that iBOT reaches 82.3% linear probing accuracy and 87.8% fine-tuning accuracy on ImageNet-1K.
Read the full paper: iBOT: Image BERT Pre-Training with Online Tokenizer
DocArray: A Data Structure for Unstructured Data
One of the many challenges of self-supervised learning is performing representation learning on large amounts of unlabeled data.
With the rapid development of Internet technology, the amount of unstructured data has grown at an unprecedented rate, and its forms now go beyond text, audio, and images to video and even 3D meshes.
DocArray can greatly simplify the processing and use of such unstructured data.
DocArray is an extensible data structure, well suited to deep learning tasks and designed mainly for the transmission of nested, unstructured data; the supported data types include text, image, audio, video, 3D mesh, and more.
Comparison with other data structures; in the original table, the three marks indicate full support, partial support, and no support respectively
With DocArray, deep learning engineers can use a Pythonic API to efficiently process, embed, search, recommend, store, and transfer data.
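As a quick taste of that API, here is a minimal sketch using the pre-2.0 DocArray `Document`/`DocumentArray` interface (the image path is a placeholder):

```python
from docarray import Document, DocumentArray

# Wrap heterogeneous, unstructured items in one nested, transferable structure.
da = DocumentArray([
    Document(text='hello world'),
    Document(uri='photo.jpg'),  # placeholder local file; content is loaded on demand
])

da.summary()  # print an overview of the array and its contents
```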
That's all for this issue on self-supervised learning papers. What other papers, tutorials, or tool recommendations would you like to see? Leave us a message in the background of our official account; we will plan each week's posts based on your messages.