Paper Sharing | Engineers Read Self-Supervised Learning Papers Co-Published by Yann LeCun

2022-06-24 18:06:00 JinaAI

Reading guide
In this issue, we bring you 3 papers on self-supervised learning, two of which were co-published by Yann LeCun, the father of convolutional networks.

For large-scale machine vision training tasks, the performance of self-supervised learning (SSL) is becoming increasingly hard to distinguish from that of supervised methods.

Self-supervised learning uses auxiliary (pretext) tasks to mine supervisory signals from large-scale unlabeled data and thereby improve the quality of the learned representations: the network is trained on this constructed supervision, so that it learns representations that are valuable for downstream tasks.
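To make the idea of constructed supervision concrete, here is a minimal sketch of a typical image pretext setup: one image is turned into two randomly distorted views, and the network is later trained to produce similar representations for both. The specific augmentations and parameters below are illustrative assumptions, not taken from any one of the papers.

```python
from PIL import Image
import torchvision.transforms as T

# Two random "views" of the same image serve as the constructed supervision:
# the network should map both views to similar representations.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
])

img = Image.new("RGB", (256, 256))          # stand-in for a real training image
view_a, view_b = augment(img), augment(img)  # two distorted views of one image
```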

This article focuses on self-supervised learning and shares 3 papers, to deepen your understanding of the field.

Barlow Twins: SSL via Redundancy Reduction

Title:

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Authors:

Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stephane Deny

A very useful approach in self-supervised learning is to learn embedding vectors that are invariant to distortions of the input sample.

However, this approach comes with an unavoidable problem: trivial constant solutions. Most current methods rely on careful implementation details to prevent trivial constant solutions from appearing.

This paper proposes an objective function that measures the cross-correlation matrix between the outputs of two identical networks (fed with distorted versions of the same sample) and pushes it as close as possible to the identity matrix, thereby avoiding collapse.

This makes the embedding vectors of the (distorted) views of a sample similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins.

Barlow Twins schematic diagram

Barlow Twins requires neither large batches nor any asymmetry between the twin networks (such as a predictor network or gradient stopping), and it benefits from very high-dimensional output vectors.

The Barlow Twins loss function:

$$\mathcal{L}_{BT} \;=\; \sum_i (1 - C_{ii})^2 \;+\; \lambda \sum_i \sum_{j \neq i} C_{ij}^2$$

where λ is a positive constant trading off the importance of the first and second terms of the loss, and C is the cross-correlation matrix between the outputs of the two identical networks, computed along the batch dimension:

$$C_{ij} \;=\; \frac{\sum_b z^A_{b,i}\, z^B_{b,j}}{\sqrt{\sum_b \big(z^A_{b,i}\big)^2}\; \sqrt{\sum_b \big(z^B_{b,j}\big)^2}}$$

where b indexes the batch samples, i and j index the dimensions of the network outputs (z^A and z^B being the outputs for the two distorted views), and C is a square matrix whose size equals the output dimension, with values between -1 and 1.
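As a reading aid, here is a minimal PyTorch sketch of this objective following the definitions above; it is not the authors' implementation, and the default value of lambd is only illustrative.

```python
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    # z_a, z_b: network outputs for the two distorted views, shape (batch, dim)
    batch, dim = z_a.shape

    # Normalize each output dimension along the batch (zero mean, unit std),
    # so that C below is a cross-correlation matrix with entries in [-1, 1].
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)

    c = (z_a.T @ z_b) / batch                        # cross-correlation matrix C

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()   # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy-reduction term
    return on_diag + lambd * off_diag

# Example call with random embeddings, just to show the interface.
z_a, z_b = torch.randn(256, 8192), torch.randn(256, 8192)
loss = barlow_twins_loss(z_a, z_b)
```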

On ImageNet, Barlow Twins outperforms all previous methods on semi-supervised classification in the low-data regime; on the ImageNet classification task with a linear classifier head, it is on par with the state of the art; the same holds for transfer tasks in classification and object detection.


Semi-supervised learning on ImageNet using 1% and 10% of the training examples; bold indicates the best result

Experiments show that, compared with other methods, Barlow Twins performs slightly better (with 1% of the data) or on par (with 10% of the data).

Read the full paper: Barlow Twins: Self-Supervised Learning via Redundancy Reduction

VICReg: Variance-Invariance-Covariance Regularization

Title:

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Authors:

Adrien Bardes, Jean Ponce, Yann LeCun

Self-supervised methods for image representation learning are generally based on maximizing the agreement between embedding vectors of different views of the same image. A trivial solution arises when the encoder outputs a constant vector.

In general, this collapse problem is avoided through implicit biases in the learning architecture, which often lack a clear justification or explanation.

In this paper, the authors introduce VICReg (Variance-Invariance-Covariance Regularization), which applies a simple regularization term to the variance of the embeddings along each dimension, and can therefore explicitly avoid the collapse problem.

VICReg combines this variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks.

In addition, experiments show that incorporating the new variance term into other methods helps stabilize training and improves performance.


VICReg schematic diagram

Given a batch of images I, X and X' denote two different views, which are encoded into representations Y and Y'. The representations are fed to an expander that produces the embeddings Z and Z'.

The distance between two embeddings of the same image is minimized; the variance of each embedding variable over a batch is kept above a threshold; and the covariance between pairs of embedding variables over a batch is pushed toward zero, decorrelating these variables.
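A minimal PyTorch sketch of these three terms, following the description above; it is not the authors' code, and the loss coefficients, the variance threshold of 1, and the eps constant are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    # z_a, z_b: embeddings of two views of the same batch, shape (batch, dim)
    batch, dim = z_a.shape

    # Invariance: pull the two embeddings of the same image together.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance: keep the std of every embedding dimension above a threshold.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance: push off-diagonal covariances toward zero to decorrelate dims.
    def off_diag_sq(m):
        return m.pow(2).sum() - torch.diagonal(m).pow(2).sum()

    z_a_c, z_b_c = z_a - z_a.mean(dim=0), z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (batch - 1)
    cov_b = (z_b_c.T @ z_b_c) / (batch - 1)
    cov_loss = off_diag_sq(cov_a) / dim + off_diag_sq(cov_b) / dim

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss
```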

Although the two branches do not need to have the same architecture or share weights, in most experiments they are weight-sharing Siamese (twin) networks: the encoder is a ResNet-50 backbone with a 2048-dimensional output, and the expander consists of 3 fully connected layers of size 8192.
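For reference, a sketch of one such branch (ResNet-50 backbone exposing the 2048-dimensional representation, followed by a 3-layer expander of width 8192); the BatchNorm/ReLU placement inside the expander is an assumption for illustration. In the Siamese setting, the same module processes both views.

```python
import torch.nn as nn
import torchvision

def build_branch(embed_dim=8192):
    backbone = torchvision.models.resnet50()
    backbone.fc = nn.Identity()                  # expose the 2048-d representation Y
    expander = nn.Sequential(                    # maps Y to the embedding Z
        nn.Linear(2048, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU(inplace=True),
        nn.Linear(embed_dim, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU(inplace=True),
        nn.Linear(embed_dim, embed_dim),
    )
    return nn.Sequential(backbone, expander)
```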


Performance comparison of different methods on ImageNet; the top 3 self-supervised methods are underlined

To evaluate the representations of the ResNet-50 backbone pretrained with VICReg:

1. Linear classification on top of the frozen representations on ImageNet;

2. Semi-supervised classification by fine-tuning the representations with 1% and 10% of the ImageNet samples.

The table reports Top-1 and Top-5 accuracy (in %).

Read the full paper: VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

iBOT: Image BERT Pre-Training with an Online Tokenizer

Title:

iBOT: Image BERT Pre-Training with Online Tokenizer

Authors:

Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

The success of Transformer models in NLP stems mainly from the pretext task of masked language modeling (MLM), in which the text is first tokenized into semantically meaningful pieces.

In this paper, the authors study masked image modeling (MIM) and propose a self-supervised framework called iBOT.

iBOT performs masked prediction with an online tokenizer. Specifically, the authors apply self-distillation on the masked patch tokens, using the teacher network as the online tokenizer, and additionally apply self-distillation on the class token to acquire visual semantics.

The online tokenizer is learned jointly with the MIM objective, which removes the need for a multi-stage training pipeline in which the tokenizer has to be pretrained in advance.
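Below is a deliberately simplified sketch of the two self-distillation terms described above: a cross-entropy between teacher and student token distributions at the masked patch positions, plus a class-token term. The temperatures are illustrative, and centering, the cross-view pairing, and the EMA update of the teacher are omitted, so this is a reading aid rather than the iBOT implementation.

```python
import torch
import torch.nn.functional as F

def ibot_self_distillation(student_patch_logits, teacher_patch_logits,
                           student_cls_logits, teacher_cls_logits,
                           mask, student_temp=0.1, teacher_temp=0.04):
    # patch logits: (batch, n_patches, out_dim); cls logits: (batch, out_dim)
    # mask: (batch, n_patches) float, 1.0 where the student's input patch was masked

    # MIM term: at masked positions, the student matches the distribution the
    # online tokenizer (teacher, fed the unmasked image) assigns to each patch.
    t_patch = F.softmax(teacher_patch_logits / teacher_temp, dim=-1).detach()
    s_patch = F.log_softmax(student_patch_logits / student_temp, dim=-1)
    patch_ce = -(t_patch * s_patch).sum(-1)                  # (batch, n_patches)
    mim_loss = (patch_ce * mask).sum() / mask.sum().clamp(min=1.0)

    # [CLS] term: self-distillation on the class token for visual semantics.
    t_cls = F.softmax(teacher_cls_logits / teacher_temp, dim=-1).detach()
    s_cls = F.log_softmax(student_cls_logits / student_temp, dim=-1)
    cls_loss = -(t_cls * s_cls).sum(-1).mean()

    return mim_loss + cls_loss
```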


Overview of the iBOT framework: masked image modeling with the help of an online tokenizer

iBOT performs remarkably well, achieving state-of-the-art results on downstream tasks including classification, object detection, instance segmentation, and semantic segmentation.


Table 2: fine-tuning on ImageNet-1K; Table 3: fine-tuning on ImageNet-1K after pre-training on ImageNet-22K

Experimental results show that iBOT reaches 82.3% linear probing accuracy and 87.8% fine-tuning accuracy on ImageNet-1K.

Read the full paper: iBOT: Image BERT Pre-Training with Online Tokenizer

DocArray: A Data Structure for Unstructured Data

One of the many challenges of self-supervised learning is performing representation learning on large amounts of unlabeled data.

With the rapid development of Internet technology, the amount of unstructured data has grown at an unprecedented rate, and the data types involved now go beyond text and images to include audio, video, and even 3D meshes.

DocArray can greatly simplify the processing and use of such unstructured data.

DocArray is an extensible data structure, well suited to deep learning tasks. It is mainly used for transferring nested, unstructured data, and supports data types including text, image, audio, video, 3D mesh, and more.

Comparison with other data structures:


[Comparison table: each feature is marked as fully supported, partially supported, or not supported]

With DocArray, deep learning engineers can use a Pythonic API to efficiently process, embed, search, recommend, store, and transfer their data.
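A minimal usage sketch based on the DocArray 0.x API (the image path is hypothetical, and the embeddings here are random; in practice they would come from a trained encoder, for example one pretrained with the methods above):

```python
import numpy as np
from docarray import Document, DocumentArray

# A small DocumentArray mixing modalities: plain text plus an image by URI.
da = DocumentArray([
    Document(text='hello, world'),
    Document(uri='toy.png'),                  # hypothetical local image path
])

# Attach embeddings (random here, normally produced by a trained encoder) ...
da.embeddings = np.random.random((2, 128)).astype('float32')

# ... and run a nearest-neighbour match from a query DocumentArray.
queries = DocumentArray([Document(text='hi')])
queries.embeddings = np.random.random((1, 128)).astype('float32')
queries.match(da, limit=2)

print(len(queries[0].matches))                # number of retrieved matches
```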

That's all for this issue of self-supervised learning paper sharing. Which other papers, tutorials, or tool recommendations would you like to see? Leave us a message on our official account, and we will plan each week's posts based on your messages.

Reference links:

Jina GitHub

DocArray

Finetuner

Join Slack
