
Overview of video self supervised learning

2022-07-05 18:40:00 Zhiyuan community

https://arxiv.org/abs/2207.00419

The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, using human-generated annotations leads to models with biased learning, poor domain generalization, and poor robustness. Obtaining annotations is also expensive and requires great effort, which is especially challenging for videos. As an alternative, self-supervised learning provides a way of representation learning that does not require annotations, and it has shown promise in both the image and video domains. Unlike the image domain, learning video representations is more challenging because of the temporal dimension, which introduces motion and other environmental dynamics. This also provides opportunities for ideas exclusive to video that can advance self-supervised learning in the video and multimodal domains. In this survey, we review existing approaches to self-supervised learning in the video domain. We summarize these methods into three categories based on their learning objectives: 1) pretext tasks, 2) generative modeling, and 3) contrastive learning. These approaches also differ in the modalities they use: 1) video, 2) video-audio, 3) video-text, and 4) video-audio-text. We further introduce the commonly used datasets, downstream evaluation tasks, limitations of existing work, and potential future directions in this area.

The requirement of large-scale labeled samples limits the use of deep networks in problems where data is limited and annotation is difficult, for example medical imaging Dargan et al. [2020]. Although pretraining on large-scale labeled datasets such as ImageNet Krizhevsky et al. [2012a] and Kinetics Kay et al. [2017] can indeed improve performance, this approach has several drawbacks, such as annotation cost Yang et al. [2017], Cai et al. [2021], annotation bias Chen and Joo [2021], Rodrigues and Pereira [2018], lack of domain generalization Wang et al. [2021a], Hu et al. [2020], Kim et al. [2021], and lack of robustness Hendrycks and Dietterich [2019], Hendrycks et al. [2021]. Self-supervised learning (SSL) has emerged as a successful approach for pretraining deep models and overcoming some of these problems. It is a promising alternative in which models can be trained on large datasets without requiring labels Jing and Tian [2020], and it generalizes better. SSL trains the model using learning objectives derived from the training samples themselves. This pretrained model is then used as the initialization for the target dataset and fine-tuned with the available labeled samples. Figure 1 shows an overview of this approach.
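To make the two-stage recipe concrete, here is a minimal sketch in PyTorch, assuming a SimCLR-style contrastive objective as one example of a learning objective derived from the samples themselves. The toy VideoEncoder, the augmentations, and all hyperparameters are illustrative placeholders, not the setup of any specific surveyed method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoEncoder(nn.Module):
    # Tiny 3D-conv backbone for illustration; real work would use e.g. a 3D ResNet.
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(32, dim)  # projection head for the contrastive loss

    def forward(self, clips):                 # clips: (B, 3, T, H, W)
        h = self.conv(clips).flatten(1)       # (B, 32)
        return F.normalize(self.proj(h), dim=1)

def nt_xent(z1, z2, tau=0.1):
    # NT-Xent loss: two augmented views of the same clip are positives,
    # every other clip in the batch serves as a negative.
    z = torch.cat([z1, z2], dim=0)                      # (2B, dim), unit norm
    sim = z @ z.t() / tau                               # cosine similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))          # exclude self-pairs
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# Stage 1: self-supervised pretraining on unlabeled clips (no annotations used).
encoder = VideoEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(10):                                     # toy loop; use a real loader
    clips = torch.randn(8, 3, 16, 32, 32)               # stand-in for raw video
    view1 = clips + 0.1 * torch.randn_like(clips)       # noise augmentation
    view2 = clips.flip(-1)                              # horizontal flip
    loss = nt_xent(encoder(view1), encoder(view2))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: initialize from the pretrained encoder and fine-tune on the labeled
# target dataset (a fake 10-class action task here). In practice the projection
# head is often discarded and the backbone feature used; kept simple for brevity.
classifier = nn.Linear(128, 10)
ft_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
labeled = torch.randn(8, 3, 16, 32, 32)
labels = torch.randint(0, 10, (8,))
ft_loss = F.cross_entropy(classifier(encoder(labeled)), labels)
ft_opt.zero_grad(); ft_loss.backward(); ft_opt.step()

The same two-stage structure holds for the other objective families the survey covers: only the Stage 1 loss changes (e.g., a reconstruction loss for generative modeling, or a classification loss over transformations for pretext tasks).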


Copyright notice
This article was created by [Zhiyuan community]. Please include a link to the original article when reposting. Thank you.
https://yzsam.com/2022/186/202207051832477624.html