Overview of video self-supervised learning
2022-07-05 18:40:00 【Zhiyuan community】
https://arxiv.org/abs/2207.00419
The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, using human-generated annotations leads to biased model learning, poor domain generalization, and weak robustness. Obtaining annotations is also expensive and labor-intensive, which is particularly challenging for video. As an alternative, self-supervised learning provides a way to learn representations without annotations and has shown promise in both the image and video domains. In contrast to the image domain, learning video representations is more challenging because the temporal dimension introduces motion and other environmental dynamics. This also creates opportunities for video-exclusive ideas that advance self-supervised learning in the video and multimodal domains. In this survey, we review existing approaches to self-supervised learning in the video domain. We summarize these methods into three categories according to their learning objectives: 1) pretext tasks, 2) generative modeling, and 3) contrastive learning. These methods also differ in the modalities they use: 1) video, 2) video-audio, 3) video-text, and 4) video-audio-text. We further introduce commonly used datasets, downstream evaluation tasks, limitations of existing work, and potential future directions in this field.
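Of the three learning objectives named above, contrastive learning is easy to make concrete. The following minimal PyTorch sketch (our own illustration under stated assumptions, not code from the paper) computes an InfoNCE-style loss over clip embeddings: two augmented views of the same video form a positive pair, and the other clips in the batch serve as negatives. All names, shapes, and the temperature value are illustrative.

    import torch
    import torch.nn.functional as F

    def info_nce_loss(z1, z2, temperature=0.07):
        # z1, z2: (batch, dim) embeddings; row i of z1 and row i of z2 are
        # two augmented views of the same clip (a positive pair), while all
        # other rows in the batch act as negatives.
        z1 = F.normalize(z1, dim=1)
        z2 = F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature          # pairwise cosine similarities
        targets = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, targets)     # match each view to its own pair

    # Toy usage with random embeddings for a batch of 8 clips.
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(info_nce_loss(z1, z2))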
The requirement of large-scale labeled samples limits the use of deep networks in problems with limited data and difficult annotation, for example medical imaging Dargan et al. [2020]. Although pre-training on large-scale labeled datasets such as ImageNet Krizhevsky et al. [2012a] and Kinetics Kay et al. [2017] does improve performance, this approach has drawbacks, such as annotation cost Yang et al. [2017], Cai et al. [2021], annotation bias Chen and Joo [2021], Rodrigues and Pereira [2018], lack of domain generalization Wang et al. [2021a], Hu et al. [2020], Kim et al. [2021], and lack of robustness Hendrycks and Dietterich [2019], Hendrycks et al. [2021]. Self-supervised learning (SSL) has emerged as a successful approach to pre-training deep models that overcomes some of these problems. It is a promising alternative in which models can be trained on large datasets without requiring labels Jing and Tian [2020], and it generalizes better. SSL trains a model using learning objectives derived from the training samples themselves. The pre-trained model is then used as the initialization for the target dataset and fine-tuned with the available labeled samples. Figure 1 shows an overview of this approach.
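To make this two-stage workflow concrete, here is a minimal, self-contained PyTorch sketch of self-supervised pre-training followed by fine-tuning; it is a hedged illustration, not the paper's implementation. The backbone (VideoEncoder), the chosen pretext task (classifying whether a clip's frames were temporally shuffled), the synthetic tensors standing in for data loaders, and all hyperparameters are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical tiny 3D-CNN backbone; a real system would use something
    # larger, e.g. a 3D ResNet. Input shape: (batch, channels, time, H, W).
    class VideoEncoder(nn.Module):
        def __init__(self, out_dim=128):
            super().__init__()
            self.out_dim = out_dim
            self.net = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, out_dim))
        def forward(self, x):
            return self.net(x)

    encoder = VideoEncoder()

    # Stage 1: self-supervised pre-training. The (assumed) pretext task is
    # binary: were the clip's frames temporally shuffled? The label comes
    # from the data itself, so no human annotation is needed.
    pretext_head = nn.Linear(encoder.out_dim, 2)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(pretext_head.parameters()))
    clips = torch.randn(8, 3, 16, 32, 32)       # stand-in for an unlabeled video batch
    shuffled = clips[:, :, torch.randperm(16)]  # permute the time axis
    x = torch.cat([clips, shuffled])
    y = torch.cat([torch.zeros(8), torch.ones(8)]).long()
    loss = F.cross_entropy(pretext_head(encoder(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: the pre-trained encoder initializes the target model, which
    # is then fine-tuned with whatever labeled samples are available.
    classifier = nn.Linear(encoder.out_dim, 10) # e.g. 10 target action classes
    opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()))
    labeled_x = torch.randn(4, 3, 16, 32, 32)   # stand-in for a small labeled set
    labeled_y = torch.randint(0, 10, (4,))
    loss = F.cross_entropy(classifier(encoder(labeled_x)), labeled_y)
    opt.zero_grad(); loss.backward(); opt.step()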