Overview of video self-supervised learning
2022-07-05 18:40:00 【Zhiyuan community】
https://arxiv.org/abs/2207.00419
The remarkable success of deep learning in various fields depends on the availability of large-scale annotated datasets. However, relying on human-generated annotations leads to biased learning, poor domain generalization, and poor robustness. Obtaining annotations is also expensive and labor-intensive, which is particularly challenging for video. As an alternative, self-supervised learning provides a way to learn representations without annotations, and it has shown promise in both the image and video domains. Unlike in the image domain, learning video representations is more challenging because the temporal dimension introduces motion and other environmental dynamics. This also creates opportunities for video-exclusive ideas that advance self-supervised learning in the video and multimodal domains. In this survey, we review existing methods for self-supervised learning in the video domain. We group these methods into three categories according to their learning objectives: 1) pretext tasks, 2) generative modeling, and 3) contrastive learning. The methods also differ in the modalities they use: 1) video, 2) video-audio, 3) video-text, and 4) video-audio-text. We further introduce the commonly used datasets, downstream evaluation tasks, limitations of existing work, and potential future directions in this field.
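To make the third category listed above (contrastive learning) concrete, here is a minimal sketch of an InfoNCE-style contrastive loss in plain Python. It is an illustration under stated assumptions, not the formulation of any particular method covered by the survey: the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are all hypothetical, and a real system would compute embeddings with a video encoder and use a tensor library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding.

    The loss is low when the anchor is more similar to its positive
    (e.g. another clip from the same video) than to the negatives
    (e.g. clips from other videos). The temperature is an assumed value.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
    misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
    print(aligned < misaligned)
```

Minimizing this quantity pulls the anchor toward its positive and pushes it away from the negatives, which is the general idea behind contrastive objectives; no labels are needed because the positive pair comes from the data itself.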
The need for large-scale labeled samples limits the use of deep networks in problems where data is limited and annotation is difficult, such as medical imaging Dargan et al. [2020]. Although pre-training on large-scale labeled datasets such as ImageNet Krizhevsky et al. [2012a] and Kinetics Kay et al. [2017] does improve performance, this approach has drawbacks, including annotation cost Yang et al. [2017], Cai et al. [2021], annotation bias Chen and Joo [2021], Rodrigues and Pereira [2018], lack of domain generalization Wang et al. [2021a], Hu et al. [2020], Kim et al. [2021], and lack of robustness Hendrycks and Dietterich [2019], Hendrycks et al. [2021]. Self-supervised learning (SSL) has emerged as a successful way to pre-train deep models and overcome some of these problems. It is a promising alternative: models can be trained on large datasets without labels Jing and Tian [2020], and they generalize better. SSL trains the model with a learning objective derived from the training samples themselves. The pre-trained model is then used as the initialization for the target dataset and fine-tuned with the available labeled samples. Figure 1 shows an overview of this approach.
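The idea that the learning objective comes from the training samples themselves can be sketched with a simple pretext task. The example below builds a temporal-order verification sample: a clip is either kept in order or shuffled, and the label records which. This is a minimal sketch under assumptions, not the pipeline of any specific paper; the function name `make_order_sample` and the use of frame names in place of real image data are hypothetical.

```python
import random


def make_order_sample(frames, rng):
    """Build one (clip, label) pair for a temporal-order pretext task.

    label 1: frames are kept in their original temporal order.
    label 0: frames are shuffled into a different order.
    The label is derived from the data itself, so no human annotation is needed.
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to define an order")
    order = list(range(n))
    label = 1
    if rng.random() < 0.5:
        label = 0
        # Reshuffle until the order actually differs from the original.
        while order == sorted(order):
            rng.shuffle(order)
    return [frames[i] for i in order], label


if __name__ == "__main__":
    rng = random.Random(0)
    video = ["frame0", "frame1", "frame2", "frame3"]  # stand-ins for real frames
    for _ in range(3):
        clip, label = make_order_sample(video, rng)
        print(label, clip)
```

A model pre-trained to predict such labels would then serve as the initialization that is fine-tuned on the labeled target dataset, as described above.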