
Paper Reading (64): Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

2022-08-02 06:39:00 【因吉】

Contents

1 Introduction
2 RTFM

1 Introduction

1.1 Title

ICCV 2021: Weakly-supervised video anomaly detection with robust temporal feature magnitude learning

1.2 Code

Torch: https://github.com/tianyu0207/RTFM

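1.3 Abstract

Weakly supervised video anomaly detection with video-level labels is a typical multi-instance learning (MIL) problem: each video is treated as a bag of frames, and the goal is to decide whether the bag contains abnormal segments. Current methods perform well, but their recognition of positive instances, i.e., the rare abnormal segments in abnormal videos, is largely biased by the dominant negative instances, especially when the anomalies are subtle and differ only slightly from normal events. The problem is worse in the many methods that ignore important temporal dependencies in video.
To address this, the paper proposes Robust Temporal Feature Magnitude learning (RTFM):
1) it trains a feature magnitude learning function to recognize positive instances effectively, which substantially improves the robustness of MIL to the negative instances in abnormal videos;
2) it uses dilated convolutions and self-attention to capture long- and short-range temporal dependencies, so that feature magnitudes are learned more reliably.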
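To make point 2) concrete, here is a minimal NumPy sketch of a dilated 1-D convolution along the time axis. It is only an illustration of the general technique, not the authors' implementation: the function name `dilated_conv1d`, the all-ones kernel, and the dilation rates 1, 2, 4 are assumptions chosen for the demo.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D convolution over time with zero padding.

    x:        (T, D) per-frame features
    kernel:   (K,) weights, K odd, shared across all D channels (a simplification)
    dilation: gap between the frames that one kernel application looks at
    """
    T, _ = x.shape
    K = len(kernel)
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros_like(x, dtype=float)
    for k in range(K):
        # tap k reads the frame at offset (k - (K - 1) / 2) * dilation
        out += kernel[k] * xp[k * dilation : k * dilation + T]
    return out

# A single "spike" at frame 16 shows which frames each branch connects.
x = np.zeros((33, 1))
x[16, 0] = 1.0
reach = {d: np.flatnonzero(dilated_conv1d(x, np.ones(3), d)[:, 0]).tolist()
         for d in (1, 2, 4)}  # illustrative dilation rates
```

With the same three-tap kernel, a larger dilation links frames that are further apart (`reach[4]` touches frames 12, 16 and 20), which is why several dilation rates together can cover both short and long temporal ranges.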

1.4 Bib

@inproceedings{Tian:2021:49754986,
author		=	{Yu Tian and Guan Song Pang and Yuan Hong Chen and Rajvinder Singh and Johan W Verjans and Gustavo Carneiro},
title		=	{Weakly-supervised video anomaly detection with robust temporal feature magnitude learning},
booktitle	=	{{IEEE/CVF} International Conference on Computer Vision},
pages		=	{4975--4986},
year		=	{2021},
url			=	{https://openaccess.thecvf.com/content/ICCV2021/html/Tian_Weakly-Supervised_Video_Anomaly_Detection_With_Robust_Temporal_Feature_Magnitude_Learning_ICCV_2021_paper.html}
}

2 RTFM

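RTFM aims to separate abnormal videos from normal ones as far as possible using only weakly labeled videos. We are given a training set $\mathcal{D}=\{(\mathbf{F}_i,y_i)\}_{i=1}^{|\mathcal{D}|}$, where $\mathbf{F}\in\mathcal{F}\subset\mathbb{R}^{T\times D}$ holds pre-computed $D$-dimensional features (e.g., from I3D or C3D) for $T$ video frames, and $y\in\mathcal{Y}=\{0,1\}$ is the video-level label, with $1$ for abnormal and $0$ for normal. Let $r_{\theta,\phi}(\mathbf{F})=f_{\phi}(s_\theta(\mathbf{F}))$ denote the RTFM model, which returns a $T$-dimensional score vector in $[0,1]^T$ indicating how likely each frame is to be abnormal.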
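The composition $r_{\theta,\phi}=f_\phi\circ s_\theta$ and its shapes can be sketched as follows. This is a shape-level illustration only: the bodies of `s_theta` and `f_phi` are placeholder assumptions (a per-frame linear map with ReLU and a logistic scorer), not the networks used in the paper, and the sizes `T` and `D` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 32, 16  # hypothetical number of frames and feature dimension

def s_theta(F, W):
    """Placeholder feature extractor: (T, D) -> (T, D)."""
    return np.maximum(F @ W, 0.0)

def f_phi(X, w, b):
    """Placeholder frame-level classifier: (T, D) -> (T,) scores in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def r(F, W, w, b):
    """r(F) = f_phi(s_theta(F)): one anomaly score per frame."""
    return f_phi(s_theta(F, W), w, b)

F = rng.normal(size=(T, D))             # stand-in for pre-computed I3D/C3D features
W = rng.normal(size=(D, D)) / np.sqrt(D)
w = rng.normal(size=D) / np.sqrt(D)
scores = r(F, W, w, b=0.0)              # shape (T,), each entry in [0, 1]
```

Only the video-level label $y$ is available during training, so per-frame scores like `scores` are never compared against frame-level ground truth directly.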
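Training jointly optimizes three components end to end: multi-scale temporal feature learning, feature magnitude learning, and an MIL classifier. The loss function is:
$$\min_{\theta,\phi}\sum_{i,j=1}^{|\mathcal{D}|}\ell_s\big(s_\theta(\mathbf{F}_i),s_\theta(\mathbf{F}_j),y_i,y_j\big)+\ell_f\big(f_\phi(s_\theta(\mathbf{F}_i)),y_i\big),$$
where $s_\theta: \mathcal{F}\rightarrow\mathcal{X}\subset\mathbb{R}^{T\times D}$ is the temporal feature extractor, $f_\phi: \mathcal{X}\rightarrow[0,1]^T$ is a frame-level classifier, $$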