Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Overview

Temporal Segment Networks (TSN)

We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes implementation for TSN as well as other STOA frameworks for various tasks. We highly recommend you switch to it. This repo will keep on being suppported for Caffe users.

This repository holds the codes and models for the papers

Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018.

[Arxiv Preprint]

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, ECCV 2016, Amsterdam, Netherlands.

[Arxiv Preprint]

News & Updates

Jul. 20, 2018 - For those having trouble building the TSN toolkit, we have provided a built docker image you can use. Download it from DockerHub. It contains OpenCV, Caffe, DenseFlow, and this codebase. All built and ready to use with NVIDIA-Docker

Sep. 8, 2017 - We released TSN models trained on the Kinetics dataset with 76.6% single model top-1 accuracy. Find the model weights and transfer learning experiment results on the website.

Aug 10, 2017 - An experimental pytorch implementation of TSN is released github

Nov. 5, 2016 - The project page for TSN is online. website

Sep. 14, 2016 - We fixed a legacy bug in Caffe. Some parameters in TSN training are affected. You are advised to update to the latest version.

FAQ, How to add a custom dataset

Below is the guidance to reproduce the reported results and explore more.

Contents


Usage Guide

Prerequisites

[back to top]

There are a few dependencies to run the code. The major libraries we use are

The codebase is written in Python. We recommend the Anaconda Python distribution. Matlab scripts are provided for some critical steps like video-level testing.

The most straightforward method to install these libraries is to run the build-all.sh script.

Besides software, GPU(s) are required for optical flow extraction and model training. Our Caffe modification supports highly efficient parallel training. Just throw in as many GPUs as you like and enjoy.

Code & Data Preparation

Get the code

[back to top]

Use git to clone this repository and its submodules

git clone --recursive https://github.com/yjxiong/temporal-segment-networks

Then run the building scripts to build the libraries.

bash build_all.sh

It will build Caffe and dense_flow. Since we need OpenCV to have Video IO, which is absent in most default installations, it will also download and build a local installation of OpenCV and use its Python interfaces.

Note that to run training with multiple GPUs, one needs to enable MPI support of Caffe. To do this, run

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

Get the videos

[back to top]

We experimented on two mainstream action recognition datasets: UCF-101 and HMDB51. Videos can be downloaded directly from their websites. After download, please extract the videos from the rar archives.

  • UCF101: the ucf101 videos are archived in the downloaded file. Please use unrar x UCF101.rar to extract the videos.
  • HMDB51: the HMDB51 video archive has two-level of packaging. The following commands illustrate how to extract the videos.
mkdir rars && mkdir videos
unrar x hmdb51-org.rar rars/
for a in $(ls rars); do unrar x "rars/${a}" videos/; done;

Get trained models

[back to top]

We provided the trained model weights in Caffe style, consisting of specifications in Protobuf messages, and model weights. In the codebase we provide the model spec for UCF101 and HMDB51. The model weights can be downloaded by running the script

bash scripts/get_reference_models.sh

Extract Frames and Optical Flow Images

[back to top]

To run the training and testing, we need to decompose the video into frames. Also the temporal stream networks need optical flow or warped optical flow images for input.

These can be achieved with the script scripts/extract_optical_flow.sh. The script has three arguments

  • SRC_FOLDER points to the folder where you put the video dataset
  • OUT_FOLDER points to the root folder where the extracted frames and optical images will be put in
  • NUM_WORKER specifies the number of GPU to use in parallel for flow extraction, must be larger than 1

The command for running optical flow extraction is as follows

bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER

It will take from several hours to several days to extract optical flows for the whole datasets, depending on the number of GPUs.

Testing Provided Models

Get reference models

[back to top]

To help reproduce the results reported in the paper, we provide reference models trained by us for instant testing. Please use the following command to get the reference models.

bash scripts/get_reference_models.sh

Video-level testing

[back to top]

We provide a Python framework to run the testing. For the benchmark datasets, we will measure average accuracy on the testing splits. We also provide the facility to analyze a single video.

Generally, to test on the benchmark dataset, we can use the scripts eval_net.py and eval_scores.py.

For example, to test the reference rgb stream model on split 1 of ucf 101 with 4 GPUs, run

python tools/eval_net.py ucf101 1 rgb FRAME_PATH \
 models/ucf101/tsn_bn_inception_rgb_deploy.prototxt models/ucf101_split_1_tsn_rgb_reference_bn_inception.caffemodel \
 --num_worker 4 --save_scores SCORE_FILE

where FRAME_PATH is the path you extracted the frames of UCF-101 to and SCORE_FILE is the filename to store the extracted scores.

One can also use cached score files to evaluate the performance. To do this, issue the following command

python tools/eval_scores.py SCORE_FILE

The more important function of eval_scores.py is to do modality fusion. For example, once we got the scores of rgb stream in RGB_SCORE_FILE and flow stream in FLOW_SCORE_FILE. The fusion result with weights of 1:1.5 can be achieved with

python tools/eval_scores.py RGB_SCORE_FILE FLOW_SCORE_FILE --score_weights 1 1.5

To view the full help message of these scripts, run python eval_net.py -h or python eval_scores.py -h.

Training Temporal Segment Networks

[back to top]

Training TSN is straightforward. We have provided the necessary model specs, solver configs, and initialization models. To achieve optimal training speed, we strongly advise you to turn on the parallel training support in the Caffe toolbox using following build command

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

where root path to openmpi installation points to the installation of the OpenMPI, for example /usr/local/openmpi/.

Construct file lists for training and validation

[back to top]

The data feeding in training relies on VideoDataLayer in Caffe. This layer uses a list file to specify its data sources. Each line of the list file will contain a tuple of extracted video frame path, video frame number, and video groundtruth class. A list file looks like

video_frame_path 100 10
video_2_frame_path 150 31
...

To build the file lists for all 3 splits of the two benchmark dataset, we have provided a script. Just use the following command

bash scripts/build_file_list.sh ucf101 FRAME_PATH

and

bash scripts/build_file_list.sh hmdb51 FRAME_PATH

The generated list files will be put in data/ with names like ucf101_flow_val_split_2.txt.

Get initialization models

[back to top]

We have built the initialization model weights for both rgb and flow input. The flow initialization models implements the cross-modality training technique in the paper. To download the model weights, run

bash scripts/get_init_models.sh

Start training

[back to top]

Once all necessities ready, we can start training TSN. For this, use the script scripts/train_tsn.sh. For example, the following command runs training on UCF101 with rgb input

bash scripts/train_tsn.sh ucf101 rgb

the training will run with default settings on 4 GPUs. Usually, it takes around 1 hours to train the rgb model and 4 hours for flow models, on 4 GTX Titan X GPUs.

The learned model weights will be saved in models/. The aforementioned testing process can be used to evaluate them.

Config the training process

[back to top]

Here we provide some information on customizing the training process

  • Change split: By default, the training is conducted on split 1 of the datasets. To change the split, one can modify corresponding model specs and solver files. For example, to train on split 2 of UCF101 with rgb input, one can modify the file models/ucf101/tsn_bn_inception_rgb_train_val.prototxt. On line 8, change
source: "data/ucf101_rgb_train_split_1.txt"`

to

`source: "data/ucf101_rgb_train_split_2.txt"`

On line 34, change

source: "data/ucf101_rgb_val_split_1.txt"

to

source: "data/ucf101_rgb_val_split_2.txt"

Also, in the solver file models/ucf101/tsn_bn_inception_rgb_solver.prototxt, on line 12 change

snapshot_prefix: "models/ucf101_split1_tsn_rgb_bn_inception"

to

snapshot_prefix: "models/ucf101_split2_tsn_rgb_bn_inception"

in order to distiguish the learned weights.

  • Change GPU number, in general, one can use any number of GPU to do the training. To use more or less GPU, one can change the N_GPU in scripts/train_tsn.sh. Important notice: when the GPU number is changed, the effective batchsize is also changed. It's better to always make sure the effective batchsize, which equals to batch_size*iter_size*n_gpu, to be 128. Here, batch_size is the number in the model's prototxt, for example line 9 in models/ucf101/tsn_bn_inception_rgb_train_val.protoxt.

Other Info

[back to top]

Citation

Please cite the following paper if you feel this repository useful.

@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and
               Yuanjun Xiong and
               Zhe Wang and
               Yu Qiao and
               Dahua Lin and
               Xiaoou Tang and
               Luc {Val Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle   = {ECCV},
  year      = {2016},
}

Related Projects

Contact

For any question, please contact

Yuanjun Xiong: [email protected]
Limin Wang: [email protected]
Owner
Young and simple. [email protected] -> Amazon Rekognition. We are hiring summer interns for 20
[ICCV 2021] Official PyTorch implementation for Deep Relational Metric Learning.

Ranking Models in Unlabeled New Environments Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch 1.7.0 + torchivision 0.8.1

Borui Zhang 39 Dec 10, 2022
Rule based classification A hotel s customers dataset

Rule-based-classification-A-hotel-s-customers-dataset- Aim: Categorize new customers by segment and predict how much revenue they can generate This re

Şebnem 4 Jan 02, 2022
利用python脚本实现微信、支付宝账单的合并,并保存到excel文件实现自动记账,可查看可视化图表。

KeepAccounts_v2.0 KeepAccounts.exe和其配套表格能够实现微信、支付宝官方导出账单的读取合并,为每笔帐标记类型,并按月份和类型生成可视化图表。再也不用消费一笔记一笔,每月仅需10分钟,记好所有的帐。 作者: MickLife Bilibili: https://spac

159 Jan 01, 2023
PSPNet in Chainer

PSPNet This is an unofficial implementation of Pyramid Scene Parsing Network (PSPNet) in Chainer. Training Requirement Python 3.4.4+ Chainer 3.0.0b1+

Shunta Saito 76 Dec 12, 2022
The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection Updates | Introduction | Results | Usage | Citation |

33 Jan 05, 2023
Python package for downloading ECMWF reanalysis data and converting it into a time series format.

ecmwf_models Readers and converters for data from the ECMWF reanalysis models. Written in Python. Works great in combination with pytesmo. Citation If

TU Wien - Department of Geodesy and Geoinformation 31 Dec 26, 2022
This is the code repository implementing the paper "TreePartNet: Neural Decomposition of Point Clouds for 3D Tree Reconstruction".

TreePartNet This is the code repository implementing the paper "TreePartNet: Neural Decomposition of Point Clouds for 3D Tree Reconstruction". Depende

刘彦超 34 Nov 30, 2022
Implementation for the paper SMPLicit: Topology-aware Generative Model for Clothed People (CVPR 2021)

SMPLicit: Topology-aware Generative Model for Clothed People [Project] [arXiv] License Software Copyright License for non-commercial scientific resear

Enric Corona 225 Dec 13, 2022
Pytorch implementation of the Variational Recurrent Neural Network (VRNN).

VariationalRecurrentNeuralNetwork Pytorch implementation of the Variational RNN (VRNN), from A Recurrent Latent Variable Model for Sequential Data. Th

emmanuel 251 Dec 17, 2022
Exponential Graph is Provably Efficient for Decentralized Deep Training

Exponential Graph is Provably Efficient for Decentralized Deep Training This code repository is for the paper Exponential Graph is Provably Efficient

3 Apr 20, 2022
Junction Tree Variational Autoencoder for Molecular Graph Generation (ICML 2018)

Junction Tree Variational Autoencoder for Molecular Graph Generation Official implementation of our Junction Tree Variational Autoencoder https://arxi

Wengong Jin 418 Jan 07, 2023
A tiny, pedagogical neural network library with a pytorch-like API.

candl A tiny, pedagogical implementation of a neural network library with a pytorch-like API. The primary use of this library is for education. Use th

Sri Pranav 3 May 23, 2022
The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution.

WSRGlow The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution. Audio sa

Kexun Zhang 96 Jan 03, 2023
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Here is deepparse. Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning. Use deepparse to Use the pr

GRAAL/GRAIL 192 Dec 20, 2022
Add-on for importing and auto setup of character creator 3 character exports.

CC3 Blender Tools An add-on for importing and automatically setting up materials for Character Creator 3 character exports. Using Blender in the Chara

260 Jan 05, 2023
CMSC320 - Introduction to Data Science - Fall 2021

CMSC320 - Introduction to Data Science - Fall 2021 Instructors: Elias Jonatan Gonzalez and José Manuel Calderón Trilla Lectures: MW 3:30-4:45 & 5:00-6

Introduction to Data Science 6 Sep 12, 2022
Gym for multi-agent reinforcement learning

PettingZoo is a Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gym. Our website, with

Farama Foundation 1.6k Jan 09, 2023
The code for our NeurIPS 2021 paper "Kernelized Heterogeneous Risk Minimization".

Kernelized-HRM Jiashuo Liu, Zheyuan Hu The code for our NeurIPS 2021 paper "Kernelized Heterogeneous Risk Minimization"[1]. This repo contains the cod

Liu Jiashuo 8 Nov 20, 2022
Tooling for converting STAC metadata to ODC data model

手语识别 0、使用到的模型 (1). openpose,作者:CMU-Perceptual-Computing-Lab https://github.com/CMU-Perceptual-Computing-Lab/openpose (2). 图像分类classification,作者:Bubbl

Open Data Cube 65 Dec 20, 2022
Automatic Image Background Subtraction

Automatic Image Background Subtraction This repo contains set of scripts for automatic one-shot image background subtraction task using the following

Oleg Sémery 6 Dec 05, 2022