Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
The source code for the ICCV 2021 paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition.
[paper] [supplemental material] [arXiv]
If you find our work or the codebase inspiring and useful to your research, please cite
@inproceedings{yuan2021DIN,
title={Spatio-Temporal Dynamic Inference Network for Group Activity Recognition},
author={Yuan, Hangjie and Ni, Dong and Wang, Mang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={7476--7485},
year={2021}
}
Dependencies
- Software Environment: Linux (CentOS 7)
- Hardware Environment: NVIDIA TITAN RTX
- Python 3.6
- PyTorch 1.2.0, Torchvision 0.4.0
- RoIAlign for Pytorch
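A quick way to compare an environment against the versions listed above is a small check script. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `version_report` are hypothetical, and it assumes the packages expose a `__version__` attribute under the import names `torch` and `torchvision`.

```python
import importlib
import sys

# Versions listed in the Dependencies section.
EXPECTED = {"torch": "1.2.0", "torchvision": "0.4.0"}
EXPECTED_PYTHON = (3, 6)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_report(expected):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Builds may append a local tag such as "+cu92", so compare the part before "+".
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    print("python", sys.version_info[:2], "expected", EXPECTED_PYTHON)
    for name, (wanted, found, ok) in version_report(EXPECTED).items():
        print(name, "expected", wanted, "found", found, "OK" if ok else "MISMATCH")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones the authors report using.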
Prepare Datasets
- Download the publicly available datasets from the following links: Volleyball dataset and Collective Activity dataset.
- Unzip the dataset file into data/volleyball or data/collective.
- Download the file tracks_normalized.pkl from cvlab-epfl/social-scene-understanding and put it into data/volleyball/videos.
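The layout produced by the steps above can be verified before training. This is a minimal sketch under the assumption that the paths are relative to PROJECT_PATH; the helper `missing_paths` and the `REQUIRED` table are hypothetical names, and only the paths named in the steps above are checked.

```python
import os

# Paths named in the preparation steps, relative to PROJECT_PATH.
REQUIRED = {
    "volleyball": [
        os.path.join("data", "volleyball"),
        os.path.join("data", "volleyball", "videos", "tracks_normalized.pkl"),
    ],
    "collective": [os.path.join("data", "collective")],
}


def missing_paths(project_path, dataset):
    """Return the required paths for `dataset` that do not exist under project_path."""
    return [
        rel for rel in REQUIRED[dataset]
        if not os.path.exists(os.path.join(project_path, rel))
    ]


if __name__ == "__main__":
    for dataset in REQUIRED:
        missing = missing_paths(".", dataset)
        print(dataset, "ready" if not missing else "missing: " + ", ".join(missing))
```

Run it from PROJECT_PATH; an empty result for a dataset means every path named above exists, not that the dataset contents are complete.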
Using Docker
- Check out the repository and cd PROJECT_PATH
- Build the Docker container
  docker build -t din_gar https://github.com/JacobYuan7/DIN_GAR.git#main
- Run the Docker container
  docker run --shm-size=2G -v "$(pwd)/data/volleyball":/opt/DIN_GAR/data/volleyball -v "$(pwd)/result":/opt/DIN_GAR/result --rm -it din_gar
  - --shm-size=2G: Extends the container's shared memory size to prevent "ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).". Alternatively, use --ipc=host.
  - -v "$(pwd)/data/volleyball":/opt/DIN_GAR/data/volleyball: Makes the host's folder data/volleyball available inside the container at /opt/DIN_GAR/data/volleyball. Docker only bind-mounts a host folder when its path is absolute (a relative value is treated as a volume name), hence $(pwd).
  - -v "$(pwd)/result":/opt/DIN_GAR/result: Makes the host's folder result available inside the container at /opt/DIN_GAR/result.
  - -it and --rm: Starts the container with an interactive session (PROJECT_PATH is /opt/DIN_GAR) and removes the container after closing the session.
  - din_gar: The name/tag of the image.
  - Optional: --gpus='"device=7"' restricts the GPU devices the container can access.
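The flags described above can also be assembled programmatically, which makes it easy to resolve the host folders to absolute paths. The following is a minimal sketch, assuming a POSIX host; the function `docker_run_command` and its parameters are hypothetical and not part of the repository, while the flags, image name, and container paths are the ones listed above.

```python
import os
import shlex


def docker_run_command(project_path, image="din_gar", shm_size="2G", gpus=None):
    """Assemble the `docker run` argument list described above."""
    project_path = os.path.abspath(project_path)
    args = ["docker", "run", "--shm-size=" + shm_size]
    # Mount the dataset and result folders at the same relative location
    # inside the container's project directory /opt/DIN_GAR.
    for folder in ("data/volleyball", "result"):
        host = os.path.join(project_path, *folder.split("/"))
        args += ["-v", host + ":/opt/DIN_GAR/" + folder]
    if gpus is not None:
        # Optional GPU restriction, e.g. gpus='"device=7"'.
        args.append("--gpus=" + gpus)
    args += ["--rm", "-it", image]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line instead of running it.
    print(" ".join(shlex.quote(a) for a in docker_run_command(".")))
```

The sketch only prints the command; passing the returned list to `subprocess.run` would start the container.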
Get Started
- Train the Base Model: Fine-tune the base model for the dataset.
  # Volleyball dataset
  cd PROJECT_PATH
  python scripts/train_volleyball_stage1.py
  # Collective Activity dataset
  cd PROJECT_PATH
  python scripts/train_collective_stage1.py
- Train with the reasoning module: Append the reasoning modules onto the base model to get a reasoning model.
  - Volleyball dataset
    - DIN
      python scripts/train_volleyball_stage2_dynamic.py
    - lite DIN
      To run the lite version of DIN, set cfg.lite_dim = 128 in scripts/train_volleyball_stage2_dynamic.py.
      python scripts/train_volleyball_stage2_dynamic.py
    - ST-factorized DIN
      To run ST-factorized DIN, set cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.hierarchical_inference = True. Note that if you set cfg.hierarchical_inference = False, cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.num_DIN = 2, then multiple interaction fields run in parallel.
      python scripts/train_volleyball_stage2_dynamic.py
    Other models re-implemented by us according to their papers or publicly available code:
    - AT
      python scripts/train_volleyball_stage2_at.py
    - PCTDM
      python scripts/train_volleyball_stage2_pctdm.py
    - SACRF
      python scripts/train_volleyball_stage2_sacrf_biute.py
    - ARG
      python scripts/train_volleyball_stage2_arg.py
    - HiGCIN
      python scripts/train_volleyball_stage2_higcin.py
  - Collective Activity dataset
    - DIN
      python scripts/train_collective_stage2_dynamic.py
    - lite DIN
      To run the lite version of DIN, set cfg.lite_dim = 128 in scripts/train_collective_stage2_dynamic.py.
      python scripts/train_collective_stage2_dynamic.py
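The DIN variants above differ only in a few config fields, which can be summarized side by side. This is a minimal sketch using a stand-in object rather than the repository's own config class: `base_cfg`, `make_cfg`, the variant labels, and all default values are assumptions made for illustration, while the field names and the values set per variant are the ones quoted above.

```python
from types import SimpleNamespace


def base_cfg():
    """Stand-in for the config object edited in the stage 2 scripts (hypothetical defaults)."""
    return SimpleNamespace(
        lite_dim=None,                 # placeholder: lite version disabled
        ST_kernel_size=[(3, 3)],       # placeholder, not taken from the repository
        hierarchical_inference=False,  # placeholder
        num_DIN=1,                     # placeholder
    )


def make_cfg(variant):
    """Apply the settings described above for one variant label."""
    cfg = base_cfg()
    if variant == "lite":
        cfg.lite_dim = 128
    elif variant == "st_factorized":
        cfg.ST_kernel_size = [(1, 3), (3, 1)]
        cfg.hierarchical_inference = True
    elif variant == "parallel":
        # Multiple interaction fields run in parallel.
        cfg.ST_kernel_size = [(1, 3), (3, 1)]
        cfg.hierarchical_inference = False
        cfg.num_DIN = 2
    elif variant != "din":
        raise ValueError("unknown variant: " + variant)
    return cfg


if __name__ == "__main__":
    for name in ("din", "lite", "st_factorized", "parallel"):
        print(name, vars(make_cfg(name)))
```

In the repository itself these fields are set directly inside the corresponding stage 2 training script before it is run.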
Another work done by us, solving GAR from the perspective of incorporating visual context, is also available.
@inproceedings{yuan2021visualcontext,
title={Learning Visual Context for Group Activity Recognition},
author={Yuan, Hangjie and Ni, Dong},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={4},
pages={3261--3269},
year={2021}
}