This repository is the offical Pytorch implementation of ContextPose: Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021).

Overview

Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021)

Introduction

This repository is the offical Pytorch implementation of ContextPose, Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021). Below is the example pipeline of using ContextPose for 3D pose estimation. overall pipeline

Quick start

Environment

This project is developed using >= python 3.5 on Ubuntu 16.04. NVIDIA GPUs are needed.

Installation

  1. Clone this repo, and we'll call the directory that you cloned as ${ContextPose_ROOT}.
  2. Install dependences.
    1. Install pytorch >= v1.4.0 following official instruction.
    2. Install other packages. This project doesn't have any special or difficult-to-install dependencies. All installation can be down with:
    pip install -r requirements.txt
  3. Download data following the next section. In summary, your directory tree should be like this
${ContextPose_ROOT}
├── data
├── experiments
├── mvn
├── logs 
├── README.md
├── process_h36m.sh
├── requirements.txt
├── train.py
`── train.sh

Data

Note: We provide the training and evaluation code on Human3.6M dataset. We do NOT provide the source data. We do NOT own the data or have permission to redistribute the data. Please download according to the official instructions.

Human3.6M

  1. Install CDF C Library by following (https://stackoverflow.com/questions/37232008/how-read-common-data-format-cdf-in-python/58167429#58167429), which is neccessary for processing Human3.6M data.
  2. Download and preprocess the dataset by following the instructions in mvn/datasets/human36m_preprocessing/README.md.
  3. To train ContextPose model, you need rough estimations of the pelvis' 3D positions both for train and val splits. In the paper we use the precalculated 3D skeletons estimated by the Algebraic model proposed in learnable-triangulation (which is an opensource repo and we adopt their Volumetric model to be our baseline.) All pretrained weights and precalculated 3D skeletons can be downloaded at once from here and placed to ./data/pretrained. Here, we fine-tuned the pretrained weight on the Human3.6M dataset for another 20 epochs, please download the weight from here and place to ./data/pretrained/human36m.
  4. We provide the limb length mean and standard on the Human3.6M training set, please download from here and place to ./data/human36m/extra.
  5. Finally, your data directory should be like this (for more detailed directory tree, please refer to README.md)
${ContextPose_ROOT}
|-- data
    |-- human36m
    |   |-- extra
    |   |   | -- una-dinosauria-data
    |   |   | -- ...
    |   |   | -- mean_and_std_limb_length.h5
    |   `-- ...
    `-- pretrained
        |-- human36m
            |-- human36m_alg_10-04-2019
            |-- human36m_vol_softmax_10-08-2019
            `-- backbone_weights.pth

Train

Every experiment is defined by .config files. Configs with experiments from the paper can be found in the ./experiments directory. You can use the train.sh script or specifically:

Single-GPU

To train a Volumetric model with softmax aggregation using 1 GPU, run:

python train.py \
  --config experiments/human36m/train/human36m_vol_softmax_single.yaml \
  --logdir ./logs

The training will start with the config file specified by --config, and logs (including tensorboard files) will be stored in --logdir.

Multi-GPU

Multi-GPU training is implemented with PyTorch's DistributedDataParallel. It can be used both for single-machine and multi-machine (cluster) training. To run the processes use the PyTorch launch utility.

To train our model using 4 GPUs on single machine, run:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=2345 --sync_bn\
  train.py  \
  --config experiments/human36m/train/human36m_vol_softmax_single.yaml \
  --logdir ./logs

Evaluation

After training, you can evaluate the model. Inside the same config file, add path to the learned weights (they are dumped to logs dir during training):

model:
    init_weights: true
    checkpoint: {PATH_TO_WEIGHTS}

Single-GPU

Run:

python train.py \
  --eval --eval_dataset val \
  --config experiments/human36m/eval/human36m_vol_softmax_single.yaml \
  --logdir ./logs

Multi-GPU

Using 4 GPUs on single machine, Run:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=2345 \
  train.py  --eval --eval_dataset val \
  --config experiments/human36m/eval/human36m_vol_softmax_single.yaml \
  --logdir ./logs

Argument --eval_dataset can be val or train. Results can be seen in logs directory or in the tensorboard.

Results & Model Zoo

  • We evaluate ContextPose on two available large benchmarks: Human3.6M and MPI-INF-3DHP.
  • To get the results reported in our paper, you can download the weights and place to ./logs.
Dataset to be evaluated Weights Results
Human3.6M link 43.4mm (MPJPE)
MPI-INF-3DHP link 81.5 (PCK), 43.6 (AUC)
  • For H36M, the main metric is MPJPE (Mean Per Joint Position Error) which is L2 distance averaged over all joints. To get the result, run as stated above.
  • For 3DHP, Percentage of Correctly estimated Keypoints (PCK) as well as Area Under the Curve (AUC) are reported. Note that we directly apply our model trained on H36M dataset to 3DHP dataset without re-training to evaluate the generalization performance. To prevent from over-fitting to the H36M-style appearance, we only change the training strategy that we fix the backbone to train 20 epoch before we train the whole network end-to-end. If you want to eval on MPI-INF-3DHP, you can save the results and use the official evaluation code in Matlab.

Human3.6M

MPI-INF-3DHP

Citation

If you use our code or models in your research, please cite with:

@article{ma2021context,
  title={Context Modeling in 3D Human Pose Estimation: A Unified Perspective},
  author={Ma, Xiaoxuan and Su, Jiajun and Wang, Chunyu and Ci, Hai and Wang, Yizhou},
  journal={arXiv preprint arXiv:2103.15507},
  year={2021}
} 

Acknowledgement

This repo is built on https://github.com/karfly/learnable-triangulation-pytorch. Part of the data are provided by https://github.com/una-dinosauria/3d-pose-baseline.

COIN the currently largest dataset for comprehensive instruction video analysis.

COIN Dataset COIN is the currently largest dataset for comprehensive instruction video analysis. It contains 11,827 videos of 180 different tasks (i.e

86 Dec 28, 2022
source code of “Visual Saliency Transformer” (ICCV2021)

Visual Saliency Transformer (VST) source code for our ICCV 2021 paper “Visual Saliency Transformer” by Nian Liu, Ni Zhang, Kaiyuan Wan, Junwei Han, an

89 Dec 21, 2022
Implements the training, testing and editing tools for "Pluralistic Image Completion"

Pluralistic Image Completion ArXiv | Project Page | Online Demo | Video(demo) This repository implements the training, testing and editing tools for "

Chuanxia Zheng 615 Dec 08, 2022
Physical Anomalous Trajectory or Motion (PHANTOM) Dataset

Physical Anomalous Trajectory or Motion (PHANTOM) Dataset Description This dataset contains the six different classes as described in our paper[]. The

0 Dec 16, 2021
Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

Richard Wang 443 Dec 06, 2022
Official pytorch implementation of the AAAI 2021 paper Semantic Grouping Network for Video Captioning

Semantic Grouping Network for Video Captioning Hobin Ryu, Sunghun Kang, Haeyong Kang, and Chang D. Yoo. AAAI 2021. [arxiv] Environment Ubuntu 16.04 CU

Hobin Ryu 43 Nov 25, 2022
CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.

CvT2DistilGPT2 Improving Chest X-Ray Report Generation by Leveraging Warm-Starting This repository houses the implementation of CvT2DistilGPT2 from [1

The Australian e-Health Research Centre 21 Dec 28, 2022
Code repo for "Towards Interpretable Deep Networks for Monocular Depth Estimation" paper.

InterpretableMDE A PyTorch implementation for "Towards Interpretable Deep Networks for Monocular Depth Estimation" paper. arXiv link: https://arxiv.or

Zunzhi You 16 Aug 12, 2022
HeatNet is a python package that provides tools to build, train and evaluate neural networks designed to predict extreme heat wave events globally on daily to subseasonal timescales.

HeatNet HeatNet is a python package that provides tools to build, train and evaluate neural networks designed to predict extreme heat wave events glob

Google Research 6 Jul 07, 2022
An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise

45 Dec 08, 2022
A Tensorflow implementation of CapsNet based on Geoffrey Hinton's paper Dynamic Routing Between Capsules

CapsNet-Tensorflow A Tensorflow implementation of CapsNet based on Geoffrey Hinton's paper Dynamic Routing Between Capsules Notes: The current version

Huadong Liao 3.8k Dec 29, 2022
EmoTag helps you train emotion detection model for Chinese audios

emoTag emoTag helps you train emotion detection model for Chinese audios. Environment pip install -r requirement.txt Data We used Emotional Speech Dat

_zza 4 Sep 07, 2022
Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet.

Ravens is a collection of simulated tasks in PyBullet for learning vision-based robotic manipulation, with emphasis on pick and place. It features a Gym-like API with 10 tabletop rearrangement tasks,

Google Research 367 Jan 09, 2023
ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Sign-Agnostic Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page This repository contains the implementation

63 Nov 18, 2022
Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Mingrui Yu 3 Jan 07, 2022
Transformers based fully on MLPs

Awesome MLP-based Transformers papers An up-to-date list of Transformers based fully on MLPs without attention! Why this repo? After transformers and

Fawaz Sammani 35 Dec 30, 2022
Feedback is important: response-aware feedback mechanism for background based conversation

RFM The code for the paper: "Feedback is important: response-aware feedback mechanism for background based conversation." Requirements python 3.7 pyto

Jiatao Chen 2 Sep 29, 2022
Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Baleen Baleen is a state-of-the-art model for multi-hop reasoning, enabling scalable multi-hop search over massive collections for knowledge-intensive

Stanford Future Data Systems 22 Dec 05, 2022