Single-Shot Motion Completion with Transformer

Last update: Dec 29, 2022

👉 [Preprint] 👈

Abstract

Motion completion is a challenging and long-discussed problem of great significance in film and game applications. For the different motion completion scenarios (in-betweening, in-filling, and blending), most previous methods rely on case-by-case designs. In this work, we propose a simple but effective method that solves multiple motion completion problems under a unified framework and achieves new state-of-the-art accuracy under multiple evaluation settings. Inspired by the recent success of attention-based models, we treat completion as a sequence-to-sequence prediction problem. Our method consists of two modules: a standard transformer encoder with self-attention that learns long-range dependencies of input motions, and a trainable mixture embedding module that models temporal information and distinguishes keyframes from missing frames. Our method runs in a non-autoregressive manner and predicts multiple missing frames within a single forward pass in real time. Finally, we show the effectiveness of our method in music-dance applications.
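To make the single-shot idea concrete, here is a minimal sketch of how a partially observed clip could be prepared so that a sequence model sees every time step at once. It is not the released code. The function name `build_model_inputs`, the zero-filled placeholder for missing frames, and the integer flag and position lists are all assumptions made for illustration; in a real model the flags and positions would index learned embeddings that are combined with the frame features before the encoder.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[float]


def build_model_inputs(
    frames: Sequence[Optional[Frame]],
) -> Tuple[List[Frame], List[int], List[int]]:
    """Prepare a partially observed clip for one-pass completion.

    frames: one entry per time step; None marks a missing frame.
    Returns (filled, is_keyframe, positions), all of the same length.
    """
    known = [f for f in frames if f is not None]
    if not known:
        raise ValueError("at least one keyframe is required")
    dim = len(known[0])

    # Hypothetical placeholder: missing frames are zero-filled here.
    filled = [list(f) if f is not None else [0.0] * dim for f in frames]
    # 1 = keyframe given by the user, 0 = frame the model must predict.
    is_keyframe = [0 if f is None else 1 for f in frames]
    # Time index of each step, usable as a position-embedding lookup key.
    positions = list(range(len(frames)))
    return filled, is_keyframe, positions


clip = [[1.0, 2.0], None, None, [4.0, 5.0]]
filled, is_keyframe, positions = build_model_inputs(clip)
```

Because the whole sequence is assembled up front, all missing frames can be predicted together instead of one after another, which is what "non-autoregressive" refers to in the abstract.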

State-of-the-art on Lafan1 dataset

With the help of the Transformer, we achieve a new state-of-the-art result on the Lafan1 dataset (lower is better for all three metrics).

Method (transition length = 30)	L2Q	L2P	NPSS
Zero-Vel	1.51	6.60	0.2318
Interp.	0.98	2.32	0.2013
ERD-QV	0.69	1.28	0.1328
Ours	0.61	1.10	0.1222
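For readers unfamiliar with the two simple baselines in the table, the sketch below illustrates the idea on positions only. It is an assumption-laden toy, not the benchmark code: `zero_velocity` repeats the last observed frame, `linear_interp` blends linearly between the two surrounding keyframes, and `mean_l2` is a simplified per-frame Euclidean error. The benchmark's L2P and L2Q are computed over full skeletons, and rotations would need quaternion interpolation, which is not shown here.

```python
import math
from typing import List

Vec = List[float]


def zero_velocity(last_seen: Vec, n_missing: int) -> List[Vec]:
    """Fill the gap by repeating the last observed frame."""
    return [list(last_seen) for _ in range(n_missing)]


def linear_interp(start: Vec, end: Vec, n_missing: int) -> List[Vec]:
    """Fill the gap by blending linearly between two keyframes."""
    out = []
    for k in range(1, n_missing + 1):
        t = k / (n_missing + 1)  # fraction of the way from start to end
        out.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return out


def mean_l2(pred: List[Vec], truth: List[Vec]) -> float:
    """Average Euclidean distance between predicted and true frames."""
    dists = [math.dist(p, g) for p, g in zip(pred, truth)]
    return sum(dists) / len(dists)


start, end = [0.0, 0.0, 0.0], [4.0, 0.0, 0.0]
truth = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
err_zero_vel = mean_l2(zero_velocity(start, 3), truth)
err_interp = mean_l2(linear_interp(start, end, 3), truth)
```

On this toy straight-line motion, interpolation matches the ground truth exactly while the zero-velocity fill drifts further from it with every frame, which mirrors the ordering of the two baselines in the table.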

Some results (blue appearances represent keyframes):

Dance Infilling on Anidance Dataset

We also evaluate our method on the Anidance dataset:

Infilling on the test set (black skeletons are the keyframes):

(From Left to Right: Ours, Interp. and Ground Truth)

Infilling on random keyframes (keyframes are chosen at random from the test set and placed in random order to simulate an in-the-wild scenario):

(From Left to Right: Ours, Interp. and Ground Truth)

Dance blending

Our method can also complete complex dance movements:

Code

Coming soon

Citation

@misc{duan2021singleshot,
      title={Single-Shot Motion Completion with Transformer}, 
      author={Yinglin Duan and Tianyang Shi and Zhengxia Zou and Yenan Lin and Zhehui Qian and Bohan Zhang and Yi Yuan},
      year={2021},
      eprint={2103.00776},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

FuxiCV
