This is the official repository of XVFI (eXtreme Video Frame Interpolation)

Overview

XVFI PWC PWC

This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206

Last Update: 20210607

We provide the training and test code along with the trained weights and the dataset (train+test) used for XVFI. If you find this repository useful, please consider citing our paper.

Examples of the VFI (x8 Multi-Frame Interpolation) results on X-TEST

results_045_resized results_079_resized results_158_resized
The [email protected] input frames are interpolated to be [email protected] frames. All results are encoded at 30fps to be played as x8 slow motion and spatially down-scaled due to the limit of file sizes. All methods are trained on X-TRAIN.

Table of Contents

  1. X4K1000FPS
  2. Requirements
  3. Test
  4. Test_Custom
  5. Training
  6. Reference
  7. Contact

X4K1000FPS

Dataset of high-resolution (4096×2160), high-fps (1000fps) video frames with extreme motion.

003 004 045 078 081 146
Some examples of X4K1000FPS dataset, which are frames of 1000-fps and 4K-resolution. Our dataset contains the various scenes with extreme motions. (Displayed in spatiotemporally subsampled .gif files)

We provide our X4K1000FPS dataset which consists of X-TEST and X-TRAIN. Please refer to our main/suppl. paper for the details of the dataset. You can download the dataset from this dropbox link.

X-TEST consists of 15 video clips with 33-length of 4K-1000fps frames. It follows the below directory format:

├──── YOUR_DIR/
    ├──── test/
       ├──── Type1/
          ├──── TEST01/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── TEST02/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── ...
       ├──── ...

X-TRAIN consists of 4,408 clips from various types of 110 scenes. The clips are 65-length of 1000fps frames. Each frame is the size of 768x768 cropped from 4K frame. It follows the below directory format:

├──── YOUR_DIR/
    ├──── train/
       ├──── 002/
          ├──── occ008.320/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── occ008.322/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── ...
       ├──── ...

After downloading the files from the link, decompress the encoded_test.tar.gz and encoded_train.tar.gz. The resulting .mp4 files can be decoded into .png files via running mp4_decoding.py. Please follow the instruction written in mp4_decoding.py.

Requirements

Our code is implemented using PyTorch1.7, and was tested under the following setting:

  • Python 3.7
  • PyTorch 1.7.1
  • CUDA 10.2
  • cuDNN 7.6.5
  • NVIDIA TITAN RTX GPU
  • Ubuntu 16.04 LTS

Caution: since there is "align_corners" option in "nn.functional.interpolate" and "nn.functional.grid_sample" in PyTorch1.7, we recommend you to follow our settings. Especially, if you use the other PyTorch versions, it may lead to yield a different performance.

Test

Quick Start for X-TEST (x8 Multi-Frame Interpolation as in Table 2)

  1. Download the source codes in a directory of your choice .
  2. First download our X-TEST test dataset by following the above section 'X4K1000FPS'.
  3. Download the pre-trained weights, which was trained by X-TRAIN, from this link to place in /checkpoint_dir/XVFInet_X4K1000FPS_exp1.
XVFI
└── checkpoint_dir
   └── XVFInet_X4K1000FPS_exp1
       ├── XVFInet_X4K1000FPS_exp1_latest.pt           
  1. Run main.py with the following options in parse_args:
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 

==> It would yield (PSNR/SSIM/tOF) = (30.12/0.870/2.15).

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 3 --multiple 8 

==> It would yield (PSNR/SSIM/tOF) = (28.86/0.858/2.67).

Description

  • After running with the above test option, you can get the result images in /test_img_dir/XVFInet_X4K1000FPS_exp1, then obtain the PSNR/SSIM/tOF results per each test clip as "total_metrics.csv" in the same folder.
  • Our proposed XVFI-Net can start from any downscaled input upward by regulating '--S_tst', which is adjustable in terms of the number of scales for inference according to the input resolutions or the motion magnitudes.
  • You can get any Multi-Frame Interpolation (x M) result by regulating '--multiple'.

Quick Start for Vimeo90K (as in Fig. 8)

  1. Download the source codes in a directory of your choice .
  2. First download Vimeo90K dataset from this link (including 'tri_trainlist.txt') to place in /vimeo_triplet.
XVFI
└── vimeo_triplet
       ├──  sequences
       readme.txt
       tri_testlist.txt
       tri_trainlist.txt
  1. Download the pre-trained weights (XVFI-Net_v), which was trained by Vimeo90K, from this link to place in /checkpoint_dir/XVFInet_Vimeo_exp1.
XVFI
└── checkpoint_dir
   └── XVFInet_Vimeo_exp1
       ├── XVFInet_Vimeo_exp1_latest.pt           
  1. Run main.py with the following options in parse_args:
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 2

==> It would yield PSNR = 35.07 on Vimeo90K.

Description

  • After running with the above test option, you can get the result images in /test_img_dir/XVFInet_Vimeo_exp1.
  • There are certain code lines in front of the 'def main()' for a convenience when running with the Vimeo option.
  • The SSIM result of 0.9760 as in Fig. 8 was measured by matlab ssim function for a fair comparison after running the above guide because other SOTA methods did so. We also upload "compare_psnr_ssim.m" matlab file to obtain it.
  • It should be noted that there is a typo "S_trn and S_tst are set to 2" in the current version of XVFI paper, which should be modified to 1 (not 2), sorry for inconvenience.

Test_Custom

Quick Start for your own video data ('--custom_path') for any Multi-Frame Interpolation (x M)

  1. Download the source codes in a directory of your choice .
  2. First prepare your own video datasets in /custom_path by following a hierarchy as belows:
XVFI
└── custom_path
   ├── scene1
       ├── 'xxx.png'
       ├── ...
       └── 'xxx.png'
   ...
   
   ├── sceneN
       ├── 'xxxxx.png'
       ├── ...
       └── 'xxxxx.png'

  1. Download the pre-trained weights trained on X-TRAIN or Vimeo90K as decribed above.

  2. Run main.py with the following options in parse_args (ex) x8 Multi-Frame Interpolation):

# For the model trained on X-TRAIN
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 --custom_path './custom_path'
# For the model trained on Vimeo90K
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 8 --custom_path './custom_path'

Description

  • Our proposed XVFI-Net can start from any downscaled input upward by regulating '--S_tst', which is adjustable in terms of the number of scales for inference according to the input resolutions or the motion magnitudes.
  • You can get any Multi-Frame Interpolation (x M) result by regulating '--multiple'.
  • It only supports for '.png' format.
  • Since we can not cover diverse possibilites of naming rule for custom frames, please sort your own frames properly.

Training

Quick Start for X-TRAIN

  1. Download the source codes in a directory of your choice .
  2. First download our X-TRAIN train/val/test datasets by following the above section 'X4K1000FPS' and place them as belows:
XVFI
└── X4K1000FPS
      ├──  train
          ├── 002
          ├── ...
          └── 172
      ├──  val
          ├── Type1
          ├── Type2
          ├── Type3
      ├──  test
          ├── Type1
          ├── Type2
          ├── Type3

  1. Run main.py with the following options in parse_args:
python main.py --phase 'train' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_trn 3 --S_tst 5

Quick Start for Vimeo90K

  1. Download the source codes in a directory of your choice .
  2. First download Vimeo90K dataset from this link (including 'tri_trainlist.txt') to place in /vimeo_triplet.
XVFI
└── vimeo_triplet
       ├──  sequences
       readme.txt
       tri_testlist.txt
       tri_trainlist.txt
  1. Run main.py with the following options in parse_args:
python main.py --phase 'train' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_trn 1 --S_tst 1

Description

  • You can freely regulate other arguments in the parser of main.py, here

Reference

Hyeonjun Sim*, Jihyong Oh*, and Munchurl Kim "XVFI: eXtreme Video Frame Interpolation", https://arxiv.org/abs/2103.16206, 2021. (* equal contribution)

BibTeX

@article{sim2021xvfi,
  title={XVFI: eXtreme Video Frame Interpolation},
  author={Sim, Hyeonjun and Oh, Jihyong and Kim, Munchurl},
  journal={arXiv preprint arXiv:2103.16206},
  year={2021}
}

Contact

If you have any question, please send an email to either [email protected] or [email protected].

License

The source codes and datasets can be freely used for research and education only. Any commercial use should get formal permission first.

Owner
Jihyong Oh
KAIST Ph.D. Candidate 3rd yr. Please refer to my personal homepage as below (URL).
Jihyong Oh
Implement of "Training deep neural networks via direct loss minimization" in PyTorch for 0-1 loss

This is the implementation of "Training deep neural networks via direct loss minimization" published at ICML 2016 in PyTorch. The implementation targe

Cuong Nguyen 1 Jan 18, 2022
Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

PyEmits, a python package for easy manipulation in time-series data. Time-series data is very common in real life. Engineering FSI industry (Financial

Descript 150 Dec 06, 2022
Run Effective Large Batch Contrastive Learning on Limited Memory GPU

Gradient Cache Gradient Cache is a simple technique for unlimitedly scaling contrastive learning batch far beyond GPU memory constraint. This means tr

Luyu Gao 198 Dec 29, 2022
DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

DECAF (DEbiasing CAusal Fairness) Code Author: Trent Kyono This repository contains the code used for the "DECAF: Generating Fair Synthetic Data Using

van_der_Schaar \LAB 7 Nov 24, 2022
OBBDetection: an oriented object detection toolbox modified from MMdetection

OBBDetection note: If you have questions or good suggestions, feel free to propose issues and contact me. introduction OBBDetection is an oriented obj

MIXIAOXIN_HO 3 Nov 11, 2022
POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propagation including diffraction

POPPY: Physical Optics Propagation in Python POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propaga

Space Telescope Science Institute 132 Dec 15, 2022
A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf 2021). Abstract In this work we propose Pathfind

Benedek Rozemberczki 49 Dec 01, 2022
Shared Attention for Multi-label Zero-shot Learning

Shared Attention for Multi-label Zero-shot Learning Overview This repository contains the implementation of Shared Attention for Multi-label Zero-shot

dathuynh 26 Dec 14, 2022
Official implementation of Protected Attribute Suppression System, ICCV 2021

Official implementation of Protected Attribute Suppression System, ICCV 2021

Prithviraj Dhar 6 Jan 01, 2023
Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)

MTTS-CAN: Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement Paper Xin Liu, Josh Fromm, Shwetak Patel, Daniel M

Xin Liu 106 Dec 30, 2022
Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically.

Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically. The collected data will then be used to train a deep neural network that can

Martin Valchev 3 Apr 24, 2022
salabim - discrete event simulation in Python

Object oriented discrete event simulation and animation in Python. Includes process control features, resources, queues, monitors. statistical distrib

181 Dec 21, 2022
UFPR-ADMR-v2 Dataset

UFPR-ADMR-v2 Dataset The UFPR-ADMRv2 dataset contains 5,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), w

Gabriel Salomon 8 Sep 29, 2022
Unoffical implementation about Image Super-Resolution via Iterative Refinement by Pytorch

Image Super-Resolution via Iterative Refinement Paper | Project Brief This is a unoffical implementation about Image Super-Resolution via Iterative Re

LiangWei Jiang 2.5k Jan 02, 2023
A tf.keras implementation of Facebook AI's MadGrad optimization algorithm

MADGRAD Optimization Algorithm For Tensorflow This package implements the MadGrad Algorithm proposed in Adaptivity without Compromise: A Momentumized,

20 Aug 18, 2022
shufflev2-yolov5:lighter, faster and easier to deploy

shufflev2-yolov5: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size

pogg 1.5k Jan 05, 2023
Learning Representations that Support Robust Transfer of Predictors

Transfer Risk Minimization (TRM) Code for Learning Representations that Support Robust Transfer of Predictors Prepare the Datasets Preprocess the Scen

Yilun Xu 15 Dec 07, 2022
Train CNNs for the fruits360 data set in NTOU CS「Machine Vision」class.

CNNs fruits360 Train CNNs for the fruits360 data set in NTOU CS「Machine Vision」class. CNN on a pretrained model Build a CNN on a pretrained model, Res

Ricky Chuang 1 Mar 07, 2022
Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion"

DSPoint Official implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion". Paper link: https://arxiv.org/abs/2111.10

Ziyao Zeng 14 Feb 26, 2022
The Official Implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose [NIPS 2021].

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The offical PyTorch implementation of Neural View Sy

Angtian Wang 20 Oct 09, 2022