Video Frame Interpolation with Transformer (CVPR2022)

Overview

VFIformer

Official PyTorch implementation of our CVPR2022 paper Video Frame Interpolation with Transformer

Dependencies

  • python >= 3.8
  • pytorch >= 1.8.0
  • torchvision >= 0.9.0

Prepare Dataset

  1. Vimeo90K Triplet dataset
  2. MiddleBury Other dataset
  3. UCF101 dataset
  4. SNU-FILM dataset

To train on the Vimeo90K, we have to first compute the ground-truth flows between frames using Lite-flownet, you can clone the Lite-flownet repo and put compute_flow_vimeo.py we provide under its main directory and run (remember to change the data path):

python compute_flow_vimeo.py

Get Started

  1. Clone this repo.
    git clone https://github.com/Jia-Research-Lab/VFIformer.git
    cd VFIformer
    
  2. Modify the argument --data_root in train.py according to your Vimeo90K path.

Evaluation

  1. Download the pre-trained models and place them into the pretrained_models/ folder.

    • Pre-trained models can be downloaded from Google Drive
      • pretrained_VFIformer: the final model in the main paper
      • pretrained_VFIformerSmall: the smaller version of the model mentioned in the supplementary file
  2. Test on the Vimeo90K testing set.

    Modify the argument --data_root according to your data path, run:

    python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    

    If you want to test with the smaller model, please change the --net_name and --resume accordingly:

    python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformerSmall --resume ./pretrained_models/pretrained_VFIformerSmall/net_220.pth --save_result
    

    The testing results are saved in the test_results/ folder. If you do not want to save the image results, you can remove the --save_result argument in the commands optionally.

  3. Test on the MiddleBury dataset.

    Modify the argument --data_root according to your data path, run:

    python test.py --data_root [your MiddleBury path] --testset MiddleburyDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    
  4. Test on the UCF101 dataset.

    Modify the argument --data_root according to your data path, run:

    python test.py --data_root [your UCF101 path] --testset UFC101Dataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    
  5. Test on the SNU-FILM dataset.

    Modify the argument --data_root according to your data path. Choose the motion level and modify the argument --test_level accordingly, run:

    python FILM_test.py --data_root [your SNU-FILM path] --test_level [easy/medium/hard/extreme] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
    

Training

  1. First train the flow estimator. (Note that skipping this step will not cause a significant impact on performance. We keep this step here only to be consistent with our paper.)
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=4174 train.py --launcher pytorch --gpu_ids 0,1,2,3 \
            --loss_flow --use_tb_logger --batch_size 48 --net_name IFNet --name train_IFNet --max_iter 300 --crop_size 192 --save_epoch_freq 5
    
  2. Then train the whole framework.
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
            --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformer --name train_VFIformer --max_iter 300 \
            --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
    
  3. To train the smaller version, run:
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
            --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformerSmall --name train_VFIformerSmall --max_iter 300 \
            --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
    

Test on your own data

  1. Modify the arguments --img0_path and --img1_path according to your data path, run:
    python demo.py --img0_path [your img0 path] --img1_path [your img1 path] --save_folder [your save path] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
    

Acknowledgement

We borrow some codes from RIFE and SwinIR. We thank the authors for their great work.

Citation

Please consider citing our paper in your publications if it is useful for your research.

@inproceedings{lu2022vfiformer,
    title={Video Frame Interpolation with Transformer},
    author={Liying Lu, Ruizheng Wu, Huaijia Lin, Jiangbo Lu, and Jiaya Jia},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022},
}

Contact

[email protected]

Owner
DV Lab
Deep Vision Lab
DV Lab
PyTorch implementation of U-TAE and PaPs for satellite image time series panoptic segmentation.

Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks (ICCV 2021) This repository is the official implem

71 Jan 04, 2023
A curated list of awesome Active Learning

Awesome Active Learning 🤩 A curated list of awesome Active Learning ! 🤩 Background (image source: Settles, Burr) What is Active Learning? Active lea

BAI Fan 431 Jan 03, 2023
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Anirudh S Chakravarthy 6 May 03, 2022
The FIRST GANs-based omics-to-omics translation framework

OmiTrans Please also have a look at our multi-omics multi-task DL freamwork 👀 : OmiEmbed The FIRST GANs-based omics-to-omics translation framework Xi

Xiaoyu Zhang 6 Dec 14, 2022
Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data

1 Meta-FDMIxup Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data. (ACM MM 2021) paper News! the rep

Fu Yuqian 44 Nov 18, 2022
[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction Project Page | Paper | Supplementary | Video This reposit

331 Dec 28, 2022
Camera-caps - Examine the camera capabilities for V4l2 cameras

camera-caps This is a graphical user interface over the v4l2-ctl command line to

Jetsonhacks 25 Dec 26, 2022
Official implementation of VQ-Diffusion

Official implementation of VQ-Diffusion: Vector Quantized Diffusion Model for Text-to-Image Synthesis

Microsoft 592 Jan 03, 2023
Cereal box identification in store shelves using computer vision and a single train image per model.

Product Recognition on Store Shelves Description You can read the task description here. Report You can read and download our report here. Step A - Mu

Nicholas Baraghini 1 Jan 21, 2022
Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

CSRA This is the official code of ICCV 2021 paper: Residual Attention: A Simple But Effective Method for Multi-Label Recoginition Demo, Train and Vali

163 Dec 22, 2022
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
Artificial Intelligence search algorithm base on Pacman

Pacman Search Artificial Intelligence search algorithm base on Pacman Source The Pacman Projects by the University of California, Berkeley. Layouts Di

Day Fundora 6 Nov 17, 2022
A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196

img_sussifier A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196 Examples How to use install python pip i

41 Sep 30, 2022
Open-L2O: A Comprehensive and Reproducible Benchmark for Learning to Optimize Algorithms

Open-L2O This repository establishes the first comprehensive benchmark efforts of existing learning to optimize (L2O) approaches on a number of proble

VITA 161 Jan 02, 2023
A Pythonic library for Nvidia Codec.

A Pythonic library for Nvidia Codec. The project is still in active development; expect breaking changes. Why another Python library for Nvidia Codec?

Zesen Qian 12 Dec 27, 2022
This repository is based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes.

Rotate-Yolov5 This repository is based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes. Section I. Description The codes are

xinzelee 90 Dec 13, 2022
A PyTorch Image-Classification With AlexNet And ResNet50.

PyTorch 图像分类 依赖库的下载与安装 在终端中执行 pip install -r -requirements.txt 完成项目依赖库的安装 使用方式 数据集的准备 STL10 数据集 下载:STL-10 Dataset 存储位置:将下载后的数据集中 train_X.bin,train_y.b

FYH 4 Feb 22, 2022
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting (ICCV, 2021)

DKPNet ICCV 2021 Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting Baseline of DKPNet is availa

19 Oct 14, 2022
implicit displacement field

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit

BMW Semantic Segmentation GPU/CPU Inference API This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit. The train

BMW TechOffice MUNICH 56 Nov 24, 2022