Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)

Overview

E2FGVI (CVPR 2022)

PWC PWC

Python 3.7 pytorch 1.6.0

English | 简体中文

This repository contains the official implementation of the following paper:

Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zhen Li#, Cheng-Ze Lu#, Jianhua Qin, Chun-Le Guo*, Ming-Ming Cheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

[Paper] [Demo Video (Youtube)] [演示视频 (B站)] [Project Page (TBD)] [Poster (TBD)]

You can try our colab demo here: Open In Colab

News

  • 2022.05.15: We release E2FGVI-HQ, which can handle videos with arbitrary resolution. This model could generalize well to much higher resolutions, while it only used 432x240 videos for training. Besides, it performs better than our original model on both PSNR and SSIM metrics. 🔗 Download links: [Google Drive] [Baidu Disk] 🎥 Demo video: [Youtube] [B站]

  • 2022.04.06: Our code is publicly available.

Demo

teaser

More examples (click for details):

Coco (click me)
Tennis
Space
Motocross

Overview

overall_structure

🚀 Highlights:

  • SOTA performance: The proposed E2FGVI achieves significant improvements on all quantitative metrics in comparison with SOTA methods.
  • Highly effiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan XP GPU, which is nearly 15× faster than previous flow-based methods. Besides, our method has the lowest FLOPs among all compared SOTA methods.

Work in Progress

  • Update website page
  • Hugging Face demo
  • Efficient inference

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/MCG-NKU/E2FGVI.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate e2fgvi
    • Python >= 3.7
    • PyTorch >= 1.5
    • CUDA >= 9.2
    • mmcv-full (following the pipeline to install)

    If the environment.yml file does not work for you, please follow this issue to solve the problem.

Get Started

Prepare pretrained models

Before performing the following steps, please download our pretrained model first.

Model 🔗 Download Links Support Arbitrary Resolution ? PSNR / SSIM / VFID (DAVIS)
E2FGVI [Google Drive] [Baidu Disk] 33.01 / 0.9721 / 0.116
E2FGVI-HQ [Google Drive] [Baidu Disk] 33.06 / 0.9722 / 0.117

Then, unzip the file and place the models to release_model directory.

The directory structure will be arranged as:

release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md

Quick test

We provide two examples in the examples directory.

Run the following command to enjoy them:

# The first example (using split video frames)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/tennis --mask examples/tennis_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
# The second example (using mp4 format video)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)

The inpainting video will be saved in the results directory. Please prepare your own mp4 video (or split frames) and frame-wise masks if you want to test more cases.

Note: E2FGVI always rescales the input video to a fixed resolution (432x240), while E2FGVI-HQ does not change the resolution of the input video. If you want to custom the output resolution, please use the --set_size flag and set the values of --width and --height.

Example:

# Using this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path>  --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720

Prepare dataset for training and evaluation

Dataset YouTube-VOS DAVIS
Details For training (3,471) and evaluation (508) For evaluation (50 in 90)
Images [Official Link] (Download train and test all frames) [Official Link] (2017, 480p, TrainVal)
Masks [Google Drive] [Baidu Disk] (For reproducing paper results)

The training and test split files are provided in datasets/<dataset_name>.

For each dataset, you should place JPEGImages to datasets/<dataset_name>.

Then, run sh datasets/zip_dir.sh (Note: please edit the folder path accordingly) for compressing each video in datasets/<dataset_name>/JPEGImages.

Unzip downloaded mask files to datasets.

The datasets directory structure will be arranged as: (Note: please check it carefully)

datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png   
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json   
   |- zip_file.sh

Evaluation

Run one of the following commands for evaluation:

 # For evaluating E2FGVI model
 python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
 # For evaluating E2FGVI-HQ model
 python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth

You will get scores as paper reported if you evaluate E2FGVI. The scores of E2FGVI-HQ can be found in [Prepare pretrained models].

The scores will also be saved in the results/<model_name>_<dataset_name> directory.

Please --save_results for further evaluating temporal warping error.

Training

Our training configures are provided in train_e2fgvi.json (for E2FGVI) and train_e2fgvi_hq.json (for E2FGVI-HQ).

Run one of the following commands for training:

 # For training E2FGVI
 python train.py -c configs/train_e2fgvi.json
 # For training E2FGVI-HQ
 python train.py -c configs/train_e2fgvi_hq.json

You could run the same command if you want to resume your training.

The training loss can be monitored by running:

tensorboard --logdir release_model                                                   

You could follow this pipeline to evaluate your model.

Results

Quantitative results

quantitative_results

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{liCvpr22vInpainting,
   title={Towards An End-to-End Framework for Flow-Guided Video Inpainting},
   author={Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

Contact

If you have any question, please feel free to contact us via zhenli1031ATgmail.com or czlu919AToutlook.com.

License

Licensed under a Creative Commons Attribution-NonCommercial 4.0 International for Non-commercial use only. Any commercial use should get formal permission first.

Acknowledgement

This repository is maintained by Zhen Li and Cheng-Ze Lu.

This code is based on STTN, FuseFormer, Focal-Transformer, and MMEditing.

Owner
Media Computing Group @ Nankai University
Media Computing Group at Nankai University, led by Prof. Ming-Ming Cheng.
Media Computing Group @ Nankai University
Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leor

Sontag Lab 3 Feb 03, 2022
clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

110 Dec 29, 2022
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Non-Metric Space Library (NMSLIB) Important Notes NMSLIB is generic but fast, see the results of ANN benchmarks. A standalone implementation of our fa

2.9k Jan 04, 2023
A collection of metrics for evaluating timbre dissimilarity using the TorchMetrics API

Timbre Dissimilarity Metrics A collection of metrics for evaluating timbre dissimilarity using the TorchMetrics API Installation pip install -e . Usag

Ben Hayes 21 Jan 05, 2022
Pointer-generator - Code for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks

Note: this code is no longer actively maintained. However, feel free to use the Issues section to discuss the code with other users. Some users have u

Abi See 2.1k Jan 04, 2023
Self-supervised learning on Graph Representation Learning (node-level task)

graph_SSL Self-supervised learning on Graph Representation Learning (node-level task) How to run the code To run GRACE, sh run_GRACE.sh To run GCA, sh

Namkyeong Lee 3 Dec 31, 2021
PyTorch implementation of Weak-shot Fine-grained Classification via Similarity Transfer

SimTrans-Weak-Shot-Classification This repository contains the official PyTorch implementation of the following paper: Weak-shot Fine-grained Classifi

BCMI 60 Dec 02, 2022
A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis Project Page | Paper A Shading-Guided Generative Implicit Model

Xingang Pan 115 Dec 18, 2022
The best solution of the Weather Prediction track in the Yandex Shifts challenge

yandex-shifts-weather The repository contains information about my solution for the Weather Prediction track in the Yandex Shifts challenge https://re

Ivan Yu. Bondarenko 15 Dec 18, 2022
An self sufficient AI that crawls the web to learn how to generate art from keywords

Roxx-IO - The Smart Artist AI! TO DO / IDEAS Implement Web-Scraping Functionality Figure out a less annoying (and an off button for it) text to speech

Tatz 5 Mar 21, 2022
A Simple Key-Value Data-store written in Python

mercury-db This is a File Based Key-Value Datastore that supports basic CRUD (Create, Read, Update, Delete) operations developed using Python. The dat

Vaidhyanathan S M 1 Jan 09, 2022
Lipschitz-constrained Unsupervised Skill Discovery

Lipschitz-constrained Unsupervised Skill Discovery This repository is the official implementation of Seohong Park, Jongwook Choi*, Jaekyeom Kim*, Hong

Seohong Park 17 Dec 18, 2022
One Million Scenes for Autonomous Driving

ONCE Benchmark This is a reproduced benchmark for 3D object detection on the ONCE (One Million Scenes) dataset. The code is mainly based on OpenPCDet.

148 Dec 28, 2022
JDet is Object Detection Framework based on Jittor.

JDet is Object Detection Framework based on Jittor.

135 Dec 14, 2022
GANTheftAuto is a fork of the Nvidia's GameGAN

Description GANTheftAuto is a fork of the Nvidia's GameGAN, which is research focused on emulating dynamic game environments. The early research done

Harrison 801 Dec 27, 2022
K-FACE Analysis Project on Pytorch

Installation Setup with Conda # create a new environment conda create --name insightKface python=3.7 # or over conda activate insightKface #install t

Jung Jun Uk 7 Nov 10, 2022
Implementation for "Conditional entropy minimization principle for learning domain invariant representation features"

Implementation for "Conditional entropy minimization principle for learning domain invariant representation features". The code is reproduced from thi

1 Nov 02, 2022
Mixed Neural Likelihood Estimation for models of decision-making

Mixed neural likelihood estimation for models of decision-making Mixed neural likelihood estimation (MNLE) enables Bayesian parameter inference for mo

mackelab 9 Dec 22, 2022
TGS Salt Identification Challenge

TGS Salt Identification Challenge This is an open solution to the TGS Salt Identification Challenge. Note Unfortunately, we can no longer provide supp

neptune.ai 123 Nov 04, 2022
Speed-Test - You can check your intenet speed using this tool

Speed-Test Tool By Hez_X AVAILABLE ON : Termux & Kali linux & Ubuntu (Linux E

Hez-X 3 Feb 17, 2022