Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Overview

πŸ“– Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)

πŸ”₯ If DaGAN is helpful in your photos/projects, please help to ⭐ it or recommend it to your friends. Thanks πŸ”₯

[Paper]   [Project Page]   [Demo]   [Poster Video]

Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
The Hong Kong University of Science and Technology

Cartoon Sample

cartoon.mp4

Human Sample

celeb.mp4

Voxceleb1 Dataset

🚩 Updates

  • πŸ”₯ πŸ”₯ βœ… May 19, 2022: The depth face model trained on Voxceleb2 is released! (The corresponding checkpoint of DaGAN will release soon). Click the LINK

  • πŸ”₯ πŸ”₯ βœ… April 25, 2022: Integrated into Huggingface Spaces πŸ€— using Gradio. Try out the web demo: Hugging Face Spaces (GPU version will come soon!)

  • πŸ”₯ πŸ”₯ βœ… Add SPADE model, which produces more natural results.

πŸ”§ Dependencies and Installation

Installation

We now provide a clean version of DaGAN, which does not require customized CUDA extensions.

  1. Clone repo

    git clone https://github.com/harlanhong/CVPR2022-DaGAN.git
    cd CVPR2022-DaGAN
  2. Install dependent packages

    pip install -r requirements.txt
    
    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install

⚑ Quick Inference

We take the paper version for an example. More models can be found here.

YAML configs

See config/vox-adv-256.yaml to get description of each parameter.

Pre-trained checkpoint

The pre-trained checkpoint of face depth network and our DaGAN checkpoints can be found under following link: OneDrive.

Inference! To run a demo, download checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-adv-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator 

The result will be stored in result.mp4. The driving videos and source images should be cropped before it can be used in our method. To obtain some semi-automatic crop suggestions you can use python crop-video.py --inp some_youtube_video.mp4. It will generate commands for crops using ffmpeg.

πŸ’» Training

Datasets

  1. VoxCeleb. Please follow the instruction from https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on specific dataset run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12348 run.py --config config/vox-adv-256.yaml --name DaGAN --rgbd --batchsize 12 --kp_num 15 --generator DepthAwareGenerator

The code will create a folder in the log directory (each run will create a new name-specific directory). Checkpoints will be saved to this folder. To check the loss values during training see log.txt. By default the batch size is tunned to run on 8 GeForce RTX 3090 gpu (You can obtain the best performance after about 150 epochs). You can change the batch size in the train_params in .yaml file.

🚩 Please use multiple GPUs to train your own model, if you use only one GPU, you would meet the inplace problem.

Also, you can watch the training loss by running the following command:

tensorboard --logdir log/DaGAN/log

When you kill your process for some reasons in the middle of training, a zombie process may occur, you can kill it using our provided tool:

python kill_port.py PORT

Training on your own dataset

  1. Resize all the videos to the same size e.g 256x256, the videos can be in '.gif', '.mp4' or folder with images. We recommend the later, for each video make a separate folder with all the frames in '.png' format. This format is loss-less, and it has better i/o performance.

  2. Create a folder data/dataset_name with 2 subfolders train and test, put training videos in the train and testing in the test.

  3. Create a config config/dataset_name.yaml, in dataset_params specify the root dir the root_dir: data/dataset_name. Also adjust the number of epoch in train_params.

πŸ“œ Acknowledgement

Our DaGAN implementation is inspired by FOMM. We appreciate the authors of FOMM for making their codes available to public.

πŸ“œ BibTeX

@inproceedings{hong2022depth,
            title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
            author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
            journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
            year={2022}
          }

πŸ“§ Contact

If you have any question, please email [email protected].

Stochastic gradient descent with model building

Stochastic Model Building (SMB) This repository includes a new fast and robust stochastic optimization algorithm for training deep learning models. Th

S. Ilker Birbil 22 Jan 19, 2022
Dirty Pixels: Towards End-to-End Image Processing and Perception

Dirty Pixels: Towards End-to-End Image Processing and Perception This repository contains the code for the paper Dirty Pixels: Towards End-to-End Imag

50 Nov 18, 2022
[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

F8Net Fixed-Point 8-bit Only Multiplication for Network Quantization (ICLR 2022 Oral) OpenReview | arXiv | PDF | Model Zoo | BibTex PyTorch implementa

Snap Research 76 Dec 13, 2022
Here I will explain the flow to deploy your custom deep learning models on Ultra96V2.

Xilinx_Vitis_AI This repo will help you to Deploy your Deep Learning Model on Ultra96v2 Board. Prerequisites Vitis Core Development Kit 2019.2 This co

Amin Mamandipoor 1 Feb 08, 2022
Codebase for Image Classification Research, written in PyTorch.

pycls pycls is an image classification codebase, written in PyTorch. It was originally developed for the On Network Design Spaces for Visual Recogniti

Facebook Research 2k Jan 01, 2023
School of Artificial Intelligence at the Nanjing University (NJU)School of Artificial Intelligence at the Nanjing University (NJU)

F-Principle This is an exercise problem of the digital signal processing (DSP) course at School of Artificial Intelligence at the Nanjing University (

Thyrix 5 Nov 23, 2022
PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).

PyGAD: Genetic Algorithm in Python PyGAD is an open-source easy-to-use Python 3 library for building the genetic algorithm and optimizing machine lear

Ahmed Gad 1.1k Dec 26, 2022
Monk is a low code Deep Learning tool and a unified wrapper for Computer Vision.

Monk - A computer vision toolkit for everyone Why use Monk Issue: Want to begin learning computer vision Solution: Start with Monk's hands-on study ro

Tessellate Imaging 507 Dec 04, 2022
A object detecting neural network powered by the yolo architecture and leveraging the PyTorch framework and associated libraries.

Yolo-Powered-Detector A object detecting neural network powered by the yolo architecture and leveraging the PyTorch framework and associated libraries

Luke Wilson 1 Dec 03, 2021
This repository contains the code for the paper Neural RGB-D Surface Reconstruction

Neural RGB-D Surface Reconstruction Paper | Project Page | Video Neural RGB-D Surface Reconstruction Dejan Azinović, Ricardo Martin-Brualla, Dan B Gol

Dejan 406 Jan 04, 2023
[AAAI 2022] Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

A paper Introduction This is an official release of the paper Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation wit

Jiacheng Wang 14 Dec 08, 2022
Train the HRNet model on ImageNet

High-resolution networks (HRNets) for Image classification News [2021/01/20] Add some stronger ImageNet pretrained models, e.g., the HRNet_W48_C_ssld_

HRNet 866 Jan 04, 2023
PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick." [Project page] [Paper

Gyungin Shin 59 Sep 25, 2022
Code for ICML 2021 paper: How could Neural Networks understand Programs?

OSCAR This repository contains the source code of our ICML 2021 paper How could Neural Networks understand Programs?. Environment Run following comman

Dinglan Peng 115 Dec 17, 2022
Advances in Neural Information Processing Systems (NeurIPS), 2020.

What is being transferred in transfer learning? This repo contains the code for the following paper: Behnam Neyshabur*, Hanie Sedghi*, Chiyuan Zhang*.

Google Research 36 Aug 26, 2022
Official implementation of EfficientPose

EfficientPose This is the official implementation of EfficientPose. We based our work on the Keras EfficientDet implementation xuannianz/EfficientDet

2 May 17, 2022
Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process

Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process, a complete algorithm library is esta

Fu Pengyou 50 Jan 07, 2023
AoT is a system for automatically generating off-target test harness by using build information.

AoT: Auto off-Target Automatically generating off-target test harness by using build information. Brought to you by the Mobile Security Team at Samsun

Samsung 10 Oct 19, 2022
Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition (AGRA, ACM 2020, Oral)

Cross Domain Facial Expression Recognition Benchmark Implementation of papers: Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchm

89 Dec 09, 2022
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation This project hosts the code for implementing the DCT-MASK algorithms

Alibaba Cloud 57 Nov 27, 2022