Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Overview

We provide PyTorch implementations for our arXiv paper "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (http://arxiv.org/abs/2002.10137).

Note that this code is protected under patent. It is for research purposes only, at your university or research institution. If you are interested in business or for-profit use, please contact Prof. Liu (the corresponding author, email: [email protected]).

We provide a demo video here (please search for "Talking Face" in this page and click the "demo video" button).

Our Proposed Framework

Prerequisites

  • Linux or macOS
  • NVIDIA GPU
  • Python 3
  • MATLAB
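
The list above can be checked roughly before installing anything. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper, and it assumes that finding `nvidia-smi` or `matlab` on the PATH is a good enough proxy for having an NVIDIA GPU driver or MATLAB installed.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False.

    Assumption: an executable on PATH is used as a stand-in for the
    GPU driver (nvidia-smi) and for MATLAB (matlab).
    """
    return {
        "Linux or macOS": platform.system() in ("Linux", "Darwin"),
        "NVIDIA GPU": shutil.which("nvidia-smi") is not None,
        "Python 3": sys.version_info.major == 3,
        "MATLAB": shutil.which("matlab") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(("ok      " if ok else "missing ") + name)
```

A "missing" entry only means the tool was not found on the PATH; it may still be installed in a non-standard location.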

Getting Started

Installation

  • Create a virtual environment and install all the dependencies with
pip install -r requirements.txt

Download pre-trained models

  • The download includes the pre-trained general models and the models needed for face reconstruction, identity feature extraction, etc.
  • Download from BaiduYun (extraction code: usdm) or GoogleDrive and copy the files to the corresponding subfolders (Audio, Deep3DFaceReconstruction, render-to-video).
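
After copying the models, a quick check that the three target subfolders exist can catch a misplaced download. This is a sketch under assumptions: `missing_model_dirs` is a hypothetical helper, and it only tests for the folders named above, since the individual model file names are not listed here.

```python
from pathlib import Path

# Subfolders named in the download instructions.
MODEL_DIRS = ("Audio", "Deep3DFaceReconstruction", "render-to-video")


def missing_model_dirs(repo_root="."):
    """Return the expected subfolders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in MODEL_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_model_dirs()
    print("all subfolders found" if not missing else f"missing: {missing}")
```

An empty result does not prove the model files themselves are present, only that the folders are.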

Download face model for 3D face reconstruction

Fine-tune on a target person's short video

    1. Prepare a talking face video that 1) contains a single person, 2) is 25 fps, 3) is longer than 12 seconds, and 4) has no large body translation (e.g. moving from the left to the right of the screen). An example is here. Rename the video to [person_id].mp4 (e.g. 1.mp4) and copy it to the Data subfolder.

Note: You can convert a video to 25 fps with

ffmpeg -i xxx.mp4 -r 25 xxx1.mp4

    2. Extract frames and landmarks by running
cd Data/
python extract_frame1.py [person_id].mp4

    3. Conduct 3D face reconstruction. First compile the code in Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/kernels to a .so file, following its readme, and modify line 28 in rasterize_triangles.py to point to your directory. Then run
cd Deep3DFaceReconstruction/
CUDA_VISIBLE_DEVICES=0 python demo_19news.py ../Data/[person_id]

This process takes about 2 minutes on a Titan Xp.

    4. Fine-tune the audio network. Run
cd Audio/code/
python train_19news_1.py [person_id] [gpu_id]

The saved models are in Audio/model/atcnet_pose0_con3/[person_id]. This process takes about 5 minutes on a Titan Xp.

    5. Fine-tune the GAN network. Run
cd render-to-video/
python train_19news_1.py [person_id] [gpu_id]

The saved models are in render-to-video/checkpoints/memory_seq_p2p/[person_id]. This process takes about 40 minutes on a Titan Xp.
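
The fine-tuning commands above can be chained from a single Python script. The sketch below is not part of the repository: `to_25fps_command`, `finetune_steps` and `run_finetune` are hypothetical helpers that only wrap the commands listed above, and they assume the script is started from the repository root with `python` and `ffmpeg` on the PATH. The compilation of the mesh renderer kernels still has to be done by hand beforehand.

```python
import os
import subprocess


def to_25fps_command(src, dst):
    """ffmpeg command from the note above: write src to dst at 25 fps."""
    return ["ffmpeg", "-i", src, "-r", "25", dst]


def finetune_steps(person_id, gpu_id=0):
    """Return (working directory, command, extra environment) per step."""
    pid, gpu = str(person_id), str(gpu_id)
    return [
        # Extract frames and landmarks.
        ("Data", ["python", "extract_frame1.py", f"{pid}.mp4"], {}),
        # 3D face reconstruction; GPU 0 is fixed, as in the command above.
        ("Deep3DFaceReconstruction",
         ["python", "demo_19news.py", f"../Data/{pid}"],
         {"CUDA_VISIBLE_DEVICES": "0"}),
        # Training script under Audio/code.
        ("Audio/code", ["python", "train_19news_1.py", pid, gpu], {}),
        # Training script under render-to-video.
        ("render-to-video", ["python", "train_19news_1.py", pid, gpu], {}),
    ]


def run_finetune(person_id, gpu_id=0, repo_root="."):
    """Run the steps in order and stop at the first failing command."""
    for workdir, cmd, extra_env in finetune_steps(person_id, gpu_id):
        subprocess.run(
            cmd,
            cwd=os.path.join(repo_root, workdir),
            env={**os.environ, **extra_env},
            check=True,
        )
```

Calling `run_finetune(1, 0)` would run the four scripts for `Data/1.mp4` on GPU 0. Keeping the command list separate from the runner makes it possible to print or inspect the steps without executing anything.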

Test on a target person

Place the test audio file (.wav or .mp3) under Audio/audio/. Run [with generated poses]

cd Audio/code/
python test_personalized.py [audio] [person_id] [gpu_id]

or [with poses from short video]

cd Audio/code/
python test_personalized2.py [audio] [person_id] [gpu_id]

The program prints 'saved to xxx.mov' if the videos are generated successfully. It outputs two .mov files: a face-only video (_full9.mov) and a video with background (_transbigbg.mov).
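
The choice between the two test scripts and the two output suffixes can be captured in a small helper. This is a sketch only: `build_test_command` and `classify_output` are hypothetical names, and the `[audio]` argument is passed through unchanged because its exact expected form is not specified here.

```python
# Output suffixes described above and what each video contains.
OUTPUT_SUFFIXES = {
    "_full9.mov": "face only",
    "_transbigbg.mov": "with background",
}


def build_test_command(audio, person_id, gpu_id=0, use_video_poses=False):
    """Build the test command; run it from Audio/code/.

    use_video_poses=False -> test_personalized.py (generated poses)
    use_video_poses=True  -> test_personalized2.py (poses from short video)
    """
    script = "test_personalized2.py" if use_video_poses else "test_personalized.py"
    return ["python", script, str(audio), str(person_id), str(gpu_id)]


def classify_output(filename):
    """Return the kind of output video, or None for an unknown suffix."""
    for suffix, kind in OUTPUT_SUFFIXES.items():
        if filename.endswith(suffix):
            return kind
    return None
```

The returned list can be passed to `subprocess.run(..., cwd="Audio/code", check=True)` once the fine-tuned models are in place.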

Colab

A Colab demo is here.

Acknowledgments

The face reconstruction code is from Deep3DFaceReconstruction, the ArcFace code is from insightface, and the GAN code is based on pytorch-CycleGAN-and-pix2pix.

Owner
Ran Yi
Assistant Professor at CSE Dept, SJTU