SinGlow: Generative Flow for SVS tasks in Tensorflow 2

Related tags

AudioSinGlow
Overview

SinGlow: Generative Flow for SVS tasks in Tensorflow 2

python 3 tensorflow 2

See more in the paper: SinGlow: Singing Voice Synthesis with Glow ---- Help Virtual Singers More Human-like

SinGlow is a part of my Singing voice synthesis system. It can extract features of sound, particularly songs and musics. Then we can use these features (or perfect encoding) for feature migrating tasks. For example migrate features of real singers' song to those virtual singers' songs.

This project is developed above the project GLOW-tf2 under MIT licence, and the following words are from its developers.

My implementation of GLOW from the paper https://arxiv.org/pdf/1807.03039 in Tensorflow 2. GLOW is an interesting generative model as it uses invertible neural network to transform images to normal distribution and vice versa. Additionally, it is strongly based on RealNVP, so knowing it would be helpful to understand GLOW's contribution.

Table of Contents


Abstract

Singing voice synthesis (SVS) is a task using the computer to generate songs with lyrics. So far, researchers are focusing on tunning the pre-recorded sound pieces according to rigid rules. For example, in Vocaloid, one of the commercial SVS systems, there are 8 principal parameters modifiable by song creators. The system uses these parameters to synthesise sound pieces pre-recorded from professional voice actors. We notice a common difference between computer-generated songs and real singers' songs. This difference can be addressed to help the generated ones become more like the real-singer ones.

In this paper, we propose SinGlow, as a solution to minimise this difference. SinGlow is one of the Normalising Flow that directly uses the calculated Negative Log-Likelihood value to optimise the trainable parameters. This feature gives SinGlow the ability to perfectly encode inputs into feature vectors, which allows us to manipulate the feature space to minimise the difference we discussed before. To our best knowledge, we are the first to propose an application of Normalising Flow in SVS fields.

In our experiments, SinGlow shows the ability to encode sound and make the input virtual-singer songs more human-like.

Structure

SinGlow
│   train.py //need modification, replace data dirs with yours
│   common_definitions.py //model configurations are located here
│   data_loarder.py //construct tfrecord dataset from wav or mp3 data, and load it
│   model.py //Glow Model / SinGlow Model
│   pipeline.py //training pipeline
│   README.md
├───utils
│       utils.py //originate from Glow-OpenAI
│       weightnorm.py //originate from Tensorflow
├───checkpoints
│       weights.h5 //the model weights file
├───runs //outputs and rerecords
├───logs //tensorboard logdir
├───design //model architecture information
└───notebooks
        run.ipynb //dataset construction and applying model
        experiment.ipynb //evaluate model
        README.md //some user guide

Requirements

pip3 install -r requirements.txt

Training

After every epoch, the network's weights will be stored in the checkpoints directory defined in common_definitions.py.

There are also some sampling of the network (image generation mode) that are going to be stored in results directory. Additionally, TensorBoard is used to track z's mean and variance, as well as the negative log-likelihood.

In optimal state, z should have zero mean and one variance. Additionally, the TensorBoard stores sampling with temperature of 0.7.

python3 train.py [-h] [--dataset [DATASET]] [--k_glow [K_GLOW]] [--l_glow [L_GLOW]]
       [--img_size [IMG_SIZE]] [--channel_size [CHANNEL_SIZE]]

optional arguments:
  -h, --help            show this help message and exit
  --dataset [DATASET]   The dataset to train on ("mnist", "cifar10", "cifar100")
  --k_glow [K_GLOW]     The amount of blocks per layer
  --l_glow [L_GLOW]     The amount of layers
  --img_size [IMG_SIZE] The width and height of the input images
  --channel_size [CHANNEL_SIZE]
                        The channel size of the input images

CONTRIBUTING

To contribute to the project, these steps can be followed. Anyone that contributes will surely be recognized and mentioned here!

Contributions to the project are made using the "Fork & Pull" model. The typical steps would be:

  1. create an account on github
  2. fork this repository
  3. make a local clone
  4. make changes on the local copy
  5. commit changes git commit -m "my message"
  6. push to your GitHub account: git push origin
  7. create a Pull Request (PR) from your GitHub fork (go to your fork's webpage and click on "Pull Request." You can then add a message to describe your proposal.)

LICENSE

This open-source project is licensed under MIT License.

Reference

TODO the reference information

中文注释

这是一个基于流模型的歌曲特征提取,并进行风格迁移的项目。我们一定程度上实现了将真实人声歌曲的特征迁移到虚拟歌手的歌曲上。

我们接下来的计划是继续优化模型,并在歌曲切割上取得进展,向着研究落地努力。


我们有一个堆满创意点子的秘密基地,里面有很多有意思的小伙伴。生活什么的、技术什么的、二次元什么的都可以聊得开。

欢迎加入我们的小群:兔叽的魔术工房。群内会经常发布各种各样的企划,总会遇上你感兴趣的。

Owner
Haobo Yang
A 3rd-year undergraduate student, hope to be an AI Architect in the future.
Haobo Yang
A python script that can play .mp3 URLs upon the ringing or motion detection of a Ring doorbell. The sound plays through Sonos speakers.

Ring x Sonos A python script that plays .mp3 files whenever a doorbell is rung or a doorbell detects motion. Features Music! Authors @braden Running T

braden 0 Nov 12, 2021
Mousai is a simple application that can identify song like Shazam

Mousai is a simple application that can identify song like Shazam. It saves the artist, album, and title of the identified song in a JSON file.

Dave Patrick 662 Jan 07, 2023
This Is Telegram Music UserBot To Play Music Without Being Admin

This Is Telegram Music UserBot To Play Music Without Being Admin

Krishna Kumar 36 Sep 13, 2022
voice assistant made with python that search for covid19 data(like total cases, deaths and etc) in a specific country

covid19-voice-assistant voice assistant made with python that search for covid19 data(like total cases, deaths and etc) in a specific country installi

Miguel 2 Dec 05, 2021
Sound-Equalizer- This is a Sound Equalizer GUI App Using Python's PyQt5

Sound-Equalizer- This is a Sound Equalizer GUI App Using Python's PyQt5. It gives you the ability to play, pause, and Equalize any one-channel wav audio file and play 3 different instruments.

Mustafa Megahed 1 Jan 10, 2022
Basically Play Pauses the song when it is safe to do so. when you die in a round

Basically Play Pauses the song when it is safe to do so. when you die in a round

AG_1436 1 Feb 13, 2022
NovaMusic is a music sharing robot. Users can get music and music lyrics using inline queries.

A music sharing telegram robot using Redis database and Telebot python library using Redis database.

Hesam Norin 7 Oct 21, 2022
A python program to cut longer MP3 files (i.e. recordings of several songs) into the individual tracks.

I'm writing a python script to cut longer MP3 files (i.e. recordings of several songs) into the individual tracks called ReCut. So far there are two

Dönerspiess 1 Oct 27, 2021
C++ library for audio and music analysis, description and synthesis, including Python bindings

Essentia Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license.

Music Technology Group - Universitat Pompeu Fabra 2.3k Jan 03, 2023
Python wrapper around sox.

pysox Python wrapper around sox. Read the Docs here. This library was presented in the following paper: R. M. Bittner, E. J. Humphrey and J. P. Bello,

Rachel Bittner 446 Dec 07, 2022
無料で使える中品質なテキスト読み上げソフトウェア、VOICEVOXのコア

無料で使える中品質なテキスト読み上げソフトウェア、VOICEVOXのコア

Hiroshiba 0 Aug 29, 2022
Pythonic bindings for FFmpeg's libraries.

PyAV PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gri

PyAV 1.8k Jan 03, 2023
Inner ear models for Python

cochlea cochlea is a collection of inner ear models. All models are easily accessible as Python functions. They take sound signal as input and return

98 Jan 05, 2023
Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Meinard Mueller 66 Jan 02, 2023
Users can transcribe their favorite piano recordings to MIDI files after installation

Users can transcribe their favorite piano recordings to MIDI files after installation

190 Dec 17, 2022
Code for paper 'Audio-Driven Emotional Video Portraits'.

Audio-Driven Emotional Video Portraits [CVPR2021] Xinya Ji, Zhou Hang, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu [Project] [Paper] G

197 Dec 31, 2022
Mina - A Telegram Music Bot 5 mandatory Assistant written in Python using Pyrogram and Py-Tgcalls

Mina - A Telegram Music Bot 5 mandatory Assistant written in Python using Pyrogram and Py-Tgcalls

3 Feb 07, 2022
live coding in python + supercollider

live coding in python + supercollider

Zack 6 Feb 06, 2022
cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python

audioread Decode audio files using whichever backend is available. The library currently supports: Gstreamer via PyGObject. Core Audio on Mac OS X via

beetbox 419 Dec 26, 2022
Datamoshing with FFmpeg

ffmosher Datamoshing with FFmpeg Drag and drop video onto mosh.bat to create a datamoshed video. To datamosh an image, please ensure the file is in a

18 Sep 11, 2022