SinGlow: Generative Flow for SVS tasks in Tensorflow 2

python 3 tensorflow 2

See more in the paper: SinGlow: Singing Voice Synthesis with Glow ---- Help Virtual Singers More Human-like

SinGlow is part of my singing voice synthesis system. It extracts features from sound, particularly songs and music. These features (or perfect encodings) can then be used for feature migration tasks, for example migrating the features of real singers' songs to virtual singers' songs.

This project is built on top of the GLOW-tf2 project, which is released under the MIT licence. The following description is from its developers.

My implementation of GLOW from the paper https://arxiv.org/pdf/1807.03039 in Tensorflow 2. GLOW is an interesting generative model because it uses an invertible neural network to transform images to a normal distribution and vice versa. It is also strongly based on RealNVP, so knowing RealNVP helps in understanding GLOW's contribution.
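
To make the idea of an invertible transformation concrete, here is a minimal sketch of an affine coupling layer of the kind used in RealNVP and GLOW. It is not taken from this repository: `toy_scale_shift` is a hypothetical stand-in for the learned network, and the input is assumed to have even length.

```python
import math

def toy_scale_shift(x_a):
    """Hypothetical stand-in for the learned network: maps one half to (log_scale, shift)."""
    log_s = [math.tanh(v) for v in x_a]  # bounded log-scale keeps exp() well behaved
    t = [0.5 * v for v in x_a]
    return log_s, t

def coupling_forward(x):
    """Keep the first half unchanged; scale and shift the second half conditioned on it."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_scale_shift(x_a)
    z_b = [b * math.exp(s) + sh for b, s, sh in zip(x_b, log_s, t)]
    log_det = sum(log_s)  # log|det J| of the transformation
    return x_a + z_b, log_det

def coupling_inverse(z):
    """Exact inverse: the unchanged half reproduces the same scale and shift."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = toy_scale_shift(z_a)
    x_b = [(b - sh) * math.exp(-s) for b, s, sh in zip(z_b, log_s, t)]
    return z_a + x_b

x = [0.3, -1.2, 2.0, 0.7]
z, log_det = coupling_forward(x)
x_rec = coupling_inverse(z)  # recovers x up to floating-point error
```

Because the first half passes through untouched, the inverse can recompute the same scale and shift, which is why no information is lost in either direction.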

Abstract

Singing voice synthesis (SVS) is the task of using a computer to generate songs with lyrics. So far, researchers have focused on tuning pre-recorded sound pieces according to rigid rules. For example, in Vocaloid, one of the commercial SVS systems, there are 8 principal parameters that song creators can modify. The system uses these parameters to synthesise sound pieces pre-recorded by professional voice actors. We notice a common difference between computer-generated songs and real singers' songs. Addressing this difference can make the generated songs sound more like those of real singers.

In this paper, we propose SinGlow as a solution to minimise this difference. SinGlow is a Normalising Flow model that directly uses the calculated negative log-likelihood (NLL) value to optimise its trainable parameters. This property gives SinGlow the ability to perfectly encode inputs into feature vectors, which allows us to manipulate the feature space to minimise the difference discussed above. To the best of our knowledge, we are the first to propose an application of Normalising Flow in the SVS field.
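
As a rough illustration of how a Normalising Flow obtains an exact negative log-likelihood, the sketch below applies the change-of-variables formula to a one-layer elementwise affine flow with a standard normal prior. The flow, the function names and the sample values are assumptions made for this example; the actual SinGlow model stacks many GLOW layers.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def flow_nll(x, mu, log_sigma):
    """NLL of x under the flow z = (x - mu) * exp(-log_sigma) with a N(0, 1) prior."""
    nll = 0.0
    for x_i, m, ls in zip(x, mu, log_sigma):
        z_i = (x_i - m) * math.exp(-ls)
        log_det = -ls  # log|dz_i/dx_i| = -log_sigma
        # Change of variables: log p(x_i) = log p_Z(z_i) + log|dz_i/dx_i|
        nll -= standard_normal_logpdf(z_i) + log_det
    return nll

x = [0.5, -1.0, 2.0]
nll_identity = flow_nll(x, mu=[0.0, 0.0, 0.0], log_sigma=[0.0, 0.0, 0.0])
nll_fitted = flow_nll(x, mu=[0.5, -1.0, 2.0], log_sigma=[0.0, 0.0, 0.0])
```

Training lowers this quantity by adjusting the flow parameters (here `mu` and `log_sigma`), so `nll_fitted` is smaller than `nll_identity` for this input.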

In our experiments, SinGlow shows the ability to encode sound and make the input virtual-singer songs more human-like.
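
One common way to manipulate the feature space of a Glow-style model is to shift a latent code along the direction between the mean latents of two groups. The sketch below shows that idea with a toy invertible encoder; it is an assumption about the general approach, not the procedure used in this repository, and `encode`, `decode`, `migrate` and the feature values are all made up for illustration.

```python
def encode(x):
    """Toy invertible encoder standing in for a trained flow: z = 2x + 1."""
    return [2.0 * v + 1.0 for v in x]

def decode(z):
    """Exact inverse of encode."""
    return [(v - 1.0) / 2.0 for v in z]

def mean_vector(vectors):
    """Elementwise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def migrate(x, source_examples, target_examples, alpha=1.0):
    """Shift x in latent space from the source group mean towards the target group mean."""
    z_src = mean_vector([encode(v) for v in source_examples])
    z_tgt = mean_vector([encode(v) for v in target_examples])
    direction = [t - s for t, s in zip(z_tgt, z_src)]
    z_shifted = [zi + alpha * d for zi, d in zip(encode(x), direction)]
    return decode(z_shifted)

virtual = [[0.0, 1.0], [0.2, 0.8]]  # made-up "virtual singer" feature frames
real = [[1.0, 2.0], [1.2, 1.8]]     # made-up "real singer" feature frames
out = migrate([0.1, 0.9], virtual, real, alpha=1.0)
unchanged = migrate([0.1, 0.9], virtual, real, alpha=0.0)
```

With `alpha=0.0` the input is returned unchanged because the encoder is exactly invertible; larger values of `alpha` move the result further towards the target group.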

Structure

SinGlow
│   train.py //needs modification: replace the data dirs with yours
│   common_definitions.py //model configurations are located here
│   data_loarder.py //construct tfrecord dataset from wav or mp3 data, and load it
│   model.py //Glow Model / SinGlow Model
│   pipeline.py //training pipeline
│   README.md
├───utils
│       utils.py //originates from Glow-OpenAI
│       weightnorm.py //originates from Tensorflow
├───checkpoints
│       weights.h5 //the model weights file
├───runs //outputs and rerecords
├───logs //tensorboard logdir
├───design //model architecture information
└───notebooks
        run.ipynb //dataset construction and applying model
        experiment.ipynb //evaluate model
        README.md //some user guide

Requirements

pip3 install -r requirements.txt

Training

After every epoch, the network's weights will be stored in the checkpoints directory defined in common_definitions.py.

Samples drawn from the network (image generation mode) are also stored in the results directory. Additionally, TensorBoard is used to track the mean and variance of z, as well as the negative log-likelihood.

In the optimal state, z should have zero mean and unit variance. TensorBoard also stores samples drawn with a temperature of 0.7.
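
For readers unfamiliar with sampling temperature, the following sketch assumes the usual Glow convention in which the temperature scales the standard deviation of the latent prior. The function names are hypothetical and the snippet uses only the Python standard library, not the project's TensorFlow code.

```python
import random
import statistics

def sample_latent(n, temperature=0.7, seed=0):
    """Draw n latent values from N(0, temperature^2)."""
    rng = random.Random(seed)
    return [temperature * rng.gauss(0.0, 1.0) for _ in range(n)]

def latent_stats(z):
    """Return the mean and variance, the two quantities tracked for z during training."""
    return statistics.fmean(z), statistics.pvariance(z)

z = sample_latent(20000, temperature=0.7)
mean, var = latent_stats(z)  # mean is close to 0, var is close to 0.7 ** 2 = 0.49
```

A temperature below 1 draws latents closer to the mode of the prior, which typically trades diversity for cleaner samples.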

python3 train.py [-h] [--dataset [DATASET]] [--k_glow [K_GLOW]] [--l_glow [L_GLOW]]
       [--img_size [IMG_SIZE]] [--channel_size [CHANNEL_SIZE]]

optional arguments:
  -h, --help            show this help message and exit
  --dataset [DATASET]   The dataset to train on ("mnist", "cifar10", "cifar100")
  --k_glow [K_GLOW]     The amount of blocks per layer
  --l_glow [L_GLOW]     The amount of layers
  --img_size [IMG_SIZE] The width and height of the input images
  --channel_size [CHANNEL_SIZE]
                        The channel size of the input images
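
The help text above is consistent with an `argparse` parser along the following lines. This is a reconstruction for illustration only: the argument names and help strings come from the help text, while the types and every default value are hypothetical and may differ from the real train.py.

```python
import argparse

def build_parser():
    """Rebuild a parser matching the documented options; all defaults are placeholders."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--dataset", type=str, nargs="?", default="mnist",
                        help='The dataset to train on ("mnist", "cifar10", "cifar100")')
    parser.add_argument("--k_glow", type=int, nargs="?", default=32,
                        help="The amount of blocks per layer")
    parser.add_argument("--l_glow", type=int, nargs="?", default=3,
                        help="The amount of layers")
    parser.add_argument("--img_size", type=int, nargs="?", default=32,
                        help="The width and height of the input images")
    parser.add_argument("--channel_size", type=int, nargs="?", default=3,
                        help="The channel size of the input images")
    return parser

# Equivalent to: python3 train.py --dataset cifar10 --k_glow 16 --l_glow 3
args = build_parser().parse_args(["--dataset", "cifar10", "--k_glow", "16", "--l_glow", "3"])
```

Options that are not passed, such as `--img_size` here, fall back to their defaults.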

CONTRIBUTING

To contribute to the project, these steps can be followed. Anyone that contributes will surely be recognized and mentioned here!

Contributions to the project are made using the "Fork & Pull" model. The typical steps would be:

  1. create an account on GitHub
  2. fork this repository
  3. make a local clone
  4. make changes on the local copy
  5. commit changes: git commit -m "my message"
  6. push to your GitHub account: git push origin
  7. create a Pull Request (PR) from your GitHub fork (go to your fork's webpage and click on "Pull Request"; you can then add a message to describe your proposal)

LICENSE

This open-source project is licensed under the MIT License.

Reference

TODO the reference information

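Chinese Notes

This is a flow-model-based project for extracting song features and performing style transfer. To some extent, we have managed to transfer the features of real human singing onto virtual singers' songs.

Our next step is to keep optimising the model and to make progress on song segmentation, working towards practical application of the research.


We have a secret base full of creative ideas and a lot of interesting people. Everyday life, technology, anime and ACG culture: all topics are welcome.

You are welcome to join our small group: 兔叽的魔术工房. We regularly post all kinds of projects there, and you are bound to find one that interests you.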

Owner
Haobo Yang
A third-year undergraduate student who hopes to become an AI architect in the future.