Official implementation of A cappella: Audio-visual Singing VoiceSeparation, from BMVC21

Overview

Y-Net

Official implementation of A cappella: Audio-visual Singing VoiceSeparation, British Machine Vision Conference 2021

Project page: ipcv.github.io/Acappella/
Paper: Arxiv, BMVC (not available yet)

Running a demo / Y-Net Inference

We provide simple functions to load models with pre-trained weights. Steps:

  1. Clone the repo or download y-net>VnBSS>models (models can run as a standalone package)
  2. Load a model:
from VnBSS import y_net_gr # or from models import y_net_gr 
model = y_net_gr(n=1)

Check a demo fully working:
Open In Colab

Citation

@inproceedings{acappella,
    author    = {Juan F. Montesinos and
                 Venkatesh S. Kadandale and
                 Gloria Haro},
    title     = {A cappella: Audio-visual Singing VoiceSeparation},
    booktitle = {British Machine Vision Conference (BMVC)},
    year      = {2021},

}

Repository under construction .
.
.
.
.
.
.
.

Training / Using DEV code

###Training The most difficult part is to prepare the dataset as everything is builded upon a very specific format.
To run training:
python run.py -m model_name --workname experiment_name --arxiv_path directory_of_experiments --pretrained_from path_pret_weights
You can inspect the argparse at default.py>argparse_default.
Possible model names are: y_net_g, y_net_gr, y_net_m,y_net_r,u_net,llcp

Testing

  1. Go to manuscript_scripts and replace checkpoint paths by yours in the testing scripts.
  2. Run: bash manuscript_scripts/test_gr_r.sh
  3. Replace the paths of manuscript_scripts/auto_metrics.py by your experiment_directory path.
  4. Run: python manuscript_scripts/auto_metrics.py to visualise results.

It's a complicated framework. HELP!

The best option to run the framework is to debug! Having a runable code helps to see input shapes, dataflow and to run line by line. Download The circle of life demo with the files already processed. It will act like a dataset of 6 samples. You can download it from Google Drive 1.1 Gb.

  1. Unzip the file
  2. run python run.py -m y_net_gr (for example)

Everything has been configured to run by default this way.

The model

Each effective model is wrapped by a nn.Module which takes care of computing the STFT, the mask, returning the waveform etcetera... This wrapper can be found at VnBSS>models>y_net.py>YNet. To get rid of this you can simply inherit the class, take minimum layers and keep the core_forward method, which is the inference step without the miscelanea.

FAQs

  1. How to change the optimizer's hyperparameters?
    Go to config>optimizer.json
  2. How to change clip duration, video framerate, STFT parameters or audio samplerate?
    Go to config>__init__.py
  3. How to change the batch size or the amount of epochs?
    Go to config>hyptrs.json
  4. How to dump predictions from the training and test set
    Go to default.py. Modify DUMP_FILES (can be controlled at a subset level). force argument skips the iteration-wise conditions and dumps for every single network prediction.
  5. Is tensorboard enabled?
    Yes, you will find tensorboard records at your_experiment_directory/used_workname/tensorboard
  6. Can I resume an experiment?
    Yes, if you set exactly the same experiment folder and workname, the system will detect it and will resume from there.
  7. I'm trying to resume but found AssertionError If there is an exception before running the model
  8. How to change the amount of layers of U-Net
    U-net is build dynamically given a list of layers per block as shown in models>__init__.py from outer to inner blocks.
  9. How to modify the default network values?
    The json file config>net_cfg.json overwrites any default configuration from the model.
Owner
Juan F. Montesinos
PhD student at Pompeu Fabra university Barcelona
Juan F. Montesinos
Python I/O for STEM audio files

stempeg = stems + ffmpeg Python package to read and write STEM audio files. Technically, stems are audio containers that combine multiple audio stream

Fabian-Robert Stรถter 72 Dec 23, 2022
A collection of python scripts for extracting and analyzing acoustics from audio files.

pyAcoustics A collection of python scripts for extracting and analyzing acoustics from audio files. Contents 1 Common Use Cases 2 Major revisions 3 Fe

Tim 74 Dec 26, 2022
Oliva music bot help to play vc music

OLIVA V2 ๐ŸŽต Requirements ๐Ÿ“ FFmpeg NodeJS nodesource.com Python 3.7+ PyTgCalls Commands ๐Ÿ›  For all in group /play - reply to youtube url or song file

SOULใ€…Hา‰Aา‰Cา‰Kา‰Eา‰Rา‰ 2 Oct 22, 2021
Library for working with sound files of the format: .ogg, .mp3, .wav

Library for working with sound files of the format: .ogg, .mp3, .wav. By work is meant - playing sound files in a straight line and in the background, obtaining information about the sound file (auth

Romanin 2 Dec 15, 2022
A voice assistant which can handle your everyday task and allows you to book items from your favourite store!

Voicely Table of Contents About The Project Built With Getting Started Prerequisites Installation Usage Roadmap Contributing License Contact Acknowled

Awantika Nigam 2 Nov 17, 2021
A voice assistant which can be used to interact with your computer and controls your pc operations

Introduction ๐Ÿ‘จโ€๐Ÿ’ป It is a voice assistant which can be used to interact with your computer and also you have been seeing it in Iron man movies, but t

Sujith 84 Dec 22, 2022
Minimal command-line music player written in Python

pyms Minimal command-line music player written in Python. Designed with elegance and minimalism. Resizes dynamically with your terminal. Dependencies

12 Sep 23, 2022
An 8D music player made to enjoy Halloween this year!๐Ÿค˜

HAPPY HALLOWEEN buddy! Split Player Hello There! Welcome to SplitPlayer... Supposed To Be A 8DPlayer.... You Decide.... It can play the ordinary audio

Akshat Kumar Singh 1 Nov 04, 2021
The official repository for Audio ALBERT

AALBERT Here is also the official repository of AALBERT, which is Pytorch lightning reimplementation of the paper, Audio ALBERT: A Lite Bert for Self-

pohan 55 Dec 11, 2022
Synchronize a local directory of songs' (MP3, MP4) metadata (genre, ratings) and playlists with a Plex server.

PlexMusicSync Synchronize a local directory of songs' (MP3, MP4) metadata (genre, ratings) and playlists (m3u, m3u8) with a Plex server. The song file

Tom Goetz 9 Jul 07, 2022
This is an AI that runs in the terminal. It is a voice assistant that can do common activities and can also help in your coding doubts like

This is an AI that runs in the terminal. It is a voice assistant that can do common activities and can also help in your coding doubts like

OneBit 1 Nov 05, 2021
PianoPlayer - Automatic fingering generator for piano scores

PianoPlayer - Automatic fingering generator for piano scores

Marco Musy 571 Jan 02, 2023
kapre: Keras Audio Preprocessors

Kapre Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time. Tested on Python 3.6 and 3.7 Why Kapre? vs. Pre-co

Keunwoo Choi 867 Dec 29, 2022
Stream Music ๐ŸŽต ๐˜ผ ๐™—๐™ค๐™ฉ ๐™ฉ๐™๐™–๐™ฉ ๐™˜๐™–๐™ฃ ๐™ฅ๐™ก๐™–๐™ฎ ๐™ข๐™ช๐™จ๐™ž๐™˜ ๐™ค๐™ฃ ๐™๐™š๐™ก๐™š๐™œ๐™ง๐™–๐™ข ๐™‚๐™ง๐™ค๐™ช๐™ฅ ๐™–๐™ฃ๐™™ ๐˜พ๐™๐™–๐™ฃ๐™ฃ๐™š๐™ก ๐™‘๐™ค๐™ž๐™˜๐™š ๐˜พ๐™๐™–๐™ฉ๐™จ ๐˜ผ๐™ซ๐™–๐™ž๐™ก?

Stream Music ๐ŸŽต ๐˜ผ ๐™—๐™ค๐™ฉ ๐™ฉ๐™๐™–๐™ฉ ๐™˜๐™–๐™ฃ ๐™ฅ๐™ก๐™–๐™ฎ ๐™ข๐™ช๐™จ๐™ž๐™˜ ๐™ค๐™ฃ ๐™๐™š๐™ก๐™š๐™œ๐™ง๐™–๐™ข ๐™‚๐™ง๐™ค๐™ช๐™ฅ ๐™–๐™ฃ๐™™ ๐˜พ๐™๐™–๐™ฃ๐™ฃ๐™š๐™ก ๐™‘๐™ค๐™ž๐™˜๐™š ๐˜พ๐™๐™–๐™ฉ๐™จ ๐˜ผ๐™ซ๐™–๐™ž๐™ก?

Sadew Jayasekara 15 Nov 12, 2022
An audio digital processing toolbox based on a workflow/pipeline principle

AudioTK Audio ToolKit is a set of audio filters. It helps assembling workflows for specific audio processing workloads. The audio workflow is split in

Matthieu Brucher 238 Oct 18, 2022
nicfit 425 Jan 01, 2023
Audio fingerprinting and recognition in Python

dejavu Audio fingerprinting and recognition algorithm implemented in Python, see the explanation here: How it works Dejavu can memorize audio by liste

Will Drevo 6k Jan 06, 2023
Gradient - A Python program designed to create a reactive and ambient music listening experience

Gradient is a Python program designed to create a reactive and ambient music listening experience.

Alexander Vega 2 Jan 24, 2022
Terminal-based audio-to-text converter

att Terminal-based audio-to-text converter Project description A terminal-based audio-to-text converter written in python, enabling you to convert .wa

Sven Eschlbeck 4 Dec 15, 2022
Audio book player for senior visually impaired.

PI Zero W Audio Book Motivation and requirements My dad is practically blind and at 80 years has trouble hearing and operating tiny or more complicate

Andrej Hosna 29 Dec 25, 2022