Audio2midi - Automatic Audio-to-symbolic Arrangement

Related tags

Audioaudio2midi
Overview

Automatic Audio-to-symbolic Arrangement

This is the repository of the project "Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning" by Ziyu Wang, Dejing Xu, Gus Xia and Ying Shan (arXiv:2112.15110).

Model Overview

We propose an audio-to-symbolic generative model to transfer an input audio to its piano arrangement (in MIDI format). The task is similar to piano cover song production based on the original pop songs, or piano score reduction based on a classical symphony.

The input audio is a pop-song audio under arbitrary instrumentation. The output is a MIDI piano arrangement of the original audio which keeps most musical information (incdluding chord, groove and lead melody). The model is built upon the previous projects including EC2-VAE, PianoTree VAE, and Poly-Dis. The novelty of this study is the cross-modal representation disentanglement design and the 3-step training strategy shown below.

workflow

Application

This audio-to-symbolic conversion and arrangement module can lead to many applications (e.g., music score generation, automatic acccompaniment etc.) In this project, we demonstrate a specific application: audio-level pop style transfer, in which we transfer a pop song "vocal + (full band) accompaniment" to "vocal + piano accompaniment". The automation is an integration of spleeter source separation, audio-to-symbolic conversion, performance rendering, and synthesis (as shown in the diagram below).

style_transfer_app

Quick Start

In demo/audio2midi_out we presents generated results in separated song folders. In each song folder, there is a MIDI file and an audio file.

  • The MIDI file is the generated piano arrangement by our model (with rule-based performance rendering).
  • The audio file is a combination of the generated piano arragement and the original vocal (separated by Spleeter) using GarageBand.

In this section, we demonstrate how to use our model to reproduce the demo. Later on, you can generate piano arrangement with your own inputs.

Step One: Prepare the python environment specified by requirements.txt.

Step Two: Download the pre-trained model parameter files and put them under params. (Link)

Step Three: Source separation the audio files at demo/example_audio using Spleeter and output the results at demo/spleeter_out (see the original repository for more detail). If you do not want to run Spleeter right now, we have prepare the seprated result for you at demo/spleeter_out.

Step Four: Run the following command at the project root directory:

python scripts/inference_pop_style_transfer.py ./demo/spleeter_out/example1/accompaniment.wav ./demo/audio2midi_out/example1
python scripts/inference_pop_style_transfer.py ./demo/spleeter_out/example2/accompaniment.wav ./demo/audio2midi_out/example2

Here, the first parameter after inference_pop_style_transfer.py is the input audio file path, and the second parameter is the output MIDI folder. The command also allows applying different models, switching autoregressive mode, and many other options. See more details by

python scripts/inference_pop_style_transfer.py -h

The generated MIDI files can be found at demo/audio2midi_out. Then, the MIDI arrangement and vocal audio can be combined and further processed in the DAW.

Project Details

Before discussing the audio-to-midi conversion in more detail, we introduce different models and training stages that can be specified in the command.

Model Types

There are four model types (or model_id):

  • (a2s): the proposed model (requiring chord extraction because it learns disentangled latent chord representation).
  • (a2s-nochd): the proposed model without chord latent representation (i.e., removing chord encoder or chord decoder).
  • (supervised): pure supervised training (audio encoder + symbolic decoder (without symbolic feature decoding)). No resampling and KL penalty.
  • (supervised+kl): model architecture is same as supervised. Resampling and KL penalty is applied (to audio-texture representation).

In general, a2s and a2s-nochd perform the best as they are the proposed models. The rest are baseline models in the paper. (Note: the other baseline model (Chord-based generation) is not implemented because it does not depend on any audio information.)

Stages (Autoregressive Mode)

As shown in the diagram above, a2s and a2s-nochd have 3 training stages, where stage 1 and 2 are not ready to be used, and stage 3 has two options (i.e., pure audio-based and autoregression).

  • (Stage 3a): pure audio-based. The model input is the audio itself.
  • (Stage 3b): autoregression. The model input is the audio itself and previous 2-measure output.

During inference, autoregressive=True mode means we use a stage 3b model auto regressively, and use stage 3a model to compute the initial segment. On the other hand, autoregressive=False model means we use a stage 3a model to arrangement every 2-measure segment independently. In general, autoregressive=False gives output of slightly higher quality.

Audio2midi Command

Run audio2midi in command line using the command:

python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path>

Note: the command should be run at the project root directory. For clarity, the user can only specify the output MIDI folder (where possibly multiple versions of MIDI arrangement will be produced in the same folder). To specify output MIDI filename, see the python interfance section below.

The command has the following functions:

  1. To specify model_id using -m or --model (default to be a2s), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --model a2s-nochd
    
  2. To turn on autoregressive mode using -a or --auto (detault to be autoregressive=False), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --auto
    
  3. By default, the beat and chord analysis (a preprocessing step by madmom) will be applied to the separated accompaniment track. The analysis can also be applied to the original audio using -o or --original, (though there are usually no aparent difference), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --original <input_audio_path>
    
  4. At the early stage of this research, we only have stage 3a and stage 3b has not been developed yet. We provide the model parameter of the old stage 3a model and name it after the alternative model. The generation results also sound good in general. The alternative model can be called by --alt, e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --alt
    

    Alternative model can be also called in autoregressive mode, where the model for initial computation will be replaced with the alternation.

  5. To run through all the model types, autoregressive modes and alternative models, using --model all, e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --model all
    

For more detail, check it out using

python scripts/inference_pop_style_transfer.py -h

Audio2midi Python Interface

To use the model in a more flexible fashion (e.g., use your own trained model parameter, combine the a2s routine in your project, or run multiple files at a time), we provide the python interface. Specifically, call the functions in the pop_style_transfer package. For example, in a python script outside this project, we can write the following python code:

from audio2midi.pop_style_transfer import acc_audio_to_midi, load_model

model, model0 = load_model(model_id='a2s', 
                           autoregressive=False,
                           alt=False
                          )

acc_audio_to_midi(input_acc_audio_path=...,
                  output_midi_path=...,
                  model=model, 
                  model0=model0,
                  input_audio_path=..., 
                  input_analysis_npy_path=...,
                  save_analysis_npy_path=..., 
                  batch_size=...
                 )

Note: Unlike in the command line mode where output location can only be a folder directory, we can specify filename in the python interface. Also, chord and beat preprocessing can be loaded or stored, and batch size can be chaged.

Data

We use the POP909 dataset and its time-aligned audio files. The audio files are colloected from the internet. The data should be processed and put under the data folder. Specifically, there are 4 sub-folders:

  • data/audio-midi-combined: contain 909 folders of 909 songs. In ecah folder, we have
    • audio.mp3 the original audio
    • midi folder containing the MIDI arrangement and annotation files.
    • split audio folder containing the vocal-accompaniment separation result using spleeter:2stems model.
  • data/quantized_pop909_4_bin: containing 909 npz files storing pre-processed symbolic data from MIDI and annotation files.
  • data/audio_stretched: containing 909 pairs of wav and pickle files storing the accompaniment audio time-stretched to 13312 frames per (symbolic) beat under 22050 sample rate (approximately 99.384 bpm).
  • data/train_split: containing a pickle file for train and valid split ids at song level.

The pre-processed symbolic-part of the dataset (i.e., quantized_pop909_4_bin and train_split can be downloaded at link.

Training the model

To train the model, run the scripts in ./scripts/run*.py, at project root directory. The result will be saved in the result folder. The model should be trained in a sequential manner. For example, run

python ./scripts/run_a2s_stage1.py

When the stage i model is trained, put the model parameter file path in line 13 of /scripts/run_a2s_stage{j}.py, and run

python ./scripts/run_a2s_stage{j}.py

Here, (i, j) can be (i=1, j=2), (i=2, j=3), (i=2, j=4). (3 for stage 3a and 4 for stage 3b). The other model types can be run in a similar fashion. The training result can be downloaded at link.

Summary of Materials to Download

References

Owner
Ziyu Wang
Computer music, Deep Music Generation & Artificial musical intelligence
Ziyu Wang
Scalable audio processing framework written in Python with a RESTful API

TimeSide : scalable audio processing framework and server written in Python TimeSide is a python framework enabling low and high level audio analysis,

Parisson 340 Jan 04, 2023
Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs.

vog_oracles Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs. Huge thanks to mzucker on GitHub for the note detection cod

19 Sep 29, 2022
Music player and music library manager for Linux, Windows, and macOS

Ex Falso / Quod Libet - A Music Library / Editor / Player Quod Libet is a music management program. It provides several different ways to view your au

Quod Libet 1.2k Jan 07, 2023
MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

MetaBrainz Foundation 3k Dec 31, 2022
python wrapper for rubberband

pyrubberband A python wrapper for rubberband. For now, this just provides lightweight wrappers for pitch-shifting and time-stretching. All processing

Brian McFee 106 Nov 28, 2022
A simple voice detection system which can be applied practically for designing a device with capability to detect a baby’s cry and automatically turning on music

Auto-Baby-Cry-Detection-with-Music-Player A simple voice detection system which can be applied practically for designing a device with capability to d

2 Dec 15, 2021
Synchronize a local directory of songs' (MP3, MP4) metadata (genre, ratings) and playlists with a Plex server.

PlexMusicSync Synchronize a local directory of songs' (MP3, MP4) metadata (genre, ratings) and playlists (m3u, m3u8) with a Plex server. The song file

Tom Goetz 9 Jul 07, 2022
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Summary Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the pack

Audiovisual Communications Laboratory 1k Jan 09, 2023
DaisyXmusic ❤ A bot that can play music on Telegram Group and Channel Voice Chats

DaisyXmusic ❤ is the best and only Telegram VC player with playlists, Multi Playback, Channel play and more

TeamOfDaisyX 34 Oct 22, 2022
PatrikZero's CS:GO Hearing protection

Program that lowers volume when you die and get flashed in CS:GO. It aims to lower the chance of hearing damage by reducing overall sound exposure. Uses game state integration. Anti-cheat safe.

Patrik Žúdel 224 Dec 04, 2022
A Youtube audio player for your terminal

AudioLine A lightweight Youtube audio player for your terminal Explore the docs » View Demo · Report Bug · Request Feature · Send a Pull Request About

Haseeb Khalid 26 Jan 04, 2023
ᴀ ʙᴏᴛ ᴛʜᴀᴛ ᴄᴀɴ ᴘʟᴀʏ ᴍᴜꜱɪᴄ ɪɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴏɴ ᴠᴏɪᴄᴇ ᴄᴀʟʟ

GJ516 LOVER'S ııllıllı ♥️ ➤⃝Gᴊ516_ᴍᴜꜱɪᴄ_ʙᴏᴛ ♥️ ıllıllı ᴀ ʙᴏᴛ ᴛʜᴀᴛ ᴄᴀɴ ᴘʟᴀʏ ᴍᴜꜱɪᴄ ɪɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴏɴ ᴠᴏɪᴄᴇ ᴄᴀʟʟ Requirements 📝 FFmpeg NodeJS nodesou

1 Nov 22, 2021
A voice control utility for Spotify

Spotify Voice Control A voice control utility for Spotify · Report Bug · Request

Shoubhit Dash 27 Jan 01, 2023
Carnatic Notes Predictor for audio files

Carnatic Notes Predictor for audio files Link for live application: https://share.streamlit.io/pradeepak1/carnatic-notes-predictor-for-audio-files/mai

1 Nov 06, 2021
nicfit 425 Jan 01, 2023
Manipulate audio with a simple and easy high level interface

Pydub Pydub lets you do stuff to audio in a way that isn't stupid. Stuff you might be looking for: Installing Pydub API Documentation Dependencies Pla

James Robert 6.6k Jan 01, 2023
Analysis of voices based on the Mel-frequency band

Speaker_partition_module Analysis of voices based on the Mel-frequency band. Goal: Identification of voices speaking (diarization) and calculation of

1 Feb 06, 2022
The official repository for Audio ALBERT

AALBERT Here is also the official repository of AALBERT, which is Pytorch lightning reimplementation of the paper, Audio ALBERT: A Lite Bert for Self-

pohan 55 Dec 11, 2022
Codes for "Efficient Long-Range Attention Network for Image Super-resolution"

ELAN Codes for "Efficient Long-Range Attention Network for Image Super-resolution", arxiv link. Dependencies & Installation Please refer to the follow

xindong zhang 124 Dec 22, 2022
Python library for audio and music analysis

librosa A python package for music and audio analysis. Documentation See https://librosa.org/doc/ for a complete reference manual and introductory tut

librosa 5.6k Jan 06, 2023