Emotional conditioned music generation using transformer-based model.

Related tags

Deep LearningEMOPIA
Overview

This is the official repository of EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation. The paper has been accepted by International Society for Music Information Retrieval Conference 2021.

  • Note: We release the transcribed MIDI files. As for the audio part, due to the copyright issue, we will only release the YouTube ID of the tracks and the timestamp of them. You might use open source crawler to get the audio file.

Use EMOPIA by MusPy

  1. install muspy
pip install muspy
  1. Use it in your script
import muspy

emopia = muspy.EMOPIADataset("data/emopia/", download_and_extract=True)
emopia.convert()
music = emopia[0]
print(music.annotations[0].annotation)

You can get the label of the piece of music:

{'emo_class': '1', 'YouTube_ID': '0vLPYiPN7qY', 'seg_id': '0'}
  • emo_class: ['1', '2', '3', '4']
  • YouTube_ID: the YouTube ID of this piece of music
  • seg_id: means this piece of music is the ith piece we take from this song. (zero-based).

For more usage please refer to MusPy.

Emotion Classification

For the classification models and codes, please refer to this repo.

Conditional Generation

Environment

  1. Install PyTorch and fast transformer:

    • torch==1.7.0 (Please install it according to your CUDA version.)

    • fast transformer :

      pip install --user pytorch-fast-transformers 
      

      or refer to the original repository

  2. Other requirements:

    pip install -r requirements.txt

Usage

Inference

  1. Download the checkpoints and put them into exp/

    • Manually:

    • By commend: (install gdown: pip install gdown)

      #baseline:
      gdown --id 1Q9vQYnNJ0hXBFwcxdWQgDNmzoW3MLl3h --output exp/baseline.zip
      
      # no-pretrained transformer
      gdown --id 1ZULJgBRu2Wb3jxFmGfAHP1v_tjoryFM7 --output exp/no-pretrained_transformer.zip
      
      # pretrained transformer
      gdown --id 19Seq18b2JNzOamEQMG1uarKjj27HJkHu --output exp/pretrained_transformer.zip
      
  2. Inference options:

  • num_songs: number of midis you want to generate.

  • out_dir: the folder where the generated midi will be saved. If not specified, midi files will be saved to exp/MODEL_YOU_USED/gen_midis/.

  • task_type: the task_type needs to be the same as the task specified during training.

    • '4-cls' for 4 class conditioning
    • 'Arousal' for only conditioning on arousal
    • 'Valence' for only conditioning on Valence
    • 'ignore' for not conditioning
  • emo_tag: the target class of emotion you want to assign.

    • If the task_type is '4-cls', emo_tag can be: 1,2,3,4, which refers to Q1, Q2, Q3, Q4.
    • If the task_type is 'Arousal', emo_tag can be: 1, 2. 1 for High arousal, 2 for Low arousal.
    • If the task_type is 'Valence', emo_tag can be: 1, 2. 1 for High Valence, 2 for Low Valence.
  1. Inference

    python main_cp.py --mode inference --task_type 4-cls --load_ckt CHECKPOINT_FOLDER --load_ckt_loss 25 --num_songs 10 --emo_tag 1 
    

Train the model by yourself

  1. Prepare the data follow the steps.

  2. training options:

  • exp_name: the folder name that the checkpoints will be saved.

  • data_parallel: use data_parallel to let the training process faster. (0: not use, 1: use)

  • task_type: the conditioning task:

    • '4-cls' for 4 class conditioning
    • 'Arousal' for only conditioning on arousal
    • 'Valence' for only conditioning on Valence
    • 'ignore' for not conditioning

    a. Only train on EMOPIA: (no-pretrained transformer in the paper)

      python main_cp.py --path_train_data emopia --exp_name YOUR_EXP_NAME --load_ckt none
    

    b. Pre-train the transformer on AILabs17k:

      python main_cp.py --path_train_data ailabs --exp_name YOUR_EXP_NAME --load_ckt none --task_type ignore
    

    c. fine-tune the transformer on EMOPIA: For example, you want to use the pre-trained model stored in 0309-1857 with loss= 30 to fine-tune:

      python main_cp.py --path_train_data emopia --exp_name YOUR_EXP_NAME --load_ckt 0309-1857 --load_ckt_loss 30
    

Baseline

  1. The baseline code is based on the work of Learning to Generate Music with Sentiment

  2. According to the author, the model works best when it is trained with 4096 neurons of LSTM, but takes 12 days for training. Therefore, due to the limit of computational resource, we used the size of 512 neurons instead of 4096.

  3. In order to use this as evaluation against our model, the target emotion classes is expanded to 4Q instead of just positive/negative.

Authors

The paper is a co-working project with Joann, SeungHeon and Nabin. This repository is mentained by Joann and me.

License

The EMOPIA dataset is released under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). It is provided primarily for research purposes and is prohibited to be used for commercial purposes. When sharing your result based on EMOPIA, any act that defames the original music owner is strictly prohibited.

The hand drawn piano in the logo comes from Adobe stock. The author is Burak. I purchased it under standard license.

Cite the dataset

@inproceedings{{EMOPIA},
         author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
         title = {{MOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
         booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
         year = {2021}
}
Owner
hung anna
hung anna
Code, Models and Datasets for OpenViDial Dataset

OpenViDial This repo contains downloading instructions for the OpenViDial dataset in 《OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Vis

119 Dec 08, 2022
LyaNet: A Lyapunov Framework for Training Neural ODEs

LyaNet: A Lyapunov Framework for Training Neural ODEs Provide the model type--config-name to train and test models configured as those shown in the pa

Ivan Dario Jimenez Rodriguez 21 Nov 21, 2022
"Segmenter: Transformer for Semantic Segmentation" reproduced via mmsegmentation

Segmenter-based-on-OpenMMLab "Segmenter: Transformer for Semantic Segmentation, arxiv 2105.05633." reproduced via mmsegmentation. We reproduce Segment

EricKani 22 Feb 24, 2022
PRTR: Pose Recognition with Cascade Transformers

PRTR: Pose Recognition with Cascade Transformers Introduction This repository is the official implementation for Pose Recognition with Cascade Transfo

mlpc-ucsd 133 Dec 30, 2022
A Simple Long-Tailed Rocognition Baseline via Vision-Language Model

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

Teli Ma 4 Jan 20, 2022
Library extending Jupyter notebooks to integrate with Apache TinkerPop and RDF SPARQL.

Graph Notebook: easily query and visualize graphs The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Us

Amazon Web Services 501 Dec 28, 2022
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,

GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan 70 Dec 18, 2022
L-Verse: Bidirectional Generation Between Image and Text

Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalabilty

Kim, Taehoon 102 Dec 21, 2022
Code for paper 'Hand-Object Contact Consistency Reasoning for Human Grasps Generation' at ICCV 2021

GraspTTA Hand-Object Contact Consistency Reasoning for Human Grasps Generation (ICCV 2021). Project Page with Videos Demo Quick Results Visualization

Hanwen Jiang 47 Dec 09, 2022
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

CycleGAN PyTorch | project page | paper Torch implementation for learning an image-to-image translation (i.e. pix2pix) without input-output pairs, for

Jun-Yan Zhu 11.5k Dec 30, 2022
Learning Representations that Support Robust Transfer of Predictors

Transfer Risk Minimization (TRM) Code for Learning Representations that Support Robust Transfer of Predictors Prepare the Datasets Preprocess the Scen

Yilun Xu 15 Dec 07, 2022
Deep and online learning with spiking neural networks in Python

Introduction The brain is the perfect place to look for inspiration to develop more efficient neural networks. One of the main differences with modern

Jason Eshraghian 447 Jan 03, 2023
This is the pytorch code for the paper Curious Representation Learning for Embodied Intelligence.

Curious Representation Learning for Embodied Intelligence This is the pytorch code for the paper Curious Representation Learning for Embodied Intellig

19 Oct 19, 2022
implicit displacement field

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
Equivariant GNN for the prediction of atomic multipoles up to quadrupoles.

Equivariant Graph Neural Network for Atomic Multipoles Description Repository for the Model used in the publication 'Learning Atomic Multipoles: Predi

16 Nov 22, 2022
[NeurIPS 2021] "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators"

G-PATE This is the official code base for our NeurIPS 2021 paper: "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of T

AI Secure 14 Oct 12, 2022
Original code for "Zero-Shot Domain Adaptation with a Physics Prior"

Zero-Shot Domain Adaptation with a Physics Prior [arXiv] [sup. material] - ICCV 2021 Oral paper, by Attila Lengyel, Sourav Garg, Michael Milford and J

Attila Lengyel 40 Dec 21, 2022
[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"

Listening to Sounds of Silence for Speech Denoising Introduction This is the repository of the "Listening to Sounds of Silence for Speech Denoising" p

Henry Xu 40 Dec 20, 2022
unet for image segmentation

Implementation of deep learning framework -- Unet, using Keras The architecture was inspired by U-Net: Convolutional Networks for Biomedical Image Seg

zhixuhao 4.1k Dec 31, 2022
Implementation of Google Brain's WaveGrad high-fidelity vocoder

WaveGrad Implementation (PyTorch) of Google Brain's high-fidelity WaveGrad vocoder (paper). First implementation on GitHub with high-quality generatio

Ivan Vovk 363 Dec 27, 2022