Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Last update: Dec 20, 2022

Related tags

Deep Learning StrengthNet

Overview

StrengthNet

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

https://arxiv.org/abs/2110.03156

Dependency

Ubuntu 18.04.5 LTS

GPU: Quadro RTX 6000
Driver version: 450.80.02
CUDA version: 11.0

Python 3.5

tensorflow-gpu 2.0.0b1 (cudnn=7.6.0)
scipy
pandas
matplotlib
librosa

Environment set-up

For example,

conda create -n strengthnet python=3.5
conda activate strengthnet
pip install -r requirements.txt
conda install cudnn=7.6.0

Usage

Run python utils.py to extract .wav to .h5;
Run python train.py to train a CNN-BLSTM based StrengthNet;

Evaluating new samples

Put the waveforms you wish to evaluate in a folder. For example, / /
Run python test.py --rootdir / /

This script will evaluate all the .wav files in / /, and write the results to / / /StrengthNet_result_raw.txt.

By default, the output/strengthnet.h5 pretrained model is used.

Citation

If you find this work useful in your research, please consider citing:

@misc{liu2021strengthnet,
      title={StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis}, 
      author={Rui Liu and Berrak Sisman and Haizhou Li},
      year={2021},
      eprint={2110.03156},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Resources

The ESD corpus is released by the HLT lab, NUS, Singapore.

The strength scores for the English samples of the ESD corpus are available here.

Acknowledgements:

MOSNet: https://github.com/lochenchou/MOSNet

Relative Attributes: Relative Attributes

License

This work is released under MIT License (see LICENSE file for details).

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Related tags

Overview

StrengthNet

Dependency

Environment set-up

Usage

Evaluating new samples

Citation

Resources

Acknowledgements:

License

Owner

RuiLiu

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

A New Open-Source Off-road Environment for Benchmark Generalization of Autonomous Driving

LibFewShot: A Comprehensive Library for Few-shot Learning.

Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

Facebook AI Image Similarity Challenge: Descriptor Track

Assginment for UofT CSC420: Intro to Image Understanding

ProjectOxford-ClientSDK - This repo has moved :house: Visit our website for the latest SDKs & Samples

Safe Policy Optimization with Local Features

[CVPR 2022] Official Pytorch code for OW-DETR: Open-world Detection Transformer

AsymmetricGAN - Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Ludwig is a toolbox that allows to train and evaluate deep learning models without the need to write code.

Add gui for YoloV5 using PyQt5

Tensorflow implementation of Swin Transformer model.

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models

Efficient Speech Processing Tookit for Automatic Speaker Recognition

Blender scripts for computing geodesic distance

Chunkmogrify: Real image inversion via Segments

MAGMA - a GPT-style multimodal model that can understand any combination of images and language