SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Last update: Nov 20, 2021

Related tags

Text Data & NLP SASE

Overview

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

We propose a SASE model with adaptive noise distribution, which achieves state of the art results on the VioceBank+DEMAND dataset.
We simulated the federated learning setting of a real environment and verified the robustness of the proposed SASE noise reduction model in a real environment through experiments and visualization.
The proposed SASE model is computed based on the complex domain, and the TF-GA block is used to extract richer information of speech distribution and noise distribution, while SA-GOEA and SA-GUEA are adaptive to learn the distribution mask of noise.
In this paper, we propose a model aggregation optimization weighting strategy that is more applicable to FLbased speech enhancement tasks.

Dependencies

python >=3.6 (3.8.5 was used in the experiments)
PyTorch == 1.10.0+cu113
flwr == 2.0.1

How to run the code

1. Prepare data

VoiceBank+DEMAND can be accessed from this [link](## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for training speech enhancement algorithms and TTS models)
CommonVoice(Chinese) link +Noise92 [link](NOISEX (cmu.edu))

2. Train on the VoiceBank+DEMAND dataset

python main.py

3. Train on the CommonVoice(Chinese)+Noise92 dataset with Federated learning

./run-server.sh
./run-client.sh
- You can change the number of clients by changing NUM_CLIENTS

4. Generate wav files and evaluate

python main.py -g --resume "model_file" -df "wavs_root"

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Related tags

Overview

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Dependencies

How to run the code

1. Prepare data

2. Train on the VoiceBank+DEMAND dataset

3. Train on the CommonVoice(Chinese)+Noise92 dataset with Federated learning

4. Generate wav files and evaluate

Result

1. Evaluate on VoiceBank+DEMAND dataset

2. Evaluate on CommonVoice+Noise92 dataset

Owner

Tower

DVC-NLP-Simple-usecase

A programming language with logic of Python, and syntax of all languages.

Korean stereoypte detector with TUNiB-Electra and K-StereoSet

SimBERT升级版（SimBERTv2）！

Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

Every Google, Azure & IBM text to speech voice for free

Twitter Sentiment Analysis using #tag, words and username

Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

A fast hierarchical dimensionality reduction algorithm.

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Python module (C extension and plain python) implementing Aho-Corasick algorithm

Faster, modernized fork of the language identification tool langid.py

ACL'2021: Learning Dense Representations of Phrases at Scale

novel deep learning research works with PaddlePaddle

Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models

A calibre plugin that generates Word Wise and X-Ray files then sends them to Kindle. Supports KFX, AZW3 and MOBI eBooks. X-Ray supports 18 languages.

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition