A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Overview

Audiomentations

Code style: Black. License: MIT.

A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and, partially, multichannel audio. Can be integrated into training pipelines in e.g. TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

Need a PyTorch alternative with GPU support? Check out torch-audiomentations!

Setup

pip install audiomentations

Optional requirements

Some features have extra dependencies. The extra Python packages can be installed by running:

pip install audiomentations[extras]

| Feature                    | Extra dependencies             |
|----------------------------|--------------------------------|
| Load 24-bit wav files fast | wavio                          |
| LoudnessNormalization      | pyloudnorm                     |
| Mp3Compression             | ffmpeg and [pydub or lameenc]  |

Note: ffmpeg can be installed via e.g. conda or from the official ffmpeg download page.

Usage example

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

SAMPLE_RATE = 16000

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=SAMPLE_RATE)

Go to audiomentations/augmentations/transforms.py to see the waveform transforms you can apply, and what arguments they have.

See audiomentations/augmentations/spectrogram_transforms.py for spectrogram transforms.

Waveform transforms

AddBackgroundNoise

Added in v0.9.0

Mix in another sound, e.g. a background noise. Useful if your original sound is clean and you want to simulate an environment where background noise is present.

Can also be used for mixup, as in https://arxiv.org/pdf/1710.09412.pdf

A folder of (background noise) sounds to be mixed in must be specified. These sounds should ideally be at least as long as the input sounds to be transformed. Otherwise, the background sound will be repeated, which may sound unnatural.

Note that the gain of the added noise is relative to the amount of signal in the input. This implies that if the input is completely silent, no noise will be added.
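
The relative-gain behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper names `rms` and `mix_at_snr` are hypothetical, the gain is assumed to be expressed as a signal-to-noise ratio in dB, and plain Python lists are used so the example runs without extra dependencies.

```python
import math


def rms(samples):
    """Root mean square of a sequence of floats (0.0 for empty input)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` so the result has roughly the given SNR in dB."""
    signal_rms = rms(signal)
    noise_rms = rms(noise)
    if signal_rms == 0.0 or noise_rms == 0.0:
        # Silent input (or silent noise): nothing to scale against, so add nothing
        return list(signal)
    # Repeat the noise if it is shorter than the signal, then cut it to length
    repeats = -(-len(signal) // len(noise))  # ceiling division
    looped = (list(noise) * repeats)[: len(signal)]
    # Noise RMS that yields the requested SNR relative to the signal RMS
    target_noise_rms = signal_rms / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [s + n * scale for s, n in zip(signal, looped)]
```

Because the noise level is derived from the RMS of the input, a completely silent input comes back unchanged, and a noise clip shorter than the input is looped, which is why longer background sounds are preferable.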

AddGaussianNoise

Added in v0.1.0

Add Gaussian noise to the samples.

AddGaussianSNR

Added in v0.7.0

Add Gaussian noise to the samples with a random signal-to-noise ratio (SNR).
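
As a rough illustration of the idea (a sketch under assumptions, not the library's code), the noise standard deviation can be derived from the RMS of the input and an SNR drawn uniformly in dB. The function name `add_gaussian_snr` and its parameters are hypothetical, and the input is assumed to be non-empty.

```python
import math
import random


def add_gaussian_snr(samples, min_snr_db, max_snr_db, rng=random):
    """Add white Gaussian noise at an SNR drawn uniformly from [min_snr_db, max_snr_db]."""
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    signal_rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # For zero-mean Gaussian noise, the standard deviation equals the expected RMS
    noise_std = signal_rms / (10 ** (snr_db / 20))
    return [s + rng.gauss(0.0, noise_std) for s in samples]
```

Passing a seeded `random.Random` instance as `rng` makes the result reproducible.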

AddImpulseResponse

Added in v0.7.0

Convolve the audio with a random impulse response. Impulse responses can be created using e.g. http://tulrich.com/recording/ir_capture/

Some datasets of impulse responses are publicly available:

  • EchoThief containing 115 impulse responses acquired in a wide range of locations.
  • The MIT McDermott dataset containing 271 impulse responses acquired in everyday places.

Impulse responses are represented as wav files in the given ir_path.

AddShortNoises

Added in v0.9.0

Mix in various (bursts of overlapping) sounds with random pauses between. Useful if your original sound is clean and you want to simulate an environment where short noises sometimes occur.

A folder of (noise) sounds to be mixed in must be specified.

ClippingDistortion

Added in v0.8.0

Distort the signal by clipping a random percentage of points.

The percentage of points that will be clipped is drawn from a uniform distribution between the two input parameters min_percentile_threshold and max_percentile_threshold. If, for instance, 30% is drawn, the samples are clipped if they're below the 15th or above the 85th percentile.
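
The percentile rule can be sketched as follows. This is an illustrative example rather than the library's implementation; `percentile` and `clip_by_percentile` are hypothetical helpers, the input is assumed to be non-empty, and linear interpolation between neighbouring values is an assumption.

```python
def percentile(sorted_values, q):
    """Linear-interpolated percentile (q in 0..100) of an already sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def clip_by_percentile(samples, percent_clipped):
    """Clip samples below the lower and above the upper percentile threshold."""
    ordered = sorted(samples)
    # Half of the clipped points are taken from each end of the distribution
    low = percentile(ordered, percent_clipped / 2)
    high = percentile(ordered, 100 - percent_clipped / 2)
    return [min(max(s, low), high) for s in samples]
```

With `percent_clipped=30`, everything below the 15th percentile and above the 85th percentile is flattened to those thresholds.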

FrequencyMask

Added in v0.7.0

Mask some frequency band on the spectrogram. Inspired by https://arxiv.org/pdf/1904.08779.pdf

Gain

Added in v0.11.0

Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.

Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
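
To make the warning concrete, here is a small sketch that assumes the gain is specified in decibels and converted to an amplitude factor. The names `db_to_amplitude`, `random_gain` and `exceeds_full_scale` are hypothetical and not part of the library.

```python
import random


def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def random_gain(samples, min_gain_db, max_gain_db, rng=random):
    """Multiply the samples by a factor drawn uniformly on the dB scale."""
    factor = db_to_amplitude(rng.uniform(min_gain_db, max_gain_db))
    return [s * factor for s in samples]


def exceeds_full_scale(samples):
    """True if any sample lies outside the [-1, 1] range."""
    return any(abs(s) > 1.0 for s in samples)
```

A check such as `exceeds_full_scale` before writing to an integer format is one way to notice when a positive gain has pushed the audio past full scale.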

Mp3Compression

Added in v0.12.0

Compress the audio using an MP3 encoder to lower the audio quality. This may help machine learning models deal with compressed, low-quality audio.

This transform depends on either lameenc or pydub/ffmpeg.

Note that bitrates below 32 kbps are only supported for low sample rates (up to 24000 Hz).

Note: When using the lameenc backend, the output may be slightly longer than the input because the LAME encoder inserts some silence at the beginning of the audio.

LoudnessNormalization

Added in v0.14.0

Apply a constant amount of gain to match a specific loudness. This is an implementation of ITU-R BS.1770-4.

Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping

Normalize

Added in v0.6.0

Apply a constant amount of gain so that the highest signal level present in the sound becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1. Also known as peak normalization.
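
Peak normalization itself is simple enough to sketch in a few lines. The function name `peak_normalize` is hypothetical, and the guard against all-zero input is an assumption about sensible behaviour rather than a description of the library's code.

```python
def peak_normalize(samples):
    """Scale the samples so that the largest absolute value becomes 1.0 (0 dBFS)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Digital silence (or empty input): avoid dividing by zero
        return list(samples)
    return [s / peak for s in samples]
```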

PitchShift

Added in v0.4.0

Pitch shift the sound up or down without changing the tempo

PolarityInversion

Added in v0.11.0

Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result will sound the same compared to the original when played back in isolation. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or obtaining the difference between two waveforms. However, in the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
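
The cancellation effect mentioned above can be shown with a tiny sketch. `invert_polarity` is a hypothetical helper name used only for illustration.

```python
def invert_polarity(samples):
    """Multiply every sample by -1."""
    return [-s for s in samples]


original = [0.25, -0.5, 0.0, 0.75]
inverted = invert_polarity(original)
# Mixing a signal with its inverted copy cancels it out completely
mixed = [a + b for a, b in zip(original, inverted)]
```

Here `mixed` contains only zeros, while `inverted` on its own would sound the same as `original`.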

Resample

Added in v0.8.0

Resample the signal using librosa.core.resample.

To do downsampling only, set both the minimum and the maximum sampling rate lower than the original sampling rate. Conversely, to do upsampling only, set both higher than the original sampling rate.

Shift

Added in v0.5.0

Shift the samples forwards or backwards, with or without rollover
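
The difference between shifting with and without rollover can be sketched as below. This is an illustration only: the helper `shift` and its `num_places` argument (a shift expressed in samples) are hypothetical, whereas the transform itself is configured with fractions of the total length.

```python
def shift(samples, num_places, rollover=True):
    """Shift samples forwards (positive) or backwards (negative) by num_places."""
    samples = list(samples)
    n = len(samples)
    if n == 0 or num_places == 0:
        return samples
    if rollover:
        # The part pushed past one end wraps around to the other end
        k = num_places % n
        return samples[n - k:] + samples[:n - k]
    if num_places > 0:
        # Without rollover, the vacated part is filled with digital silence
        k = min(num_places, n)
        return [0.0] * k + samples[:n - k]
    k = min(-num_places, n)
    return samples[k:] + [0.0] * k
```

For example, shifting `[1, 2, 3, 4]` forwards by one place gives `[4, 1, 2, 3]` with rollover and `[0.0, 1, 2, 3]` without.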

TimeMask

Added in v0.7.0

Make a randomly chosen part of the audio silent. Inspired by https://arxiv.org/pdf/1904.08779.pdf
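
A minimal sketch of the masking step, without the optional fade, might look like this. The function name `time_mask` is hypothetical; the `min_band_part` and `max_band_part` arguments are assumed to be fractions of the total length.

```python
import random


def time_mask(samples, min_band_part, max_band_part, rng=random):
    """Silence a randomly placed region covering a random fraction of the audio."""
    n = len(samples)
    mask_len = int(n * rng.uniform(min_band_part, max_band_part))
    # Pick a start position so that the masked region fits inside the audio
    start = rng.randint(0, n - mask_len)
    out = list(samples)
    out[start:start + mask_len] = [0.0] * mask_len
    return out
```

The output always has the same length as the input; only the chosen region is replaced with digital silence.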

TimeStretch

Added in v0.2.0

Time stretch the signal without changing the pitch

Trim

Added in v0.7.0

Trim leading and trailing silence from an audio signal using librosa.effects.trim

Spectrogram transforms

SpecChannelShuffle

Added in v0.13.0

Shuffle the channels of a multichannel spectrogram. This can help combat positional bias.
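
Conceptually, the operation is a random permutation along the channel axis. The sketch below assumes the spectrogram is held as a list with one entry per channel; `shuffle_channels` is a hypothetical name.

```python
import random


def shuffle_channels(channels, rng=random):
    """Return the channels in a random order without modifying their content."""
    order = list(range(len(channels)))
    rng.shuffle(order)
    return [channels[i] for i in order]
```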

SpecFrequencyMask

Added in v0.13.0

Mask a set of frequencies in a spectrogram, à la Google AI SpecAugment. This type of data augmentation has proved to make speech recognition models more robust.

The masked frequencies can be replaced with either the mean of the original values or a given constant (e.g. zero).
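
The two fill strategies can be sketched as follows. This is not the library's code: `mask_frequencies` is a hypothetical helper, the spectrogram is assumed to be a list of rows with one row per frequency bin, and the "mean" is assumed to be taken over the whole spectrogram.

```python
def mask_frequencies(spectrogram, start_bin, end_bin, fill="mean", constant=0.0):
    """Replace frequency bins start_bin..end_bin-1 with the mean or a constant."""
    if fill == "mean":
        values = [v for row in spectrogram for v in row]
        fill_value = sum(values) / len(values)
    else:
        fill_value = constant
    return [
        [fill_value] * len(row) if start_bin <= i < end_bin else list(row)
        for i, row in enumerate(spectrogram)
    ]
```

Filling with the mean keeps the overall level of the spectrogram roughly unchanged, whereas a constant such as zero makes the masked band stand out more clearly.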

Known limitations

  • Some transforms do not support multichannel audio yet. See Multichannel audio
  • Expects the input dtype to be float32, and have values between -1 and 1.
  • The code runs on CPU, not GPU. For a GPU-compatible version, check out torch-audiomentations
  • Multiprocessing is not officially supported yet. See also #46
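
Regarding the expected input format: audio that was loaded as 16-bit integers needs to be converted before it is passed to a transform. The sketch below uses only the standard library (`array` with typecode "f" stores 32-bit floats); the helper name `int16_to_float32` is hypothetical. With NumPy, the equivalent would typically be `samples.astype(np.float32) / 32768.0`.

```python
from array import array


def int16_to_float32(samples):
    """Convert 16-bit integer samples to 32-bit floats in the range [-1, 1)."""
    return array("f", (s / 32768.0 for s in samples))
```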

Contributions are welcome!

Multichannel audio

The following table is valid for v0.14.0 - v0.16.0 only

| Transform             | Supports multichannel audio? |
|-----------------------|------------------------------|
| AddBackgroundNoise    | -                            |
| AddGaussianNoise      | Yes                          |
| AddGaussianSNR        | Yes                          |
| AddImpulseResponse    | -                            |
| AddShortNoises        | -                            |
| ClippingDistortion    | Yes                          |
| FrequencyMask         | Yes                          |
| Gain                  | Yes                          |
| LoudnessNormalization | Yes, up to 5 channels        |
| Mp3Compression        | -                            |
| Normalize             | Yes                          |
| PitchShift            | Yes                          |
| PolarityInversion     | Yes                          |
| Resample              | -                            |
| Shift                 | Yes                          |
| SpecChannelShuffle    | Yes                          |
| SpecFrequencyMask     | Yes                          |
| TimeMask              | Yes                          |
| TimeStretch           | Yes                          |
| Trim                  | -                            |

Changelog

v0.16.0 (2021-02-11)

  • Implement SpecCompose for applying a pipeline of spectrogram transforms. Thanks to omerferhatt.
  • Fix a bug in SpecChannelShuffle where it did not support more than 3 audio channels. Thanks to omerferhatt.
  • Limit scipy version range to >=1.0,<1.6 to avoid issues with loading 24-bit wav files. Support for scipy>=1.6 will be added later.

v0.15.0 (2020-12-10)

  • Fix picklability of instances of AddImpulseResponse, AddBackgroundNoise and AddShortNoises
  • Add an option leave_length_unchanged to AddImpulseResponse

v0.14.0 (2020-12-06)

  • Implement LoudnessNormalization
  • Implement randomize_parameters in Compose. Thanks to SolomidHero.
  • Add multichannel support to AddGaussianNoise, AddGaussianSNR, ClippingDistortion, FrequencyMask, PitchShift, Shift, TimeMask and TimeStretch

v0.13.0 (2020-11-10)

  • Show a warning if a waveform had to be resampled after loading it. This is because resampling is slow. Ideally, files on disk should already have the desired sample rate.
  • Correctly find audio files with upper case filename extensions.
  • Lay the foundation for spectrogram transforms. Implement SpecChannelShuffle and SpecFrequencyMask.
  • Fix a bug where AddBackgroundNoise crashed when trying to add digital silence to an input. Thanks to juheeuu.
  • Configurable LRU cache for transforms that use external sound files. Thanks to alumae.
  • Officially add multichannel support to Normalize

v0.12.1 (2020-09-28)

  • Speed up AddBackgroundNoise, AddShortNoises and AddImpulseResponse by loading wav files with scipy or wavio instead of librosa.

v0.12.0 (2020-09-23)

  • Implement Mp3Compression
  • Python <= 3.5 is no longer officially supported, since Python 3.5 has reached end-of-life
  • Expand range of supported librosa versions
  • Officially support multichannel audio in Gain and PolarityInversion
  • Add m4a and opus to the list of recognized audio filename extensions
  • Breaking change: Internal util functions are no longer exposed directly. If you were doing e.g. from audiomentations import calculate_rms, now you have to do from audiomentations.core.utils import calculate_rms

v0.11.0 (2020-08-27)

  • Implement Gain and PolarityInversion. Thanks to Spijkervet for the inspiration.

v0.10.1 (2020-07-27)

  • Improve the performance of AddBackgroundNoise and AddShortNoises by optimizing the implementation of calculate_rms.
  • Improve compatibility of output files written by the demo script. Thanks to xwJohn.
  • Fix division by zero bug in Normalize. Thanks to ZFTurbo.

v0.10.0 (2020-05-05)

  • Breaking change: AddImpulseResponse, AddBackgroundNoise and AddShortNoises now include subfolders when searching for files. This is useful when your sound files are organized in subfolders.
  • AddImpulseResponse, AddBackgroundNoise and AddShortNoises now support aiff files in addition to flac, mp3, ogg and wav
  • Fix filter instability bug in FrequencyMask. Thanks to kvilouras.

v0.9.0 (2020-02-20)

  • Disregard non-audio files when looking for impulse response files
  • Remember randomized/chosen effect parameters. This allows for freezing the parameters and applying the same effect to multiple sounds. Use transform.freeze_parameters() and transform.unfreeze_parameters() for this.
  • Fix a bug in ClippingDistortion where the min_percentile_threshold was not respected as expected.
  • Implement transform.serialize_parameters(). Useful for when you want to store metadata on how a sound was perturbed.
  • Switch to a faster convolve implementation. This makes AddImpulseResponse significantly faster.
  • Add a rollover parameter to Shift. This allows for introducing silence instead of a wrapped part of the sound.
  • Expand supported range of librosa versions
  • Add support for flac in AddImpulseResponse
  • Implement AddBackgroundNoise transform. Useful for when you want to add background noise to all of your sound. You need to give it a folder of background noises to choose from.
  • Implement AddShortNoises. Useful for when you want to add (bursts of) short noise sounds to your input audio.
  • Improve handling of empty input

v0.8.0 (2020-01-28)

  • Add shuffle parameter in Compose
  • Add Resample transformation
  • Add ClippingDistortion transformation
  • Add fade parameter to TimeMask

Thanks to askskro

v0.7.0 (2020-01-14)

Add new transforms:

  • AddGaussianSNR
  • AddImpulseResponse
  • FrequencyMask
  • TimeMask
  • Trim

Thanks to karpnv

v0.6.0 (2019-05-27)

  • Implement peak normalization

v0.5.0 (2019-02-23)

  • Implement Shift transform
  • Ensure p is within bounds

v0.4.0 (2019-02-19)

  • Implement PitchShift transform
  • Fix output dtype of AddGaussianNoise

v0.3.0 (2019-02-19)

  • Implement leave_length_unchanged in TimeStretch

v0.2.0 (2019-02-18)

  • Add TimeStretch transform
  • Parametrize AddGaussianNoise

v0.1.0 (2019-02-15)

Initial release. Includes only one transform: AddGaussianNoise

Development

Install the dependencies specified in requirements.txt

Code style

Format the code with black

Run tests and measure code coverage

pytest

Generate demo sounds for empirical evaluation

python -m demo.demo

Alternatives

Audiomentations isn't the only Python library that can do various types of audio data augmentation/degradation! Here's an overview:

| Name                      | GPU support?              |
|---------------------------|---------------------------|
| audio-degradation-toolbox | No                        |
| audio_degrader            | No                        |
| audiomentations           | No                        |
| kapre                     | Yes, Keras/Tensorflow     |
| muda                      | No                        |
| nlpaug                    | No                        |
| pydiogment                | No                        |
| python-audio-effects      | No                        |
| sigment                   | No                        |
| SpecAugment               | Yes, Pytorch & Tensorflow |
| spec_augment              | Yes, Pytorch              |
| torch-audiomentations     | Yes, Pytorch              |
| WavAugment                | Yes, Pytorch              |

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.

Comments
  • Bug in RoomSimulator execution

    Code

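    import os
    import random

    from audiomentations import AddBackgroundNoise, Compose, RoomSimulator

    # BACKGROUND_NOISE_FILES, wave and sample_rate are defined elsewhere in the reporter's script
    augmenter = Compose(
        [
            RoomSimulator(
                p=1.0,
                leave_length_unchanged=True,
            ),
            AddBackgroundNoise(
                sounds_path=os.path.join(random.choice(BACKGROUND_NOISE_FILES)),
                min_snr_in_db=15,
                max_snr_in_db=35,
                p=1.0,
            ),
        ]
    )
    wave = augmenter(samples=wave, sample_rate=sample_rate)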
    

    Trace:

     File "/home/sk/anaconda3/envs/vc/lib/python3.8/site-packages/audiomentations/core/composition.py", line 88, in __call__
        samples = transform(samples, sample_rate)
      File "/home/sk/anaconda3/envs/vc/lib/python3.8/site-packages/audiomentations/core/transforms_interface.py", line 62, in __call__
        self.randomize_parameters(samples, sample_rate)
      File "/home/sk/anaconda3/envs/vc/lib/python3.8/site-packages/audiomentations/augmentations/room_simulator.py", line 339, in randomize_parameters
        self.room.compute_rir()
      File "/home/sk/anaconda3/envs/vc/lib/python3.8/site-packages/pyroomacoustics/room.py", line 2156, in compute_rir
        vis = self.visibility[s][m, :].astype(np.int32)
    IndexError: list index out of range
    
    

    It happens occasionally on specific wav files.

    bug 
    opened by skol101 13
  • How to apply to stereo?

    Currently audiomentations doesn't work with stereo audio. Most transformations fail with:

    librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(240000, 2)

    One possible workaround is to apply the same transformations to each channel independently. But how can the same transformations be applied with the same parameters?

    opened by ZFTurbo 12
  • Documentations

    [Draft]

    This MR adds docs and their respective web objects in descending order.
    Docs completed so far are:

    • [x] Trim
    • [x] Time stretch
    • [ ] Time Mask
    • [x] Tanh Distortion
    • [ ] Shift
    • [ ] Seven band parametric eq
    • [ ] Room simulator
    • [ ] Reverse
    • [ ] Resample
    • [ ] Polarity inversion
    • [ ] Pitch Shift
    • [ ] Peaking filter
    • [ ] Padding
    • [ ] Normalize
    • [ ] mp3 compression
    • [ ] Low shelf filter
    • [ ] Low pass filter
    • [ ] Loudness normalization
    • [ ] Limiter
    • [ ] Lambda
    • [ ] High shelf filter
    • [ ] High pass filter
    • [ ] Gain
    • [ ] Gain transition
    • [ ] Clipping distortion
    • [ ] Clip
    • [ ] Band stop filter
    • [ ] Band pass filter
    • [ ] Apply impulse response
    • [ ] Air absorption
    • [ ] Add short noises
    • [x] Add gaussian SNR
    • [x] Add gaussian noise
    • [x] Add background noise
    opened by Thanatoz-1 11
  • Can this repo be used in multi thread?

    I encountered this error, which is caused by TimeMask:

    time mask: t0 138974  t: 21416  m.shpae: (21416,)  newshpae: (93184,)  rawshpae: (93184,)  slice shape: (0,)
    err:  operands could not be broadcast together with shapes (0,) (21416,) (0,) 
    
    augmenter = Compose([
            TimeStretch(min_rate=0.8, max_rate=1.25, p=1.0),
            AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.2),
            PitchShift(min_semitones=-4, max_semitones=4, p=0.2),
            Shift(min_fraction=-0.5, max_fraction=0.5, p=0.2),
            FrequencyMask(min_frequency_band=0.0, max_frequency_band=0.5, p=0.2),
            TimeMask(min_band_part=0.0, max_band_part=0.5, fade=False, p=0.2)
    ], p=1.0, shuffle=False)
    
    def aug_sample(samples, sample_rate):
      assert samples.dtype == np.int16 
      samples =  samples * max(0.01, np.max(np.abs(samples))) / 32768.0
      samples = samples.astype(np.float32)
      samples = augmenter(samples=samples, sample_rate=sample_rate)
      samples *= 32767 / max(0.01, np.max(np.abs(samples)))
      samples = samples.astype(np.int16)
      return samples
    

    I call aug_sample from several threading.Thread workers when loading wav files, in a map-reduce fashion.

    opened by zh794390558 11
  • Mp3Compression NoBackendError

    I am trying to use Mp3Compression with backend="pydub", but I get the NoBackendError below. Do you have any idea what causes it? Regards,

    • Windows10
    • python 3.8.5
    • ffmpeg version = 4.3.1 (conda package)
    • pydub version = 0.23.1 (conda package)
    C:\Users\user\anaconda3\envs\ml\lib\site-packages\librosa\core\audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
      warnings.warn("PySoundFile failed. Trying audioread instead.")
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    ~\anaconda3\envs\ml\lib\site-packages\librosa\core\audio.py in load(path, sr, mono, offset, duration, dtype, res_type)
        145     try:
    --> 146         with sf.SoundFile(path) as sf_desc:
        147             sr_native = sf_desc.samplerate
    
    ~\anaconda3\envs\ml\lib\site-packages\soundfile.py in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
        628                                          format, subtype, endian)
    --> 629         self._file = self._open(file, mode_int, closefd)
        630         if set(mode).issuperset('r+') and self.seekable():
    
    ~\anaconda3\envs\ml\lib\site-packages\soundfile.py in _open(self, file, mode_int, closefd)
       1182             raise TypeError("Invalid file: {0!r}".format(self.name))
    -> 1183         _error_check(_snd.sf_error(file_ptr),
       1184                      "Error opening {0!r}: ".format(self.name))
    
    ~\anaconda3\envs\ml\lib\site-packages\soundfile.py in _error_check(err, prefix)
       1356         err_str = _snd.sf_error_number(err)
    -> 1357         raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
       1358 
    
    RuntimeError: Error opening 'C:\\Users\\user\\AppData\\Local\\Temp\\tmp_compressed_37ed35fe-9a1.mp3': Format not recognised.
    
    During handling of the above exception, another exception occurred:
    
    NoBackendError                            Traceback (most recent call last)
    <ipython-input-296-889e15591d71> in <module>
         25 ])
         26 
    ---> 27 augmented_data = augment(samples=data.T[0], sample_rate=SAMPLE_RATE)
         28 
         29 plt.plot(augmented_data)
    
    ~\anaconda3\envs\ml\lib\site-packages\audiomentations\core\composition.py in __call__(self, samples, sample_rate)
         50                 random.shuffle(transforms)
         51             for transform in transforms:
    ---> 52                 samples = transform(samples, sample_rate)
         53 
         54         return samples
    
    ~\anaconda3\envs\ml\lib\site-packages\audiomentations\core\transforms_interface.py in __call__(self, samples, sample_rate)
         74                     )
         75                 )
    ---> 76             return self.apply(samples, sample_rate)
         77         return samples
         78 
    
    ~\anaconda3\envs\ml\lib\site-packages\audiomentations\augmentations\transforms.py in apply(self, samples, sample_rate)
       1054             return self.apply_lameenc(samples, sample_rate)
       1055         elif self.backend == "pydub":
    -> 1056             return self.apply_pydub(samples, sample_rate)
       1057         else:
       1058             raise Exception("Backend {} not recognized".format(self.backend))
    
    ~\anaconda3\envs\ml\lib\site-packages\audiomentations\augmentations\transforms.py in apply_pydub(self, samples, sample_rate)
       1134         file_handle.close()
       1135 
    -> 1136         degraded_samples, _ = librosa.load(tmp_file_path, sample_rate)
       1137 
       1138         os.unlink(tmp_file_path)
    
    ~\anaconda3\envs\ml\lib\site-packages\librosa\core\audio.py in load(path, sr, mono, offset, duration, dtype, res_type)
        161         if isinstance(path, (str, pathlib.PurePath)):
        162             warnings.warn("PySoundFile failed. Trying audioread instead.")
    --> 163             y, sr_native = __audioread_load(path, offset, duration, dtype)
        164         else:
        165             raise (exc)
    
    ~\anaconda3\envs\ml\lib\site-packages\librosa\core\audio.py in __audioread_load(path, offset, duration, dtype)
        185 
        186     y = []
    --> 187     with audioread.audio_open(path) as input_file:
        188         sr_native = input_file.samplerate
        189         n_channels = input_file.channels
    
    ~\anaconda3\envs\ml\lib\site-packages\audioread\__init__.py in audio_open(path, backends)
        114 
        115     # All backends failed!
    --> 116     raise NoBackendError()
    
    NoBackendError: 
    
    opened by darkcurrent 10
  • Add low-pass, high-pass and band-pass filter transforms

    Implementing https://github.com/iver56/audiomentations/issues/7, https://github.com/iver56/audiomentations/issues/8 and https://github.com/iver56/audiomentations/issues/9.

    opened by atamazian 9
  • Tweak BPF and BSF defaults and express bandwidth as a ratio of the center freq

    • [x] Adopt this approach: https://github.com/asteroid-team/torch-audiomentations/blob/master/torch_audiomentations/augmentations/band_pass_filter.py#L18
    opened by iver56 8
  • Random interruption pulses to augment ECG lead-off condition

    Hello,

    I just wrote a function for ECG augmentation. Its main purpose is to simulate the signal when the ECG electrodes are in a lead-off condition, but it can also be used for audio data augmentation. The function generates a random number of flat pulses with some noise on them. The pulse amplitudes lie within the signal's min-max range, and the pulses have random widths. Maybe the function can be added here.

    Regards,

    import wavio
    import numpy as np
    from matplotlib import pyplot as plt
    
    audio_seg=wavio.read("sample_audio.wav").data.astype(np.float32).T[0]
    
    def InterruptSignal(audio_seg, max_num_interruptions, noise_level, p):
        # audio_seg:             Audio segment (1D Numpy array)
        # max_num_interruptions: Maximum number of interruption pulses. Pulses are generated randomly
        # noise_level:           The level of noise to be added the pulses. "CNN's don't like constant values :)"
        # p:                     Triggering probability
        if np.random.random() < p / 1.:
            max_peak = np.max(audio_seg).astype(int)
            min_peak = np.min(audio_seg).astype(int)
            len_audio                = audio_seg.shape[-1]
            num_random_interruptions = np.random.randint(max_num_interruptions)+1
            interruption_start_times = (len_audio * np.random.random(num_random_interruptions)*4/5).astype(int)
            for interruption in interruption_start_times:
                interruption_len = int(len_audio*np.random.random()//5)
                interruption_val = np.random.randint(min_peak, max_peak)
                audio_seg[interruption:interruption+interruption_len] = interruption_val+np.random.normal(0, 1, interruption_len)*noise_level
        return audio_seg
    
    fig, (ax1, ax2) = plt.subplots(2)
    fig.suptitle('Two random interruptions with some noise')
    ax1.plot(audio_seg[:5000])
    ax2.plot(InterruptSignal(audio_seg[:5000],
                             max_num_interruptions=3,
                             noise_level=100,
                             p=1))
    plt.show()
    
    opened by darkcurrent 8
  • Unittest to pytest

    Implements https://github.com/iver56/audiomentations/issues/176, except from unittest.mock in test_room_simulator.py. I can replace it with pytest-mock if necessary.

    opened by atamazian 7
  • AddBackgroundNoise MultichannelAudioNotSupportedException

    When I use AddBackgroundNoise, most of the time I get the following error: "MultichannelAudioNotSupportedException: AddBackgroundNoise only supports mono audio, not multichannel audio". However, the noise folder I pointed it to contains only 16 kHz, 16-bit PCM (little-endian, signed), 1-channel wav files.

    opened by darkcurrent 7
  • CircleCI obsolete Docker image

    When I run CircleCI check on my forked repo, I've got this message:

    "You’re using a deprecated Docker convenience image. Upgrade to a next-gen Docker convenience image."

    Does this need to be fixed by changing .circleci/config.yml, or can it be left as it is for now?

    opened by atamazian 6
  • Implement RandomCrop transform

    Should have a target duration parameter that can be specified either in seconds or in number of samples

    If the input sound is longer than the target duration, pick a random offset (so we don't always output just the beginning of the audio) and crop the sound to the target duration

    If the input sound is shorter than the target duration, pad the end of the sound (append digital silence) so the duration matches the target duration. Maybe it makes sense to support various padding modes here, just like in the Padding transform.

    good first issue 
    opened by iver56 2
  • LC3 compression transform

    Audio transmitted via bluetooth can sometimes be compressed with LC3. Let's make a transform that simulates this lossy audio encoding.

    https://github.com/google/liblc3

    enhancement help wanted 
    opened by iver56 0
  • Unify add gaussian noise transforms to one

    • Deprecate (and later remove) min_amplitude and max_amplitude in AddGaussianNoise
    • Add min_snr_in_db, max_snr_in_db, min_absolute_rms_db, max_absolute_rms_db and mode
    • Deprecate (and later remove) AddGaussianSNR
    opened by iver56 0
  • Idea: Rename AddBackgroundNoise?

    I feel like it isn't limited to adding specifically background noise - it could be any type of noise or actually any audio (it doesn't have to be noise although that is a common thing to do)

    Any suggestions for a better name? AddNoise? AddNoiseFromFiles? MixInSoundFile?

    opened by iver56 0
Releases(v0.27.0)
Owner
Iver Jordal
Machine learning, computer vision, music technology, demoscene, web technology, games, startups