kapre: Keras Audio Preprocessors

Overview

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and consistent.
  • Your code and model has less dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs from the default tf.signals implementation such as..
    • A perfectly invertible STFT and InverseSTFT pair
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
  2. In your ML model, add Kapre layer e.g. kapre.time_frequency.STFT() as the first layer of the model.
  3. The data loader simply loads audio signals and feed them into the model
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 6, 44100)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}
Comments
  • Migrated functions to tf.keras

    Migrated functions to tf.keras

    This PR addresses #52 by removing the dependency on keras and switching to tensorflow.keras

    Proposed version is 0.1.6 because of pull request #56

    In particular, #56 keeps the dependency on keras with from keras.utils.conv_utils import conv_output_length

    opened by douglas125 27
  • Spectrogram?

    Spectrogram?

    I have an older version of Kapre that has time_frequency.Spectrogram, which is trainable.

    However, the new version of Kapre doesn't have Spectrogram anymore. Why?

    opened by turian 8
  • Melspectrogram cant be set 'trainable_fb=False'

    Melspectrogram cant be set 'trainable_fb=False'

    Melspectrogram cant be set 'trainable_fb=False',after I set trainable_fb=False,trainable_kernel=False,but seems like it doesnot work.It is still trainable.

    opened by zhh6 8
  • htk=true for mel frequencies

    htk=true for mel frequencies

    We noticed the current implenetation of the mel_frequencies function (based on Librosa) doesn't include the htk=True option, which is handy when training CNNs because then the frequency scale is fully logarithmic which, in principle, makes more sense for frequency invariant convolutional filters.

    What was the motivation for removing this? Any chance it can be added?

    opened by justinsalamon 8
  • Added parallel STFT implementation

    Added parallel STFT implementation

    Hi!

    As I comented in #98, I added a parallel STFT implementation based on the map_fn function following the indications of @zaccharieramzi here.

    I've added a use_parallel_stft param (disabled by default) that allows to use this function. I've put this param in as many functions as I can. I also added test cases for every function I can, including an specific test that checks that the result of the tf.signal.stft is equals to the result of the parallel_stft function.

    Hope this could serve us well meanwhile tensorflow resolves its issues with the fft implementation.

    opened by JPery 7
  • Amplitude-to-decibel conversion produces different results on different batches

    Amplitude-to-decibel conversion produces different results on different batches

    Related to #16, I found another issue that contributes to different prediction results depending on batch size (and the batches themselves). In particular, it occurs when using converting spectrograms to decibels.

    https://github.com/keunwoochoi/kapre/blob/master/kapre/backend_keras.py#L17

    The maximum is taken over the entire tensor, instead of per example in the batch. This results in different normalization when the examples in a batch are different.

    bug 
    opened by auroracramer 7
  • Inverse Spectrogram and Mel-Spectrogram Layer?

    Inverse Spectrogram and Mel-Spectrogram Layer?

    Namaste!

    kapre has become an integral part of all my audio Deep Learning experiments. Powerful! Thanks for providing such a great software!

    I was thinking... I guess it would make sense to have layers for inverse spectrogram and inverse mel-spectrogram. Thinking about Autoencoders, this would be even more powerful. I know that reconstructing samples from spectrograms is not the best, but it is possible to a certain degree.

    What do you think about that feature request?

    Best, Tristan

    opened by AI-Guru 7
  • Hey! The input is too short!

    Hey! The input is too short!

    Hi,

    I'm encountering an assertion problem when calling your code with a Tensorflow backend.

    input_shape = (44100,1)

    Could this be a be a problem with "channels_first" / "channels_last"?

    Best, Alex

    opened by slychief 6
  • Pip?

    Pip?

    It seems you were on pip, but are no longer. Is there anything I could do to help get kapre back on there? We want to use this library in a commercial application, and for our process pip packages are easier to support than a git repository.

    opened by ff-rfeather 6
  • trainable_stft error

    trainable_stft error

    Following your example but missing layer definition trainable_stft or something, can you provide example with error resolution?

    `# 6 channels (!), maybe 1-sec audio signal
    input_shape = (6, 44100) 
    sr = 44100
    model = Sequential()
    model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=src_shape,
                             border_mode='same', sr=sr, n_mels=128,
                             fmin=0.0, fmax=sr/2, power=1.0,
                             return_decibel=False, trainable_fb=False,
                             trainable_kernel=False
                             name='trainable_stft'))`
    
      File "<ipython-input-24-cea5588ddf1e>", line 13
        name='trainable_stft'))
           ^
    SyntaxError: invalid syntax
    
    opened by sildeag 6
  • `STFT` layer output shape deviates from `STFTTflite` layer in batch dimension

    `STFT` layer output shape deviates from `STFTTflite` layer in batch dimension

    Use Case

    I want to convert a STFT layer in my model to a STFTTflite to deploy it to my mobile device. In the documentation I found that another dimension is added to account for complex numbers. But I also encountered a behaviour that is not documented.

    Expected Behaviour

    input_shape = (2048, 1)  # mono signal
    
    model = keras.models.Sequential()  # TFLite incompatible model
    model.add(kapre.STFT(n_fft=1024, hop_length=512, input_shape=input_shape))
    
    tflite_model = keras.models.Sequential()  # TFLite compatible model
    tflite_model.add(kapre.STFTTflite(n_fft=1024, hop_length=512, input_shape=input_shape))
    

    model has the output shape (None, 3, 513, 1). Therefore, tflite_model should have the output shape (None, 3, 513, 1, 2).

    Observed Behaviour

    The output shape of tflite_model is (1, 3, 513, 1, 2) instead of (None, 3, 513, 1, 2).

    Problem Solution

    • If this behaviour is unwanted:
      • Change the model output format so that the batch dimension is correctly shaped.
    • Otherwise:
      • Explain in the documentation why the batch dimension is shaped to 1.
      • Explain in the documentation how to include this layer into models which expect the batch dimension to be shaped None.
    opened by PhilippMatthes 5
  • Problem incorporating SpecAugument in the training process

    Problem incorporating SpecAugument in the training process

    Hi,

    I'm trying to add a SpecAug layer in the training process of a CNN using the code below:

    
    CLIP_DURATION = 5 
    SAMPLING_RATE = 41000
    NUM_CHANNELS = 1
    
    INPUT_SHAPE = ((CLIP_DURATION * SAMPLING_RATE), NUM_CHANNELS)
    
    melgram = get_melspectrogram_layer(input_shape = INPUT_SHAPE,
                              n_fft = 2048,
                              hop_length = 512,
                              return_decibel=True,
                              n_mels = 40,
                              mel_f_min = 500,
                              mel_f_max = 15000,
                              input_data_format='channels_last',
                              output_data_format='channels_last')
    
    spec_augment = SpecAugment(freq_mask_param=5,
                              time_mask_param=10,
                              n_freq_masks=2,
                              n_time_masks=3,
                              mask_value=-100)   
    
    model = Sequential()
    model.add(melgram)
    model.add(spec_augment)
    

    The CNN summary looks like this:

    Model: "sequential_2"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     melspectrogram (Sequential)  (None, 397, 40, 1)       0         
                                                                     
     spec_augment_1 (SpecAugment  (None, 397, 40, 1)       0         
     )                                                               
                                                                     
    =================================================================
    Total params: 0
    Trainable params: 0
    Non-trainable params: 0
    _________________________________________________________________
    

    Compiling and fitting the model

    model.compile(loss = 'sparse_categorical_crossentropy', optimizer='adam', metrics = 'accuracy')
    
    early_stop = EarlyStopping(monitor='loss', patience=5)
    
    reduce_LR = ReduceLROnPlateau(monitor="val_loss",factor=0.1,patience=4)
    
    checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
    
    model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])
    

    Then I get the following error:

    Epoch 1/50
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    [<ipython-input-35-e58a056ab523>](https://localhost:8080/#) in <module>
          7 checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
          8 
    ----> 9 model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])
    
    6 frames
    [/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py](https://localhost:8080/#) in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
         78                 try:
         79                     do_return = True
    ---> 80                     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
         81                 except:
         82                     do_return = False
    
    TypeError: in user code:
    
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
            return step_function(self, iterator)
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
            outputs = model.distribute_strategy.run(run_step, args=(data,))
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
            outputs = model.train_step(data)
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
            y_pred = self(x, training=True)
        File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
            raise e.with_traceback(filtered_tb) from None
        File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
            ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
        File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
            retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
        File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
            ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
        File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
            x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
        File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
            retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
    
        TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).
        
        in user code:
        
            File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call  *
                elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
            File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment  *
                x = self._apply_masks_to_axis(
            File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis  *
                return tf.where(mask, self.mask_value, x)
        
            TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.
        
        
        Call arguments received by layer "spec_augment_1" (type SpecAugment):
          • x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
          • training=True
          • kwargs=<class 'inspect._empty'>
    

    The shape of X_train is

    (2182, 205000, 1)
    

    I'm using Tensorflow 2.9.2, and Python 3.7.15

    When I remove the SpecAug layer everything runs fine. I've tested using only the melspec + a mobile net at the end and it runs smooth. The problem is apparently related to SpecAug layer.

    Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.

    opened by nnbuainain 2
  • Full-integer quantization and kapre layers

    Full-integer quantization and kapre layers

    I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.

    The model also perfoms as expected when converting the model to .tflite using dynamic range quantization. However, when using full-integer quantization, the model loses its accuracy (see (https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

    I suppose the mel-spec starts to significantly differ as in full-integer quantization, the input values are projected to new range (int8). Is there any way to make it work with full-integer quantization?

    I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).

    I am using TensorFlow 2.7.0 and kapre 0.3.7.

    Here is my code for testing the tflite-model:

    preds = []
    # Test and evaluate the TFLite-converted model on unseen test data
    for i, sample in enumerate(X_test_full_scaled):
        X = sample
        
        if input_details['dtype'] == np.int8:
            input_scale, input_zero_point = input_details["quantization"]
            X = sample / input_scale + input_zero_point
        
        X = X.reshape((1, 8000, 1)).astype(input_details["dtype"])
        
        interpreter.set_tensor(input_index, X)
        interpreter.invoke()
        pred = interpreter.get_tensor(output_index)
        
        output_scale, output_zero_point = output_details['quantization']
        if output_details['dtype'] == np.int8:
            pred = pred.astype(np.float32)
            pred = (pred - output_zero_point) * output_scale
        
        pred = np.argmax(pred, axis=1)[0]
        preds.append(pred)
    
    preds = np.array(preds)
    
    opened by eppane 3
  • Calling Magnitude() and Phase() simultaneously

    Calling Magnitude() and Phase() simultaneously

    Hi,

    I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.

    Is this possible?

    Best,

    Yang

    opened by HsuanYang-Wang 1
  • about kapre.utiils

    about kapre.utiils

    Hi, when i used "from kapre.utils import Normalization2D", I met this error which said No module named 'kapre.utils'. I see your package, and found that there is surely no utils.py. I am wondering how to slove it.

    Best wishes, Daisy

    opened by YiningWang2 1
  • Function missing in updated version

    Function missing in updated version

    I noticed there is a functon "kapre.utils.Normalization2D" in the old version, while I cannot find it in the updated version. Why? Is there have any alternative functions?

    opened by v3551G 1
  • trainable DSP parameters

    trainable DSP parameters

    hello contributers and community.

    I love your repo! It's eases so much for me! Although having the precomputation in the model is already great I'd like to know how you can optimize DSP parameters. It looks like that this is a feature from old versions (e.g. 0.2) and by default I dont see any trainable params in this layer.

    Could you please state if this is still available and how to use it?

    happy hacking Paul

    opened by bytosaur 2
Releases(Kapre-0.3.7)
Owner
Keunwoo Choi
MIR, machine learning, music recommendation.
Keunwoo Choi
Generative Adversarial Text to Image Synthesis

Text To Image Synthesis This is a tensorflow implementation of synthesizing images. The images are synthesized using the GAN-CLS Algorithm from the pa

Hao 575 Jan 08, 2023
This is a re-implementation of TransGAN: Two Pure Transformers Can Make One Strong GAN (CVPR 2021) in PyTorch.

TransGAN: Two Transformers Can Make One Strong GAN [YouTube Video] Paper Authors: Yifan Jiang, Shiyu Chang, Zhangyang Wang CVPR 2021 This is re-implem

Ahmet Sarigun 79 Jan 05, 2023
Multi-Template Mouse Brain MRI Atlas (MBMA): both in-vivo and ex-vivo

Multi-template MRI mouse brain atlas (both in vivo and ex vivo) Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the origin

8 Nov 18, 2022
Code for Paper: Self-supervised Learning of Motion Capture

Self-supervised Learning of Motion Capture This is code for the paper: Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki, Self-sup

Hsiao-Yu Fish Tung 87 Jul 25, 2022
This repository contains code to run experiments in the paper "Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers."

Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers This repository contains code to run experiments in the paper "Signal Stre

0 Jan 19, 2022
Implementation for "Manga Filling Style Conversion with Screentone Variational Autoencoder" (SIGGRAPH ASIA 2020 issue)

Manga Filling with ScreenVAE SIGGRAPH ASIA 2020 | Project Website | BibTex This repository is for ScreenVAE introduced in the following paper "Manga F

30 Dec 24, 2022
Charsiu: A transformer-based phonetic aligner

Charsiu: A transformer-based phonetic aligner [arXiv] Note. This is a preview version. The aligner is under active development. New functions, new lan

jzhu 166 Dec 09, 2022
Code for BMVC2021 "MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation"

MOS-Multi-Task-Face-Detect Introduction This repo is the official implementation of "MOS: A Low Latency and Lightweight Framework for Face Detection,

104 Dec 08, 2022
Real-time analysis of intracranial neurophysiology recordings.

py_neuromodulation Click this button to run the "Tutorial ML with py_neuro" notebooks: The py_neuromodulation toolbox allows for real time capable pro

Interventional Cognitive Neuromodulation - Neumann Lab Berlin 15 Nov 03, 2022
PyTorch implementation of the paper: "Preference-Adaptive Meta-Learning for Cold-Start Recommendation", IJCAI, 2021.

PAML PyTorch implementation of the paper: "Preference-Adaptive Meta-Learning for Cold-Start Recommendation", IJCAI, 2021. (Continuously updating ) Int

15 Nov 18, 2022
GPT, but made only out of gMLPs

GPT - gMLP This repository will attempt to crack long context autoregressive language modeling (GPT) using variations of gMLPs. Specifically, it will

Phil Wang 80 Dec 01, 2022
Artificial Intelligence search algorithm base on Pacman

Pacman Search Artificial Intelligence search algorithm base on Pacman Source The Pacman Projects by the University of California, Berkeley. Layouts Di

Day Fundora 6 Nov 17, 2022
A set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Overview This is a set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI. Make TFRecords To run t

8 Nov 01, 2022
Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning “螺旋桨”生物计算工具集

English | 简体中文 Latest News 2021.10.25 Paper "Docking-based Virtual Screening with Multi-Task Learning" is accepted by BIBM 2021. 2021.07.29 PaddleHeli

633 Jan 04, 2023
Volsdf - Volume Rendering of Neural Implicit Surfaces

Volume Rendering of Neural Implicit Surfaces Project Page | Paper | Data This re

Lior Yariv 221 Jan 07, 2023
Performant, differentiable reinforcement learning

deluca Performant, differentiable reinforcement learning Notes This is pre-alpha software and is undergoing a number of core changes. Updates to follo

Google 114 Dec 27, 2022
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021 Spotlight

PCAN for Multiple Object Tracking and Segmentation This is the offical implementation of paper PCAN for MOTS. We also present a trailer that consists

ETH VIS Group 328 Dec 29, 2022
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

yobi byte 29 Oct 09, 2022
Sudoku solver - A sudoku solver with python

sudoku_solver A sudoku solver What is Sudoku? Sudoku (Japanese: 数独, romanized: s

Sikai Lu 0 May 22, 2022
A JAX-based research framework for writing differentiable numerical simulators with arbitrary discretizations

jaxdf - JAX-based Discretization Framework Overview | Example | Installation | Documentation ⚠️ This library is still in development. Breaking changes

UCL Biomedical Ultrasound Group 65 Dec 23, 2022