Loading a custom audio dataset in Python (PyTorch)

2020-11-09 12:31:00 The war of rebellion

  PyTorch provides convenient API interfaces for many common public datasets, but when we want to train a neural network on our own data, we need to define a custom dataset. PyTorch provides a few classes that make this easy:

  • torch.utils.data.Dataset: every subclass that inherits from it should override two methods, __len__ and __getitem__
    • __len__: returns the number of samples in the dataset
    • __getitem__: returns one sample and supports subscript indexing
  • torch.utils.data.DataLoader: wraps a dataset and lets you set batch_size, whether to shuffle, and so on

First step

   Every custom Dataset must inherit from the torch.utils.data.Dataset class and override its two member methods:

  • __getitem__: reads the data and returns one sample's data and label
  • __len__: returns the length of the dataset
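
from torch.utils.data import Dataset

class AudioDataset(Dataset):
    def __init__(self, *args, **kwargs):
        """Class initialization: store paths, parameters, file lists, etc."""

    def __getitem__(self, item):
        """Read the sample at index `item` and return its data and label."""
        data, label = ..., ...  # placeholder: load one sample here
        return data, label

    def __len__(self):
        """Return the length of the entire dataset."""
        total = ...  # placeholder: the number of samples
        return total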

Note: a Dataset is only responsible for abstracting the data; one call to __getitem__ returns exactly one sample.
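
To make the skeleton concrete, here is a minimal runnable sketch. `ToyAudioDataset` is a hypothetical class that does not come from the original code, and it serves random NumPy vectors in place of real audio, so treat it only as an illustration of the two required methods.

```python
import numpy as np
from torch.utils.data import Dataset


class ToyAudioDataset(Dataset):
    """Hypothetical in-memory dataset: random 'waveforms' with integer labels."""

    def __init__(self, num_samples=5, dimension=8192, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for audio: one fixed-length float32 vector per sample
        self.data = rng.standard_normal((num_samples, dimension)).astype(np.float32)
        self.labels = np.arange(num_samples)

    def __getitem__(self, item):
        # One call returns exactly one (data, label) pair
        return self.data[item], self.labels[item]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.data)


toy_set = ToyAudioDataset()
wav, label = toy_set[2]
print(len(toy_set), wav.shape, label)  # 5 (8192,) 2
```

Indexing the dataset calls `__getitem__` and `len()` calls `__len__`; batching is left entirely to the DataLoader.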

Example:

   File directory structure

  • p225
    • ***.wav
    • ***.wav
    • ***.wav
    • ...
  • dataset.py

Goal: read the audio data in the p225 folder

 1 class AudioDataset(Dataset):
 2     def __init__(self, data_folder, sr=16000, dimension=8192):
 3         self.data_folder = data_folder
 4         self.sr = sr
 5         self.dim = dimension

 7         # Get the list of audio file paths
 8         self.wav_list = []
 9         for root, dirnames, filenames in os.walk(data_folder):
10             for filename in fnmatch.filter(filenames, "*.wav"):  # Keep only the file names that match "*.wav"
11                 self.wav_list.append(os.path.join(root, filename))

13     def __getitem__(self, item):
14         # Read one audio file and return its data
15         filename = self.wav_list[item]
16         wb_wav, _ = librosa.load(filename, sr=self.sr)

18         # Take one frame
19         if len(wb_wav) >= self.dim:
20             max_audio_start = len(wb_wav) - self.dim
21             audio_start = np.random.randint(0, max_audio_start + 1)  # upper bound is exclusive; +1 also covers len == dim
22             wb_wav = wb_wav[audio_start: audio_start + self.dim]
23         else:
24             wb_wav = np.pad(wb_wav, (0, self.dim - len(wb_wav)), "constant")

26         return wb_wav, filename

28     def __len__(self):
29         # Total number of audio files
30         return len(self.wav_list)

Note: lines 19-24: every audio file has a different length, so returning the raw data directly would cause a dimension mismatch error when samples are stacked into a batch. The code therefore takes only one fixed-length frame from each audio file per read (zero-padding files that are shorter than one frame). This obviously does not use all of the speech data.
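
The crop-or-pad idea can be tried in isolation, without any audio files. The sketch below uses a hypothetical helper `crop_or_pad` (the name and the synthetic signals are assumptions for illustration). Note that the random start index has an exclusive upper bound, so a signal that is exactly one frame long needs the `+ 1`.

```python
import numpy as np


def crop_or_pad(wav, dim=8192, rng=None):
    """Return exactly `dim` samples: a random crop if long enough, else zero-padded."""
    if rng is None:
        rng = np.random.default_rng()
    if len(wav) >= dim:
        max_start = len(wav) - dim
        # integers() excludes `high`, so add 1 to allow the last valid start
        start = int(rng.integers(0, max_start + 1))
        return wav[start:start + dim]
    # Too short: append zeros until the signal is `dim` samples long
    return np.pad(wav, (0, dim - len(wav)), "constant")


long_wav = np.ones(20000, dtype=np.float32)
short_wav = np.ones(5000, dtype=np.float32)
exact_wav = np.ones(8192, dtype=np.float32)

print(crop_or_pad(long_wav).shape)   # (8192,)
print(crop_or_pad(short_wav).shape)  # (8192,)
print(crop_or_pad(exact_wav).shape)  # (8192,)
```

All three cases produce the same shape, which is what lets the samples be stacked into one batch later.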

Second step

   Instantiate the Dataset object

dataset = AudioDataset("./p225", sr=16000)

If you want to read the data in batches, skip to the third step. If you want to read the samples one at a time, see the following:

# Instantiate the AudioDataset object
train_set = AudioDataset("./p225", sr=16000)

for i, data in enumerate(train_set):
    wb_wav, filename = data
    print(i, wb_wav.shape, filename)

    if i == 3:
        break  # stop after the first four samples
# 0 (8192,) ./p225\p225_001.wav
# 1 (8192,) ./p225\p225_002.wav
# 2 (8192,) ./p225\p225_003.wav
# 3 (8192,) ./p225\p225_004.wav

Third step

   To read the data in batches, wrap the dataset in a DataLoader.

Why use DataLoader?

  1. Deep learning models take their input in mini-batches
  2. Samples may need to be loaded in random order (the shuffle operation)
  3. Sample loading can be sped up with multiple worker processes

  PyTorch's DataLoader encapsulates all of the above, which makes it more convenient to use.

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False)


  • dataset: the dataset to load from (a Dataset object)
  • batch_size: how many samples to load per batch (default: 1)
  • shuffle: whether to reshuffle the data at every epoch
  • sampler: defines the strategy for drawing samples from the dataset. If it is specified, shuffle must not be specified.
  • batch_sampler: like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler and drop_last.
  • num_workers: number of subprocesses used for data loading; 0 means the data is loaded in the main process
  • collate_fn: how to merge multiple samples into one batch; the default collate function (used when this is None) is usually sufficient
  • pin_memory: whether to copy the data into pinned (page-locked) memory; data in pinned memory transfers to the GPU faster
  • drop_last: the number of samples in the dataset may not be an integer multiple of batch_size; if drop_last is True, the last incomplete batch is discarded

Returns: a data loader
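
As a small illustration of `batch_size` and `drop_last`, the sketch below wraps a toy `TensorDataset` of 10 samples, which is not a multiple of the batch size 4. The data and variable names are made up for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples of 8 features each, with integer labels
features = torch.arange(80, dtype=torch.float32).reshape(10, 8)
labels = torch.arange(10)
toy_set = TensorDataset(features, labels)

keep_loader = DataLoader(toy_set, batch_size=4, shuffle=False, drop_last=False)
drop_loader = DataLoader(toy_set, batch_size=4, shuffle=False, drop_last=True)

# Collect the number of samples in every batch
keep_sizes = [x.shape[0] for x, _ in keep_loader]
drop_sizes = [x.shape[0] for x, _ in drop_loader]
print(keep_sizes)  # [4, 4, 2]
print(drop_sizes)  # [4, 4]
```

With `drop_last=False` the final batch holds the 2 leftover samples; with `drop_last=True` it is skipped, so every batch has the same size.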

Example:

# Instantiate the AudioDataset object
train_set = AudioDataset("./p225", sr=16000)
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)

for (i, data) in enumerate(train_loader):
    wav_data, wav_name = data
    print(wav_data.shape)   # torch.Size([8, 8192])
    print(i, wav_name)
    # ('./p225\\p225_293.wav', './p225\\p225_156.wav', './p225\\p225_277.wav', './p225\\p225_210.wav',
    # './p225\\p225_126.wav', './p225\\p225_021.wav', './p225\\p225_257.wav', './p225\\p225_192.wav')

Let's work through a couple of examples to digest this:

Example 1

   This is the example used throughout this article; Example 1 simply puts the pieces together.

   File directory structure

  • p225
    • ***.wav
    • ***.wav
    • ***.wav
    • ...
  • dataset.py

Goal: read the audio data in the p225 folder

 1 import fnmatch
 2 import os
 3 import librosa
 4 import numpy as np
 5 from torch.utils.data import Dataset
 6 from torch.utils.data import DataLoader


 9 class AudioDataset(Dataset):
10     def __init__(self, data_folder, sr=16000, dimension=8192):
11         self.data_folder = data_folder
12         self.sr = sr
13         self.dim = dimension

15         # Get the list of audio file paths
16         self.wav_list = []
17         for root, dirnames, filenames in os.walk(data_folder):
18             for filename in fnmatch.filter(filenames, "*.wav"):  # Keep only the file names that match "*.wav"
19                 self.wav_list.append(os.path.join(root, filename))

21     def __getitem__(self, item):
22         # Read one audio file and return its data
23         filename = self.wav_list[item]
24         print(filename)
25         wb_wav, _ = librosa.load(filename, sr=self.sr)

27         # Take one frame
28         if len(wb_wav) >= self.dim:
29             max_audio_start = len(wb_wav) - self.dim
30             audio_start = np.random.randint(0, max_audio_start + 1)  # upper bound is exclusive; +1 also covers len == dim
31             wb_wav = wb_wav[audio_start: audio_start + self.dim]
32         else:
33             wb_wav = np.pad(wb_wav, (0, self.dim - len(wb_wav)), "constant")

35         return wb_wav, filename

37     def __len__(self):
38         # Total number of audio files
39         return len(self.wav_list)


42 train_set = AudioDataset("./p225", sr=16000)
43 train_loader = DataLoader(train_set, batch_size=8, shuffle=True)


46 for (i, data) in enumerate(train_loader):
47     wav_data, wav_name = data
48     print(wav_data.shape)   # torch.Size([8, 8192])
49     print(i, wav_name)
50     # ('./p225\\p225_293.wav', './p225\\p225_156.wav', './p225\\p225_277.wav', './p225\\p225_210.wav',
51     # './p225\\p225_126.wav', './p225\\p225_021.wav', './p225\\p225_257.wav', './p225\\p225_192.wav')

Notes

  1. Lines 27-33: every audio file has a different length, so returning the raw data directly would cause a dimension mismatch error when samples are stacked into a batch. The code therefore takes only one fixed-length frame from each audio file per read, which obviously does not use all of the speech data.
  2. Line 48: __getitem__ never converts the NumPy array to a tensor, yet line 48 shows the data as a tensor. This is worth noting: the DataLoader's default collate function converts NumPy arrays to tensors when it assembles a batch.
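
The NumPy-to-tensor conversion can be checked without audio files. In this sketch, `NumpyPairs` is a hypothetical dataset that returns a NumPy array and a string, mimicking the `(wb_wav, filename)` pair; the sizes and file names are made up.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class NumpyPairs(Dataset):
    """Hypothetical dataset returning a NumPy array and a string per sample."""

    def __init__(self, num_samples=8, dimension=16):
        self.data = np.zeros((num_samples, dimension), dtype=np.float32)
        self.names = [f"sample_{i:03d}.wav" for i in range(num_samples)]

    def __getitem__(self, item):
        # Plain NumPy array and str; no tensor conversion happens here
        return self.data[item], self.names[item]

    def __len__(self):
        return len(self.data)


loader = DataLoader(NumpyPairs(), batch_size=4, shuffle=False)
wav_batch, name_batch = next(iter(loader))

print(isinstance(wav_batch, torch.Tensor), wav_batch.shape)
print(name_batch)
```

The array part of the batch comes back as a stacked `torch.Tensor`, while the strings are simply grouped together and not converted.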

Example 2

   Compared with Example 1, Example 2 is the main point, because in practice we should not read just one frame from an audio file and then move on to the next file. A piece of audio usually contains many frames, and what we want is to read batch_size audio frames in sequence: start with the first audio file; if it fills a batch, there is no need to read the second file yet; if it falls short of a batch, read the second audio file to fill it up.

   My suggestion: first read each audio file in order, split the speech into frames with a window length of 8192 and a frame shift of 4096, and then concatenate the frames. This gives an array of shape (frame_num, frame_len, 1), which is saved to an h5 file. Then read the data with torch.utils.data.Dataset and torch.utils.data.DataLoader as above.
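
The framing step can be sketched with NumPy alone. The helper name `frame_signal` and the synthetic signal are assumptions for illustration, and dropping the trailing samples that do not fill a whole frame is one possible choice, not necessarily what the original script does.

```python
import numpy as np


def frame_signal(wav, frame_len=8192, hop=4096):
    """Split a 1-D signal into overlapping frames of shape (frame_num, frame_len, 1)."""
    if len(wav) < frame_len:
        # Assumption: zero-pad a too-short signal up to one full frame
        wav = np.pad(wav, (0, frame_len - len(wav)), "constant")
    # Number of full frames; leftover samples at the end are dropped
    frame_num = 1 + (len(wav) - frame_len) // hop
    frames = np.stack([wav[i * hop:i * hop + frame_len] for i in range(frame_num)])
    # Add the trailing channel axis
    return frames[..., np.newaxis]


wav = np.arange(20000, dtype=np.float32)
frames = frame_signal(wav)
print(frames.shape)  # (3, 8192, 1)
```

With 20000 samples, frames start at 0, 4096 and 8192, so consecutive frames overlap by half a window. Frames from several files can then be joined with `np.concatenate(..., axis=0)`.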

Implementation:

   Step 1: create an H5_generation script that converts the data into an h5 file:
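
The script itself is not shown here. As a rough sketch of what the writing side could look like, the code below stores two arrays under the keys 'data' and 'label', which are the keys the reading code expects. The function name `write_h5`, the placeholder arrays and the temporary file path are all assumptions; in particular, what 'label' should contain depends on the task.

```python
import os
import tempfile

import h5py
import numpy as np


def write_h5(h5_path, data_frames, label_frames):
    """Save framed arrays under the dataset keys 'data' and 'label'."""
    with h5py.File(h5_path, "w") as hf:
        hf.create_dataset("data", data=data_frames)
        hf.create_dataset("label", data=label_frames)


# Placeholder arrays standing in for framed audio of shape (frame_num, frame_len, 1).
# Here the label is just a copy of the data, purely for illustration.
data_frames = np.zeros((3, 8192, 1), dtype=np.float32)
label_frames = data_frames.copy()

h5_path = os.path.join(tempfile.mkdtemp(), "demo_train.h5")
write_h5(h5_path, data_frames, label_frames)

# Read the file back to confirm what was stored
with h5py.File(h5_path, "r") as hf:
    print(sorted(hf.keys()), hf["data"].shape)
```

In a real script the placeholder arrays would be replaced by the frames extracted from the audio files.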

   Step 2: read the data from the h5 file through a Dataset

import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import h5py

def load_h5(h5_path):
    # load training data
    with h5py.File(h5_path, 'r') as hf:
        print('List of arrays in input file:', hf.keys())
        X = np.array(hf.get('data'), dtype=np.float32)
        Y = np.array(hf.get('label'), dtype=np.float32)
    return X, Y

class AudioDataset(Dataset):
    """Dataset that serves pre-framed audio from an h5 file"""
    def __init__(self, data_folder):
        self.data_folder = data_folder  # path to the h5 file
        self.X, self.Y = load_h5(data_folder)   # (3392, 8192, 1)

    def __getitem__(self, item):
        # Return one audio frame and its label
        X = self.X[item]
        Y = self.Y[item]

        return X, Y

    def __len__(self):
        return len(self.X)

train_set = AudioDataset("./speaker225_resample_train.h5")
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, drop_last=True)

for (i, wav_data) in enumerate(train_loader):
    X, Y = wav_data
    print(i, X.shape)
    # 0 torch.Size([64, 8192, 1])
    # 1 torch.Size([64, 8192, 1])
    # ...

I tried opening the h5 file in __init__, but that made memory usage blow up, which is strange, so I had to give up on it.

