Loading a custom speech dataset in PyTorch

2020-11-09 12:31:00 The war of rebellion

  PyTorch provides convenient API interfaces for the common public datasets, but when we need to train a neural network on our own data, we have to define a custom dataset. PyTorch provides a few classes that make it easy to build one.

  • torch.utils.data.Dataset: every subclass must override the __len__() and __getitem__() methods
    • __len__(): returns the number of samples in the dataset
    • __getitem__(): returns one sample, so the dataset supports subscript indexing
  • torch.utils.data.DataLoader: wraps a Dataset; lets you set batch_size, whether to shuffle, and so on

Step 1

   A custom Dataset must inherit from the torch.utils.data.Dataset class and override its two member methods:

  • __getitem__(): reads one sample at a time and returns its data and label
  • __len__(): returns the length of the dataset
from torch.utils.data import Dataset


class AudioDataset(Dataset):
    def __init__(self, ...):
        """ Class initialization """
        pass

    def __getitem__(self, item):
        """ How to read the data every time , Return data and tags """
        return data, label

    def __len__(self):
        """ Returns the length of the entire dataset """
        return total

Note: the Dataset is only responsible for abstracting the data; a single call to __getitem__ returns exactly one sample, as the sketch below illustrates.
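
As a quick illustration (a minimal sketch with a hypothetical toy dataset, not the audio example below): indexing the Dataset returns a single sample, and it is the DataLoader that groups samples into batches.

import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """A toy dataset of 10 scalar samples (purely illustrative, not part of the original post)."""
    def __init__(self):
        self.data = torch.arange(10, dtype=torch.float32)

    def __getitem__(self, item):
        # One call returns exactly one sample
        return self.data[item]

    def __len__(self):
        return len(self.data)


toy = ToyDataset()
print(len(toy), toy[3])        # 10 tensor(3.)
loader = DataLoader(toy, batch_size=4)
print(next(iter(loader)))      # tensor([0., 1., 2., 3.])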

Case study :

   File directory structure

  • p225
    • ***.wav
    • ***.wav
    • ***.wav
    • ...
  • dataset.py

Goal: read the audio data in the p225 folder

 1 class AudioDataset(Dataset):
 2     def __init__(self, data_folder, sr=16000, dimension=8192):
 3         self.data_folder = data_folder
 4         self.sr = sr
 5         self.dim = dimension
 6 
 7         #  Get a list of audio names 
 8         self.wav_list = []
 9         for root, dirnames, filenames in os.walk(data_folder):
10             for filename in fnmatch.filter(filenames, "*.wav"):  # keep only the filenames that match *.wav
11                 self.wav_list.append(os.path.join(root, filename))
12 
13     def __getitem__(self, item):
14         # Read one audio file and return its data
15         filename = self.wav_list[item]
16         wb_wav, _ = librosa.load(filename, sr=self.sr)
17 
18         # Take a fixed-length frame of self.dim samples
19         if len(wb_wav) >= self.dim:
20             max_audio_start = len(wb_wav) - self.dim
21             audio_start = np.random.randint(0, max_audio_start + 1)  # +1 so a clip of exactly self.dim samples is valid
22             wb_wav = wb_wav[audio_start: audio_start + self.dim]
23         else:
24             wb_wav = np.pad(wb_wav, (0, self.dim - len(wb_wav)), "constant")
25 
26         return wb_wav, filename
27 
28     def __len__(self):
29         #  The total number of audio files 
30         return len(self.wav_list)

Note (lines 19-24): each audio file has a different length; if the raw waveform were returned directly, the samples in a batch would have mismatched dimensions and raise an error. So each call takes only one fixed-length frame from one audio file, which obviously does not use all of the speech data. An alternative that keeps the full waveforms is sketched below.
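
One alternative (my addition, not from the original post) is to return the full waveform from __getitem__ and pad each batch to its longest waveform with a custom collate_fn; a minimal sketch, assuming __getitem__ returns a (1-D numpy waveform, filename) pair:

import numpy as np
import torch


def pad_collate(batch):
    """Zero-pad variable-length 1-D waveforms in a batch to the length of the longest one."""
    wavs, names = zip(*batch)                  # batch is a list of (waveform, filename) pairs
    max_len = max(len(w) for w in wavs)
    padded = np.zeros((len(wavs), max_len), dtype=np.float32)
    for i, w in enumerate(wavs):
        padded[i, :len(w)] = w                 # pad on the right with zeros
    return torch.from_numpy(padded), list(names)


# Hypothetical usage: DataLoader(train_set, batch_size=8, shuffle=True, collate_fn=pad_collate)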

Step 2

   Instantiate the Dataset object

train_set = AudioDataset("./p225", sr=16000)

If you want to read the data in batches you can skip straight to Step 3; if you want to read the samples one by one, see the following:

# Instantiate the AudioDataset object
train_set = AudioDataset("./p225", sr=16000)

for i, data in enumerate(train_set):
    wb_wav, filename = data
    print(i, wb_wav.shape, filename)

    if i == 3:
        break
    # 0 (8192,) ./p225\p225_001.wav
    # 1 (8192,) ./p225\p225_002.wav
    # 2 (8192,) ./p225\p225_003.wav
    # 3 (8192,) ./p225\p225_004.wav

Step 3

   To read the data in batches, you need to wrap the dataset with a DataLoader.

Why use a DataLoader?

  1. The input to deep learning models comes in mini-batches
  2. The samples usually need to be randomly shuffled each epoch
  3. The samples need to be loaded in parallel with multiple workers

  PyTorch's DataLoader encapsulates all of the above, which makes it much more convenient to use.

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, num_workers=0, collate_fn=default_collate, pin_memory=False, drop_last=False)

Parameters

  • dataset: the dataset to load from (a Dataset object)
  • batch_size: how many samples to load per batch (default: 1)
  • shuffle: whether to reshuffle the data every epoch
  • sampler: defines the strategy for drawing samples from the dataset; if specified, shuffle must not be set
  • batch_sampler: like sampler, but returns a batch of indices at a time; mutually exclusive with batch_size, shuffle, sampler and drop_last
  • num_workers: number of subprocesses to use for data loading; 0 means the data is loaded in the main process
  • collate_fn: how to merge a list of samples into one batch; the default collate function is usually fine
  • pin_memory: whether to copy tensors into pinned (page-locked) memory; transfers from pinned memory to the GPU are faster
  • drop_last: the dataset size may not be an integer multiple of batch_size; if drop_last is True, the last incomplete batch is dropped

Returns: a data loader that iterates over the dataset in batches.
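
For instance, several of these options can be combined as follows (a hedged illustration using the AudioDataset defined above; the specific values are arbitrary examples, not recommendations):

from torch.utils.data import DataLoader

train_set = AudioDataset("./p225", sr=16000)
loader = DataLoader(
    train_set,
    batch_size=8,
    shuffle=True,      # reshuffle the indices every epoch (must not be combined with sampler)
    num_workers=2,     # 2 worker subprocesses; 0 would load in the main process
    pin_memory=True,   # page-locked host memory, faster copies to the GPU
    drop_last=True,    # discard the final incomplete batch
)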

Case study :

#  Instantiation AudioDataset object 
train_set = AudioDataset("./p225", sr=16000)
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)

for (i, data) in enumerate(train_loader):
    wav_data, wav_name = data
    print(wav_data.shape)   # torch.Size([8, 8192])
    print(i, wav_name)
    # ('./p225\\p225_293.wav', './p225\\p225_156.wav', './p225\\p225_277.wav', './p225\\p225_210.wav',
    # './p225\\p225_126.wav', './p225\\p225_021.wav', './p225\\p225_257.wav', './p225\\p225_192.wav')

Here are a couple of examples to digest:

Example 1

   This example is the one that has been used throughout this article; Example 1 simply puts the pieces together.

   File directory structure

  • p225
    • ***.wav
    • ***.wav
    • ***.wav
    • ...
  • dataset.py

Goal: read the audio data in the p225 folder

 1 import fnmatch
 2 import os
 3 import librosa
 4 import numpy as np
 5 from torch.utils.data import Dataset
 6 from torch.utils.data import DataLoader
 7 
 8 
 9 class Aduio_DataLoader(Dataset):
10     def __init__(self, data_folder, sr=16000, dimension=8192):
11         self.data_folder = data_folder
12         self.sr = sr
13         self.dim = dimension
14 
15         #  Get a list of audio names 
16         self.wav_list = []
17         for root, dirnames, filenames in os.walk(data_folder):
18             for filename in fnmatch.filter(filenames, "*.wav"):  # keep only the filenames that match *.wav
19                 self.wav_list.append(os.path.join(root, filename))
20 
21     def __getitem__(self, item):
22         # Read one audio file and return its data
23         filename = self.wav_list[item]
24         print(filename)
25         wb_wav, _ = librosa.load(filename, sr=self.sr)
26 
27         # Take a fixed-length frame of self.dim samples
28         if len(wb_wav) >= self.dim:
29             max_audio_start = len(wb_wav) - self.dim
30             audio_start = np.random.randint(0, max_audio_start + 1)  # +1 so a clip of exactly self.dim samples is valid
31             wb_wav = wb_wav[audio_start: audio_start + self.dim]
32         else:
33             wb_wav = np.pad(wb_wav, (0, self.dim - len(wb_wav)), "constant")
34 
35         return wb_wav, filename
36 
37     def __len__(self):
38         #  The total number of audio files 
39         return len(self.wav_list)
40 
41 
42 train_set = Aduio_DataLoader("./p225", sr=16000)
43 train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
44 
45 
46 for (i, data) in enumerate(train_loader):
47     wav_data, wav_name = data
48     print(wav_data.shape)   # torch.Size([8, 8192])
49     print(i, wav_name)
50     # ('./p225\\p225_293.wav', './p225\\p225_156.wav', './p225\\p225_277.wav', './p225\\p225_210.wav',
51     # './p225\\p225_126.wav', './p225\\p225_021.wav', './p225\\p225_257.wav', './p225\\p225_192.wav')

Notes

  1. Lines 27-33: each audio file has a different length; if the raw waveform were returned directly, the samples in a batch would have mismatched dimensions and raise an error. So each call takes only one fixed-length frame from one audio file, which obviously does not use all of the speech data.
  2. Line 48: we never converted the numpy array to a tensor inside __getitem__, yet line 48 shows that the batch is already a tensor. This is because the DataLoader's default collate_fn converts numpy arrays to tensors automatically, which is worth keeping in mind (see the sketch below for doing the conversion explicitly).
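
If you prefer not to rely on that automatic conversion, you can convert explicitly with torch.from_numpy (an alternative sketch, not part of the original code):

import numpy as np
import torch

# Explicit numpy -> tensor conversion, i.e. what default_collate otherwise does for you.
wb_wav = np.random.randn(8192).astype(np.float32)   # stand-in for one framed waveform
wb_wav_tensor = torch.from_numpy(wb_wav)             # shares memory with the numpy array
print(wb_wav_tensor.shape, wb_wav_tensor.dtype)      # torch.Size([8192]) torch.float32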

Example 2

   Compared with Example 1, Example 2 is the important one, because we cannot simply take one frame from one audio file and then move on to the next file. Usually a single audio file contains many frames, and what we want is to read batch_size audio frames in order: read the first audio file, and if it already fills a batch there is no need to read a second file; if it does not fill a batch, read the next audio file to top it up.

   My suggestion is to first read every audio file in order, frame the speech with a window length of 8192 and a hop (frame shift) of 4096, and then concatenate the frames. The resulting array of shape (frame_num, frame_len, 1) is saved to an h5 file. Then use torch.utils.data.Dataset and torch.utils.data.DataLoader as above to read the data.

Specific implementation code :

   Step 1: create an H5_generation script that converts the data into an h5 file:
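
The original generation script is not reproduced here, so the following is only a minimal sketch of what it could look like, assuming the p225 folder above, librosa for loading, and the 'data'/'label' keys expected by the reading code in Step 2 (here 'label' is just a copy of the frames as a placeholder; in practice it would hold your actual targets):

import fnmatch
import os

import h5py
import librosa
import numpy as np


def make_frames(wav, frame_len=8192, hop=4096):
    """Cut a 1-D waveform into overlapping frames of length frame_len with the given hop."""
    frames = []
    for start in range(0, len(wav) - frame_len + 1, hop):
        frames.append(wav[start:start + frame_len])
    return np.array(frames, dtype=np.float32)


def generate_h5(data_folder, h5_path, sr=16000, frame_len=8192, hop=4096):
    all_frames = []
    for root, _, filenames in os.walk(data_folder):
        for filename in fnmatch.filter(filenames, "*.wav"):
            wav, _ = librosa.load(os.path.join(root, filename), sr=sr)
            frames = make_frames(wav, frame_len, hop)
            if len(frames):
                all_frames.append(frames)
    data = np.concatenate(all_frames, axis=0)[..., np.newaxis]   # (frame_num, frame_len, 1)
    with h5py.File(h5_path, "w") as hf:
        hf.create_dataset("data", data=data)
        hf.create_dataset("label", data=data)   # placeholder: store the real targets here
    print("saved", data.shape, "to", h5_path)


if __name__ == "__main__":
    generate_h5("./p225", "./speaker225_resample_train.h5")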

   Step 2: read the data from the h5 file through a Dataset:

import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import h5py

def load_h5(h5_path):
    # load training data
    with h5py.File(h5_path, 'r') as hf:
        print('List of arrays in input file:', hf.keys())
        X = np.array(hf.get('data'), dtype=np.float32)
        Y = np.array(hf.get('label'), dtype=np.float32)
    return X, Y


class AudioDataset(Dataset):
    """ Data loader """
    def __init__(self, data_folder):
        self.data_folder = data_folder
        self.X, self.Y = load_h5(data_folder)   # (3392, 8192, 1)

    def __getitem__(self, item):
        # Return one sample: a frame of audio data and its label
        X = self.X[item]
        Y = self.Y[item]

        return X, Y

    def __len__(self):
        return len(self.X)


train_set = AudioDataset("./speaker225_resample_train.h5")
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, drop_last=True)


for (i, wav_data) in enumerate(train_loader):
    X, Y = wav_data
    print(i, X.shape)
    # 0 torch.Size([64, 8192, 1])
    # 1 torch.Size([64, 8192, 1])
    # ...

I tried handling the h5 file differently in __init__, but it caused a memory explosion, which is strange, so I had to give up on that approach.
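
If the whole array does not fit in memory, one common alternative (my suggestion, not from the original post) is to read a single slice from the h5 file per __getitem__ instead of loading everything up front; a minimal sketch, assuming the same 'data' and 'label' keys, with the file handle opened lazily so that each DataLoader worker gets its own handle:

import h5py
from torch.utils.data import Dataset


class LazyH5Dataset(Dataset):
    """Reads one (data, label) slice per item instead of loading the whole h5 file into memory."""
    def __init__(self, h5_path):
        self.h5_path = h5_path
        self.h5 = None                              # opened lazily, see __getitem__
        with h5py.File(h5_path, "r") as hf:
            self.length = hf["data"].shape[0]       # only the length is read up front

    def __getitem__(self, item):
        if self.h5 is None:
            self.h5 = h5py.File(self.h5_path, "r")  # each DataLoader worker opens its own handle
        X = self.h5["data"][item]                   # h5py reads just this slice from disk
        Y = self.h5["label"][item]
        return X, Y

    def __len__(self):
        return self.length


# Hypothetical usage: DataLoader(LazyH5Dataset("./speaker225_resample_train.h5"), batch_size=64, shuffle=True)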

