PyTorch Study Notes 08 - Loading Datasets
2022-07-31 06:32:00 【qq_50749521】

In the previous post on the diabetes dataset, we fed the entire dataset in as the input for every computation. This time, consider the mini-batch way of feeding data.
Three concepts:
epoch: one full pass over all the training samples is called one epoch
Batch-Size: the number of samples contained in each batch during batch training
iteration: one pass over a single batch is called one iteration
For example, suppose a dataset has 200 samples and we divide it into 40 chunks of 5 samples each.
Then the number of batches is 40 and batch_size = 5.
During training we train one chunk at a time; running the 5 samples of one chunk through once is called 1 iteration.
Once all 40 chunks have gone through, all 200 samples have been trained on once, which is called 1 epoch.
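To make the bookkeeping concrete, here is a minimal plain-Python sketch (the numbers are just the example above, nothing from a real dataset):

import math

num_samples = 200        # total training samples
batch_size = 5           # samples per mini-batch

# iterations per epoch = number of batches (ceil keeps a final partial batch)
iterations_per_epoch = math.ceil(num_samples / batch_size)   # 40

num_epochs = 3
total_iterations = num_epochs * iterations_per_epoch         # 120
print(iterations_per_epoch, total_iterations)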
DataLoader: a utility for loading a dataset.
What can it do for us? When we do mini-batch training, we can shuffle the dataset to improve the randomness of training.
When a dataset that supports indexing and has a known length is handed to the DataLoader, it automatically generates mini-batches from that dataset.
(Pipeline: dataset -> shuffle -> loader)
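To see the shuffling and batching in action, here is a minimal sketch using torch.utils.data.TensorDataset (which already supports indexing and len(), so it can feed a DataLoader directly):

import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 toy samples with 3 features each, and 10 labels
x = torch.arange(30, dtype=torch.float32).reshape(10, 3)
y = torch.arange(10, dtype=torch.float32).reshape(10, 1)

loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)

for inputs, labels in loader:
    # yields batches of 4, 4 and 2 samples, in a new random order each epoch
    print(inputs.shape, labels.view(-1).tolist())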

How do you define your own Dataset?
Here is the skeleton code:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class DiabetesDataset(Dataset):
    def __init__(self):
        pass

    def __getitem__(self, index):
        pass

    def __len__(self):
        pass

dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=2)
PyTorch provides a Dataset class. It is an abstract class; as we know, abstract classes cannot be instantiated, but they can be inherited from.
- The DiabetesDataset above is a class we wrote ourselves that inherits from Dataset. __getitem__ and __len__ are both magic methods, returning a sample by index and the length of the dataset, respectively.
- After instantiating DiabetesDataset, the DataLoader automatically creates mini-batch datasets from it. Here it is initialized with batch_size, shuffle, and the number of worker processes.
batch_size = 32 sets the number of samples per batch; shuffle = True shuffles the dataset; num_workers = 2 means that when the data is later read to form mini-batches, it is loaded in parallel, here by two workers (DataLoader workers are subprocesses, not threads). With more CPU cores you can set this a bit higher. Note that on Windows, multi-worker loading requires the training loop to sit under an if __name__ == '__main__' guard, which is why the full script below adds one (and falls back to num_workers = 0).
With that, we get the dataset in the form we want as train_loader, and training can begin:
for epoch in range(100):
    for index, data in enumerate(train_loader, 0):
        # index is the index of the current batch (total_samples / batch_size batches per epoch);
        # data is the (inputs, labels) tuple of tensors
Doing mini-batch training on the diabetes data, the whole code is as follows:
import torch
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class DiabetesDataset(Dataset):
    def __init__(self, filepath):
        # read the whole file at once and keep it in memory
        xy = np.loadtxt(filepath, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]  # number of samples
        self.x_data = torch.from_numpy(xy[:, :-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        # returns a (features, label) tuple
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

dataset = DiabetesDataset(r'F:\ASR-source\Dataset\diabetes.csv.gz')
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=0)
batch_size = 32
batch = np.round(len(dataset) / batch_size)  # number of mini-batches per epoch
batch
24.0
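A small aside (not in the original post): np.round only happens to give the right count here. The DataLoader already knows how many mini-batches it will yield per epoch, so you can ask it directly, and math.ceil is the exact formula when drop_last=False keeps the final partial batch:

import math
print(len(train_loader))               # 24 mini-batches per epoch
print(math.ceil(len(dataset) / 32))    # same value, computed by hand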
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        return x

mymodel = Model()
criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(mymodel.parameters(), lr=0.01)
epoch_list = []
loss_list = []
sum_loss = 0

if __name__ == '__main__':
    for epoch in range(100):
        # train_loader holds the shuffled, batched training samples and their labels
        for index, data in enumerate(train_loader, 0):
            inputs, labels = data  # inputs and labels are both tensors
            y_pred = mymodel(inputs)
            loss = criterion(y_pred, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            sum_loss += loss.item()
            print('epoch = ', epoch + 1, 'index = ', index + 1, 'loss = ', loss.item())
        epoch_list.append(epoch)
        loss_list.append(sum_loss / batch)
        print(sum_loss / batch)
        sum_loss = 0
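The script collects epoch_list and loss_list but never plots them. A minimal matplotlib sketch (my addition, assuming matplotlib is installed) to draw the loss curve once training finishes, placed under the same if __name__ == '__main__' guard:

import matplotlib.pyplot as plt

plt.plot(epoch_list, loss_list)
plt.xlabel('epoch')
plt.ylabel('average BCE loss per mini-batch')
plt.show()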
epoch = 1 index = 1 loss = 0.6523504257202148
epoch = 1 index = 2 loss = 0.6662447452545166
epoch = 1 index = 3 loss = 0.6510850191116333
epoch = 1 index = 4 loss = 0.622829794883728
epoch = 1 index = 5 loss = 0.6272122263908386
epoch = 1 index = 6 loss = 0.5990191102027893
epoch = 1 index = 7 loss = 0.6213780045509338
epoch = 1 index = 8 loss = 0.6761874556541443
epoch = 1 index = 9 loss = 0.6133689880371094
epoch = 1 index = 10 loss = 0.6413829326629639
epoch = 1 index = 11 loss = 0.6246744394302368
epoch = 1 index = 12 loss = 0.6163585782051086
epoch = 1 index = 13 loss = 0.599936306476593
epoch = 1 index = 14 loss = 0.6216733455657959
epoch = 1 index = 15 loss = 0.6504020094871521
epoch = 1 index = 16 loss = 0.6451072096824646
epoch = 1 index = 17 loss = 0.6215073466300964
epoch = 1 index = 18 loss = 0.6641662120819092
epoch = 1 index = 19 loss = 0.6364893317222595
epoch = 1 index = 20 loss = 0.6020426154136658
epoch = 1 index = 21 loss = 0.617006778717041
epoch = 1 index = 22 loss = 0.653681218624115
epoch = 1 index = 23 loss = 0.5835389494895935
epoch = 1 index = 24 loss = 0.6029499173164368
0.6296080400546392
epoch = 2 index = 1 loss = 0.6385740637779236
epoch = 2 index = 2 loss = 0.6440627574920654
epoch = 2 index = 3 loss = 0.6580216288566589
........

Now replace the model with the deeper one below and iterate two hundred times (the resulting loss-curve figure is not reproduced here):
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.relu = torch.nn.ReLU()
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.relu(self.linear2(x))
        # note: the last layer must not use ReLU, because BCELoss takes the log
        # of the output; a ReLU output of exactly 0 would blow up the loss and
        # its gradient, while sigmoid keeps the output in (0, 1)
        x = self.sigmoid(self.linear3(x))
        return x
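As a quick sanity check (a sketch with a made-up random batch, just to exercise forward), the new model maps 8 features to a single probability-like output:

check_model = Model()
fake_batch = torch.randn(4, 8)   # 4 made-up samples with 8 features
out = check_model(fake_batch)
print(out.shape)                                      # torch.Size([4, 1])
print((out > 0).all().item(), (out < 1).all().item())  # True True: outputs stay in (0, 1)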
