Deep Learning (II): Getting into Machine Learning and Deep Learning Programming
2022-07-28 06:12:00 【Ali forever】
Machine learning and deep learning programming
Preface
The previous article introduced how to construct a basic neural network. This article focuses on how to implement that process with the PyTorch framework, and also includes a code analysis of teacher Li Hongyi's homework HW1.
I. What are PyTorch and TensorFlow?
PyTorch and TensorFlow are both deep learning frameworks. They have much in common and are two of the most widely used frameworks today.
What are the differences between them? The following table gives a summary:
| Framework | PyTorch | TensorFlow |
|---|---|---|
| Programming style | Imperative | Symbolic |
| Graph definition | Dynamic graph | Static graph |
| Efficiency | Lower | Higher |
| Difficulty | Lower | Higher |
Imperative programming means writing code intuitively: every statement is computed as soon as it runs, so there is no need to build the whole computation first and feed data in afterwards, and debugging is very easy. Symbolic programming requires you to set up the framework and logic of the computation first, and only then feed in data and execute it; this style is not interactive, but it is very efficient.
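For instance, here is a minimal sketch (the tensor values are arbitrary) of what imperative, eager execution feels like in PyTorch: every statement runs as soon as it is reached, so intermediate results can be printed or debugged on the spot.
import torch
a = torch.tensor([1., 2., 3.])
b = a * 2 + 1    # computed immediately, no separate graph-compilation or session step
print(b)         # tensor([3., 5., 7.])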
II. Building a neural network
1. Importing libraries
To write the basic deep learning code we first need to import a few libraries:
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
2. Dataset and DataLoader
Dataset tells the program where the data lives and what sample corresponds to each index; it stores the samples and their labels.
DataLoader groups the data into batches: from the full dataset it repeatedly draws batch_size samples (randomly, if shuffling is enabled) and regroups them into a new batched view of the data.
In other words, the original large dataset is split into a sequence of smaller batches.
The shuffle parameter controls whether the sampling is random; if it is set to False, the batches produced on every pass are identical.
class MyDataset(Dataset):
    def __init__(self, file):
        self.data = ...            # read the dataset here and build the index
    def __getitem__(self, index):
        return self.data[index]    # return the sample at the given index
    def __len__(self):
        return len(self.data)      # return the total number of samples

dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
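As a small, hypothetical illustration of the batching behaviour described above (the TensorDataset and its sizes are made up purely for the demo), iterating over a DataLoader yields one batch per step, with the last batch possibly smaller:
import torch
from torch.utils.data import TensorDataset, DataLoader
toy_data = TensorDataset(torch.randn(10, 3))       # 10 samples with 3 features each
loader = DataLoader(toy_data, batch_size=4, shuffle=True)
for (batch,) in loader:
    print(batch.shape)    # torch.Size([4, 3]), torch.Size([4, 3]), torch.Size([2, 3])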
3. Gradient descent
After constructing the dataset, we also need to know how gradients are computed for gradient descent; this is the core of optimizing the loss function.
First we need to know a very important flag: requires_grad.
When a tensor is created with requires_grad=True, PyTorch can differentiate any tensor computed from it with respect to that tensor; the resulting derivative is stored in the tensor's grad attribute, which is what the optimizer uses to update parameters.
Then we need to know another function: backward().
This function performs the differentiation, but the tensor it is called on must be a scalar. Take the example below: if z were not a scalar, backward() would raise an error, and you would have to pass a weight tensor as an argument, e.g. z.backward(torch.tensor([2., 2., 2.])), which computes the weighted derivative with respect to a 3-element vector. Note also that gradients accumulate, so after each backward pass they must be cleared with optimizer.zero_grad(), otherwise the next backward pass adds on top of the old gradients.
Here is a simple example:
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
print(x.grad)    # tensor([[ 2.,  0.], [-2.,  2.]])
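And a minimal sketch of the two points mentioned above, the vector-weighted backward() call and gradient accumulation (the weight values are arbitrary):
import torch
x = torch.tensor([1., 2., 3.], requires_grad=True)
z = x.pow(2)                              # z is a vector, so backward() needs a weight tensor
z.backward(torch.tensor([2., 2., 2.]))    # vector-Jacobian product with weights [2, 2, 2]
print(x.grad)                             # tensor([ 4.,  8., 12.])
z = x.pow(2)
z.backward(torch.tensor([2., 2., 2.]))    # gradients were not cleared, so they accumulate
print(x.grad)                             # tensor([ 8., 16., 24.])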

4. Building the network
We know that a neural network is a stack of connected layers whose width gradually shrinks until a final layer produces a single output, so we construct the network following that logic.
First we need to understand an important container: nn.Sequential().
nn.Sequential() is a container: the modules that make up the network are added to it in the order they are passed to the constructor. When the model's forward() method is called during the forward pass, the input is fed into the first module inside nn.Sequential(); the output of the first module then becomes the input of the second, and so on, until the output of the last module is returned.
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),   # L1
            nn.Sigmoid(),        # L2
            nn.Linear(32, 1)     # L3
        )
    def forward(self, x):
        return self.net(x)       # equivalent to L3(L2(L1(x)))
Next we need to define the loss function; the usual choices are MSE for regression and cross entropy for classification:
# Mean Squared Error (for regression tasks)
criterion = nn.MSELoss()
# Cross Entropy (for classification tasks)
criterion = nn.CrossEntropyLoss()
loss = criterion(model_output, expected_value)
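The two losses expect differently shaped inputs; here is a small sketch with made-up tensors (the shapes are the point, not the values):
import torch
import torch.nn as nn
# MSELoss: prediction and target have the same shape, e.g. a batch of 4 regression outputs
mse = nn.MSELoss()
print(mse(torch.randn(4), torch.randn(4)))
# CrossEntropyLoss: raw logits of shape (batch, num_classes) and integer class labels of shape (batch,)
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)                # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])       # the true class index of each sample
print(ce(logits, labels))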
Then we need an optimizer to adjust the parameters; here we choose SGD:
torch.optim.SGD(model.parameters(), lr, momentum = 0)
For every parameter update we must clear the gradients with optimizer.zero_grad(), run backpropagation with loss.backward(), and then update the parameters with optimizer.step().
5. Complete code
# part 1: preparation
dataset = MyDataset(file)
tr_set = DataLoader(dataset, 16, shuffle=True)
model = MyModel().to(device)                          # move the model to the chosen device (CPU or GPU)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), 0.1)
# part 2: train the model
for epoch in range(n_epochs):
    model.train()
    for x, y in tr_set:
        optimizer.zero_grad()
        x, y = x.to(device), y.to(device)
        pred = model(x)
        loss = criterion(pred, y)
        loss.backward()
        optimizer.step()
# part 3: validate the model -- check whether it is overfitting so the hyperparameters can be tuned;
# layers that only help training are switched off so the evaluation is unbiased
model.eval()                      # enter evaluation mode
total_loss = 0
for x, y in dv_set:
    x, y = x.to(device), y.to(device)
    with torch.no_grad():         # disable gradient tracking: no computation graph is built, saving memory and compute
        pred = model(x)
        loss = criterion(pred, y)
    total_loss += loss.cpu().item() * len(x)
avg_loss = total_loss / len(dv_set.dataset)
# part 4: test the model -- feed in the test set and collect the predictions
model.eval()
preds = []
for x in tt_set:
    x = x.to(device)
    with torch.no_grad():
        pred = model(x)
        preds.append(pred.cpu())
# part 5: save and load the model
torch.save(model.state_dict(), path)    # save a checkpoint
ckpt = torch.load(path)                  # load the checkpoint
model.load_state_dict(ckpt)              # restore the parameters into the model
III. The assignment and sample code
1. Assignment description


With the problem description out of the way, let's work through the sample code step by step.
2. Seed function
Even with the same network structure, learning rate, number of iterations and batch size, the final results can still differ from run to run. This problem does not appear on the CPU, but it can appear on the GPU: the GPU provides massively parallel computation and pulls in additional sources of randomness, so the results cannot be reproduced exactly. Fixing the random seeds is therefore very important.
torch.backends.cudnn.deterministic
Setting this flag to True forces cuDNN to use a deterministic (the default) convolution algorithm on every call, so the convolution results are reproducible.
torch.backends.cudnn.benchmark
When this flag is on, the program spends some time searching for the fastest algorithm for each convolution layer in the network, which speeds up later execution. In general PyTorch already uses cuDNN acceleration by default, so this flag is usually set to False. It also has a serious drawback: if the structure of the network is dynamic, the search has to be redone every time, which greatly reduces efficiency.
torch.manual_seed(seed)
This seeds the random number generator; after calling it, subsequent calls such as torch.rand(1) draw their random numbers from that seeded generator, making them reproducible.
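A quick sanity check of that behaviour (the seed value is arbitrary):
import torch
torch.manual_seed(42)
a = torch.rand(1)
torch.manual_seed(42)        # re-seeding resets the generator to the same state
b = torch.rand(1)
print(torch.equal(a, b))     # True: the same seed reproduces the same "random" number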
def same_seed(seed):
    '''Fixes random number generator seeds for reproducibility.'''
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
3. Dataset splitting function
The main purpose of the random seed here is to split the dataset so that we obtain a training set and a validation set. The helper below takes three parameters: the dataset, the validation ratio, and the seed.
random_split(dataset, lengths, generator=<torch._C.Generator object>)
This function requires us to provide the dataset, the lengths of the splits stored in a list, and a random generator torch.Generator() seeded for reproducibility; it then performs the split automatically. The helper below converts the resulting subsets into np.array before returning them.
def train_valid_split(data_set, valid_ratio, seed):
    '''Split provided training data into training set and validation set'''
    valid_set_size = int(valid_ratio * len(data_set))
    train_set_size = len(data_set) - valid_set_size
    train_set, valid_set = random_split(data_set, [train_set_size, valid_set_size],
                                        generator=torch.Generator().manual_seed(seed))
    return np.array(train_set), np.array(valid_set)
4. The predict function
The prediction function is much the same as in the basic code above, so it is not elaborated here.
def predict(test_loader, model, device):
    model.eval()  # Set your model to evaluation mode.
    preds = []
    for x in tqdm(test_loader):
        x = x.to(device)
        with torch.no_grad():
            pred = model(x)
            preds.append(pred.detach().cpu())
    preds = torch.cat(preds, dim=0).numpy()
    return preds
5. Dataset and model classes
These were already explained above, so only the code is shown here.
class COVID19Dataset(Dataset):
    ''' x: Features. y: Targets, if none, do prediction. '''
    def __init__(self, x, y=None):
        if y is None:
            self.y = y
        else:
            self.y = torch.FloatTensor(y)
        self.x = torch.FloatTensor(x)

    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        else:
            return self.x[idx], self.y[idx]

    def __len__(self):
        return len(self.x)


class My_Model(nn.Module):
    def __init__(self, input_dim):
        super(My_Model, self).__init__()
        # TODO: modify model's structure, be aware of dimensions.
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1)  # (B, 1) -> (B)
        return x
6. Feature selection
Because the data has many features, we can also train on a reduced set of features to avoid overfitting.
Note that [:, -1] selects every row of the last column, while [:, :-1] selects every column up to (but excluding) the last.
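A tiny NumPy illustration of those two slices (the array values are made up):
import numpy as np
data = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(data[:, -1])     # [3 6]               -> last column only (used as the target y)
print(data[:, :-1])    # [[1 2]
                       #  [4 5]]             -> every column except the last (used as the features x)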
def select_feat(train_data, valid_data, test_data, select_all=True):
    '''Selects useful features to perform regression'''
    y_train, y_valid = train_data[:, -1], valid_data[:, -1]
    raw_x_train, raw_x_valid, raw_x_test = train_data[:, :-1], valid_data[:, :-1], test_data
    if select_all:
        feat_idx = list(range(raw_x_train.shape[1]))
    else:
        feat_idx = [0, 1, 2, 3, 4]  # TODO: Select suitable feature columns.
    return raw_x_train[:, feat_idx], raw_x_valid[:, feat_idx], raw_x_test[:, feat_idx], y_train, y_valid
7. Model training function
The training function really covers two things: training and validation. For training we first configure the loss function and the optimizer, then train as in the basic code from earlier, recording the loss of every step and computing the mean loss. We then validate to check whether the model is overfitting; the validation steps are very similar to the training steps. The code also supports early stopping: if the validation loss fails to improve for many consecutive epochs, training is halted and a message is printed, to save training time.
tqdm(iterable)
tqdm is a progress bar: wrap any iterable in it and a progress bar is displayed while you iterate.
.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
.set_postfix({'loss': loss.detach().item()})
.set_description adds a description in front of the progress bar, and .set_postfix appends a set of key/value pairs after it, as in the sketch below:
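A minimal sketch of a bare tqdm loop (the sleep just stands in for one training step):
from tqdm import tqdm
import time
pbar = tqdm(range(100))
for step in pbar:
    time.sleep(0.01)                                 # stand-in for one training step
    pbar.set_description(f'Epoch [{step+1}/100]')    # text shown in front of the bar
    pbar.set_postfix({'loss': 1.0 / (step + 1)})     # key/value info shown after the bar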
SummaryWriter()
This constructs a TensorBoard writer object, which makes it easy to call TensorBoard later for visualization.
SummaryWriter().add_scalar('Loss/train', mean_train_loss, step)
This creates a chart in TensorBoard: the first argument is the chart's tag (name), the second is the y-axis value, and the third is the x-axis step.
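A minimal sketch of the add_scalar workflow (the loss values are fake and only illustrate the call); assuming the default settings, the logs land under ./runs and can be viewed with tensorboard --logdir=runs:
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()                              # logs go to ./runs/<timestamp> by default
for step in range(100):
    fake_loss = 1.0 / (step + 1)                      # stand-in for a real training loss
    writer.add_scalar('Loss/train', fake_loss, step)  # tag, y-axis value, x-axis step
writer.close()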
def trainer(train_loader, valid_loader, model, config, device):
    criterion = nn.MSELoss(reduction='mean')  # Define your loss function, do not modify this.
    # Define your optimization algorithm.
    # TODO: Please check https://pytorch.org/docs/stable/optim.html to get more available algorithms.
    # TODO: L2 regularization (optimizer(weight decay...) or implement by your self).
    optimizer = torch.optim.SGD(model.parameters(), lr=config['learning_rate'], momentum=0.9)
    writer = SummaryWriter()  # Writer of tensoboard.
    if not os.path.isdir('./models'):
        os.mkdir('./models')  # Create directory of saving models.
    n_epochs, best_loss, step, early_stop_count = config['n_epochs'], math.inf, 0, 0
    for epoch in range(n_epochs):
        model.train()  # Set your model to train mode.
        loss_record = []
        # tqdm is a package to visualize your training progress.
        train_pbar = tqdm(train_loader, position=0, leave=True)
        for x, y in train_pbar:
            optimizer.zero_grad()              # Set gradient to zero.
            x, y = x.to(device), y.to(device)  # Move your data to device.
            pred = model(x)
            loss = criterion(pred, y)
            loss.backward()                    # Compute gradient (backpropagation).
            optimizer.step()                   # Update parameters.
            step += 1
            loss_record.append(loss.detach().item())
            # Display current epoch number and loss on tqdm progress bar.
            train_pbar.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
            train_pbar.set_postfix({'loss': loss.detach().item()})
        mean_train_loss = sum(loss_record) / len(loss_record)
        writer.add_scalar('Loss/train', mean_train_loss, step)

        model.eval()  # Set your model to evaluation mode.
        loss_record = []
        for x, y in valid_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                pred = model(x)
                loss = criterion(pred, y)
            loss_record.append(loss.item())
        mean_valid_loss = sum(loss_record) / len(loss_record)
        print(f'Epoch [{epoch+1}/{n_epochs}]: Train loss: {mean_train_loss:.4f}, Valid loss: {mean_valid_loss:.4f}')
        writer.add_scalar('Loss/valid', mean_valid_loss, step)

        if mean_valid_loss < best_loss:
            best_loss = mean_valid_loss
            torch.save(model.state_dict(), config['save_path'])  # Save your best model
            print('Saving model with loss {:.3f}...'.format(best_loss))
            early_stop_count = 0
        else:
            early_stop_count += 1
        if early_stop_count >= config['early_stop']:
            print('\nModel is not improving, so we halt the training session.')
            return
8. Parameter configuration
We need to configure a few hyperparameters and ordinary parameters; to keep them easy to inspect, they are wrapped in a single dictionary.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = {
    'seed': 5201314,      # Your seed number, you can pick your lucky number. :)
    'select_all': True,   # Whether to use all features.
    'valid_ratio': 0.2,   # validation_size = train_size * valid_ratio
    'n_epochs': 3000,     # Number of epochs.
    'batch_size': 256,
    'learning_rate': 1e-5,
    'early_stop': 400,    # If model has not improved for this many consecutive epochs, stop training.
    'save_path': './models/model.ckpt'  # Your model will be saved here.
}
9. Configuring the DataLoader
First we load the training set covid.train.csv and the test set covid.test.csv, then split the training set with the random seed, choose whether to keep all the features, wrap the resulting arrays in the COVID19Dataset class, and finally use DataLoader to split everything into mini-batches.
# Set seed for reproducibility
same_seed(config['seed'])

# train_data size: 2699 x 118 (id + 37 states + 16 features x 5 days)
# test_data size: 1078 x 117 (without last day's positive rate)
train_data, test_data = pd.read_csv('./covid.train.csv').values, pd.read_csv('./covid.test.csv').values
train_data, valid_data = train_valid_split(train_data, config['valid_ratio'], config['seed'])

# Print out the data size.
print(f"""train_data size: {train_data.shape} valid_data size: {valid_data.shape} test_data size: {test_data.shape}""")

# Select features
x_train, x_valid, x_test, y_train, y_valid = select_feat(train_data, valid_data, test_data, config['select_all'])

# Print out the number of features.
print(f'number of features: {x_train.shape[1]}')

train_dataset, valid_dataset, test_dataset = COVID19Dataset(x_train, y_train), \
                                             COVID19Dataset(x_valid, y_valid), \
                                             COVID19Dataset(x_test)

# Pytorch data loader loads pytorch dataset into batches.
train_loader = DataLoader(train_dataset, batch_size=config['batch_size'], shuffle=True, pin_memory=True)
valid_loader = DataLoader(valid_dataset, batch_size=config['batch_size'], shuffle=True, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=config['batch_size'], shuffle=False, pin_memory=True)
10. Training the model
model = My_Model(input_dim=x_train.shape[1]).to(device) # put your model and data on the same computation device.
trainer(train_loader, valid_loader, model, config, device)
11. Testing the model
def save_pred(preds, file):
    ''' Save predictions to specified file '''
    with open(file, 'w') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

model = My_Model(input_dim=x_train.shape[1]).to(device)
model.load_state_dict(torch.load(config['save_path']))
preds = predict(test_loader, model, device)
save_pred(preds, 'pred.csv')
Summary
This article introduced the introductory programming code from Li Hongyi's deep learning course and a source-code walkthrough of his first assignment. I hope it helps you really get hands-on with deep learning. A mind map is attached below as a memory aid.
