Deep Learning (II): Getting into Machine Learning and Deep Learning Programming
2022-07-28 06:12:00 【Ali forever】
Machine learning and deep learning programming
Preface
The previous article introduced how to construct a basic neural network. This article focuses on how to implement that process with the PyTorch framework, and also includes a code analysis of teacher Li Hongyi's homework HW1.
I. What are PyTorch and TensorFlow?
PyTorch and TensorFlow are both deep learning frameworks. They have much in common and are two of the most widely used frameworks today.
What are the differences between them? The following table gives a summary:
| Framework | PyTorch | TensorFlow |
|---|---|---|
| Programming style | Imperative | Symbolic |
| Graph definition | Dynamic graph | Static graph |
| Efficiency | Lower | Higher |
| Difficulty | Lower | Higher |
Imperative programming means writing code intuitively: every statement is computed as soon as it runs, so there is no need to build the whole computation first and feed data in afterwards, and debugging is very easy. Symbolic programming requires you to set up the framework and logic of the computation first, and only then feed in data and execute it; this style is not interactive, but it is very efficient.
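For instance, here is a minimal sketch (the tensor values are arbitrary) of what imperative, eager execution feels like in PyTorch: every statement runs as soon as it is reached, so intermediate results can be printed or debugged on the spot.
import torch
a = torch.tensor([1., 2., 3.])
b = a * 2 + 1    # computed immediately, no separate graph-compilation or session step
print(b)         # tensor([3., 5., 7.])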
II. Building a neural network
1. Importing libraries
To write the basic deep learning code we first need to import a few libraries:
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
2. Dataset and DataLoader
Dataset tells the program where the data lives and what sample corresponds to each index; it stores the samples and their labels.
DataLoader groups the data into batches: from the full dataset it repeatedly draws batch_size samples (randomly, if shuffling is enabled) and regroups them into a new batched view of the data.
In other words, the original large dataset is split into a sequence of smaller batches.
The shuffle parameter controls whether the sampling is random; if it is set to False, the batches produced on every pass are identical.
class MyDataset(Dataset):
    def __init__(self, file):
        self.data = ...            # read the dataset here and build the index
    def __getitem__(self, index):
        return self.data[index]    # return the sample at the given index
    def __len__(self):
        return len(self.data)      # return the total number of samples

dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
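As a small, hypothetical illustration of the batching behaviour described above (the TensorDataset and its sizes are made up purely for the demo), iterating over a DataLoader yields one batch per step, with the last batch possibly smaller:
import torch
from torch.utils.data import TensorDataset, DataLoader
toy_data = TensorDataset(torch.randn(10, 3))       # 10 samples with 3 features each
loader = DataLoader(toy_data, batch_size=4, shuffle=True)
for (batch,) in loader:
    print(batch.shape)    # torch.Size([4, 3]), torch.Size([4, 3]), torch.Size([2, 3])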
3. Gradient descent
After constructing the dataset, we also need to know how gradients are computed for gradient descent; this is the core of optimizing the loss function.
First we need to know a very important flag: requires_grad.
When a tensor is created with requires_grad=True, PyTorch can differentiate any tensor computed from it with respect to that tensor; the resulting derivative is stored in the tensor's grad attribute, which is what the optimizer uses to update parameters.
Then we need to know another function: backward().
This function performs the differentiation, but the tensor it is called on must be a scalar. Take the example below: if z were not a scalar, backward() would raise an error, and you would have to pass a weight tensor as an argument, e.g. z.backward(torch.tensor([2., 2., 2.])), which computes the weighted derivative with respect to a 3-element vector. Note also that gradients accumulate, so after each backward pass they must be cleared with optimizer.zero_grad(), otherwise the next backward pass adds on top of the old gradients.
Here is a simple example:
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
print(x.grad)    # tensor([[ 2.,  0.], [-2.,  2.]])
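And a minimal sketch of the two points mentioned above, the vector-weighted backward() call and gradient accumulation (the weight values are arbitrary):
import torch
x = torch.tensor([1., 2., 3.], requires_grad=True)
z = x.pow(2)                              # z is a vector, so backward() needs a weight tensor
z.backward(torch.tensor([2., 2., 2.]))    # vector-Jacobian product with weights [2, 2, 2]
print(x.grad)                             # tensor([ 4.,  8., 12.])
z = x.pow(2)
z.backward(torch.tensor([2., 2., 2.]))    # gradients were not cleared, so they accumulate
print(x.grad)                             # tensor([ 8., 16., 24.])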

4. Building the network
We know that a neural network is a stack of connected layers whose width gradually shrinks until a final layer produces a single output, so we construct the network following that logic.
First we need to understand an important container: nn.Sequential().
nn.Sequential() is a container: the modules that make up the network are added to it in the order they are passed to the constructor. When the model's forward() method is called during the forward pass, the input is fed into the first module inside nn.Sequential(); the output of the first module then becomes the input of the second, and so on, until the output of the last module is returned.
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),   # L1
            nn.Sigmoid(),        # L2
            nn.Linear(32, 1)     # L3
        )
    def forward(self, x):
        return self.net(x)       # equivalent to L3(L2(L1(x)))
Next we need to define the loss function; the usual choices are MSE for regression and cross entropy for classification:
# Mean Squared Error (for regression tasks)
criterion = nn.MSELoss()
# Cross Entropy (for classification tasks)
criterion = nn.CrossEntropyLoss()
loss = criterion(model_output, expected_value)
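The two losses expect differently shaped inputs; here is a small sketch with made-up tensors (the shapes are the point, not the values):
import torch
import torch.nn as nn
# MSELoss: prediction and target have the same shape, e.g. a batch of 4 regression outputs
mse = nn.MSELoss()
print(mse(torch.randn(4), torch.randn(4)))
# CrossEntropyLoss: raw logits of shape (batch, num_classes) and integer class labels of shape (batch,)
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)                # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])       # the true class index of each sample
print(ce(logits, labels))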
Then we need an optimizer to adjust the parameters; here we choose SGD:
torch.optim.SGD(model.parameters(), lr, momentum = 0)
For every parameter update we must clear the gradients with optimizer.zero_grad(), run backpropagation with loss.backward(), and then update the parameters with optimizer.step().
5. Complete code
# part 1: preparation
dataset = MyDataset(file)
tr_set = DataLoader(dataset, 16, shuffle=True)
model = MyModel().to(device)                          # move the model to the chosen device (CPU or GPU)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), 0.1)
# part 2: train the model
for epoch in range(n_epochs):
    model.train()
    for x, y in tr_set:
        optimizer.zero_grad()
        x, y = x.to(device), y.to(device)
        pred = model(x)
        loss = criterion(pred, y)
        loss.backward()
        optimizer.step()
# part 3: validate the model -- check whether it is overfitting so the hyperparameters can be tuned;
# layers that only help training are switched off so the evaluation is unbiased
model.eval()                      # enter evaluation mode
total_loss = 0
for x, y in dv_set:
    x, y = x.to(device), y.to(device)
    with torch.no_grad():         # disable gradient tracking: no computation graph is built, saving memory and compute
        pred = model(x)
        loss = criterion(pred, y)
    total_loss += loss.cpu().item() * len(x)
avg_loss = total_loss / len(dv_set.dataset)
# part 4: test the model -- feed in the test set and collect the predictions
model.eval()
preds = []
for x in tt_set:
    x = x.to(device)
    with torch.no_grad():
        pred = model(x)
        preds.append(pred.cpu())
# part 5: save and load the model
torch.save(model.state_dict(), path)    # save a checkpoint
ckpt = torch.load(path)                  # load the checkpoint
model.load_state_dict(ckpt)              # restore the parameters into the model
III. The assignment and sample code
1. Assignment description


With the problem description out of the way, let's work through the sample code step by step.
2. Seed function
Even with the same network structure, learning rate, number of iterations and batch size, the final results can still differ from run to run. This problem does not appear on the CPU, but it can appear on the GPU: the GPU provides massively parallel computation and pulls in additional sources of randomness, so the results cannot be reproduced exactly. Fixing the random seeds is therefore very important.
torch.backends.cudnn.deterministic
Setting this flag to True forces cuDNN to use a deterministic (the default) convolution algorithm on every call, so the convolution results are reproducible.
torch.backends.cudnn.benchmark
When this flag is on, the program spends some time searching for the fastest algorithm for each convolution layer in the network, which speeds up later execution. In general PyTorch already uses cuDNN acceleration by default, so this flag is usually set to False. It also has a serious drawback: if the structure of the network is dynamic, the search has to be redone every time, which greatly reduces efficiency.
torch.manual_seed(seed)
This seeds the random number generator; after calling it, subsequent calls such as torch.rand(1) draw their random numbers from that seeded generator, making them reproducible.
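A quick sanity check of that behaviour (the seed value is arbitrary):
import torch
torch.manual_seed(42)
a = torch.rand(1)
torch.manual_seed(42)        # re-seeding resets the generator to the same state
b = torch.rand(1)
print(torch.equal(a, b))     # True: the same seed reproduces the same "random" number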
def same_seed(seed):
    '''Fixes random number generator seeds for reproducibility.'''
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
3. Dataset splitting function
The main purpose of the random seed here is to split the dataset so that we obtain a training set and a validation set. The helper below takes three parameters: the dataset, the validation ratio, and the seed.
random_split(dataset, lengths, generator=<torch._C.Generator object>)
This function requires us to provide the dataset, the lengths of the splits stored in a list, and a random generator torch.Generator() seeded for reproducibility; it then performs the split automatically. The helper below converts the resulting subsets into np.array before returning them.
def train_valid_split(data_set, valid_ratio, seed):
    '''Split provided training data into training set and validation set'''
    valid_set_size = int(valid_ratio * len(data_set))
    train_set_size = len(data_set) - valid_set_size
    train_set, valid_set = random_split(data_set, [train_set_size, valid_set_size],
                                        generator=torch.Generator().manual_seed(seed))
    return np.array(train_set), np.array(valid_set)
4. The predict function
The prediction function is much the same as in the basic code above, so it is not elaborated here.
def predict(test_loader, model, device):
    model.eval()  # Set your model to evaluation mode.
    preds = []
    for x in tqdm(test_loader):
        x = x.to(device)
        with torch.no_grad():
            pred = model(x)
            preds.append(pred.detach().cpu())
    preds = torch.cat(preds, dim=0).numpy()
    return preds
5. Dataset and model classes
These were already explained above, so only the code is shown here.
class COVID19Dataset(Dataset):
    ''' x: Features. y: Targets, if none, do prediction. '''
    def __init__(self, x, y=None):
        if y is None:
            self.y = y
        else:
            self.y = torch.FloatTensor(y)
        self.x = torch.FloatTensor(x)

    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        else:
            return self.x[idx], self.y[idx]

    def __len__(self):
        return len(self.x)


class My_Model(nn.Module):
    def __init__(self, input_dim):
        super(My_Model, self).__init__()
        # TODO: modify model's structure, be aware of dimensions.
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1)  # (B, 1) -> (B)
        return x
6. Feature selection
Because the data has many features, we can also train on a reduced set of features to avoid overfitting.
Note that [:, -1] selects every row of the last column, while [:, :-1] selects every column up to (but excluding) the last.
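A tiny NumPy illustration of those two slices (the array values are made up):
import numpy as np
data = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(data[:, -1])     # [3 6]               -> last column only (used as the target y)
print(data[:, :-1])    # [[1 2]
                       #  [4 5]]             -> every column except the last (used as the features x)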
def select_feat(train_data, valid_data, test_data, select_all=True):
    '''Selects useful features to perform regression'''
    y_train, y_valid = train_data[:, -1], valid_data[:, -1]
    raw_x_train, raw_x_valid, raw_x_test = train_data[:, :-1], valid_data[:, :-1], test_data
    if select_all:
        feat_idx = list(range(raw_x_train.shape[1]))
    else:
        feat_idx = [0, 1, 2, 3, 4]  # TODO: Select suitable feature columns.
    return raw_x_train[:, feat_idx], raw_x_valid[:, feat_idx], raw_x_test[:, feat_idx], y_train, y_valid
7. Model training function
The training function really covers two things: training and validation. For training we first configure the loss function and the optimizer, then train as in the basic code from earlier, recording the loss of every step and computing the mean loss. We then validate to check whether the model is overfitting; the validation steps are very similar to the training steps. The code also supports early stopping: if the validation loss fails to improve for many consecutive epochs, training is halted and a message is printed, to save training time.
tqdm(iterable)
tqdm is a progress bar: wrap any iterable in it and a progress bar is displayed while you iterate.
.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
.set_postfix({'loss': loss.detach().item()})
.set_description adds a description in front of the progress bar, and .set_postfix appends a set of key/value pairs after it, as in the sketch below:
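A minimal sketch of a bare tqdm loop (the sleep just stands in for one training step):
from tqdm import tqdm
import time
pbar = tqdm(range(100))
for step in pbar:
    time.sleep(0.01)                                 # stand-in for one training step
    pbar.set_description(f'Epoch [{step+1}/100]')    # text shown in front of the bar
    pbar.set_postfix({'loss': 1.0 / (step + 1)})     # key/value info shown after the bar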
SummaryWriter()
This constructs a TensorBoard writer object, which makes it easy to call TensorBoard later for visualization.
SummaryWriter().add_scalar('Loss/train', mean_train_loss, step)
This creates a chart in TensorBoard: the first argument is the chart's tag (name), the second is the y-axis value, and the third is the x-axis step.
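A minimal sketch of the add_scalar workflow (the loss values are fake and only illustrate the call); assuming the default settings, the logs land under ./runs and can be viewed with tensorboard --logdir=runs:
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()                              # logs go to ./runs/<timestamp> by default
for step in range(100):
    fake_loss = 1.0 / (step + 1)                      # stand-in for a real training loss
    writer.add_scalar('Loss/train', fake_loss, step)  # tag, y-axis value, x-axis step
writer.close()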
def trainer(train_loader, valid_loader, model, config, device):
    criterion = nn.MSELoss(reduction='mean')  # Define your loss function, do not modify this.
    # Define your optimization algorithm.
    # TODO: Please check https://pytorch.org/docs/stable/optim.html to get more available algorithms.
    # TODO: L2 regularization (optimizer(weight decay...) or implement by your self).
    optimizer = torch.optim.SGD(model.parameters(), lr=config['learning_rate'], momentum=0.9)
    writer = SummaryWriter()  # Writer of tensoboard.
    if not os.path.isdir('./models'):
        os.mkdir('./models')  # Create directory of saving models.
    n_epochs, best_loss, step, early_stop_count = config['n_epochs'], math.inf, 0, 0
    for epoch in range(n_epochs):
        model.train()  # Set your model to train mode.
        loss_record = []
        # tqdm is a package to visualize your training progress.
        train_pbar = tqdm(train_loader, position=0, leave=True)
        for x, y in train_pbar:
            optimizer.zero_grad()              # Set gradient to zero.
            x, y = x.to(device), y.to(device)  # Move your data to device.
            pred = model(x)
            loss = criterion(pred, y)
            loss.backward()                    # Compute gradient (backpropagation).
            optimizer.step()                   # Update parameters.
            step += 1
            loss_record.append(loss.detach().item())
            # Display current epoch number and loss on tqdm progress bar.
            train_pbar.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
            train_pbar.set_postfix({'loss': loss.detach().item()})
        mean_train_loss = sum(loss_record) / len(loss_record)
        writer.add_scalar('Loss/train', mean_train_loss, step)

        model.eval()  # Set your model to evaluation mode.
        loss_record = []
        for x, y in valid_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                pred = model(x)
                loss = criterion(pred, y)
            loss_record.append(loss.item())
        mean_valid_loss = sum(loss_record) / len(loss_record)
        print(f'Epoch [{epoch+1}/{n_epochs}]: Train loss: {mean_train_loss:.4f}, Valid loss: {mean_valid_loss:.4f}')
        writer.add_scalar('Loss/valid', mean_valid_loss, step)

        if mean_valid_loss < best_loss:
            best_loss = mean_valid_loss
            torch.save(model.state_dict(), config['save_path'])  # Save your best model
            print('Saving model with loss {:.3f}...'.format(best_loss))
            early_stop_count = 0
        else:
            early_stop_count += 1
        if early_stop_count >= config['early_stop']:
            print('\nModel is not improving, so we halt the training session.')
            return
8. Parameter configuration
We need to configure a few hyperparameters and ordinary parameters; to keep them easy to inspect, they are wrapped in a single dictionary.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = {
    'seed': 5201314,      # Your seed number, you can pick your lucky number. :)
    'select_all': True,   # Whether to use all features.
    'valid_ratio': 0.2,   # validation_size = train_size * valid_ratio
    'n_epochs': 3000,     # Number of epochs.
    'batch_size': 256,
    'learning_rate': 1e-5,
    'early_stop': 400,    # If model has not improved for this many consecutive epochs, stop training.
    'save_path': './models/model.ckpt'  # Your model will be saved here.
}
9. Configuring the DataLoader
First we load the training set covid.train.csv and the test set covid.test.csv, then split the training set with the random seed, choose whether to keep all the features, wrap the resulting arrays in the COVID19Dataset class, and finally use DataLoader to split everything into mini-batches.
# Set seed for reproducibility
same_seed(config['seed'])

# train_data size: 2699 x 118 (id + 37 states + 16 features x 5 days)
# test_data size: 1078 x 117 (without last day's positive rate)
train_data, test_data = pd.read_csv('./covid.train.csv').values, pd.read_csv('./covid.test.csv').values
train_data, valid_data = train_valid_split(train_data, config['valid_ratio'], config['seed'])

# Print out the data size.
print(f"""train_data size: {train_data.shape} valid_data size: {valid_data.shape} test_data size: {test_data.shape}""")

# Select features
x_train, x_valid, x_test, y_train, y_valid = select_feat(train_data, valid_data, test_data, config['select_all'])

# Print out the number of features.
print(f'number of features: {x_train.shape[1]}')

train_dataset, valid_dataset, test_dataset = COVID19Dataset(x_train, y_train), \
                                             COVID19Dataset(x_valid, y_valid), \
                                             COVID19Dataset(x_test)

# Pytorch data loader loads pytorch dataset into batches.
train_loader = DataLoader(train_dataset, batch_size=config['batch_size'], shuffle=True, pin_memory=True)
valid_loader = DataLoader(valid_dataset, batch_size=config['batch_size'], shuffle=True, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=config['batch_size'], shuffle=False, pin_memory=True)
10. Training the model
model = My_Model(input_dim=x_train.shape[1]).to(device) # put your model and data on the same computation device.
trainer(train_loader, valid_loader, model, config, device)
11. Testing the model
def save_pred(preds, file):
    ''' Save predictions to specified file '''
    with open(file, 'w') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

model = My_Model(input_dim=x_train.shape[1]).to(device)
model.load_state_dict(torch.load(config['save_path']))
preds = predict(test_loader, model, device)
save_pred(preds, 'pred.csv')
Summary
This article introduced the introductory programming code from Li Hongyi's deep learning course and a source-code walkthrough of his first assignment. I hope it helps you really get hands-on with deep learning. A mind map is attached below as a memory aid.
