当前位置:网站首页>[secretly kill little buddy pytorch20 days] - [Day2] - [example of picture data modeling process]
[secretly kill little buddy pytorch20 days] - [Day2] - [example of picture data modeling process]
2022-07-05 22:30:00 【aJupyter】
System tutorial 20 Heaven takes Pytorch
Recently with Brother Zhong 、 Huige Do a little punch in ,20 God pytorch, This is the next day . Welcome to one button and three links .
List of articles
import os
import datetime
# Print time
def printbar():
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("\n"+"=========="*8 + "%s"%nowtime)
#mac On the system pytorch and matplotlib stay jupyter You need to change the environment variable when running in
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
!pip install prettytable
!pip install torchkeras
One 、 Prepare the data
cifar2 The data set is cifar10 A subset of a dataset , Only the first two categories are included airplane and automobile.
The training set has airplane and automobile Each picture 5000 Zhang , The test set has airplane and automobile Each picture 1000 Zhang .
cifar2 The goal of the mission is to train a model for the aircraft airplane And motor vehicles automobile Classify two kinds of pictures .
stay Pytorch There are usually two ways to build a picture data pipeline in .
The first is to use torchvision Medium datasets.ImageFolder To read the picture and then use DataLoader To load in parallel .
The second is through inheritance torch.utils.data.Dataset Implement user-defined read logic, and then use DataLoader To load in parallel .
The second method is a general method of reading user-defined data sets , You can read the picture data set , You can also read text data sets .
In this article, we introduce the first method .
import torch
from torch import nn
from torch.utils.data import Dataset,DataLoader
from torchvision import transforms,datasets
transform_train = transforms.Compose(
[transforms.ToTensor()])
transform_valid = transforms.Compose(
[transforms.ToTensor()])
ds_train = datasets.ImageFolder("/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train",
transform = transform_train,target_transform= lambda t:torch.tensor([t]).float())
ds_valid = datasets.ImageFolder("/home/mw/input/data6936/eat_pytorch_data/data/cifar2/test",
transform = transform_train,target_transform= lambda t:torch.tensor([t]).float())
print(ds_train.class_to_idx.values())
print(ds_train.classes)
print(ds_train.imgs)
''' Output : dict_values([0, 1]) ['0_airplane', '1_automobile'] [('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/0.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/1.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/10.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/100.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/1000.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/1001.jpg', 0)] '''
tips:
ImageFolder Is a universal data loader , It requires us to organize the training of data sets in the following format 、 Verify or test pictures .
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dataset=torchvision.datasets.ImageFolder(
root, transform=None,
target_transform=None,
loader=<function default_loader>,
is_valid_file=None)
Parameters, :
root: The root directory of image storage , That is, the upper level directory of the directory where each category folder is located .
transform: The operation of preprocessing pictures ( function ), The original image as input , Return a converted image .
**target_transform:** The operation of preprocessing picture categories , Input is target, Output to its conversion . If you don't pass this parameter , to target No conversion , The order index returned 0,1, 2…
loader: Indicates how the dataset is loaded , Usually, the default loading method is OK .
is_valid_file: Function to get the path of the image file and check whether the file is a valid file ( Used to check for damaged files )
Back to dataset All have the following three properties :
- self.classes: Use one list Save category name
- self.class_to_idx: Dictionary type 、 The index corresponding to the category , And return without any conversion target Corresponding
- self.imgs: preservation (img-path, class) tuple Of list
print(ds_train[0][1])
''' Output : tensor([0.]) '''
dl_train = DataLoader(ds_train,batch_size = 50,shuffle = True,num_workers=3)
dl_valid = DataLoader(ds_valid,batch_size = 50,shuffle = True,num_workers=3)
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
# Check out some samples
from matplotlib import pyplot as plt
plt.figure(figsize=(8,8))
for i in range(9):
img,label = ds_train[i]
img = img.permute(1,2,0)
ax=plt.subplot(3,3,i+1)
ax.imshow(img.numpy())
ax.set_title("label = %d"%label.item())
ax.set_xticks([])
ax.set_yticks([])
plt.show()
tips:
img = img.permute(1,2,0) # Transforming dimensions
Original image size 33232 Turn to 32323
ax=plt.subplot(3,3,i+1) # Cutting subgraph
ax.imshow(img.numpy()) # visualization
# Pytorch The default order of pictures is Batch,Channel,Width,Height
for x,y in dl_train:
print(x.shape,y.shape)
break
''' Output : torch.Size([50, 3, 32, 32]) torch.Size([50, 1]) '''
Two 、 Defining models
Use Pytorch There are usually three ways to build models :
- Use nn.Sequential Build models in a hierarchical order
- Inherit nn.Module Base classes build custom models
- Inherit nn.Module Base classes build models and assist in applying model containers (nn.Sequential,nn.ModuleList,nn.ModuleDict) encapsulate .
Choose to inherit here nn.Module Base classes build custom models .
# test AdaptiveMaxPool2d The effect of
pool = nn.AdaptiveMaxPool2d((1,1))
t = torch.randn(10,8,32,32)
pool(t).shape
''' Output : torch.Size([10, 8, 1, 1]) '''
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3)
self.pool = nn.MaxPool2d(kernel_size = 2,stride = 2)
self.conv2 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5)
self.dropout = nn.Dropout2d(p = 0.1)
self.adaptive_pool = nn.AdaptiveMaxPool2d((1,1))
self.flatten = nn.Flatten()
self.linear1 = nn.Linear(64,32)
self.relu = nn.ReLU()
self.linear2 = nn.Linear(32,1)
self.sigmoid = nn.Sigmoid()
def forward(self,x):
x = self.conv1(x)
x = self.pool(x)
x = self.conv2(x)
x = self.pool(x)
x = self.dropout(x)
x = self.adaptive_pool(x)
x = self.flatten(x)
x = self.linear1(x)
x = self.relu(x)
x = self.linear2(x)
y = self.sigmoid(x)
return y
net = Net()
print(net)
''' Output : Net( (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1)) (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1)) (dropout): Dropout2d(p=0.1, inplace=False) (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1)) (flatten): Flatten(start_dim=1, end_dim=-1) (linear1): Linear(in_features=64, out_features=32, bias=True) (relu): ReLU() (linear2): Linear(in_features=32, out_features=1, bias=True) (sigmoid): Sigmoid() ) '''
import torchkeras
torchkeras.summary(net,input_shape= (3,32,32))
''' ---------------------------------------------------------------- Layer (type) Output Shape Param # ================================================================ Conv2d-1 [-1, 32, 30, 30] 896 MaxPool2d-2 [-1, 32, 15, 15] 0 Conv2d-3 [-1, 64, 11, 11] 51,264 MaxPool2d-4 [-1, 64, 5, 5] 0 Dropout2d-5 [-1, 64, 5, 5] 0 AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0 Flatten-7 [-1, 64] 0 Linear-8 [-1, 32] 2,080 ReLU-9 [-1, 32] 0 Linear-10 [-1, 1] 33 Sigmoid-11 [-1, 1] 0 ================================================================ Total params: 54,273 Trainable params: 54,273 Non-trainable params: 0 ---------------------------------------------------------------- Input size (MB): 0.011719 Forward/backward pass size (MB): 0.359634 Params size (MB): 0.207035 Estimated Total Size (MB): 0.578388 ---------------------------------------------------------------- '''
3、 ... and 、 Training models
Pytorch It usually requires the user to write a custom training cycle , The code style of the training cycle varies from person to person .
Yes 3 Class typical training cycle code style : Script form training cycle , Function form training cycle , Class form training cycle .
Here is a more general Function form training cycle .
import pandas as pd
from sklearn.metrics import roc_auc_score
model = net
model.optimizer = torch.optim.SGD(model.parameters(),lr = 0.01)
model.loss_func = torch.nn.BCELoss()
model.metric_func = lambda y_pred,y_true: roc_auc_score(y_true.data.numpy(),y_pred.data.numpy())
model.metric_name = "auc"
tips:
from sklearn.metrics import roc_auc_score
roc_auc_score
def train_step(model,features,labels):
# Training mode ,dropout The layer acts
model.train()
# Gradient clear
model.optimizer.zero_grad()
# Forward propagation for loss
predictions = model(features)
loss = model.loss_func(predictions,labels)
metric = model.metric_func(predictions,labels)
# Back propagation gradient
loss.backward()
model.optimizer.step()
return loss.item(),metric.item()
def valid_step(model,features,labels):
# Prediction model ,dropout The layer does not work
model.eval()
# Turn off gradient computation
with torch.no_grad():
predictions = model(features)
loss = model.loss_func(predictions,labels)
metric = model.metric_func(predictions,labels)
return loss.item(), metric.item()
# test train_step effect
features,labels = next(iter(dl_train))
train_step(model,features,labels)
''' Output : (0.6954520344734192, 0.500805152979066) '''
def train_model(model,epochs,dl_train,dl_valid,log_step_freq):
metric_name = model.metric_name
dfhistory = pd.DataFrame(columns = ["epoch","loss",metric_name,"val_loss","val_"+metric_name])
print("Start Training...")
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("=========="*8 + "%s"%nowtime)
for epoch in range(1,epochs+1):
# 1, Training cycle -------------------------------------------------
loss_sum = 0.0
metric_sum = 0.0
step = 1
for step, (features,labels) in enumerate(dl_train, 1):
loss,metric = train_step(model,features,labels)
# Print batch The level of log
loss_sum += loss
metric_sum += metric
if step%log_step_freq == 0:
print(("[step = %d] loss: %.3f, "+metric_name+": %.3f") %
(step, loss_sum/step, metric_sum/step))
# 2, Verification cycle -------------------------------------------------
val_loss_sum = 0.0
val_metric_sum = 0.0
val_step = 1
for val_step, (features,labels) in enumerate(dl_valid, 1):
val_loss,val_metric = valid_step(model,features,labels)
val_loss_sum += val_loss
val_metric_sum += val_metric
# 3, Log -------------------------------------------------
info = (epoch, loss_sum/step, metric_sum/step,
val_loss_sum/val_step, val_metric_sum/val_step)
dfhistory.loc[epoch-1] = info
# Print epoch The level of log
print(("\nEPOCH = %d, loss = %.3f,"+ metric_name + \
" = %.3f, val_loss = %.3f, "+"val_"+ metric_name+" = %.3f")
%info)
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("\n"+"=========="*8 + "%s"%nowtime)
print('Finished Training...')
return dfhistory
epochs = 20
dfhistory = train_model(model,epochs,dl_train,dl_valid,log_step_freq = 50)
Four 、 Evaluation model
dfhistory
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
def plot_metric(dfhistory, metric):
train_metrics = dfhistory[metric]
val_metrics = dfhistory['val_'+metric]
epochs = range(1, len(train_metrics) + 1)
plt.plot(epochs, train_metrics, 'bo--')
plt.plot(epochs, val_metrics, 'ro-')
plt.title('Training and validation '+ metric)
plt.xlabel("Epochs")
plt.ylabel(metric)
plt.legend(["train_"+metric, 'val_'+metric])
plt.show()
plot_metric(dfhistory,"loss")
plot_metric(dfhistory,"auc")
5、 ... and 、 Using the model
def predict(model,dl):
model.eval()
with torch.no_grad():
result = torch.cat([model.forward(t[0]) for t in dl])
return(result.data)
# Prediction probability
y_pred_probs = predict(model,dl_valid)
y_pred_probs
''' tensor([[0.0342], [0.9139], [0.5341], ..., [0.7885], [0.9491], [0.5726]]) '''
# Forecast category
y_pred = torch.where(y_pred_probs>0.5,
torch.ones_like(y_pred_probs),torch.zeros_like(y_pred_probs))
y_pred
''' Output : tensor([[0.], [1.], [0.], ..., [0.], [1.], [1.]]) '''
6、 ... and 、 Save the model
It is recommended to save the parameters Pytorch Model .
print(model.state_dict().keys())
''' Output : odict_keys(['conv1.weight', 'conv1.bias', 'conv2.weight', 'conv2.bias', 'linear1.weight', 'linear1.bias', 'linear2.weight', 'linear2.bias']) '''
# Save model parameters
torch.save(model.state_dict(), "./data/model_parameter.pkl")
net_clone = Net()
net_clone.load_state_dict(torch.load("./data/model_parameter.pkl"))
predict(net_clone,dl_valid)
''' Output : tensor([[0.8983], [0.5431], [0.9716], ..., [0.0663], [0.1317], [0.4519]]) '''
summary
- datasets.ImageFolder
- from sklearn.metrics import roc_auc_score
- nn.AdaptiveMaxPool2d((1,1))
边栏推荐
- Metaverse Ape猿界应邀出席2022·粤港澳大湾区元宇宙和web3.0主题峰会,分享猿界在Web3时代从技术到应用的文明进化历程
- Promql demo service
- Talking about MySQL index
- How to develop and introduce applet plug-ins
- The code generator has deoptimised the styling of xx/typescript.js as it exceeds the max of 500kb
- Draw a red lantern with MATLAB
- What changes has Web3 brought to the Internet?
- Three "factions" in the metauniverse
- 我把开源项目alinesno-cloud-service关闭了
- FBO and RBO disappeared in webgpu
猜你喜欢
Damn, window in ie open()
Technology cloud report: how many hurdles does the computing power network need to cross?
科技云报道荣膺全球云计算大会“云鼎奖”2013-2022十周年特别贡献奖
The statistics of leetcode simple question is the public string that has appeared once
[untitled]
鏈錶之雙指針(快慢指針,先後指針,首尾指針)
How can Bluetooth in notebook computer be used to connect headphones
Postman核心功能解析-参数化和测试报告
南京:全面启用商品房买卖电子合同
分布式解决方案之TCC
随机推荐
509. Fibonacci Number. Sol
Oracle triggers
U盘的文件无法删除文件怎么办?Win11无法删除U盘文件解决教程
Pl/sql basic syntax
Lesson 1: serpentine matrix
Thinkphp5.1 cross domain problem solving
thinkphp5.1跨域问题解决
What about data leakage? " Watson k'7 moves to eliminate security threats
第一讲:蛇形矩阵
Depth first DFS and breadth first BFS -- traversing adjacency tables
Some tutorials install the database on ubantu so as not to occupy computer memory?
1.3 years of work experience, double non naked resignation agency face-to-face experience [already employed]
The difference between MVVM and MVC
Character conversion PTA
Text组件新增内容通过tag_config设置前景色、背景色
Three "factions" in the metauniverse
How can easycvr cluster deployment solve the massive video access and concurrency requirements in the project?
How can Bluetooth in notebook computer be used to connect headphones
Request preview display of binary data and Base64 format data
二叉树(三)——堆排序优化、TOP K问题