当前位置:网站首页>[secretly kill little buddy pytorch20 days] - [Day2] - [example of picture data modeling process]
[secretly kill little buddy pytorch20 days] - [Day2] - [example of picture data modeling process]
2022-07-05 22:30:00 【aJupyter】
System tutorial 20 Heaven takes Pytorch
Recently with Brother Zhong 、 Huige Do a little punch in ,20 God pytorch, This is the next day . Welcome to one button and three links .
List of articles
import os
import datetime
# Print time
def printbar():
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("\n"+"=========="*8 + "%s"%nowtime)
#mac On the system pytorch and matplotlib stay jupyter You need to change the environment variable when running in
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
!pip install prettytable
!pip install torchkeras
One 、 Prepare the data
cifar2 The data set is cifar10 A subset of a dataset , Only the first two categories are included airplane and automobile.
The training set has airplane and automobile Each picture 5000 Zhang , The test set has airplane and automobile Each picture 1000 Zhang .
cifar2 The goal of the mission is to train a model for the aircraft airplane And motor vehicles automobile Classify two kinds of pictures .
stay Pytorch There are usually two ways to build a picture data pipeline in .
The first is to use torchvision Medium datasets.ImageFolder To read the picture and then use DataLoader To load in parallel .
The second is through inheritance torch.utils.data.Dataset Implement user-defined read logic, and then use DataLoader To load in parallel .
The second method is a general method of reading user-defined data sets , You can read the picture data set , You can also read text data sets .
In this article, we introduce the first method .
import torch
from torch import nn
from torch.utils.data import Dataset,DataLoader
from torchvision import transforms,datasets
transform_train = transforms.Compose(
[transforms.ToTensor()])
transform_valid = transforms.Compose(
[transforms.ToTensor()])
ds_train = datasets.ImageFolder("/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train",
transform = transform_train,target_transform= lambda t:torch.tensor([t]).float())
ds_valid = datasets.ImageFolder("/home/mw/input/data6936/eat_pytorch_data/data/cifar2/test",
transform = transform_train,target_transform= lambda t:torch.tensor([t]).float())
print(ds_train.class_to_idx.values())
print(ds_train.classes)
print(ds_train.imgs)
''' Output : dict_values([0, 1]) ['0_airplane', '1_automobile'] [('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/0.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/1.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/10.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/100.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/1000.jpg', 0), ('/home/mw/input/data6936/eat_pytorch_data/data/cifar2/train/0_airplane/1001.jpg', 0)] '''
tips:
ImageFolder Is a universal data loader , It requires us to organize the training of data sets in the following format 、 Verify or test pictures .
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dataset=torchvision.datasets.ImageFolder(
root, transform=None,
target_transform=None,
loader=<function default_loader>,
is_valid_file=None)
Parameters, :
root: The root directory of image storage , That is, the upper level directory of the directory where each category folder is located .
transform: The operation of preprocessing pictures ( function ), The original image as input , Return a converted image .
**target_transform:** The operation of preprocessing picture categories , Input is target, Output to its conversion . If you don't pass this parameter , to target No conversion , The order index returned 0,1, 2…
loader: Indicates how the dataset is loaded , Usually, the default loading method is OK .
is_valid_file: Function to get the path of the image file and check whether the file is a valid file ( Used to check for damaged files )
Back to dataset All have the following three properties :
- self.classes: Use one list Save category name
- self.class_to_idx: Dictionary type 、 The index corresponding to the category , And return without any conversion target Corresponding
- self.imgs: preservation (img-path, class) tuple Of list
print(ds_train[0][1])
''' Output : tensor([0.]) '''
dl_train = DataLoader(ds_train,batch_size = 50,shuffle = True,num_workers=3)
dl_valid = DataLoader(ds_valid,batch_size = 50,shuffle = True,num_workers=3)
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
# Check out some samples
from matplotlib import pyplot as plt
plt.figure(figsize=(8,8))
for i in range(9):
img,label = ds_train[i]
img = img.permute(1,2,0)
ax=plt.subplot(3,3,i+1)
ax.imshow(img.numpy())
ax.set_title("label = %d"%label.item())
ax.set_xticks([])
ax.set_yticks([])
plt.show()
tips:
img = img.permute(1,2,0) # Transforming dimensions
Original image size 33232 Turn to 32323
ax=plt.subplot(3,3,i+1) # Cutting subgraph
ax.imshow(img.numpy()) # visualization
# Pytorch The default order of pictures is Batch,Channel,Width,Height
for x,y in dl_train:
print(x.shape,y.shape)
break
''' Output : torch.Size([50, 3, 32, 32]) torch.Size([50, 1]) '''
Two 、 Defining models
Use Pytorch There are usually three ways to build models :
- Use nn.Sequential Build models in a hierarchical order
- Inherit nn.Module Base classes build custom models
- Inherit nn.Module Base classes build models and assist in applying model containers (nn.Sequential,nn.ModuleList,nn.ModuleDict) encapsulate .
Choose to inherit here nn.Module Base classes build custom models .
# test AdaptiveMaxPool2d The effect of
pool = nn.AdaptiveMaxPool2d((1,1))
t = torch.randn(10,8,32,32)
pool(t).shape
''' Output : torch.Size([10, 8, 1, 1]) '''
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3)
self.pool = nn.MaxPool2d(kernel_size = 2,stride = 2)
self.conv2 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5)
self.dropout = nn.Dropout2d(p = 0.1)
self.adaptive_pool = nn.AdaptiveMaxPool2d((1,1))
self.flatten = nn.Flatten()
self.linear1 = nn.Linear(64,32)
self.relu = nn.ReLU()
self.linear2 = nn.Linear(32,1)
self.sigmoid = nn.Sigmoid()
def forward(self,x):
x = self.conv1(x)
x = self.pool(x)
x = self.conv2(x)
x = self.pool(x)
x = self.dropout(x)
x = self.adaptive_pool(x)
x = self.flatten(x)
x = self.linear1(x)
x = self.relu(x)
x = self.linear2(x)
y = self.sigmoid(x)
return y
net = Net()
print(net)
''' Output : Net( (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1)) (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1)) (dropout): Dropout2d(p=0.1, inplace=False) (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1)) (flatten): Flatten(start_dim=1, end_dim=-1) (linear1): Linear(in_features=64, out_features=32, bias=True) (relu): ReLU() (linear2): Linear(in_features=32, out_features=1, bias=True) (sigmoid): Sigmoid() ) '''
import torchkeras
torchkeras.summary(net,input_shape= (3,32,32))
''' ---------------------------------------------------------------- Layer (type) Output Shape Param # ================================================================ Conv2d-1 [-1, 32, 30, 30] 896 MaxPool2d-2 [-1, 32, 15, 15] 0 Conv2d-3 [-1, 64, 11, 11] 51,264 MaxPool2d-4 [-1, 64, 5, 5] 0 Dropout2d-5 [-1, 64, 5, 5] 0 AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0 Flatten-7 [-1, 64] 0 Linear-8 [-1, 32] 2,080 ReLU-9 [-1, 32] 0 Linear-10 [-1, 1] 33 Sigmoid-11 [-1, 1] 0 ================================================================ Total params: 54,273 Trainable params: 54,273 Non-trainable params: 0 ---------------------------------------------------------------- Input size (MB): 0.011719 Forward/backward pass size (MB): 0.359634 Params size (MB): 0.207035 Estimated Total Size (MB): 0.578388 ---------------------------------------------------------------- '''
3、 ... and 、 Training models
Pytorch It usually requires the user to write a custom training cycle , The code style of the training cycle varies from person to person .
Yes 3 Class typical training cycle code style : Script form training cycle , Function form training cycle , Class form training cycle .
Here is a more general Function form training cycle .
import pandas as pd
from sklearn.metrics import roc_auc_score
model = net
model.optimizer = torch.optim.SGD(model.parameters(),lr = 0.01)
model.loss_func = torch.nn.BCELoss()
model.metric_func = lambda y_pred,y_true: roc_auc_score(y_true.data.numpy(),y_pred.data.numpy())
model.metric_name = "auc"
tips:
from sklearn.metrics import roc_auc_score
roc_auc_score
def train_step(model,features,labels):
# Training mode ,dropout The layer acts
model.train()
# Gradient clear
model.optimizer.zero_grad()
# Forward propagation for loss
predictions = model(features)
loss = model.loss_func(predictions,labels)
metric = model.metric_func(predictions,labels)
# Back propagation gradient
loss.backward()
model.optimizer.step()
return loss.item(),metric.item()
def valid_step(model,features,labels):
# Prediction model ,dropout The layer does not work
model.eval()
# Turn off gradient computation
with torch.no_grad():
predictions = model(features)
loss = model.loss_func(predictions,labels)
metric = model.metric_func(predictions,labels)
return loss.item(), metric.item()
# test train_step effect
features,labels = next(iter(dl_train))
train_step(model,features,labels)
''' Output : (0.6954520344734192, 0.500805152979066) '''
def train_model(model,epochs,dl_train,dl_valid,log_step_freq):
metric_name = model.metric_name
dfhistory = pd.DataFrame(columns = ["epoch","loss",metric_name,"val_loss","val_"+metric_name])
print("Start Training...")
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("=========="*8 + "%s"%nowtime)
for epoch in range(1,epochs+1):
# 1, Training cycle -------------------------------------------------
loss_sum = 0.0
metric_sum = 0.0
step = 1
for step, (features,labels) in enumerate(dl_train, 1):
loss,metric = train_step(model,features,labels)
# Print batch The level of log
loss_sum += loss
metric_sum += metric
if step%log_step_freq == 0:
print(("[step = %d] loss: %.3f, "+metric_name+": %.3f") %
(step, loss_sum/step, metric_sum/step))
# 2, Verification cycle -------------------------------------------------
val_loss_sum = 0.0
val_metric_sum = 0.0
val_step = 1
for val_step, (features,labels) in enumerate(dl_valid, 1):
val_loss,val_metric = valid_step(model,features,labels)
val_loss_sum += val_loss
val_metric_sum += val_metric
# 3, Log -------------------------------------------------
info = (epoch, loss_sum/step, metric_sum/step,
val_loss_sum/val_step, val_metric_sum/val_step)
dfhistory.loc[epoch-1] = info
# Print epoch The level of log
print(("\nEPOCH = %d, loss = %.3f,"+ metric_name + \
" = %.3f, val_loss = %.3f, "+"val_"+ metric_name+" = %.3f")
%info)
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("\n"+"=========="*8 + "%s"%nowtime)
print('Finished Training...')
return dfhistory
epochs = 20
dfhistory = train_model(model,epochs,dl_train,dl_valid,log_step_freq = 50)
Four 、 Evaluation model
dfhistory
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
def plot_metric(dfhistory, metric):
train_metrics = dfhistory[metric]
val_metrics = dfhistory['val_'+metric]
epochs = range(1, len(train_metrics) + 1)
plt.plot(epochs, train_metrics, 'bo--')
plt.plot(epochs, val_metrics, 'ro-')
plt.title('Training and validation '+ metric)
plt.xlabel("Epochs")
plt.ylabel(metric)
plt.legend(["train_"+metric, 'val_'+metric])
plt.show()
plot_metric(dfhistory,"loss")
plot_metric(dfhistory,"auc")
5、 ... and 、 Using the model
def predict(model,dl):
model.eval()
with torch.no_grad():
result = torch.cat([model.forward(t[0]) for t in dl])
return(result.data)
# Prediction probability
y_pred_probs = predict(model,dl_valid)
y_pred_probs
''' tensor([[0.0342], [0.9139], [0.5341], ..., [0.7885], [0.9491], [0.5726]]) '''
# Forecast category
y_pred = torch.where(y_pred_probs>0.5,
torch.ones_like(y_pred_probs),torch.zeros_like(y_pred_probs))
y_pred
''' Output : tensor([[0.], [1.], [0.], ..., [0.], [1.], [1.]]) '''
6、 ... and 、 Save the model
It is recommended to save the parameters Pytorch Model .
print(model.state_dict().keys())
''' Output : odict_keys(['conv1.weight', 'conv1.bias', 'conv2.weight', 'conv2.bias', 'linear1.weight', 'linear1.bias', 'linear2.weight', 'linear2.bias']) '''
# Save model parameters
torch.save(model.state_dict(), "./data/model_parameter.pkl")
net_clone = Net()
net_clone.load_state_dict(torch.load("./data/model_parameter.pkl"))
predict(net_clone,dl_valid)
''' Output : tensor([[0.8983], [0.5431], [0.9716], ..., [0.0663], [0.1317], [0.4519]]) '''
summary
- datasets.ImageFolder
- from sklearn.metrics import roc_auc_score
- nn.AdaptiveMaxPool2d((1,1))
边栏推荐
- Learning of mall permission module
- Postman核心功能解析-参数化和测试报告
- Lesson 1: serpentine matrix
- 50. Pow(x, n). O(logN) Sol
- Binary tree (III) -- heap sort optimization, top k problem
- Three "factions" in the metauniverse
- Oracle hint understanding
- 分布式解决方案之TCC
- Understand the basic concept of datastore in Android kotlin and why SharedPreferences should be stopped in Android
- 链表之双指针(快慢指针,先后指针,首尾指针)
猜你喜欢
Pl/sql basic case
Business introduction of Zhengda international futures company
Metasploit(msf)利用ms17_010(永恒之蓝)出现Encoding::UndefinedConversionError问题
Web3为互联网带来了哪些改变?
A substring with a length of three and different characters in the leetcode simple question
Postman core function analysis - parameterization and test report
Interview questions for famous enterprises: Coins represent a given value
Nacos 的安装与服务的注册
Stored procedures and stored functions
科技云报道:算力网络,还需跨越几道坎?
随机推荐
A substring with a length of three and different characters in the leetcode simple question
C language - structural basis
EasyCVR集群部署如何解决项目中的海量视频接入与大并发需求?
The difference between MVVM and MVC
Oracle is sorted by creation time. If the creation time is empty, the record is placed last
MySQL服务莫名宕机的解决方案
了解 Android Kotlin 中 DataStore 的基本概念以及为什么应该停止在 Android 中使用 SharedPreferences
Sparse array [matrix]
Distance entre les points et les lignes
Oracle advanced query
boundary IoU 的计算方式
2022软件测试工程师涨薪攻略,3年如何达到30K
Golang writes the opening chapter of selenium framework
请求二进制数据和base64格式数据的预览显示
元宇宙中的三大“派系”
南京:全面启用商品房买卖电子合同
Nacos 的安装与服务的注册
Postman核心功能解析-参数化和测试报告
Hcip day 16
2022-07-05:给定一个数组,想随时查询任何范围上的最大值。 如果只是根据初始数组建立、并且以后没有修改, 那么RMQ方法比线段树方法好实现,时间复杂度O(N*logN),额外空间复杂度O(N*