Data loading and preprocessing
2022-07-01 04:46:00 【CyrusMay】
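PyTorch (4): Data preprocessing
1. Reading data with torch.utils.data.Dataset
- Read the data by subclassing Dataset and overriding __len__ and __getitem__
The file layout is one subfolder per class under the root directory (animal/cat, animal/dog, animal/rabbit).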
import torch
from torch.utils.data import Dataset,DataLoader
import os
import csv
import glob
import random
from PIL import Image
from torchvision import transforms
import visdom
from torchvision.datasets import ImageFolder
class AnimalData(Dataset):
    def __init__(self, root, resize=[28, 28], mode="train"):
        super(AnimalData, self).__init__()
        self.root = root
        self.resize = resize  # [h, w]
        # Map each class name (the subfolder name) to an integer label
        self.class2label = {}
        for name in sorted(os.listdir(self.root)):
            if not os.path.isdir(os.path.join(self.root, name)):
                continue
            self.class2label[name] = len(self.class2label)
        print(self.class2label)
        # Load the image paths and labels from the CSV file
        images, labels = self.load_csv("animal.csv")
        # Keep the slice that matches the requested mode (60% / 20% / 20%)
        if mode == "train":
            self.images = images[:int(0.6 * len(images))]
            self.labels = labels[:int(0.6 * len(images))]
        elif mode == "val":
            self.images = images[int(0.6 * len(images)):int(0.8 * len(images))]
            self.labels = labels[int(0.6 * len(images)):int(0.8 * len(images))]
        elif mode == "test":
            self.images = images[int(0.8 * len(images)):]
            self.labels = labels[int(0.8 * len(images)):]
        else:
            raise ValueError("mode must be 'train', 'val' or 'test'")
    def load_csv(self, file_name):
        if not os.path.exists(file_name):
            images = []
            for name in self.class2label.keys():
                # glob.glob() returns the full paths of the files matching the pattern
                images += glob.glob(os.path.join(self.root, name, "*.png"))
                images += glob.glob(os.path.join(self.root, name, "*.jpg"))
            # Shuffle the data
            random.shuffle(images)
            # Write the paths and labels to a CSV file so the next run can read them directly
            with open(file_name, "w", encoding="utf-8", newline="") as f:
                writer = csv.writer(f)
                for path in images:
                    # The class name is the name of the image's parent folder
                    name = os.path.basename(os.path.dirname(path))
                    label = self.class2label[name]
                    writer.writerow([path, label])
        # Load the data from the CSV file
        with open(file_name, "r", encoding="utf-8", newline="") as f:
            reader = csv.reader(f)
            images = []
            labels = []
            for line in reader:
                images.append(line[0])
                labels.append(int(line[1]))
        return images, labels
    # Override __len__ to return the number of samples
    def __len__(self):
        return len(self.images)

    # De-normalize an image (or a batch of images) for visualization
    def de_normalize(self, x_hat):
        mean = torch.tensor([0.485, 0.456, 0.406]).unsqueeze(1).unsqueeze(1)
        std = torch.tensor([0.229, 0.224, 0.225]).unsqueeze(1).unsqueeze(1)
        x = x_hat * std + mean
        return x
    # Override __getitem__ to return the image as a Tensor together with its label
    def __getitem__(self, idx):
        label = torch.tensor(self.labels[idx])
        tf = transforms.Compose([
            lambda x: Image.open(x).convert("RGB"),  # Read the image
            transforms.Resize([int(self.resize[0] * 1.25), int(self.resize[1] * 1.25)]),
            transforms.RandomRotation(15),  # Data augmentation
            transforms.CenterCrop(self.resize),  # Center crop
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
        image = tf(self.images[idx])
        return image, label
if __name__ == '__main__':
    resize = [128, 100]
    db = AnimalData(root="animal", resize=resize)
Output:
{'cat': 0, 'dog': 1, 'rabbit': 2}
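The 60% / 20% / 20% split in __init__ can be checked in isolation. The sketch below uses a hypothetical helper, split_indices (not part of the original class), and an assumed sample count of 103. It applies the same cut points to a plain index range and confirms that the three subsets are disjoint and together cover every sample. The split stays consistent across the three modes only because the shuffled order is cached in animal.csv and reused on later runs.

```python
def split_indices(n, mode):
    """Return the index range kept for `mode`, using the 0.6 / 0.8 cut points."""
    train_end = int(0.6 * n)
    val_end = int(0.8 * n)
    if mode == "train":
        return range(0, train_end)
    if mode == "val":
        return range(train_end, val_end)
    if mode == "test":
        return range(val_end, n)
    raise ValueError(f"unknown mode: {mode!r}")


n = 103  # hypothetical number of images
parts = {m: list(split_indices(n, m)) for m in ("train", "val", "test")}
sizes = {m: len(p) for m, p in parts.items()}
print(sizes)  # {'train': 61, 'val': 21, 'test': 21}

# Every index appears exactly once across the three subsets
all_indices = parts["train"] + parts["val"] + parts["test"]
print(all_indices == list(range(n)))  # True
```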
2. Loading data with torch.utils.data.DataLoader
if __name__ == '__main__':
    resize = [128, 100]
    db = AnimalData(root="animal", resize=resize)
    it_db = iter(db)
    vis = visdom.Visdom()
    image, label = next(it_db)
    vis.image(db.de_normalize(image), win="iter_image", opts=dict(title="iter_image"))
    # Use a DataLoader to read the data in batches.
    # num_workers sets the number of worker subprocesses used to load the data.
    loader = DataLoader(dataset=db, batch_size=16, shuffle=True, num_workers=8)
    for x, y in loader:
        vis.images(db.de_normalize(x), win="batch_imags", nrow=4, opts=dict(title="batch"))
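With batch_size=16, the number of batches per epoch depends on the dataset length. The following is a rough sketch of that arithmetic in plain Python, not a call into DataLoader. The helper name batch_sizes and the sample count of 100 are illustrative assumptions. With the default drop_last=False the loader keeps a smaller final batch, whereas drop_last=True discards it.

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the size of each batch produced in one pass over the data."""
    full, rest = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)  # the final, smaller batch
    return sizes


sizes = batch_sizes(100, 16)
print(len(sizes), sizes[-1])  # 7 4
print(len(batch_sizes(100, 16, drop_last=True)))  # 6
```

The leading dimension of x in the loop above is therefore not always 16; the last batch can be smaller.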
3. Reading data quickly with torchvision.datasets.ImageFolder
# ImageFolder collects the image paths and assigns the labels in one step
resize = [128, 100]  # [h, w], as in the earlier example
tf = transforms.Compose([
    transforms.Resize([int(resize[0] * 1.25), int(resize[1] * 1.25)]),
    transforms.RandomRotation(15),  # Data augmentation
    transforms.CenterCrop(resize),  # Center crop
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
db = ImageFolder(root="animal", transform=tf)
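ImageFolder derives the labels from the sorted names of the subfolders under root and exposes the mapping as class_to_idx. This is the same rule AnimalData uses to build class2label. The sketch below reproduces the rule with the standard library only. The temporary directory and the helper name folder_labels are assumptions for illustration, and the code does not import torchvision.

```python
import os
import tempfile


def folder_labels(root):
    """Map each subfolder name under `root` to an index, in sorted order."""
    classes = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    for name in ("rabbit", "cat", "dog"):
        os.mkdir(os.path.join(root, name))
    # A stray file at the top level is not treated as a class
    open(os.path.join(root, "animal.csv"), "w").close()
    mapping = folder_labels(root)

print(mapping)  # {'cat': 0, 'dog': 1, 'rabbit': 2}
```

Unlike AnimalData, ImageFolder does not split the data into train, val, and test subsets. That step has to be done separately, for example with torch.utils.data.random_split.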
by CyrusMay 2022 06 30
How many twists and turns do you have in your life
To go to the other side of happiness
Can live this life without regret
Ordinary but not plain
—————— May day ( Qingkong future )——————