[pytorch learning notes] datasets and dataloaders
2022-07-03 15:00:00 【liiiiiiiiiiiiike】
Why use separate Datasets and DataLoaders
PyTorch aims to keep dataset code decoupled from model-training code for better readability and modularity. It provides two data primitives, torch.utils.data.Dataset and torch.utils.data.DataLoader, which let you work with both PyTorch's built-in datasets and your own data.
Loading a dataset
Here is an example of loading the Fashion-MNIST dataset from TorchVision:
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
training_data = datasets.FashionMNIST(
    root='data',  # root is the path where the train/test data are stored
    train=True,  # True loads the training set, False loads the test set
    download=True,  # download the data from the internet to root if it is not already there
    transform=ToTensor()  # convert each image to a float tensor
)
test_data = datasets.FashionMNIST(
    root='data',
    train=False,
    download=True,
    transform=ToTensor()
)
Iterating and visualizing the dataset
We can index a Dataset manually like a list, e.g. training_data[index]. The following code uses matplotlib to visualize some random samples from the training data:
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
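Indexing a dataset with training_data[i] calls its __getitem__ method, which returns an (image, label) tuple, and len(training_data) calls __len__. The sketch below shows the same indexing pattern on a small in-memory TensorDataset, assuming random tensors as a stand-in for Fashion-MNIST so it runs offline. One difference to keep in mind: FashionMNIST returns the label as a Python int (which is why labels_map[label] works above), whereas TensorDataset returns a 0-d tensor, hence the .item() call.

```python
import torch
from torch.utils.data import TensorDataset

# Stand-in data (not Fashion-MNIST): 10 single-channel 28x28 "images"
# whose label simply equals their index.
images = torch.rand(10, 1, 28, 28)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

print(len(dataset))  # 10, via __len__

# Same random-index pattern as the plotting loop above.
sample_idx = torch.randint(len(dataset), size=(1,)).item()
img, label = dataset[sample_idx]  # __getitem__ returns an (image, label) tuple
print(img.shape, label.item())

# squeeze() drops the size-1 channel dimension, giving a 2-D array for plt.imshow.
print(img.squeeze().shape)  # torch.Size([28, 28])
```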

Creating a custom dataset
A custom Dataset class must implement three functions: __init__, __len__, and __getitem__. In the implementation below, the Fashion-MNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file, annotations_file. The code is as follows:
import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        # Initialization
        self.img_labels = pd.read_csv(annotations_file)  # image file names (column 0) and labels (column 1)
        self.img_dir = img_dir  # directory containing the images
        self.transform = transform  # optional transform applied to each image
        self.target_transform = target_transform  # optional transform applied to each label

    def __len__(self):
        # Return the number of samples in the dataset
        return len(self.img_labels)

    def __getitem__(self, idx):
        # Load a single sample, apply the transforms, and return the (image, label) pair
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)  # read the image file into a uint8 tensor
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
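CustomImageDataset expects annotations_file to hold image file names in its first column and labels in its second, because __getitem__ reads iloc[idx, 0] and iloc[idx, 1]. Note that pd.read_csv treats the first line of the file as a header by default, so the CSV is assumed to start with a header row; pass header=None if it does not. To see the three required methods without any files on disk, here is a minimal sketch of a hypothetical InMemoryImageDataset that keeps random tensors in memory. The transform (scaling pixels to [0, 1]) and target_transform (one-hot encoding 10 classes) are illustrative choices, not part of the original code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset


class InMemoryImageDataset(Dataset):
    """Hypothetical dataset with the same three methods as CustomImageDataset,
    but reading samples from in-memory tensors instead of image files."""

    def __init__(self, images, labels, transform=None, target_transform=None):
        # Store the data and the optional transforms
        self.images = images
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Fetch one sample, apply the transforms if given, return (image, label)
        image = self.images[idx]
        label = int(self.labels[idx])
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label


# Random uint8 "images" (not real data) and their class labels.
images = torch.randint(0, 256, (5, 1, 28, 28), dtype=torch.uint8)
labels = torch.tensor([0, 3, 9, 1, 3])

ds = InMemoryImageDataset(
    images,
    labels,
    transform=lambda x: x.float() / 255,  # uint8 [0, 255] -> float [0, 1]
    target_transform=lambda y: F.one_hot(torch.tensor(y), num_classes=10).float(),
)

image, label = ds[2]
print(len(ds), image.dtype, label)  # label is a one-hot vector with a 1 at index 9
```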
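Preparing data for training with DataLoaders
A Dataset retrieves features and labels one sample at a time. When training a model, we usually want to pass samples in minibatches, reshuffle the data at every epoch to reduce overfitting, and use Python's multiprocessing to speed up data retrieval. DataLoader is an iterable that handles all of this behind a simple API: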
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
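To make batch_size and shuffle concrete, here is a small sketch on a synthetic 100-sample TensorDataset (an assumption; the labels are just the sample indices, so the visiting order can be tracked). It shows that the last batch of an epoch may be smaller, that shuffle=True draws a new order every epoch while still visiting each sample once, and that drop_last=True discards the incomplete final batch. The seeded generator argument is used here only to make the demo reproducible.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset (not Fashion-MNIST): 100 samples whose label is their index.
dataset = TensorDataset(torch.rand(100, 1, 28, 28), torch.arange(100))

loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    generator=torch.Generator().manual_seed(0))
print(len(loader))  # 2 batches per epoch: ceil(100 / 64)

epoch_orders = []
for epoch in range(2):
    batch_sizes, seen = [], []
    for x, y in loader:
        batch_sizes.append(x.shape[0])
        seen.append(y)
    epoch_orders.append(torch.cat(seen))  # order in which samples were visited
    print(epoch, batch_sizes)  # [64, 36] every epoch: the last batch is smaller

# Different order per epoch, but every sample is still visited exactly once.
print(epoch_orders[0][:5].tolist(), epoch_orders[1][:5].tolist())

# drop_last=True discards the incomplete final batch.
strict_loader = DataLoader(dataset, batch_size=64, drop_last=True)
print(len(strict_loader))  # 1

# Setting num_workers > 0 would load batches in parallel worker processes (not used here).
```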
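Iterating through the DataLoader
Once the dataset has been loaded into a DataLoader, we can iterate through it as needed. Each iteration below returns a batch of train_features and train_labels, containing 64 images and 64 labels respectively (batch_size=64).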
# Display an image and its label.
train_features, train_labels = next(iter(train_dataloader))  # one batch: 64 images and 64 labels
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()  # take the first image and drop its size-1 channel dimension
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")
