[PyTorch learning notes] Datasets and DataLoaders
2022-07-03 15:00:00 【liiiiiiiiiiiiike】
Why separate Datasets and DataLoaders
PyTorch encourages keeping the dataset code separate from the model training code, for better readability and modularity. It provides two data primitives for this: torch.utils.data.DataLoader and torch.utils.data.Dataset. A Dataset stores the samples and their labels, and a DataLoader wraps an iterable around a Dataset. Both work with your own datasets as well as with PyTorch's built-in datasets.
Load a dataset
The following example loads the Fashion-MNIST dataset from torchvision:
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
training_data = datasets.FashionMNIST(
    root="data",  # directory where the training/test data is stored
    train=True,  # True selects the training split, False the test split
    download=True,  # download the data to root if it is not already there
    transform=ToTensor(),  # convert each image to a float tensor in [0, 1]
)
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
Iterating over and visualizing the dataset
A Dataset can be indexed manually like a list: training_data[index]. The following code plots nine random samples with matplotlib:
labels_map = {
0: "T-Shirt",
1: "Trouser",
2: "Pullover",
3: "Dress",
4: "Coat",
5: "Sandal",
6: "Shirt",
7: "Sneaker",
8: "Bag",
9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
Create a custom dataset
A custom Dataset class must implement three methods: __init__, __len__, and __getitem__. In the example below, the Fashion-MNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file. The code is as follows:
import os
import pandas as pd
from torchvision.io import read_image
class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        # Runs once when the dataset object is created
        self.img_labels = pd.read_csv(annotations_file)  # column 0: file name, column 1: label
        self.img_dir = img_dir  # directory that contains the image files
        self.transform = transform  # optional transform applied to each image
        self.target_transform = target_transform  # optional transform applied to each label

    def __len__(self):
        # Return the number of samples in the dataset
        return len(self.img_labels)

    def __getitem__(self, idx):
        # Load the sample at index idx, apply the transforms, and return (image, label)
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
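The class above needs image files and a CSV file on disk before it can be tried out. As a rough, PyTorch-free sketch of the same three-method protocol, the following stand-in keeps everything in memory. The names InMemoryImageDataset, ANNOTATIONS_CSV, fake_images, scale_to_unit and one_hot are hypothetical and exist only for this illustration, and the "images" are small nested lists rather than real tensors.

```python
import csv
import io

# Hypothetical annotations: file name, integer class label (no header row).
ANNOTATIONS_CSV = "tshirt1.jpg,0\ntshirt2.jpg,0\nankleboot999.jpg,9\n"


class InMemoryImageDataset:
    """Torch-free stand-in that follows the same three-method protocol."""

    def __init__(self, annotations_text, images, transform=None, target_transform=None):
        # Parse each CSV row into a (file_name, label) pair.
        rows = csv.reader(io.StringIO(annotations_text))
        self.img_labels = [(name, int(label)) for name, label in rows]
        self.images = images  # dict: file name -> 2-D list of pixel values
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.img_labels)

    def __getitem__(self, idx):
        # Look up one sample, apply the optional transforms, return (image, label).
        name, label = self.img_labels[idx]
        image = self.images[name]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label


def scale_to_unit(image):
    """Map 0-255 pixel values to the range [0, 1]."""
    return [[pixel / 255 for pixel in row] for row in image]


def one_hot(label, num_classes=10):
    """Turn an integer label into a one-hot list."""
    return [1 if i == label else 0 for i in range(num_classes)]


# Made-up 2x2 "images" keyed by file name.
fake_images = {
    "tshirt1.jpg": [[0, 255], [255, 0]],
    "tshirt2.jpg": [[255, 255], [0, 0]],
    "ankleboot999.jpg": [[0, 0], [0, 255]],
}

dataset = InMemoryImageDataset(
    ANNOTATIONS_CSV, fake_images, transform=scale_to_unit, target_transform=one_hot
)
image, label = dataset[2]
print(len(dataset), image, label)
```

The point of the sketch is that transform and target_transform are just callables applied inside __getitem__, so the dataset itself stays unaware of what preprocessing is used.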
Prepare data for training with DataLoaders
A Dataset returns one sample at a time. When training a model, we usually want to pass samples in mini-batches, reshuffle the data at every epoch to reduce overfitting, and use Python's multiprocessing to speed up data retrieval. DataLoader is an iterable that handles all of this behind a simple API:
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
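To make the batching and shuffling behaviour concrete, here is a simplified, PyTorch-free sketch of what a loader does on each pass over the data. The function iterate_minibatches and the toy dataset are hypothetical and written only for this illustration. A real DataLoader additionally collates the samples of a batch into tensors and can fetch them in worker processes, which this sketch leaves out.

```python
import math
import random


def iterate_minibatches(dataset, batch_size, shuffle=True, seed=None):
    """Yield lists of samples, batch_size at a time (the last one may be smaller)."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A fresh permutation of the indices for this pass (epoch).
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Any object with __len__ and __getitem__ works here; a list of
# (sample, label) pairs is the simplest case.
toy_dataset = [(f"image_{i}", i % 10) for i in range(10)]

for epoch in range(2):
    batches = list(iterate_minibatches(toy_dataset, batch_size=4, seed=epoch))
    print(f"epoch {epoch}: batch sizes {[len(b) for b in batches]}")

# The same arithmetic for the 60,000 Fashion-MNIST training images at batch_size=64:
num_batches = math.ceil(60000 / 64)  # 938 batches per epoch
last_batch_size = 60000 - 937 * 64  # the final batch holds the remaining 32 images
```

With 10 samples and a batch size of 4, each epoch yields batches of sizes 4, 4 and 2: every sample is visited exactly once per epoch, only the order changes.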
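Iterate through the DataLoader
Once the dataset is loaded into a DataLoader, you can iterate over it as needed. Each iteration below returns a batch of 64 images and 64 labels (train_features and train_labels). Because shuffle=True is set, the data is reshuffled after all batches have been iterated over.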
# Display image and label.
train_features, train_labels = next(iter(train_dataloader))  # one batch: 64 images and 64 labels
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()  # take the first image and drop its size-1 channel dimension
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")