III. How to Build a Custom Dataset
2022-07-29 05:22:00 【MY头发乱了】
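Preface
MNIST, the most basic dataset of all, has been played to death by everyone on the programmer's path, so today I'll show you how to build a custom dataset.
I. Defining the dataset, with no preprocessing built in
The code is as follows.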
import os
from torch.utils.data import Dataset, DataLoader
from PIL import Image

# 1. Create the dataset class by subclassing Dataset from torch.utils.data.
class My_Dataset(Dataset):
    # 2. Walk the directory tree, collecting each image path and its label.
    def __init__(self, main_dir, data_type, transforms):
        self.dataset = []  # list of [image_path, label] pairs
        self.transforms = transforms
        if data_type == 0:
            data_filename = 'train'
        elif data_type == 1:  # compare by value; `is` tests object identity
            data_filename = 'val'
        else:
            data_filename = 'test'
        split_dir = os.path.join(main_dir, data_filename)
        for cls_filename in os.listdir(split_dir):
            for img_data in os.listdir(os.path.join(split_dir, cls_filename)):
                # The label is the first character of the file name,
                # so every file name must start with a digit.
                self.dataset.append([
                    os.path.join(split_dir, cls_filename, img_data),
                    int(img_data[0])])

    # 3. Return the number of samples so the dataset can be iterated.
    def __len__(self):
        return len(self.dataset)

    # 4. Look up the image path, open the image, and apply the transforms.
    def __getitem__(self, index):
        img, label = self.dataset[index]
        img_data = Image.open(img)
        img_data = self.transforms(img_data)
        return img_data, label
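The directory scan in `__init__` can be tried on its own, without PyTorch. The sketch below is only an illustration: the helper `scan_split`, the folder names `zero` and `one`, and the file names are all hypothetical, and it assumes the layout `<main_dir>/<split>/<class folder>/<file>` with file names that start with their digit label. It also sorts the listings, which the class above does not do, so that the order is reproducible.

```python
import os
import tempfile


def scan_split(main_dir, split):
    """Return [path, label] pairs; the label is the first character of the file name."""
    samples = []
    split_dir = os.path.join(main_dir, split)
    for cls_name in sorted(os.listdir(split_dir)):
        cls_dir = os.path.join(split_dir, cls_name)
        for file_name in sorted(os.listdir(cls_dir)):
            samples.append([os.path.join(cls_dir, file_name), int(file_name[0])])
    return samples


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/train/<class folder>/<digit>_<id>.png
    demo_files = [("zero", "0_a.png"), ("zero", "0_b.png"), ("one", "1_a.png")]
    for cls_name, file_name in demo_files:
        cls_dir = os.path.join(root, "train", cls_name)
        os.makedirs(cls_dir, exist_ok=True)
        # Empty placeholder files are enough: the scan never opens them.
        open(os.path.join(cls_dir, file_name), "wb").close()

    samples = scan_split(root, "train")
    names = [os.path.basename(path) for path, _ in samples]
    labels = [label for _, label in samples]
```

Because the label comes from the file name rather than from the folder, a file whose name does not start with a digit would raise a `ValueError` in `int(...)`.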
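II. Defining the dataset with preprocessing built in
Preprocessing can include rotation, cropping, conversion to a tensor, scaling, normalization, and so on.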
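To make the tensor conversion and normalization steps concrete, here is the arithmetic for a single 8-bit pixel written in plain Python. It is a sketch of what `transforms.ToTensor()` followed by `transforms.Normalize(mean=(0.5,), std=(0.5,))` computes for 8-bit images, not the torchvision implementation; the function names are made up for illustration.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel value (0..255) to the range [0, 1]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Shift and scale: (value - mean) / std."""
    return (value - mean) / std


# Black, mid-grey, and white pixels after both steps.
results = {pixel: normalize(to_unit_range(pixel)) for pixel in (0, 128, 255)}
```

With a mean and standard deviation of 0.5, black maps to -1, white maps to 1, and mid-grey lands close to 0, so the inputs end up roughly centered around zero.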
1. Preprocessing
    # 4. Look up the image path, open the image, and preprocess it.
    def __getitem__(self, index):
        img, label = self.dataset[index]
        img_data = self.data_process(Image.open(img))
        return img_data, label

    # 5. Preprocessing: data augmentation, added noise, and so on go here.
    def data_process(self, x):
        return transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.5,), std=(0.5,))])(x)
2. Defining the full dataset
The code is as follows (example):
import os
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image

# 1. Create the dataset class by subclassing Dataset from torch.utils.data.
class My_Dataset(Dataset):
    # 2. Walk the directory tree, collecting each image path and its label.
    def __init__(self, main_dir, data_type):
        self.dataset = []  # list of [image_path, label] pairs
        if data_type == 0:
            data_filename = 'train'
        elif data_type == 1:  # compare by value; `is` tests object identity
            data_filename = 'val'
        else:
            data_filename = 'test'
        split_dir = os.path.join(main_dir, data_filename)
        # Sort the class folders so that each class always gets the same index.
        for cls_idx, cls_filename in enumerate(sorted(os.listdir(split_dir))):
            for img_data in os.listdir(os.path.join(split_dir, cls_filename)):
                # The label is the index of the class folder.
                self.dataset.append([
                    os.path.join(split_dir, cls_filename, img_data),
                    cls_idx])

    # 3. Return the number of samples so the dataset can be iterated.
    def __len__(self):
        return len(self.dataset)

    # 4. Look up the image path, open the image, and preprocess it.
    def __getitem__(self, index):
        img, label = self.dataset[index]
        img_data = self.data_process(Image.open(img))
        return img_data, label

    # 5. Preprocessing: data augmentation, added noise, and so on go here.
    def data_process(self, x):
        return transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.5,), std=(0.5,))])(x)