[PyTorch study notes] 11. Taking a subset of a Dataset and shuffling its order (using Subset, random_split)
2022-08-05 05:42:00 【takedachia】
(PyTorch version: 1.2)
Once a dataset has been defined with Dataset, a few questions come up often when working with it: how do we split a Dataset into two subsets (for example to define training and test sets, or for k-fold cross-validation)? How do we split it randomly? How do we shuffle the order of the data inside a Dataset?
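Taking a subset of a Dataset and splitting it
torch.utils.data.Subset() takes a subset of a dataset.
Pass in a Dataset and a sequence of indices, and you get back the subset made up of those samples.
1. We can pass in a range():
import torch
indices = range(18353) # take the samples with indices 0 through 18352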
sub_imgs = torch.utils.data.Subset(imgs, indices)
len(imgs), len(sub_imgs)
2. A range can also select an interval:
indices = range(18353, 27153) # take the samples with indices 18353 through 27152
sub_imgs = torch.utils.data.Subset(imgs, indices)
len(imgs), len(sub_imgs)
3. We can also pass in a list, which can be built with a list comprehension:
indices = [x for x in range(1234)]
sub_imgs = torch.utils.data.Subset(imgs, indices)
len(imgs), len(sub_imgs)
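To see what a Subset actually holds, here is a minimal sketch on a toy dataset. The TensorDataset named toy and its contents are stand-ins invented for illustration; they are not the imgs dataset used above.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Hypothetical toy dataset: 10 samples whose feature equals their own index.
features = torch.arange(10).unsqueeze(1)   # shape (10, 1)
labels = torch.arange(10) % 2              # alternating 0/1 labels
toy = TensorDataset(features, labels)

# Three ways of giving indices, mirroring the three cases above.
head = Subset(toy, range(4))               # samples 0..3
middle = Subset(toy, range(4, 7))          # samples 4..6
picked = Subset(toy, [x for x in range(10) if x % 3 == 0])  # samples 0, 3, 6, 9

# Position i of a Subset returns the sample stored at indices[i] in the parent dataset.
first_of_middle = middle[0][0].item()
picked_values = [picked[i][0].item() for i in range(len(picked))]
print(len(toy), len(head), len(middle), len(picked))
print(first_of_middle, picked_values)
```

A Subset does not copy any data: it only stores the parent dataset and the index sequence, so building one is cheap even for a large dataset.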
Shuffling the order of the data in a Dataset
To shuffle a dataset, we can simply pass a shuffled sequence of indices to Subset:
import torch
import matplotlib.pyplot as plt
from torch import randperm
indices = randperm(len(Leaf_dataset_train)).tolist() # generate shuffled indices
rand_train = torch.utils.data.Subset(imgs, indices)
# Show the first image and its original index
X = rand_train[0]
# permute C x H x W to H x W x C, the layout imshow expects
plt.imshow(X[0].permute(1, 2, 0)), indices[0]
Once the order has been shuffled, we can take subsets of the result to do things such as k-fold cross-validation.
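The k-fold idea can be sketched as follows. This is only one possible way to do it: the helper kfold_subsets, its seed parameter, and the toy dataset are hypothetical names introduced here, not part of PyTorch or of the original notes.

```python
import torch
from torch.utils.data import Subset, TensorDataset

def kfold_subsets(dataset, k, seed=0):
    """Yield (train_subset, valid_subset) pairs for k-fold cross-validation."""
    torch.manual_seed(seed)                           # make the shuffle repeatable
    indices = torch.randperm(len(dataset)).tolist()   # shuffled indices
    fold_size = len(dataset) // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs the remainder when len(dataset) is not divisible by k.
        stop = start + fold_size if fold < k - 1 else len(dataset)
        valid_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield Subset(dataset, train_idx), Subset(dataset, valid_idx)

# Hypothetical toy dataset with 10 samples.
toy = TensorDataset(torch.arange(10).unsqueeze(1))
folds = list(kfold_subsets(toy, k=3))
for train_part, valid_part in folds:
    print(len(train_part), len(valid_part))
```

Because the indices are shuffled once and then sliced, every sample lands in exactly one validation fold.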
Randomly splitting a Dataset
torch.utils.data.random_split() splits a dataset directly, dividing it at random into several parts.
It takes a list containing the size (number of samples) of each subset, in order. Note that these sizes must sum to the length of the Dataset passed in.
Example:
# len(Leaf_dataset_train) must equal 17000 + 1353
train_set, test_set = torch.utils.data.random_split(Leaf_dataset_train, [17000, 1353])
print(len(train_set), len(test_set))
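When the sizes are derived from a ratio instead of being typed by hand, computing the last size as a remainder guarantees that they sum to the dataset length. The sketch below assumes a toy dataset and a 0.9 ratio chosen purely for illustration, and seeds the global generator with torch.manual_seed so the split can be repeated.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset standing in for Leaf_dataset_train.
toy = TensorDataset(torch.arange(103).unsqueeze(1))

train_ratio = 0.9                       # assumed ratio, for illustration only
n_train = int(len(toy) * train_ratio)
n_test = len(toy) - n_train             # remainder, so the sizes always sum to len(toy)

torch.manual_seed(0)                    # fix the global seed so the split is repeatable
train_set, test_set = random_split(toy, [n_train, n_test])

# Each result is a Subset; its indices refer to positions in the original dataset.
train_idx = {int(i) for i in train_set.indices}
test_idx = {int(i) for i in test_set.indices}
print(len(train_set), len(test_set), len(train_idx & test_idx))
```

The two index sets do not overlap and together cover the whole dataset, which is what makes random_split suitable for a train/test split.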