[Pytorch study notes] 11. Take a subset of the Dataset and shuffle the order of the Dataset (using Subset, random_split)
2022-08-05 05:42:00 【takedachia】
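(PyTorch version: 1.2)
Once a dataset has been defined as a Dataset, a few questions come up often when working with it: how do you split the Dataset into two subsets (for example, to designate a training set and a test set, or for k-fold cross-validation)? How do you split it randomly? How do you shuffle the order of the samples inside a Dataset?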
Taking a subset of a Dataset and splitting it
torch.utils.data.Subset() takes a subset of a dataset.
Pass in a Dataset and a sequence of indices to get the corresponding subset.
1. We can pass in a range():
indices = range(18353) # take the samples with indices 0 through 18352
sub_imgs = torch.utils.data.Subset(imgs, indices)
len(imgs), len(sub_imgs)

2. We can take an interval:
indices = range(18353, 27153) # take the samples with indices 18353 through 27152
sub_imgs = torch.utils.data.Subset(imgs, indices)
len(imgs), len(sub_imgs)

3. We can pass in a list, which can be built with a list comprehension:
indices = [x for x in range(1234)]
sub_imgs = torch.utils.data.Subset(imgs, indices)
len(imgs), len(sub_imgs)
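
The three calls above depend on the imgs dataset built in the earlier notes. As a self-contained sketch, assuming a small synthetic TensorDataset (toy_dataset is a hypothetical stand-in, not the dataset used in this post), the same indexing patterns look roughly like this:

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Hypothetical stand-in for `imgs`: 10 samples, each a 3-dim feature vector
# paired with an integer label equal to its position in the dataset.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

head = Subset(toy_dataset, range(4))       # samples 0..3
middle = Subset(toy_dataset, range(4, 7))  # samples 4..6
picked = Subset(toy_dataset, [9, 0, 5])    # arbitrary indices, in the given order

# A Subset maps its own index i to dataset[indices[i]],
# so picked[0] is the original sample 9.
x, y = picked[0]
print(len(head), len(middle), len(picked), int(y))
```

A Subset does not copy any data; it only stores the index sequence and looks samples up in the original Dataset on access.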

Shuffling the order of the data in a Dataset
We can shuffle a dataset simply by passing in a shuffled sequence of indices:
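import matplotlib.pyplot as plt
import torch
from torch import randperm
indices = randperm(len(imgs)).tolist() # generate a shuffled list of indices covering imgs
rand_train = torch.utils.data.Subset(imgs, indices)
# Show the first image of the shuffled dataset, along with its index in the original dataset
X = rand_train[0]
plt.imshow(X[0].permute(1, 2, 0)), indices[0] # permute (C, H, W) to (H, W, C) for imshow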

After shuffling, we can take subsets of the dataset to carry out k-fold cross-validation and similar procedures.
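
One possible way to combine shuffling with Subset for k-fold cross-validation is sketched below. The helper k_fold_subsets and the synthetic toy_dataset are hypothetical names introduced only for illustration; they are not part of torch.utils.data or of the original notes.

```python
import torch
from torch.utils.data import Subset, TensorDataset

def k_fold_subsets(dataset, k):
    """Yield (train_subset, val_subset) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    shuffled = torch.randperm(len(dataset)).tolist()
    fold_size = len(dataset) // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs the remainder when len(dataset) is not divisible by k.
        stop = len(dataset) if fold == k - 1 else start + fold_size
        val_indices = shuffled[start:stop]
        train_indices = shuffled[:start] + shuffled[stop:]
        yield Subset(dataset, train_indices), Subset(dataset, val_indices)

# Synthetic 10-sample dataset standing in for a real one.
toy_dataset = TensorDataset(
    torch.arange(10, dtype=torch.float32).unsqueeze(1),
    torch.arange(10),
)
folds = list(k_fold_subsets(toy_dataset, k=3))
for train_subset, val_subset in folds:
    print(len(train_subset), len(val_subset))
```

Each sample lands in exactly one validation fold, because the folds are consecutive slices of a single shuffled index list.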
Randomly splitting a Dataset
torch.utils.data.random_split() splits a dataset directly, dividing it at random into several parts.
Pass in a list whose entries are the sizes (number of samples) of the subsets; note that these numbers must sum to the length of the Dataset passed in.
Example:
# len(Leaf_dataset_train) must equal 17000 + 1353
train_set, test_set = torch.utils.data.random_split(Leaf_dataset_train, [17000, 1353])
print(len(train_set), len(test_set))
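
Hard-coding the sizes only works when the dataset length is known in advance. A minimal sketch, assuming an 80/20 split and a synthetic toy_dataset (both are illustrative choices, not taken from the original notes), derives the lengths from len() and seeds the global random number generator so the split can be repeated:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic 100-sample dataset standing in for a real one.
toy_dataset = TensorDataset(
    torch.arange(100, dtype=torch.float32).unsqueeze(1),
    torch.arange(100),
)

# Derive integer lengths from the 80/20 ratio; the second length takes the
# remainder so the two always sum to len(toy_dataset).
train_len = len(toy_dataset) * 8 // 10
test_len = len(toy_dataset) - train_len

torch.manual_seed(0)  # seed the global RNG so the split is repeatable
train_set, test_set = random_split(toy_dataset, [train_len, test_len])
print(len(train_set), len(test_set))

# The two parts are disjoint and together cover every index exactly once.
all_indices = sorted(int(i) for part in (train_set, test_set) for i in part.indices)
```

The objects returned by random_split are Subset instances, so they can be indexed and passed to a DataLoader like any other Dataset.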
