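Dataset Split Script for Classification Tasks
2022-07-01 11:21:00, by 江小白jlj
This script splits a classification dataset into a training set and a validation set. With minor changes, it can also split an object detection dataset into training, validation, and test sets.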
```python
import os
import random
from shutil import copy, rmtree


def mk_file(file_path: str):
    if os.path.exists(file_path):
        # If the folder already exists, delete it and everything it contains, then recreate it
        rmtree(file_path)
    os.makedirs(file_path)


def main():
    # Fix the random seed so the split is reproducible
    random.seed(0)

    # Put 10% of the data into the validation set
    split_rate = 0.1

    # Root directory holding the classification images
    cwd = os.getcwd()
    data_root = os.path.join(cwd, "flower_data")
    origin_flower_path = os.path.join(data_root, "flower_photos")
    assert os.path.exists(origin_flower_path), "path '{}' does not exist.".format(origin_flower_path)

    # Every sub-folder of the root directory is one class.
    # Sorted, because os.listdir returns entries in arbitrary order and the seed alone
    # would not make the split reproducible across machines.
    flower_class = sorted(cla for cla in os.listdir(origin_flower_path)
                          if os.path.isdir(os.path.join(origin_flower_path, cla)))

    # Create the folder for the training set
    train_root = os.path.join(data_root, "train")
    mk_file(train_root)
    for cla in flower_class:
        # Create one sub-folder per class
        mk_file(os.path.join(train_root, cla))

    # Create the folder for the validation set
    val_root = os.path.join(data_root, "val")
    mk_file(val_root)
    for cla in flower_class:
        # Create one sub-folder per class
        mk_file(os.path.join(val_root, cla))

    # Split each class separately
    for cla in flower_class:
        cla_path = os.path.join(origin_flower_path, cla)
        images = sorted(os.listdir(cla_path))  # sorted for the same reproducibility reason
        num = len(images)
        # Randomly sample the file names (not indices) that go to the validation set;
        # a set makes the membership test below O(1)
        eval_index = set(random.sample(images, k=int(num * split_rate)))
        for index, image in enumerate(images):
            if image in eval_index:
                # Copy a file assigned to the validation set into its class folder
                image_path = os.path.join(cla_path, image)
                new_path = os.path.join(val_root, cla)
                copy(image_path, new_path)  # copy(source path, destination folder)
            else:
                # Copy a file assigned to the training set into its class folder
                image_path = os.path.join(cla_path, image)
                new_path = os.path.join(train_root, cla)
                copy(image_path, new_path)
            print("\r[{}] processing [{}/{}]".format(cla, index + 1, num), end="")  # progress bar
        print()

    print("processing done!")


if __name__ == '__main__':
    main()
```
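The introduction says the script can also produce a test set with small changes. Below is a minimal sketch of one such change, assuming the images are still grouped as `<root>/<class>/<file>`. `split_per_class` and `copy_splits` are hypothetical helpers, not part of the original script. They carve each class into val, test, and train using a local `random.Random(seed)`, so the split stays stratified by class just like the script above. For an object detection dataset without class folders, all images could be passed as a single group. The matching annotation files would also have to be copied, which this sketch does not do.

```python
import os
import random
import shutil
from typing import Dict, List

Splits = Dict[str, Dict[str, List[str]]]  # subset -> class -> file names


def split_per_class(files_by_class: Dict[str, List[str]],
                    val_rate: float = 0.1,
                    test_rate: float = 0.1,
                    seed: int = 0) -> Splits:
    """Hypothetical helper: split every class into train/val/test."""
    if val_rate < 0 or test_rate < 0 or val_rate + test_rate >= 1:
        raise ValueError("need non-negative rates with val_rate + test_rate < 1")
    rng = random.Random(seed)  # local RNG: leaves the global random state untouched
    splits: Splits = {"train": {}, "val": {}, "test": {}}
    for cla in sorted(files_by_class):
        files = sorted(files_by_class[cla])  # independent of os.listdir order
        n = len(files)
        n_val, n_test = int(n * val_rate), int(n * test_rate)
        shuffled = rng.sample(files, k=n)  # shuffled copy of this class's files
        splits["val"][cla] = shuffled[:n_val]
        splits["test"][cla] = shuffled[n_val:n_val + n_test]
        splits["train"][cla] = shuffled[n_val + n_test:]
    return splits


def copy_splits(splits: Splits, src_root: str, dst_root: str) -> None:
    """Hypothetical helper: copy files to dst_root/<subset>/<class>/."""
    for subset, classes in splits.items():
        for cla, names in classes.items():
            out_dir = os.path.join(dst_root, subset, cla)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_root, cla, name), out_dir)


if __name__ == "__main__":
    demo = {"daisy": [f"d{i}.jpg" for i in range(20)],
            "roses": [f"r{i}.jpg" for i in range(10)]}
    result = split_per_class(demo, val_rate=0.1, test_rate=0.2)
    for subset in ("train", "val", "test"):
        print(subset, {c: len(v) for c, v in result[subset].items()})
    # train {'daisy': 14, 'roses': 7}
    # val {'daisy': 2, 'roses': 1}
    # test {'daisy': 4, 'roses': 2}
```

As in the original script, `int(...)` rounds each subset size down, so a class with very few images may get an empty val or test subset.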