Dataset Splitting Script for Classification Tasks
2022-07-01 11:21:00 【江小白jlj】
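This script splits an image classification dataset into a training set and a validation set. With minor modifications it can also be used to split object detection data into training, validation, and test sets.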
import os
import random
from shutil import copy, rmtree


def mk_file(file_path: str):
    if os.path.exists(file_path):
        # If the folder already exists, delete it and everything in it, then recreate it
        rmtree(file_path)
    os.makedirs(file_path)


def main():
    # Fix the seed so the random split is reproducible
    random.seed(0)

    # Put 10% of the data in each class into the validation set
    split_rate = 0.1

    # Root directory that holds the classification image data
    cwd = os.getcwd()
    data_root = os.path.join(cwd, "flower_data")
    origin_flower_path = os.path.join(data_root, "flower_photos")
    assert os.path.exists(origin_flower_path), "path '{}' does not exist.".format(origin_flower_path)

    # Each subfolder of the root directory is treated as one class
    flower_class = [cla for cla in os.listdir(origin_flower_path)
                    if os.path.isdir(os.path.join(origin_flower_path, cla))]

    # Create the folder that stores the training set
    train_root = os.path.join(data_root, "train")
    mk_file(train_root)
    for cla in flower_class:
        # Create one folder per class
        mk_file(os.path.join(train_root, cla))

    # Create the folder that stores the validation set
    val_root = os.path.join(data_root, "val")
    mk_file(val_root)
    for cla in flower_class:
        # Create one folder per class
        mk_file(os.path.join(val_root, cla))

    # Split the data class by class
    for cla in flower_class:
        cla_path = os.path.join(origin_flower_path, cla)
        # os.listdir returns entries in arbitrary order, so sort them to keep
        # the seeded sample reproducible across machines
        images = sorted(os.listdir(cla_path))
        num = len(images)
        # Randomly sample the file names that go into the validation set
        eval_index = random.sample(images, k=int(num * split_rate))
        for index, image in enumerate(images):
            image_path = os.path.join(cla_path, image)
            if image in eval_index:
                # Copy files assigned to the validation set into its class folder
                new_path = os.path.join(val_root, cla)
            else:
                # Copy files assigned to the training set into its class folder
                new_path = os.path.join(train_root, cla)
            copy(image_path, new_path)  # copy(source path, destination path)
            print("\r[{}] processing [{}/{}]".format(cla, index + 1, num), end="")  # progress bar
        print()

    print("processing done!")


if __name__ == '__main__':
    main()
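The script above only produces training and validation folders. As one possible way to get the three-way split mentioned at the top, the sketch below divides a list of file names into train, val, and test subsets. The function name `split_three_way` and its parameters `val_rate`, `test_rate`, and `seed` are hypothetical and not part of the original script; copying the files into folders would still be done with `copy` as in `main()`.

```python
import random
from typing import Dict, List, Sequence


def split_three_way(items: Sequence[str],
                    val_rate: float = 0.1,
                    test_rate: float = 0.1,
                    seed: int = 0) -> Dict[str, List[str]]:
    """Split file names into non-overlapping train/val/test subsets.

    Hypothetical helper for illustration; the rates are assumptions.
    """
    if val_rate < 0 or test_rate < 0 or val_rate + test_rate >= 1:
        raise ValueError("val_rate and test_rate must be >= 0 and sum to less than 1")

    # Sort first so the result does not depend on directory listing order.
    shuffled = sorted(items)
    # A private Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)

    num_val = int(len(shuffled) * val_rate)
    num_test = int(len(shuffled) * test_rate)
    return {
        "val": shuffled[:num_val],
        "test": shuffled[num_val:num_val + num_test],
        "train": shuffled[num_val + num_test:],
    }


if __name__ == "__main__":
    names = ["img_{}.jpg".format(i) for i in range(100)]
    parts = split_three_way(names)
    print({name: len(files) for name, files in parts.items()})
```

Calling this once per class folder, as the original loop does, keeps the class proportions roughly equal across the three subsets.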