Dataset partitioning script for classification tasks
2022-07-01 11:41:00 【Jiangxiaobai JLJ】
This script splits a dataset for a classification task into a training set and a validation set. With minor modifications, it can also be used for object detection tasks to split data into training, validation, and test sets.
import os
from shutil import copy, rmtree
import random


def mk_file(file_path: str):
    if os.path.exists(file_path):
        # If the folder already exists, delete it and everything in it,
        # then create a fresh empty folder
        rmtree(file_path)
    os.makedirs(file_path)


def main():
    # Fix the seed so the random split is repeatable
    random.seed(0)

    # Hold out 10% of each class as the validation set
    split_rate = 0.1

    # Point to the root directory that holds the per-class image folders
    cwd = os.getcwd()
    data_root = os.path.join(cwd, "flower_data")
    origin_flower_path = os.path.join(data_root, "flower_photos")
    assert os.path.exists(origin_flower_path), "path '{}' does not exist.".format(origin_flower_path)

    # Collect the names of all class folders under the root directory
    flower_class = [cla for cla in os.listdir(origin_flower_path)
                    if os.path.isdir(os.path.join(origin_flower_path, cla))]

    # Create the folder that stores the training set
    train_root = os.path.join(data_root, "train")
    mk_file(train_root)
    for cla in flower_class:
        # Create a folder for each class
        mk_file(os.path.join(train_root, cla))

    # Create the folder that stores the validation set
    val_root = os.path.join(data_root, "val")
    mk_file(val_root)
    for cla in flower_class:
        # Create a folder for each class
        mk_file(os.path.join(val_root, cla))

    # Split the data class by class
    for cla in flower_class:
        cla_path = os.path.join(origin_flower_path, cla)
        # os.listdir returns entries in arbitrary order, so sort them
        # to keep the seeded sample the same across runs and machines
        images = sorted(os.listdir(cla_path))
        num = len(images)
        # Randomly sample the file names that go to the validation set
        # (stored as a set for fast membership checks)
        eval_index = set(random.sample(images, k=int(num * split_rate)))
        for index, image in enumerate(images):
            image_path = os.path.join(cla_path, image)
            if image in eval_index:
                # Copy files assigned to the validation set to the matching folder
                new_path = os.path.join(val_root, cla)
            else:
                # Copy files assigned to the training set to the matching folder
                new_path = os.path.join(train_root, cla)
            copy(image_path, new_path)  # copy(source path, destination path)
            print("\r[{}] processing [{}/{}]".format(cla, index + 1, num), end="")  # progress bar
        print()

    print("processing done!")


if __name__ == '__main__':
    main()
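
The post mentions that the script can be adapted to produce training, validation, and test sets. The following is a minimal sketch of one way to do that split on a list of file names; it is not part of the original script. The function name `split_three_way` and the parameters `val_rate`, `test_rate`, and `seed` are assumptions made for illustration.

```python
import random


def split_three_way(items, val_rate=0.1, test_rate=0.1, seed=0):
    """Shuffle items reproducibly and return (train, val, test) lists.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if val_rate < 0 or test_rate < 0 or val_rate + test_rate >= 1:
        raise ValueError("val_rate and test_rate must be non-negative and sum to less than 1")

    # Sort first so the result does not depend on the input order
    # (for example, the arbitrary order returned by os.listdir)
    shuffled = sorted(items)
    # Use a private Random instance so the global random state is untouched
    random.Random(seed).shuffle(shuffled)

    num_val = int(len(shuffled) * val_rate)
    num_test = int(len(shuffled) * test_rate)

    val = shuffled[:num_val]
    test = shuffled[num_val:num_val + num_test]
    train = shuffled[num_val + num_test:]
    return train, val, test


if __name__ == "__main__":
    # Made-up file names, only to show the resulting sizes
    names = ["img_{:03d}.jpg".format(i) for i in range(50)]
    train, val, test = split_three_way(names)
    print(len(train), len(val), len(test))
```

In the script above, this would replace the `random.sample` step inside the per-class loop, with a third `test` folder created the same way as `train` and `val`. For object detection, each image's annotation file would presumably need to be copied alongside it, which this sketch does not handle.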