Dataset partitioning script for classification tasks
2022-07-01 11:41:00 【Jiangxiaobai JLJ】
This script splits the dataset of an image classification task into a training set and a validation set. With a little modification, it can also be used for object detection, to divide the data into training, validation, and test sets.
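The script below reads images from `flower_data/flower_photos/<class name>/<image file>` under the current working directory, with one sub-folder per class. To try it without the real dataset, a small placeholder tree can be generated first. The helper name `make_toy_dataset`, the class names and the empty `.jpg` files are illustrative assumptions only. The files are not real images, which is enough here because the script only copies them.

```python
import os


def make_toy_dataset(root: str, classes=("daisy", "roses", "tulips"), per_class: int = 20) -> str:
    """Create root/flower_data/flower_photos/<class>/<n>.jpg placeholder files.

    Hypothetical helper for trying the split script; the files are empty,
    not real images.
    """
    photos = os.path.join(root, "flower_data", "flower_photos")
    for cla in classes:
        cla_dir = os.path.join(photos, cla)
        os.makedirs(cla_dir, exist_ok=True)
        for i in range(per_class):
            # Empty placeholder file standing in for one image of this class
            open(os.path.join(cla_dir, f"{i:03d}.jpg"), "wb").close()
    return photos
```

Running `make_toy_dataset(os.getcwd())` in an empty scratch directory and then the script should leave 18 files per class in `flower_data/train/<class>` and 2 in `flower_data/val/<class>`, because `int(20 * 0.1)` is 2.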
import os
from shutil import copy, rmtree
import random
def mk_file(file_path: str):
    if os.path.exists(file_path):
        # If the folder already exists, delete it together with everything inside it
        rmtree(file_path)
    # (Re)create an empty folder
    os.makedirs(file_path)

def main():
    # Fix the random seed so that the split is reproducible
    random.seed(0)

    # Put 10% of the images of each class into the validation set
    split_rate = 0.1

    # Root directory holding the class-labelled images (one sub-folder per class)
    cwd = os.getcwd()
    data_root = os.path.join(cwd, "flower_data")
    origin_flower_path = os.path.join(data_root, "flower_photos")
    assert os.path.exists(origin_flower_path), "path '{}' does not exist.".format(origin_flower_path)

    # Class names are the sub-folder names; sorted because os.listdir order is arbitrary
    flower_class = sorted(cla for cla in os.listdir(origin_flower_path)
                          if os.path.isdir(os.path.join(origin_flower_path, cla)))

    # Create the training-set folder, with one sub-folder per class
    train_root = os.path.join(data_root, "train")
    mk_file(train_root)
    for cla in flower_class:
        mk_file(os.path.join(train_root, cla))

    # Create the validation-set folder, with one sub-folder per class
    val_root = os.path.join(data_root, "val")
    mk_file(val_root)
    for cla in flower_class:
        mk_file(os.path.join(val_root, cla))

    # Split each class separately so both sets keep the class proportions
    for cla in flower_class:
        cla_path = os.path.join(origin_flower_path, cla)
        # Sorted so that the seeded sampling gives the same split on every machine
        images = sorted(os.listdir(cla_path))
        num = len(images)
        # Randomly pick the file names that go to the validation set
        # (a set makes the membership test below O(1))
        eval_index = set(random.sample(images, k=int(num * split_rate)))
        for index, image in enumerate(images):
            image_path = os.path.join(cla_path, image)
            if image in eval_index:
                # File assigned to the validation set: copy it into its class folder
                new_path = os.path.join(val_root, cla)
            else:
                # File assigned to the training set: copy it into its class folder
                new_path = os.path.join(train_root, cla)
            copy(image_path, new_path)  # copy(source path, destination folder)
            print("\r[{}] processing [{}/{}]".format(cla, index + 1, num), end="")  # progress indicator
        print()

    print("processing done!")

if __name__ == '__main__':
    main()
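As mentioned at the top, the same idea carries over to object detection, where the data is usually split into training, validation and test sets and every image must travel together with its annotation file. The sketch below is one possible adaptation, not the author's code. It assumes a flat `images/` folder plus a `labels/` folder whose annotation files share the image's file stem (e.g. YOLO-style `.txt`; pass `label_ext=".xml"` for VOC-style files). The function names `split_three_ways` and `split_detection_dataset`, the 80/10/10 default ratios and the output layout are hypothetical.

```python
import os
import random
from shutil import copy


def split_three_ways(names, val_rate=0.1, test_rate=0.1, seed=0):
    """Shuffle file names reproducibly and cut them into train/val/test lists."""
    names = sorted(names)          # os.listdir order is arbitrary, so sort first
    rng = random.Random(seed)      # local RNG; the global random state is untouched
    rng.shuffle(names)
    n_val = int(len(names) * val_rate)
    n_test = int(len(names) * test_rate)
    return {
        "train": names[n_val + n_test:],
        "val": names[:n_val],
        "test": names[n_val:n_val + n_test],
    }


def split_detection_dataset(image_dir, label_dir, out_root,
                            label_ext=".txt", **split_kwargs):
    """Copy each image and its same-stem label into out_root/<subset>/{images,labels}."""
    images = [f for f in os.listdir(image_dir)
              if os.path.isfile(os.path.join(image_dir, f))]
    subsets = split_three_ways(images, **split_kwargs)
    for subset, names in subsets.items():
        img_out = os.path.join(out_root, subset, "images")
        lbl_out = os.path.join(out_root, subset, "labels")
        os.makedirs(img_out, exist_ok=True)
        os.makedirs(lbl_out, exist_ok=True)
        for name in names:
            copy(os.path.join(image_dir, name), img_out)
            label_path = os.path.join(label_dir, os.path.splitext(name)[0] + label_ext)
            if os.path.exists(label_path):  # background images may have no label file
                copy(label_path, lbl_out)
    # Number of images that ended up in each subset
    return {k: len(v) for k, v in subsets.items()}
```

Using a local `random.Random(seed)` instead of `random.seed` keeps the split independent of any other code that draws from the global generator, and sorting before shuffling makes the result the same regardless of the order in which the file system lists the files.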