Splitting a Dataset into Training, Validation, and Test Sets
2022-08-02 10:45:00 【Star_.】
import os
import random
from shutil import copy2
def data_set_split(src_data_folder, target_data_folder, slice_data=(0.4, 0.3, 0.3)):
    r'''
    Read the source data folder and copy its files into train, val, and test folders.
    :param src_data_folder: source folder with one subfolder per class, e.g. r"D:\Desktop\segmentation_2021\data"
    :param target_data_folder: target folder, e.g. r"D:\Desktop\segmentation_2021\a"
    :param slice_data: fractions of the data used for training, validation, and testing
    :return: None
    '''
    print("Starting dataset split")
    class_names = os.listdir(src_data_folder)
    # Create the split folders under the target directory
    split_names = ['train', 'val', 'test']
    for split_name in split_names:
        split_path = os.path.join(target_data_folder, split_name)
        # Then create one folder per class inside each split folder.
        # makedirs also creates missing parent directories and does not
        # fail if the folder already exists.
        for class_name in class_names:
            class_split_path = os.path.join(split_path, class_name)
            os.makedirs(class_split_path, exist_ok=True)
    # Split each class according to the ratios and copy the image files
    # Iterate over the classes first
    for class_name in class_names:
        current_class_data_path = os.path.join(src_data_folder, class_name)
        current_all_data = os.listdir(current_class_data_path)
        current_data_length = len(current_all_data)
        current_data_index_list = list(range(current_data_length))
        random.shuffle(current_data_index_list)
        train_folder = os.path.join(target_data_folder, 'train', class_name)
        val_folder = os.path.join(target_data_folder, 'val', class_name)
        test_folder = os.path.join(target_data_folder, 'test', class_name)
        train_stop_flag = current_data_length * slice_data[0]
        val_stop_flag = current_data_length * (slice_data[0] + slice_data[1])
        current_idx = 0
        train_num = 0
        val_num = 0
        test_num = 0
        for i in current_data_index_list:
            src_img_path = os.path.join(current_class_data_path, current_all_data[i])
            # Strict "<" keeps each share at the requested size;
            # "<=" would put one extra file into the training set.
            if current_idx < train_stop_flag:
                copy2(src_img_path, train_folder)
                # print("{} copied to {}".format(src_img_path, train_folder))
                train_num = train_num + 1
            elif current_idx < val_stop_flag:
                copy2(src_img_path, val_folder)
                # print("{} copied to {}".format(src_img_path, val_folder))
                val_num = val_num + 1
            else:
                copy2(src_img_path, test_folder)
                # print("{} copied to {}".format(src_img_path, test_folder))
                test_num = test_num + 1
            current_idx = current_idx + 1
        print("*********************************{}*************************************".format(class_name))
        print(
            "Class {} split at a ratio of {}:{}:{}, {} images in total".format(class_name, slice_data[0], slice_data[1], slice_data[2], current_data_length))
        print("Training set {}: {} images".format(train_folder, train_num))
        print("Validation set {}: {} images".format(val_folder, val_num))
        print("Test set {}: {} images".format(test_folder, test_num))
if __name__ == '__main__':
    src_data_folder = r"D:\Desktop\segmentation_2021\data"
    target_data_folder = r"D:\Desktop\segmentation_2021\a"
    data_set_split(src_data_folder, target_data_folder, slice_data=[0.6,0.2,0.2])
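
The script above shuffles with the global random state, so each run produces a different split. If a repeatable split is needed, the index arithmetic can be separated from the file copying and given a seed. The following is only a minimal sketch of that idea: the helper name `split_indices` and its `seed` parameter are hypothetical and not part of the original script, and the split boundaries are rounded to the nearest integer, which is one possible choice among several.

```python
import random


def split_indices(n_items, ratios=(0.6, 0.2, 0.2), seed=None):
    """Return (train, val, test) lists of indices for n_items samples.

    Hypothetical helper for illustration; it touches no files.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n_items))
    # A private Random instance keeps the shuffle repeatable for a given
    # seed without changing the global random state.
    random.Random(seed).shuffle(indices)
    # Round the cumulative boundaries so every index lands in exactly one split.
    train_end = round(n_items * ratios[0])
    val_end = round(n_items * (ratios[0] + ratios[1]))
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(10, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))  # 6 2 2
```

The returned indices could then be used to look up file names in a sorted `os.listdir` result for each class before copying, so that the same seed always selects the same files.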