Data processing and data set preparation
2022-06-12 04:36:00 【Chorgy】
1. Batch renaming of pictures
- Folder structure:
  - source
    - animals
      - cat
      - cattle
      - dog
      - horse
      - pig
    - fruits
      - apple
      - banana
      - durian
      - grape
      - orange
    - vehicles
      - bus
      - car
      - plane
      - ship
      - train
- Implementation code:
import os
sourcePath = '../../../DataSet/source/'
animalPath = sourcePath + 'animals/'
fruitPath = sourcePath + 'fruits/'
vehiclePath = sourcePath + 'vehicles/'
# Rename the fruit, animal and vehicle pictures in turn:
# every picture in a class folder (apple, cat, bus, ...) becomes <class>_<n>.jpg
for categoryPath in (fruitPath, animalPath, vehiclePath):
    for folder_list in os.listdir(categoryPath):
        count = 1
        # Read the subfolders under each directory
        subfolder = categoryPath + folder_list + '/'
        # sorted() gives a reproducible numbering order (os.listdir order is arbitrary)
        for file in sorted(os.listdir(subfolder)):
            old_name = subfolder + file
            new_name = subfolder + "%s_%d.jpg" % (folder_list, count)
            print(old_name, "====>", new_name)
            # If new_name already exists, os.rename overwrites it on Linux/macOS and raises an error on Windows
            os.rename(old_name, new_name)
            count = count + 1
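Renaming in place can clash with a file that already carries one of the target names (for example a folder that was renamed once before and then received new pictures). A minimal sketch of a two-pass rename that avoids this, assuming the same `<class>_<n>.jpg` naming; `rename_class_folder` is a hypothetical helper, not part of the original script:

```python
import os
import uuid


def rename_class_folder(folder, class_name):
    """Rename every file in `folder` to <class_name>_<n>.jpg (n starts at 1).

    Pass 1 moves every file to a unique temporary name, pass 2 gives it its
    final name, so an existing file such as apple_3.jpg is never overwritten
    halfway through the renaming.
    """
    files = sorted(os.listdir(folder))  # reproducible numbering order
    # Pass 1: temporary names that cannot clash with the final names
    temp_names = []
    for name in files:
        temp = "tmp_%s_%s" % (uuid.uuid4().hex, name)
        os.rename(os.path.join(folder, name), os.path.join(folder, temp))
        temp_names.append(temp)
    # Pass 2: final names, in the original sorted order
    new_names = []
    for count, temp in enumerate(temp_names, start=1):
        new_name = "%s_%d.jpg" % (class_name, count)
        os.rename(os.path.join(folder, temp), os.path.join(folder, new_name))
        new_names.append(new_name)
    return new_names
```

In the loop above it could replace the innermost loop as `rename_class_folder(subfolder, folder_list)`.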
2. Data augmentation + Gaussian noise
During the competition, a camera captures the images to be recognized, and these captured images contain a lot of noise, whereas the original dataset consists of clean, noise-free images. For example, under indoor lighting a picture of a dog picks up a lot of red noise and is easily misrecognized as an apple. This happened last year, so this year we address it by adding noise to the dataset.
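The noise model itself is simple: every pixel, normalized to [0, 1], gets an independent sample from a normal distribution with mean `mean` and standard deviation `sqrt(var)` added to it, and the result is clipped back into range. A minimal NumPy sketch on an in-memory array (the name `add_gaussian_noise` and the optional `rng` argument are illustrative assumptions, not part of the original code):

```python
import numpy as np


def add_gaussian_noise(image, mean=0.0, var=0.001, rng=None):
    """Return a noisy uint8 copy of a uint8 image of any shape (e.g. H x W x 3)."""
    rng = np.random.default_rng() if rng is None else rng
    x = image.astype(np.float64) / 255.0              # normalize to [0, 1]
    noise = rng.normal(mean, np.sqrt(var), x.shape)   # std = sqrt(variance)
    out = np.clip(x + noise, 0.0, 1.0)                # negative values become 0, not wrapped
    return np.round(out * 255).astype(np.uint8)       # back to 0-255
```

With `var=0.002`, the value used in the pipeline, the standard deviation is about 0.045, i.e. roughly 11 gray levels out of 255.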
- Implementation code:
# Adding noise to the images
# Pictures captured in practice contain a lot of noise,
# which differs greatly from the clean original images,
# so Gaussian noise is added to the original images
import numpy as np
import cv2 as cv
import os
# Keras 2.x API; ImageDataGenerator is deprecated in newer TensorFlow/Keras releases
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
# Noise-adding function
def Gasuss_Noise(image, mean=0, var=0.001):
    ''' Add Gaussian noise
    image: path of the original image
    mean: mean of the noise
    var: variance of the noise; the larger it is, the stronger the noise
    '''
    image = cv.imread(image)
    image = np.array(image / 255, dtype=float)  # Normalize: divide by 255 so pixel values lie in [0, 1]
    noise = np.random.normal(mean, var ** 0.5, image.shape)  # Gaussian noise with mean `mean` and variance `var` (std = sqrt(var))
    out = image + noise  # Add the noise to the original image
    # Clip to [0, 1]. Clipping to [-1, 1] and then casting to uint8 is a bug:
    # negative values do not become 0 but typically wrap around to bright pixels
    out = np.clip(out, 0.0, 1.0)
    out = np.uint8(out * 255)  # De-normalize: multiply by 255 to restore 0-255 pixel values
    # cv.imshow("gasuss", out)
    noise = noise * 255
    return [noise, out]
# Data augmentation function: saves one augmented copy of the input picture into img_output_path
def Date_Enhancement(img_input_path, img_output_path):
    image = load_img(img_input_path)  # PIL image
    im1 = image.point(lambda p: p * 0.6)  # Darken: scale every pixel value to 60%
    # im1.show()
    im1 = img_to_array(im1)  # Image to array
    im1 = np.expand_dims(im1, axis=0)  # Add a batch dimension: (1, H, W, C)
    img_dag = ImageDataGenerator(
        rotation_range=10,  # random rotation of up to 10 degrees either way
        width_shift_range=0.001,  # horizontal offset
        height_shift_range=0.001,  # vertical offset
        shear_range=0.02,  # shear intensity
        # zoom_range=[0.6, 0.9],
        brightness_range=[0.9, 1.1],  # random brightness factor
        horizontal_flip=False,  # horizontal flipping disabled
        fill_mode="constant", cval=40  # fill pixels outside the boundary with the constant value 40
    )
    img_generator = img_dag.flow(im1,
                                 batch_size=1,  # a single picture per batch
                                 save_to_dir=img_output_path,
                                 save_prefix="image",
                                 save_format="jpg")
    count = 0  # Counter
    for _batch in img_generator:
        count += 1
        if count == 1:  # number of augmented samples to generate before exiting
            break
####################################################################
# Source folders (original pictures)
sourcePath = '../../../DataSet/source/'
animalPath = sourcePath + 'animals/'
fruitPath = sourcePath + 'fruits/'
vehiclePath = sourcePath + 'vehicles/'
# Intermediate folders
# Store the augmented pictures produced by step 1
middlePath = '../../../DataSet/middle/'
animalMiddlePath = middlePath + 'animals/'
fruitMiddlePath = middlePath + 'fruits/'
vehicleMiddlePath = middlePath + 'vehicles/'
# Output folders
# Store the noisy pictures produced by step 2
processPath = '../../../DataSet/process/'
animalProcessPath = processPath + 'animals/'
fruitProcessPath = processPath + 'fruits/'
vehicleProcessPath = processPath + 'vehicles/'
# Process each category: augment into middle/, then add noise into process/
for categoryPath, categoryMiddlePath, categoryProcessPath in (
        (fruitPath, fruitMiddlePath, fruitProcessPath),
        (animalPath, animalMiddlePath, animalProcessPath),
        (vehiclePath, vehicleMiddlePath, vehicleProcessPath)):
    for folder_name in os.listdir(categoryPath):
        rename_count = 1
        # Read the subfolders under each directory
        sub_folder = categoryPath + folder_name + '/'
        middle_save_path = categoryMiddlePath + folder_name + '/'
        final_save_path = categoryProcessPath + folder_name + '/'
        # Neither ImageDataGenerator(save_to_dir=...) nor cv.imwrite creates missing folders
        os.makedirs(middle_save_path, exist_ok=True)
        os.makedirs(final_save_path, exist_ok=True)
        # Step 1: data augmentation, 10 augmented copies per original picture
        for raw_pic in os.listdir(sub_folder):
            raw_pic_path = sub_folder + raw_pic
            for i in range(0, 10):
                Date_Enhancement(raw_pic_path, middle_save_path)
        print(folder_name, "Enhance Done")
        # Step 2: add Gaussian noise and save as <class>_<n>.jpg
        for middle_pic in os.listdir(middle_save_path):
            middle_save_name = middle_save_path + middle_pic
            final_save_name = final_save_path + "%s_%d.jpg" % (folder_name, rename_count)
            noise, out = Gasuss_Noise(middle_save_name, mean=0, var=0.002)
            cv.imwrite(final_save_name, out)
            rename_count = rename_count + 1
        print(folder_name, "Process Done")
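Both brightness operations in `Date_Enhancement` are multiplicative: `image.point(lambda p: p * 0.6)` scales every channel value to 60%, and `brightness_range=[0.9, 1.1]` then draws one factor per picture uniformly from that interval. Ignoring rounding details in PIL, their combined effect on pixel values can be sketched in NumPy as below; `darken_and_jitter` is a hypothetical helper for illustration, not a Keras or PIL function:

```python
import numpy as np


def darken_and_jitter(image, darken=0.6, brightness_range=(0.9, 1.1), rng=None):
    """Approximate the darkening + random brightness steps on a uint8 image."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(*brightness_range)          # one factor for the whole picture
    x = image.astype(np.float64) * darken * factor   # both steps just scale pixel values
    return np.clip(np.round(x), 0, 255).astype(np.uint8)
```

With the settings above, the saved copies therefore end up at roughly 54% to 66% of the original brightness.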
3. Dataset preparation
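【Note】Labels are assigned in the order in which the class folders are visited, which here is alphabetical: first the categories (animals, fruits, vehicles), then the classes inside each one. The labels are therefore: 0 cat, 1 cattle, 2 dog, 3 horse, 4 pig, 5 apple, 6 banana, 7 durian, 8 grape, 9 orange, 10 bus, 11 car, 12 plane, 13 ship, 14 train. Note that os.listdir itself does not guarantee any particular order (on Windows it is usually alphabetical, on Linux often not), so sorting the folder names is the safe way to get this order.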
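Since a picture's label is just the position of its class folder in the traversal, the mapping can be made explicit and independent of the file system's listing order by sorting both folder levels. A minimal sketch, assuming the `process/<category>/<class>/` layout used above; `build_label_map` is a hypothetical helper name, and unlike the original loop it skips stray files between the folders:

```python
import os


def build_label_map(root):
    """Map each class folder under root/<category>/<class>/ to an integer label.

    Categories and classes are both visited in sorted order, so the labels
    do not depend on the order in which the file system lists directories.
    """
    label_map = {}
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue  # ignore stray files at the category level
        for class_name in sorted(os.listdir(category_dir)):
            if os.path.isdir(os.path.join(category_dir, class_name)):
                label_map[class_name] = len(label_map)  # next free label
    return label_map
```

For the folder structure in section 1 this should yield the same 0 to 14 order as listed above.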

- Implementation code:
import os
import cv2 as cv
import numpy as np
make = True   # build pic.npy / label.npy from the processed pictures
check = True  # display the saved pictures one by one with their labels
# Picture address
picPath = '../../../DataSet/process/'
savePath = '../../../DataSet/dataSet/'
if __name__ == "__main__":
    if make:
        all_data = []
        all_label = []
        i = -1
        # sorted(): os.listdir returns names in arbitrary order, and the label index depends on this order
        for fruit_animal_vehicle in sorted(os.listdir(picPath)):
            for apple_cat_bus in sorted(os.listdir(picPath + fruit_animal_vehicle + '/')):
                i = i + 1  # one label per class folder
                class_dir = picPath + fruit_animal_vehicle + '/' + apple_cat_bus + '/'
                for pic in os.listdir(class_dir):
                    extension = os.path.splitext(pic)[-1]
                    if extension == '.jpg':
                        # cv.imread returns the image in BGR order, or None if it cannot be read
                        img = cv.imread(class_dir + pic)
                        if img is None:
                            continue
                        img = cv.resize(img, (32, 32))[..., (2, 1, 0)]  # resize to 32x32, BGR -> RGB
                        all_data.append(img)  # append the picture to the data list
                        all_label.append(i)  # append its class index to the label list
        all_data = np.asarray(all_data)
        all_label = np.asarray(all_label)
        os.makedirs(savePath, exist_ok=True)  # np.save does not create missing folders
        np.save(savePath + "pic", all_data)  # written as pic.npy
        np.save(savePath + "label", all_label)  # written as label.npy
    if check:
        x = np.load(savePath + "pic.npy")
        y = np.load(savePath + "label.npy")
        label = ["cat", "cattle", "dog", "horse", "pig"] \
            + ["apple", "banana", "durian", "grape", "orange"] \
            + ["bus", "car", "plane", "ship", "train"]
        count = 0
        for d, idx in zip(x, y):  # zip pairs each picture with its label
            print("Class %s %d" % (label[idx], count))
            d = cv.resize(d, (395, 395))[..., (2, 1, 0)]  # enlarge for display, RGB -> BGR for imshow
            cv.imshow("img", d)
            count = count + 1
            cv.waitKey(0)  # press any key to show the next picture
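`cv.imshow` needs a display. On a headless machine, or simply as a quicker sanity check, the saved arrays can be inspected numerically instead. A sketch using only NumPy, where `summarize_dataset` is a hypothetical helper name and the class names are the ones from the `check` block:

```python
import numpy as np

# Class names in label order (same list as in the check block)
CLASS_NAMES = ["cat", "cattle", "dog", "horse", "pig",
               "apple", "banana", "durian", "grape", "orange",
               "bus", "car", "plane", "ship", "train"]


def summarize_dataset(x, y, class_names=CLASS_NAMES):
    """Check the picture/label arrays and count pictures per class."""
    if x.ndim != 4 or x.shape[1:] != (32, 32, 3):
        raise ValueError("expected N x 32 x 32 x 3 pictures, got %s" % (x.shape,))
    if len(x) != len(y):
        raise ValueError("%d pictures but %d labels" % (len(x), len(y)))
    if len(y) and y.max() >= len(class_names):
        raise ValueError("label %d has no class name" % y.max())
    counts = np.bincount(y, minlength=len(class_names))  # pictures per label index
    return {name: int(c) for name, c in zip(class_names, counts)}
```

Calling it on `np.load(".../pic.npy")` and `np.load(".../label.npy")` returns a class-name to count dict; a class with zero or far fewer pictures than the others usually points to a missing or misnamed folder.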