YOLOv5: summary of training on a personal dataset
2022-06-11 04:52:00 【Panbohhhhh】
This post records the process of running my code on a GPU server, the problems I ran into, and how I solved them.
Some basic background knowledge is assumed.
Disclaimer
As is customary, this goes first.
-----------------------------------------------------------------------------------------------
Disclaimer: this article is for personal study only. Because the author's ability is limited, the views and information here are for reference only.
If anything here infringes on your rights, please contact me.
Questions and discussion from readers are welcome.
Our goal is the sea of stars!
1. Preparation
For company reasons I was told that xshell and xftp are unlicensed and that I may not use them, so the first problem was how to log in to the server remotely.
01. Connect to the server remotely through VS Code
(1) Install the remote plugin.
Install the Remote Development extension pack; it automatically installs several related extensions.

(2) Connect to the server


Click the plus sign here (hidden by the watermark in the screenshot), or press Ctrl+Shift+P and enter Remote-SSH: Connect to Host.

Enter the IP address of the server you want to access.
You can type the password every time, or set up the SSH configuration yourself:

Host: the name of the connection (your choice)
HostName: the address of the server
IdentityFile: the path to the local id_rsa file, required for password-free login
User: the user name used to log in to the server
Port: the port number
IdentitiesOnly: set to yes for password-free login
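To make the fields above concrete, here is a minimal Python sketch that renders one entry in OpenSSH client config syntax. The function name `render_ssh_host_entry` and the alias, address, and user name are hypothetical values for illustration, not settings from my server.

```python
def render_ssh_host_entry(alias, host_name, user, port=22, identity_file=None):
    """Return one Host block in OpenSSH client config syntax."""
    lines = [
        f"Host {alias}",
        f"    HostName {host_name}",
        f"    User {user}",
        f"    Port {port}",
    ]
    if identity_file:
        # Key-based (password-free) login: point at the private key and
        # tell ssh to offer only that key.
        lines.append(f"    IdentityFile {identity_file}")
        lines.append("    IdentitiesOnly yes")
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
print(render_ssh_host_entry("gpu-server", "192.0.2.10", "alice",
                            identity_file="~/.ssh/id_rsa"))
```

The printed block can be pasted into the SSH config file that the Remote-SSH extension opens; without `identity_file` the entry falls back to password login.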
02. Install Anaconda and create a virtual environment
(1) Download Anaconda
# Switch to the root user and enter the password
su root
# Go to the /root directory, where the installer script will be stored
cd /root
# Download the Anaconda installer script (this tutorial uses the Tsinghua mirror)
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2020.07-Linux-x86_64.sh
# Install Anaconda
bash Anaconda3-2020.07-Linux-x86_64.sh
# Please, press ENTER to continue -> press Enter to continue
# Read the license agreement, pressing Enter to page through it (Ctrl+C to skip)
# Do you accept the license terms? [yes|no] -> type yes and press Enter to accept
# Anaconda3 will now be installed into this location -> choose the installation path (this article uses /usr/local/anaconda3) and wait for the installation
# by running conda init? [yes|no] -> whether to add conda to the shell environment; type yes and press Enter
# Wait for the installation to finish
# Refresh the current user's environment (this activates conda)
source ~/.bashrc
# Add the domestic (Tsinghua) mirror channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
# Create a new virtual environment
conda create -n python36 python=3.6.5
# Delete a virtual environment
conda remove -n python36 --all
# List all environments
conda env list
# Activate the virtual environment
conda activate python36
# Exit the current virtual environment (no environment name is needed)
conda deactivate
Be sure not to skip the refresh step (source ~/.bashrc).
(2) Installing requirements.txt with conda when pip reports errors
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip3 install -r requirements.txt
# If pip fails, try each requirement with conda first and fall back to pip
while read requirement; do conda install --yes $requirement || pip install $requirement; done < requirements.txt
03. Training on your own dataset
(1) The number of classes in your dataset must match the number in the yaml file. Here I use yolov5s with 2 classes.

(2) Copy a yaml file and edit it to match your own data.
Under the data directory, create your own xxx.yaml.
Very little needs to change: the training set, the test set, the number of classes, and the class names.
One thing to note: there must be a space after train:, followed by the path to your data. Be careful, this is a folder. I had seen a CSDN post that used train.txt there, which made me think I had done something wrong.
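As a rough sketch of what such a file contains, the Python snippet below writes a minimal dataset yaml with the keys `train`, `val`, `nc`, and `names`. The helper name `write_dataset_yaml`, the paths, and the class names are placeholders of my own, so adjust them to your data.

```python
from pathlib import Path


def write_dataset_yaml(yaml_path, train_dir, val_dir, class_names):
    """Write a minimal YOLOv5-style dataset description file and return its text."""
    names = ", ".join(f"'{name}'" for name in class_names)
    text = (
        f"train: {train_dir}\n"  # a space must follow the colon
        f"val: {val_dir}\n"
        f"nc: {len(class_names)}\n"  # number of classes
        f"names: [{names}]\n"  # class names, in class-id order
    )
    Path(yaml_path).write_text(text, encoding="utf-8")
    return text
```

For example, `write_dataset_yaml("data/xxx.yaml", "/abs/path/images/train", "/abs/path/images/val", ["class_a", "class_b"])` produces a two-class file that points at image folders rather than at train.txt.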

Here I ran into a problem.
The data I received was fairly clean: Annotations already contained labels in YOLOv5 .txt format, and JPEGImages contained the original images.
Details to watch:

├── data
│ ├── Annotations   label files for detection tasks, in xml format; file names correspond one-to-one with image names
│ ├── images   image files in .jpg format
│ ├── ImageSets   dataset split files for classification and detection: train.txt, val.txt, trainval.txt, test.txt
│ ├── labels   txt label files, one per image
├── ImageSets (a train/val/test split of 8:1:1 is suggested)
│ ├── train.txt   names of the images used for training
│ ├── val.txt   names of the images used for validation
│ ├── trainval.txt   the union of train and val
│ ├── test.txt   names of the images used for testing
YOLOv5 uses txt labels: each image has a corresponding txt file, and each line in the file describes one object in the form class, x_center, y_center, width, height, with the four box values normalized by the image width and height.
The format is as follows:
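A small worked example may help here. The sketch below turns one pixel-space box into a label line; the image size and box coordinates are made-up numbers, and `voc_box_to_yolo_line` is a name I chose for illustration.

```python
def voc_box_to_yolo_line(class_id, img_w, img_h, xmin, ymin, xmax, ymax):
    """Format one box as 'class x_center y_center width height' (normalized)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 640x480 image with one object of class 0 in the box (100, 120)-(300, 360).
print(voc_box_to_yolo_line(0, 640, 480, 100, 120, 300, 360))
# prints: 0 0.312500 0.500000 0.312500 0.500000
```

All four box values end up between 0 and 1, which is a quick way to spot a label file that was written in pixel units by mistake.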

So here is the issue.
Many people's data is in the standard xml format.
Step one: convert the xml files to txt.
xml2txt.py
# -*- coding: utf-8 -*-
import os
import xml.etree.ElementTree as ET
from os import getcwd

sets = ['train', 'test', 'val']
# Placeholder names: replace with the class names used in your own xml files.
classes = ['1', '2', '3', '4', '5']
# Example of a longer class list from another project (kept for reference, not used):
'''
classes = ['asamu', 'baishikele', 'baokuangli', 'aoliao', 'bingqilinniunai', 'chapai', 'fenda', 'guolicheng',
'haoliyou', 'heweidao', 'hongniu', 'hongniu2', 'hongshaoniurou', 'kafei', 'kaomo_gali', 'kaomo_jiaoyan',
'kaomo_shaokao', 'kaomo_xiangcon', 'kele', 'laotansuancai', 'liaomian', 'lingdukele', 'maidong', 'mangguoxiaolao',
'moliqingcha', 'niunai', 'qinningshui', 'quchenshixiangcao', 'rousongbing', 'suanlafen', 'tangdaren', 'wangzainiunai',
'weic', 'weitanai', 'weitaningmeng', 'wulongcha', 'xuebi', 'xuebi2', 'yingyangkuaixian', 'yuanqishui', 'xuebi-b', 'kebike',
'tangdaren3', 'chacui', 'heweidao2', 'youyanggudong', 'baishikele-2', 'heweidao3', 'yibao', 'kele-b', 'AD', 'jianjiao', 'yezhi',
'libaojian', 'nongfushanquan', 'weitanaiditang', 'ufo', 'zihaiguo', 'nfc', 'yitengyuan', 'xianglaniurou', 'gudasao', 'buding',
'ufo2', 'damaicha', 'chapai2', 'tangdaren2', 'suanlaniurou', 'bingtangxueli', 'weitaningmeng-bottle', 'liziyuan', 'yousuanru',
'rancha-1', 'rancha-2', 'wanglaoji', 'weitanai2', 'qingdaowangzi-1', 'qingdaowangzi-2', 'binghongcha', 'aerbeisi', 'lujikafei',
'kele-b-2', 'anmuxi', 'xianguolao', 'haitai', 'youlemei', 'weiweidounai', 'jindian', '3jia2', 'meiniye', 'rusuanjunqishui',
'taipingshuda', 'yida', 'haochidian', 'wuhounaicha', 'baicha', 'lingdukele-b', 'jianlibao', 'lujiaoxiang', '3+2-2',
'luxiangniurou', 'dongpeng', 'dongpeng-b', 'xianxiayuban', 'niudufen', 'zaocanmofang', 'wanglaoji-c', 'mengniu',
'mengniuzaocan', 'guolicheng2', 'daofandian1', 'daofandian2', 'daofandian3', 'daofandian4', 'yingyingquqi', 'lefuqiu']
'''
def convert(size, box):
    """Convert (xmin, xmax, ymin, ymax) in pixels to normalized (x, y, w, h)."""
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


path = 'your path'  # dataset root ending with '/': for /a/Annotations use '/a/'


def convert_annotation(image_id):
    """Read Annotations/<image_id>.xml and write labels/<image_id>.txt."""
    tree = ET.parse(path + 'Annotations/%s.xml' % image_id)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    with open(path + 'labels/%s.txt' % image_id, 'w') as out_file:
        for obj in root.iter('object'):
            cls = obj.find('name').text
            if cls not in classes:  # skip objects whose class is not listed
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


wd = getcwd()
print(wd)
os.makedirs(path + 'labels/', exist_ok=True)
for image_set in sets:
    with open(path + 'ImageSets/Main/%s.txt' % image_set) as f:
        image_ids = f.read().strip().split()
    # Write the image list for this split and convert each annotation.
    with open(path + '%s.txt' % image_set, 'w') as list_file:
        for image_id in image_ids:
            list_file.write(path + 'images/%s.jpg\n' % image_id)
            convert_annotation(image_id)
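After a conversion like this it is worth checking the generated label files before training. The following is only a sketch of such a check, under the assumption that every non-empty line should have five numeric fields with box values in [0, 1]; `check_label_file` is a hypothetical helper, not part of YOLOv5.

```python
from pathlib import Path


def check_label_file(label_path, num_classes):
    """Return a list of human-readable problems found in one label file."""
    problems = []
    text = Path(label_path).read_text(encoding="utf-8")
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append(f"line {line_no}: expected 5 fields, got {len(fields)}")
            continue
        try:
            class_id = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {line_no}: class id {class_id} out of range")
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append(f"line {line_no}: coordinate outside [0, 1]")
    return problems
```

Running it over every file in labels/ and printing the non-empty results gives a quick list of files to inspect by hand.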
Step two: split the whole dataset. For example, with 100 images, 80 are used for training and 20 for testing.
# -*- coding: utf-8 -*-
import os
import random

# Fraction of all samples held out from training (listed in trainval.txt)
trainval_percent = 0.2
# Fraction of the held-out samples written to test.txt; the rest go to val.txt
train_percent = 0.8
xmlfilepath = 'hat_data/Annotations'
txtsavepath = 'hat_data/ImageSets/Main/'
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
held_out = random.sample(indices, tv)
test_part = random.sample(held_out, tr)
ftrainval = open(txtsavepath + 'trainval.txt', 'w')
ftest = open(txtsavepath + 'test.txt', 'w')
ftrain = open(txtsavepath + 'train.txt', 'w')
fval = open(txtsavepath + 'val.txt', 'w')
for i in indices:
    name = total_xml[i][:-4] + '\n'  # file name without the 4-character extension
    if i in held_out:
        ftrainval.write(name)
        if i in test_part:
            ftest.write(name)
        else:
            fval.write(name)
    else:
        ftrain.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
This generates four files: trainval.txt, test.txt, train.txt and val.txt.
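If you would rather follow the 8:1:1 suggestion from the directory listing and get the same split on every run, a seeded variant can be sketched as follows. This is an alternative of my own with a hypothetical function name `split_ids`, not the script I actually ran.

```python
import random


def split_ids(ids, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle ids reproducibly and split them into train / val / test lists."""
    shuffled = sorted(ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test
```

With 100 image names this returns lists of 80, 10, and 10 entries, and each list can be written to train.txt, val.txt, and test.txt one name per line.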

Step three: sort the files into folders according to the split lists.
YOLOv5 expects the folders to be named images and labels by default. You can use your own folder names, but then you have to change the code in utils/datasets.py; either way works.
# -*- coding: utf-8 -*-
import os
import shutil


def locate(path):
    """Return the file stems listed in val.txt."""
    with open(path + 'val.txt', 'r') as f:
        return [line.strip() for line in f if line.strip()]


def object_save(path, pic, pic_save):
    """Copy files from pic into pic_save/val or pic_save/train according to val.txt."""
    temp = locate(path)
    os.makedirs(pic_save + 'val', exist_ok=True)
    os.makedirs(pic_save + 'train', exist_ok=True)
    for i in os.listdir(pic):
        if os.path.splitext(i)[0] in temp:
            print(i)
            shutil.copyfile(pic + i, pic_save + 'val/' + i)
        else:
            # Everything not listed in val.txt is copied to train
            shutil.copyfile(pic + i, pic_save + 'train/' + i)


if __name__ == '__main__':
    ImageSets_path = './ImageSets/Main/'
    pic_path = './JPEGImages/'
    pic_save_path = './images/'
    object_save(ImageSets_path, pic_path, pic_save_path)
    # In my dataset Annotations already holds the YOLO-format .txt labels
    labels_path = './Annotations/'
    labels_save_path = './labels/'
    object_save(ImageSets_path, labels_path, labels_save_path)
(3) Modify train.py

(4) Run python train.py
Assorted bugs
1. AssertionError: train: No labels in images/train.cache. Can not train without labels. See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
Most of what I found online says this problem is caused by the data format conversion, but I checked again and again and found nothing wrong there.

After trying one change after another, replacing the relative paths in the yaml with absolute paths fixed it.
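A related check that might have saved me some time: as far as I understand, YOLOv5 finds each label by swapping the last images directory in the image path for labels and changing the extension to .txt. The sketch below applies that rule with absolute paths and lists images whose label file is missing. The function names and the set of image extensions are my own assumptions.

```python
import os
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image types


def expected_label_path(image_path):
    """Map .../images/.../name.jpg to .../labels/.../name.txt (absolute paths)."""
    image_path = str(Path(image_path).resolve())
    marker = os.sep + "images" + os.sep
    head, sep, tail = image_path.rpartition(marker)  # split at the last "images" dir
    if not sep:
        return None  # no "images" directory anywhere in the path
    return Path(head + os.sep + "labels" + os.sep + tail).with_suffix(".txt")


def images_missing_labels(images_dir):
    """Return the images under images_dir that have no matching label file."""
    missing = []
    for image in sorted(Path(images_dir).rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = expected_label_path(image)
        if label is None or not label.is_file():
            missing.append(image)
    return missing
```

If `images_missing_labels("/abs/path/to/images/train")` returns every image, the folder layout or the path in the yaml is the likely culprit rather than the label contents.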
