
Notes on training YOLOv5 on a custom dataset

2022-06-11 04:52:00 Panbohhhhh

I have recorded here the process, problems, and solutions from running my code on a GPU server.

Some basic knowledge is required .

Statement

Following international practice, the disclaimer goes first.

-----------------------------------------------------------------------------------------------

Disclaimer: This article is for personal study only. Because of the author's limited ability, the views and information here are for reference only.

If anything here infringes on your rights, please contact me.

Readers are welcome to ask questions and exchange ideas.

Our goal is the sea of stars!

1. Preparations

Because of company policy, I was told that Xshell and Xftp are unlicensed and I cannot use them. So the first problem became how to log in to the server remotely.

01. Connect to the server remotely through VS Code

(1) Install the Remote extension.

Install the Remote Development extension; it automatically installs the related remote extensions.

(2) Connect to the server

Click the plus sign (in my screenshot it is hidden behind the watermark), or press Shift+Ctrl+P and type Remote-SSH: Connect to Host.

Enter the IP address of the server you want to access remotely.

You can either enter the password every time, or configure password-free login yourself.

Host is the name of the connection (anything you like)

HostName is the server's address

IdentityFile is required for password-free login; it is the path to the local id_rsa (private key) file

User is the username for logging in to the server

Port is the port number

IdentitiesOnly should be set to yes for password-free login
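Putting those keys together, a `~/.ssh/config` entry for password-free login might look like this (the host name, address, and user below are placeholders, not my real server):

```
Host my-gpu-server
    HostName 192.168.1.100
    User ubuntu
    Port 22
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes
```

With this in place, `my-gpu-server` shows up directly in the Remote-SSH host list.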

 

02. Install a virtual environment via Anaconda

(1) Download Anaconda

# Switch to the root user and enter the password
su root
# Enter the root home directory, where the install script will be kept
cd /root
# Download the Anaconda setup script (this tutorial uses the Tsinghua mirror)
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2020.07-Linux-x86_64.sh
# Install Anaconda
bash Anaconda3-2020.07-Linux-x86_64.sh

# Please, press ENTER to continue -> press Enter to continue
# Read the license agreement; press Enter to scroll through it (Ctrl+C to skip)
# Do you accept the license terms? [yes|no] -> type yes and press Enter
# Anaconda3 will now be installed into this location -> choose the install path (this article uses /usr/local/anaconda3), then wait
# by running conda init? [yes|no] -> whether to add conda to the shell environment; type yes and press Enter
# Wait for the installation to finish

# Refresh the current user environment (activate it)
source ~/.bashrc



# Add domestic (Tsinghua mirror) channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

# Create a new virtual environment
conda create -n python36 python=3.6.5
# Delete a virtual environment
conda remove -n python36 --all

# List all environments
conda env list
# Activate a virtual environment
conda activate python36
# Exit the current virtual environment (conda deactivate takes no environment name)
conda deactivate

Be sure to remember the refresh step above: run source ~/.bashrc after installing.

(2) Installing requirements.txt with conda when pip reports errors

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip3 install -r requirements.txt

If pip fails on some packages, this loop tries conda first for each requirement and falls back to pip:

while read requirement; do conda install --yes $requirement || pip install $requirement; done < requirements.txt

03. Workflow for training on your own dataset

(1) Know how many classes your dataset has; this must match the yaml file. Here I use yolov5s with 2 classes.

(2) Copy a yaml file and modify it according to your own data.

Under the data directory, create your own xxx.yaml.

Very little needs to be modified: the training set path, the validation set path, the number of classes, and the class names.

But note: there must be a space after train: before the path to your data, and the path is a folder. I once saw a CSDN post that said it should point to train.txt and thought something was wrong with mine.
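For reference, a minimal xxx.yaml might look like the following (the paths and class names are made-up placeholders; adjust them to your own data):

```yaml
# xxx.yaml -- placeholder paths and class names
train: /home/user/mydata/images/train  # folder of training images (note the space after the colon)
val: /home/user/mydata/images/val      # folder of validation images

nc: 2                                  # number of classes
names: ['class_a', 'class_b']          # class names, in class-id order
```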

I ran into a question here:

The data I received was relatively clean: Annotations already contained labels in the yolov5 .txt format, and JPEGImages contained the original images.

Pay attention to the details:

├── data
│   ├── Annotations   label files for the detection task, in .xml form, one file per image, named after the image
│   ├── images        .jpg image files
│   ├── ImageSets     dataset split files for classification and detection: train.txt, val.txt, trainval.txt, test.txt
│   ├── labels        label .txt files, one per image


├── ImageSets (an 8:1:1 train/val/test split is recommended)
│   ├── train.txt     names of the images used for training
│   ├── val.txt       names of the images used for validation
│   ├── trainval.txt  the union of train and val
│   ├── test.txt      names of the images used for testing

 

yolov5 uses the txt label format: each image corresponds to one txt file, and each line of that file describes one target as class x_center y_center width height, with all coordinates normalized by the image size.

The format is as follows :
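For example, a label file for an image containing two targets might look like this (the numbers here are made up for illustration; every coordinate lies between 0 and 1):

```
0 0.481719 0.634028 0.690625 0.713278
1 0.741094 0.524306 0.314750 0.133389
```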

So here comes the question: many people's data are in the standard xml format.

Step one: convert the xml files to txt.

xml2txt.py

# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = ['train', 'test', 'val']

# Replace with your own class names; the position in this list becomes the class id.
classes = ['class_a', 'class_b']

'''
classes = ['asamu', 'baishikele', 'baokuangli', 'aoliao', 'bingqilinniunai', 'chapai', 'fenda', 'guolicheng', 
            'haoliyou', 'heweidao', 'hongniu', 'hongniu2', 'hongshaoniurou', 'kafei', 'kaomo_gali', 'kaomo_jiaoyan', 
            'kaomo_shaokao', 'kaomo_xiangcon', 'kele', 'laotansuancai', 'liaomian', 'lingdukele', 'maidong', 'mangguoxiaolao', 
            'moliqingcha', 'niunai', 'qinningshui', 'quchenshixiangcao', 'rousongbing', 'suanlafen', 'tangdaren', 'wangzainiunai', 
            'weic', 'weitanai', 'weitaningmeng', 'wulongcha', 'xuebi', 'xuebi2', 'yingyangkuaixian', 'yuanqishui', 'xuebi-b', 'kebike', 
            'tangdaren3', 'chacui', 'heweidao2', 'youyanggudong', 'baishikele-2', 'heweidao3', 'yibao', 'kele-b', 'AD', 'jianjiao', 'yezhi', 
            'libaojian', 'nongfushanquan', 'weitanaiditang', 'ufo', 'zihaiguo', 'nfc', 'yitengyuan', 'xianglaniurou', 'gudasao', 'buding', 
            'ufo2', 'damaicha', 'chapai2', 'tangdaren2', 'suanlaniurou', 'bingtangxueli', 'weitaningmeng-bottle', 'liziyuan', 'yousuanru', 
            'rancha-1', 'rancha-2', 'wanglaoji', 'weitanai2', 'qingdaowangzi-1', 'qingdaowangzi-2', 'binghongcha', 'aerbeisi', 'lujikafei',
            'kele-b-2', 'anmuxi', 'xianguolao', 'haitai', 'youlemei', 'weiweidounai', 'jindian', '3jia2', 'meiniye', 'rusuanjunqishui',
            'taipingshuda', 'yida', 'haochidian', 'wuhounaicha', 'baicha', 'lingdukele-b', 'jianlibao', 'lujiaoxiang', '3+2-2', 
            'luxiangniurou', 'dongpeng', 'dongpeng-b', 'xianxiayuban', 'niudufen', 'zaocanmofang', 'wanglaoji-c', 'mengniu', 
            'mengniuzaocan', 'guolicheng2', 'daofandian1', 'daofandian2', 'daofandian3', 'daofandian4', 'yingyingquqi', 'lefuqiu']
'''
def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)
    
path = 'your path'  # the directory containing Annotations/, labels/, images/ and ImageSets/; must end with '/'

def convert_annotation(image_id):
    in_file = open(path+'Annotations/%s.xml' % (image_id))
    #print(111,in_file)
    out_file = open(path+'labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        #try:
            #difficult = obj.find('difficult').text
        #except AttributeError:
        #    difficult = obj.find('Difficult').text
        cls = obj.find('name').text
        if cls not in classes:# or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
        
wd = getcwd()
print(wd)
for image_set in sets:
    if not os.path.exists(path+'labels/'):
        os.makedirs(path+'labels/')
    image_ids = open(path+'ImageSets/Main/%s.txt' % (image_set)).read().strip().split()
    #print(111,image_ids)
    list_file = open(path+'%s.txt' % (image_set), 'w')
    for image_id in image_ids:
        list_file.write(path+'images/%s.jpg\n' % (image_id))

        convert_annotation(image_id)
    list_file.close()
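As a sanity check on the conversion logic, here is the convert() function from the script above applied to a hand-computed box (the image size and box coordinates are made up for the example):

```python
# Same math as convert() in xml2txt.py: (xmin, xmax, ymin, ymax) in pixels
# -> (x_center, y_center, width, height) normalized by image size.
def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

# A 640x480 image with a box from (xmin=100, ymin=50) to (xmax=300, ymax=150):
# center (200, 100), size (200, 100), so normalized (0.3125, ~0.2083, 0.3125, ~0.2083)
x, y, w, h = convert((640, 480), (100, 300, 50, 150))
print(x, y, w, h)
```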


Step two: split the whole dataset. For example, if you have 100 images, use 80 for training and 20 for testing.

# -*- coding: utf-8 -*-
import os
import random

# Note the naming here: with these values the final split is
# 80% train, 16% test, 4% val (see the loop below).
trainval_percent = 0.2
train_percent = 0.8
xmlfilepath = 'hat_data/Annotations'
txtsavepath = 'hat_data/ImageSets/Main/'
if not os.path.exists(txtsavepath):
    os.makedirs(txtsavepath)
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(txtsavepath + 'trainval.txt', 'w')
ftest = open(txtsavepath + 'test.txt', 'w')
ftrain = open(txtsavepath + 'train.txt', 'w')
fval = open(txtsavepath + 'val.txt', 'w')
for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftest.write(name)
        else:
            fval.write(name)
    else:
        ftrain.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
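The naming in this script is slightly confusing: everything not sampled into trainval goes to train.txt, 80% of trainval goes to test.txt, and the remainder goes to val.txt. A standalone simulation of the same sampling on 100 dummy files shows the resulting sizes:

```python
import random

# Simulate the split above on 100 dummy files to see where each index lands.
num = 100
indices = range(num)
tv = int(num * 0.2)   # 20 indices go into the trainval pool
tr = int(tv * 0.8)    # 16 of those are sampled again

trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

test_set = [i for i in trainval if i in train]         # written to test.txt
val_set = [i for i in trainval if i not in train]      # written to val.txt
train_set = [i for i in indices if i not in trainval]  # written to train.txt

print(len(train_set), len(test_set), len(val_set))  # 80 16 4
```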



This will generate several files.

Step three: divide the image and label files according to the split files.

The standard yolov5 layout uses images and labels folders. You can change these to your own folder names, but then you have to modify the corresponding code in /data/dataset.py. Either way works.

# -*- coding: utf-8 -*-
import os
import shutil


def locate(path):
    """Read the image names listed in val.txt."""
    with open(path + 'val.txt') as f:
        return [line.strip() for line in f]


def object_save(path, pic, pic_save):
    temp = locate(path)
    if not os.path.exists(pic_save + 'val'):
        os.makedirs(pic_save + 'val')
    if not os.path.exists(pic_save + 'train'):
        os.makedirs(pic_save + 'train')
    for i in os.listdir(pic):
        if i[:-4] in temp:  # name without extension is in the val list
            shutil.copyfile(pic + i, pic_save + 'val/' + i)
        else:
            shutil.copyfile(pic + i, pic_save + 'train/' + i)


if __name__ == '__main__':
    ImageSets_path = './ImageSets/Main/'
    pic_path = './JPEGImages/'
    pic_save_path = './images/'
    object_save(ImageSets_path, pic_path, pic_save_path)

    labels_path = './Annotations/'
    labels_save_path = './labels/'
    object_save(ImageSets_path, labels_path, labels_save_path)
    

(3) Modify train.py (point it at your xxx.yaml and the weights you want to start from).

(4) Run python train.py

Various bugs

1.AssertionError: train: No labels in images/train.cache. Can not train without labels. See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

I checked a lot of information online; this error is usually caused by a mistake in the data format conversion. I checked my conversion again and again and found no problem.

After changing things one by one, replacing the relative paths in the yaml file with absolute paths fixed it.


Original site

Copyright notice
This article was created by [Panbohhhhh]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/03/202203020545290104.html