A custom FacePose implementation based on YOLOv5: YOLO structure walkthrough, YOLO data format conversion, YOLO pipeline modification

2022-08-05 03:26:00 【Burnt Bay】

Introduction: this post records the process of implementing a custom dataset and detection task on top of YOLOv5. Starting from the data format of the original project, it pays attention to every detail and then redoes the custom task in the same format. The goal is to independently migrate a project to a new task.

wandb: visualizing the training process

tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0

loggers['wandb'] = wandb_logger.wandb  # in train.py: Weights & Biases visualization, which requires an account

wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 1
wandb: You chose 'Create a W&B account'
wandb: Create an account here: https://wandb.ai/authorize?signup=true
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

You need to register on the wandb website; signing up through GitHub is sufficient, and you then get an API key.

Model parsing

This section covers the anchor settings and the output of the detection head.


def parse_model(d, ch):  # model_dict, input_channels(3)
    logger.info('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
    anchors, nc, nkpt, gd, gw = d['anchors'], d['nc'], d['nkpt'], d['depth_multiple'], d['width_multiple']

    # Number of anchors; here anchors = [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]]
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors per scale, na = 3

    # Keypoint extension from the paper: 3 × (1 + 5 + 2×17) = 3 × 40
    no = na * (nc + 5 + 2*nkpt)   # number of outputs = anchors * (classes + 5 + 2 * keypoints)
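To make the channel arithmetic concrete, here is a minimal sketch that evaluates the same formula for the 17-keypoint body-pose case and for a 68-keypoint face case. The helper name `outputs_per_scale` is hypothetical, and `nc=1` is an assumption taken from the `3 × (1 + 5 + 2×17)` comment above.

```python
def outputs_per_scale(anchors, nc, nkpt):
    """Return (na, no) using the formula no = na * (nc + 5 + 2 * nkpt)."""
    na = len(anchors[0]) // 2  # each anchor is a (width, height) pair
    return na, na * (nc + 5 + 2 * nkpt)


# Anchor list copied from the comment in parse_model above.
anchors = [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137],
           [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]]

na, no_pose = outputs_per_scale(anchors, nc=1, nkpt=17)  # COCO body pose
_, no_face = outputs_per_scale(anchors, nc=1, nkpt=68)   # 68 facial landmarks
print(na, no_pose, no_face)  # 3 120 426
```

Going from 17 to 68 keypoints therefore grows this count from 3 × 40 to 3 × 142 per scale.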

Relationship between the optimizer parameters and the batch size

    # Optimizer
    nbs = 64  # nominal batch size
    accumulate = max(round(nbs / total_batch_size), 1)  # accumulate loss before optimizing
	
    # The batch size itself is not changed here; weight_decay is rescaled instead, and the loss is accumulated over several batches before each optimizer step
    hyp['weight_decay'] *= total_batch_size * accumulate / nbs  # scale weight_decay
    logger.info(f"Scaled weight_decay = {hyp['weight_decay']}")
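As a small worked example, the sketch below repeats the two lines of arithmetic from the snippet above for a few batch sizes. The function name `scaled_weight_decay` is hypothetical; the defaults `weight_decay=0.0005` and `nbs=64` are the values shown in the hyperparameter log and the code above.

```python
def scaled_weight_decay(total_batch_size, weight_decay=0.0005, nbs=64):
    """Reproduce the accumulate / weight_decay arithmetic from the training script."""
    accumulate = max(round(nbs / total_batch_size), 1)  # batches accumulated per optimizer step
    scaled = weight_decay * total_batch_size * accumulate / nbs
    return accumulate, scaled


for bs in (8, 16, 64, 128):
    acc, wd = scaled_weight_decay(bs)
    print(f"batch={bs:3d}  accumulate={acc}  weight_decay={wd:.6f}")
```

For batch sizes that divide 64, the accumulated batch equals the nominal batch and the decay stays at 0.0005; once the batch size exceeds 64, the decay is scaled up proportionally.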

Image augmentation

    # class LoadImagesAndLabels(Dataset): # for training/testing
    	...
        # Mosaic augmentation
        self.mosaic = self.augment and not self.rect  # load 4 images at a time into a mosaic (only during training)
        self.mosaic_border = [-img_size // 2, -img_size // 2]
        self.stride = stride
        self.path = path
        self.kpt_label = kpt_label

        # Keypoint-specific change: index mapping that swaps left/right keypoints when an image is flipped horizontally
        self.flip_index = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
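The role of `flip_index` can be illustrated with a minimal sketch, assuming normalized `(x, y, v)` keypoints in COCO order (index 1 is the left eye, index 2 the right eye). The function `flip_keypoints_lr` is a hypothetical helper, not the project's own augmentation code.

```python
FLIP_INDEX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def flip_keypoints_lr(kpts, flip_index=FLIP_INDEX):
    """Mirror normalized (x, y, v) keypoints horizontally and swap left/right partners."""
    # Unlabeled points (v == 0) stay at (0, 0) instead of being mirrored to x = 1.
    mirrored = [(1.0 - x, y, v) if v > 0 else (x, y, v) for x, y, v in kpts]
    # After mirroring, the former left eye is on the right side, so partners trade slots.
    return [mirrored[i] for i in flip_index]


kpts = [(0.0, 0.0, 0)] * 17
kpts[1] = (0.7, 0.5, 2)  # left eye
kpts[2] = (0.4, 0.5, 2)  # right eye
flipped = flip_keypoints_lr(kpts)
print(flipped[1], flipped[2])
```

A 68-point face layout needs its own 68-entry mapping; the 17-entry list above only fits the COCO body keypoints.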

Converting between COCO and YOLO formats

Original COCO format

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...

In other words, the keypoint labels are stored in the JSON files. We can take one sample and analyze its JSON data.

The image entry in the JSON contains the file name, width and height, id, and other information:

{
	"license": 4,
	"file_name": "000000252219.jpg",
	"coco_url": "http://images.cocodataset.org/val2017/000000252219.jpg",
	"height": 428,"width": 640,
	"date_captured": "2013-11-14 22:32:02",
	"flickr_url": "http://farm4.staticflickr.com/3446/3232237447_13d84bd0a1_z.jpg",
	"id": 252219
}

The image is shown below:
[Image: 000000252219.jpg]

Its manual annotations are as follows:

{
	"num_keypoints": 17,
	"area": 8511.1568,	"iscrowd": 0,
	"keypoints": [356,198,2,358,193,2,351,194,2,364,192,2,346,194,2,375,207,2,341,211,2,388,236,2,336,238,2,392,263,2,
	343,242,2,373,271,2,347,272,2,372,316,2,348,318,2,372,353,2,355,354,2],
	"image_id": 252219,
	"bbox": [326.28,174.56,71.24,197.25],
	"category_id": 1,"id": 481918
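}

We can see that in the COCO format the keypoint annotation is a flat list of 3×17 values, one triple [x,y,v] per keypoint (num_keypoints only counts the labeled ones), where v is the visibility flag:

 - v=0: not labeled, in which case x=y=0;
 - v=1: labeled but not visible;
 - v=2: labeled and visible.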

{
	"num_keypoints": 15,
	"area": 8349.28485,"iscrowd": 0,
	"keypoints": [100,190,2,0,0,0,96,185,2,0,0,0,86,188,2,84,208,2,71,208,2,84,245,2,59,240,2,115,263,2,66,271,2,
	64,268,2,71,264,2,59,324,2,99,322,2,18,363,2,101,377,2],
	"image_id": 252219,
	"bbox": [9.79,167.06,121.94,226.45],
	"category_id": 1,
	"id": 489768
}
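A minimal sketch of how the flat `keypoints` list of this second annotation splits into `[x, y, v]` triples, and how the `v` flag relates to `num_keypoints`. The variable names are illustrative only.

```python
# "keypoints" field of the annotation with id 489768 shown above.
keypoints = [100, 190, 2, 0, 0, 0, 96, 185, 2, 0, 0, 0, 86, 188, 2, 84, 208, 2,
             71, 208, 2, 84, 245, 2, 59, 240, 2, 115, 263, 2, 66, 271, 2,
             64, 268, 2, 71, 264, 2, 59, 324, 2, 99, 322, 2, 18, 363, 2, 101, 377, 2]

# Group the flat list into one (x, y, v) triple per keypoint.
triples = [tuple(keypoints[i:i + 3]) for i in range(0, len(keypoints), 3)]

labeled = [t for t in triples if t[2] > 0]     # v = 1 or v = 2
unlabeled = [t for t in triples if t[2] == 0]  # v = 0, stored as (0, 0, 0)

print(len(triples), len(labeled), len(unlabeled))  # 17 15 2
```

The list always holds 17 triples; the 15 labeled ones match `"num_keypoints": 15`.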

The bounding box follows the **"xywh"** format, i.e. top-left corner coordinates + width + height.

YOLO format

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
        |    |-- train2017
        |    |   |-- 000000000009.jpg
        |    |   |-- 000000000025.jpg
        |    |   |-- ... 
        |    `-- val2017
        |        |-- 000000000139.jpg
        |        |-- 000000000285.jpg
        |        |-- ... 
        `-- labels
        |    |-- train2017
        |    |   |-- 000000000009.txt
        |    |   |-- 000000000025.txt   # keypoint information for this image, in YOLO format
        |    |   |-- ... 
        |    `-- val2017
        |        |-- 000000000139.txt
        |        |-- 000000000285.txt   # keypoint information for this image, in YOLO format
        |        |-- ...  
        `-- train2017.txt    # contents: relative path + image file name
        `-- val2017.txt    # contents: relative path + image file name
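The layout above pairs every image with a label file of the same name. Here is a minimal sketch of that mapping, assuming entries in train2017.txt look like `./images/train2017/000000000009.jpg` (the exact line format is an assumption). The helper `image_to_label_path` is hypothetical and is not the function YOLOv5 itself uses.

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path):
    """Map .../images/<split>/<name>.jpg to .../labels/<split>/<name>.txt."""
    parts = list(PurePosixPath(image_path).parts)
    # Replace the last "images" directory component with "labels".
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(image_to_label_path("./images/train2017/000000000009.jpg"))
print(image_to_label_path("data/coco/images/val2017/000000000139.jpg"))
```

Keeping this one-to-one naming is what lets the data loader find labels without any extra index file.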

Here is the YOLO-format information for "image_id": 252219:

0 0.565469 0.638283 0.111312 0.460864 0.556250 0.462617 2.000000 0.559375 0.450935 2.000000 0.548438 0.453271 2.000000 0.568750 
0.448598 2.000000 0.540625 0.453271 2.000000 0.585938 0.483645 2.000000 0.532813 0.492991 2.000000 0.606250 0.551402 2.000000 
0.525000 0.556075 2.000000 0.612500 0.614486 2.000000 0.535937 0.565421 2.000000 0.582812 0.633178 2.000000 0.542188 0.635514 
2.000000 0.581250 0.738318 2.000000 0.543750 0.742991 2.000000 0.581250 0.824766 2.000000 0.554688 0.827103 2.000000

0 0.110562 0.654871 0.190531 0.529089 0.156250 0.443925 2.000000 0.000000 0.000000 0.000000 0.150000 0.432243 2.000000 0.000000 
0.000000 0.000000 0.134375 0.439252 2.000000 0.131250 0.485981 2.000000 0.110937 0.485981 2.000000 0.131250 0.572430 2.000000 
0.092188 0.560748 2.000000 0.179688 0.614486 2.000000 0.103125 0.633178 2.000000 0.100000 0.626168 2.000000 0.110937 0.616822 
2.000000 0.092188 0.757009 2.000000 0.154688 0.752336 2.000000 0.028125 0.848131 2.000000 0.157812 0.880841 2.000000

0 0.894172 0.652220 0.193219 0.504112 0.837500 0.448598 1.000000 0.840625 0.439252 2.000000 0.000000 0.000000 0.000000 0.862500 
0.443925 2.000000 0.000000 0.000000 0.000000 0.887500 0.483645 2.000000 0.867188 0.485981 2.000000 0.873437 0.567757 2.000000 
0.865625 0.574766 2.000000 0.846875 0.630841 2.000000 0.859375 0.647196 2.000000 0.895312 0.640187 2.000000 0.873437 0.640187 
2.000000 0.920312 0.754673 2.000000 0.845313 0.752336 2.000000 0.964063 0.852804 2.000000 0.828125 0.843458 2.000000
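Each label above is one logical row of 1 + 4 + 3×17 = 56 numbers, wrapped over several lines here for display. A minimal sketch of splitting such a row into its parts follows; `parse_pose_label` is a hypothetical helper.

```python
def parse_pose_label(line, nkpt=17):
    """Split one label row into class id, box (cx, cy, w, h) and (x, y, v) keypoints."""
    values = [float(v) for v in line.split()]
    if len(values) != 5 + 3 * nkpt:
        raise ValueError(f"expected {5 + 3 * nkpt} values, got {len(values)}")
    cls = int(values[0])
    box = values[1:5]
    kpts = [tuple(values[i:i + 3]) for i in range(5, len(values), 3)]
    return cls, box, kpts


# First label row listed above, joined back into a single line.
line = (
    "0 0.565469 0.638283 0.111312 0.460864 0.556250 0.462617 2.000000 0.559375 0.450935 2.000000 0.548438 0.453271 2.000000 0.568750 "
    "0.448598 2.000000 0.540625 0.453271 2.000000 0.585938 0.483645 2.000000 0.532813 0.492991 2.000000 0.606250 0.551402 2.000000 "
    "0.525000 0.556075 2.000000 0.612500 0.614486 2.000000 0.535937 0.565421 2.000000 0.582812 0.633178 2.000000 0.542188 0.635514 "
    "2.000000 0.581250 0.738318 2.000000 0.543750 0.742991 2.000000 0.581250 0.824766 2.000000 0.554688 0.827103 2.000000"
)
cls, box, kpts = parse_pose_label(line)
print(cls, box, len(kpts), kpts[0])
```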

For the format conversion function, refer to the JSON2YOLO project; its algorithm is as follows:

img = images['%g' % x['image_id']]
h, w, f = img['height'], img['width'], img['file_name']

# The COCO box format is [top left x, top left y, width, height]
box = np.array(x['bbox'], dtype=np.float64)
box[:2] += box[2:] / 2  # xy top-left corner to center
box[[0, 2]] /= w  # normalize x
box[[1, 3]] /= h  # normalize y

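This shows that the YOLO format uses normalized center points: the top-left XYWH box must be converted to $C_xC_y$WH (note that all coordinates are then normalized by the width and height of the image). Let us take the original COCO annotation above and check whether we obtain the YOLO format: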

"height": 428,"width": 640,
"num_keypoints": 17,
"area": 8511.1568,	"iscrowd": 0,
"keypoints": [356,198,2,358,193,2,351,194,2,364,192,2,346,194,2,375,207,2,341,211,2,388,236,2,336,238,2,392,263,2,
343,242,2,373,271,2,347,272,2,372,316,2,348,318,2,372,353,2,355,354,2],
"image_id": 252219,
"bbox": [326.28,174.56,71.24,197.25],

Using the algorithm above, we can roughly estimate:

bbox: (326+71/2)/640=0.5648, (174+197/2)/428=0.6367, 71/640=0.1109, 197/428=0.4603
keypoints[0]: 356/640=0.55625,  198/428=0.4626

This agrees, up to the rounding of the inputs, with the result of the conversion to YOLO format:

0 0.565469 0.638283 0.111312 0.460864 0.556250 0.462617 2.000000
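The rough check can also be done with the unrounded values. The sketch below applies the same arithmetic as the JSON2YOLO snippet (in plain Python rather than NumPy) to the whole annotation. The function name `coco_to_yolo_pose` and the default `cls=0` are assumptions for illustration.

```python
def coco_to_yolo_pose(bbox, keypoints, width, height, cls=0):
    """Convert a COCO xywh box plus (x, y, v) keypoints to one normalized YOLO row."""
    x, y, w, h = bbox
    # Top-left corner to center, then normalize by the image size.
    row = [cls, (x + w / 2) / width, (y + h / 2) / height, w / width, h / height]
    for i in range(0, len(keypoints), 3):
        kx, ky, v = keypoints[i:i + 3]
        row += [kx / width, ky / height, float(v)]  # visibility flag is kept as-is
    return row


row = coco_to_yolo_pose(
    bbox=[326.28, 174.56, 71.24, 197.25],
    keypoints=[356, 198, 2, 358, 193, 2, 351, 194, 2, 364, 192, 2, 346, 194, 2,
               375, 207, 2, 341, 211, 2, 388, 236, 2, 336, 238, 2, 392, 263, 2,
               343, 242, 2, 373, 271, 2, 347, 272, 2, 372, 316, 2, 348, 318, 2,
               372, 353, 2, 355, 354, 2],
    width=640, height=428,
)
print(row[0], " ".join(f"{v:.6f}" for v in row[1:8]))
```

The first eight values printed this way should line up with the label row shown above, up to the last printed digit.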

Converting 300-W to YOLO format

300-W is a popular face database annotated with 68 facial keypoints; its faces come from several datasets, e.g. afw, ibug, and others. Its file layout is as follows:

-- data
   |-- data_300W
       |-- afw
       |-- helen
       |-- ibug
       |-- lfpw

|-- data
`-- |-- data_300W
    `-- |-- annotations
        |-- afw
        |-- helen
        |-- ibug
        |-- lfpw
        `-- images
        |    |-- train2017
        |    |   |-- 000000000009.jpg
        |    |   |-- 000000000025.jpg
        |    |   |-- ... 
        |    `-- val2017
        |        |-- 000000000139.jpg
        |        |-- 000000000285.jpg
        |        |-- ... 
        `-- labels
        |    |-- train2017
        |    |   |-- 000000000009.txt
        |    |   |-- 000000000025.txt   # keypoint information for this image, in YOLO format
        |    |   |-- ... 
        |    `-- val2017
        |        |-- 000000000139.txt
        |        |-- 000000000285.txt   # keypoint information for this image, in YOLO format
        |        |-- ...  
        `-- train2017.txt    # contents: relative path + image file name
        `-- val2017.txt    # contents: relative path + image file name

300-W format

Take data_300W/afw/1051618982_1.jpg as an example:
[Image: 1051618982_1.jpg]
The 68 facial landmarks for the image above are stored in a *.pts file, which looks like this when opened:

version: 1
n_points:  68
{
482.866335 268.009351
484.241455 298.524244
487.963820 329.985842
491.613829 359.446370
503.992490 387.443021
523.666182 409.551102
543.708366 429.090358
566.283098 442.751692
……
591.348649 385.406662
580.068281 384.385348
563.609110 379.281936
552.917511 366.852392
580.508062 371.198816
592.309498 371.492218
604.011866 371.855814
634.952400 369.536292
604.011866 371.855814
592.309498 371.492218
580.508062 371.198816
}
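A file with this layout can be read with a few lines of Python. This is a minimal sketch: `parse_pts` is a hypothetical helper, and the sample text is a shortened three-point stand-in built from the first rows above, not the real 68-point file.

```python
def parse_pts(text):
    """Parse the text of a .pts landmark file into a list of (x, y) tuples."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = next(ln for ln in lines if ln.startswith("n_points"))
    n_points = int(header.split(":")[1])
    # The coordinate rows sit between the opening and closing braces.
    start, end = lines.index("{") + 1, lines.index("}")
    points = [tuple(float(v) for v in ln.split()) for ln in lines[start:end]]
    if len(points) != n_points or any(len(p) != 2 for p in points):
        raise ValueError("file does not contain n_points (x, y) pairs")
    return points


sample = """version: 1
n_points:  3
{
482.866335 268.009351
484.241455 298.524244
487.963820 329.985842
}"""
points = parse_pts(sample)
print(len(points), points[0])
```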

There are 68 coordinate pairs $(x_i, y_i)$ in total; for brevity, some pairs in the middle are omitted. The coco2yolo format, on the other hand, looks like this:

0      xywh    (x, y)
|        |        |
|        |        `-- keypoint coordinates, normalized by the width and height of the image
|        `-- normalized bounding box: center point coordinates xy plus box width and height wh
`-- class index (0 here, since there is only one class)

Converting the 300-W format to YOLO format

In other words, the 68 facial keypoints above must be converted into the coco2yolo format. Here we follow PIPNet's preprocessing and convert the 300W folders entirely into a COCO-like file layout, including the target label format. The goal is to avoid modifying the YOLO code as much as possible.
At this point, the format conversion is complete.
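As a rough illustration of that conversion, the sketch below turns a list of landmark coordinates into one YOLO-style row. Several choices here are assumptions that the post does not spell out: the box is taken as the tight box around the landmarks, every landmark gets visibility 2, the class index is 0, and the image width and height are known. `landmarks_to_yolo_line` is a hypothetical helper and may differ from the PIPNet-based preprocessing actually used.

```python
def landmarks_to_yolo_line(points, width, height, cls=0, visibility=2.0):
    """Build 'cls cx cy w h x1 y1 v1 ...' with all coordinates normalized to [0, 1]."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    # Assumption: the face box is the tight bounding box of the landmarks.
    values = [
        (x_min + x_max) / 2 / width,
        (y_min + y_max) / 2 / height,
        (x_max - x_min) / width,
        (y_max - y_min) / height,
    ]
    for x, y in points:
        values += [x / width, y / height, visibility]
    return f"{cls} " + " ".join(f"{v:.6f}" for v in values)


# Two made-up points in a made-up 400x200 image, just to show the row layout.
label_line = landmarks_to_yolo_line([(100, 50), (300, 150)], width=400, height=200)
print(label_line)
```

With 68 landmarks the same function yields 1 + 4 + 3×68 values per row, so the loader must be told `nkpt=68`.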

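Project modifications (pitfall notes)

Quite a few changes to YOLO are involved, mainly in the following areas:

Dataset loading;
Detection head modification.

Modify the relevant settings in the launch file;
Modify the data paths in data/coco_kepts.yaml;
Modify the models/hub/cfg files, e.g. change nkpt from 17 to 68 in yolo5s6_kpts.yaml;
Modify line 497 of dataset, which controls how the txt data is read;
Modify line 987 of dataset, which controls how the data is transformed;
Modify line 365 of dataset, which controls how the data is flipped;
Modify lines 187 and 202 of the loss function, which concern loss_gain;
Line 119 of the loss function: sigmas is hard-coded there, so all values are simply set to 1;
Lines 76 and 84 of the plots function: plotting issues, not fixed yet, so plotting is skipped for now;
Modify line 90 of the yolo function, which concerns self.inplace.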
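To show what setting every sigma to 1 means, here is a minimal sketch using the COCO-style OKS formula, exp(-d² / (2 · area · (2σ)²)). This is an assumption for illustration: the exact expression in the project's loss function may differ, and both helper names are hypothetical.

```python
import math


def keypoint_sigmas(nkpt):
    """Placeholder per-keypoint sigmas: all ones, one entry per keypoint."""
    return [1.0] * nkpt


def oks_like_scores(pred, target, box_area, sigmas):
    """Per-keypoint similarity exp(-d^2 / (2 * area * (2 * sigma)^2))."""
    scores = []
    for (px, py), (tx, ty), s in zip(pred, target, sigmas):
        d2 = (px - tx) ** 2 + (py - ty) ** 2  # squared distance between prediction and target
        scores.append(math.exp(-d2 / (2.0 * box_area * (2.0 * s) ** 2 + 1e-9)))
    return scores


sigmas = keypoint_sigmas(68)
target = [(i / 68, i / 68) for i in range(68)]
shifted = [(x + 0.5, y) for x, y in target]

perfect = oks_like_scores(target, target, box_area=0.25, sigmas=sigmas)
off = oks_like_scores(shifted, target, box_area=0.25, sigmas=sigmas)
print(perfect[0], off[0])
```

With all sigmas equal, every landmark is penalized with the same tolerance, which is a simplification compared with per-keypoint values.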

train log

autoanchor: Analyzing anchors... anchors/target = 7.86, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 4 dataloader workers
Logging results to runs/train/exp10
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls       kpt      kptv     total    labels  img_size
     0/299     4.22G   0.07731    0.0573         0    0.3465   0.01299    0.4941        10       640: 100%| 787/787 [02:58<00:00,  4.41it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95: 100%|  87/87 [00:14<00:00,  6.05it/s]
                 all         689         689      0.0073       0.691     0.00784     0.00137
……

One epoch takes about 3 minutes, and there are 300 epochs in total. Looking forward to the results!