Target detection - Yolo series
2022-07-01 20:25:00 【zyw2002】
Overview of classical deep learning detection methods
What "stage" means in a detection task
- One-stage: the YOLO series
Outputs the coordinates of the top-left and bottom-right corners of the detection box, (x1,y1) and (x2,y2).
A single CNN doing regression is enough.
- Two-stage: the Faster R-CNN and Mask R-CNN series
Compared with one-stage methods, there is one extra step that first extracts candidate boxes (proposals).
An analogy:
Suppose a company wants to recruit excellent people. One-stage is a single direct audition; two-stage has two rounds of selection (equivalent to one more elimination round). One-stage is faster and suits real-time detection tasks; two-stage generally detects more accurately.
Metrics for evaluating detection
IoU (intersection over union) is the ratio of the intersection to the union of the ground-truth box and the predicted box.
- Precision and recall
TP: correctly (true) judged as a positive sample (positive); the sample really is positive.
FP: wrongly (false) judged as a positive sample (positive); the sample is really negative.
FN: wrongly (false) judged as a negative sample (negative); the sample is really positive.
TN: correctly (true) judged as a negative sample (negative); the sample really is negative.
Precision = TP / (TP + FP): of everything the model picked, the fraction that is right.
Recall = TP / (TP + FN): of everything that should be found, the fraction that was found.
An analogy:
Suppose you answer a multiple-choice question with 6 options, A~F. The right answer is ABCDE and you choose ABF. Only A and B are right, so the precision is 2/3. But C, D and E were not chosen, so the recall is 2/5.
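The analogy above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name precision_recall is made up for this example and treats the chosen options and the right answer as sets.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of labels."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # chosen and correct
    fp = len(predicted - actual)   # chosen but wrong
    fn = len(actual - predicted)   # correct but not chosen
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# The multiple-choice example: right answer ABCDE, chosen ABF
p, r = precision_recall("ABF", "ABCDE")
print(p, r)
```

Here TP = 2 (A and B), FP = 1 (F) and FN = 3 (C, D and E), which gives the 2/3 and 2/5 from the text.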
- AP and mAP
Generally speaking, precision and recall cannot both be maximized at the same time: when one is high, the other tends to be lower.
Varying the confidence threshold (a detection counts as a positive sample when its IoU with the ground truth exceeds a set threshold) gives different precision and recall values, which are plotted as a precision-recall graph (usually called the PR curve).
Take the upper envelope of the PR curve; the area between it and the x-axis is the AP value. In the figure above, it is the sum of the areas of rectangles A1, A2, A3 and A4.
Compute the AP of each category, then average them to obtain the mAP.
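A minimal sketch of the "area under the upper envelope" computation described above, in plain Python. The function names and the sample curve are assumptions made for illustration; the project's own compute_ap may differ in detail.

```python
def average_precision(recall, precision):
    """Area under the upper envelope of a PR curve.

    recall must be sorted in ascending order; precision[i] is the
    precision measured at recall[i].
    """
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Upper envelope: walking right to left, precision never decreases
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangles between consecutive recall values
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(ap_by_class):
    """mAP is the plain mean of the per-class AP values."""
    return sum(ap_by_class.values()) / len(ap_by_class)


ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap)
```

For the two-point curve above the envelope forms two rectangles, 0.5 * 1.0 and 0.5 * 0.5, so the AP is 0.75.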
YOLOv1: overall idea and network architecture
- A classic one-stage method
- You Only Look Once: the name says it all!
- Turns the detection problem into a regression problem; a single CNN does the job!
- It can detect objects in video in real time and has a wide range of applications!
The core idea:
Input an image -> divide the image into a 7*7 grid -> each grid cell gets two candidate boxes (initial width and height are chosen from experience) -> compute the IoU between the ground truth and each candidate box -> choose the candidate box with the larger IoU -> fine-tune the width and height of the chosen box -> predict the box center (x,y), the width and height w, h, and a confidence (the probability that the box contains an object)
- Network architecture
The input image is resized to 448*448*3
After many convolutions we get a 7*7*1024 feature map
The feature map is then flattened and passed through fully connected layers: the first gives 4096 features, the second gives 1470 features
A reshape then gives 7*7*30. (Each image has a 7*7 grid and each cell has 30 values: the first 10 are x,y,w,h,c for the two candidate boxes, and the last 20 are the probabilities of the 20 categories.)
x,y,w,h are normalized values, i.e. relative position and size
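The 1470 -> 7*7*30 bookkeeping can be sketched in plain Python. The row-major cell layout and the helper names below are assumptions made for illustration; a real implementation would simply reshape a tensor.

```python
S, B, C = 7, 2, 20           # grid size, boxes per cell, number of categories
DEPTH = B * 5 + C            # 30 values per cell

# Stand-in for the 1470 outputs of the second fully connected layer
flat = [float(i) for i in range(S * S * DEPTH)]


def cell_values(flat, row, col):
    """Return the 30 values of one grid cell (row-major layout assumed)."""
    start = (row * S + col) * DEPTH
    return flat[start:start + DEPTH]


def split_cell(values):
    """Split 30 values into two [x, y, w, h, c] boxes and 20 class scores."""
    boxes = [values[i * 5:(i + 1) * 5] for i in range(B)]
    class_scores = values[B * 5:]
    return boxes, class_scores


boxes, class_scores = split_cell(cell_values(flat, 0, 0))
print(len(flat), len(boxes), len(class_scores))
```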
Loss function :
- Non-maximum suppression
When detection boxes overlap heavily (their IoU is above a threshold), keep only the one with the highest confidence and suppress the rest.
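A minimal pure-Python sketch of IoU and greedy non-maximum suppression, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The function names and the 0.5 threshold are illustrative choices, not taken from the project's source.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

The first two boxes overlap with an IoU of 81/119, so only the higher-scoring one survives; the third box does not overlap anything and is kept.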
Problems with YOLOv1
Problem 1: each cell predicts only one category, so overlapping objects that fall in the same cell cannot be separated
Problem 2: detection of small objects is mediocre, and the available aspect ratios are few and fixed
YOLOv2: detailed improvements
Introducing Batch Normalization
V2 drops Dropout (V1 used it in the fully connected layers, randomly deactivating some neurons to prevent overfitting; V2 has no fully connected layer) and adds Batch Normalization after every convolution
The inputs of every layer of the network are normalized, which makes convergence easier
Adding Batch Normalization improves mAP by about 2%
Today Batch Normalization is a standard part of network design
Higher resolution
V1 was trained at 224*224 but tested at 448*448
That mismatch can leave the model poorly adapted at test time, so V2 adds 10 extra epochs of fine-tuning at 448*448 during training
With the high-resolution classifier, YOLOv2's mAP rises by about 4%
Network structure
- DarkNet19; the actual input is 416*416
- No FC layers, 5 downsampling steps (output 13*13)
- 1*1 convolutions save a lot of parameters
The 5 downsampling steps shrink the height and width by a factor of 2^5=32.
416/32=13. An odd result is preferred because the grid then has a single center cell, which makes it easy to find the cell that contains the center point
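The arithmetic above is easy to check. The helper grid_size below is hypothetical and only assumes that every downsampling step halves the feature map.

```python
def grid_size(input_size, num_downsamples):
    """Side length of the final feature map after repeated halving."""
    stride = 2 ** num_downsamples
    if input_size % stride:
        raise ValueError("input size must be a multiple of the total stride")
    return input_size // stride


for size in (416, 448):
    g = grid_size(size, 5)
    print(size, "->", g, "odd" if g % 2 else "even")
```

With 5 downsampling steps, 416 gives a 13*13 grid (odd, one center cell), whereas 448 would give 14*14 (even, no single center cell).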
- Prior boxes extracted by clustering
Clustering gives 5 prior boxes
Faster R-CNN uses prior boxes with ratios 1:1, 1:2 and 2:1, each at 3 different sizes, so there are 3*3=9 prior boxes
- Anchor Box
Introducing anchor boxes increases the number of predicted boxes (13*13*n)
Unlike the Faster R-CNN series, the prior boxes are not given directly by fixed aspect ratios
- Direct Location Prediction
- Receptive field
The receptive field is the region of the original image that one point on the feature map can see
The larger the receptive field, the more of the whole object can be perceived.
Stacking two 3*3 convolution layers gives a 5*5 receptive field.
Stacking three 3*3 convolution layers gives a 7*7 receptive field.
Why stack small convolution kernels instead of directly using one large kernel to enlarge the receptive field?
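One way to look at the question above is to compare weight counts. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the function names are made up for this example.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def conv_weights(kernel, channels, layers=1):
    """Weights of `layers` kernel*kernel convs with `channels` in and out."""
    return layers * kernel * kernel * channels * channels


C = 64  # example channel count (assumption)
print(stacked_receptive_field(3, 3))   # same coverage as a single 7*7 kernel
print(conv_weights(3, C, layers=3))    # 27 * C^2
print(conv_weights(7, C))              # 49 * C^2
```

Under these assumptions, three 3*3 layers cover the same 7*7 region with 27*C^2 weights instead of 49*C^2, and each extra layer also adds another non-linearity.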
- Feature fusion
The receptive field of the last layer is too large, so small targets may be lost; features from earlier layers need to be fused in.
The penultimate layer is split and concatenated with the last layer.
- Multi-scale
YOLOv3 core network model
The biggest improvement in V3 is the network structure, which makes it better suited to small-target detection
Features are finer-grained: multi-scale feature map information is fused to predict objects of different sizes
Prior boxes are richer: 3 scales with 3 sizes each, 9 in total
Softmax is replaced so that multi-label tasks can be predicted
Improvements:
- Multi-scale
Three scales are designed to predict large, medium and small targets
They are 52*52, 26*26 and 13*13 (the larger the grid, the smaller the candidate boxes); each box represents a different aspect ratio.
As shown below, the left side is an image pyramid, which predicts on images of different sizes but is slower.
The right side fuses feature maps of different sizes through upsampling.
- Residual connections
Idea: keep what helps and discard what does not, so the result is no worse than before.
Core network architecture: darknet53
- No pooling, because pooling compresses features
- No fully connected layers; they were already removed in v2
- Downsampling is done with stride=2 convolutions, halving the feature map size each time.
- There are three grid sizes
V1: the grid is 7*7
V2: the grid is 13*13
V3: the grids are 13*13, 26*26 and 52*52
Design of the prior boxes
9 prior boxes are again obtained by clustering.
The large prior boxes are assigned to 13*13, the medium ones to 26*26, and the small ones to 52*52.
Replacing the softmax layer
For multi-label prediction, a probability is computed for each category, and every category whose probability exceeds a threshold is assigned to the object.
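A small sketch of the thresholding idea, assuming each category gets an independent sigmoid score (so the scores need not sum to 1, unlike softmax). The names and the example logits are invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label(logits, names, threshold=0.5):
    """Return every category whose independent probability exceeds threshold."""
    return [name for name, z in zip(names, logits) if sigmoid(z) > threshold]


names = ["person", "woman", "car"]
logits = [2.0, 1.0, -3.0]
print(multi_label(logits, names))
```

With these logits both "person" and "woman" pass the 0.5 threshold, which a softmax over the three categories could not express.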
YOLOv3 source code details
Download links:
PyTorch-YOLOv3 source code download link
Pre-trained weights
COCO dataset:
References:
yolov3 source code analysis
config folder
- coco.data
classes= 80 # number of categories
train=data/coco/trainvalno5k.txt # path of the training set image list
valid=data/coco/5k.txt # path of the validation set image list
names=data/coco.names # category names
backup=backup/ # where checkpoints are stored
eval=coco # which mAP calculation method to use
- create_custom_model.sh
Script file: users can customize their own model. Running it generates the configuration file of the custom model, yolov3-custom.cfg, which can be compared with yolov3.cfg
- custom.data
Information about your own dataset, used to train your own detection task: number of categories, training set path, validation set path, category name path
- yolov3.cfg
Configuration of the yolov3 network model: convolution layers (number of kernels, kernel size, stride ...), yolo layers and other layers.
- yolov3-custom.cfg
Configuration of the customized network model, generated by the create_custom_model.sh script.
- yolov3-tiny.cfg
Configuration of the tiny version of the yolov3 network model.
data folder
coco folder
Holds the COCO training and validation data; it is the result of running the get_coco_dataset.sh script.
custom folder
Holds the information for a custom dataset.
1) images folder: all training and validation images.
2) labels folder: label files obtained by annotating the images in the images folder with labeling software. Each label file is a txt file, and each line of it describes one ground truth: the category index followed by the bounding box coordinates. As shown in the figure, 0 is the category index, and the bounding box coordinates follow
3) classes.names is the category name file of the custom dataset.
4) train.txt lists the training set image paths, one image path per line.
5) valid.txt lists the validation set image paths, one image path per line.
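To make the label format concrete, here is a hedged sketch that parses one line of a label file, assuming the layout "class_id x_c y_c w h" with whitespace separators and normalized coordinates. The helper parse_label_line and the sample line are made up for this example.

```python
def parse_label_line(line):
    """Parse 'class_id x_c y_c w h' into (int, float, float, float, float)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    class_id = int(float(parts[0]))
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    return class_id, x_c, y_c, w, h


print(parse_label_line("0 0.5 0.5 0.2 0.4"))
```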
samples folder
The folder holding the model's test images, used to check the model's detection results.
coco.names
Category names of the COCO data, similar to classes.names.
get_coco_dataset.sh
Script file used to fetch the COCO data and generate the coco folder and its contents.
utils folder
augmentations.py
File for data augmentation. This project only uses horizontal flipping. When the image is flipped, the corresponding annotation is modified too, and the function returns the flipped image together with its matching labels.
- Imports
import torch
import torch.nn.functional as F
import numpy as np
- horisontal_flip()
Input: images, targets are the original image and labels;
Returns: images, targets are the flipped image and labels.
Function: horisontal_flip() augments the image data to enlarge the dataset. Only horizontal flipping is used here.
def horisontal_flip(images, targets):  # mirror the image and its labels
    images = torch.flip(images, [-1])  # flip along the last (width) dimension
    # each row of targets is [img_id, class_id, x_c, y_c, w, h];
    # a horizontal mirror only changes x_c, which is stored in targets[:, 2]
    targets[:, 2] = 1 - targets[:, 2]
    return images, targets
torch.flip(input, dims) -> Tensor
Function: reverse a tensor along the given dimensions
Parameters: input, the tensor to reverse; dims, the dimensions to reverse
Returns: the reversed tensor
Because the image is stored as an array (c,h,w), the three dimensions are the color channels, the vertical direction and the horizontal direction. In Python, [-1] refers to the last one, i.e. the horizontal direction.
targets holds the corresponding labels [img_id, class_id, x_c, y_c, w, h]; the center coordinates and the box size are relative values in the range [0,1].
datasets.py
The file that operates on datasets. It contains image padding, image resizing, the loading class for the test (detection) dataset and the loading class for the evaluation dataset. The whole file contains 3 functions and 2 classes, as follows
- Imports
import glob
import random
import os
import sys
import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F
from utils.augmentations import horisontal_flip
from torch.utils.data import Dataset
import torchvision.transforms as transforms
- pad_to_square
Input: img, the original image; pad_value, the value used for padding
Output: the padded image
Function: pad the original image into a square whose side length is max(width, height), using F.pad to fill with a constant value
''' Image padding function: pad the image to a square with pad_value; return the padded image and the padding positions '''
def pad_to_square(img, pad_value):  # pad_value: value written into the padded region
    c, h, w = img.shape
    dim_diff = np.abs(h - w)
    # (upper / left) padding and (lower / right) padding
    pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
    # F.pad takes (left, right, top, bottom): pad top and bottom if h <= w, otherwise left and right
    pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
    # pad the image: img is the original image, pad the padding tuple, value the fill value
    img = F.pad(img, pad, "constant", value=pad_value)
    return img, pad
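The padding arithmetic can be tried without PyTorch. The two helpers below are hypothetical and only reproduce how the (left, right, top, bottom) tuple is computed.

```python
def square_padding(h, w):
    """Return (left, right, top, bottom) padding that makes h x w square."""
    dim_diff = abs(h - w)
    pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
    return (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)


def padded_shape(h, w):
    """Height and width after applying square_padding."""
    left, right, top, bottom = square_padding(h, w)
    return h + top + bottom, w + left + right


print(square_padding(300, 400), padded_shape(300, 400))
print(square_padding(375, 500), padded_shape(375, 500))
```

When the difference is odd, as in the 375 x 500 case, the extra pixel goes to the second (lower or right) side.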
- resize
Input: image, the original image; size, the size to resize to
Output: the resized image
Function: up/down-sampling
''' Image resize: interpolate the square image to a fixed size '''
def resize(image, size):
    # unsqueeze(0) adds a batch dimension for F.interpolate, "nearest" interpolation does the resizing, squeeze(0) removes the batch dimension again
    image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
    return image
In PyTorch, torch.nn.functional.interpolate implements interpolation and up/down-sampling
torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)
Parameters:
input (Tensor): the input tensor
size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]): the output size
scale_factor (float or Tuple[float]): how many times larger the output is than the input. If given as a tuple, it has to match the number of spatial dimensions of the input
mode (str): the sampling algorithm to use: 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear' or 'area'. Default: 'nearest'
- random_resize
Input: images, the original images; min_size, max_size, the range of candidate sizes
Output: the resized images
Function: randomly resize the images
""" Random resize function: resize the images to a randomly chosen size (using interpolation) """
def random_resize(images, min_size=288, max_size=448):
    # pick one size from min_size..max_size in steps of 32
    new_size = random.sample(list(range(min_size, max_size + 1, 32)), 1)[0]
    images = F.interpolate(images, size=new_size, mode="nearest")
    return images
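The candidate sizes that the random choice draws from can be listed directly. The helper candidate_sizes is hypothetical; the step of 32 matches the total downsampling stride of the network.

```python
import random


def candidate_sizes(min_size, max_size, step=32):
    """All sizes from min_size to max_size (inclusive) in steps of `step`."""
    return list(range(min_size, max_size + 1, step))


sizes = candidate_sizes(288, 448)
print(sizes)
print(random.choice(sizes))  # one size picked at random
```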
- ImageFolder
Function: a dataset class defined in the standard format.
Reads images from a folder, pads them to squares and resizes all of them to 416*416; __len__ returns the number of images
''' Dataset loading class 1: load and process images; returns the image path and the processed image '''
# Used for prediction: loads the dataset in detect.py
class ImageFolder(Dataset):  # the standard way to define a dataset
    def __init__(self, folder_path, img_size=416):
        # folder_path: folder containing the test images; img_size: size of the image fed to the network
        # files is the sorted list of image paths in the folder (in detect.py, folder_path=data/samples)
        self.files = sorted(glob.glob("%s/*.*" % folder_path))
        self.img_size = img_size

    def __getitem__(self, index):
        # look up the image path by index
        img_path = self.files[index % len(self.files)]
        # convert the image to a tensor
        img = transforms.ToTensor()(Image.open(img_path))
        # pad the image to a square with 0
        img, _ = pad_to_square(img, 0)
        # resize the image to the specified size
        img = resize(img, self.img_size)
        return img_path, img  # path and image for this index

    def __len__(self):
        return len(self.files)  # total number of images
- ListDataset
The Dataset class:
PyTorch reads images mainly through the Dataset class. Dataset is the base class of all datasets, and every dataset should inherit from it
__init__: initializes the parameters related to the dataset
__getitem__: defines how a sample is obtained (reading and transforming the data). It supports indices from 0 to len(self)-1; obj[index] is equivalent to obj.__getitem__(index)
__len__: returns the size of the dataset; len(obj) is equivalent to obj.__len__()
""" Dataset loading class 2: load and process images and their labels; returns the image path, the processed image and the processed labels """
# Used for evaluation: loads the dataset in test.py
class ListDataset(Dataset):
    def __init__(self, list_path, img_size=416, augment=True, multiscale=True, normalized_labels=True):
        # list_path: txt file listing the image paths; img_size: size of the image fed to the network;
        # augment: whether to augment the data; multiscale: whether to use multi-scale input;
        # normalized_labels: whether the labels are normalized
        # img_files is the list of image paths; e.g. valid.txt contains lines such as data/custom/images/train.jpg
        with open(list_path, "r") as file:
            self.img_files = file.readlines()
        # label_files is the list of label paths, derived from the image paths: only the folder and the suffix differ
        self.label_files = [
            path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt")
            for path in self.img_files
        ]
        # other settings
        self.img_size = img_size
        self.max_objects = 100  # maximum number of targets
        self.augment = augment  # bool: whether to use augmentation
        self.multiscale = multiscale  # bool: whether the image size of each batch fed to the network varies
        self.normalized_labels = normalized_labels  # bool: by default the bboxes in the label txt files are normalized to 0-1
        # min_size and max_size bound the image sizes used for multi-scale input,
        # so that the network detects both small and large objects better
        self.min_size = self.img_size - 3 * 32
        self.max_size = self.img_size + 3 * 32
        self.batch_count = 0  # index of the current batch
    # Find the image for the given index, pad the image and its labels to a square and normalize the labels. Returns the image path, the image and the labels
    def __getitem__(self, index):
        # ---------
        #  Image
        # ---------
        # get the image path for this index
        img_path = self.img_files[index % len(self.img_files)].rstrip()
        # path prefix on the author's machine
        img_path = 'F:\\cv\\PyTorch-YOLOv3\\PyTorch-YOLOv3\\data\\coco' + img_path
        # print (img_path)
        # convert the image to an RGB tensor
        img = transforms.ToTensor()(Image.open(img_path).convert('RGB'))
        # make sure the image has three channels, then read its height and width
        if len(img.shape) != 3:
            img = img.unsqueeze(0)
            img = img.expand((3, *img.shape[1:]))
        _, h, w = img.shape
        # normalized labels are scaled by the real image size; otherwise the labels already hold real positions
        h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1)
        # pad the image to a square; pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
        img, pad = pad_to_square(img, 0)
        # height and width after padding
        _, padded_h, padded_w = img.shape
        # ---------
        #  Label
        # ---------
        # get the label path for this index
        label_path = self.label_files[index % len(self.img_files)].rstrip()
        # path prefix on the author's machine
        label_path = 'F:\\cv\\PyTorch-YOLOv3\\PyTorch-YOLOv3\\data\\coco\\labels' + label_path
        # print (label_path)
        targets = None
        if os.path.exists(label_path):  # read the label information of one image
            # np.loadtxt() turns the values in the label file into an array.
            # Each row is [class_id, x_c, y_c, w, h] in normalized coordinates (normalization helps convergence)
            boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))
            # turn the normalized (x_c, y_c, w, h) into real corner coordinates (top-left, bottom-right) on the original image
            x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
            y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
            x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
            y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)
            # shift the coordinates to match the padded square image
            # pad indices: 0 left, 1 right, 2 top, 3 bottom
            x1 += pad[0]
            y1 += pad[2]
            x2 += pad[1]
            y2 += pad[3]
            # convert back to (x, y, w, h) and normalize
            # (padded_w, padded_h) is the width and height of the padded image
            boxes[:, 1] = ((x1 + x2) / 2) / padded_w
            boxes[:, 2] = ((y1 + y2) / 2) / padded_h
            # (w_factor, h_factor) is the width and height of the original image
            boxes[:, 3] *= w_factor / padded_w
            boxes[:, 4] *= h_factor / padded_h
            # each row has length 6: (0, class_id, x, y, w, h)
            targets = torch.zeros((len(boxes), 6))
            targets[:, 1:] = boxes
        # Apply augmentations
        if self.augment:
            if np.random.random() < 0.5:
                img, targets = horisontal_flip(img, targets)  # data augmentation
        # returns the image path, the padded image and the normalized labels (img_id, class_id, x_c, y_c, w, h)
        return img_path, img, targets
    # collate_fn: custom batch assembly. Defines how samples are combined into a batch and gives each target its image index
    def collate_fn(self, batch):
        paths, imgs, targets = list(zip(*batch))  # image paths, images and labels of the batch
        # each element of targets holds all bounding boxes of one image; drop images without labels
        targets = [boxes for boxes in targets if boxes is not None]
        # mark every bounding box with the index of the image it belongs to
        for i, boxes in enumerate(targets):
            boxes[:, 0] = i
        # merge all bboxes of the batch into one tensor; the loss is computed per batch
        targets = torch.cat(targets, 0)
        # Selects new image size every tenth batch
        if self.multiscale and self.batch_count % 10 == 0:
            self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
        # Resize images to input shape and stack them into one tensor
        imgs = torch.stack([resize(img, self.img_size) for img in imgs])
        self.batch_count += 1
        return paths, imgs, targets  # targets are normalized [img_id, class_id, x_c, y_c, w, h]

    def __len__(self):
        return len(self.img_files)
logger.py
Writes monitoring data (logs) to the file system and saves training information such as the loss. The Logger class is used in train.py to save information to a log file during training.
import tensorflow as tf
class Logger(object):
    def __init__(self, log_dir):  # log_dir is the path of the log directory
        """Create a summary writer logging to log_dir."""
        # Because of version differences, tf.summary.FileWriter may raise an error;
        # tf.compat.v1.summary.FileWriter is the alternative for the old API
        self.writer = tf.summary.create_file_writer(log_dir)  # create a summary writer

    def scalar_summary(self, tag, value, step):  # record one scalar
        with self.writer.as_default():
            tf.summary.scalar(tag, value, step=step)

    def list_of_scalars_summary(self, tag_value_pairs, step):  # record several scalars
        with self.writer.as_default():
            for tag, value in tag_value_pairs:
                tf.summary.scalar(tag, value, step=step)
        # summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value) for tag, value in tag_value_pairs])
        # self.writer.add_summary(summary, step)
parse_config.py
Contains two parsers:
1. Model configuration parser: returns a list module_defs. Each element of the list is a dictionary describing one layer (module) of the model.
2. Data configuration parser: returns a dictionary whose key-value pairs hold the names and paths of the data, or other information.
''' Model configuration parser: parses the yolo-v3 layer configuration file and returns the module definitions module_defs; path is the path of yolov3.cfg '''
def parse_model_config(path):
    ''' Before reading this function, look at yolov3.cfg in the config folder. An excerpt:
    [convolutional]
    batch_normalize=1
    filters=32
    size=3
    stride=1
    pad=1
    activation=leaky
    # Downsample
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=2
    pad=1
    activation=leaky
    ...
    :param path: path of the model configuration file yolov3.cfg
    :return: the model definition, a list of dictionaries; each dictionary holds the parameters of one module
    '''
    # read yolov3.cfg into a list with one element per line
    with open(path, 'r') as file:
        lines = file.read().split('\n')
    lines = [x for x in lines if x and not x.startswith('#')]  # skip empty lines and comments
    lines = [x.rstrip().lstrip() for x in lines]  # strip surrounding whitespace
    module_defs = []
    # For every line of the cfg:
    # 1. a line starting with [ opens a new block: append a dictionary to module_defs,
    #    set 'type' to the text inside []; for convolutional blocks also set 'batch_normalize': 0
    # 2. any other line belongs to the current block: the text before = is the key, the text after = is the value
    for line in lines:
        if line.startswith('['):  # This marks the start of a new block, e.g. [convolutional] or [shortcut]
            module_defs.append({})
            module_defs[-1]['type'] = line[1:-1].rstrip()  # e.g. {'type': 'convolutional'}
            if module_defs[-1]['type'] == 'convolutional':
                module_defs[-1]['batch_normalize'] = 0  # default, overwritten if the block sets it
        else:
            key, value = line.split("=")
            # strip() removes leading and trailing spaces, rstrip() only trailing ones
            module_defs[-1][key.rstrip()] = value.strip()
    return module_defs  # a list; each element is a dictionary with the details of one module
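To see what the returned structure looks like, the sketch below applies the same parsing rules to an in-memory string instead of a file. parse_model_text is a hypothetical helper written for this illustration and the sample text is abbreviated; unlike the function above, it strips each line before checking for comments.

```python
def parse_model_text(text):
    """Parse cfg-style text into a list of module dictionaries."""
    module_defs = []
    for raw in text.split("\n"):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            module_defs.append({"type": line[1:-1].rstrip()})
            if module_defs[-1]["type"] == "convolutional":
                module_defs[-1]["batch_normalize"] = 0
        else:
            key, value = line.split("=")
            module_defs[-1][key.rstrip()] = value.strip()
    return module_defs


sample = """
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# Downsample
[shortcut]
from=-3
activation=linear
"""

defs = parse_model_text(sample)
for d in defs:
    print(d)
```

Note that every value stays a string ("32", not 32), so whoever builds the layers has to convert the values to numbers.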
''' Data configuration parser: path is the path of the data configuration file '''
def parse_data_config(path):
    """ Information contained in the data configuration:
    classes= 80
    train=data/coco/trainvalno5k.txt
    valid=data/coco/5k.txt
    names=data/coco.names
    backup=backup/
    eval=coco
    """
    # create a dictionary with default entries
    options = dict()
    options['gpus'] = '0,1,2,3'
    options['num_workers'] = '10'
    # read every line of the data configuration file and store it as a key-value pair
    with open(path, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.strip()
        if line == '' or line.startswith('#'):
            continue  # skip empty lines and comments
        key, value = line.split('=')
        options[key.strip()] = value.strip()
    return options  # keys are names (train, valid, names ...), values are paths or other information
utils.py
from __future__ import division
import tqdm
import torch
import numpy as np
def to_cpu(tensor):
    return tensor.detach().cpu()
''' Load the category names of the dataset and return them as a list '''
def load_classes(path):  # path of the category name file, e.g. coco.names or classes.names
    fp = open(path, "r")
    # one category name per line; [:-1] drops the empty string after the final newline
    names = fp.read().split("\n")[:-1]
    return names  # list of category names
''' Weight initialization function '''
def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:  # convolution layer weights
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm2d") != -1:  # batch normalization layer weights and bias
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)
''' Rescale predicted bounding boxes. Parameters: the bounding boxes, the current image size (scalar) and the original image shape. The network predicts boxes on the padded and resized image, so the predicted boxes must be mapped back to the original image '''
def rescale_boxes(boxes, current_dim, original_shape):
    # height and width of the original image
    orig_h, orig_w = original_shape
    # padding that was added to the original image, computed from the difference between its height and width
    # and scaled by current_dim / longest side of the original image
    # pad_x: padding pixels added along the width (non-zero when the height is greater than the width)
    # pad_y: padding pixels added along the height (non-zero when the width is greater than the height)
    pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))
    pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))
    # image height and width after removing the padding
    unpad_h = current_dim - pad_y
    unpad_w = current_dim - pad_x
    # map the predicted boxes back to the original image
    boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w  # top-left x
    boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h  # top-left y
    boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w  # bottom-right x
    boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h  # bottom-right y
    return boxes  # the rescaled predicted bounding boxes
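A worked example with numbers may help. The sketch below repeats the same arithmetic for a single box in plain Python; rescale_box is a hypothetical single-box variant, and the image size and box are invented values.

```python
def rescale_box(box, current_dim, original_shape):
    """Map one (x1, y1, x2, y2) box from the padded square back to the original image."""
    orig_h, orig_w = original_shape
    scale = current_dim / max(original_shape)
    pad_x = max(orig_h - orig_w, 0) * scale   # padding along the width
    pad_y = max(orig_w - orig_h, 0) * scale   # padding along the height
    unpad_h = current_dim - pad_y
    unpad_w = current_dim - pad_x
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x // 2) / unpad_w * orig_w,
        (y1 - pad_y // 2) / unpad_h * orig_h,
        (x2 - pad_x // 2) / unpad_w * orig_w,
        (y2 - pad_y // 2) / unpad_h * orig_h,
    )


# A 300 x 400 (h x w) image padded to a square and resized to 416 x 416
print(rescale_box((104, 130, 208, 286), 416, (300, 400)))
```

In this case the image was padded by 52 pixels at the top and bottom (after scaling), so y coordinates are shifted by 52 before being rescaled, while x coordinates are only rescaled.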
''' Convert the bounding box information to the upper left and lower right coordinate representation function '''
def xywh2xyxy(x):
y = x.new(x.shape)
y[..., 0] = x[..., 0] - x[..., 2] / 2
y[..., 1] = x[..., 1] - x[..., 3] / 2
y[..., 2] = x[..., 0] + x[..., 2] / 2
y[..., 3] = x[..., 1] + x[..., 3] / 2
return y
""" Measurement calculation : Parameter is true_positive( The value is 0 or 1,list)、 Prediction confidence (list), Forecast category (list), Real category (list) return :p, r, ap, f1, unique_classes.astype("int32")"""
def ap_per_class(tp, conf, pred_cls, target_cls):
    # Arguments: true_positives, pred_scores, pred_labels and the ground-truth labels of the images
    # Sort tp, conf and pred_cls by descending confidence
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
    # Classes present in the ground-truth boxes (without duplicates)
    unique_classes = np.unique(target_cls)
    # Create the precision-recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
        # Boolean mask with the same shape as pred_cls: True where the predicted class is c
        i = pred_cls == c
        n_gt = (target_cls == c).sum()  # number of ground-truth boxes of class c
        n_p = i.sum()  # number of predicted boxes of class c
        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # tp[i] holds the true-positive flags of the class-c predictions
            # (1 means the IoU with a ground-truth box reached the threshold).
            # fpc: cumulative count of false positives; tpc: cumulative count of true positives
            fpc = (1 - tp[i]).cumsum()
            tpc = (tp[i]).cumsum()
            # Recall
            recall_curve = tpc / (n_gt + 1e-16)
            r.append(recall_curve[-1])
            # Precision
            precision_curve = tpc / (tpc + fpc)
            p.append(precision_curve[-1])
            # AP from the recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))
    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)
    return p, r, ap, f1, unique_classes.astype("int32")
# Compute AP
def compute_ap(recall, precision):  # arguments: the recall curve and the precision curve
    # Append sentinel values at both ends of the precision-recall curve
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Compute the precision envelope: sweeping from right to left, the precision at
    # recall = x becomes the largest precision found for any recall in [x, 1]
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    # To calculate the area under the PR curve, look for the points where the
    # X axis (recall) changes value, i.e. where mrec[i] != mrec[i + 1]
    i = np.where(mrec[1:] != mrec[:-1])[0]
    # Sum (delta recall) * precision: the area under the PR curve, which is the AP
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
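To see how the cumulative counts, the precision envelope and the area sum fit together, here is a small worked sketch in plain Python. The function name average_precision and the toy inputs are assumptions made for illustration; the steps follow ap_per_class and compute_ap above for a single class.

```python
def average_precision(tp_flags, n_gt):
    """tp_flags: 1/0 per detection, already sorted by descending confidence.
    n_gt: number of ground-truth boxes of this class."""
    recall, precision = [], []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        recall.append(tp / n_gt)
        precision.append(tp / (tp + fp))
    # Sentinel values at both ends of the curve
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    # Precision envelope: each point takes the best precision to its right
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Area under the envelope; segments where recall does not change add zero
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# Three detections (hit, miss, hit) against two ground-truth boxes
print(average_precision([1, 0, 1], n_gt=2))
```

With these inputs the recall curve is 0.5, 0.5, 1.0 and the precision curve is 1, 1/2, 2/3, so the area is 0.5 * 1 + 0.5 * 2/3 = 5/6.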
def get_batch_statistics(outputs, targets, iou_threshold):
    """Compute the statistics needed for evaluation, per image in the batch.

    outputs: model predictions after NMS, one entry per image, with rows
    (x1, y1, x2, y2, object_conf, class_conf, class_pred), length 7.
    targets: ground-truth labels, with boxes already converted to (x1, y1, x2, y2).
    Returns a list of [true_positives, pred_scores, pred_labels]; true_positives is 1 when the
    predicted box overlaps a ground-truth box with IoU >= iou_threshold, and 0 otherwise.
    """
    batch_metrics = []
    # A batch holds several images and each image has its own output, so iterate over the outputs
    for sample_i in range(len(outputs)):
        if outputs[sample_i] is None:
            continue
        # Predictions for this image
        output = outputs[sample_i]  # the sample_i-th output, which contains many boxes
        pred_boxes = output[:, :4]  # box coordinates
        pred_scores = output[:, 4]  # object confidence
        pred_labels = output[:, -1]  # predicted class
        true_positives = np.zeros(pred_boxes.shape[0])  # one flag per predicted box
        # Ground-truth annotations for this image. Column 0 of targets is the image index within
        # the batch, assigned by the collate_fn of ListDataset in utils/datasets.py
        annotations = targets[targets[:, 0] == sample_i][:, 1:]
        # Class labels
        target_labels = annotations[:, 0] if len(annotations) else []
        if len(annotations):
            detected_boxes = []
            target_boxes = annotations[:, 1:]  # ground-truth boxes, (x1, y1, x2, y2)
            for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):
                # Stop once every ground-truth box has been matched
                if len(detected_boxes) == len(annotations):
                    break
                # Ignore if label is not one of the target labels
                if pred_label not in target_labels:
                    continue
                # IoU between this prediction and every ground-truth box; keep the best match
                iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0)
                # The prediction counts as correct if the IoU reaches the threshold
                # and that ground-truth box has not been matched yet
                if iou >= iou_threshold and box_index not in detected_boxes:
                    true_positives[pred_i] = 1
                    detected_boxes += [box_index]
        batch_metrics.append([true_positives, pred_scores, pred_labels])
    return batch_metrics
# IoU from widths and heights only (used by build_targets to match boxes to anchors)
def bbox_wh_iou(wh1, wh2):
    wh2 = wh2.t()
    w1, h1 = wh1[0], wh1[1]
    w2, h2 = wh2[0], wh2[1]
    inter_area = torch.min(w1, w2) * torch.min(h1, h2)
    union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
    return inter_area / union_area
# Compute the IoU value
def bbox_iou(box1, box2, x1y1x2y2=True):
    # Get the top-left and bottom-right coordinates of the boxes
    if not x1y1x2y2:
        # Boxes are given as (center_x, center_y, width, height): convert them to (x1, y1, x2, y2)
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        # Boxes are already given by their corners
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]  # box1 corners
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]  # box2 corners
    # Top-left and bottom-right corners of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    # Area of the intersection rectangle (zero when the boxes do not overlap)
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
        inter_rect_y2 - inter_rect_y1 + 1, min=0
    )
    # Area of the union = area of box1 + area of box2 - intersection
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
    return iou
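The same IoU computation for a single pair of boxes can be written in a few lines of plain Python. This is only an illustrative sketch: the name box_iou is hypothetical, and it keeps the "+ 1" pixel convention used by bbox_iou above.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2), using the same +1 pixel convention."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]) + 1, 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]) + 1, 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


# Two 10 x 10 pixel boxes that share a 5 x 5 pixel corner: 25 / (100 + 100 - 25)
print(box_iou((0, 0, 9, 9), (5, 5, 14, 14)))
# Boxes that do not touch have IoU 0
print(box_iou((0, 0, 9, 9), (20, 20, 29, 29)))
```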
def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
    """
    Removes detections with lower object confidence score than 'conf_thres' and performs
    Non-Maximum Suppression to further filter detections.
    Arguments: the model prediction, the confidence threshold and the NMS threshold.
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class_score, class_pred)
    """
    # (1) Convert the predicted boxes from (center x, center y, width, height) to (x1, y1, x2, y2).
    # The three YOLO layers output 13x13, 26x26 and 52x52 grids with 3 anchors each, so one image
    # yields (13*13 + 26*26 + 52*52) * 3 = 10647 boxes and prediction has shape [1, 10647, 85].
    # Of the 85 values, the first 4 are the box coordinates, the 5th is the object confidence
    # and the remaining 80 are the class confidences.
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])
    output = [None for _ in range(len(prediction))]
    # Process the predictions of one image at a time
    for image_i, image_pred in enumerate(prediction):
        # (2) Box filtering: drop the boxes whose object confidence is below the threshold
        image_pred = image_pred[image_pred[:, 4] >= conf_thres]
        # If none are remaining => process next image
        if not image_pred.size(0):
            continue
        # (3) Non-maximum suppression: sort by score, take the highest-scoring box, find the boxes
        # of the same class whose IoU with it exceeds the threshold, merge them into one box by
        # confidence-weighted averaging, then remove them from the candidates.
        # Score = object confidence * highest class confidence
        score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0]
        # Sort the boxes by descending score; shape [number of boxes kept by the threshold, 85]
        image_pred = image_pred[(-score).argsort()]
        # Highest class confidence and its index (the predicted class)
        class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True)
        # Rows become (x1, y1, x2, y2, object_conf, class_conf, class_pred), length 7
        detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1)
        keep_boxes = []
        while detections.size(0):
            # True where the IoU with the current highest-scoring box exceeds nms_thres
            large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres
            # True where the predicted class matches that of the highest-scoring box
            label_match = detections[0, -1] == detections[:, -1]
            # Indices of boxes with lower confidence scores, large IoUs and matching labels
            # (the highest-scoring box itself is included)
            invalid = large_overlap & label_match
            # Object confidences of those boxes, used as weights
            weights = detections[invalid, 4:5]
            # Merge overlapping bboxes by order of confidence: the kept box is the
            # confidence-weighted average of the group, not the original top box
            detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
            keep_boxes += [detections[0]]
            # Remove the merged boxes and continue with the remaining ones
            detections = detections[~invalid]
        if keep_boxes:
            output[image_i] = torch.stack(keep_boxes)
    return output
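Note that this variant merges each group of overlapping boxes instead of simply discarding the lower-scoring ones. The sketch below reproduces that idea in plain Python under simplifying assumptions: each detection carries a single confidence (the source sorts by object confidence times class confidence and weights by object confidence), and the names iou and merge_nms are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes with the +1 pixel convention."""
    w = max(min(a[2], b[2]) - max(a[0], b[0]) + 1, 0)
    h = max(min(a[3], b[3]) - max(a[1], b[1]) + 1, 0)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def merge_nms(dets, nms_thres=0.4):
    """dets: list of (x1, y1, x2, y2, conf, cls). Returns one merged box per group."""
    dets = sorted(dets, key=lambda d: d[4], reverse=True)
    kept = []
    while dets:
        top = dets[0]
        # Same class and large overlap with the top box (the top box matches itself)
        group = [d for d in dets if d[5] == top[5] and iou(top[:4], d[:4]) > nms_thres]
        total = sum(d[4] for d in group)
        # Confidence-weighted average of the group's coordinates
        merged = tuple(sum(d[4] * d[k] for d in group) / total for k in range(4))
        kept.append(merged + (top[4], top[5]))
        dets = [d for d in dets if d not in group]
    return kept


boxes = [
    (0, 0, 10, 10, 0.9, 0),
    (1, 1, 11, 11, 0.6, 0),    # overlaps the first box, same class: merged into it
    (50, 50, 60, 60, 0.8, 0),  # same class, far away: kept separately
    (0, 0, 10, 10, 0.7, 1),    # same place, different class: kept separately
]
for det in merge_nms(boxes):
    print(det)
```

Standard NMS would keep the first box unchanged; here its corners move slightly toward the box it absorbed.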
def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor
    nB = pred_boxes.size(0)
    nA = pred_boxes.size(1)
    nC = pred_cls.size(-1)
    nG = pred_boxes.size(2)
    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)
    # Convert to position relative to box
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)
    # Separate target values
    b, target_labels = target[:, :2].long().t()
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()
    # Set masks
    obj_mask[b, best_n, gj, gi] = 1
    noobj_mask[b, best_n, gj, gi] = 0
    # Set noobj mask to zero where iou exceeds ignore threshold
    for i, anchor_ious in enumerate(ious.t()):
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0
    # Coordinates
    tx[b, best_n, gj, gi] = gx - gx.floor()
    ty[b, best_n, gj, gi] = gy - gy.floor()
    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1
    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
    tconf = obj_mask.float()
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
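The regression targets built here are easier to follow for a single ground-truth box. The sketch below is a plain-Python illustration, assuming a box normalized to [0, 1] and an anchor already expressed in grid units; the name encode_target is hypothetical.

```python
import math


def encode_target(box, anchor, grid_size):
    """box: normalized (cx, cy, w, h); anchor: (w, h) in grid units.
    Returns the responsible cell (gi, gj) and the targets (tx, ty, tw, th)."""
    gx, gy, gw, gh = (v * grid_size for v in box)
    gi, gj = int(gx), int(gy)     # cell that contains the box center
    tx, ty = gx - gi, gy - gj     # offset of the center inside that cell
    tw = math.log(gw / anchor[0] + 1e-16)  # log ratio of box size to anchor size
    th = math.log(gh / anchor[1] + 1e-16)
    return (gi, gj), (tx, ty, tw, th)


# A box centered at (0.5, 0.25) on a 13 x 13 grid, matched to an anchor of 2.6 x 2.6 cells
cell, targets = encode_target((0.5, 0.25, 0.2, 0.4), (2.6, 2.6), 13)
print(cell, targets)
```

A box exactly as wide as its anchor gives tw = 0, and a box twice as tall gives th = log 2, which is why the network only has to predict a small correction relative to the anchor.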
Once the model has been trained, this file is used to run detection. The test images are read from the data/samples folder, and the results are saved in an output folder that the script creates automatically.
from __future__ import division
from models import *
from utils.utils import *
from utils.datasets import *
import os
import time
import datetime
import argparse
from PIL import Image
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.ticker import NullLocator
if __name__ == "__main__":
    # (1) Argument parsing
    parser = argparse.ArgumentParser()
    # Folder containing the test images
    parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset")
    # YOLOv3 model definition (layers, number of filters per layer, kernel sizes, strides, ...)
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    # Path to the pretrained weights
    parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    # Class names
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    # Object confidence threshold
    parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold")
    # IoU threshold for NMS
    parser.add_argument("--nms_thres", type=float, default=0.4, help="iou threshold for non-maximum suppression")
    # Batch size
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    # Number of CPU threads
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    # Image size
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model")
    opt = parser.parse_args()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    os.makedirs("output", exist_ok=True)  # folder where the annotated images are written

    # (2) Model building
    # Build the Darknet (YOLOv3) model structure, defined in models.py, from the model configuration file
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    # Load the trained weights (model parameters) into the model structure
    if opt.weights_path.endswith(".weights"):
        # Load darknet weights
        model.load_darknet_weights(opt.weights_path)
    else:
        # Load checkpoint weights
        model.load_state_dict(torch.load(opt.weights_path))
    model.eval()  # evaluation mode, so that layers such as BatchNorm use their running statistics

    # (3) Dataset and class loading
    # A DataLoader is an iterable: wrap it with iter() before calling next() on it,
    # or loop over it directly, e.g. `for inputs, labels in dataloader`.
    # The usual pattern is to implement a dataset object and pass it to the DataLoader,
    # which then yields one batch of data at a time.
    dataloader = DataLoader(
        # Evaluation dataset. ImageFolder is defined in datasets.py and returns the image path
        # together with the processed (padded and resized) image
        ImageFolder(opt.image_folder, img_size=opt.img_size),
        batch_size=opt.batch_size,
        shuffle=False,
        num_workers=opt.n_cpu,
    )
    # Load the class names; classes is a list
    classes = load_classes(opt.class_path)  # Extracts class labels from file
    # Lists that collect the image paths and the detections of each image
    imgs = []
    img_detections = []

    # (4) Prediction: image paths and detections are stored in imgs and img_detections
    print("\nPerforming object detection:")
    prev_time = time.time()
    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
    # Each iteration yields one batch: the image paths img_paths and the images input_imgs
    for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
        # Convert the input images to a tensor variable
        input_imgs = Variable(input_imgs.type(Tensor))
        # Run the model, then apply non-maximum suppression to obtain the final detections
        with torch.no_grad():
            detections = model(input_imgs)
            # print(detections.shape)  # [:, 10647, 85]
            # NMS converts the boxes to corner coordinates and removes low-confidence boxes;
            # each row is (x1, y1, x2, y2, object_conf, class_score, class_pred)
            detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)
        # Print the elapsed time and the batch index
        current_time = time.time()
        inference_time = datetime.timedelta(seconds=current_time - prev_time)
        prev_time = current_time
        print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time))
        # Save the image paths and the detections (after NMS)
        imgs.extend(img_paths)
        img_detections.extend(detections)

    # (5) Draw the detections on the images and save them
    # Bounding-box colors
    cmap = plt.get_cmap("tab20b")
    colors = [cmap(i) for i in np.linspace(0, 1, 20)]
    # Iterate through the images
    for img_i, (path, detections) in enumerate(zip(imgs, img_detections)):
        print("(%d) Image: '%s'" % (img_i, path))
        # Read the image and draw it on a figure
        img = np.array(Image.open(path))
        plt.figure()
        fig, ax = plt.subplots(1)
        ax.imshow(img)
        # Draw the detected boxes and labels on the image
        if detections is not None:
            # The boxes were predicted on the padded, resized image: rescale them to the original image
            detections = rescale_boxes(detections, opt.img_size, img.shape[:2])
            # Distinct predicted classes (sorted); each one gets its own color
            unique_labels = detections[:, -1].cpu().unique()
            n_cls_preds = len(unique_labels)
            bbox_colors = random.sample(colors, n_cls_preds)
            # Each detection holds the top-left and bottom-right corners of a box
            for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
                print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item()))
                # Box width and height
                box_w = x2 - x1
                box_h = y2 - y1
                # Color assigned to this class
                color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])]
                # Create the rectangle and add it to the plot
                bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
                ax.add_patch(bbox)
                # Add the class name to the box
                plt.text(
                    x1,
                    y1,
                    s=classes[int(cls_pred)],
                    color="white",
                    verticalalignment="top",
                    bbox={"color": color, "pad": 0},
                )
        # Save the image with the drawn boxes, without axes
        plt.axis("off")
        plt.gca().xaxis.set_major_locator(NullLocator())
        plt.gca().yaxis.set_major_locator(NullLocator())
        filename = path.split("/")[-1].split(".")[0]
        plt.savefig(f"output/{filename}.png", bbox_inches="tight", pad_inches=0.0)
        plt.close()
This file defines the model structure: it builds the model from the information in the model configuration file.
from __future__ import division
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from utils.parse_config import *
from utils.utils import build_targets, to_cpu
def create_modules(module_defs):
    """Build the YOLOv3 model structure: construct one block of layers per entry of module_defs."""
    # (1) Parse the model hyperparameters and get the number of input channels.
    # module_defs is the list produced by parse_model_config(): each element is a dict holding the
    # parameters of one layer or module. The first element is the hyperparameter dict
    # {'type': 'net', ...}.
    hyperparams = module_defs.pop(0)
    output_filters = [int(hyperparams["channels"])]
    # (2) nn.ModuleList that stores the created layers and modules
    module_list = nn.ModuleList()
    # (3) Create the layer or module described by each dict and add it to the nn.ModuleList.
    # The "type" key takes one of the values "convolutional", "maxpool", "upsample", "route",
    # "shortcut", "yolo".
    for module_i, module_def in enumerate(module_defs):
        modules = nn.Sequential()

        if module_def["type"] == "convolutional":
            # Read the parameters of the convolutional layer
            bn = int(module_def["batch_normalize"])
            filters = int(module_def["filters"])
            kernel_size = int(module_def["size"])
            pad = (kernel_size - 1) // 2
            # Create the convolutional layer and add it to the nn.Sequential()
            modules.add_module(
                f"conv_{module_i}",  # name of the layer in the model
                nn.Conv2d(
                    in_channels=output_filters[-1],  # number of input channels
                    out_channels=filters,  # number of output channels
                    kernel_size=kernel_size,  # kernel size
                    stride=int(module_def["stride"]),  # stride
                    padding=pad,  # padding
                    bias=not bn,
                ),
            )
            if bn:
                # Add a BatchNorm2d layer
                modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
            if module_def["activation"] == "leaky":
                # Add a LeakyReLU activation layer
                modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))

        elif module_def["type"] == "maxpool":
            # Read the parameters of the max-pooling layer
            kernel_size = int(module_def["size"])
            stride = int(module_def["stride"])
            if kernel_size == 2 and stride == 1:
                modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
            # Create the max-pooling layer and add it to the nn.Sequential()
            maxpool = nn.MaxPool2d(
                kernel_size=kernel_size,  # pooling window size
                stride=stride,  # stride
                padding=int((kernel_size - 1) // 2),  # padding
            )
            modules.add_module(f"maxpool_{module_i}", maxpool)

        elif module_def["type"] == "upsample":
            # The upsampling layer is a custom class (Upsample) that overrides forward.
            # Example configuration:
            # [upsample]
            # stride = 2
            upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
            modules.add_module(f"upsample_{module_i}", upsample)

        elif module_def["type"] == "route":
            # Example route configuration:
            # [route]
            # layers = -1, 36
            layers = [int(x) for x in module_def["layers"].split(",")]
            filters = sum([output_filters[1:][i] for i in layers])
            # EmptyLayer() is a placeholder for "route" and "shortcut" layers
            modules.add_module(f"route_{module_i}", EmptyLayer())

        elif module_def["type"] == "shortcut":
            filters = output_filters[1:][int(module_def["from"])]
            modules.add_module(f"shortcut_{module_i}", EmptyLayer())

        elif module_def["type"] == "yolo":
            # Example yolo configuration:
            # [yolo]
            # mask = 3,4,5
            # anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
            # classes=80
            # num=9
            # jitter=.3
            # ignore_thresh = .7
            # truth_thresh = 1
            # random=1
            # Indices of the anchors used by this layer (3, 4, 5 in the example above)
            anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
            # Extract the anchor sizes as (width, height) pairs and keep the selected ones
            anchors = [int(x) for x in module_def["anchors"].split(",")]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in anchor_idxs]
            num_classes = int(module_def["classes"])
            # print('anchors1:', anchors)  # for the example above: [(30, 61), (62, 45), (59, 119)]
            # Input image size
            img_size = int(hyperparams["height"])
            # Create the YOLO detection layer from the three anchors, the number of classes
            # and the image size, and add it to the model
            yolo_layer = YOLOLayer(anchors, num_classes, img_size)
            modules.add_module(f"yolo_{module_i}", yolo_layer)

        # Add the created block to the nn.ModuleList()
        module_list.append(modules)
        # Record the block's output channels; they are the input channels of the next layer
        output_filters.append(filters)
    return hyperparams, module_list  # network hyperparameters and the list of layers
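The anchor selection done for a yolo block can be tried in isolation. The following sketch assumes the config values arrive as the comma-separated strings shown in the example above; the helper name select_anchors is hypothetical.

```python
def select_anchors(anchors_str, mask_str):
    """Parse the `anchors` and `mask` entries of a [yolo] block into (width, height) pairs."""
    values = [int(v) for v in anchors_str.split(",")]  # int() ignores surrounding spaces
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [pairs[int(i)] for i in mask_str.split(",")]


anchors_str = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
print(select_anchors(anchors_str, "3,4,5"))
```

Each of the three yolo blocks lists all nine anchors but uses only the three selected by its mask.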
class Upsample(nn.Module):
    """Upsampling layer: a rewritten nn.Upsample"""

    def __init__(self, scale_factor, mode="nearest"):
        super(Upsample, self).__init__()
        self.scale_factor = scale_factor  # upsampling factor
        self.mode = mode

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)  # upsample by interpolation
        return x


class EmptyLayer(nn.Module):
    """Placeholder for 'route' and 'shortcut' layers"""

    def __init__(self):
        super(EmptyLayer, self).__init__()
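With mode="nearest" and an integer scale factor, upsampling simply repeats every value along both spatial axes. A plain-Python sketch on a 2-D list shows the effect; the function name upsample_nearest is hypothetical and works on one channel only.

```python
def upsample_nearest(grid, scale):
    """Repeat every element of a 2-D list `scale` times along both axes."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


for row in upsample_nearest([[1, 2], [3, 4]], 2):
    print(row)
```

This is how a 13 x 13 feature map becomes 26 x 26 before it is combined with an earlier layer through a route block.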
'''yolo Layer definition : Detection layer '''
class YOLOLayer(nn.Module):
"""Detection layer"""
def __init__(self, anchors, num_classes, img_dim=416):# The parameters are three anchor The size of the , The number of categories , The size of the image
super(YOLOLayer, self).__init__()
# Foundation setup
self.anchors = anchors#anchor Size information for , For example, a certain layer yolo Size is [(30, 61), (62, 45), (59, 119)]
self.num_anchors = len(anchors)#anchor The number of
self.num_classes = num_classes# The number of categories
self.ignore_thres = 0.5
self.mse_loss = nn.MSELoss()
self.bce_loss = nn.BCELoss()
self.obj_scale = 1
self.noobj_scale = 100
self.metrics = {
self.img_dim = img_dim
self.grid_size = 0 # grid size
# Calculate the grid cell offset
def compute_grid_offsets(self, grid_size, cuda=True):
# Get mesh size ( A few × A few )
self.grid_size = grid_size
g = self.grid_size
# print('g',g) g Possible values are 13/26/52, Corresponding to different yolo Dimension of the characteristic diagram of the layer
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
# Get the grid cell size
self.stride = self.img_dim / self.grid_size# Size of grid cells
# Calculate offsets for each grid, hypothesis g take 13,
#torch.arange(g) by tensor([0,1,2,3,4,5,6,7,8,9,10,11,12])
#torch.arange(g).repeat(g, 1) citing tensor([0,1,2,3,4,5,6,7,8,9,10,11,12]) Composed of 13 Row column tensor
#torch.arange(g).repeat(g, 1).view([1, 1, g, g]) Change the view to 【1,1,13,13】
self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)#
self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
# hold anchor The width and height of are transformed into measures relative to the size of grid cells
        # Anchors rescaled from pixels to grid units; e.g. one YOLO layer's anchors are [(30, 61), (62, 45), (59, 119)] in pixels
        self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
        self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))  # anchor widths
        self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))  # anchor heights

    def forward(self, x, targets=None, img_dim=None):
        # Forward pass of the YOLO layer; x is the output of the previous layer
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        # Size of the input image
        self.img_dim = img_dim
        # Shape of x
        num_samples = x.size(0)
        grid_size = x.size(2)

        prediction = (
            x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)  # (num_samples, 3, 85, grid_size, grid_size)
            .permute(0, 1, 3, 4, 2)  # reorder dimensions to (num_samples, 3, grid_size, grid_size, 85)
            .contiguous()  # permute returns a non-contiguous view; contiguous() copies it into a tensor with its own contiguous memory
        )

        # Get outputs
        x = torch.sigmoid(prediction[..., 0])  # Center x
        y = torch.sigmoid(prediction[..., 1])  # Center y
        w = prediction[..., 2]  # Width
        h = prediction[..., 3]  # Height
        pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
        pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.

        # If grid size does not match current we compute new offsets
        if grid_size != self.grid_size:
            self.compute_grid_offsets(grid_size, cuda=x.is_cuda)

        # Add offset and scale with anchors
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data + self.grid_x
        pred_boxes[..., 1] = y.data + self.grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h

        output = torch.cat(
            (
                pred_boxes.view(num_samples, -1, 4) * self.stride,  # back to input-image pixels
                pred_conf.view(num_samples, -1, 1),
                pred_cls.view(num_samples, -1, self.num_classes),
            ),
            -1,
        )

        if targets is None:
            return output, 0
        else:
            iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )

            # Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
            loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
            loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
            loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
            loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
            loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
            loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
            loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
            loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
            total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls

            # Metrics
            cls_acc = 100 * class_mask[obj_mask].mean()
            conf_obj = pred_conf[obj_mask].mean()
            conf_noobj = pred_conf[noobj_mask].mean()
            conf50 = (pred_conf > 0.5).float()
            iou50 = (iou_scores > 0.5).float()
            iou75 = (iou_scores > 0.75).float()
            detected_mask = conf50 * class_mask * tconf
            precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
            recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
            recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)

            self.metrics = {
                "loss": to_cpu(total_loss).item(),
                "x": to_cpu(loss_x).item(),
                "y": to_cpu(loss_y).item(),
                "w": to_cpu(loss_w).item(),
                "h": to_cpu(loss_h).item(),
                "conf": to_cpu(loss_conf).item(),
                "cls": to_cpu(loss_cls).item(),
                "cls_acc": to_cpu(cls_acc).item(),
                "recall50": to_cpu(recall50).item(),
                "recall75": to_cpu(recall75).item(),
                "precision": to_cpu(precision).item(),
                "conf_obj": to_cpu(conf_obj).item(),
                "conf_noobj": to_cpu(conf_noobj).item(),
                "grid_size": grid_size,
            }

            return output, total_loss
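The "add offset and scale with anchors" step in the YOLO layer can be checked by hand for a single box. The following is a minimal pure-Python sketch of that decoding, not the layer itself: `decode_box` and its argument names are hypothetical, and the anchor `(116, 90)`, stride `32`, and cell `(6, 6)` are illustrative values only.

```python
import math


def sigmoid(value):
    """Logistic function, squashing a raw network output into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction into (center_x, center_y, width, height) in pixels.

    tx, ty, tw, th     : raw network outputs for this anchor and grid cell
    cell_x, cell_y     : integer grid offsets of the cell (grid_x / grid_y)
    anchor_w, anchor_h : anchor size in input-image pixels
    stride             : pixels per grid cell (img_dim / grid_size)
    """
    # Anchors are first expressed in grid units, as scaled_anchors is.
    scaled_w, scaled_h = anchor_w / stride, anchor_h / stride
    # The center is the cell offset plus a sigmoid-bounded shift inside the cell.
    box_x = sigmoid(tx) + cell_x
    box_y = sigmoid(ty) + cell_y
    # Width and height scale the anchor exponentially.
    box_w = math.exp(tw) * scaled_w
    box_h = math.exp(th) * scaled_h
    # Multiplying by the stride maps grid units back to pixels.
    return tuple(v * stride for v in (box_x, box_y, box_w, box_h))


if __name__ == "__main__":
    # All-zero raw outputs: the box sits at the cell center with the anchor's size.
    print(decode_box(0.0, 0.0, 0.0, 0.0, 6, 6, 116, 90, 32))
```

With all raw outputs at zero, the sigmoid gives 0.5 and the exponential gives 1, so the decoded box is centered in its cell and has exactly the anchor's size.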
"""Darknet class :YOLOv3 Model """
class Darknet(nn.Module):
    """YOLOv3 object detection model"""

    def __init__(self, config_path, img_size=416):
        super(Darknet, self).__init__()
        # parse_model_config() parses the YOLOv3 layer configuration file (yolov3.cfg) and returns the module definitions.
        # module_defs is a list; each element is a dictionary describing one module/layer of the network.
        self.module_defs = parse_model_config(config_path)
        # Build the YOLOv3 model from the parsed module definitions
        self.hyperparams, self.module_list = create_modules(self.module_defs)  # hyperparameters and model structure
        self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
        self.img_size = img_size
        self.seen = 0
        self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)

    def forward(self, x, targets=None):
        img_dim = x.shape[2]
        loss = 0
        layer_outputs, yolo_outputs = [], []
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
                x = module(x)
            elif module_def["type"] == "route":
                # Concatenate the outputs of the listed layers along the channel dimension
                x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
            elif module_def["type"] == "shortcut":
                # Residual connection: add the previous output to an earlier layer's output
                layer_i = int(module_def["from"])
                x = layer_outputs[-1] + layer_outputs[layer_i]
            elif module_def["type"] == "yolo":
                x, layer_loss = module[0](x, targets, img_dim)
                loss += layer_loss
                yolo_outputs.append(x)
            # Keep every layer's output so route/shortcut layers can refer back to it
            layer_outputs.append(x)
        yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
        return yolo_outputs if targets is None else (loss, yolo_outputs)

    def load_darknet_weights(self, weights_path):
        """Parses and loads the weights stored in 'weights_path'"""
        # Open the weights file
        with open(weights_path, "rb") as f:
            header = np.fromfile(f, dtype=np.int32, count=5)  # First five are header values
            self.header_info = header  # Needed to write header when saving weights
            self.seen = header[3]  # number of images seen during training
            weights = np.fromfile(f, dtype=np.float32)  # The rest are weights

        # Establish cutoff for loading backbone weights
        cutoff = None
        if "darknet53.conv.74" in weights_path:
            cutoff = 75

        ptr = 0
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if i == cutoff:
                break
            if module_def["type"] == "convolutional":
                conv_layer = module[0]
                if module_def["batch_normalize"]:
                    # Load BN bias, weights, running mean and running variance
                    bn_layer = module[1]
                    num_b = bn_layer.bias.numel()  # Number of biases
                    # Bias
                    bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
                    bn_layer.bias.data.copy_(bn_b)
                    ptr += num_b
                    # Weight
                    bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
                    bn_layer.weight.data.copy_(bn_w)
                    ptr += num_b
                    # Running Mean
                    bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
                    bn_layer.running_mean.data.copy_(bn_rm)
                    ptr += num_b
                    # Running Var
                    bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
                    bn_layer.running_var.data.copy_(bn_rv)
                    ptr += num_b
                else:
                    # Load conv. bias (only present when the layer has no batch norm)
                    num_b = conv_layer.bias.numel()
                    conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
                    conv_layer.bias.data.copy_(conv_b)
                    ptr += num_b
                # Load conv. weights
                num_w = conv_layer.weight.numel()
                conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
                conv_layer.weight.data.copy_(conv_w)
                ptr += num_w

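    def save_darknet_weights(self, path, cutoff=-1):
        """
        @:param path    - path of the new weights file
        @:param cutoff  - save layers between 0 and cutoff (cutoff = -1 -> all are saved)
        """
        with open(path, "wb") as fp:
            self.header_info[3] = self.seen
            self.header_info.tofile(fp)
            # Iterate through layers
            for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
                if module_def["type"] == "convolutional":
                    conv_layer = module[0]
                    if module_def["batch_normalize"]:
                        # With batch norm, write the BN parameters first
                        bn_layer = module[1]
                        bn_layer.bias.data.cpu().numpy().tofile(fp)
                        bn_layer.weight.data.cpu().numpy().tofile(fp)
                        bn_layer.running_mean.data.cpu().numpy().tofile(fp)
                        bn_layer.running_var.data.cpu().numpy().tofile(fp)
                    else:
                        # Without batch norm, write the conv bias
                        conv_layer.bias.data.cpu().numpy().tofile(fp)
                    # Write the conv weights
                    conv_layer.weight.data.cpu().numpy().tofile(fp)
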
The file used to evaluate the performance of the model.
from __future__ import division
from models import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
import argparse
import tqdm
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable


def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size):
    """Model evaluation function. Parameters: the model, validation set path, IoU threshold, confidence threshold, NMS threshold, network input size, batch size."""
    # Switch to evaluation mode. Otherwise layers such as BatchNorm keep updating their running statistics on every forward pass, even without training.
    model.eval()
    '''(1) Get the evaluation dataset and turn it into batches'''
    # dataset: (validation image paths, validation images, validation labels)
    # dataloader: yields batches of (image paths, images, labels)
    dataset = ListDataset(path, img_size=img_size, augment=False, multiscale=False)
    dataloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=dataset.collate_fn,  # collate_fn implements custom batch assembly
    )
    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
    labels = []
    sample_metrics = []  # List of tuples (TP, confs, pred)
    for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):  # tqdm shows a progress bar
        '''(2) Batch label handling'''
        labels += targets[:, 1].tolist()  # collect the class of every target into the labels list
        # Rescale target
        targets[:, 2:] = xywh2xyxy(targets[:, 2:])  # convert target coordinates to (x1, y1, x2, y2); still normalized at this point
        targets[:, 2:] *= img_size  # scale the normalized coordinates to the network input size
        '''(3) Batch prediction followed by NMS'''
        # Feed the images to the model and apply non-maximum suppression to the output
        imgs = Variable(imgs.type(Tensor), requires_grad=False)
        with torch.no_grad():
            outputs = model(imgs)
            outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres)
        '''(4) Prediction statistics: for the boxes kept by NMS, get the true_positive flag (0 or 1), the prediction confidence, and the predicted class'''
        sample_metrics += get_batch_statistics(outputs, targets, iou_threshold=iou_thres)  # arguments: model output, ground-truth labels (as x1, y1, x2, y2), IoU threshold
    # Note: the code on GitHub fails here when there are no detections; this check is needed for training to run normally
    if len(sample_metrics) == 0:
        return np.array([]), np.array([]), np.array([]), np.array([]), np.array([])
    # Unpack sample_metrics into separate arrays: true_positive flags (0 or 1), prediction confidences, predicted classes
    true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))]
    # Compute precision, recall, AP, f1, ap_class with the function in utils.py
    precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels)  # pred_labels and labels generally differ in length
    return precision, recall, AP, f1, ap_class


if __name__ == "__main__":
    '''(1) Argument parsing'''
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=8, help="size of each image batch")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--data_config", type=str, default="config/custom.data", help="path to data config file")
    parser.add_argument("--weights_path", type=str, default="checkpoints/yolov3_ckpt_9.pth", help="path to weights file")  # or "weights/yolov3.weights"
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    parser.add_argument("--iou_thres", type=float, default=0.5, help="iou threshold required to qualify as detected")
    parser.add_argument("--conf_thres", type=float, default=0.001, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.5, help="iou threshold for non-maximum suppression")
    parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    opt = parser.parse_args()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    """(2) Data config parsing"""
    # Call the data config parser in parse_config.py; data_config is a dictionary {classes: 80, train: path, valid: path, ...}
    data_config = parse_data_config(opt.data_config)
    valid_path = data_config["valid"]  # validation set path, e.g. valid=data/custom/valid.txt
    class_names = load_classes(data_config["names"])  # class names, read from the names file
    """(3) Model building: build the model and load its parameters"""
    model = Darknet(opt.model_def).to(device)
    if opt.weights_path.endswith(".weights"):
        # Load darknet weights with the custom loader defined on the model
        model.load_darknet_weights(opt.weights_path)
    else:
        # Load checkpoint weights
        model.load_state_dict(torch.load(opt.weights_path))
    print("Compute mAP...")
    """(4) Model evaluation"""
    precision, recall, AP, f1, ap_class = evaluate(
        model,  # model
        path=valid_path,  # validation set path
        iou_thres=opt.iou_thres,  # IoU threshold
        conf_thres=opt.conf_thres,  # confidence threshold
        nms_thres=opt.nms_thres,  # NMS threshold
        img_size=opt.img_size,  # network input size
        batch_size=8,  # batch size
    )
    print(precision, recall, AP, f1, ap_class)
    print("Average Precisions:")
    for i, c in enumerate(ap_class):
        print(f"+ Class '{c}' ({class_names[c]}) - AP: {AP[i]}")
    print(f"mAP: {AP.mean()}")
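The evaluation script delegates the AP computation to `ap_per_class` in utils.py. As a rough illustration of how an AP value can be obtained from per-detection true-positive flags and confidences, here is a minimal pure-Python sketch for a single class. The function name `average_precision` and its arguments are hypothetical, and it assumes the common "area under the precision envelope" definition; it is not the code of `ap_per_class` itself.

```python
def average_precision(true_positive_flags, scores, num_ground_truth):
    """Area under the precision-recall curve for one class.

    true_positive_flags : 1 if the detection matched a ground-truth box, else 0
    scores              : confidence of each detection
    num_ground_truth    : number of ground-truth boxes of this class (assumed > 0)
    """
    # Walk through the detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp_count, fp_count = 0, 0
    recalls, precisions = [], []
    for i in order:
        if true_positive_flags[i]:
            tp_count += 1
        else:
            fp_count += 1
        recalls.append(tp_count / num_ground_truth)
        precisions.append(tp_count / (tp_count + fp_count))

    # Add sentinel points, then make precision monotonically non-increasing
    # (the "precision envelope").
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])

    # Sum the rectangle areas between consecutive recall values.
    return sum((mrec[k + 1] - mrec[k]) * mpre[k + 1] for k in range(len(mrec) - 1))


if __name__ == "__main__":
    # Three detections, the second one is a false positive; two ground-truth boxes.
    print(average_precision([1, 0, 1], [0.9, 0.8, 0.7], num_ground_truth=2))
```

mAP is then the mean of the per-class AP values, which is what `AP.mean()` computes in the script above.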
The model training script. Training generates:
(1) a checkpoints folder, used to save the model parameters after each epoch of training
(2) a logs folder, used to save log information
from __future__ import division
from models import *
from utils.logger import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
from terminaltables import AsciiTable
import os
from test import evaluate
import time
import datetime
import argparse
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable

if __name__ == "__main__":
    '''(1) Argument parsing'''
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    parser.add_argument("--batch_size", type=int, default=1, help="size of each image batch")
    # Gradient accumulation
    parser.add_argument("--gradient_accumulations", type=int, default=2, help="number of gradient accums before step")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--data_config", type=str, default="config/custom.data", help="path to data config file")
    parser.add_argument("--pretrained_weights", type=str, help="if specified starts from checkpoint model")
    parser.add_argument("--n_cpu", type=int, default=1, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    parser.add_argument("--checkpoint_interval", type=int, default=1, help="interval between saving model weights")
    parser.add_argument("--evaluation_interval", type=int, default=1, help="interval evaluations on validation set")
    parser.add_argument("--compute_map", default=False, help="if True computes mAP every tenth batch")
    parser.add_argument("--multiscale_training", default=True, help="allow for multi-scale training")
    parser.add_argument("--weights_path", type=str, default="checkpoints/yolov3_ckpt_9.pth", help="path to weights file")
    opt = parser.parse_args()
    '''(2) Instantiate the logger'''
    logger = Logger("logs")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    '''(3) Create folders'''
    os.makedirs("output", exist_ok=True)
    os.makedirs("checkpoints", exist_ok=True)
    """(4) Initialize the model: build it and load its parameters"""
    model = Darknet(opt.model_def).to(device)
    # If specified we start from checkpoint
    if opt.pretrained_weights:
        if opt.pretrained_weights.endswith(".pth"):
            model.load_state_dict(torch.load(opt.pretrained_weights))
        else:
            model.load_darknet_weights(opt.pretrained_weights)
    """(5) Dataset loading"""
    data_config = parse_data_config(opt.data_config)  # data config parser from parse_config.py; data_config is a dictionary
    train_path = data_config["train"]  # training set path
    valid_path = data_config["valid"]  # validation set path
    class_names = load_classes(data_config["names"])  # load_classes in utils.py returns the class names contained in the dataset
    # dataset is the collection of image paths, images, and labels (normalized x, y, w, h)
    dataset = ListDataset(train_path, augment=True, multiscale=opt.multiscale_training)
    # dataloader loads the dataset in batches
    dataloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=opt.batch_size,
        shuffle=True,
        num_workers=opt.n_cpu,
        collate_fn=dataset.collate_fn,
    )
    """(7) Optimizer"""
    optimizer = torch.optim.Adam(model.parameters())
    """(8) Model training"""
    metrics = [
        "grid_size", "loss", "x", "y", "w", "h", "conf", "cls",
        "cls_acc", "recall50", "recall75", "precision", "conf_obj", "conf_noobj",
    ]
    for epoch in range(opt.epochs):  # train for the given number of epochs
        model.train()  # set the model to training mode
        start_time = time.time()
        for batch_i, (_, imgs, targets) in enumerate(dataloader):  # batch iterations of each epoch
            # Cumulative number of batches processed so far
            batches_done = len(dataloader) * epoch + batch_i
            # Move images and labels to the device
            imgs = Variable(imgs.to(device))
            targets = Variable(targets.to(device), requires_grad=False)  # labels do not need gradients
            # Feed images and labels to the model to get the loss and the output, then backpropagate
            loss, outputs = model(imgs, targets)
            loss.backward()
            # Gradients accumulate across batches; step the optimizer once every `gradient_accumulations` batches
            if (batches_done + 1) % opt.gradient_accumulations == 0:
                optimizer.step()
                optimizer.zero_grad()
            # Epoch and batch progress
            log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch + 1, opt.epochs, batch_i + 1, len(dataloader))
            # print('log_str', log_str)  # e.g. ---- [Epoch 1/10, Batch 1/10] ----
            # Header row of the table printed during training
            metric_table = [["Metrics", *[f"YOLO Layer {i}" for i in range(len(model.yolo_layers))]]]
            # print(metric_table)  # [['Metrics', 'YOLO Layer 0', 'YOLO Layer 1', 'YOLO Layer 2']]
            # Output format of every metric: six decimal places by default
            formats = {m: "%.6f" for m in metrics}
            formats["grid_size"] = "%2d"
            formats["cls_acc"] = "%.2f%%"
            # print('formats', formats)  # {'grid_size': '%2d', 'loss': '%.6f', ..., 'cls_acc': '%.2f%%', ..., 'conf_noobj': '%.6f'}
            # One table row per metric, with one formatted value per YOLO layer
            for metric in metrics:
                row_metrics = [formats[metric] % yolo.metrics.get(metric, 0) for yolo in model.yolo_layers]
                metric_table += [[metric, *row_metrics]]
            # Tensorboard log: every metric except grid_size, tagged with the YOLO layer index
            tensorboard_log = []
            for j, yolo in enumerate(model.yolo_layers):
                for name, value in yolo.metrics.items():
                    if name != "grid_size":
                        tensorboard_log += [(f"{name}_{j+1}", value)]
            tensorboard_log += [("loss", loss.item())]  # add the total loss to the log
            # Write the list of log entries to the logger created above
            logger.list_of_scalars_summary(tensorboard_log, batches_done)
            # log_str prints the metric table and the total loss
            log_str += AsciiTable(metric_table).table
            log_str += f"\nTotal loss {loss.item()}"
            # Approximate time left in this epoch
            epoch_batches_left = len(dataloader) - (batch_i + 1)
            time_left = datetime.timedelta(seconds=epoch_batches_left * (time.time() - start_time) / (batch_i + 1))
            log_str += f"\n---- ETA {time_left}"
            print(log_str)
            model.seen += imgs.size(0)
        '''(9) Evaluate during training'''
        if epoch % opt.evaluation_interval == 0:
            print("\n---- Evaluating Model ----")
            # Evaluate the current model on the validation set; see test.py for the details
            precision, recall, AP, f1, ap_class = evaluate(
                model,
                path=valid_path,
                iou_thres=0.5,  # the three threshold values here are assumed; adjust as needed
                conf_thres=0.5,
                nms_thres=0.5,
                img_size=opt.img_size,
                batch_size=8,
            )
            evaluation_metrics = [
                ("val_precision", precision.mean()),
                ("val_recall", recall.mean()),
                ("val_mAP", AP.mean()),
                ("val_f1", f1.mean()),
            ]
            logger.list_of_scalars_summary(evaluation_metrics, epoch)
            # Print class APs and mAP
            ap_table = [["Index", "Class name", "AP"]]
            for i, c in enumerate(ap_class):
                ap_table += [[c, class_names[c], "%.5f" % AP[i]]]
            print(AsciiTable(ap_table).table)
            print(f"---- mAP {AP.mean()}")
        '''(10) Save the model'''
        if epoch % opt.checkpoint_interval == 0:
            torch.save(model.state_dict(), "checkpoints/yolov3_ckpt_%d.pth" % epoch)
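The `--gradient_accumulations` option makes the optimizer step only after the gradients of several batches have been summed, which imitates a larger batch on limited memory. The toy sketch below shows the idea on a single scalar parameter without PyTorch. The function `train_with_accumulation`, the squared-error loss, and the plain gradient-descent update are all assumptions made for illustration; the script above uses Adam on the real model.

```python
def train_with_accumulation(samples, accumulation_steps, lr=0.1):
    """Fit a scalar w to the samples with squared error, stepping every N batches.

    Returns the final parameter and the number of optimizer steps taken.
    """
    w = 0.0
    grad_sum = 0.0  # plays the role of the .grad buffer that backward() adds into
    steps = 0
    for batch_i, x in enumerate(samples):
        # Gradient of (w - x)^2 with respect to w, added to the running sum.
        grad_sum += 2.0 * (w - x)
        # Step once every `accumulation_steps` batches.
        if (batch_i + 1) % accumulation_steps == 0:
            w -= lr * grad_sum  # corresponds to optimizer.step()
            grad_sum = 0.0      # corresponds to optimizer.zero_grad()
            steps += 1
    return w, steps


if __name__ == "__main__":
    print(train_with_accumulation([1.0, 3.0, 1.0, 3.0], accumulation_steps=2))
    print(train_with_accumulation([1.0, 3.0, 1.0, 3.0], accumulation_steps=1))
```

With four batches and `accumulation_steps=2` the parameter is updated twice, each time using the summed gradient of two batches; with `accumulation_steps=1` it is updated after every batch.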
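YOLOv4 algorithm explained
YOLOv4 improvements
- Mosaic data augmentation
Stitching several images into one training sample indirectly increases the effective batch size.
- Label smoothing
- IoU loss upgrades
A plain IoU loss suffers from a vanishing gradient: when the predicted box and the ground-truth box do not overlap, the IoU is 0 and provides no gradient.
GIoU introduces the area of the smallest enclosing box.
DIoU introduces the distance between the two box centers.
CIoU additionally introduces the aspect ratio.
- NMS improvements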
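To make the difference between the IoU variants concrete, the following pure-Python sketch computes IoU, GIoU, DIoU, and CIoU for two axis-aligned boxes given as `(x1, y1, x2, y2)`. The function name `box_metrics` is hypothetical and the formulas follow the commonly published definitions; boxes are assumed to have positive width and height.

```python
import math


def box_metrics(box_a, box_b):
    """Return (IoU, GIoU, DIoU, CIoU) for two boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # GIoU: penalize the part of the smallest enclosing box not covered by the union.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose_area = enclose_w * enclose_h
    giou = iou - (enclose_area - union) / enclose_area

    # DIoU: penalize the squared center distance, normalized by the squared
    # diagonal of the enclosing box.
    center_dist2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    diagonal2 = enclose_w ** 2 + enclose_h ** 2
    diou = iou - center_dist2 / diagonal2

    # CIoU: additionally penalize a mismatch in aspect ratio.
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1)) - math.atan((ax2 - ax1) / (ay2 - ay1))
    ) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    ciou = diou - alpha * v
    return iou, giou, diou, ciou


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0, 1.0)
    # Two non-overlapping predictions: IoU is 0 for both, GIoU and DIoU still differ.
    print(box_metrics(target, (2.0, 0.0, 3.0, 1.0)))
    print(box_metrics(target, (5.0, 0.0, 6.0, 1.0)))
```

Both predictions have an IoU of 0, so an IoU loss cannot tell them apart, while GIoU and DIoU are less negative for the nearer box and therefore still provide a useful training signal.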
- SPPNet
To cope with different input sizes, YOLOv3 changes the size of the input data during training.
SPP instead uses max pooling so that the final features have a consistent size.
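The sketch below illustrates the original SPPNet idea in pure Python: a single-channel feature map of any size is max-pooled into fixed grids of bins, so the concatenated output always has the same length. The function names and the pyramid levels `(1, 2, 4)` are assumptions chosen for illustration. The SPP block used inside YOLOv4 is usually described differently (stride-1 max pooling with several kernel sizes, followed by channel concatenation), so this is not that block.

```python
import math


def adaptive_max_pool(feature, bins):
    """Max-pool a 2D feature map (list of rows) into a bins x bins grid, row-major."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Window boundaries: floor for the start, ceil for the end, so windows are never empty.
        row_start, row_end = (i * height) // bins, math.ceil((i + 1) * height / bins)
        for j in range(bins):
            col_start, col_end = (j * width) // bins, math.ceil((j + 1) * width / bins)
            pooled.append(
                max(feature[r][c] for r in range(row_start, row_end) for c in range(col_start, col_end))
            )
    return pooled


def spatial_pyramid_pool(feature, levels=(1, 2, 4)):
    """Concatenate the pooled outputs of every pyramid level into one fixed-length vector."""
    output = []
    for bins in levels:
        output.extend(adaptive_max_pool(feature, bins))
    return output


if __name__ == "__main__":
    small = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4 map
    large = [[r * 9 + c for c in range(9)] for r in range(7)]   # 7 x 9 map
    # Both outputs have 1 + 4 + 16 = 21 values, regardless of the input size.
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```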
- CSPNet
Each block splits its input feature map into two parts along the channel dimension: one part passes through the block's normal layers, and the other is concatenated directly onto the block's output.
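The channel split can be sketched in a few lines of pure Python. Here a feature map is simply a list of channels, `csp_block` is a hypothetical helper, and `transform` stands in for the block's normal layers; the equal half-and-half split is an assumption made for illustration.

```python
def csp_block(channels, transform):
    """Split the channels in two, transform one part, and concatenate both parts.

    channels  : list of channels (each channel can be any object, e.g. a list of values)
    transform : function applied to the list of channels of the processed part
    """
    half = len(channels) // 2
    shortcut_part = channels[:half]   # bypasses the block unchanged
    main_part = channels[half:]       # goes through the block's normal layers
    processed = transform(main_part)
    # Concatenate along the channel dimension.
    return shortcut_part + processed


if __name__ == "__main__":
    feature = [[1, 2], [3, 4], [5, 6], [7, 8]]  # four channels with two values each

    def double(part):
        """Stand-in for the block's layers: doubles every value."""
        return [[2 * value for value in channel] for channel in part]

    print(csp_block(feature, double))
```

Only half of the channels pass through the expensive part of the block, which is where the computational saving of this design comes from.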
- Attention mechanisms
One is the channel attention mechanism (Channel Attention), which assigns a weight to each feature-map channel.
The other is the spatial attention mechanism (Spatial Attention), which assigns a weight to each spatial region.
YOLOv4 uses only the spatial attention mechanism.
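As a rough illustration of spatial attention, the sketch below builds one weight per spatial location and multiplies every channel by it. It is a deliberately simplified stand-in: the mask here is the sigmoid of the channel-wise mean, with no learned convolution, whereas real spatial attention modules learn their mask. The function name `spatial_attention` is hypothetical.

```python
import math


def spatial_attention(feature):
    """Weight every spatial location of a feature map by a single mask value.

    feature : list of channels, each an H x W list of lists
    Returns (weighted_feature, mask), where mask is H x W with values in (0, 1).
    """
    num_channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])

    # One value per location: sigmoid of the mean over channels.
    mask = [
        [
            1.0 / (1.0 + math.exp(-sum(feature[c][y][x] for c in range(num_channels)) / num_channels))
            for x in range(width)
        ]
        for y in range(height)
    ]

    # The same mask is applied to every channel, so only locations are re-weighted.
    weighted = [
        [[feature[c][y][x] * mask[y][x] for x in range(width)] for y in range(height)]
        for c in range(num_channels)
    ]
    return weighted, mask


if __name__ == "__main__":
    feature = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[2.0, -2.0], [0.0, 4.0]],
    ]
    weighted, mask = spatial_attention(feature)
    print(mask)
    print(weighted)
```

Channel attention would do the opposite: compute one weight per channel and apply it to all locations of that channel.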