YOLOv5 input side (I): Mosaic data augmentation | CSDN creative check-in
2022-07-03 05:07:00 【TT ya】
I'm a beginner, and I write posts like this as study notes to record what I've learned. I hope they also help others who are just getting started, and I'd welcome corrections from more experienced readers. If anything here infringes on your rights, tell me and I will delete it.
Contents
I. Principle analysis
II. Code analysis
1. Main part: load_mosaic
2. load_image function
3. random_perspective() function (see the code analysis for details)
I. Principle analysis
YOLOv5 uses the same Mosaic data augmentation as YOLOv4.
Main principle: one selected image and 3 randomly chosen images are randomly cropped and then stitched into a single image, which is used as training data.
This enriches the backgrounds the model sees. Because four images are stitched into one, the effective batch size also goes up: when batch normalization is computed, each sample already contains data from four images.
As a result, YOLOv5 is less dependent on a large batch_size.
II. Code analysis
1. Main part: load_mosaic
labels4, segments4 = [], []  # annotation lists for the mosaic, initially empty
s = self.img_size  # image size
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
# random.uniform draws a real number from the given range
The annotation lists are first initialized as empty and the image size s is read. random.uniform() then draws the mosaic center point; each coordinate lies between 0.5 and 1.5 times the image size.
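To make the sampling range concrete, here is a minimal sketch of the center draw. It assumes s = 640 and mosaic_border = [-s // 2, -s // 2]; neither value is shown in the code above, and the border value is chosen because it reproduces the stated range of 0.5 to 1.5 times the image size. sample_center is a hypothetical helper name.

```python
import random

s = 640                                # assumed image size
mosaic_border = [-s // 2, -s // 2]     # assumed border value (not shown in the article)


def sample_center(s, mosaic_border):
    """Draw a mosaic center the same way as the line above."""
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in mosaic_border)
    return xc, yc


random.seed(0)
centers = [sample_center(s, mosaic_border) for _ in range(1000)]

# With x = -s // 2 the bounds are -x = s / 2 and 2 * s + x = 1.5 * s
low, high = s // 2, 3 * s // 2
print(min(c[0] for c in centers) >= low, max(c[0] for c in centers) <= high)
```

Under these assumptions every sampled center stays inside the middle region of the 2s by 2s canvas, so each of the four tiles gets a non-empty area.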
indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
random.shuffle(indices)
random.choices() draws the indices of 3 more images from the whole dataset. Together with the index of the originally selected image they form the 4-element list indices, which random.shuffle() then puts in random order.
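The index selection can be tried on its own. This is a small sketch with a hypothetical dataset of 100 images (dataset_indices and index are illustrative values, not taken from the article). Note that random.choices samples with replacement, so the same image can appear more than once in a mosaic.

```python
import random

dataset_indices = list(range(100))   # hypothetical dataset of 100 images
index = 7                            # the image originally requested

random.seed(0)
indices = [index] + random.choices(dataset_indices, k=3)  # 1 fixed + 3 random
random.shuffle(indices)              # the requested image may land in any tile

print(indices)
```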
for i, index in enumerate(indices):  # loop over the 4 images
    # Load image
    img, _, (h, w) = load_image(self, index)  # the image and its height and width
The loop goes over the 4 images and calls load_image() for each one to get the image together with its height and width.
The next step is to decide where each of the 4 images is placed.
    # place img in img4
    if i == 0:  # top left
        img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
        # create the gray background canvas first
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
        # region on the large image; clipped at 0 if the tile would extend past the canvas
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
        # matching region cut from the small (source) image
The first image goes in the top-left corner.
img4 is first initialized with np.full() as a canvas filled with the value 114. Each side is 2 * s, so the canvas has the area of 4 images.
Then the position of the tile on the large image is set, together with the corresponding region to cut from the small (source) image.
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    elif i == 3:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
The remaining 3 images are handled in the same way.
    img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
The selected part of the small image is pasted onto the large image.
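The four placement branches can be checked with plain numbers. The sketch below wraps the same coordinate formulas in a hypothetical helper, tile_coords, and uses illustrative values (s = 640, center at (500, 700), a 640 by 480 source image). The point it shows is that the region on the canvas and the region cut from the source always have the same size, which is what makes the slice assignment valid.

```python
def tile_coords(i, xc, yc, w, h, s):
    """Return (canvas region, source region) for tile i, both as (x1, y1, x2, y2)."""
    if i == 0:  # top left
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    else:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)


s, xc, yc, w, h = 640, 500, 700, 640, 480   # illustrative values
for i in range(4):
    a, b = tile_coords(i, xc, yc, w, h, s)
    size_a = (a[2] - a[0], a[3] - a[1])      # width, height on the canvas
    size_b = (b[2] - b[0], b[3] - b[1])      # width, height cut from the source
    print(i, a, b, size_a == size_b)
```

For the top-left tile in this example the source image is wider than the space left of the center, so only its right-hand 500 columns are kept.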
    padw = x1a - x1b
    padh = y1a - y1b
The offset from the small image to the large image is computed here. It is needed to place the labels correctly in the mosaic.
    # Labels
    labels, segments = self.labels[index].copy(), self.segments[index].copy()  # labels of this image
    if labels.size:
        labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
        segments = [xyn2xy(x, w, h, padw, padh) for x in segments]  # normalized segments to pixel coordinates
    labels4.append(labels)
    segments4.extend(segments)
Label handling:
First the labels of the current image are read, and the normalized xywh boxes (fractions of the image size) are converted to pixel xyxy format, shifted by (padw, padh).
The segments are converted to pixel coordinates in the same way.
Both are then appended to the annotation lists prepared at the start.
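The conversion itself is not shown in the article, so the following is only a sketch of what a normalized xywh to pixel xyxy conversion with an offset looks like. xywhn_to_xyxy is a hypothetical stand-in written for illustration, and the numbers (a 640 by 480 image, padw = -140, padh = 220) are made up.

```python
import numpy as np


def xywhn_to_xyxy(x, w, h, padw=0, padh=0):
    """Normalized (cx, cy, bw, bh) rows -> pixel (x1, y1, x2, y2) rows, shifted by the offset."""
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # left edge
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top edge
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # right edge
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom edge
    return y


# One box centered in the image, covering half of its width and height
boxes = np.array([[0.5, 0.5, 0.5, 0.5]])
pixels = xywhn_to_xyxy(boxes, w=640, h=480, padw=-140, padh=220)
print(pixels)
```

A negative padw means the source image was cropped on its left side before pasting, so label x coordinates move left by the same amount.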
# Concat/clip labels
labels4 = np.concatenate(labels4, 0)  # join the per-image label arrays into one array
for x in (labels4[:, 1:], *segments4):
    np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
# img4, labels4 = replicate(img4, labels4)  # replicate
The label list is first concatenated into a single array, which makes the later processing easier. np.clip then limits all coordinates to the range 0 to 2 * s, that is, twice the image size.
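The clipping loop relies on in-place writes: labels4[:, 1:] is a view into labels4, so np.clip(..., out=x) changes the original array without any reassignment. A small sketch with made-up coordinates and s = 640 shows this:

```python
import numpy as np

s = 640
labels4 = np.array([[0, -15.0, 30.0, 400.0, 1300.0],
                    [1, 600.0, 700.0, 1290.0, 1279.0]])    # class, x1, y1, x2, y2
segments4 = [np.array([[-5.0, 10.0], [1300.0, 640.0]])]   # one polygon with 2 points

for x in (labels4[:, 1:], *segments4):
    np.clip(x, 0, 2 * s, out=x)   # writes through the view, the class column is untouched

print(labels4)
print(segments4[0])
```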
# Augment
img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
img4, labels4 = random_perspective(img4, labels4, segments4,
                                   degrees=self.hyp['degrees'],
                                   translate=self.hyp['translate'],
                                   scale=self.hyp['scale'],
                                   shear=self.hyp['shear'],
                                   perspective=self.hyp['perspective'],
                                   border=self.mosaic_border)  # border to remove
return img4, labels4
After the four images are stitched together, the mosaic has shape [2 * img_size, 2 * img_size].
The stitched image is then randomly rotated, translated, scaled and cropped, which brings it back to the input size img_size.
Finally, the processed image and its labels are returned.
2. load_image function
load_image loads an image and resizes it by the ratio between the configured input size and the original size of the image.
First, the image stored for the given index is fetched.
def load_image(self, i):
    # loads 1 image from dataset index 'i', returns im, original hw, resized hw
    im = self.imgs[i]  # image stored for this index, or None
The function checks whether the image is cached, that is, whether it has already been scaled. (I'm not sure this understanding is correct. If it is wrong, please tell me in the comment area, thank you very much!)
If it is not cached:
First look for a saved .npy file for this image
If it is found: load the image from it
If it is not found: read the image from its original path, and raise an error if no image can be read from that path
Read the original height and width of the image and compute the resize ratio
If the ratio is not equal to 1, resize the image
Finally, return the image, its original height and width, and its resized height and width
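The ratio step can be illustrated without OpenCV. The sketch below only computes the target size and names the interpolation mode that the condition would pick; resize_plan is a hypothetical helper, and the interpolation names are returned as plain strings rather than cv2 constants so the example stays dependency-free.

```python
def resize_plan(h0, w0, img_size, augment=False):
    """Return (new_h, new_w, interpolation name) following the ratio rule described above."""
    r = img_size / max(h0, w0)            # ratio: the longer side becomes img_size
    if r == 1:
        return h0, w0, None               # sizes already match, no resize
    # area interpolation only when shrinking outside of augmentation, otherwise linear
    interp = "INTER_AREA" if r < 1 and not augment else "INTER_LINEAR"
    return int(h0 * r), int(w0 * r), interp


print(resize_plan(960, 1280, 640))        # longer side 1280 is halved
print(resize_plan(320, 320, 640))         # small image is enlarged
print(resize_plan(480, 640, 640))         # longer side already equals img_size
```

The aspect ratio is preserved because both sides are multiplied by the same factor r.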
    if im is None:  # not cached in ram
        npy = self.img_npy[i]  # path of the saved .npy file, if any
        if npy and npy.exists():  # load npy
            im = np.load(npy)  # load the image from the .npy file
        else:  # read image
            path = self.img_files[i]  # path of the original image
            im = cv2.imread(path)  # BGR
            assert im is not None, f'Image Not Found {path}'  # fail if the image cannot be read
        h0, w0 = im.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # ratio
        if r != 1:  # if sizes are not equal
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized
If the image is cached, it is returned directly, together with its original height and width and its resized height and width.
    else:
        return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized
3. random_perspective() function (see the code analysis for details)
Random transformation
Calculation method: each coordinate vector is multiplied by the transformation matrix.
First, the height and width of the output image, including the border, are computed.
def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]
    # output height and width (input size plus the border on both sides)
    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2
Then the center matrix is built.
    # Center
    C = np.eye(3)  # 3x3 identity matrix
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)
This moves the center of the image to the origin. Next, the matrices for the individual transformations (rotation and so on) are prepared.
    # Perspective
    P = np.eye(3)  # 3x3 identity matrix
    # random perspective values in the x and y directions
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)
    # Rotation and Scale
    R = np.eye(3)  # 3x3 identity matrix
    a = random.uniform(-degrees, degrees)  # random rotation angle within the range
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)  # random scale factor
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)  # the 2x3 affine matrix fills the first two rows of R
    # Shear
    S = np.eye(3)  # 3x3 identity matrix
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)
    # Translation (a shift of the image, not a zoom)
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)
Then the matrices are combined into a single transformation matrix.
    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    # the matrices are combined by matrix multiplication
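Why the order matters can be seen with a reduced version that keeps only C, R and T. This is a sketch under assumptions: the rotation matrix is built by hand in the [[alpha, beta], [-beta, alpha]] layout that a rotation about the origin has in OpenCV, the sizes (a 1280 canvas and a 640 output) and the angle are illustrative, and the random translate term is fixed at 0.5.

```python
import math
import numpy as np


def center_matrix(w, h):
    C = np.eye(3)
    C[0, 2] = -w / 2          # move the image center to the origin
    C[1, 2] = -h / 2
    return C


def rotation_scale_matrix(angle_deg, scale):
    a = math.radians(angle_deg)
    alpha, beta = scale * math.cos(a), scale * math.sin(a)
    R = np.eye(3)
    R[0, :2] = alpha, beta    # rotation about the origin with uniform scale
    R[1, :2] = -beta, alpha
    return R


def translation_matrix(tx, ty):
    T = np.eye(3)
    T[0, 2] = tx
    T[1, 2] = ty
    return T


in_w, in_h = 1280, 1280       # mosaic canvas
out_w, out_h = 640, 640       # output size after the border is removed

C = center_matrix(in_w, in_h)
R = rotation_scale_matrix(30, 1.1)
T = translation_matrix(0.5 * out_w, 0.5 * out_h)

center = np.array([in_w / 2, in_h / 2, 1.0])   # homogeneous coordinates
mapped = (T @ R @ C) @ center                  # right to left: center, rotate, translate
wrong = (C @ R @ T) @ center                   # same matrices, reversed order

print(mapped[:2], wrong[:2])
```

With the right-to-left order the center of the mosaic ends up at the center of the output whatever the angle is, because the rotation happens while that point sits at the origin. Reversing the order rotates about the wrong point and the center lands somewhere else.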
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        # warp only if there is a border or M is not the identity
        if perspective:
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
            # cv2.warpPerspective: perspective transformation; straight lines stay straight, but parallel lines may no longer be parallel
        else:  # affine
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
            # cv2.warpAffine: affine transformation (rotation, translation, scaling); parallel lines stay parallel
Then the label coordinates are transformed.
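For plain boxes, the idea is to transform all four corners and then take the axis-aligned box around them. The following is a minimal sketch of the affine case only (no perspective division); warp_boxes is a hypothetical helper, boxes are given as x1, y1, x2, y2 without the class column, and the matrix is a hand-picked 90 degree rotation plus a shift.

```python
import numpy as np


def warp_boxes(boxes, M):
    """boxes: (n, 4) array of x1, y1, x2, y2. M: 3x3 affine matrix. Returns warped (n, 4) boxes."""
    n = len(boxes)
    xy = np.ones((n * 4, 3))                                           # homogeneous corner points
    xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2)   # x1y1, x2y2, x1y2, x2y1
    xy = (xy @ M.T)[:, :2].reshape(n, 8)                               # transform, drop the ones column
    x = xy[:, [0, 2, 4, 6]]
    y = xy[:, [1, 3, 5, 7]]
    return np.stack((x.min(1), y.min(1), x.max(1), y.max(1)), axis=1)  # enclosing box


# (x, y) -> (-y + 100, x): a 90 degree rotation about the origin followed by a shift in x
M = np.array([[0.0, -1.0, 100.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
boxes = np.array([[10.0, 20.0, 30.0, 60.0]])   # 20 wide, 40 high
warped = warp_boxes(boxes, M)
print(warped)                                   # 40 wide, 20 high after the rotation
```

For rotations that are not multiples of 90 degrees the enclosing box is larger than the object, which is one reason the segment branch is preferred when polygons are available.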
    # Transform label coordinates
    n = len(targets)  # number of targets
    if n:  # if there are targets
        use_segments = any(x.any() for x in segments)  # True if any segment contains a non-zero value
        new = np.zeros((n, 4))  # one row of 4 box coordinates (xyxy) per target
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment  # the first two columns hold the pixel coordinates of the segment
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                # The column of ones multiplies the last row of M.T (the last column of M), which adds the translation.
                # With perspective enabled, the third output column comes from the last row of M, set through P, and is the divisor.
                # clip
                new[i] = segment2box(xy, width, height)
        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine
            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
            # clip the new boxes to the output image
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
Finally, the candidate boxes are filtered and the result is returned.
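The article does not show the body of box_candidates, so the sketch below is only an illustration of the kind of test such a filter applies: a box is kept if it is still wide and tall enough, has kept enough of its original area, and has a sane aspect ratio. The name filter_candidates and the thresholds wh_thr and ar_thr are assumptions made for this example; only area_thr appears in the call above.

```python
import numpy as np


def filter_candidates(box1, box2, wh_thr=2, ar_thr=100, area_thr=0.1, eps=1e-16):
    """box1: boxes before the warp, box2: after it, both shaped (4, n) as x1, y1, x2, y2."""
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))        # aspect ratio, always >= 1
    return ((w2 > wh_thr) & (h2 > wh_thr)                    # not too thin
            & (w2 * h2 / (w1 * h1 + eps) > area_thr)         # enough area left
            & (ar < ar_thr))                                 # not too elongated


before = np.array([[0, 0, 100, 100],
                   [0, 0, 100, 100],
                   [0, 0, 100, 100]], dtype=float).T
after = np.array([[0, 0, 90, 90],     # barely changed
                  [0, 0, 1, 90],      # clipped to a 1 pixel sliver
                  [0, 0, 20, 20]],    # only 4% of the original area left
                 dtype=float).T

keep = filter_candidates(before, after)
print(keep)
```

In the call above, the original boxes are multiplied by the scale factor s before the comparison, so that shrinking caused by the random scale alone does not count as lost area.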
        # filter candidates
        # drop boxes that were cut too small by the operations above
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
        targets = targets[i]
        targets[:, 1:5] = new[i]
    return im, targets
Criticism and corrections are welcome in the comment area. Thank you!