
[Mask R-CNN] Object detection and recognition based on Mask R-CNN

2022-06-30 06:43:00 【FPGA and MATLAB】

Mask R-CNN is another landmark work by Kaiming He, following Faster R-CNN. It combines object detection and instance segmentation, and it performs better than Faster R-CNN. Its basic structure is as follows:

Mask R-CNN is an instance segmentation model: it determines the position and category of every object in an image and gives a pixel-level prediction. "Instance segmentation" means segmenting each object of interest in the scene separately, whether or not the objects belong to the same category. For example, the model can pick out individual targets such as vehicles and people in street-view video. The picture below shows a Mask R-CNN trained on the COCO dataset. As the figure shows, for objects as large as a car or as small as a single banana, it can mark the object's pixel position in the picture with a bounding box.

Unlike classical object detection models such as Faster R-CNN, Mask R-CNN can also color the pixels that belong to the object inside each bounding box. Some people may see this as a marginal feature, but it matters a great deal for controlling self-driving cars and robots:

Shading helps the car identify the exact pixel position of each object on the road, so that it can avoid collisions;

If a robot wants to grab a target object, it needs to know the object's position (as with Amazon's drones).

If you only want to train a Mask R-CNN model on COCO, the easiest way is to use the TensorFlow Object Detection API. The details are available on GitHub and are not repeated here.

How Mask R-CNN works

Before building a Mask R-CNN model, let's first understand how it works.

In fact, Mask R-CNN is a combination of Faster R-CNN and FCN: the former handles object detection (class label + bounding box), and the latter determines the object's outline. As shown in the figure below:

The idea is simple. For each object, Faster R-CNN has two outputs: a class label and a candidate bounding box. To segment the object's pixels, we add a third output to these two: a binary mask that indicates which pixels inside the box belong to the object. Unlike the first two outputs, this new output requires a much finer spatial layout, so Mask R-CNN adds a branch network on top of Faster R-CNN: a Fully Convolutional Network (FCN).
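To make the third output concrete, here is a minimal MATLAB sketch with made-up values (the image size, box, and mask shape are all hypothetical, and no network is run). It treats one detection as a label, a bounding box, and a logical mask that is true only on the object's pixels inside that box:

```matlab
% Toy illustration of the three outputs per detected object.
% All values are invented for illustration; no network is run.
imgH = 8; imgW = 10;                 % hypothetical image size

label = 'car';                       % output 1: class label
box   = [3 2 5 4];                   % output 2: bounding box as [x y width height]

% Output 3: binary mask, true where a pixel belongs to the object.
mask = false(imgH, imgW);
mask(3:4, 4:6) = true;               % rows 3-4, columns 4-6 belong to the object

% The box covers rows y..y+h-1 and columns x..x+w-1.
rows = box(2):(box(2) + box(4) - 1);
cols = box(1):(box(1) + box(3) - 1);

boxArea    = numel(rows) * numel(cols);   % pixels inside the box
objectArea = nnz(mask(rows, cols));       % box pixels that are object pixels
allInside  = nnz(mask) == objectArea;     % true if the mask lies entirely in the box

fprintf('%s: %d of %d box pixels are object pixels\n', label, objectArea, boxArea);
```

The box alone says where the object roughly is; the mask says which of the box's pixels actually belong to it, which is the extra information the text describes.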

FCN is a popular semantic segmentation algorithm. Semantic segmentation means that the machine automatically separates object regions from the image and identifies what they contain. The model first shrinks the input image to 1/32 of its original size through convolution and max-pooling layers, then makes class predictions at this level of granularity. Finally, it uses upsampling and deconvolution layers to restore the map to the original size.
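The size bookkeeping in that description can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: a hypothetical 224x224 input, five stride-2 pooling stages to reach the 1/32 factor, a made-up coarse class map, and nearest-neighbor upsampling via kron as a simple stand-in for the learned deconvolution layers:

```matlab
% Toy walk-through of FCN spatial sizes; no learned weights are involved.
inputSize = [224 224];               % hypothetical input height and width
numPools  = 5;                       % assumed: five stride-2 pooling stages

coarseSize = inputSize / 2^numPools; % each stage halves height and width

% Made-up stand-in for the coarse per-pixel class map (class indices 1..3).
coarseMap = mod(reshape(1:prod(coarseSize), coarseSize), 3) + 1;

% Nearest-neighbor upsampling by 32x as a stand-in for deconvolution:
% each coarse cell becomes a 32x32 block in the restored map.
scale   = 2^numPools;
fullMap = kron(coarseMap, ones(scale));

fprintf('coarse map: %dx%d, restored map: %dx%d\n', size(coarseMap), size(fullMap));
```

A real FCN learns its upsampling filters, so the restored map is smoother than these blocks; the sketch only shows how the spatial sizes shrink and recover.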

In short, Mask R-CNN combines two networks, putting Faster R-CNN and FCN into one large architecture. The model's loss function is the total of the classification loss, the bounding-box loss, and the mask loss.
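Written out in the notation of the Mask R-CNN paper (these symbols are not defined in the text above), the multi-task loss for each sampled region of interest is the plain sum of the three terms:

```latex
L = L_{cls} + L_{box} + L_{mask}
```

Here L_cls is the classification loss, L_box is the bounding-box regression loss, and L_mask is the average per-pixel binary cross-entropy computed on the mask predicted for the ground-truth class.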

Because training Mask R-CNN takes a long time, we test with the pretrained Mask R-CNN provided by MATLAB, which can be downloaded from:

https://www.mathworks.com/supportfiles/vision/data/maskrcnn_pretrained_person_car.mat

This is a Mask R-CNN model trained to recognize vehicles and pedestrians. The code that uses the trained model is as follows:

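% Assumes net (the pretrained network loaded from the .mat file above)
% and img (an RGB test image) already exist in the workspace.

% Resize so that the longer side is 700 pixels; NaN keeps the aspect ratio.
targetSize = [700 700 3];
imgSize    = size(img);
[~, maxDim]= max(imgSize);
resizeSize = [NaN NaN];
resizeSize(maxDim) = targetSize(maxDim);
img        = imresize(img, resizeSize);

% Network configuration: two object classes plus background.
trainSize  = [800 800 3];
classNames = {'person','car','background'};
numClasses = 2;
params     = createMaskRCNNConfig(trainSize, numClasses, classNames);
Envs       = "cpu";   % execution environment

% Extract the mask sub-network from the pretrained network.
maskSubnet = helper.extractMaskNetwork(net);

% Run Mask R-CNN detection.
[boxes, scores, labels, masks] = detectMaskRCNN(net, maskSubnet, img, params, Envs);

% Overlay the masks (if any were found) and draw the labeled boxes.
if(isempty(masks))
   overlayedImage = img;
else
   overlayedImage = insertObjectMask(img, masks);
end
figure
imshow(overlayedImage)
showShape("rectangle", gather(boxes), "Label", labels, "LineColor",'g')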
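The resizing step at the top of the listing sets the longer side of the image to 700 pixels and leaves the other side as NaN, which tells imresize to preserve the aspect ratio. The following sketch reproduces only the size arithmetic for a hypothetical 480x640 RGB input. It calls no image or toolbox function, and rounding the free side up is an assumption about how imresize rounds (it makes no difference for these numbers):

```matlab
% Reproduce the size computation from the listing for a made-up image size.
targetSize = [700 700 3];
imgSize    = [480 640 3];            % hypothetical size(img) for an RGB image

[~, maxDim] = max(imgSize);          % index of the longer side (2 = width here)
resizeSize  = [NaN NaN];
resizeSize(maxDim) = targetSize(maxDim);

% With one NaN, the other side follows from the same scale factor.
scale   = targetSize(maxDim) / imgSize(maxDim);
outSize = ceil(imgSize(1:2) * scale);    % assumed rounding; exact in this case
outSize(maxDim) = targetSize(maxDim);

fprintf('scale %.5f -> output %dx%d\n', scale, outSize);
```

For this input the width is the longer side, so resizeSize becomes [NaN 700] and the image would be scaled to 525x700.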

Running the MATLAB simulation gives the following results: