A detailed record of implementing YOLACT instance segmentation with ncnn
2022-06-27 09:42:00 【Xiaobai learns vision】
Original article: https://zhuanlan.zhihu.com/p/128974102
This article is reprinted from Zhihu with the author's permission. Do not reprint it without permission.
0x0 YOLACT Instance segmentation
https://urlify.cn/rURFry
End-to-end, single-stage instance segmentation
Fast: claimed to reach 33 FPS on 550x550 images on a Titan Xp
Open source, and written in PyTorch!
0x1 Motivation
Looking across GitHub, ncnn and its derivative projects cover classification, detection, localization, feature extraction, OCR, style transfer and more.
However, there was no instance segmentation example. Then someone opened an issue and asked for YOLACT instance segmentation by name: https://github.com/Tencent/ncnn/issues/1679
So let's write a YOLACT example, and along the way show how to use ncnn to implement algorithms like this one that need post-processing.
0x2 PyTorch test
The YOLACT project also contains the YOLACT++ model, which is faster and more accurate, but YOLACT++ uses deformable convolution, a classic operation that is unfriendly to deployment.
Let's pretend we did not see it and download the YOLACT model.

Create a weights folder and download yolact_resnet50_54_800000.pth.
Following the README, run one image to see the result:
$ python eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.15 --top_k=15 --image=test.jpg
0x3 Remove post-processing and export ONNX
Modify evalimage in eval.py directly, replacing the result display with an ONNX export:
def evalimage(net:Yolact, path:str, save_path:str=None):
    frame = torch.from_numpy(cv2.imread(path)).cuda().float()
    batch = FastBaseTransform()(frame.unsqueeze(0))
    preds = net(batch)
    torch.onnx._export(net, batch, "yolact.onnx", export_params=True, keep_initializers_as_inputs=True, opset_version=11)
According to a YOLACT issue, the JIT switch at the top of yolact.py has to be turned off before ONNX export works:
# As of March 10, 2019, Pytorch DataParallel still doesn't support JIT Script Modules
use_jit = False
The YOLACT post-processing is written in a very pythonic way and cannot be exported directly like this. Removing the post-processing from the model makes export and conversion easy.
Even if ONNX could export the post-processing, doing so is not recommended:
Post-processing is not standardized, and each project author implements the details differently, for example the many variants of nms and bbox computation. It is hard for ncnn to cover them with one unified op (caffe-ssd has an implementation only because there is a single version of it).
In ONNX, post-processing turns into a big lump of glue ops that are very fragmented and inefficient to implement inside a framework.
Most ONNX glue ops, such as Gather, are either unsupported by ncnn or have compatibility problems, so they cannot be used directly.
Therefore, removing the post-processing before exporting ONNX is the correct way to convert models such as PyTorch SSD.
Open yolact.py, find the forward method of class Yolact, remove the detect step, and return the model's pred_outs output directly:
    # return self.detect(pred_outs, self)
    return pred_outs
Run the image test again; this produces yolact.onnx without post-processing:
$ python eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.15 --top_k=15 --image=test.jpg
0x4 Simplify ONNX
The directly exported ONNX model contains many glue ops that ncnn does not support, so running onnx-simplifier is a routine step:
$ pip install -U onnx --user
$ pip install -U onnxruntime --user
$ pip install -U onnx-simplifier --user
$ python -m onnxsim yolact.onnx yolact-sim.onnx
This step fails with the following error:
Graph must be in single static assignment (SSA) form, however '523' has been used as output names multiple times
Searching the GitHub issues confirms that this is an onnx bug:
https://github.com/onnx/onnx/issues/2613
Fortunately, onnx-simplifier already provides a way to bypass it:
$ python -m onnxsim --skip-fuse-bn yolact.onnx yolact-sim.onnx
0x5 ncnn model conversion and optimization
Simplifying the ONNX model with --skip-fuse-bn skipped the batchnorm fusion, but that is fine because ncnn has this feature too.
The ncnnoptimize tool fuses many operators, for example the common convolution-batchnorm-relu sequence.
The last argument 0 keeps an fp32 model; 65536 stores the model as fp16, which reduces the size of the model binary.
$ ./onnx2ncnn yolact-sim.onnx yolact.param yolact.bin
$ ./ncnnoptimize yolact.param yolact.bin yolact-opt.param yolact-opt.bin 0
0x6 Fine-tune the model manually
As always, the absence of errors does not mean the model is usable. First open the param file with the netron tool and look at the model structure.

This model has four outputs, framed in red in the figure:
Convolution Conv_263 1 1 617 619 0=32 1=1 5=1 6=8192 9=1
Permute Transpose_265 1 1 619 620 0=3
UnaryOp Tanh_400 1 1 814 815 0=16
Concat Concat_401 5 1 634 673 712 751 790 816 0=-3
Concat Concat_402 5 1 646 685 724 763 802 817 0=-3
Concat Concat_403 5 1 659 698 737 776 815 818 0=-3
Softmax Softmax_405 1 1 817 820 0=1 1=1
YOLACT post-processing needs loc, conf, prior, mask and maskdim.
At first it is not obvious which output corresponds to which, so look at the shapes first:
ncnn::Extractor ex = yolact.create_extractor();
ncnn::Mat in(550, 550, 3);
ex.input("input.1", in);
ncnn::Mat b620;
ncnn::Mat b816;
ncnn::Mat b818;
ncnn::Mat b820;
ex.extract("620", b620); // 32 x 138x138
ex.extract("816", b816); // 4 x 19248
ex.extract("818", b818); // 32 x 19248
ex.extract("820", b820); // 81 x 19248
Compiling and running this directly crashes in the Concat layers (the blue boxes in the figure). The Concat axis parameter is negative, 0=-3, which ncnn does not support yet.
Judging from the shapes of the Concat inputs, the 2D data is concatenated along the h axis, so changing the parameter to 0=0 is an equivalent replacement:
Concat Concat_401 5 1 634 673 712 751 790 816 0=0
Concat Concat_402 5 1 646 685 724 763 802 817 0=0
Concat Concat_403 5 1 659 698 737 776 815 818 0=0
b820 comes after the softmax, so it is certainly conf. Its shape 81x19248 means 81 classes x 19248 priors.
b816 has shape 4x19248 and holds the bbox offsets for each priorbox.
b818 has shape 32x19248. According to the YOLACT post-processing it is maskdim, the coefficients of the 32 segmentation heat maps.
b620 has shape 32x138x138 and holds the 32 segmentation heat maps. The permute layer in front of it performs the NCHW->NHWC conversion.
prior is not output by the model.
Handling b620 in NHWC layout is inconvenient in ncnn, so extract the NCHW data before the permute instead: blob 619, the green box in the figure.
ncnn::Extractor ex = yolact.create_extractor();
ncnn::Mat in(550, 550, 3);
ex.input("input.1", in);
ncnn::Mat maskmaps;
ncnn::Mat location;
ncnn::Mat mask;
ncnn::Mat confidence;
ex.extract("619", maskmaps);   // 138x138 x 32
ex.extract("816", location);   // 4 x 19248
ex.extract("818", mask);       // maskdim 32 x 19248
ex.extract("820", confidence); // 81 x 19248
0x7 Generate prior
The original code is make_priors in class PredictionModule in yolact.py. Adding a few print statements there yields all the hyperparameters of the priorbox generation rule:
const int conv_ws[5] = {69, 35, 18, 9, 5};
const int conv_hs[5] = {69, 35, 18, 9, 5};
const float aspect_ratios[3] = {1.f, 0.5f, 2.f};
const float scales[5] = {24.f, 48.f, 96.f, 192.f, 384.f};
The four values of a YOLACT prior are center_x center_y box_w box_h, each in the range 0~1.
The author wrote a bug, box_h = box_w, which fixes every box to a square. We have to reproduce this bug as well.
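As a cross-check of the rule just described, here is a minimal Python sketch that generates the priors from the printed hyperparameters and confirms that the count matches the 19248 seen in the output shapes. The function name make_priors_sketch and the square_anchors flag are illustrative choices for this sketch, not names taken from the YOLACT or ncnn code.

```python
import math

# Hyperparameters as printed from make_priors (see the constants above).
CONV_SIZES = [69, 35, 18, 9, 5]            # feature map width == height per level
ASPECT_RATIOS = [1.0, 0.5, 2.0]
SCALES = [24.0, 48.0, 96.0, 192.0, 384.0]
INPUT_SIZE = 550


def make_priors_sketch(square_anchors=True):
    """Return a list of (cx, cy, w, h) tuples relative to the input size."""
    priors = []
    for conv_size, scale in zip(CONV_SIZES, SCALES):
        for i in range(conv_size):          # rows
            for j in range(conv_size):      # columns
                cx = (j + 0.5) / conv_size
                cy = (i + 0.5) / conv_size
                for ar in ASPECT_RATIOS:
                    r = math.sqrt(ar)
                    w = scale * r / INPUT_SIZE
                    h = scale / r / INPUT_SIZE
                    if square_anchors:
                        h = w               # reproduce the square-anchor bug
                    priors.append((cx, cy, w, h))
    return priors


priors = make_priors_sketch()
print(len(priors))  # 19248
```

The count follows from (69*69 + 35*35 + 18*18 + 9*9 + 5*5) * 3 = 6416 * 3 = 19248, one prior per cell and aspect ratio.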
// make priorbox
ncnn::Mat priorbox(4, 19248);
{
    float* pb = priorbox;
    for (int p = 0; p < 5; p++)
    {
        int conv_w = conv_ws[p];
        int conv_h = conv_hs[p];
        float scale = scales[p];
        for (int i = 0; i < conv_h; i++)
        {
            for (int j = 0; j < conv_w; j++)
            {
                // +0.5 because priors are in center-size notation
                float cx = (j + 0.5f) / conv_w;
                float cy = (i + 0.5f) / conv_h;
                for (int k = 0; k < 3; k++)
                {
                    float ar = aspect_ratios[k];
                    ar = sqrt(ar);
                    float w = scale * ar / 550;
                    float h = scale / ar / 550;
                    // This is for backward compatibility with a bug where I made everything square by accident
                    // cfg.backbone.use_square_anchors:
                    h = w;
                    pb[0] = cx;
                    pb[1] = cy;
                    pb[2] = w;
                    pb[3] = h;
                    pb += 4;
                }
            }
        }
    }
}
0x8 Full YOLACT pipeline
Preprocessing
data/config.py defines the ImageNet MEANS and STD in BGR order:
# These are in BGR and are for ImageNet
MEANS = (103.94, 116.78, 123.68)
STD = (57.38, 57.12, 58.40)
YOLACT actually takes RGB input, so the order has to be swapped.
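To make the reordering concrete, the following Python sketch reverses the two tuples and applies the per-channel normalization (value - mean) * (1 / std) to a single pixel, which is the arithmetic the ncnn call below performs on the whole image. The helper name normalize_pixel is hypothetical and exists only for this illustration.

```python
# Constants from data/config.py, stored in BGR order.
MEANS_BGR = (103.94, 116.78, 123.68)
STD_BGR = (57.38, 57.12, 58.40)

# The network consumes RGB, so both tuples are reversed.
MEANS_RGB = MEANS_BGR[::-1]
STD_RGB = STD_BGR[::-1]
NORM_RGB = tuple(1.0 / s for s in STD_RGB)


def normalize_pixel(bgr_pixel):
    """Convert one BGR pixel (0..255 per channel) to a normalized RGB triple."""
    rgb = bgr_pixel[::-1]
    return tuple((v - m) * n for v, m, n in zip(rgb, MEANS_RGB, NORM_RGB))


print(MEANS_RGB)                   # (123.68, 116.78, 103.94)
print(normalize_pixel(MEANS_BGR))  # (0.0, 0.0, 0.0)
```

A pixel equal to the dataset mean maps to zero in every channel, which is a quick way to check that the channel order is right.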
const int target_size = 550;
int img_w = bgr.cols;
int img_h = bgr.rows;
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, target_size, target_size);
const float mean_vals[3] = {123.68f, 116.78f, 103.94f};
const float norm_vals[3] = {1.0/58.40f, 1.0/57.12f, 1.0/57.38f};
in.substract_mean_normalize(mean_vals, norm_vals);
Post-processing
This part is very similar to SSD post-processing. The sort and nms code can be taken from ncnn/src/layer/detectionoutput.cpp.
The only thing to watch is that bbox generation differs from SSD: it uses center_x center_y box_w box_h. The original YOLACT code is the decode function in layers/box_utils.py.
YOLACT has a fastnms method in layers/functions/detection.py that is faster, but ordinary nms is ready-made code after all and works very well.
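For readers who have not seen the ordinary nms step, here is a minimal greedy NMS sketch in Python. It is a generic illustration, not the code from detectionoutput.cpp or YOLACT's fastnms; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thresh=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119, about 0.68, so it is suppressed; the third box does not overlap anything and is kept.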
// generate all candidates for each class
const int num_class = confidence.w;  // 81
const int num_priors = confidence.h; // 19248
for (int i = 0; i < num_priors; i++)
{
    const float* conf = confidence.row(i);
    const float* loc = location.row(i);
    const float* pb = priorbox.row(i);

    // find class id with highest score
    // start from 1 to skip background
    int label = 0;
    float score = 0.f;
    for (int j = 1; j < num_class; j++)
    {
        if (conf[j] > score)
        {
            label = j;
            score = conf[j];
        }
    }

    // ignore background or low score
    if (label == 0 || score <= confidence_thresh)
        continue;

    // apply center_size to priorbox with loc
    float var[4] = {0.1f, 0.1f, 0.2f, 0.2f};
    float pb_cx = pb[0];
    float pb_cy = pb[1];
    float pb_w = pb[2];
    float pb_h = pb[3];
    float bbox_cx = var[0] * loc[0] * pb_w + pb_cx;
    float bbox_cy = var[1] * loc[1] * pb_h + pb_cy;
    float bbox_w = (float)(exp(var[2] * loc[2]) * pb_w);
    float bbox_h = (float)(exp(var[3] * loc[3]) * pb_h);
    float obj_x1 = bbox_cx - bbox_w * 0.5f;
    float obj_y1 = bbox_cy - bbox_h * 0.5f;
    float obj_x2 = bbox_cx + bbox_w * 0.5f;
    float obj_y2 = bbox_cy + bbox_h * 0.5f;

    // clip inside image
    // append object candidate
}
// merge candidate box for each class
for (int i = 0; i < (int)class_candidates.size(); i++)
{
    // sort + nms
}
// sort all result by score
// keep_top_k
Mask generation
maskmaps is really 32 heat maps of size 138x138, and each object output earlier carries its own 32 float coefficients.
An object's segmentation mask is obtained by multiplying each heat map by its coefficient, summing the results, upscaling to the original image size, binarizing, and finally cropping to the inside of the output box.
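The combination step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it works at heat-map resolution and leaves out the upscaling step, the box is given in map pixel coordinates with inclusive corners, and the 0.5 threshold is a placeholder rather than a value confirmed by the article. The function name assemble_mask is hypothetical.

```python
def assemble_mask(maskmaps, coeffs, box, threshold=0.5):
    """Linearly combine heat maps, binarize, and keep only pixels inside box.

    maskmaps: list of P maps, each an H x W nested list of floats.
    coeffs:   list of P floats, one coefficient per map.
    box:      (x1, y1, x2, y2) in map pixel coordinates, inclusive.
    Returns an H x W nested list of 0/1 values.
    """
    height = len(maskmaps[0])
    width = len(maskmaps[0][0])
    x1, y1, x2, y2 = box
    mask = [[0] * width for _ in range(height)]
    for y in range(max(0, y1), min(height - 1, y2) + 1):
        for x in range(max(0, x1), min(width - 1, x2) + 1):
            # Weighted sum over all heat maps at this pixel.
            value = sum(c * m[y][x] for c, m in zip(coeffs, maskmaps))
            mask[y][x] = 1 if value > threshold else 0
    return mask


maps = [
    [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
    [[0.0, 1.0, 1.0], [0.0, 0.0, 1.0]],
]
coeffs = [1.0, -1.0]
print(assemble_mask(maps, coeffs, box=(0, 0, 1, 1)))  # [[1, 0, 0], [1, 1, 0]]
```

With the real model there would be 32 maps of 138x138 and 32 coefficients per object, and the summed map would be resized to the image size before thresholding and cropping.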


unnatrual It's beautiful !
0x9 Supplementary materials
Huh? There are supplementary materials too?
The ncnn implementation code and the converted model have been uploaded to GitHub:
https://github.com/Tencent/ncnn