Oriented detection boxes | Aligned deep features for object detection (with source code)
2022-07-03 00:06:00 【Tom Hardy】
Author: Edison_G
Source: Institute of computer vision
Over the last ten years, significant progress has been made in object detection, where the objects often vary widely in scale and appear in arbitrary orientations. However, most existing methods rely on heuristically defined anchors with different scales, angles, and aspect ratios, and there is usually a severe misalignment between the anchor boxes and the axis-aligned convolutional features. This leads to a common inconsistency between classification scores and localization accuracy.


1. Overview
To address this problem, the researchers propose the Single-shot Alignment Network (S2A-Net), which consists of two modules: a Feature Alignment Module (FAM) and an Oriented Detection Module (ODM). The FAM generates high-quality anchors with an anchor refinement network and adaptively aligns the convolutional features to the anchor boxes using a newly proposed alignment convolution. The ODM first uses active rotating filters to encode orientation information, and then produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification score and localization accuracy. In addition, the researchers explore how to detect objects in large images, which yields a better trade-off between speed and accuracy.
Extensive experiments show that the new method achieves state-of-the-art performance on two commonly used datasets (DOTA and HRSC2016) while remaining highly efficient.
2. Background
Compared with R-CNN-based detectors, one-stage detectors regress the bounding boxes and classify them directly using regular, densely sampled anchors. This architecture is computationally efficient, but its accuracy often lags behind [G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang, “DOTA: A large-scale dataset for object detection in aerial images,” in CVPR, 2018]. As shown in the figure below, one-stage detectors are considered to suffer from severe misalignment.

Heuristically defined anchors are of low quality and cannot cover the objects, which leads to misalignment between objects and anchors. For example, the aspect ratio of a bridge usually lies between 1/3 and 1/30, so only a few anchors, or even none at all, can be matched to it. This misalignment usually aggravates the foreground-background imbalance and hinders performance.
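The bridge example can be checked with a little arithmetic. The sketch below is only an illustration: the anchor scales and ratios are hypothetical values chosen for the example, not the configuration used in the paper, and all boxes are assumed to be axis-aligned and to share the same center.

```python
def centered_iou(w1, h1, w2, h2):
    """IoU of two axis-aligned boxes that share the same center."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def best_anchor_iou(obj_w, obj_h, scales, ratios):
    """Best IoU between an object and a set of heuristic anchors.

    Each anchor has area scale**2 and aspect ratio r = height / width.
    """
    best = 0.0
    for s in scales:
        for r in ratios:
            w = s / r ** 0.5
            h = s * r ** 0.5
            best = max(best, centered_iou(obj_w, obj_h, w, h))
    return best


# Hypothetical anchor set (an assumption for this example).
SCALES = [32, 64, 128]
RATIOS = [0.5, 1.0, 2.0]

# A long, thin bridge with height/width = 1/30.
bridge_iou = best_anchor_iou(300, 10, SCALES, RATIOS)
# A square object that happens to match one anchor exactly.
square_iou = best_anchor_iou(64, 64, SCALES, RATIOS)
print(f"bridge: {bridge_iou:.3f}, square: {square_iou:.3f}")
```

With these assumed anchors, the thin bridge never reaches the IoU of 0.5 that is commonly used as a positive-match threshold, while the square object is matched perfectly.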
The convolutional features of the backbone network are usually axis-aligned with a fixed receptive field, whereas objects in aerial images appear in arbitrary orientations and with varied appearances. Even when an anchor box is assigned to an instance with high confidence (i.e., high IoU), the anchor box is still misaligned with the convolutional features. In other words, the features associated with an anchor box can hardly represent the whole object. As a result, the final classification score cannot accurately reflect the localization accuracy, which also hurts detection performance in post-processing stages such as NMS.
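To see why this matters for NMS, consider a minimal sketch of greedy NMS on axis-aligned boxes. The boxes, scores, and threshold below are made up for illustration; the point is only that NMS trusts the classification score, so a poorly localized box with a higher score can suppress a better-localized one.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, thr=0.5):
    """Greedy NMS: keep boxes in descending score order, drop overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: one ground-truth object and two detections.
gt = (0, 0, 100, 100)
boxes = [(2, 2, 98, 98),     # tight box, but lower score
         (20, 0, 120, 100)]  # shifted box, but higher score
scores = [0.6, 0.9]

kept = nms(boxes, scores)
print(kept, [round(iou(b, gt), 3) for b in boxes])
```

Here the shifted box survives and the tight box is suppressed, even though the tight box overlaps the ground truth far better.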
Performance comparison of different methods under the same setting

3. Analysis of the new framework

RetinaNet as Baseline

Note that RetinaNet is designed for generic object detection and outputs horizontal bounding boxes (figure (a) below), expressed as:


To make it compatible with oriented object detection, the researchers replace RetinaNet's regression output with oriented bounding boxes, shown in figure (b) above and expressed as:

In effect, a single angle parameter θ is added, with θ in the range [-π/4, 3π/4].
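The following sketch shows one way to work with such a representation. It assumes an oriented box is stored as a hypothetical tuple (cx, cy, w, h, theta) with the center at (cx, cy); the function names are illustrative and the wrapping rule produces the half-open interval [-π/4, 3π/4), which is an assumption about how the boundary is handled.

```python
import math


def normalize_angle(theta):
    """Wrap an angle into [-pi/4, 3*pi/4).

    Rotating a rectangle by pi leaves it unchanged, so wrapping modulo pi
    does not change the box that is described.
    """
    return (theta + math.pi / 4) % math.pi - math.pi / 4


def obb_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box as (x, y) tuples."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the local offset by theta, then translate to the center.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


print(normalize_angle(math.pi))       # wraps to (approximately) 0
print(obb_corners(0, 0, 4, 2, 0.0))   # axis-aligned special case
```

A horizontal bounding box is simply the special case theta = 0, which is why adding one parameter to the regression output is enough.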
Alignment Convolution
Let's go straight to the figure.

As shown in the figure, standard convolution samples the feature map on a regular grid. DeformConv learns an offset field to augment the spatial sampling locations. However, it may sample from the wrong places, especially for densely packed objects. The researchers propose AlignConv, which adds an extra offset field guided by the anchor boxes so that features are extracted on a grid distributed over each box. Unlike DeformConv, the offset field in AlignConv is inferred directly from the anchor boxes. Examples (c) and (d) in the figure illustrate that AlignConv can extract accurate features inside the anchor boxes.
(a) Standard 2D convolution with regular sampling locations (green dots). (b) Deformable convolution with deformable sampling locations (blue dots). (c) and (d) Two examples of the proposed AlignConv with a horizontal and a rotated anchor box (AB, orange rectangles); the blue arrows indicate the offset field.
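The article does not give the offset formula, so the sketch below is only one plausible reading of the description above: the k×k kernel grid is stretched to cover the anchor box, rotated by the box angle, mapped to feature-map coordinates, and the offset is the difference from the regular sampling position. The function name, the argument layout, and the stride handling are all assumptions made for illustration.

```python
import numpy as np


def align_conv_offsets(anchor, p, stride, k=3):
    """Offsets that move a k x k kernel at feature location p onto an anchor.

    anchor: (cx, cy, w, h, theta) in image pixels (hypothetical layout).
    p:      (px, py) location on the feature map.
    stride: downsampling factor between the image and the feature map.
    Returns an array of shape (k*k, 2) holding (dx, dy) per kernel tap.
    """
    cx, cy, w, h, theta = anchor
    half = (k - 1) // 2
    grid = np.arange(-half, half + 1)
    ys, xs = np.meshgrid(grid, grid, indexing="ij")
    # Regular kernel taps r as (dx, dy), e.g. (-1, -1) ... (1, 1) for k = 3.
    r = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

    # Stretch the taps so that they span the anchor box (image pixels).
    scaled = r * np.array([w / k, h / k])
    # Rotate the stretched taps by the anchor angle.
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    rotated = scaled @ rot.T
    # Sampling locations in feature-map coordinates.
    loc = (np.array([cx, cy]) + rotated) / stride
    # Offset relative to the regular sampling position p + r.
    return loc - (np.asarray(p, dtype=float) + r)


# An axis-aligned anchor that exactly matches the regular 3x3 footprint
# needs no correction, so every offset is zero.
offsets = align_conv_offsets((32.0, 32.0, 24.0, 24.0, 0.0), (4, 4), stride=8)
print(offsets)
```

Because the offsets depend only on the anchor geometry, nothing has to be learned for this step; in a full implementation they would be passed to a deformable-convolution operator in place of a learned offset field.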
Feature Alignment Module (FAM)

The FAM takes the input features and the anchor prediction map as inputs and produces aligned features.
Oriented Detection Module (ODM)

The ODM is designed to alleviate the inconsistency between classification score and localization accuracy, and then to perform accurate object detection.
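The article does not say how the orientation-invariant features are derived from the orientation-sensitive ones. One common approach, assumed here purely for illustration, is to max-pool over the orientation channels. In this toy model, a rotation of the input is represented only as a cyclic shift of the orientation channels; the spatial part of the rotation is ignored.

```python
import numpy as np


def orientation_pool(feat):
    """Max-pool over orientation channels.

    feat: array of shape (C, N, H, W) with N orientation channels per
          feature channel (a hypothetical layout).
    Returns an array of shape (C, H, W).
    """
    return feat.max(axis=1)


rng = np.random.default_rng(0)
sensitive = rng.normal(size=(4, 8, 5, 5))   # orientation-sensitive features
shifted = np.roll(sensitive, 1, axis=1)     # toy stand-in for a rotation

invariant = orientation_pool(sensitive)
print(invariant.shape, np.array_equal(invariant, orientation_pool(shifted)))
```

The pooled output is unchanged by the channel shift, which is the property a classification branch would want, while a regression branch can keep using the orientation-sensitive features.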
4. Experiments and visualization
Results of different RetinaNet variants on the DOTA dataset




The researchers crop each large image into 1024×1024 chip images with a stride of 824. Both the large image and the chip images are fed into the same network, and detection results can be produced without resizing (for example, the plane in the red box).
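A chip size of 1024 with a stride of 824 means that neighboring chips overlap by 200 pixels. The cropping can be sketched as below; the function names are hypothetical, and the handling of the image border (shifting the last chip back so that it ends at the edge) is an assumption, since the article does not describe it.

```python
def chip_starts(length, chip=1024, stride=824):
    """Start coordinates of chips along one image dimension."""
    if length <= chip:
        return [0]
    starts = list(range(0, length - chip + 1, stride))
    # Assumed border rule: add a final chip that ends exactly at the edge.
    if starts[-1] + chip < length:
        starts.append(length - chip)
    return starts


def chip_windows(width, height, chip=1024, stride=824):
    """All chip windows as (x0, y0, x1, y1), clipped to the image."""
    return [(x, y, min(x + chip, width), min(y + chip, height))
            for y in chip_starts(height, chip, stride)
            for x in chip_starts(width, chip, stride)]


print(chip_starts(4000))              # [0, 824, 1648, 2472, 2976]
print(len(chip_windows(4000, 1848)))  # 5 columns x 2 rows = 10
```

Detections from overlapping chips would then be mapped back to full-image coordinates and merged, typically with NMS.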

Results of different methods on the DOTA dataset

Detection results on the HRSC2016 dataset
The detection results in the figure above suggest promising applications in military and port settings: ports could reduce manual scheduling work, and military use could make attack and counterattack more precise. An earlier post on a related technique:
Using computer vision to look at the Suez Canal blockage (ship detection)

This article is for academic sharing only. If there is any infringement, please contact us to have it removed.