Source code walkthrough of ReDet: A Rotation-equivariant Detector for Aerial Object Detection
2022-06-30 08:41:00 【Wu lele~】
Preface
This post walks through the CVPR 2021 rotated object detection paper ReDet: A Rotation-equivariant Detector for Aerial Object Detection. The paper and source code links are attached:
Paper download link
Source code link
1. Problem solved
These are the slides I made for a group meeting. In short, the paper has two contributions:
1) It rewrites ResNet50 using the ideas of e2cnn (NeurIPS 2019) and calls the result ReCNN, which gives the CNN rotation equivariance. That is, when the input image rotates, the extracted features transform in a correspondingly predictable way instead of changing arbitrarily.
2) After e2cnn extracts the feature map F of shape (K*N, H, W), the channel dimension can be understood as divided into N groups (N = 4 or 8), representing 4 or 8 orientations, with K sub-channels per group. RRoIAlign only aligns objects of different orientations in the spatial dimension and does not align them in the channel dimension, so the authors designed the RiRoIAlign module, which aligns in both the channel and spatial dimensions and thus yields rotation-invariant features.
Overall, this paper designs a very strong feature extractor.
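To make the channel grouping in point 2) concrete, here is a minimal C++ sketch of the indexing. It assumes the (K, N, H, W) memory layout used by the kernel later in this post, where the channel offset is computed as `c * nOrientation + o`, so the N orientation channels of one sub-channel are contiguous. The names `feature_offset`, `split_channel` and `FieldChannel` are hypothetical and are not part of ReDet.

```cpp
#include <cstddef>

// Hypothetical helper: flat offset of element (k, o, y, x) in a feature
// map stored as (K, N, H, W), with N orientation channels per sub-channel k.
inline std::size_t feature_offset(int k, int o, int y, int x,
                                  int N, int H, int W) {
  return ((static_cast<std::size_t>(k) * N + o) * H + y) * W + x;
}

// Hypothetical helper: split a flat channel index in [0, K*N) into the
// sub-channel k and the orientation o under the same layout.
struct FieldChannel {
  int k;  // sub-channel index, 0 .. K-1
  int o;  // orientation index, 0 .. N-1
};

inline FieldChannel split_channel(int flat_channel, int N) {
  return FieldChannel{flat_channel / N, flat_channel % N};
}
```

Under this assumption, flat channel 6 with N = 4 is orientation 2 of sub-channel 1.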
2. Model structure
2.1. ReCNN
I don't fully understand this part either; e2cnn is hard. In short: after writing ReCNN, the authors retrained it on ImageNet and then fine-tuned it on the downstream datasets. (I envy people who have the resources to train a backbone.)
2.2. RiRoIAlign
According to the model structure diagram, RRoIAlign is first used for spatial alignment, and then the channels are cyclically shifted. For example, with r=2, the value of channel Cn2 is assigned to Cn1, the value of Cn1 is assigned to Cnn… Interpolation between the two adjacent channels is then used to compute the value of the current channel. (I was quite confused here, so I went to read the source code.) Source location: ReDet-master\mmdet\ops\riroi_align\src\riroi_align_kernel.cu. I have tried to annotate it in detail. If anything is unclear, feel free to comment and discuss.
#include <ATen/ATen.h>
#include <THC/THCAtomics.cuh>
#include <math.h>
#define PI 3.141592653
// CUDA computes in parallel with many threads. Each thread computes one element of a pooled RoI.
// i is the global index of the current thread; n is the total number of elements to process.
#define CUDA_1D_KERNEL_LOOP(i, n)                            \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
       i += blockDim.x * gridDim.x)
// Block size is 1024.
#define THREADS_PER_BLOCK 1024
// Get the grid size according to the block size .
inline int GET_BLOCKS(const int N) {
int optimal_block_num = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
int max_block_num = 65000;
return min(optimal_block_num, max_block_num);
}
// The body of the bilinear interpolation function is omitted; annotated versions are easy to find online.
template <typename scalar_t>
__device__ scalar_t bilinear_interpolate(const scalar_t *bottom_data,
const int height, const int width,
scalar_t y, scalar_t x) {
// ... body omitted in this post ...
}
template <typename scalar_t>
__global__ void RiROIAlignForward(const int nthreads, const scalar_t *bottom_data,
const scalar_t *bottom_rois,
const scalar_t spatial_scale,
const int sample_num, const int channels,
const int height, const int width,
const int pooled_height, const int pooled_width,
const int nOrientation,
scalar_t *top_data) {
// Meaning of each parameter:
// nthreads: total number of output elements; each one is handled by one loop iteration.
// bottom_data: pointer to the input feature map, flattened into a 1-D array. Per image the layout is (K, N, H, W), with K = channels and N = nOrientation.
// bottom_rois: pointer to the flattened 1-D array of proposed RoIs; each RoI holds 6 values (batch_ind, cx, cy, w, h, theta).
// nOrientation: the number of orientation groups the channels are divided into (4 or 8).
// top_data: pointer to the pooled output feature map.
// index: the flat index handled by the current thread, i.e. the subscript into top_data.
CUDA_1D_KERNEL_LOOP(index, nthreads) {
// (n, c, o, ph, pw) is an element in the pooled output.
// index is a subscript into a 1-D array, so for convenience it is decoded into the output position (n, c, o, ph, pw):
// index corresponds to position (ph, pw) on orientation channel o of channel c of the n-th RoI.
int pw = index % pooled_width;
int ph = (index / pooled_width) % pooled_height;
int o = (index / pooled_width / pooled_height) % nOrientation;
int c = (index / pooled_width / pooled_height / nOrientation) % channels;
int n = index / pooled_width / pooled_height / nOrientation / channels;
// Locate the n-th RoI (6 values per RoI) and read the index of the image it belongs to.
const scalar_t* offset_bottom_rois = bottom_rois + n * 6;
int roi_batch_ind = offset_bottom_rois[0];
// Read the RoI's (cx, cy, w, h, theta); the first four are scaled to feature-map coordinates.
scalar_t roi_center_w = offset_bottom_rois[1] * spatial_scale;
scalar_t roi_center_h = offset_bottom_rois[2] * spatial_scale;
scalar_t roi_width = offset_bottom_rois[3] * spatial_scale;
scalar_t roi_height = offset_bottom_rois[4] * spatial_scale;
scalar_t theta = offset_bottom_rois[5];
// Force the RoI width and height to be at least 1.
roi_width = max(roi_width, (scalar_t)1.);
roi_height = max(roi_height, (scalar_t)1.);
// Size of each bin. For example, pooling an RoI of height 77 to 7*7 gives bins of height 77/7 = 11; the same applies in the w direction.
scalar_t bin_size_h = static_cast<scalar_t>(roi_height) / static_cast<scalar_t>(pooled_height);
scalar_t bin_size_w = static_cast<scalar_t>(roi_width) / static_cast<scalar_t>(pooled_width);
// Corresponds to r = theta * N / (2 * pi) in the paper: the (fractional) orientation channel that the current RoI's angle falls on.
scalar_t ind_float = theta * nOrientation / (2 * PI);
// Round ind_float down to an integer.
int ind = floor(ind_float);
// Interpolation coefficients (alpha in Eq. 9 of the paper): l_var is the fractional part of ind_float, r_var is its complement.
scalar_t l_var = ind_float - (scalar_t)ind;
scalar_t r_var = 1.0 - l_var;
// Wrap ind into [0, nOrientation). The modulo handles angles beyond one full turn; adding nOrientation first keeps the result non-negative for negative theta down to -2*pi.
ind = (ind + nOrientation) % nOrientation;
// Indices of the two input orientation channels to read.
// For example, with ind = 0 and o = 0: ind_rot = 0 and ind_rot_plus = 1.
// That is, for an object with ind = 0, output channel 0 is computed from the values of input channels 0 and 1.
int ind_rot = (o - ind + nOrientation) % nOrientation;
int ind_rot_plus = (ind_rot + 1 + nOrientation) % nOrientation;
// Pointers to the (H, W) planes of input channels ind_rot and ind_rot_plus.
const scalar_t* offset_bottom_data =
bottom_data + (roi_batch_ind * channels * nOrientation + c * nOrientation + ind_rot) * height * width;
const scalar_t* offset_bottom_data_plus =
bottom_data + (roi_batch_ind * channels * nOrientation + c * nOrientation + ind_rot_plus) * height * width;
// Number of sample points per bin in each direction; usually 2.
int roi_bin_grid_h = (sample_num > 0)
? sample_num
: ceil(roi_height / pooled_height); // e.g., = 2
int roi_bin_grid_w =
(sample_num > 0) ? sample_num : ceil(roi_width / pooled_width);
// Top-left corner of the RoI relative to its center (before rotation), plus cos/sin of theta.
scalar_t roi_start_h = -roi_height / 2.0;
scalar_t roi_start_w = -roi_width / 2.0;
scalar_t cosscalar_theta = cos(theta);
scalar_t sinscalar_theta = sin(theta);
// Total number of sample points in a bin; the mean is taken at the end.
const scalar_t count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
scalar_t output_val = 0.;
// Loop over the sample points inside the current bin. For example, with roi_w = 77, roi_h = 777 and pooled_w = pooled_h = 7,
// each bin is (77/7, 777/7) = (11, 111) in size; the code below computes the position of each sample point inside the bin.
for (int iy = 0; iy < roi_bin_grid_h; iy++) {
// e.g., iy = 0, 1
const scalar_t yy = roi_start_h + ph * bin_size_h +
static_cast<scalar_t>(iy + .5f) * bin_size_h /
static_cast<scalar_t>(roi_bin_grid_h); // e.g., 0.5, 1.5
for (int ix = 0; ix < roi_bin_grid_w; ix++) {
const scalar_t xx = roi_start_w + pw * bin_size_w +
static_cast<scalar_t>(ix + .5f) * bin_size_w /
static_cast<scalar_t>(roi_bin_grid_w);
// Apply an affine transform (rotate by theta, then translate to the RoI center) to get the position on the feature map.
scalar_t x = xx * cosscalar_theta - yy * sinscalar_theta + roi_center_w;
scalar_t y = xx * sinscalar_theta + yy * cosscalar_theta + roi_center_h;
// With the rotated position (y, x), bilinearly interpolate the value at that position on each of the two orientation channels.
scalar_t val = bilinear_interpolate<scalar_t>(
offset_bottom_data, height, width, y, x);
scalar_t val_plus = bilinear_interpolate<scalar_t>(
offset_bottom_data_plus, height, width, y, x);
// Linear interpolation between the two orientation channels (Eq. 9 in the paper).
output_val += r_var * val + l_var * val_plus;
}
}
// Take the mean
output_val /= count;
// Write the value to position index of the output feature map.
top_data[index] = output_val;
}
}
As the code shows, the implementation does not literally do spatial alignment first and channel alignment second, as the paper describes. In practice the authors fuse the two: once the source orientation channels for an output element are determined, the RRoIAlign sampling is carried out on those channels in the same pass.
If you are still confused, here is an example; I worked through the procedure by hand. It is also from my slides.
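The spatial half of the kernel can be reproduced on the CPU to check one's understanding. The following is a minimal C++ sketch, not the authors' code: it mirrors the kernel's index decoding and its rotated sample-point formula, and the names `OutIndex`, `decode_index`, `Point` and `sample_point` are hypothetical. It assumes the RoI values are already scaled to feature-map coordinates.

```cpp
#include <cmath>

// Decoded position of one output element, as in the kernel.
struct OutIndex { int n, c, o, ph, pw; };

// Mirror of the kernel's arithmetic that turns a flat index into
// (n, c, o, ph, pw) for an output of shape
// (num_rois, channels, nOrientation, pooled_h, pooled_w).
inline OutIndex decode_index(int index, int channels, int nOrientation,
                             int pooled_h, int pooled_w) {
  OutIndex r;
  r.pw = index % pooled_w;
  r.ph = (index / pooled_w) % pooled_h;
  r.o = (index / pooled_w / pooled_h) % nOrientation;
  r.c = (index / pooled_w / pooled_h / nOrientation) % channels;
  r.n = index / pooled_w / pooled_h / nOrientation / channels;
  return r;
}

struct Point { double x, y; };

// Feature-map position of sample (iy, ix) inside bin (ph, pw) of a rotated
// RoI (cx, cy, roi_w, roi_h, theta), using the same formula as the kernel.
inline Point sample_point(double cx, double cy, double roi_w, double roi_h,
                          double theta, int pooled_h, int pooled_w,
                          int grid_h, int grid_w,
                          int ph, int pw, int iy, int ix) {
  const double bin_h = roi_h / pooled_h;
  const double bin_w = roi_w / pooled_w;
  // Offset from the RoI center in the RoI's own (unrotated) frame.
  const double yy = -roi_h / 2.0 + ph * bin_h + (iy + 0.5) * bin_h / grid_h;
  const double xx = -roi_w / 2.0 + pw * bin_w + (ix + 0.5) * bin_w / grid_w;
  // Rotate by theta, then translate to the RoI center.
  const double c = std::cos(theta);
  const double s = std::sin(theta);
  return Point{xx * c - yy * s + cx, xx * s + yy * c + cy};
}
```

For instance, for a 14x14 RoI centered at (10, 20) with theta = 0, pooled to 7x7 with 2x2 samples per bin, the first sample of bin (0, 0) lands at (3.5, 13.5). Both orientation channels selected for an output element are sampled at this same position.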
Suppose N=4, i.e. the channels are divided into 4 groups, corresponding to nOrientation. r is ind, and o is the orientation-channel index of the pooled feature. Taking one entry of the table from my slides as an example: when r=1 and o=1, the pixel values of input channels 0 and 1 are used for RiRoIAlign, and the computed value is written to position o=1, which realizes the channel alignment. It is a cyclic shift process.
Summary
My feeling is that RiRoIAlign, in essence, places objects with different orientations in a relative reference frame along the channel dimension: from each object's own point of view, the channel positions are always aligned with its orientation. This is how true rotation invariance is achieved.
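The channel selection in this example can be checked with a small CPU-side C++ sketch that mirrors the kernel's arithmetic for `ind`, `l_var`, `r_var`, `ind_rot` and `ind_rot_plus`. The struct `OrientationTaps` and the function `orientation_taps` are hypothetical names used only for illustration; they are not part of ReDet.

```cpp
#include <cmath>

// The two input orientation channels read for one output orientation
// channel, together with their interpolation weights.
struct OrientationTaps {
  int ind_rot;       // first input orientation channel
  int ind_rot_plus;  // next input orientation channel (cyclic)
  double w;          // weight of ind_rot (r_var in the kernel)
  double w_plus;     // weight of ind_rot_plus (l_var in the kernel)
};

inline OrientationTaps orientation_taps(double theta, int o, int nOrientation) {
  const double kPi = 3.14159265358979323846;
  // r = theta * N / (2 * pi), as in the kernel.
  const double ind_float = theta * nOrientation / (2.0 * kPi);
  int ind = static_cast<int>(std::floor(ind_float));
  const double l_var = ind_float - ind;  // fractional part
  const double r_var = 1.0 - l_var;
  // Wrap into [0, nOrientation); valid for theta >= -2*pi.
  ind = (ind + nOrientation) % nOrientation;
  // Cyclic shift: output channel o reads input channel (o - ind) mod N.
  const int ind_rot = (o - ind + nOrientation) % nOrientation;
  const int ind_rot_plus = (ind_rot + 1) % nOrientation;
  return OrientationTaps{ind_rot, ind_rot_plus, r_var, l_var};
}
```

With N=4 and r=1, output channels o = 0, 1, 2, 3 read input channel pairs (3, 0), (0, 1), (1, 2), (2, 3) respectively, which is the cyclic shift described above. When theta is an exact multiple of 2*pi/N, the weight of `ind_rot_plus` is zero and only `ind_rot` contributes.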