
ReDet source code walkthrough: A Rotation-equivariant Detector for Aerial Object Detection

2022-06-30 08:41:00 【Wu lele~】

Contents

Preface
1. Problem solved
2. Model structure
- 2.1. ReCNN
- 2.2. RiRoIAlign
Summary

Preface

This post walks through the CVPR 2021 rotated object detection paper, ReDet: A Rotation-equivariant Detector for Aerial Object Detection. The paper and source code links:
Paper download link
Source code link

1. Problem solved

[Image: slide from my group-meeting presentation]
This is a slide I made for a group meeting. In short, the paper has two contributions:
1) It rewrites ResNet50 using the ideas of e2cnn (NeurIPS 2019) and names the result ReCNN, which gives the CNN rotation equivariance. That is, when the input image rotates, the extracted features transform in a corresponding, predictable way.
2) e2cnn extracts a feature map F of shape (K*N, H, W). Along the channel dimension it can be understood as N groups (N = 4 or 8), representing 4 or 8 orientations, with K sub-channels in each group. The RRoIAlign module only corrects for differently oriented objects in the spatial dimension and does not align them in the channel dimension. The authors therefore designed the RiRoIAlign module, which aligns in both the channel and the spatial dimension, and so obtains rotation-invariant features.
Overall, the paper designs a very strong feature extractor.

2. Model structure

[Image: ReDet model structure diagram]

2.1. ReCNN

I don't fully understand this part either; e2cnn is hard. Briefly: after writing ReCNN, the authors retrained it on ImageNet and then fine-tuned it on the datasets used for testing. (I envy people who can afford to train a backbone.)

2.2. RiRoIAlign

According to the model structure diagram, RRoIAlign is first used for spatial alignment, and then the channels are cyclically shifted. For example, with r = 2, the value of channel Cn2 is assigned to Cn1, the value of Cn1 is assigned to Cnn, and so on. The value of each output channel is then computed by interpolating between two adjacent channels. (I was quite confused here, so I went to read the source code.) Source location: ReDet-master\mmdet\ops\riroi_align\src\riroi_align_kernel.cu. I have tried to annotate it in detail. If anything is unclear, feel free to comment and discuss.

#include <ATen/ATen.h>
#include <THC/THCAtomics.cuh>
#include <math.h>

#define PI 3.141592653

// CUDA runs the work in parallel across many threads. Each thread computes one element of the pooled RoI output.
// i is the flat index handled by the current thread; n is the total number of output elements.
// The loop is grid-stride, so one thread may process several elements when n exceeds the number of launched threads.
#define CUDA_1D_KERNEL_LOOP(i, n)                            \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
       i += blockDim.x * gridDim.x)
// Each block has 1024 threads.
#define THREADS_PER_BLOCK 1024
// Compute the grid size (number of blocks) from the block size, capped at 65000.
inline int GET_BLOCKS(const int N) {
    
    int optimal_block_num = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    int max_block_num = 65000;
    return min(optimal_block_num, max_block_num);
}
// The body of the bilinear interpolation is not reproduced here; it is widely documented elsewhere.
template <typename scalar_t>
__device__ scalar_t bilinear_interpolate(const scalar_t *bottom_data,
                                         const int height, const int width,
                                         scalar_t y, scalar_t x) {
  // ... (body omitted)
}

template <typename scalar_t>
__global__ void RiROIAlignForward(const int nthreads, const scalar_t *bottom_data,
                                const scalar_t *bottom_rois,
                                const scalar_t spatial_scale,
                                const int sample_num, const int channels,
                                const int height, const int width,
                                const int pooled_height, const int pooled_width,
                                const int nOrientation,
                                scalar_t *top_data) {
    // Meaning of the parameters:
    // bottom_data: pointer to the input feature map, (K, N, H, W) per image, flattened into a 1-D array.
    // bottom_rois: pointer to the flattened proposed RoIs, 6 values each: (batch_ind, cx, cy, w, h, theta).
    // nOrientation: the number of orientation groups the channels are divided into (4 or 8).
    // top_data: pointer to the pooled output feature map.
    // index: the flat position in top_data handled by the current thread.
    CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, o, ph, pw) is an element in the pooled output.
    // index is a flat 1-D offset, so decompose it into (n, c, o, ph, pw): position (ph, pw)
    // in orientation channel o of channel c of the n-th RoI.
    int pw = index % pooled_width;
    int ph = (index / pooled_width) % pooled_height;
    int o = (index / pooled_width / pooled_height) % nOrientation;
    int c = (index / pooled_width / pooled_height / nOrientation) % channels;
    int n = index / pooled_width / pooled_height / nOrientation / channels;
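    // Locate the 6 values of this RoI and read the index of the image it belongs to.
    const scalar_t* offset_bottom_rois = bottom_rois + n * 6;
    int roi_batch_ind = offset_bottom_rois[0];
    // Read the RoI's (cx, cy, w, h, theta); the first four are scaled to feature-map coordinates.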
    scalar_t roi_center_w = offset_bottom_rois[1] * spatial_scale;
    scalar_t roi_center_h = offset_bottom_rois[2] * spatial_scale;
    scalar_t roi_width = offset_bottom_rois[3] * spatial_scale;
    scalar_t roi_height = offset_bottom_rois[4] * spatial_scale;
    scalar_t theta = offset_bottom_rois[5];

    // Clamp the RoI width and height to at least 1.
    roi_width = max(roi_width, (scalar_t)1.);
    roi_height = max(roi_height, (scalar_t)1.);
    // Size of each bin along h and w. E.g. for a RoI of height 77 pooled to 7x7, each bin has height 77/7 = 11; likewise for w.
    scalar_t bin_size_h = static_cast<scalar_t>(roi_height) / static_cast<scalar_t>(pooled_height);
    scalar_t bin_size_w = static_cast<scalar_t>(roi_width) / static_cast<scalar_t>(pooled_width);
    
    // Corresponds to r = theta * N / (2 * pi) in the paper: which orientation group the angle of the current RoI falls in.
    scalar_t ind_float = theta * nOrientation / (2 * PI);
    // Round ind_float down to an integer.
    int ind =  floor(ind_float);
    // l_var is the coefficient alpha in Eq. 9 of the paper; r_var is 1 - alpha.
    scalar_t l_var = ind_float - (scalar_t)ind;
    scalar_t r_var = 1.0 - l_var;
    // Wrap ind into [0, nOrientation): this handles theta of a full turn or more, and negative theta down to -2*pi.
    ind = (ind + nOrientation) % nOrientation;
    // Compute the two input orientation channels needed for output channel o.
    // E.g. ind = 0, o = 0 gives ind_rot = 0 and ind_rot_plus = 1:
    // for an object with orientation index 0, output channel 0 is computed from input channels 0 and 1.
    int ind_rot = (o - ind + nOrientation) % nOrientation;
    int ind_rot_plus = (ind_rot + 1 + nOrientation) % nOrientation; 
    // Pointers to the feature planes of input orientation channels ind_rot and ind_rot_plus.
    const scalar_t* offset_bottom_data =
        bottom_data + (roi_batch_ind * channels * nOrientation + c * nOrientation + ind_rot) * height * width;

    const scalar_t* offset_bottom_data_plus =
        bottom_data + (roi_batch_ind * channels * nOrientation + c * nOrientation + ind_rot_plus) * height * width;

    // Number of sampling points per bin along each axis; usually 2.
    int roi_bin_grid_h = (sample_num > 0)
        ? sample_num
        : ceil(roi_height / pooled_height);  // e.g., = 2
    int roi_bin_grid_w =
        (sample_num > 0) ? sample_num : ceil(roi_width / pooled_width);
    
    // Top-left corner of the RoI relative to its center (before rotation), plus cos and sin of theta.
    scalar_t roi_start_h = -roi_height / 2.0;
    scalar_t roi_start_w = -roi_width / 2.0;
    scalar_t cosscalar_theta = cos(theta);
    scalar_t sinscalar_theta = sin(theta);

    // Total number of sampling points per bin; the accumulated sum is divided by it at the end to take the mean.
    const scalar_t count = roi_bin_grid_h * roi_bin_grid_w;  // e.g. = 4

    scalar_t output_val = 0.;
    // Loop over the sampling points in each bin. E.g. with roi_w = 77, roi_h = 777 and pooled_w = pooled_h = 7,
    // each bin has size (77/7, 777/7) = (11, 111); the code below computes the sample positions inside a bin.
    for (int iy = 0; iy < roi_bin_grid_h; iy++) {
      // e.g., iy = 0, 1
        const scalar_t yy = roi_start_h + ph * bin_size_h +
            static_cast<scalar_t>(iy + .5f) * bin_size_h /
                static_cast<scalar_t>(roi_bin_grid_h);  // e.g., 0.5, 1.5
        for (int ix = 0; ix < roi_bin_grid_w; ix++) {
    
        const scalar_t xx = roi_start_w + pw * bin_size_w +
            static_cast<scalar_t>(ix + .5f) * bin_size_w /
                static_cast<scalar_t>(roi_bin_grid_w);

        // Rotate each sampling position by theta and translate it to the RoI center to get its location on the feature map.
        scalar_t x = xx * cosscalar_theta - yy * sinscalar_theta + roi_center_w;
        scalar_t y = xx * sinscalar_theta + yy * cosscalar_theta + roi_center_h;
        // With the rotated position (y, x), run bilinear interpolation on both orientation channels.
        scalar_t val = bilinear_interpolate<scalar_t>(
            offset_bottom_data, height, width, y, x);
        scalar_t val_plus = bilinear_interpolate<scalar_t>(
            offset_bottom_data_plus, height, width, y, x);
        // Eq. 9 of the paper: linear interpolation between the two adjacent orientation channels.
        output_val += r_var * val + l_var * val_plus;
        }
    }
    // Average over the sampling points.
    output_val /= count;
    // Write the value to position index of the output feature map.
    top_data[index] = output_val;
    }
}

As the code shows, the implementation does not follow the order described in the paper, spatial alignment first and channel alignment second. In practice the author combines the two: once the source channels for an output value are determined, the RRoIAlign sampling is performed in the same pass.
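To make the spatial half of that combined step concrete, here is a minimal host-side C++ sketch of how one sampling point is mapped from the RoI's local frame to feature-map coordinates, mirroring the xx/yy and x/y arithmetic in the kernel. The names `Roi`, `Point`, and `sample_point` are hypothetical and not part of the ReDet source. The sketch assumes the RoI has already been multiplied by `spatial_scale` and skips the kernel's clamping of width and height to at least 1.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration; not part of the ReDet source.
struct Roi { double cx, cy, w, h, theta; };  // already in feature-map units
struct Point { double x, y; };

// Position of sample (iy, ix) inside output bin (ph, pw), rotated by theta
// around the RoI center.
Point sample_point(const Roi& roi, int pooled_h, int pooled_w,
                   int grid_h, int grid_w,
                   int ph, int pw, int iy, int ix) {
  assert(pooled_h > 0 && pooled_w > 0 && grid_h > 0 && grid_w > 0);
  const double bin_h = roi.h / pooled_h;
  const double bin_w = roi.w / pooled_w;
  // Offsets relative to the RoI center, before rotation.
  const double yy = -roi.h / 2.0 + ph * bin_h + (iy + 0.5) * bin_h / grid_h;
  const double xx = -roi.w / 2.0 + pw * bin_w + (ix + 0.5) * bin_w / grid_w;
  // Rotate by theta, then translate to the RoI center.
  const double c = std::cos(roi.theta);
  const double s = std::sin(roi.theta);
  return Point{xx * c - yy * s + roi.cx, xx * s + yy * c + roi.cy};
}
```

For a 7x7 RoI centered at (10, 10) with theta = 0, pooled to 7x7 with 2 samples per axis, the first sample of bin (0, 0) lands at (6.75, 6.75). With theta = pi/2 the same sample moves to (13.25, 6.75).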
If this is still confusing, here is an example; I worked through the computation by hand. It is also from my slides.
[Image: slide with a table tracing the channel alignment]
Assume N = 4, i.e. the channels are divided into 4 groups, corresponding to nOrientation. r is ind, and o is the channel index in the pooled feature. Taking the table as an example, when r = 1 and o = 1, the values of input channels 0 and 1 are fed through RiRoIAlign, and the result is written to position o = 1, which achieves the channel alignment. It is a cyclic indexing process.
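The channel side of this example can be checked with a small host-side C++ sketch. The helper `orientation_taps` and the struct `Taps` are hypothetical, not part of the ReDet source. They only reproduce the kernel's `ind`, `l_var`, `r_var`, `ind_rot`, and `ind_rot_plus` arithmetic, so any (r, o) entry can be verified by hand.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: the two input orientation channels that feed
// output channel o, and their interpolation weights.
struct Taps {
  int ch;         // ind_rot in the kernel
  int ch_plus;    // ind_rot_plus in the kernel
  double w;       // r_var = 1 - alpha, weight of ch
  double w_plus;  // l_var = alpha, weight of ch_plus
};

Taps orientation_taps(double theta, int n_orient, int o) {
  assert(n_orient > 0 && o >= 0 && o < n_orient);
  const double kPi = 3.141592653;  // same constant as the kernel's PI
  const double ind_float = theta * n_orient / (2 * kPi);
  int ind = static_cast<int>(std::floor(ind_float));
  const double alpha = ind_float - ind;
  // Wrap into [0, n_orient); like the kernel, this assumes theta >= -2*pi.
  ind = (ind + n_orient) % n_orient;
  const int ch = (o - ind + n_orient) % n_orient;
  const int ch_plus = (ch + 1) % n_orient;
  return Taps{ch, ch_plus, 1.0 - alpha, alpha};
}
```

With N = 4, an angle of 1.25 orientation steps (r = 1, alpha = 0.25) and o = 1 selects input channels 0 and 1 with weights 0.75 and 0.25, which matches the r = 1, o = 1 case above.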

Summary

My impression is that RiRoIAlign essentially places objects with different orientations into a relative reference frame along the channel dimension: from each object's own point of view, the channel positions are always aligned with its orientation. This is how true rotation invariance is achieved.
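The relative reference frame idea can be illustrated with a host-side C++ sketch at a single spatial location. It assumes, purely for illustration, that rotating an object by k orientation steps cyclically shifts its N orientation responses by k positions in the direction that the kernel's index arithmetic compensates for. Whether ReCNN features shift in exactly this direction is not verified here. `align_orientation` and `shift_channels` are hypothetical helpers, not part of the ReDet source.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical: channel alignment at one spatial location. `in` holds the N
// orientation responses; r is theta * N / (2 * pi), possibly fractional.
std::vector<double> align_orientation(const std::vector<double>& in, double r) {
  const int n = static_cast<int>(in.size());
  assert(n > 0);
  int ind = static_cast<int>(std::floor(r));
  const double alpha = r - ind;
  ind = ((ind % n) + n) % n;  // wrap into [0, n) for any integer ind
  std::vector<double> out(n);
  for (int o = 0; o < n; ++o) {
    const int ch = (o - ind + n) % n;
    const int ch_plus = (ch + 1) % n;
    out[o] = (1.0 - alpha) * in[ch] + alpha * in[ch_plus];
  }
  return out;
}

// Assumed effect of rotating the object by k steps: out[i] = in[(i + k) mod n].
std::vector<double> shift_channels(const std::vector<double>& in, int k) {
  const int n = static_cast<int>(in.size());
  std::vector<double> out(n);
  for (int i = 0; i < n; ++i) out[i] = in[(((i + k) % n) + n) % n];
  return out;
}
```

Under that assumption, aligning the responses {1, 2, 3, 4} with r = 0.25 gives the same result as aligning the shifted responses with r = 1.25: the shift in the input channels and the change in r cancel, so the aligned output does not depend on the object's orientation step.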

Original site

Copyright notice
This article was written by [Wu lele~]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202160535221276.html