A look at the winning solution of the CVPR 2020 nighttime object detection challenge

2022-07-04 15:44:00 【Xiaobai learns vision】

In the NightOwls Detection Challenge hosted by a CVPR 2020 workshop, the DeepBlueAI team from the Chinese company Shenlan Technology won first place in two tracks, "single-frame pedestrian detection" and "multi-frame pedestrian detection", and finished as runner-up in the "single-frame all-objects detection" track.

The competition is about detecting pedestrians and other objects at night, a key requirement for the safety and reliability of many systems, autonomous driving in particular. The Panda intelligent bus is the core product of Shenlan Technology's autonomous driving business. After obtaining autonomous driving test licenses in Guangzhou, Changsha, Shanghai and Wuhan in 2019, it also obtained Shenzhen's road test license for intelligent connected vehicles in May this year. The winning and runner-up solutions will be combined with daytime pedestrian detection to build an all-weather pedestrian detection system that works under different weather conditions. The system is expected to be deployed on the Panda intelligent bus to help keep it safe on the road.

Guided by its philosophy of "artificial intelligence serving people's livelihood", Shenlan Technology responds to national policy, pays attention to people's everyday pain points, and is committed to bringing high-quality AI products and solutions to more people. Its Panda intelligent bus is intended to serve as "new infrastructure" for smart-city public transportation and to improve the public travel experience.

The DeepBlueAI team's solution is described below.

Introduction to the NightOwls Detection Challenge

Detecting pedestrians in nighttime scenes captured by an RGB camera is an important but under-studied problem, and current state-of-the-art visual detection algorithms do not handle it well. The official baseline reaches a Miss Rate (lower is better) of 7.36% on Caltech, a well-known pedestrian detection dataset, but only 63.99% on the nighttime pedestrian dataset.

Nighttime pedestrian detection is a key component of many systems, such as safe and reliable autonomous driving, yet using computer vision for detection in night scenes has received little attention. The CVPR 2020 Scalability in Autonomous Driving Workshop therefore launched a competition on this problem.

The NightOwls Detection Challenge 2020 has three tracks: single-frame pedestrian detection (the same task as in the 2019 competition), multi-frame pedestrian detection, and single-frame detection of all objects (three categories: pedestrian, cyclist and motorbike).

Pedestrian Detection from a Single Frame (same as 2019 competition)
Pedestrian Detection from Multiple Frames
All Objects Detection (pedestrian, cyclist, motorbike) from a Single Frame

Track descriptions

Example images from the nighttime pedestrian dataset

Track 1: Pedestrian detection from a single frame

This task requires detecting pedestrians only (the pedestrian category with category_id = 1 in the ground truth), and the algorithm may use only the current frame as input. The task is the same as in the ICCV 2019 NightOwls challenge.

Track 2: Pedestrian detection from multiple frames

The requirements are the same as in Track 1: only pedestrians are detected. However, this track allows the current frame and all previous frames (N, N-1, N-2, …) to be used to predict the pedestrians in the current frame.

The dataset for these two tracks consists of 279,000 fully annotated images taken from 40 videos recorded at dawn and at night in several European cities, covering different weather conditions.

Models are evaluated with the Average Miss Rate metric commonly used in pedestrian detection, computed only on non-occluded targets with a height of at least 50 px.
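To make the metric concrete, here is a minimal sketch of how a log-average miss rate can be computed. The source only names the "Average Miss Rate metric"; sampling nine FPPI (false positives per image) values between 1e-2 and 1e0 follows the convention of the Caltech pedestrian benchmark and is an assumption here, as are the function and parameter names. The official evaluation code may differ in its details.

```python
import math


def is_evaluated(height_px, occluded, min_height=50):
    """Return True for targets that count in the evaluation:
    non-occluded and at least min_height pixels tall."""
    return (not occluded) and height_px >= min_height


def log_average_miss_rate(curve, num_points=9):
    """curve: list of (fppi, miss_rate) pairs sorted by increasing fppi.

    Samples the curve at num_points FPPI values evenly spaced in log
    space between 1e-2 and 1e0 (assumed convention) and returns the
    geometric mean of the sampled miss rates.
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Miss rate at the last curve point whose FPPI does not exceed ref;
        # if the curve has no such point, count it as a full miss.
        candidates = [mr for fppi, mr in curve if fppi <= ref]
        sampled.append(candidates[-1] if candidates else 1.0)
    # Clamp to avoid log(0) for a perfect detector.
    logs = [math.log(max(mr, 1e-10)) for mr in sampled]
    return math.exp(sum(logs) / len(logs))
```

Because the result is a geometric mean over low-FPPI operating points, a detector is rewarded for keeping its miss rate low even when very few false positives per image are tolerated.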

Track 3: All Objects Detection (pedestrian, cyclist, motorbike) from a Single Frame

This track requires detecting all categories that appear in the training set, including cyclists and motorbikes, and does not allow the use of video sequence information.

Challenges of the competition

The main difficulties of the competition are the following:

Motion blur and image noise

Unlike conventional detection datasets, this one reflects real driving conditions: the data was collected while the vehicle was moving, so high speed or relative motion produces motion-blurred frames. In addition, because the camera is an ordinary RGB camera, image quality drops sharply in low light. This is the main factor limiting model performance.

Large contrast differences and little color information

Because the data was collected mainly at night, data augmentation has to be applied with care: different augmentation methods affect the results considerably.

Different data distribution

The competition dataset covers different cities and different weather conditions, which earlier pedestrian detection datasets generally do not do at the same time. The data is diverse, and its distribution differs considerably from that of the datasets commonly used to train pre-trained models, such as COCO and OBJ365. As a result, fine-tuning a model pre-trained on those common datasets does not work as well as expected.

The DeepBlueAI team's solution

The DeepBlueAI team won the single-frame and multi-frame pedestrian detection tracks and took second place in the single-frame all-objects detection track.

For the detector, the team first built a baseline:

Baseline = Backbone + DCN + FPN + Cascade + anchor ratio (2.44)

These modules are regulars in detection competitions and have been analyzed thoroughly by many practitioners, so they are not covered in detail here. The DeepBlueAI team ran some simple experiments, found that these modules consistently helped, and therefore used this combination as the baseline. On top of it, the team added a few pedestrian detection tricks, such as changing the anchor ratio to 2.44 and not back-propagating the loss for regions annotated as ignore during training.
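The two tricks can be illustrated with a small sketch. It assumes that the ratio 2.44 means height divided by width (consistent with the 0.41 width-to-height ratio often used for pedestrian boxes) and that "ignore" handling amounts to leaving flagged samples out of the loss average. The function names are hypothetical and are not taken from the team's code.

```python
import math


def anchor_shape(scale, ratio=2.44):
    """Width and height of an anchor with area scale**2 and
    height / width == ratio (assumed meaning of the anchor ratio)."""
    width = scale / math.sqrt(ratio)
    height = scale * math.sqrt(ratio)
    return width, height


def masked_mean_loss(per_sample_losses, ignore_flags):
    """Average the loss over samples that are not flagged as ignore.

    Ignored samples contribute nothing to the result, so in a real
    training loop no gradient would flow back from them.
    """
    kept = [loss for loss, ignored in zip(per_sample_losses, ignore_flags)
            if not ignored]
    return sum(kept) / len(kept) if kept else 0.0
```

A tall, narrow anchor matches the typical shape of a standing pedestrian better than a square one, and masking ignore regions keeps ambiguous annotations from being treated as background.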

The main work covers the following aspects:

1. Double Heads

Inspecting the experimental results, the team found that the baseline detected background objects such as stone pillars and lamp posts as pedestrians, a problem that is mostly attributable to a poorly performing head. The team therefore experimented with several head designs, including TSD [7], CLS [8] and double head [9], and finally chose the double head structure, which performed well at a reasonable cost (shown in the figure below):

Double Heads structure

The comparison experiments show that using the FC-head for classification and the Conv-head for regression gives the best results.

Classification needs more semantic information, while bounding-box regression needs more spatial information. The double head method takes a divide-and-conquer approach and designs a separate head structure for each need, which makes it more effective. It also increases the amount of computation. To balance speed and accuracy, the team finally chose five modules in total: three residual blocks and two non-local blocks.

2. CBNet [10]

Incorporating a more powerful backbone can improve the performance of an object detector. The CBNet authors propose a novel strategy that combines multiple identical backbones through composite connections between adjacent backbones. The resulting, more powerful backbone is called a Composite Backbone Network (CBNet).

This also increases the model's parameter count and training time, so it is a speed-accuracy trade-off. The team tried other ways of improving the backbone but in the end chose the more practical CBNet, which does not require worrying about pre-trained weights.

The team chose the cost-effective dual-backbone structure.
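The composite connection can be written roughly as follows. The symbols are introduced here for illustration and follow the adjacent higher-level composition described in [10]; the source does not state which composition style the team used, so treat this as a sketch. Here x_k^l is the output of stage l of backbone k, F_k^l is that stage's transformation, and g is the composite connection (in [10], a 1×1 convolution with batch normalization followed by upsampling).

```latex
\[
  x_k^{l} = F_k^{l}\!\left( x_k^{l-1} + g\!\left( x_{k-1}^{l} \right) \right),
  \qquad l \ge 2
\]
```

For a dual-backbone model, k = 2: each stage of the lead backbone receives its own previous-stage output plus features from the assisting backbone.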

3. Data augmentation

The team found that pixel-level augmentation greatly reduced performance, so it did not pursue that direction further.

Retinex image enhancement makes images look better visually, but it may destroy the structural information of the original image, and in the end it brought no improvement.

The team therefore settled on spatial-level augmentation, which brought some improvement.
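As an illustration of what a spatial-level transform does, the sketch below flips an image and its bounding boxes horizontally. The source does not say which spatial transforms the team used, so the horizontal flip is only an example, and the function name and the box format are assumptions.

```python
def hflip(image, boxes):
    """Flip an image (a list of pixel rows) and its boxes horizontally.

    Boxes are (x1, y1, x2, y2) in pixels with x2 exclusive, so a box
    spanning the full width of a W-pixel image is (0, y1, W, y2).
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_boxes = [(width - x2, y1, width - x1, y2)
                     for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes
```

A transform like this moves pixels and boxes together but leaves pixel intensities untouched, which may be why it is safer on low-light images than pixel-level changes to brightness or contrast.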

Experimental details

1. Use Cascade R-CNN + DCN + FPN as the baseline;

2. Replace the original head with the double head;

3. Use CBNet as the backbone;

4. Use Cascade R-CNN COCO-pretrained weights;

5. Data augmentation;

6. Multi-scale training + testing tricks.
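The multi-scale testing in step 6 can be sketched as follows. The source does not describe how predictions from different scales were fused; pooling the rescaled boxes and applying greedy non-maximum suppression (NMS) is one common approach and is an assumption here, as are all names and thresholds.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_multiscale(dets_per_scale, iou_thr=0.5):
    """dets_per_scale maps a resize factor to the detections
    (x1, y1, x2, y2, score) predicted on the resized image.

    Boxes are mapped back to the original resolution, pooled, and
    filtered with greedy NMS (highest score first).
    """
    pooled = []
    for factor, dets in dets_per_scale.items():
        for x1, y1, x2, y2, score in dets:
            pooled.append((x1 / factor, y1 / factor,
                           x2 / factor, y2 / factor, score))
    pooled.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for det in pooled:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(det[:4], k[:4]) < iou_thr for k in kept):
            kept.append(det)
    return kept
```

Running the detector at a larger scale tends to help with small, distant pedestrians, at the cost of one extra forward pass per scale.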

Experimental results

The figure below shows the results of the team's method on the local validation set:

The team compared this year's results with the ICCV 2019 champion algorithm for the same track. Without using additional datasets, last year's single model reached 11.06 with fusion of 9 scales, while the team's algorithm reaches 10.49 using 2 scales.

Future work

Although the team achieved good results, it also identified some directions for future work based on its experience:

1. Because of the particular nature of the data, the team tried several enhancement methods to improve image quality, brightness and other properties so that pedestrians would be easier to detect. It turned out that these methods may destroy the original image structure and reduce performance. The team believes better nighttime image processing methods exist but require more research and exploration.

2. In Track 2, where information from previous frames is allowed, the team used only some simple IoU information. Because the camera that collected the dataset was constantly moving, the SOTA methods the team tried did not work well. The team thinks that how to use temporal frame information is worth exploring in the future.

3. There are many daytime pedestrian detection datasets in this field, so the team thinks it is worth trying domain adaptation methods to make full use of them.
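Regarding point 2, the source gives no details on the "simple IoU information" used in Track 2. One hypothetical way to use it is to raise the confidence of a current-frame detection that overlaps a detection from the previous frame. The function names, the threshold and the bonus value below are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescore_with_previous(current, previous, iou_thr=0.5, bonus=0.1):
    """Raise the score of current-frame detections that overlap a
    previous-frame detection. Detections are (x1, y1, x2, y2, score)."""
    rescored = []
    for det in current:
        supported = any(iou(det[:4], p[:4]) >= iou_thr for p in previous)
        score = min(1.0, det[4] + bonus) if supported else det[4]
        rescored.append(tuple(det[:4]) + (score,))
    return rescored
```

With a moving camera, boxes shift between frames, so a plain IoU match like this only works when the frame-to-frame displacement is small. This is consistent with the team's remark that better use of temporal information remains open.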

References

[1] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[J]. 2016.

[2] Dai J, Qi H, Xiong Y, et al. Deformable Convolutional Networks[J]. 2017.

[3] Cai Z, Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection[J]. 2017.

[4] Xie S, Girshick R, Dollár P, et al. Aggregated Residual Transformations for Deep Neural Networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2017.

[5] Bochinski E, Eiselein V, Sikora T. High-Speed tracking-by-detection without using image information[C]// 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017.

[6] Henriques J F, Caseiro R, Martins P, et al. High-Speed Tracking with Kernelized Correlation Filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3):583-596.

[7] Song G, Liu Y, Wang X. Revisiting the Sibling Head in Object Detector[J]. 2020.

[8] Li A, Yang X, Zhang C. Rethinking Classification and Localization for Cascade R-CNN[J]. 2019.

[9] Wu Y, Chen Y, Yuan L, et al. Rethinking Classification and Localization in R-CNN[J]. arXiv:1904.06493, 2019.

[10] Liu Y, Wang Y, Wang S, et al. CBNet: A Novel Composite Backbone Network Architecture for Object Detection[J]. arXiv:1909.03625, 2020.

Copyright notice
This article was written by [Xiaobai learns vision]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202141224392076.html
