The winning solution of the CVPR 2020 NightOwls nighttime detection challenge, explained
2022-07-04 15:44:00 【Xiaobai learns vision】
At the NightOwls Detection Challenge, hosted by a CVPR 2020 workshop, the DeepBlueAI team from the Chinese company Shenlan Technology won two tracks, “single-frame pedestrian detection” and “multi-frame pedestrian detection”, and finished as runner-up in the “all objects detection from a single frame” track.
The competition targets the detection of pedestrians and other objects at night, which is key to the safety and reliability of many systems, autonomous driving in particular. The Panda intelligent bus is the core autonomous-driving product of Shenlan Technology. After obtaining autonomous-driving test licenses in Guangzhou, Changsha, Shanghai, and Wuhan starting in 2019, it obtained a road-test license for intelligent connected vehicles in Shenzhen in May of this year. The first- and second-place solutions will be combined with daytime pedestrian detection to build an all-weather pedestrian detection system that copes with different weather conditions. The system is expected to be deployed on the Panda intelligent bus to help keep it driving safely.
Shenlan Technology follows the idea of “artificial intelligence serving people's livelihood”. Responding to national policy and attentive to people's everyday pain points, the company is committed to bringing high-quality AI products and solutions to more people. It positions the Panda intelligent bus as “new infrastructure” for smart-city public transportation, with the aim of improving the public travel experience.
The following introduces the DeepBlueAI team's solution.
Introduction to the NightOwls Detection Challenge
Detecting pedestrians in nighttime scenes captured by an RGB camera is an important but under-studied problem, and current state-of-the-art visual detection algorithms do not handle it well. The official baseline achieves a miss rate (lower is better) of 7.36% on Caltech, a well-known pedestrian detection dataset, but only 63.99% on the nighttime pedestrian dataset.
Nighttime pedestrian detection is a key component of many systems, such as safe and reliable autonomous driving. Yet solving nighttime detection with computer vision has received little attention, so the CVPR 2020 Scalability in Autonomous Driving Workshop launched a competition on the topic.
NightOwls Detection Challenge 2020 has three tracks: single-frame pedestrian detection (the same task as in the 2019 competition), multi-frame pedestrian detection, and detection of all objects in a single frame (three categories: pedestrian, cyclist, and motorbike).
- Pedestrian Detection from a Single Frame (same as 2019 competition)
- Pedestrian Detection from Multiple Frames
- All Objects Detection (pedestrian, cyclist, motorbike) from a Single Frame
Track descriptions
Example images from the nighttime pedestrian dataset
Track 1: Pedestrian detection from a single frame
This track requires detecting pedestrians only (the pedestrian category, category_id = 1 in the ground truth), and the algorithm may use only the current frame as input. The task is the same as in the ICCV 2019 NightOwls challenge.
Track 2: Pedestrian detection from multiple frames
The requirements are the same as in track 1: only pedestrians are detected. However, this track allows the current frame and all previous frames (N, N-1, N-2, …) to be used when predicting the pedestrians in the current frame.
The dataset for these two tracks consists of 279,000 fully annotated images taken from 40 video sequences recorded at dawn and at night in several European cities, covering different weather conditions.
Models are evaluated with the Average Miss Rate metric commonly used in pedestrian detection, computed only on non-occluded targets with height >= 50 px.
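As a rough illustration of this metric, the sketch below filters ground-truth targets by the stated rule and computes a log-average miss rate from a miss-rate/FPPI curve. The 9 reference points log-spaced in [10^-2, 10^0], and the fallback miss rate of 1.0 when the curve has no point at or below a reference value, follow the Caltech convention and are assumptions here; the function names are hypothetical and this is not the official evaluation code.

```python
import math

def keep_for_evaluation(height_px, occluded, min_height=50):
    """Ground-truth filter: non-occluded targets at least min_height px tall."""
    return (not occluded) and height_px >= min_height

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Average the miss rate in log space at reference FPPI values.

    fppi and miss_rate describe one curve, sorted by increasing fppi.
    The reference points (assumed, Caltech convention) are log-spaced
    between 1e-2 and 1e0 false positives per image.
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    samples = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding the reference point.
        # If the curve has no such point, assume every target is missed.
        candidates = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        samples.append(candidates[-1] if candidates else 1.0)
    # Clamp to avoid log(0) for a perfect detector.
    logs = [math.log(max(mr, 1e-10)) for mr in samples]
    return math.exp(sum(logs) / len(logs))
```

Averaging in log space weights the low-FPPI end of the curve as heavily as the high-FPPI end, which suits driving scenarios where false alarms are costly.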
Track 3: All Objects Detection (pedestrian, cyclist, motorbike) from a Single Frame
This track requires detecting every category that appears in the training set, including cyclists and motorbikes, and does not allow the use of video sequence information.
Challenges of the competition
The main difficulties of this competition are the following:
- Motion blur and image noise
Unlike conventional detection datasets, this competition reflects real driving conditions: the data was collected from a moving vehicle, so high speed or relative motion produces sequences of motion-blurred images. Because the camera is an ordinary RGB camera, image quality also drops sharply in low light, which is the main factor limiting model performance.
- Large differences in contrast and little color information
Because the data was collected mainly at night, data augmentation must be chosen carefully: different augmentation methods affect the results considerably.
- Different data distribution
The competition dataset covers different cities and different weather conditions, whereas earlier pedestrian detection datasets generally do not offer both. The data is diverse, and its distribution differs substantially from that of the datasets commonly used for pre-training, such as COCO and OBJ365. As a result, fine-tuning a model pre-trained on those datasets works less well than expected.
DeepBlueAI team's solution
The DeepBlueAI team won the single-frame and multi-frame pedestrian detection tracks and placed second in the track for detecting all objects in a single frame.
For the detector, the team first built a baseline:
Baseline = Backbone + DCN + FPN + Cascade + anchor ratio (2.44)
These modules are regulars in detection competitions and have been analyzed thoroughly elsewhere, so they are not discussed further here. In simple experiments the DeepBlueAI team found that these modules help consistently, so the combination was adopted as the baseline, together with a few pedestrian-detection tricks: changing the anchor ratio to 2.44, and not backpropagating the loss for regions labeled ignore during training.
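The two tricks can be sketched in a few lines. The sketch assumes the ratio 2.44 means height divided by width (a tall, pedestrian-shaped box) and that ignore handling amounts to dropping the affected samples from the loss average. The function names are hypothetical, and the team's actual implementation is not described in the article.

```python
import math

def make_anchors(scales, ratio=2.44):
    """Return (width, height) anchor shapes with height / width == ratio.

    Each anchor keeps the area scale * scale, so only the shape changes.
    Interpreting the ratio as height / width is an assumption.
    """
    anchors = []
    for s in scales:
        w = s / math.sqrt(ratio)
        h = s * math.sqrt(ratio)
        anchors.append((w, h))
    return anchors

def masked_mean_loss(losses, ignore_flags):
    """Average per-sample losses, skipping samples matched to ignore regions.

    Skipped samples contribute nothing to the loss, so no gradient
    flows back from them.
    """
    kept = [loss for loss, ignored in zip(losses, ignore_flags) if not ignored]
    return sum(kept) / len(kept) if kept else 0.0
```

For example, `make_anchors([32, 64, 128])` yields three tall anchors of increasing size, and `masked_mean_loss([1.0, 3.0, 100.0], [False, False, True])` averages only the first two losses.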
The main work covers the following aspects:
1. Double Heads
Examining the experimental results, the team found that the baseline detected background objects such as stone pillars and lamp posts as pedestrians, a failure that mostly traces back to a weak detection head. The team therefore experimented with several head designs, such as TSD [7], CLS [8], and double head [9], and finally chose the double head structure, which performs well and is cost-effective (figure below):
Double Heads structure
Comparative experiments show that using the FC-head for classification and the Conv-head for regression gives the best results.
Classification needs more semantic information, while bounding-box regression needs more spatial information. The double head method handles the two separately, with a head structure designed for each need, which makes it more effective. It also increases the amount of computation. To balance speed and accuracy, the team settled on 3 residual modules and 2 non-local modules, 5 modules in total.
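The split between the two branches can be illustrated with a toy NumPy sketch. It is heavily simplified: the conv branch here is a single 1x1 convolution followed by global average pooling, standing in for the residual and non-local modules mentioned above, and all layer sizes and parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_head(roi_feat, w1, w2):
    """Classification branch: flatten the RoI feature, then two dense layers."""
    x = roi_feat.reshape(-1)
    hidden = np.maximum(w1 @ x, 0.0)  # ReLU
    return w2 @ hidden  # class scores

def conv_head(roi_feat, w_conv, w_reg):
    """Regression branch: 1x1 convolution, ReLU, global average pooling, linear layer."""
    x = np.einsum("oc,chw->ohw", w_conv, roi_feat)
    x = np.maximum(x, 0.0)
    pooled = x.mean(axis=(1, 2))
    return w_reg @ pooled  # box deltas (dx, dy, dw, dh)

def double_head(roi_feat, params):
    """Run both branches on the same RoI feature."""
    scores = fc_head(roi_feat, params["fc1"], params["fc2"])
    deltas = conv_head(roi_feat, params["conv"], params["reg"])
    return scores, deltas

# Toy sizes (assumed): an 8-channel 7x7 RoI feature, 2 classes.
C, H, W = 8, 7, 7
params = {
    "fc1": rng.normal(size=(16, C * H * W)),
    "fc2": rng.normal(size=(2, 16)),
    "conv": rng.normal(size=(8, C)),
    "reg": rng.normal(size=(4, 8)),
}
roi = rng.normal(size=(C, H, W))
scores, deltas = double_head(roi, params)
```

The flattening in the FC branch discards the spatial layout in favor of a global, semantic summary, while the conv branch keeps the feature map intact until the final pooling.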
2. CBNet [10]
A more powerful backbone improves the performance of an object detector. The CBNet authors propose a novel strategy that combines multiple identical backbones through composite connections between adjacent backbones. The resulting, more powerful backbone is called the Composite Backbone Network (CBNet).
This also increases the model size and training time, so it is a speed-accuracy trade-off. The team tried other ways to improve the backbone but in the end chose the more practical CBNet, a method for which pre-trained weights are not a concern.
The team chose the cost-effective dual-backbone model structure.
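A dual-backbone composition can be sketched as follows. This is a toy NumPy version under stated assumptions: each stage is reduced to 2x2 average pooling plus a 1x1 channel mixing, the composite connection is a 1x1 convolution followed by 2x nearest-neighbor upsampling (batch normalization is omitted), and the channel count stays constant. All function names are hypothetical.

```python
import numpy as np

def stage(x, w):
    """Toy backbone stage: 2x2 average pooling, 1x1 channel mixing, ReLU."""
    c, h, wd = x.shape
    pooled = x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))
    return np.maximum(np.einsum("oc,chw->ohw", w, pooled), 0.0)

def composite(x, w):
    """Composite connection: 1x1 convolution, then 2x nearest-neighbor upsampling."""
    y = np.einsum("oc,chw->ohw", w, x)
    return y.repeat(2, axis=1).repeat(2, axis=2)

def dual_backbone(feat, assist_ws, lead_ws, comp_ws):
    """Return the lead backbone's stage outputs, fed by an assistant backbone."""
    # The assistant backbone runs first and keeps every stage output.
    assist_outs, x = [], feat
    for w in assist_ws:
        x = stage(x, w)
        assist_outs.append(x)
    # Lead backbone: the input of stage l is the lead's previous output plus
    # the transformed output of assistant stage l.
    lead_outs, x = [], feat
    for w, cw, a in zip(lead_ws, comp_ws, assist_outs):
        x = stage(x + composite(a, cw), w)
        lead_outs.append(x)
    return lead_outs
```

Because both backbones share the same architecture, each one can start from the same single-backbone weights, and only the lead backbone's outputs are passed on to the detector.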
3. Data augmentation
The team found that pixel-level augmentation reduced performance considerably, so it did not pursue this direction further.
The Retinex image enhancement method improves the images visually, but it may destroy structural information in the original image, and it brought no improvement to the final results.
The team therefore chose spatial-level augmentation, which improved the results to some extent.
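The article does not list which spatial-level transforms were used. As one assumed example, a horizontal flip changes only the geometry of the image and its boxes while leaving pixel intensities untouched, which is what distinguishes it from pixel-level changes such as brightness or contrast adjustment. The function name is hypothetical.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Flip an H x W x C image left-right and mirror [x1, y1, x2, y2] boxes."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=float)
    out = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    out[:, 0] = width - boxes[:, 2]
    out[:, 2] = width - boxes[:, 0]
    return flipped, out
```

Every pixel value present before the flip is still present afterwards, so the low-light intensity statistics of the nighttime images are preserved.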
Experiment details
1. Use Cascade R-CNN + DCN + FPN as the baseline;
2. Replace the original head with the double head;
3. Use CBNet as the backbone;
4. Use Cascade R-CNN COCO-pretrained weights;
5. Apply data augmentation;
6. Apply multi-scale training and testing tricks.
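For the multi-scale testing step, one common way to fuse results is to map the detections from each test scale back to the original image coordinates and run non-maximum suppression over the union. The article does not say how the team fused its scales, so the sketch below is an assumption, with hypothetical function names and an assumed IoU threshold of 0.5.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def rescale(dets, scale):
    """Map (x1, y1, x2, y2, score) from a resized image to original coordinates."""
    return [(x1 / scale, y1 / scale, x2 / scale, y2 / scale, s)
            for x1, y1, x2, y2, s in dets]

def nms(dets, thr=0.5):
    """Greedy non-maximum suppression, highest score first."""
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= thr for k in kept):
            kept.append(d)
    return kept

def fuse_scales(dets_per_scale, thr=0.5):
    """Merge detections keyed by test scale and suppress duplicates."""
    merged = []
    for scale, dets in dets_per_scale.items():
        merged.extend(rescale(dets, scale))
    return nms(merged, thr)
```

Larger test scales tend to help with small, distant pedestrians, while the fusion step removes the duplicates that arise when the same person is detected at several scales.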
Experimental results
The following figure shows the results of the team's method on the local validation set:
The team compared this year's results with the winning algorithm of the same track at ICCV 2019. Without additional datasets, last year's single model reached 11.06 when fusing 9 scales, while the team's algorithm reaches 10.49 using only 2 scales.
Future work
Although the team achieved good results, its experience so far suggests several directions for future work:
1. Given the nature of the data, the team tried enhancement methods that improve image quality, brightness, and other properties to make pedestrians easier to detect. However, these methods may destroy the original image structure, and they reduced performance. The team believes better nighttime image processing is possible but needs more research and exploration.
2. In track 2, where information from previous frames is allowed, the team used only some simple IoU information. Because the camera that collected the dataset is constantly moving, the SOTA methods the team tried did not work well. The team believes that making better use of temporal frame information is worth exploring.
3. Many daytime pedestrian detection datasets exist in this field, so the team considers domain adaptation worth trying in order to make full use of them.
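The article does not specify how the simple IoU information from item 2 was used in track 2. One hypothetical way to use it, shown purely as an illustration and not as the team's method, is to raise the score of a current-frame detection when it overlaps a detection from the previous frame. The function names, the IoU threshold, and the bonus value are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def rescore_with_previous(current, previous, thr=0.5, bonus=0.1):
    """Raise the score of current detections supported by the previous frame.

    current and previous are lists of (x1, y1, x2, y2, score).
    """
    out = []
    for box in current:
        best = max((iou(box, p) for p in previous), default=0.0)
        score = min(1.0, box[4] + bonus) if best >= thr else box[4]
        out.append((*box[:4], score))
    return out
```

A scheme like this needs only boxes from previous frames, not image content, but it degrades when camera motion shifts objects far between frames, which matches the difficulty with the moving camera noted above.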
References
[1] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[J]. 2016.
[2] Dai J, Qi H, Xiong Y, et al. Deformable Convolutional Networks[J]. 2017.
[3] Cai Z, Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection[J]. 2017.
[4] Xie S, Girshick R, Dollár P, et al. Aggregated Residual Transformations for Deep Neural Networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2017.
[5] Bochinski E, Eiselein V, Sikora T. High-Speed tracking-by-detection without using image information[C]// 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017.
[6] Henriques J F, Caseiro R, Martins P, et al. High-Speed Tracking with Kernelized Correlation Filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3):583-596.
[7] Song G, Liu Y, Wang X. Revisiting the Sibling Head in Object Detector[J]. 2020.
[8] Li A, Yang X, Zhang C. Rethinking Classification and Localization for Cascade R-CNN[J]. 2019.
[9] Wu, Y., Chen, Y., Yuan, L., Liu, Z., Wang, L., Li, H., & Fu, Y. (2019). Rethinking Classification and Localization in R-CNN. ArXiv, abs/1904.06493.
[10] Liu, Y., Wang, Y., Wang, S., Liang, T., Zhao, Q., Tang, Z., & Ling, H. (2020). CBNet: A Novel Composite Backbone Network Architecture for Object Detection. ArXiv, abs/1909.03625.