BEV instance prediction from monocular cameras (ICCV 2021)
2022-06-30 05:18:00 【3D vision workshop】
Author: Huang Yu @ Zhihu
Source: https://zhuanlan.zhihu.com/p/422992592
Editor: 3D Vision Workshop
The ICCV'21 paper "FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras" comes from authors at Wayve, an autonomous driving start-up in the UK, and the University of Cambridge.

Driving requires interacting with other road agents and predicting their future behavior in order to navigate safely. FIERY is a probabilistic future prediction model in bird's-eye view (BEV) from monocular cameras. It predicts the future instance segmentation and motion of dynamic agents, which can be converted into non-parametric future trajectories. It combines the perception, sensor fusion and prediction components of a traditional autonomous driving stack by estimating BEV predictions directly from RGB monocular camera inputs.
FIERY learns to model the inherent stochasticity of the future end-to-end from camera driving data, without relying on HD maps, and predicts multimodal future trajectories.
Open-source code: https://github.com/wayveai/fiery
Blog: https://wayve.ai/blog/fiery-future-instance-prediction-birds-eye-view/
The following two figures illustrate multimodal future prediction by the BEV network. Top two rows: RGB camera inputs; the predicted instance segmentations are projected onto the ground plane, and the mean future trajectory of each dynamic agent is visualized as a transparent path. Bottom row: future instance prediction in bird's-eye view over a 100m × 100m area around the ego vehicle, which is represented by the black rectangle in the center.


An overview of the FIERY model is shown in the figure: a BEV future prediction model that takes camera input.

·1. At past times {1, ..., t}, the camera inputs (O_1, ..., O_t) are lifted to 3D by predicting a depth probability distribution over pixels and using the known camera intrinsics and extrinsics.
·2. The features are projected onto BEV (x_1, ..., x_t). A spatial transformer module S warps the BEV features to the present reference frame (time t) according to the past ego-motion (a_1, ..., a_{t−1}).
·3. A 3D convolutional temporal model learns a spatio-temporal state s_t.
·4. Two probability distributions are parameterized: the present distribution and the future distribution. The present distribution is conditioned on the current state s_t; the future distribution is conditioned on both the current state s_t and the future labels (y_{t+1}, ..., y_{t+H}).
·5. A latent code η_t is sampled from the future distribution during training and from the present distribution during inference. The current state s_t and the latent code η_t are the inputs to the future prediction model, which recursively predicts future states (ŝ_{t+1}, ..., ŝ_{t+H}).
·6. The states are decoded into future instance segmentation and future motion in BEV (ŷ_t, ..., ŷ_{t+H}).
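Step 2 above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `warp_bev`, the motion convention (a point p in the frame at time t has coordinates R(yaw)·p + (tx, ty) in the source frame, measured in grid cells about the grid center) and the nearest-neighbour sampling are all assumptions made for illustration.

```python
import numpy as np

def warp_bev(feat, yaw, tx, ty):
    """Resample a BEV feature map (C, H, W) into a new ego frame.

    Assumed convention: a target-frame point p maps to R(yaw) @ p + (tx, ty)
    in the source frame, in units of grid cells, centered on the grid.
    """
    C, H, W = feat.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x, y = cols - cx, rows - cy                      # centered target coordinates
    cos, sin = np.cos(yaw), np.sin(yaw)
    src_x = cos * x - sin * y + tx                   # where each target cell reads from
    src_y = sin * x + cos * y + ty
    src_c = np.rint(src_x + cx).astype(int)          # nearest-neighbour sampling
    src_r = np.rint(src_y + cy).astype(int)
    valid = (src_r >= 0) & (src_r < H) & (src_c >= 0) & (src_c < W)
    out = np.zeros_like(feat)                        # cells with no source stay zero
    out[:, valid] = feat[:, src_r[valid], src_c[valid]]
    return out
```

With zero motion the map is returned unchanged, and cells whose source falls outside the grid are zero-filled. A learned model would need a differentiable (for example bilinear) sampler rather than this rounding.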
In the lifting step, the depth probability acts as a form of self-attention, modulating the features according to the depth plane they are predicted to belong to. Using the known camera intrinsics and extrinsics (relative to the vehicle), the features from every camera (u_t^1, ..., u_t^n) are lifted to 3D in a common reference frame (the inertial center of the ego vehicle at time t).
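The modulation of features by depth probability can be sketched as an outer product between per-pixel features and a per-pixel categorical distribution over depth bins. The sketch below is an assumption-laden illustration: `lift_features`, the tensor shapes and the softmax over raw logits are illustrative choices, and the subsequent placement of the lifted features into 3D using the camera intrinsics and extrinsics is not shown.

```python
import numpy as np

def lift_features(features, depth_logits):
    """Weight per-pixel features by a per-pixel depth distribution.

    features:     (C, H, W) image features (hypothetical shape)
    depth_logits: (D, H, W) unnormalized scores over D depth bins
    returns:      lifted (C, D, H, W) and the depth probabilities (D, H, W)
    """
    # Numerically stable softmax over the depth axis.
    logits = depth_logits - depth_logits.max(axis=0, keepdims=True)
    depth_prob = np.exp(logits)
    depth_prob /= depth_prob.sum(axis=0, keepdims=True)
    # Outer product: (C, 1, H, W) * (1, D, H, W) -> (C, D, H, W).
    lifted = features[:, None] * depth_prob[None]
    return lifted, depth_prob
```

Because the probabilities sum to one along the depth axis, summing the lifted tensor over depth recovers the original features; a confident depth prediction concentrates a pixel's feature in a single depth plane.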
Following the ECCV'20 paper "Probabilistic future prediction for video scene understanding", a conditional variational approach is used to model the inherent stochasticity of future prediction. Two distributions are introduced: the present distribution P, which only has access to the current spatio-temporal state s_t, and the future distribution F, which additionally has access to the observed future labels (y_{t+1}, ..., y_{t+H}), where H is the future prediction horizon.
During training, the sample η_t is drawn from the future distribution to enforce predictions consistent with the observed future, and a mode-covering KL-divergence loss encourages the present distribution to cover the observed future. During inference, η_t is sampled from the present distribution, and each sample encodes a possible future.
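As a rough sketch of the two-distribution setup, assume both distributions are diagonal Gaussians parameterized by a mean and a log standard deviation (the text above does not state the parameterization, so this is an assumption, as are the function names). The KL term is then available in closed form, and sampling uses the reparameterization η = μ + σ·ε.

```python
import numpy as np

def kl_diag_gaussians(mu_f, log_sigma_f, mu_p, log_sigma_p):
    """KL(F || P) for diagonal Gaussians F = N(mu_f, sigma_f^2), P = N(mu_p, sigma_p^2)."""
    var_f = np.exp(2.0 * log_sigma_f)
    var_p = np.exp(2.0 * log_sigma_p)
    kl = (log_sigma_p - log_sigma_f
          + (var_f + (mu_f - mu_p) ** 2) / (2.0 * var_p)
          - 0.5)
    return kl.sum()

def sample_latent(mu, log_sigma, rng):
    """Draw a latent code: eta = mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def pick_distribution(training, present, future):
    """Sample from the future distribution in training, the present one at inference."""
    return future if training else present
```

Placing F first in the KL term penalizes the present distribution wherever the future distribution has mass that it does not cover, which is the mode-covering behaviour described above.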
The future prediction model is a convolutional GRU network. It takes as input the current state s_t and a latent code η_t, sampled from the future distribution F during training or from the present distribution P during inference, and recursively predicts future states.
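The recursion can be sketched with a toy cell. The following is a deliberately simplified stand-in, not the paper's network: the convolutions are 1×1 (per-pixel linear maps), the weights are random, the state s_t is used as the initial hidden state and the latent code is broadcast spatially as the input at every step. All names and these wiring choices are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PointwiseConvGRUCell:
    """GRU cell whose convolutions are 1x1, i.e. per-pixel linear maps."""

    def __init__(self, in_ch, hid_ch, rng):
        shape = (hid_ch, in_ch + hid_ch)
        self.w_update = rng.normal(0.0, 0.1, shape)
        self.w_reset = rng.normal(0.0, 0.1, shape)
        self.w_cand = rng.normal(0.0, 0.1, shape)

    @staticmethod
    def conv1x1(weight, x):
        # weight: (out, in), x: (in, H, W) -> (out, H, W)
        return np.einsum("oi,ihw->ohw", weight, x)

    def __call__(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(self.conv1x1(self.w_update, xh))      # update gate
        r = sigmoid(self.conv1x1(self.w_reset, xh))       # reset gate
        xrh = np.concatenate([x, r * h], axis=0)
        cand = np.tanh(self.conv1x1(self.w_cand, xrh))    # candidate state
        return (1.0 - z) * h + z * cand

def rollout(cell, state, latent, horizon):
    """Recursively predict `horizon` future states from state (C, H, W) and latent (L,)."""
    _, H, W = state.shape
    eta = np.broadcast_to(latent[:, None, None], (latent.shape[0], H, W))
    states, h = [], state
    for _ in range(horizon):
        h = cell(eta, h)          # each predicted state feeds the next step
        states.append(h)
    return np.stack(states)       # (horizon, C, H, W)
```

Rolling out the same state with different latent codes yields different futures, which is how sampling η_t produces multimodal predictions.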
The output features are fed to a bird's-eye view decoder D with multiple output heads: semantic segmentation, instance center and instance offset (pointing to the instance center), and future instance flow (motion). The figure below shows the model outputs:

The instance segmentation is obtained as follows: (i) the instance centers are obtained by non-maximum suppression; (ii) pixels are grouped to their nearest instance center using the offset vectors; (iii) the future flow allows consistent instance identification over time, by comparing the centers warped with the future flow from t to t + 1 against the centers at time t + 1.
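Steps (i) and (ii) can be sketched in NumPy as follows. The function names, the 3×3 suppression window and the 0.1 threshold are hypothetical choices for illustration, and step (iii), the temporal matching with future flow, is not covered here.

```python
import numpy as np

def find_centers(centerness, threshold=0.1, window=3):
    """Non-maximum suppression: keep pixels that are the maximum of their window."""
    H, W = centerness.shape
    pad = window // 2
    padded = np.pad(centerness, pad, constant_values=-np.inf)
    local_max = np.full_like(centerness, -np.inf)
    for dr in range(window):
        for dc in range(window):
            local_max = np.maximum(local_max, padded[dr:dr + H, dc:dc + W])
    keep = (centerness == local_max) & (centerness > threshold)
    return np.argwhere(keep)                      # (K, 2) array of (row, col)

def group_pixels(centers, offsets, foreground):
    """Assign each foreground pixel to the center nearest to (pixel + offset).

    offsets: (2, H, W) vectors (d_row, d_col) pointing to the instance center.
    Returns an (H, W) map of instance ids, with 0 for background.
    """
    H, W = foreground.shape
    if len(centers) == 0:
        return np.zeros((H, W), dtype=int)
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    voted = np.stack([rows + offsets[0], cols + offsets[1]], axis=-1)   # (H, W, 2)
    dist = np.linalg.norm(voted[:, :, None, :] - centers[None, None], axis=-1)
    ids = dist.argmin(axis=-1) + 1                # ids start at 1
    return np.where(foreground, ids, 0)
```

Here `centerness` is expected to be a floating-point heatmap and `foreground` a boolean mask, for example taken from the semantic segmentation head.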
The evaluation metrics are Video Panoptic Quality (VPQ) and Generalised Energy Distance (D_GED).
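For intuition, the squared generalised energy distance between a set of sampled predictions and a set of ground truths is 2·E[d(Ŷ, Y)] − E[d(Ŷ, Ŷ′)] − E[d(Y, Y′)] for some distance d. The sketch below uses d = 1 − IoU on binary masks purely as a stand-in assumption; it does not reproduce the distance used in the paper's evaluation, and the function names are illustrative.

```python
import numpy as np

def iou_distance(a, b):
    """1 - IoU between two boolean masks (stand-in distance, an assumption)."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def ged_squared(samples, ground_truths, dist=iou_distance):
    """Squared generalised energy distance between two sets of outcomes."""
    cross = np.mean([dist(s, g) for s in samples for g in ground_truths])
    within_s = np.mean([dist(s, s2) for s in samples for s2 in samples])
    within_g = np.mean([dist(g, g2) for g in ground_truths for g2 in ground_truths])
    return 2.0 * cross - within_s - within_g
```

The first term rewards samples that are close to the ground truth, while the second term rewards diversity among the samples. With a single ground-truth future the last term is zero.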
Baseline methods include:
·VPN ("Cross-view semantic segmentation for sensing surroundings," IEEE Robotics and Automation Letters, 2020)
·VED ("Monocular semantic occupancy grid mapping with convolutional variational encoder–decoder networks," IEEE Robotics and Automation Letters, 2019)
·PON ("Predicting semantic map representations from images using pyramid occupancy networks," CVPR 2020)
·Lift-Splat ("Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d," ECCV 2020)
·STA ("Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation," ICRA 2021)
·Fishing Net ("Fishing net: Future inference of semantic heatmaps in grids," CVPR 2020 workshop)
The experimental results are as follows:

Settings 1, 2 and 3 are defined as follows:
· Setting 1: 100m × 50m at 25cm resolution. Prediction for the present time frame.
· Setting 2: 100m × 100m at 50cm resolution. Prediction for the present time frame.
· Setting 3: 32.0m × 19.2m at 10cm resolution. Prediction 2.0s into the future. In this setting the model is compared with two variants of Fishing Net: one using camera input and one using lidar input.
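The three settings imply different BEV grid sizes. A small worked calculation, with `grid_shape` as an illustrative helper name:

```python
def grid_shape(extent_m, resolution_m):
    """Number of BEV cells along each extent, given the cell size in meters."""
    return tuple(round(e / resolution_m) for e in extent_m)

settings = {
    "Setting 1": ((100.0, 50.0), 0.25),
    "Setting 2": ((100.0, 100.0), 0.50),
    "Setting 3": ((32.0, 19.2), 0.10),
}

for name, (extent, resolution) in settings.items():
    print(name, grid_shape(extent, resolution))
```

This gives 400 × 200 cells for Setting 1, 200 × 200 for Setting 2 and 320 × 192 for Setting 3.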


The figure compares FIERY Static (no temporal context) and FIERY (1.0s of past context) on the present-frame BEV instance segmentation task on nuScenes data: FIERY can predict partially observable and occluded elements, as highlighted by the blue ellipses.

(a) The vehicle parked on the left is predicted correctly even though it is occluded.

(b) The two cars parked on the left are heavily occluded by vehicles in the opposite lane, but their positions are predicted accurately by fusing past information.