CVPR 2020 best paper: unsupervised learning of symmetric deformable 3D objects
2022-07-28 11:33:00 【51CTO】
Code: https://github.com/elliottwu/unsup3d
Project page: https://elliottwu.com/projects/unsup3d/
Demo:
Abstract
The authors propose a method for learning 3D deformable object categories from raw single-view images, without any additional supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. To disentangle these components without supervision, the authors exploit the fact that many object categories have a symmetric structure. Reasoning about illumination makes it possible to exploit the underlying symmetry even when the appearance is not symmetric because of shading and other effects. Experiments show that the method recovers accurate 3D shapes of human faces, cat faces and cars from single-view images.
Introduction
Understanding the 3D structure of images is key in many computer vision applications. Many deep networks understand images only on the 2D plane, whereas 3D modelling can explain away much of the variability of natural images and improve image understanding. Like other works, the authors consider the problem of learning 3D models of deformable objects (note: a model is generated by changing the shape of an object; for instance, if a mesh represents a sphere, another object can be generated by moving its vertices).
The authors study this problem under two challenging conditions. The first is that no 2D or 3D ground truth is available. The second is that the algorithm must use an unconstrained collection of single-view images; in particular, multiple views of the same instance are not required, because in many applications only single still images are available. Under these two conditions, the learned model can recover the 3D shape of an object from a single image, as shown in the figure below:

First, an autoencoder decomposes the image into albedo, depth, illumination and viewpoint, without direct supervision for any of these factors. However, this decomposition is ill-posed. To constrain it, the authors note that most object categories are symmetric. A virtual second view can then be obtained by simply mirroring the image; if correspondences between the two mirrored images can be found, 3D reconstruction can be achieved by stereo reconstruction.
However, individual object instances are, for various reasons, never completely symmetric. The authors address this in two ways. First, they model illumination explicitly in order to exploit the underlying symmetry. Second, they augment the model so that it can reason about the potential lack of symmetry in each object.
The authors integrate these components into an end-to-end learning framework in which all components, including the confidence maps, are produced from raw images. They also find that symmetry can be enforced by flipping the internal representations, which is particularly useful for reasoning about symmetry probabilistically. Finally, experiments show that the method performs well on several datasets and outperforms the current state of the art (you can jump directly to the final experimental results and videos).
Related work
To assess the contribution of this method relative to the literature on image-based 3D reconstruction, the authors consider three aspects: the information used, the assumptions made, and the output produced. The following table compares this work with previous works along these three aspects:

SfM: Traditional methods such as Structure from Motion (SfM) can reconstruct the 3D structure of a single rigid scene. Monocular reconstruction methods can produce good results from a single image, but they need multiple views or videos for training. Another family of methods, Non-Rigid SfM (NRSfM), can learn to reconstruct deformable objects, but requires annotated 2D keypoints as supervision.
Shape from X: Many other cues have been used as alternatives or complements to SfM for recovering shape, for example silhouettes, texture and symmetry. In particular, the method in this article is inspired by shape from symmetry and shape from shading. The former uses the mirrored image as a virtual second view to reconstruct symmetric objects; the latter assumes a shading model, such as Lambertian reflectance, and reconstructs the surface by exploiting non-uniform illumination.
Category-specific reconstruction: Learning-based methods have recently been widely used to reconstruct objects from either raw images or 2D keypoints. Although this is an ill-posed problem, it can be solved by learning a suitable object prior from training data. Besides using 3D ground truth directly, some authors use videos or stereo pairs, and others use 2D keypoint annotations or object masks. For human bodies and faces, some methods learn to reconstruct directly from raw images with the help of predefined shape models. These models are built with specialized software or other methods; they are difficult to obtain for categories such as animals, and they limit the shape details that can be recovered.
Recently, some authors have tried to learn the geometry of object categories from raw monocular images only, but their methods have certain limitations; a detailed comparison is made later. Because the image rendered from the 3D model must be compared with the input image, a very important component is the differentiable renderer. Many rendering methods have been proposed; the one used here is the Neural 3D Mesh Renderer (there is a dedicated article about it on the official account).
Method
Given a collection of images of an object category, for example human faces, the goal is to learn a model Φ that decomposes an input image into 3D shape, albedo, illumination and viewpoint, as shown in the figure below:

Because only raw images are available for learning, the four factors above must first be recovered from the image itself. The authors also exploit the fact that most object categories are symmetric, although individual instances are not necessarily perfectly symmetric. To handle this, they explicitly model asymmetric illumination, and for each pixel of the input image the model predicts a confidence value that expresses the probability that the pixel has a symmetric counterpart in the image (conf in the figure above). Each part is described in detail below.
1. Photo-geometric autoencoding: An image I can be represented as a 3×W×H grid. Assuming that the image is roughly centred on the object of interest, the goal is to learn a function Φ, implemented as a neural network, that maps the input to four factors (d, a, w, l): the depth map d, the albedo image a, the viewpoint w and the light direction l.
The image is then reconstructed from these four factors in two steps, lighting Λ and reprojection Π:

$$\hat{I} = \Pi\left(\Lambda(a, d, l),\, d,\, w\right)$$
The lighting function Λ generates a version of the object, as seen from the canonical viewpoint, from the depth map d, the light direction l and the albedo a. The viewpoint w represents the transformation between the canonical view and the viewpoint of the actual input image I. The reprojection function Π then simulates the change of viewpoint and generates the reconstructed image Î from the canonical depth d and the shaded canonical image Λ(a, d, l). Î is compared with the input image to compute the reconstruction loss.
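The lighting step can be illustrated with a small sketch. It assumes a Lambertian shading model (mentioned in the related work above) with an ambient weight `k_s` and a diffuse weight `k_d`; these two parameters, their default values, the grayscale albedo, the finite-difference normals and the convention that the direction (0, 0, 1) points from the surface towards the viewer are all assumptions made for illustration, not details taken from this article or from the authors' code.

```python
import math

def unit(vec):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vec))
    return tuple(c / norm for c in vec)

def normals_from_depth(depth):
    """Approximate per-pixel unit normals of a depth map (list of rows)
    with finite differences. Gradients are in depth units per pixel."""
    h, w = len(depth), len(depth[0])
    normals = []
    for v in range(h):
        row = []
        for u in range(w):
            # Central differences, falling back to one-sided at the border.
            u0, u1 = max(u - 1, 0), min(u + 1, w - 1)
            v0, v1 = max(v - 1, 0), min(v + 1, h - 1)
            dd_du = (depth[v][u1] - depth[v][u0]) / max(u1 - u0, 1)
            dd_dv = (depth[v1][u] - depth[v0][u]) / max(v1 - v0, 1)
            row.append(unit((-dd_du, -dd_dv, 1.0)))
        normals.append(row)
    return normals

def shade(albedo, depth, light, k_s=0.2, k_d=0.8):
    """Hypothetical stand-in for the lighting function: shaded canonical
    image = albedo * (k_s + k_d * max(0, <light, normal>))."""
    light = unit(light)
    normals = normals_from_depth(depth)
    shaded = []
    for v, row in enumerate(albedo):
        out_row = []
        for u, a in enumerate(row):
            cos = sum(l * n for l, n in zip(light, normals[v][u]))
            out_row.append(a * (k_s + k_d * max(0.0, cos)))
        shaded.append(out_row)
    return shaded

# A flat surface lit head-on receives the full ambient + diffuse term.
flat_depth = [[1.0] * 3 for _ in range(3)]
gray_albedo = [[0.5] * 3 for _ in range(3)]
frontal = shade(gray_albedo, flat_depth, (0.0, 0.0, 1.0))
```

Because shading depends on the normals, and the normals depend on the depth map, the same albedo can explain different appearances under different light directions. This is what allows a symmetric albedo to coexist with an asymmetric input image.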
2. Probably symmetric objects: Using symmetry for 3D reconstruction requires identifying symmetric object points in the image. Here the authors assume that depth and albedo are reconstructed in a canonical frame in which they are symmetric about a fixed vertical plane. This also helps the model discover a "canonical view" of the object, which is important for reconstruction.
To achieve this, the authors consider an operator flip that flips a map along the horizontal axis, and require d ≈ flip d and a ≈ flip a. Although these constraints could be enforced by adding corresponding loss terms to the learning objective, such terms are difficult to balance. Instead, the authors achieve the same effect by computing a second reconstruction from the flipped depth and albedo:

$$\hat{I}' = \Pi\left(\Lambda(a', d', l),\, d',\, w\right), \qquad a' = \operatorname{flip} a, \quad d' = \operatorname{flip} d$$
The reconstruction losses of the two images (the one rendered from the original maps and the one rendered from the flipped maps) are then considered together. Because the two losses are commensurate, they are easy to balance and train jointly. More importantly, this approach makes it easy to reason about symmetry probabilistically.
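The flip operator itself is simple. The sketch below, under the assumption that a map is stored as a list of rows, mirrors a map left to right and measures how far a map is from being symmetric; the function names `flip` and `asymmetry` are hypothetical and chosen only for this illustration.

```python
def flip(m):
    """Mirror a 2D map (list of rows) left to right."""
    return [row[::-1] for row in m]

def asymmetry(m):
    """Mean absolute difference between a map and its mirror image."""
    mirrored = flip(m)
    total = sum(abs(x - y)
                for row, mrow in zip(m, mirrored)
                for x, y in zip(row, mrow))
    return total / (len(m) * len(m[0]))

# A depth map that is symmetric about its vertical centre line.
depth = [[1.0, 0.8, 1.0],
         [1.0, 0.7, 1.0]]
# An albedo map with a darker patch on the right side only.
albedo = [[0.5, 0.6, 0.2],
          [0.5, 0.6, 0.5]]

depth_flipped = flip(depth)    # identical to depth
albedo_flipped = flip(albedo)  # differs from albedo in the first row
```

In the method described above, the flipped maps are not penalized directly. They are rendered with the same light direction and viewpoint, and the second reconstruction is compared with the input image, so both loss terms are measured in image space.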
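The loss between the input image I and the reconstructed image Î is:

$$\mathcal{L}(\hat{I}, I, \sigma) = -\frac{1}{|\Omega|} \sum_{uv \in \Omega} \ln \left[ \frac{1}{\sqrt{2}\,\sigma_{uv}} \exp\left( -\frac{\sqrt{2}\,\ell_{1,uv}}{\sigma_{uv}} \right) \right]$$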
where ℓ1,uv = |Îuv − Iuv| is the L1 distance between the pixels at location uv, and σ is a confidence map, also estimated by the network, that expresses the aleatoric uncertainty of the model. This loss can be interpreted as the negative log-likelihood of a Laplacian distribution over the reconstruction residuals. Optimizing the likelihood lets the model calibrate itself and learn a meaningful confidence map.
More importantly, the network estimates a second confidence map σ′ from the same input image. This map indicates which parts of the input image may be asymmetric. For example, in the method overview figure above, the hair on the face is not symmetric, and the second confidence map can assign a high reconstruction uncertainty to the hair region, where the symmetry assumption does not hold.
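The self-calibration property can be checked numerically. The sketch below evaluates the negative log-likelihood of a Laplacian distribution with per-pixel scale σ, written out as ln(√2·σ) + √2·ℓ1/σ and averaged over pixels. It works on single-channel maps stored as lists of rows; the function name `laplacian_nll` and the data layout are assumptions for illustration.

```python
import math

def laplacian_nll(recon, target, sigma):
    """Mean over pixels of ln(sqrt(2) * sigma) + sqrt(2) * |recon - target| / sigma."""
    sqrt2 = math.sqrt(2.0)
    total, count = 0.0, 0
    for r_row, t_row, s_row in zip(recon, target, sigma):
        for r, t, s in zip(r_row, t_row, s_row):
            l1 = abs(r - t)
            total += math.log(sqrt2 * s) + sqrt2 * l1 / s
            count += 1
    return total / count

# One pixel with a fixed residual of 0.1.
recon, target = [[0.6]], [[0.5]]
residual = abs(recon[0][0] - target[0][0])

# Setting the derivative with respect to sigma to zero gives sigma = sqrt(2) * residual.
best_sigma = math.sqrt(2.0) * residual
loss_best = laplacian_nll(recon, target, [[best_sigma]])
loss_small = laplacian_nll(recon, target, [[0.5 * best_sigma]])
loss_large = laplacian_nll(recon, target, [[2.0 * best_sigma]])
```

For a fixed residual, the loss is lowest when σ is proportional to the residual. A network that predicts σ is therefore pushed to report large uncertainty exactly where the reconstruction is poor, and small uncertainty where it is accurate.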
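Overall, the learning objective combines the two reconstruction errors:

$$\mathcal{E}(\Phi; I) = \mathcal{L}(\hat{I}, I, \sigma) + \lambda_f\, \mathcal{L}(\hat{I}', I, \sigma')$$

where λf is a weighting factor, Î′ is the reconstruction obtained from the flipped depth and albedo, and σ′ is the second confidence map.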
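3. Image formation model: The image is formed by a camera looking at the object from a particular viewpoint. If P = (Px, Py, Pz) denotes a 3D point expressed in the camera reference frame, it is mapped to the pixel p = (u, v, 1) by the following projection:

$$p \propto K P, \qquad K = \begin{bmatrix} f & 0 & c_u \\ 0 & f & c_v \\ 0 & 0 & 1 \end{bmatrix}, \qquad c_u = \frac{W-1}{2}, \quad c_v = \frac{H-1}{2}, \quad f = \frac{W-1}{2 \tan\frac{\theta_{FOV}}{2}}$$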
This model assumes a perspective camera with field of view (FOV) θFOV. The nominal distance between the object and the camera is assumed to be about 1 m. Since the images are cropped around a particular object, a relatively narrow FOV is assumed, such as θFOV = 10°. The depth map d associates a depth value duv with every pixel (u, v) in the canonical view. By inverting the camera model, the authors find that this corresponds to the 3D point:

$$P = d_{uv} \cdot K^{-1} p$$
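4. Perceptual loss: The L1 loss between images described above is sensitive to small geometric imperfections and may lead to blurry reconstructions. The authors add a perceptual loss term to mitigate this problem. The k-th layer of an image encoder e predicts a representation e(k)(I); this feature encoder does not have to be trained on a supervised task. As with the previous loss function, but assuming a Gaussian distribution, the perceptual loss is:

$$\mathcal{L}_p^{(k)}(\hat{I}, I, \sigma^{(k)}) = -\frac{1}{|\Omega_k|} \sum_{uv \in \Omega_k} \ln \left[ \frac{1}{\sqrt{2\pi \left(\sigma^{(k)}_{uv}\right)^2}} \exp\left( -\frac{\left(\ell^{(k)}_{uv}\right)^2}{2 \left(\sigma^{(k)}_{uv}\right)^2} \right) \right]$$

where

$$\ell^{(k)}_{uv} = \left| e^{(k)}_{uv}(\hat{I}) - e^{(k)}_{uv}(I) \right|$$

for each pixel index uv in the k-th layer. For a more detailed description of the losses, see the paper.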
Results

Reconstruction of human faces, cat faces and cars

Face reconstruction

Comparison with the state of the art

Reconstruction of abstract cat faces
This article is for academic sharing only. If there is any infringement, please contact us and it will be removed.

▲ The official account of long click attention
边栏推荐
- Thinkphp5 behavior hook return result (data) example
- Microsoft security team found an Austrian company that used windows Zero Day vulnerability to sell spyware
- R语言-用于非平衡数据集的一些度量指标
- Byte side: how to realize reliable transmission with UDP?
- Machine learning strong foundation plan 0-5: why is the essence of learning generalization ability?
- 使用c语言实现双向链表
- outlook突然变得很慢很卡怎么解决
- 【MySQL从入门到精通】【高级篇】(十)MyISAM的索引方案&&索引的优缺点
- win10安装sqlmap(windows 7)
- GIS数据漫谈(五)— 地理坐标系统
猜你喜欢
![[MySQL from introduction to proficiency] [advanced chapter] (IX) precautions for InnoDB's b+ tree index](/img/dc/2c11852929cc2ad4a2e44b87e6f812.png)
[MySQL from introduction to proficiency] [advanced chapter] (IX) precautions for InnoDB's b+ tree index

苹果手机iCloud钥匙串的加密缺陷

GIS数据漫谈(五)— 地理坐标系统

Outlook suddenly becomes very slow and too laggy. How to solve it

保障邮箱安全,验证码四个优势

什么是WordPress

【MySQL从入门到精通】【高级篇】(九)InnoDB的B+树索引的注意事项

五面阿里技术专家岗,已拿offer,这些面试题你能答出多少

目标检测领域必看的6篇论文

Why should coding and modulation be carried out before transmission
随机推荐
服务器在线测速系统源码
Refresh your understanding of redis cluster
Open source huizhichuang future | 2022 open atom global open source summit openatom openeuler sub forum was successfully held
Left connection and right connection of MySQL (the difference between inner connection and natural connection)
Good use explosion! The idea version of postman has been released, and its functions are really powerful
「学习笔记」树状数组
[极客大挑战 2019]BabySQL-1|SQL注入
WinForm generates random verification code
Rongyun IM & RTC capabilities on new sites
Google Earth engine - use geetool to download single scene images in batches and retrieve NDSI results with Landsat 8
Cvpr2021 pedestrian re identification /person re identification paper + summary of open source code
1331. 数组序号转换
GIS数据漫谈(五)— 地理坐标系统
Understanding of the return value of the structure pointer function passed to the structure pointer
Installing sqlmap on win10 (Windows 7)
[MySQL from introduction to proficiency] [advanced chapter] (IX) precautions for InnoDB's b+ tree index
Leetcode:1300. the sum of the array closest to the target value after transforming the array [dichotomy]
Google Earth Engine——使用geetool批量下载单景影像以Landsat 8 反演后的NDSI结果
Outlook suddenly becomes very slow and too laggy. How to solve it
【一知半解】零值拷贝