CVPR 2020 best paper: unsupervised learning of symmetric deformable 3D objects

2022-07-28 11:33:00 51CTO


Code: https://github.com/elliottwu/unsup3d

Project page: https://elliottwu.com/projects/unsup3d/

Demo: http://www.robots.ox.ac.uk/~vgg/blog/unsupervised-learning-of-probably-symmetric-deformable-3d-objects-from-images-in-the-wild.html?image=004_face&type=human

Abstract

The authors propose a method for learning 3D deformable object categories from raw single-view images, without any additional supervision signal. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. To disentangle these components without supervision, the authors exploit the fact that many object categories have a symmetric structure. Reasoning about illumination makes it possible to exploit the underlying symmetry even when the appearance is not symmetric because of shading and other effects. Experiments show that the method recovers accurate 3D shapes of human faces, cat faces and cars from single-view images.

Introduction

Understanding the 3D structure of images is key in many computer vision applications. Many deep networks interpret images on the 2D plane, whereas 3D modeling can explain away much of the variability of natural images and improve image understanding. Like other work in this area, the authors consider learning 3D models of deformable object categories. (Note: a deformable model generates a new shape by deforming a base shape. For instance, if a mesh represents a sphere, another object can be generated by moving its vertices.)

The authors study this problem under two challenging conditions. The first is that no 2D or 3D ground truth is available. The second is that the algorithm must use an unconstrained collection of single-view images. In particular, multiple views of the same instance are not required, which matters because in many applications only still images are available. Under these two conditions, the algorithm can reconstruct the 3D shape of an object from a single image, as shown in the figure below:

[Figure: 3D shape reconstructed from a single image]

First, an autoencoder decomposes the image into albedo, depth, illumination and viewpoint, with no direct supervision for any of these factors. This is an ill-posed problem, however. To constrain it, the authors note that most object categories are symmetric. A virtual second view can then be obtained simply by mirroring the image, and if correspondences between the image and its mirrored copy can be found, 3D reconstruction can be achieved by stereo reconstruction.

A specific object instance, however, is never perfectly symmetric, for various reasons. The authors address this in two ways. First, they explicitly model illumination in order to exploit the underlying symmetry. Second, they augment the model so that it can reason about a potential lack of symmetry in the object.

The authors integrate these components into an end-to-end learning framework in which all components, including the confidence maps, are predicted from the raw image. They also find that symmetry can be enforced by flipping the internal representations, which is particularly useful for reasoning about symmetry probabilistically. Finally, experiments show that the method performs well on several datasets and outperforms the state of the art at the time (see the experimental results and videos at the end).

Related work

To place the method within the literature on image-based 3D reconstruction, the authors consider three aspects: the information used, the assumptions made, and the output produced. The table below compares the method with previous work along these three aspects:

[Table: comparison with previous work]

SfM: Traditional methods such as Structure from Motion (SfM) can reconstruct the 3D structure of a single rigid scene. Monocular reconstruction methods can show good results from a single image, but they need multiple views or video for training. Another approach, Non-Rigid SfM (NRSfM), can learn to reconstruct deformable objects, but it requires annotated 2D keypoints as supervision.

Shape from X: Many other cues have been used as alternatives or supplements to SfM for recovering shape, such as silhouettes, texture and symmetry. In particular, the method in this paper is inspired by shape from symmetry and shape from shading. The former uses the mirrored image as a virtual second view to reconstruct symmetric objects. The latter assumes a shading model, such as Lambertian reflectance, and reconstructs the surface by exploiting non-uniform illumination.

Category-specific reconstruction: Learning-based methods have recently been widely used to reconstruct objects from a single view, either from a raw image or from 2D keypoints. Although this is an ill-posed problem, it can be solved by learning a suitable object prior from training data. Besides direct 3D ground-truth supervision, some authors consider using videos or stereo pairs, and others use 2D keypoint annotations or object masks. For human bodies and faces, some methods learn to reconstruct directly from raw images with the help of predefined shape models. These models are built with specialized tools or other methods, which are difficult to obtain for categories such as animals, and they limit the detail of the recovered shape.

Recently, some authors have tried to learn the geometry and texture of object categories from raw monocular images, but these approaches have certain limitations; a detailed comparison is made later. Because the image rendered from the 3D model must be compared with the input, a key component is a differentiable renderer. Many rendering methods have been proposed; this work uses the Neural 3D Mesh Renderer (covered in a separate article on the official account).

Method

Given a collection of images of an object category, such as human faces, the goal is to learn a model Φ that decomposes an input image into 3D shape, albedo, illumination and viewpoint, as shown in the figure below:

[Figure: overview of the model, including the confidence maps (conf)]

Because only raw images are available for learning, the objective is to reconstruct the input image from the four factors above. The method also relies on the fact that most object categories are symmetric, although a particular instance is never perfectly symmetric. To account for this, the authors explicitly model asymmetric illumination and, for each pixel of the input image, predict a confidence score that expresses the probability that the pixel has a symmetric counterpart in the image (conf in the figure above). Each part is described in detail below.

1. Photo-geometric autoencoding: An image I can be represented as a 3×W×H grid. Assuming the image is roughly centered on the object of interest, the goal is to learn a function Φ, implemented as a neural network, that maps the input to four factors (d, a, w, l): a depth map d, an albedo image a, a viewpoint w and a light direction l.

The image is then reconstructed from these four factors in two steps, lighting Λ and reprojection Π, as follows:

$$\hat{I} = \Pi\big(\Lambda(a, d, l),\, d,\, w\big)$$

The lighting function Λ generates a version of the object as seen from the canonical viewpoint, based on the depth map d, the light direction l and the albedo a. The viewpoint w represents the transformation between the canonical view and the viewpoint of the actual input image I. The reprojection function Π then simulates the effect of this viewpoint change, using the canonical depth d and the shaded canonical image produced by Λ, and generates the image Î, which is compared with the input image to compute the reconstruction loss.
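To make the lighting step Λ more concrete, here is a minimal NumPy sketch that shades a canonical albedo map with a Lambertian model, using normals estimated from the depth map. It is an illustration under assumptions, not the authors' code: the function names, the finite-difference normals, the ambient and diffuse weights k_s and k_d, the sign convention for the normals and the 10° field of view are all choices made for this example.

```python
import numpy as np

def normals_from_depth(depth, fov_deg=10.0):
    """Estimate unit surface normals from an (H, W) depth map (illustrative)."""
    h, w = depth.shape
    # Pinhole focal length and principal point derived from the assumed FOV.
    f = (w - 1) / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    cu, cv = (w - 1) / 2.0, (h - 1) / 2.0
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Back-project every pixel to a 3D point in the camera frame.
    pts = np.stack([(u - cu) / f * depth, (v - cv) / f * depth, depth], axis=-1)
    # Tangent vectors along image columns (u) and rows (v) by finite differences.
    tu = np.gradient(pts, axis=1)
    tv = np.gradient(pts, axis=0)
    n = np.cross(tu, tv)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def shade(albedo, depth, light_dir, k_s=0.5, k_d=0.5):
    """Lambertian shading: (k_s + k_d * max(0, <l, n>)) * albedo.

    albedo: (3, H, W), depth: (H, W), light_dir: 3-vector.
    k_s (ambient) and k_d (diffuse) are hypothetical fixed weights here.
    """
    n = normals_from_depth(depth)
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    diffuse = np.clip(n @ l, 0.0, None)          # (H, W)
    return (k_s + k_d * diffuse)[None] * albedo  # broadcast over colour channels

# A flat surface lit head-on keeps its full albedo.
flat = shade(np.ones((3, 4, 4)), np.ones((4, 4)), [0.0, 0.0, 1.0])
```

With a flat depth map the estimated normals all point along the optical axis, so a light along that axis gives full brightness and a light from the side leaves only the ambient term.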

2. Probably symmetric objects: Using symmetry for 3D reconstruction requires identifying symmetric object points in the image. Here the authors assume that depth and albedo are reconstructed in a canonical frame and are symmetric about a fixed vertical plane. This also helps the model discover a "canonical view", which is important for reconstruction.

To achieve this, the authors consider an operator that flips a map along the horizontal axis and require d ≈ flip d and a ≈ flip a. Although these constraints could be enforced by adding corresponding loss terms to the learning objective, such terms are difficult to balance. Instead, the authors achieve the same effect by producing a second reconstruction Î' from the flipped depth and albedo, as follows:

$$\hat{I}' = \Pi\big(\Lambda(a', d', l),\, d',\, w\big), \qquad a' = \operatorname{flip} a, \quad d' = \operatorname{flip} d$$

The two reconstructions are then each compared with the input image. Because the two losses are commensurate, they are easy to balance and train jointly. More importantly, this approach makes it easy to reason about symmetry probabilistically.
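The flip operator itself is simple. As a small sketch, assuming the maps are stored as NumPy arrays whose last axis is the image width (the function name is made up for this example), mirroring a canonical depth map looks like this:

```python
import numpy as np

def flip_horizontal(x):
    """Mirror a (..., H, W) map left to right, i.e. about the vertical centre line."""
    return np.flip(x, axis=-1)

# Toy canonical depth map with 2 rows and 3 columns.
depth = np.arange(6, dtype=float).reshape(2, 3)
flipped = flip_horizontal(depth)

# Mean absolute difference between a map and its mirror image:
# zero for a perfectly symmetric map, positive otherwise.
asymmetry = np.abs(depth - flipped).mean()
```

In the method, the flipped depth and albedo are not compared directly with the originals. They are rendered into a second image, and that image is compared with the input.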

The loss between the input image I and the reconstruction Î is:

$$\mathcal{L}(\hat{I}, I, \sigma) = -\frac{1}{|\Omega|} \sum_{uv \in \Omega} \ln \left[ \frac{1}{\sqrt{2}\,\sigma_{uv}} \exp\left( -\frac{\sqrt{2}\,\ell_{1,uv}}{\sigma_{uv}} \right) \right]$$

where ℓ1,uv = |Îuv − Iuv| is the L1 distance between the two images at pixel uv, and σ is a confidence map, also predicted by the network, that expresses the aleatoric uncertainty of the model. The loss can be interpreted as the negative log-likelihood of a Laplacian distribution on the reconstruction residuals. Optimizing the likelihood lets the model self-calibrate and learn a meaningful confidence map.
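The Laplacian negative log-likelihood described above can be sketched in a few lines of NumPy. This is an illustration rather than the reference implementation: the function name, the averaging of the L1 residual over colour channels and the small eps term are assumptions made for the example.

```python
import numpy as np

def laplacian_nll(recon, target, sigma, eps=1e-7):
    """Confidence-weighted reconstruction loss.

    recon, target: (3, H, W) images; sigma: (H, W) positive confidence map.
    Per pixel: ln(sqrt(2) * sigma) + sqrt(2) * l1 / sigma, then averaged.
    """
    l1 = np.abs(recon - target).mean(axis=0)  # channel-averaged L1 residual (assumption)
    sigma = sigma + eps                       # avoid division by zero
    nll = np.log(np.sqrt(2.0) * sigma) + np.sqrt(2.0) * l1 / sigma
    return nll.mean()

# A reconstruction that is off by 1 everywhere.
target = np.zeros((3, 4, 4))
recon = np.ones((3, 4, 4))
overconfident = laplacian_nll(recon, target, np.full((4, 4), 0.1))
calibrated = laplacian_nll(recon, target, np.full((4, 4), np.sqrt(2.0)))
```

A small σ on pixels with a large residual is penalized heavily, while a very large σ is penalized by the logarithm term, so the model is pushed toward a σ that matches the actual error.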

More importantly, the network estimates a second confidence map σ' from the same input image, which is used for the flipped reconstruction. This confidence map indicates which parts of the input image may not be symmetric. For example, in the second figure the hair on the face is not symmetric, and the second confidence map can assign a higher reconstruction uncertainty to the hair region, where the symmetry assumption does not hold.

Overall, the learning objective combines the two reconstruction errors:

$$\mathcal{E}(\Phi; I) = \mathcal{L}(\hat{I}, I, \sigma) + \lambda_f\, \mathcal{L}(\hat{I}', I, \sigma')$$

where λf is a weighting factor and σ' is the second confidence map.
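As a rough sketch of how the two reconstruction errors might be combined, the snippet below adds the loss of the flipped reconstruction to the loss of the direct reconstruction with a weighting factor. The function names and the default lambda_flip=0.5 are assumptions for illustration, and the inputs are random arrays standing in for network outputs.

```python
import numpy as np

def laplacian_nll(recon, target, sigma):
    """Laplacian negative log-likelihood with a per-pixel confidence map."""
    l1 = np.abs(recon - target).mean(axis=0)  # channel-averaged residual (assumption)
    return np.mean(np.log(np.sqrt(2.0) * sigma) + np.sqrt(2.0) * l1 / sigma)

def objective(recon, recon_flip, target, sigma, sigma_flip, lambda_flip=0.5):
    """Direct reconstruction loss plus weighted loss of the flipped reconstruction."""
    return (laplacian_nll(recon, target, sigma)
            + lambda_flip * laplacian_nll(recon_flip, target, sigma_flip))

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))
sigma = np.full((8, 8), 0.5)

# Stand-ins: a perfect direct reconstruction and a mirrored copy as the
# "flipped" reconstruction, which does not match a non-symmetric image.
loss = objective(image, image[..., ::-1], image, sigma, sigma)
perfect = objective(image, image, image, sigma, sigma)
```

Because both terms are likelihoods of the same kind, a single scalar weight is enough to balance them.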

3. Image formation model: The image is formed by a camera looking at the 3D object. If P denotes a 3D point expressed in the camera reference frame, it is mapped to the pixel p = (u, v, 1) by the following projection:

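$$p \propto K P, \qquad K = \begin{bmatrix} f & 0 & c_u \\ 0 & f & c_v \\ 0 & 0 & 1 \end{bmatrix}, \qquad c_u = \frac{W-1}{2}, \quad c_v = \frac{H-1}{2}, \quad f = \frac{W-1}{2\tan(\theta_{FOV}/2)}$$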

This model assumes a perspective camera with field of view (FOV) θFOV. The nominal distance between the object and the camera is assumed to be about 1 m. Since the images are cropped around a particular object, a relatively narrow FOV is assumed, such as θFOV = 10°. The depth map d associates a depth value duv with every pixel (u, v) in the canonical view. Inverting the camera model shows that this corresponds to the 3D point:

$$P = d_{uv} \cdot K^{-1} p$$
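The camera model can be checked numerically. The sketch below builds a pinhole intrinsics matrix from the field of view and back-projects a depth map to 3D points by inverting it. The function names are made up for this example, and the principal point at the image centre is an assumption consistent with centred, cropped images.

```python
import numpy as np

def intrinsics(width, height, fov_deg=10.0):
    """Pinhole intrinsics K for a given image size and field of view."""
    f = (width - 1) / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    cu, cv = (width - 1) / 2.0, (height - 1) / 2.0
    return np.array([[f, 0.0, cu],
                     [0.0, f, cv],
                     [0.0, 0.0, 1.0]])

def backproject(depth, K):
    """Map every pixel (u, v) with depth d_uv to the 3D point d_uv * K^-1 (u, v, 1)."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    p = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (H, W, 3)
    rays = p @ np.linalg.inv(K).T   # K^-1 applied to every homogeneous pixel
    return depth[..., None] * rays

K = intrinsics(5, 5)
depth = np.ones((5, 5))          # object at the nominal distance of 1
P = backproject(depth, K)

# Projecting the points back with K recovers the pixel coordinates.
reprojected = P @ K.T
reprojected = reprojected[..., :2] / reprojected[..., 2:]
```

The centre pixel lands on the optical axis, and the z coordinate of every point equals its depth value.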

4. Perceptual loss: The L1 image loss above is sensitive to small geometric imperfections and tends to produce blurry reconstructions. The authors add a perceptual loss term to mitigate this. The k-th layer of an image encoder e predicts a representation e(k)(I). This feature encoder does not need to be trained on supervised tasks. As with the previous loss, but assuming a Gaussian distribution, the perceptual loss is:

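$$\mathcal{L}_p^{(k)}(\hat{I}, I, \sigma^{(k)}) = -\frac{1}{|\Omega_k|} \sum_{uv \in \Omega_k} \ln \left[ \frac{1}{\sqrt{2\pi (\sigma^{(k)}_{uv})^2}} \exp\left( -\frac{(\ell^{(k)}_{uv})^2}{2 (\sigma^{(k)}_{uv})^2} \right) \right]$$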

where

$$\ell^{(k)}_{uv} = \left| e^{(k)}_{uv}(\hat{I}) - e^{(k)}_{uv}(I) \right|$$

for each pixel index uv in the k-th layer. For a more detailed description of the losses, see the paper.
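The perceptual term follows the same pattern as the image loss, with a Gaussian in place of the Laplacian. In the sketch below the feature maps are plain arrays standing in for the output of some encoder layer; the function name, the channel-averaged squared difference and the eps term are assumptions made for illustration.

```python
import numpy as np

def perceptual_nll(feat_recon, feat_target, sigma, eps=1e-7):
    """Gaussian negative log-likelihood on feature differences.

    feat_recon, feat_target: (C, H_k, W_k) feature maps of layer k.
    sigma: (H_k, W_k) positive confidence map at the feature resolution.
    """
    sq = ((feat_recon - feat_target) ** 2).mean(axis=0)  # channel-averaged (assumption)
    var = sigma ** 2 + eps
    return np.mean(0.5 * np.log(2.0 * np.pi * var) + sq / (2.0 * var))

rng = np.random.default_rng(0)
feat = rng.random((16, 4, 4))      # hypothetical encoder features
sigma = np.full((4, 4), 0.5)

same = perceptual_nll(feat, feat, sigma)
shifted = perceptual_nll(feat + 0.3, feat, sigma)
```

Identical features leave only the log-variance term, and any feature mismatch increases the loss in proportion to the confidence placed on that location.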

Results

[Figure: reconstruction of human faces, cat faces and cars]

[Figure: face reconstruction]

[Figure: comparison with the state of the art]

[Figure: reconstruction of abstract cat faces]

This article is shared for academic purposes only. If it infringes any rights, please get in touch to have it removed.



Copyright notice
This article was created by [51CTO]. Please include a link to the original when reprinting. Thank you.
https://yzsam.com/2022/209/202207281043015203.html