3D Face Reconstruction and Dense Alignment with Position Map Regression Network

2022-07-27 09:56:00 【yfy2022yfy】

2019/10/31: I didn't take notes when I first read this paper, so here is a recap.

Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Yao_Feng_Joint_3D_Face_ECCV_2018_paper.pdf

GitHub: https://github.com/YadiraF/PRNet

Abstract

This paper presents a direct method for 3D face reconstruction and dense alignment. It designs a 2D representation, called the UV position map, which stores the complete 3D shape of the face in UV space. A CNN can then be trained to regress the 3D shape from a single 2D image.

1. Introduction

Early work on 3D reconstruction and dense alignment relied on 2D keypoints, but 2D keypoints cannot handle large poses and occlusion;

3DMM-based methods have also been used, but they are limited by the heavy computation of perspective projection and 3D Thin Plate Spline (TPS) transformation;

Some end-to-end solutions remove these limitations, but they need an additional network to estimate depth and do not provide dense alignment;

VRN proposed a voxel representation, but it is computationally expensive, has low resolution, and wastes much of its computation because the point cloud is sparse.

This paper proposes PRN (Position map Regression Network), an end-to-end multi-task method that performs dense face alignment and 3D face shape reconstruction together. Main contributions:

High-resolution 3D face reconstruction and dense alignment in an end-to-end manner;
A UV position map is designed to record the 3D position information of the face;
A weight mask is designed for the loss computation, so that each point contributes to the loss with a different weight, which significantly improves network performance;
The CNN is lightweight and reaches 100 FPS on a single-face task;
A 25% performance improvement on the AFLW2000-3D and Florence datasets.

2. Method

I will reuse the slides (PPT) I made earlier.

(a) Input image + point cloud; (b) input image; (c) UV texture map; (d) UV position map; (e)(f)(g) the three channels of the UV position map
Figure 1: UV space
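
Since every pixel of a UV position map holds an (x, y, z) triple, reading 3D points back out of it is plain array indexing. The following is a minimal NumPy sketch on a toy 4x4 map. The function name `points_from_position_map` and the index arrays are illustrative assumptions and are not taken from the PRNet code.

```python
import numpy as np

def points_from_position_map(pos_map, rows, cols):
    """Look up 3D points at the given UV pixel locations.

    pos_map: (H, W, 3) array, each pixel stores (x, y, z).
    rows, cols: equal-length integer sequences of UV pixel indices.
    Returns an (N, 3) array of 3D points.
    """
    rows = np.asarray(rows, dtype=int)
    cols = np.asarray(cols, dtype=int)
    return pos_map[rows, cols]

# Toy map (hypothetical values): pixel (r, c) stores three consecutive numbers.
toy_map = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
corner_points = points_from_position_map(toy_map, [0, 3], [0, 3])
print(corner_points)
```

With a fixed list of UV indices for the facial landmarks, the same lookup would give sparse keypoints, while indexing every valid face pixel would give the dense point cloud.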

In open-source datasets such as 300W-LP, the ground truth is not expressed in UV form, so the UV training data has to be generated first. The point cloud coordinates in (a) are converted into the representation in (d), as shown in Figure 2:

Figure 2: Converting the point cloud to the UV representation
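
The conversion can be sketched roughly as follows, assuming each 3D vertex comes with a UV coordinate in [0, 1]. This is a simplified nearest-pixel scatter, not the actual data generation code: a real pipeline would rasterize the mesh triangles and interpolate, so that every face pixel in the UV map is filled. The function name and arguments are hypothetical.

```python
import numpy as np

def scatter_to_uv_position_map(vertices, uv_coords, size=256):
    """Write each vertex's (x, y, z) into the UV pixel nearest to its UV coordinate.

    vertices:  (N, 3) array of 3D points.
    uv_coords: (N, 2) array of (u, v) in [0, 1]; u indexes columns, v indexes rows.
    Returns a (size, size, 3) position map; pixels hit by no vertex stay zero.
    """
    vertices = np.asarray(vertices, dtype=np.float32)
    uv_coords = np.asarray(uv_coords, dtype=np.float32)
    pos_map = np.zeros((size, size, 3), dtype=np.float32)
    cols = np.clip(np.round(uv_coords[:, 0] * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(np.round(uv_coords[:, 1] * (size - 1)).astype(int), 0, size - 1)
    pos_map[rows, cols] = vertices
    return pos_map

# Two toy vertices placed at opposite corners of a 4x4 UV map.
toy = scatter_to_uv_position_map([[1, 2, 3], [4, 5, 6]], [[0, 0], [1, 1]], size=4)
```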

Once the required training data is generated, a lightweight CNN can be trained on it:

The network structure code is as follows:

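import tensorflow as tf
import tensorflow.contrib.layers as tcl  # TensorFlow 1.x only

size = 16  # base number of channels
# x: 256 x 256 x 3 input image
se = tcl.conv2d(x, num_outputs=size, kernel_size=4, stride=1) # 256 x 256 x 16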
se = resBlock(se, num_outputs=size * 2, kernel_size=4, stride=2) # 128 x 128 x 32
se = resBlock(se, num_outputs=size * 2, kernel_size=4, stride=1) # 128 x 128 x 32
se = resBlock(se, num_outputs=size * 4, kernel_size=4, stride=2) # 64 x 64 x 64
se = resBlock(se, num_outputs=size * 4, kernel_size=4, stride=1) # 64 x 64 x 64
se = resBlock(se, num_outputs=size * 8, kernel_size=4, stride=2) # 32 x 32 x 128
se = resBlock(se, num_outputs=size * 8, kernel_size=4, stride=1) # 32 x 32 x 128
se = resBlock(se, num_outputs=size * 16, kernel_size=4, stride=2) # 16 x 16 x 256
se = resBlock(se, num_outputs=size * 16, kernel_size=4, stride=1) # 16 x 16 x 256
se = resBlock(se, num_outputs=size * 32, kernel_size=4, stride=2) # 8 x 8 x 512
se = resBlock(se, num_outputs=size * 32, kernel_size=4, stride=1) # 8 x 8 x 512

pd = tcl.conv2d_transpose(se, size * 32, 4, stride=1) # 8 x 8 x 512
pd = tcl.conv2d_transpose(pd, size * 16, 4, stride=2) # 16 x 16 x 256
pd = tcl.conv2d_transpose(pd, size * 16, 4, stride=1) # 16 x 16 x 256
pd = tcl.conv2d_transpose(pd, size * 16, 4, stride=1) # 16 x 16 x 256
pd = tcl.conv2d_transpose(pd, size * 8, 4, stride=2) # 32 x 32 x 128
pd = tcl.conv2d_transpose(pd, size * 8, 4, stride=1) # 32 x 32 x 128
pd = tcl.conv2d_transpose(pd, size * 8, 4, stride=1) # 32 x 32 x 128
pd = tcl.conv2d_transpose(pd, size * 4, 4, stride=2) # 64 x 64 x 64
pd = tcl.conv2d_transpose(pd, size * 4, 4, stride=1) # 64 x 64 x 64
pd = tcl.conv2d_transpose(pd, size * 4, 4, stride=1) # 64 x 64 x 64
pd = tcl.conv2d_transpose(pd, size * 2, 4, stride=2) # 128 x 128 x 32
pd = tcl.conv2d_transpose(pd, size * 2, 4, stride=1) # 128 x 128 x 32
pd = tcl.conv2d_transpose(pd, size, 4, stride=2) # 256 x 256 x 16
pd = tcl.conv2d_transpose(pd, size, 4, stride=1) # 256 x 256 x 16
pd = tcl.conv2d_transpose(pd, 3, 4, stride=1) # 256 x 256 x 3
pd = tcl.conv2d_transpose(pd, 3, 4, stride=1) # 256 x 256 x 3
pos = tcl.conv2d_transpose(pd, 3, 4, stride=1, activation_fn = tf.nn.sigmoid)
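
The shape comments above can be checked with a little arithmetic. The sketch below assumes SAME padding, which the comments imply: a convolution with stride s then maps a spatial size n to ceil(n / s), and a transposed convolution maps n to n * s. The stride lists are read off the encoder and decoder listings above; the helper name `trace_sizes` is made up for this example.

```python
import math

# Strides read off the encoder and decoder listings above, in order.
ENCODER_STRIDES = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1]
DECODER_STRIDES = [1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1]

def trace_sizes(size, strides, transpose=False):
    """Return the spatial size after each layer, assuming SAME padding."""
    sizes = []
    for s in strides:
        size = size * s if transpose else math.ceil(size / s)
        sizes.append(size)
    return sizes

encoder_sizes = trace_sizes(256, ENCODER_STRIDES)
decoder_sizes = trace_sizes(encoder_sizes[-1], DECODER_STRIDES, transpose=True)
print(encoder_sizes)
print(decoder_sizes)
```

The five stride-2 layers in the encoder reduce 256 to 8, and the five stride-2 layers in the decoder bring it back to 256, so the output position map has the same resolution as the input image.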

The residual block code is as follows. The activation is ReLU and the normalization is batch normalization (BN). The shortcut runs in parallel with a branch of three convolutions; the two branches are then added element-wise, and the result is finally normalized and activated:

def resBlock(x, num_outputs, kernel_size=4, stride=1, activation_fn=tf.nn.relu, normalizer_fn=tcl.batch_norm, scope=None):
    assert num_outputs % 2 == 0  # num_outputs must be divisible by the channel factor (2 here)
    with tf.variable_scope(scope, 'resBlock'):
        shortcut = x
        # Project the shortcut with a 1x1 convolution when the spatial size or channel count changes
        if stride != 1 or x.get_shape()[3] != num_outputs:
            shortcut = tcl.conv2d(shortcut, num_outputs, kernel_size=1, stride=stride,
                                  activation_fn=None, normalizer_fn=None, scope='shortcut')
        # Main branch: 1x1 reduce, k x k convolution, 1x1 expand (integer division keeps channel counts as ints)
        x = tcl.conv2d(x, num_outputs // 2, kernel_size=1, stride=1, padding='SAME')
        x = tcl.conv2d(x, num_outputs // 2, kernel_size=kernel_size, stride=stride, padding='SAME')
        x = tcl.conv2d(x, num_outputs, kernel_size=1, stride=1, activation_fn=None, padding='SAME', normalizer_fn=None)

        # Add the two branches element-wise, then normalize and activate
        x += shortcut
        x = normalizer_fn(x)
        x = activation_fn(x)
    return x

Loss

P(x,y) is the predicted UV position map; each pixel stores an xyz position.
W(x,y) is the weight mask over the UV position map, which controls how much each region contributes. The weights are 2D keypoints : eyes, nose and mouth : rest of the face : other regions = 16 : 4 : 3 : 0.
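
A weighted per-pixel loss built from these two quantities can be sketched as below. This is only an illustration under stated assumptions: it uses the squared error between the predicted and ground-truth maps, summed over all pixels after multiplying by W(x,y). The function name and the tiny 2x2 example values are hypothetical.

```python
import numpy as np

def weighted_position_loss(pred, gt, weight):
    """Sum over pixels of squared xyz error times the per-pixel weight.

    pred, gt: (H, W, 3) predicted and ground-truth UV position maps.
    weight:   (H, W) weight mask, e.g. with values 16, 4, 3 or 0.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    weight = np.asarray(weight, dtype=np.float64)
    sq_err = np.sum((pred - gt) ** 2, axis=-1)  # (H, W) squared error per pixel
    return float(np.sum(sq_err * weight))

# Toy 2x2 example: one pixel per region, weights 16 : 4 : 3 : 0.
toy_weight = [[16, 4], [3, 0]]
toy_loss = weighted_position_loss(np.zeros((2, 2, 3)), np.ones((2, 2, 3)), toy_weight)
```

Because the last region has weight 0, errors there (for example on the neck) do not influence training at all, while errors at the keypoints count the most.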

Training

Training data source: the 300W-LP dataset

Contains face images across all pose angles, resized to 256x256
Annotated with 3DMM coefficients
The 3DMM is used to generate the 3D point cloud, which is then converted to UV space

Although the 3DMM annotation coefficients are used to generate the ground truth, the model itself does not include any of the linear constraints of the 3DMM.

Data augmentation, to cover a variety of scenarios:

Rotation: -45 to 45 degrees
Translation and scaling: scale factor 0.9 to 1.2 (relative to the original image size)
Color channel scaling: factor 0.6 to 1.4
Added noise and texture occlusion, to simulate real-world occlusion

Optimizer: Adam, initial learning rate 0.0001, halved every 5 epochs; batch size 16
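
The ranges and the learning rate schedule listed above can be sketched in plain Python. This is a hedged illustration, not the training script: the function names are invented, the 0.9 to 1.2 factor is treated as a scale factor, and the color factor is assumed to be drawn independently per channel.

```python
import random

def sample_augmentation(rng):
    """Draw one set of augmentation parameters from the ranges listed above."""
    return {
        "rotation_deg": rng.uniform(-45.0, 45.0),
        "scale": rng.uniform(0.9, 1.2),
        # Assumption: one independent factor per RGB channel.
        "color_scale": [rng.uniform(0.6, 1.4) for _ in range(3)],
    }

def learning_rate(epoch, base_lr=1e-4, decay_every=5, factor=0.5):
    """Step decay: multiply the learning rate by `factor` every `decay_every` epochs."""
    return base_lr * factor ** (epoch // decay_every)

params = sample_augmentation(random.Random(0))
schedule = [learning_rate(e) for e in (0, 4, 5, 10)]
```

Here epochs 0 to 4 use the initial rate, epochs 5 to 9 use half of it, epochs 10 to 14 a quarter, and so on.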

3. Test results

First, what this method can do. Because the network learns the mapping between 2D and 3D, it supports the following functions: