Coarse-to-Fine 3D Face Reconstruction / Ferrari, C.; Galteri, L.; Lisanti, G.; Berretti, S.; Del Bimbo, A. - PRINT. - (2019), pp. 25-31. (Paper presented at the IEEE Conference on Computer Vision Workshops, held in Long Beach, California, 16-20 June 2019).
Coarse-to-Fine 3D Face Reconstruction
C. Ferrari
2019-01-01
Abstract
Reconstructing accurate 3D shapes of human faces from a single 2D image is a highly challenging Computer Vision problem that has been studied for decades. Statistical modeling techniques, such as the 3D Morphable Model (3DMM), have been widely employed because they can reconstruct a plausible model grounded in prior knowledge of the facial shape. However, most of them derive a smooth approximation of the real shape, without accounting for the surface details. In this work, we propose an approach based on a Conditional Generative Adversarial Network (CGAN) for refining the reconstruction provided by a 3DMM. The 3DMM reconstruction is represented as a three-channel image, where the pixel intensities represent, respectively, the depth and the azimuth and elevation angles of the surface normals. The network architecture is an encoder-decoder, which is trained progressively, starting from the lower-resolution layers; this technique makes training more stable, which leads to high-quality outputs even when high-resolution images are used during training. Experimental results show that our method produces detailed, realistic reconstructions and achieves lower errors than the 3DMM. Finally, a comparison with a state-of-the-art solution shows competitive performance and a clear improvement in the quality of the generated models.

File | Size | Format |
---|---|---|
cvprw19.pdf (open access; type: publisher's version (PDF); license: Creative Commons) | 806.37 kB | Adobe PDF |
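
The three-channel representation described in the abstract (depth, plus the azimuth and elevation angles of the surface normals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `depth_to_three_channel` is hypothetical, and the finite-difference normals and angle conventions are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

def depth_to_three_channel(depth):
    """Convert a depth map (list of rows of floats) to (depth, azimuth, elevation) pixels.

    Assumed conventions (not taken from the paper): the surface is z = depth[y][x],
    the unnormalized normal is (-dz/dx, -dz/dy, 1), azimuth = atan2(n_y, n_x),
    and elevation = asin(n_z), i.e. the angle between the normal and the image plane.
    """
    h, w = len(depth), len(depth[0])

    def diff(y0, x0, y1, x1):
        # Finite difference between two pixels, divided by their distance in pixels.
        steps = (y1 - y0) + (x1 - x0)
        return (depth[y1][x1] - depth[y0][x0]) / steps if steps else 0.0

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences inside the image, one-sided at the borders.
            dzdx = diff(y, max(x - 1, 0), y, min(x + 1, w - 1))
            dzdy = diff(max(y - 1, 0), x, min(y + 1, h - 1), x)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            nx, ny, nz = nx / norm, ny / norm, nz / norm
            # Azimuth is undefined when the normal points straight at the viewer;
            # this sketch sets it to 0 in that case.
            azimuth = math.atan2(ny, nx) if (nx or ny) else 0.0
            elevation = math.asin(nz)
            row.append((depth[y][x], azimuth, elevation))
        out.append(row)
    return out

# Usage: a small plane tilted along x, z = x.
image = depth_to_three_channel([[float(x) for x in range(4)] for _ in range(3)])
```

In practice the three channels would also need to be rescaled to a common intensity range before being fed to a network; that step is omitted here because the abstract does not describe it.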
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.