Coarse-to-Fine 3D Face Reconstruction / Ferrari, C.; Galteri, L.; Lisanti, G.; Berretti, S.; Del Bimbo, A. - PRINT. - (2019), pp. 25-31. (Paper presented at the IEEE Conference on Computer Vision Workshops, held in Long Beach, California, 16-20 June 2019).

Coarse-to-Fine 3D Face Reconstruction

C. Ferrari;
2019-01-01

Abstract

Reconstructing accurate 3D shapes of human faces from a single 2D image is a highly challenging Computer Vision problem that has been studied for decades. Statistical modeling techniques, such as the 3D Morphable Model (3DMM), have been widely employed because they can reconstruct a plausible model grounded on prior knowledge of the facial shape. However, most of them derive a smooth approximation of the real shape, without accounting for the surface details. In this work, we propose an approach based on a Conditional Generative Adversarial Network (CGAN) for refining the reconstruction provided by a 3DMM. The latter is represented as a three-channel image, where the pixel intensities represent, respectively, the depth and the azimuth and elevation angles of the surface normals. The network architecture is an encoder-decoder, which is trained progressively, starting from the lower-resolution layers; this technique allows a more stable training, which leads to the generation of high-quality outputs even when high-resolution images are fed during training. Experimental results show that our method is able to produce detailed, realistic reconstructions and obtain lower errors with respect to the 3DMM. Finally, a comparison with a state-of-the-art solution shows competitive performance and a clear improvement in the quality of the generated models.
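The three-channel representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `depth_to_three_channel`, the finite-difference estimate of the normals, and the angle conventions (azimuth measured in the image plane, elevation measured from the image plane towards the viewer) are all assumptions made for illustration, and no rescaling of the values to an image intensity range is attempted.

```python
import math


def depth_to_three_channel(depth):
    """Map a 2D depth map (list of rows) to per-pixel (depth, azimuth, elevation).

    Hypothetical helper: normals come from finite differences of the depth,
    and the angle conventions are assumptions, not taken from the paper.
    """
    h, w = len(depth), len(depth[0])

    def grad(r, c, dr, dc):
        # Central difference inside the grid, one-sided difference at the borders.
        r0, c0 = max(r - dr, 0), max(c - dc, 0)
        r1, c1 = min(r + dr, h - 1), min(c + dc, w - 1)
        span = (r1 - r0) + (c1 - c0)
        return (depth[r1][c1] - depth[r0][c0]) / span if span else 0.0

    image = []
    for r in range(h):
        row = []
        for c in range(w):
            dz_dx = grad(r, c, 0, 1)
            dz_dy = grad(r, c, 1, 0)
            # The surface z = f(x, y) has the (unnormalized) normal (-dz/dx, -dz/dy, 1).
            nx, ny, nz = -dz_dx, -dz_dy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            nx, ny, nz = nx / norm, ny / norm, nz / norm
            # Azimuth is undefined when the normal points straight at the viewer; use 0.
            azimuth = math.atan2(ny, nx) if (nx or ny) else 0.0
            elevation = math.asin(min(1.0, nz))
            row.append((depth[r][c], azimuth, elevation))
        image.append(row)
    return image
```

Under these assumptions, a flat depth map yields an elevation of pi/2 at every pixel, and a surface sloping at 45 degrees yields pi/4, so fine surface detail shows up in the two angle channels even where the depth channel varies little.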
Files in this item:
File: cvprw19.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 806.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2900779