SISMA: Semantic Face Image Synthesis with Mamba / Botti, F., Ergasti, A., Fontanini, T., Ferrari, C., Bertozzi, M., Prati, A. - (2026), pp. 609-619. (ICIAP 2025 - Workshop) [10.1007/978-3-032-11317-7_49].

SISMA: Semantic Face Image Synthesis with Mamba

Botti, Filippo; Ergasti, Alex; Fontanini, Tomaso; Ferrari, Claudio; Bertozzi, Massimo; Prati, Andrea
2026-01-01

Abstract

Diffusion Models have become very popular for Semantic Image Synthesis (SIS) of human faces. Nevertheless, their training and inference are computationally expensive because of the quadratic complexity of attention layers. In this paper, we propose a novel architecture called SISMA, based on the recently proposed Mamba. SISMA generates high-quality samples whose shape is controlled by a semantic mask, at a reduced computational demand. We validated our approach through comprehensive experiments on CelebAMask-HQ, which show that our architecture not only achieves a better FID score but also operates at three times the speed of state-of-the-art architectures. This indicates that the proposed design is a viable, lightweight substitute for transformer-based models.
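
The complexity argument in the abstract can be illustrated with a small sketch. This is not the SISMA architecture and not Mamba's actual selective scan; it is a toy comparison, under stated assumptions, of how work grows with the number of image tokens. Full self-attention scores every pair of tokens, while a state-space recurrence visits each token once. The function names, the fixed `decay` and `gain` values, and the image side lengths are all hypothetical choices made for illustration (Mamba itself uses input-dependent parameters).

```python
def attention_pair_count(num_tokens: int) -> int:
    """Entries in a full self-attention score matrix: one per (query, key) pair."""
    return num_tokens * num_tokens


def scan_step_count(num_tokens: int) -> int:
    """Steps in a sequential state-space scan: one per token."""
    return num_tokens


def linear_scan(inputs, decay=0.9, gain=0.1):
    """Toy scalar recurrence h_t = decay * h_{t-1} + gain * x_t.

    Assumption: decay and gain are fixed constants here, purely to show
    that the loop does a constant amount of work per token.
    """
    state = 0.0
    outputs = []
    for x in inputs:
        state = decay * state + gain * x
        outputs.append(state)
    return outputs


if __name__ == "__main__":
    # Hypothetical feature-map side lengths; tokens = side * side.
    for side in (16, 32, 64):
        tokens = side * side
        print(side, tokens, attention_pair_count(tokens), scan_step_count(tokens))
```

In this toy model, doubling the side length quadruples the token count, which multiplies the scan work by 4 and the attention pair count by 16.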
Year: 2026
ISBN: 9783032113160
ISBN: 9783032113177

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/3043553