
MCGM-Styler: free-form styler for mask conditional text-to-image generative model / Skaik, R.; Rossi, L.; Fontanini, T.; Prati, A. - In: THE VISUAL COMPUTER. - ISSN 0178-2789. - 42:1(2026). [10.1007/s00371-025-04226-8]

MCGM-Styler: free-form styler for mask conditional text-to-image generative model

Skaik R.; Rossi L.; Fontanini T.; Prati A.
2026-01-01

Abstract

Generative models for text-to-image synthesis have made significant advances in recent years, enabling the creation of highly detailed and stylistically diverse images. In this work, we introduce MCGM-Styler, an extension of our previous MCGM model (MCGM: Mask conditional text-to-image generative model, 2024), which generates images based on mask conditions that specify the action pose of subjects in a source image. Our key contribution is a new training step that enables the model to also perform style transfer, allowing it to generate images that not only enforce the pose given by the mask condition but also adhere to single or multiple target artistic styles. Unlike traditional approaches that require large datasets, MCGM-Styler is trained on a single image, making it highly efficient and adaptable. The model can handle scenes with one or more subjects, generating coherent and stylistically consistent outputs, so a user can render any subject in any pose and style, or generate an image that mixes different styles. We evaluate our approach against existing work, in particular DreamStyler (Ahn et al. in Proc. AAAI Conf. Artif. Intell. 38:674-681), a state-of-the-art method for style transfer. Our results demonstrate that MCGM-Styler achieves superior performance in preserving not only the pose of the concept but also style fidelity, highlighting its effectiveness in controllable image generation.
2026
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11381/3045773
Citations
  • PubMed Central: ND
  • Scopus: 0
  • Web of Science: 0