This paper describes the inclusion of Sicilian in the Semantic Web through the development of new resources aligned with Linguistic Linked Open Data principles. More specifically, we model and publish the first Sicilian Lemma Bank and a bilingual Sicilian–Italian glossary extracted from the Sicilian Wiktionary (Wikizziunariu). These resources are formalized using the OntoLex-Lemon and LiLa (Linking Latin) ontologies with the aim of enabling cross-lingual interoperability. The glossary is also linked to the LiITA (Linking Italian) knowledge base. In addition, two preliminary experiments are reported: the first evaluates the translation capabilities of commercial Large Language Models (LLMs) from Sicilian into Italian; the second investigates bilingual lexicon induction through cross-lingual embedding alignment, with results indicating the challenges posed by low-resource dialects. This work aims to demonstrate the feasibility and importance of integrating under-resourced languages into broader Computational Linguistics and Semantic Web infrastructures.

Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web / Muscianisi, Domenico Giuseppe; Sprugnoli, Rachele; Moretti, Giovanni; Litta, Eleonora. - (2025), pp. 1093-1101. (Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), Cagliari).

Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web

Domenico Giuseppe Muscianisi; Rachele Sprugnoli
2025-01-01


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/3043873