Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web / Muscianisi, Domenico Giuseppe; Sprugnoli, Rachele; Moretti, Giovanni; Litta, Eleonora. - (2025), pp. 1093-1101. (Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), Cagliari).
Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web
Domenico Giuseppe Muscianisi; Rachele Sprugnoli
2025-01-01
Abstract
This paper describes the inclusion of Sicilian in the Semantic Web through the development of new resources aligned with Linguistic Linked Open Data principles. More specifically, we model and publish the first Sicilian Lemma Bank and a bilingual Sicilian–Italian glossary extracted from the Sicilian Wiktionary (Wikizziunariu). These resources are formalized using the OntoLex-Lemon and LiLa (Linking Latin) ontologies with the aim of enabling cross-lingual interoperability. The glossary is also linked to the LiITA (Linking Italian) knowledge base. In addition, two preliminary experiments are reported: the first evaluates the translation capabilities of commercial Large Language Models (LLMs) from Sicilian into Italian; the second investigates bilingual lexicon induction through cross-lingual embedding alignment, with results indicating the challenges posed by low-resource dialects. This work aims to demonstrate the feasibility and importance of integrating under-resourced languages into broader Computational Linguistics and Semantic Web infrastructures.


