Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web

IRIS

This paper describes the inclusion of Sicilian in the Semantic Web through the development of new resources aligned with Linguistic Linked Open Data principles. More specifically, we model and publish the first Sicilian Lemma Bank and a bilingual Sicilian–Italian glossary extracted from the Sicilian Wiktionary (Wikizziunariu). These resources are formalized using the OntoLex-Lemon and LiLa (Linking Latin) ontologies with the aim of enabling cross-lingual interoperability. The glossary is also linked to the LiITA (Linking Italian) knowledge base. In addition, two preliminary experiments are reported: the first evaluates the translation capabilities of commercial Large Language Models (LLMs) from Sicilian into Italian; the second investigates bilingual lexicon induction through cross-lingual embedding alignment, with results indicating the challenges posed by low-resource dialects. This work aims to demonstrate the feasibility and importance of integrating under-resourced languages into broader Computational Linguistics and Semantic Web infrastructures.

Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web / Muscianisi, D.G., Sprugnoli, R., Moretti, G., Litta, E.. - (2025), pp. 1093-1101. (Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) Cagliari ).

Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web

Domenico Giuseppe Muscianisi;Rachele Sprugnoli;Giovanni Moretti;Eleonora Litta

2025-01-01

Abstract

This paper describes the inclusion of Sicilian in the Semantic Web through the development of new resources aligned with Linguistic Linked Open Data principles. More specifically, we model and publish the first Sicilian Lemma Bank and a bilingual Sicilian–Italian glossary extracted from the Sicilian Wiktionary (Wikizziunariu). These resources are formalized using the OntoLex-Lemon and LiLa (Linking Latin) ontologies with the aim of enabling cross-lingual interoperability. The glossary is also linked to the LiITA (Linking Italian) knowledge base. In addition, two preliminary experiments are reported: the first evaluates the translation capabilities of commercial Large Language Models (LLMs) from Sicilian into Italian; the second investigates bilingual lexicon induction through cross-lingual embedding alignment, with results indicating the challenges posed by low-resource dialects. This work aims to demonstrate the feasibility and importance of integrating under-resourced languages into broader Computational Linguistics and Semantic Web infrastructures.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Citazione
	
				Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web / Muscianisi, D.G., Sprugnoli, R., Moretti, G., Litta, E.. - (2025), pp. 1093-1101. (Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) Cagliari ).
			
	Appare nelle tipologie:
	
				4.1b Atto convegno Volume

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11381/3043873

Citazioni

ND

ND

ND

social impact