In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parliamentary speeches. The corpus is freely available and includes several annotation layers, i.e. key concepts, lemmas, PoS tags, person names and geo-referenced places, representing a high-quality ‘silver’ annotation. We believe that this resource can foster research in historical corpus analysis, stylometry and computational social science, among others.
Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain / Tonelli, Sara; Sprugnoli, Rachele; Moretti, Giovanni. - Electronic. - (2019), pp. 1-8. (Paper presented at the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), held in Bari, 13-15 November 2019).
Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain
Sprugnoli Rachele;
2019-01-01
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.