Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain / Tonelli, Sara; Sprugnoli, Rachele; Moretti, Giovanni. - Electronic. - (2019), pp. 1-8. (Paper presented at the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), held in Bari, 13-15 November 2019).

Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain

Sprugnoli, Rachele
2019-01-01

Abstract

In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parliamentary speeches. It is freely available and includes several annotation layers, namely key concepts, lemmas, PoS tags, person names and geo-referenced places, which together represent a high-quality ‘silver’ annotation. We believe that this resource can foster research in historical corpus analysis, stylometry and computational social science, among other fields.
Year: 2019
ISBN: 979-12-80136-00-8

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2913246