Within the field of sentiment analysis and emotion detection applied to tweets, one of the main problems related to the construction of an automatic classifier is the lack of suitable training sets. Considering the tediousness of manually annotating a training set, and the noise present in data collected directly from the social web, in this paper we propose an iterative learning approach, which combines distant supervision with a dataset pruning technique. In particular, following the “eat your own dogfood” idea, we have applied a classifier, trained on raw data obtained from different Twitter channels, to the same original dataset, in order to remove the most dubious instances automatically. This approach has been used to obtain a more polished training set for emotion classification, based on Parrott’s model of six basic emotions. On the basis of the achieved results, we argue that the automatic filtering of training sets can make the application of the distant supervision approach more effective in many use cases.
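The train-then-filter loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the small Naive Bayes classifier, the function names (`train_nb`, `predict_nb`, `prune`), the removal criterion (an instance is dropped when the classifier disagrees with its distant label), and the toy two-emotion data standing in for the six emotion classes.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for tokens, label in samples:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the most probable label for a token list (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def prune(samples, max_rounds=3):
    """Train on the data, drop instances the model relabels, and repeat."""
    current = list(samples)
    for _ in range(max_rounds):
        model = train_nb(current)
        kept = [(t, y) for t, y in current if predict_nb(model, t) == y]
        if len(kept) == len(current):  # nothing removed: stop early
            break
        current = kept
    return current


# Hypothetical distant labels: each tweet inherits the emotion of the
# channel it was collected from, so some labels are wrong.
noisy = (["sad", "cry", "alone"], "joy")
raw = [
    (["happy", "great", "day"], "joy"),
    (["happy", "love", "sun"], "joy"),
    (["great", "love", "happy"], "joy"),
    (["sad", "cry", "rain"], "sadness"),
    (["sad", "alone", "cry"], "sadness"),
    (["rain", "sad", "alone"], "sadness"),
    noisy,
]
cleaned = prune(raw)
```

In this toy run the tweet collected under the "joy" label but made of sadness vocabulary is the one the classifier contradicts, so it is filtered out while the consistent instances are kept. A confidence threshold on the predicted probability would be an alternative removal criterion.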
Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter / Cagnoni, Stefano; Fornacciari, Paolo; Kavaja, Juxhino; Mordonini, Monica; Poggi, Agostino; Solimeo, Alex; Tomaiuolo, Michele. - PRINT. - 10710:(2018), pp. 146-157. (Paper presented at the 3rd International Conference on Machine Learning, Optimization, and Data Science (MOD), held in Volterra (Italy), September 14-17, 2017) [10.1007/978-3-319-72926-8_13].
Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter
Stefano Cagnoni; Paolo Fornacciari; Juxhino Kavaja; Monica Mordonini; Agostino Poggi; Alex Solimeo; Michele Tomaiuolo
2018-01-01