Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter

IRIS

Within sentiment analysis and emotion detection applied to tweets, one of the main obstacles to building an automatic classifier is the lack of suitable training sets. Given the tedium of manually annotating a training set and the noise present in data collected directly from the social web, in this paper we propose an iterative learning approach that combines distant supervision with a dataset pruning technique. In particular, following the “eat your own dogfood” idea, we applied a classifier, trained on raw data obtained from different Twitter channels, to the same original dataset in order to remove the most dubious instances automatically. This approach has been used to obtain a more polished training set for emotion classification, based on Parrott’s model of six basic emotions. On the basis of the results achieved, we argue that automatic filtering of training sets can make distant supervision more effective in many use cases.
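The pruning loop summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say which classifier or filtering criterion the paper uses, so the small Naive Bayes model, the `threshold` parameter, and the function names `train_nb`, `predict_proba`, and `prune` are all assumptions made for illustration. The idea shown is only the general one: train on the noisy, distantly labelled data, score the same data, and drop instances whose assigned label the model finds unlikely.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model (hypothetical stand-in classifier)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, text):
    """Return a dict mapping each class to its posterior probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        # Add-one smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for token in text.lower().split():
            if token in vocab:
                score += math.log((word_counts[label][token] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def prune(texts, labels, threshold=0.5, rounds=1):
    """Drop instances whose own label gets posterior probability below threshold.

    The threshold and the number of rounds are assumed parameters.
    """
    texts, labels = list(texts), list(labels)
    for _ in range(rounds):
        model = train_nb(texts, labels)
        kept = [(t, y) for t, y in zip(texts, labels)
                if predict_proba(model, t)[y] >= threshold]
        if len(kept) == len(texts):
            break  # nothing was removed, so further rounds change nothing
        texts = [t for t, _ in kept]
        labels = [y for _, y in kept]
    return texts, labels


if __name__ == "__main__":
    # Toy data: the last tweet carries a wrong distant label.
    tweets = ["happy great day", "so happy today", "great happy smile",
              "sad awful day", "so sad today", "awful sad tears",
              "sad awful tears"]
    tags = ["joy", "joy", "joy", "sadness", "sadness", "sadness", "joy"]
    for tweet, tag in zip(*prune(tweets, tags, rounds=2)):
        print(tag, "|", tweet)
```

In this sketch the model is scored on the same data it was trained on, which mirrors the "eat your own dogfood" idea; a stricter variant could use cross-validated predictions instead, at the cost of more training runs.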

Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter / Cagnoni, Stefano; Fornacciari, Paolo; Kavaja, Juxhino; Mordonini, Monica; Poggi, Agostino; Solimeo, Alex; Tomaiuolo, Michele. - PRINT. - 10710:(2018), pp. 146-157. (3rd International Conference on Machine Learning, Optimization, and Data Science (MOD), Volterra (Italy), Sep 14-17, 2017) [10.1007/978-3-319-72926-8_13].

Stefano Cagnoni; Paolo Fornacciari; Juxhino Kavaja; Monica Mordonini; Agostino Poggi; Alex Solimeo; Michele Tomaiuolo

2018-01-01

Year: 2018
ISBN: 978-3-319-72925-1
Appears in types: 4.1b Conference proceedings volume

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2836956
