A forward approach for supervised classification with model selection

CORBELLINI, Aldo; MORELLI, Gianluca; LAURINI, Fabrizio
Methodology
2012-01-01

Abstract

Supervised classification methods naturally exploit linear and nonlinear relationships between explanatory variables and a response. However, the presence of clusters may lead to a different pattern within each group. For instance, the data may fall naturally into several linear structures, so that simple linear regression models can be used for classification. Estimation of linear models can be severely biased by influential observations or outliers. A practical problem arises when the groups identifying the different relationships are unknown and the number of "relevant" variables is high. In such a context, the supervised classification problem can become cumbersome. As a solution, within the general framework of generalized linear models, a new robust approach is to exploit the sequential ordering of the data provided by the forward search algorithm. This algorithm will serve a twofold purpose: it addresses variable selection for model fit while grouping the data naturally "around" the model. The influence of any outliers in the dataset will be monitored at each step of the sequential procedure. Preliminary results on simulated data have highlighted the benefit of adopting the forward search algorithm, which can reveal masked outliers and influential observations and show hidden structures.
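
The sequential ordering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes ordinary least squares rather than a generalized linear model, picks the starting subset by a simple random search, and monitors the smallest absolute residual outside the subset as a simplified stand-in for a proper forward search diagnostic. The function name `forward_search` and its parameters are hypothetical.

```python
import numpy as np

def forward_search(X, y, n_starts=200, seed=0):
    """Simplified forward search for least-squares regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m0 = p + 1  # assumed size of the starting subset

    # Assumption: choose the starting subset by random search, keeping the
    # subset whose fit gives the smallest median squared residual overall.
    best, best_crit = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=m0, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best, best_crit = idx, crit
    subset = np.sort(best)

    entry_order, monitor = [], []
    for m in range(m0, n):
        # Fit on the current subset, then rank ALL units by squared residual.
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        r2 = (y - X @ beta) ** 2
        outside = np.setdiff1d(np.arange(n), subset)
        # Monitored quantity: smallest absolute residual among excluded units.
        monitor.append(np.sqrt(r2[outside].min()))
        # The next subset holds the m + 1 units closest to the current fit.
        new_subset = np.sort(np.argsort(r2, kind="stable")[: m + 1])
        entry_order.append(np.setdiff1d(new_subset, subset))
        subset = new_subset
    return entry_order, np.array(monitor)

# Simulated data: one linear structure plus five shifted observations.
rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0, 10, size=n)
y = 1 + 2 * x + rng.normal(0, 0.5, size=n)
outliers = np.arange(5)
y[outliers] += 20  # gross contamination, chosen for illustration
X = np.column_stack([np.ones(n), x])

entry_order, monitor = forward_search(X, y)
last_in = np.concatenate(entry_order[-5:])  # units joining in the final steps
```

In this toy setting the contaminated units should join the subset only in the final steps, and the monitored residual should jump at the point where only they remain outside, which is the kind of signal the abstract refers to when it speaks of monitoring outliers along the search.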
2012
978-84-937822-2-1
A forward approach for supervised classification with model selection / Corbellini, Aldo; Morelli, Gianluca; Laurini, Fabrizio. - PRINT. - (2012), pp. 59-59. (Paper presented at the 6th CSDA International Conference on Computational and Financial Econometrics (CFE 2012) -- 5th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computing & Statistics (ERCIM 2012), held in Oviedo, 1-3 December 2012).
Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2817288