The Forward Search for Very Large Datasets / Riani, Marco; Perrotta, Domenico; Cerioli, Andrea. - In: JOURNAL OF STATISTICAL SOFTWARE. - ISSN 1548-7660. - 67:Code Snippet 1(2015), pp. 1-20. [10.18637/jss.v067.c01]

The Forward Search for Very Large Datasets

RIANI, Marco;CERIOLI, Andrea
2015-01-01

Abstract

The identification of atypical observations and the immunization of data analysis against both outliers and failures of modeling are important aspects of modern statistics. The forward search is a graphics-rich approach that leads to the formal detection of outliers and to the detection of model inadequacy, combined with suggestions for model enhancement. The key idea is to monitor quantities of interest, such as parameter estimates and test statistics, as the model is fitted to data subsets of increasing size. In this paper we propose some computational improvements of the forward search algorithm and we provide a recursive implementation of the procedure which exploits the information of the previous step. The output is a set of efficient routines for fast updating of the model parameter estimates, which do not require any data sorting, and for fast computation of likelihood contributions, which do not require matrix inversion or QR decomposition. It is shown that the new algorithms enable a reduction of the computation time by more than 80%. Furthermore, the running time now increases almost linearly with the sample size. All the routines described in this paper are included in the FSDA toolbox for MATLAB, which is freely downloadable from the internet.
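The key idea in the abstract, fitting the model to subsets of increasing size and monitoring the estimates, can be illustrated with a minimal sketch. This is not the recursive algorithm of the paper and not the FSDA toolbox API: it is a naive version for a straight-line model that refits and re-sorts at every step, which is the kind of cost the paper's routines are said to avoid. The function names `ols_line` and `forward_search`, and the `start` argument (an initial subset assumed to be outlier-free), are hypothetical and introduced only for illustration.

```python
def ols_line(xs, ys):
    """Least-squares intercept and slope for the line y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def forward_search(x, y, start):
    """Naive forward search for a straight-line model (illustrative only).

    start: indices of the initial subset. It is assumed to be outlier-free;
    in practice it would typically come from a robust fit.
    Returns a list of (subset_size, intercept, slope, entering_units) tuples,
    one per step, so the estimates can be monitored along the search.
    """
    n = len(x)
    subset = sorted(start)
    path = []
    while True:
        a, b = ols_line([x[i] for i in subset], [y[i] for i in subset])
        m = len(subset)
        if m == n:
            # Final step: all units are in the subset, nothing enters.
            path.append((m, a, b, []))
            break
        # Squared residuals of ALL units from the fit on the current subset.
        r2 = [(y[i] - a - b * x[i]) ** 2 for i in range(n)]
        # Next subset: the m + 1 units with the smallest squared residuals.
        nxt = sorted(sorted(range(n), key=lambda i: r2[i])[: m + 1])
        entering = [i for i in nxt if i not in subset]
        path.append((m, a, b, entering))
        subset = nxt
    return path


# Toy data: an exact line y = 2 + 3x with one contaminated unit (index 9).
x = list(range(10))
y = [2 + 3 * xi for xi in x]
y[9] += 50
path = forward_search(x, y, start=[0, 1, 2])
```

In this toy example the contaminated unit has by far the largest residual at every step, so it joins the subset only at the last step, and the monitored slope stays at the uncontaminated value until then. A sharp change in a monitored quantity near the end of the search is the kind of signal the forward search is designed to reveal.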
Files in this item:
File: v67c01.pdf (open access)
Description: Published manuscript
Type: Publisher's version (PDF)
License: Creative Commons
Size: 575.88 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2796924