Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks

IRIS

Predicting corporate bankruptcy is one of the fundamental tasks in credit risk assessment. In particular, since the 2007/2008 financial crisis, it has become a priority for most financial institutions, practitioners, and academics. Recent advances in machine learning (ML) have enabled the development of several models for bankruptcy prediction. The most challenging aspect of this task is dealing with the class imbalance caused by the rarity of bankruptcy events in the real economy. Furthermore, a fair comparison in the literature is difficult to make because bankruptcy datasets are not publicly available and because studies often restrict their datasets to specific economic sectors, markets, and/or time periods. In this work, we investigated the design and application of different ML models for two tasks related to default events: (a) estimating survival probabilities over time; (b) predicting default from time-series accounting data of different lengths. The entire dataset used for the experiments has been made available to the scientific community for further research and benchmarking purposes. The dataset covers 8262 public companies listed on the American stock market between 1999 and 2018. Finally, in light of the results obtained, we critically discuss the most interesting metrics as proposed benchmarks for future studies.

Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks / Lombardo, G; Pellegrino, M; Adosoglou, G; Cagnoni, S; Pardalos, PM; Poggi, A. - In: FUTURE INTERNET. - ISSN 1999-5903. - 14:8(2022), p. 244. [10.3390/fi14080244]

Lombardo, G (Conceptualization); Pellegrino, M (Software); Adosoglou, G (Methodology); Cagnoni, S (Writing – Review & Editing); Pardalos, PM (Supervision); Poggi, A (Supervision)

2022-01-01

Year of publication: 2022
Appears in types: 1.1 Journal article

Files in this record:

File	Size	Format
futureinternet-14-00244-v2.pdf (open access; type: publisher's version (PDF); license: Creative Commons)	567.37 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2933563
