Predicting corporate bankruptcy is one of the fundamental tasks in credit risk assessment. Since the 2007/2008 financial crisis in particular, it has become a priority for most financial institutions, practitioners, and academics. Recent advances in machine learning (ML) have enabled the development of several models for bankruptcy prediction. The most challenging aspect of this task is the class imbalance caused by the rarity of bankruptcy events in the real economy. Furthermore, a fair comparison across the literature is difficult because bankruptcy datasets are not publicly available and because studies often restrict their datasets to specific economic sectors, markets, and/or time periods. In this work, we investigate the design and application of different ML models for two tasks related to default events: (a) estimating survival probabilities over time; (b) predicting default from time series of accounting data of different lengths. The entire dataset used for the experiments has been made available to the scientific community for further research and benchmarking purposes. The dataset covers 8262 public companies listed on the American stock market between 1999 and 2018. Finally, in light of the results obtained, we critically discuss the most interesting metrics as proposed benchmarks for future studies.
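To make the class-imbalance point concrete, the following minimal sketch shows why plain accuracy can mislead when bankruptcies are rare. It is only an illustration under stated assumptions: the toy sample (1000 firms, 30 of them bankrupt) and the helper names `confusion_counts` and `metrics` are hypothetical and are not taken from the paper or its dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 means bankrupt."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the bankrupt class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy sample (assumption): 1000 firms, 30 bankrupt.
y_true = [1] * 30 + [0] * 970

# A trivial model that always predicts "healthy".
always_healthy = [0] * 1000
baseline = metrics(y_true, always_healthy)

# A hypothetical model that finds 20 of the 30 bankruptcies
# at the cost of 40 false alarms.
y_model = [1] * 20 + [0] * 10 + [1] * 40 + [0] * 930
model = metrics(y_true, y_model)

print(baseline)
print(model)
```

In this toy setting the trivial model scores higher on accuracy than the model that actually detects bankruptcies, while its recall and F1 on the bankrupt class are zero, which is why metrics focused on the minority class are usually more informative for this kind of task.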

Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks / Lombardo, G; Pellegrino, M; Adosoglou, G; Cagnoni, S; Pardalos, Pm; Poggi, A. - In: FUTURE INTERNET. - ISSN 1999-5903. - 14:8(2022), p. 244. [10.3390/fi14080244]

Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks

Lombardo, G (Conceptualization); Pellegrino, M (Software); Cagnoni, S (Writing – Review & Editing); Poggi, A (Supervision)
2022-01-01

Files in this item:
File: futureinternet-14-00244-v2.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 567.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2933563
Citations
  • PMC: not available
  • Scopus: 27
  • ISI: 18