Corporate financial distress prediction: a machine learning approach in the era of big data / Gabrielli, Gianluca; Melioli, Andrea; Bertini, Flavio. - In: JOURNAL OF ACCOUNTING & ORGANIZATIONAL CHANGE. - ISSN 1832-5912. - 22:7(2026), pp. 31-65. [10.1108/jaoc-05-2025-0166]

Corporate financial distress prediction: a machine learning approach in the era of big data

Gabrielli, Gianluca; Bertini, Flavio
2026-01-01

Abstract

Purpose – This study evaluates how effectively modern machine learning classifiers (random forest, gradient boosting trees, decision trees, support vector machines and logistic regression) forecast corporate bankruptcy among Italian firms, with the goal of surpassing traditional credit-scoring approaches by leveraging rich financial data. Design/methodology/approach – Using a comprehensive panel of 1,826,157 firm–year observations (532,255 active; 76,464 bankrupt) from 1980 to 2019, the authors compare models trained on different data configurations and address class imbalance through undersampling and advanced variants of the synthetic minority over-sampling technique (SMOTE). Models are validated on held-out samples, on regional subsets and in an out-of-time test (2016–2017), with performance measured by area under the curve (AUC), F1-score, precision, recall and specificity. Findings – Ensemble methods (random forest and gradient boosting) outperform the other classifiers, particularly when trained on raw accounting inputs, achieving AUCs near 0.99 and F1-scores up to 0.98; resampling improves robustness without reducing predictive power, and variable-importance analysis identifies capital-structure metrics as key early-warning indicators. Originality/value – To the best of the authors’ knowledge, this is the first large-scale Italian bankruptcy study to compare ratio-based models with high-dimensional raw data under multiple SMOTE variants. It shows that comprehensive financial statement variables markedly improve predictive accuracy and offers novel insights for both researchers and risk practitioners.
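The abstract names the evaluation metrics and the SMOTE resampling step but gives no implementation details. The following is a minimal pure-Python sketch of the textbook definitions, assuming binary labels with 1 = bankrupt and firm features encoded as numeric tuples. The function names (`binary_metrics`, `roc_auc`, `smote`) are hypothetical and the toy numbers are made up; this is an illustration of the standard techniques, not the authors' pipeline or any of the specific SMOTE variants they tested.

```python
import math
import random


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = bankrupt, 0 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # share of bankruptcies caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of active firms cleared
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 1/2. This pairwise version is O(n_pos * n_neg): fine for a
    demo, far too slow for millions of firm-years.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: place new points on segments between a minority sample and
    one of its k nearest minority neighbours (Euclidean). Needs >= 2 samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy usage with made-up numbers (not data from the study).
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.2, 0.6]
print(binary_metrics(y_true, y_pred))
# {'precision': 0.5, 'recall': 0.5, 'specificity': 0.6666666666666666, 'f1': 0.5}
print(roc_auc(y_true, scores))  # 0.8333333333333334 (5 of 6 pairs ranked correctly)
print(smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=2, k=2))
```

In practice, scikit-learn (`roc_auc_score`, `precision_recall_fscore_support`) and imbalanced-learn (`SMOTE` and its variants) provide tested implementations. As a general rule, resampling is fitted on the training split only, so that held-out and out-of-time test sets keep the real class ratio.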
2026
Files in this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/3047693
Citations
  • PubMed Central: ND
  • Scopus: 1
  • Web of Science: 1