Clusterwise Linear Regression (CLR) combines classical linear regression with cluster analysis to model heterogeneous data. It overcomes the limitations of a single global model by simultaneously partitioning the data points into distinct clusters and fitting each cluster separately. However, since the underlying point-to-cluster assignments are unknown, the estimation process typically leads to a computationally challenging combinatorial problem. In this work, we introduce a new reformulation of the CLR problem under Gaussian assumptions, and propose a probabilistic branch-and-bound algorithm called pclustreg. This algorithm gives, with high probability, solutions that are at least as good as the (unknown) ground truth in terms of log-likelihood, bridging the gap between existing likelihood-based heuristic and global methods. Moreover, by limiting the number of expanded nodes, it can also be used as an effective heuristic algorithm. We highlight the performance of pclustreg on both synthetic and real-world datasets, comparing it against the state-of-the-art likelihood-based heuristic method, and show that it achieves comparable or better results both in terms of solution accuracy and computing times.
Clusterwise linear regression using a probabilistic branch and bound algorithm under Gaussianity / Fois, A.; Insolia, L.; Consolini, L.; Laurini, F.; Locatelli, M.; Riani, M.. - In: COMPUTERS & OPERATIONS RESEARCH. - ISSN 0305-0548. - 189:(2026). [10.1016/j.cor.2025.107375]
Clusterwise linear regression using a probabilistic branch and bound algorithm under Gaussianity
Fois, A.
;Consolini, L.;Laurini, F.;Locatelli, M.;Riani, M.
2026-01-01
Abstract
Clusterwise Linear Regression (CLR) combines classical linear regression with cluster analysis to model heterogeneous data. It overcomes the limitations of a single global model by simultaneously partitioning the data points into distinct clusters and fitting each cluster separately. However, since the underlying point-to-cluster assignments are unknown, the estimation process typically leads to a computationally challenging combinatorial problem. In this work, we introduce a new reformulation of the CLR problem under Gaussian assumptions, and propose a probabilistic branch-and-bound algorithm called pclustreg. This algorithm gives, with high probability, solutions that are at least as good as the (unknown) ground truth in terms of log-likelihood, bridging the gap between existing likelihood-based heuristic and global methods. Moreover, by limiting the number of expanded nodes, it can also be used as an effective heuristic algorithm. We highlight the performance of pclustreg on both synthetic and real-world datasets, comparing it against the state-of-the-art likelihood-based heuristic method, and show that it achieves comparable or better results both in terms of solution accuracy and computing times.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


