K-centroid convergence clustering identification in one-label per type for disease prediction / Hoang, Minh Long; Delmonte, Nicola. - In: IAES INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE. - ISSN 2089-4872. - 13:1(2024), pp. 1149-1159. [10.11591/ijai.v13.i1.pp1149-1159]

K-centroid convergence clustering identification in one-label per type for disease prediction

Hoang, Minh Long; Delmonte, Nicola
2024-01-01

Abstract

Disease prediction is a high-demand field that relies heavily on machine learning (ML) to improve the efficiency of its results. This research applies K-means clustering to supervised classification for disease prediction in the setting where each class has only one labeled sample. The K-centroid convergence clustering identification (KC³I) system is based on semi-K-means clustering but requires only a single labeled sample per class for training, with the training dataset used to update the centroids. The KC³I model also includes a dictionary box that indexes all input centroids before and after the updating process, so that each centroid is matched with a corresponding label inside this box. After training, each incoming feature vector is placed in the cluster of a trained centroid according to the Euclidean distance and is then converted into the class name that corresponds to that centroid's index. Two validation stages were carried out and met expectations in terms of precision, recall, F1-score, and absolute accuracy. The last part demonstrates the possibility of feature reduction by selecting the most crucial features with the extra trees classifier method: when all data are fed into the KC³I system using only the most important features, the accuracy remains the same.
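The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the single labeled sample of each class is used as the initial centroid, that centroids are updated with the plain K-means mean step on the unlabeled training data, and that each sample goes to the nearest centroid. The names `train_kc3i`, `predict`, and `label_box`, as well as the toy data and class names, are hypothetical and chosen only for illustration.

```python
import math

def nearest(x, centroids):
    """Index of the centroid closest to x by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))

def train_kc3i(seeds, training_data, max_iter=100):
    """seeds: dict mapping class name -> its single labeled feature vector.

    Assumption: each labeled sample is the initial centroid of its class.
    """
    # "Dictionary box" (hypothetical layout): centroid index -> class name.
    label_box = {i: name for i, name in enumerate(seeds)}
    centroids = [list(seeds[name]) for name in seeds]
    for _ in range(max_iter):
        # Assign every training sample to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in training_data:
            clusters[nearest(x, centroids)].append(x)
        # Move each centroid to the mean of its cluster (unchanged if empty).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # assignments no longer change: converged
            break
        centroids = updated
    return centroids, label_box

def predict(x, centroids, label_box):
    """Map a feature vector to the class name of its nearest centroid."""
    return label_box[nearest(x, centroids)]

# Toy data, invented for illustration only.
seeds = {"healthy": [0.0, 0.0], "sick": [10.0, 10.0]}
training_data = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0],
                 [9.0, 9.0], [10.0, 8.0], [8.0, 10.0]]
centroids, label_box = train_kc3i(seeds, training_data)
print(predict([1.5, 0.5], centroids, label_box))
```

Because the centroid index never changes during the update, the label attached to each seed stays attached to the converged centroid, which is the role the abstract assigns to the dictionary box.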

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2968860