The Specialist’s Paradox: Generalist AI May Better Organize Medical Knowledge / Galli, Carlo; Colangelo, Maria Teresa; Meleti, Marco; Calciolari, Elena. - In: ALGORITHMS. - ISSN 1999-4893. - 18:7(2025). [10.3390/a18070451]
The Specialist’s Paradox: Generalist AI May Better Organize Medical Knowledge
Galli, Carlo; Colangelo, Maria Teresa; Meleti, Marco; Calciolari, Elena
2025-01-01
Abstract
This study investigates the ability of six pre-trained sentence transformers to organize medical knowledge by performing unsupervised clustering on 70 high-level Medical Subject Headings (MeSH) terms across seven medical specialties. We evaluated models from three pre-training paradigms: general-purpose, domain-adapted, and from-scratch domain-specific. The results reveal a clear performance hierarchy. A top tier of models, including the general-purpose MPNet and the domain-adapted BioBERT and RoBERTa, produced highly coherent, specialty-aligned clusters (Adjusted Rand Index > 0.80). Conversely, models pre-trained from scratch on specialized corpora, such as PubMedBERT and BioClinicalBERT, performed poorly (Adjusted Rand Index < 0.51), with BioClinicalBERT yielding a disorganized clustering. These findings challenge the assumption that domain-specific pre-training guarantees superior performance for all semantic tasks. We conclude that model architecture, alignment between the pre-training objective and the downstream task, and the nature of the training data are more critical than domain specificity alone in determining whether a model creates a semantically coherent embedding space for medical concepts.
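The abstract reports clustering quality as an Adjusted Rand Index (ARI), which compares a clustering against reference labels and corrects for chance agreement: 1.0 means the two partitions match exactly, and values near 0 mean agreement no better than random. The sketch below shows how such a score can be computed from the standard pair-counting formula, using only the Python standard library. It is an illustration of the metric, not the authors' pipeline: the function name `adjusted_rand_index` and the toy labels (seven hypothetical specialties with ten terms each, mirroring the 70-term design) are assumptions, and the embedding models and clustering algorithm are not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Contingency counts: how many items share each (true, predicted) label pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both, in the reference, in the prediction.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical reference labels: 7 specialties x 10 terms = 70 items.
specialty = [s for s in range(7) for _ in range(10)]

# A clustering that recovers every specialty, with arbitrary cluster ids.
recovered = [(s + 3) % 7 for s in specialty]

# A clustering that lumps all terms into one cluster.
lumped = [0] * len(specialty)

print(adjusted_rand_index(specialty, recovered))
print(adjusted_rand_index(specialty, lumped))
```

Because ARI depends only on which items are grouped together, renaming cluster ids does not change the score, which is why it suits unsupervised clustering where cluster numbering is arbitrary.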


