A Comparative Analysis of Sentence Transformer Models for Automated Journal Recommendation Using PubMed Metadata

IRIS

We present an automated journal recommendation pipeline designed to evaluate the performance of five Sentence Transformer models for recommending journals aligned with a manuscript's thematic scope: all-mpnet-base-v2 (mpnet), all-MiniLM-L6-v2 (minilm-l6), all-MiniLM-L12-v2 (minilm-l12), multi-qa-distilbert-cos-v1 (multi-qa-distilbert), and all-distilroberta-v1 (roberta). The pipeline extracts domain-relevant keywords from a manuscript via KeyBERT, retrieves potentially related articles from PubMed, and encodes both the test manuscript and the retrieved articles into high-dimensional embeddings. It then computes cosine similarity between these embeddings and ranks journals by thematic overlap. Evaluations on 50 test articles show that mpnet performs strongly (mean similarity score 0.71 ± 0.04), albeit with higher computational demands. The minilm-l12 and minilm-l6 models offer comparable precision at lower cost, while multi-qa-distilbert and roberta yield broader recommendations better suited to interdisciplinary research. These findings underscore key trade-offs among embedding models and demonstrate how such models can provide interpretable, data-driven insights to guide journal selection across varied research contexts.
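The ranking step described above can be illustrated with a minimal sketch. It assumes the embeddings have already been computed (in practice they might come from a Sentence Transformer model's `encode` method) and uses toy three-dimensional vectors instead. The function names `cosine_similarity` and `rank_journals`, the journal names, and the choice to score each journal by the mean similarity of its retrieved articles are all hypothetical; the record does not state how article-level similarities are aggregated per journal.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_journals(manuscript_vec, articles, top_k=3):
    """Rank journals by similarity to the manuscript embedding.

    `articles` is an iterable of (journal_name, embedding) pairs.
    Assumption: a journal's score is the mean cosine similarity of its
    retrieved articles; this aggregation rule is illustrative only.
    """
    scores = defaultdict(list)
    for journal, vec in articles:
        scores[journal].append(cosine_similarity(manuscript_vec, vec))
    ranked = sorted(
        ((journal, sum(s) / len(s)) for journal, s in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings standing in for real model output (hypothetical data).
manuscript = [1.0, 0.0, 1.0]
retrieved = [
    ("Journal A", [1.0, 0.0, 1.0]),
    ("Journal A", [1.0, 1.0, 1.0]),
    ("Journal B", [0.0, 1.0, 0.0]),
]
ranking = rank_journals(manuscript, retrieved)
```

In this toy case "Journal A" ranks first because both of its articles point in nearly the same direction as the manuscript vector, while "Journal B" is orthogonal to it and scores zero.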

A Comparative Analysis of Sentence Transformer Models for Automated Journal Recommendation Using PubMed Metadata / Colangelo, M. T.; Meleti, M.; Guizzardi, S.; Calciolari, E.; Galli, C.. - In: BIG DATA AND COGNITIVE COMPUTING. - ISSN 2504-2289. - 9:3(2025). [10.3390/bdcc9030067]


Colangelo M. T.;Meleti M.;Guizzardi S.;Calciolari E.;Galli C.

2025-01-01



Year of publication: 2025

Appears in types: 1.1 Journal article


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/3028061
