We present an automated journal recommendation pipeline designed to evaluate the performance of five Sentence Transformer models—all-mpnet-base-v2 (Mpnet), all-MiniLM-L6-v2 (Minilm-l6), all-MiniLM-L12-v2 (Minilm-l12), multi-qa-distilbert-cos-v1 (Multi-qa-distilbert), and all-distilroberta-v1 (roberta)—for recommending journals aligned with a manuscript’s thematic scope. The pipeline extracts domain-relevant keywords from a manuscript via KeyBERT, retrieves potentially related articles from PubMed, and encodes both the test manuscript and retrieved articles into high-dimensional embeddings. By computing cosine similarity, it ranks relevant journals based on thematic overlap. Evaluations on 50 test articles highlight mpnet’s strong performance (mean similarity score 0.71 ± 0.04), albeit with higher computational demands. Minilm-l12 and minilm-l6 offer comparable precision at lower cost, while multi-qa-distilbert and roberta yield broader recommendations better suited to interdisciplinary research. These findings underscore key trade-offs among embedding models and demonstrate how they can provide interpretable, data-driven insights to guide journal selection across varied research contexts.

A Comparative Analysis of Sentence Transformer Models for Automated Journal Recommendation Using PubMed Metadata / Colangelo, M. T.; Meleti, M.; Guizzardi, S.; Calciolari, E.; Galli, C.. - In: BIG DATA AND COGNITIVE COMPUTING. - ISSN 2504-2289. - 9:3(2025). [10.3390/bdcc9030067]

A Comparative Analysis of Sentence Transformer Models for Automated Journal Recommendation Using PubMed Metadata

Colangelo M. T.;Meleti M.;Guizzardi S.;Calciolari E.;Galli C.
2025-01-01

Abstract

We present an automated journal recommendation pipeline designed to evaluate the performance of five Sentence Transformer models—all-mpnet-base-v2 (Mpnet), all-MiniLM-L6-v2 (Minilm-l6), all-MiniLM-L12-v2 (Minilm-l12), multi-qa-distilbert-cos-v1 (Multi-qa-distilbert), and all-distilroberta-v1 (roberta)—for recommending journals aligned with a manuscript’s thematic scope. The pipeline extracts domain-relevant keywords from a manuscript via KeyBERT, retrieves potentially related articles from PubMed, and encodes both the test manuscript and retrieved articles into high-dimensional embeddings. By computing cosine similarity, it ranks relevant journals based on thematic overlap. Evaluations on 50 test articles highlight mpnet’s strong performance (mean similarity score 0.71 ± 0.04), albeit with higher computational demands. Minilm-l12 and minilm-l6 offer comparable precision at lower cost, while multi-qa-distilbert and roberta yield broader recommendations better suited to interdisciplinary research. These findings underscore key trade-offs among embedding models and demonstrate how they can provide interpretable, data-driven insights to guide journal selection across varied research contexts.
2025
A Comparative Analysis of Sentence Transformer Models for Automated Journal Recommendation Using PubMed Metadata / Colangelo, M. T.; Meleti, M.; Guizzardi, S.; Calciolari, E.; Galli, C.. - In: BIG DATA AND COGNITIVE COMPUTING. - ISSN 2504-2289. - 9:3(2025). [10.3390/bdcc9030067]
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11381/3028061
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 5
social impact