Systematic reviews require labor-intensive screening processes—an approach prone to bottlenecks, delays, and scalability constraints in large-scale reviews. Large Language Models (LLMs) have recently emerged as a powerful alternative, capable of operating in zero-shot or few-shot modes to classify abstracts according to predefined criteria without requiring continuous human intervention like semi-automated platforms. This review focuses on the central challenges that users in the biomedical field encounter when integrating LLMs—such as GPT-4—into evidence-based research. It examines critical requirements for software and data preprocessing, discusses various prompt strategies, and underscores the continued need for human oversight to maintain rigorous quality control. By drawing on current practices for cost management, reproducibility, and prompt refinement, this article highlights how review teams can substantially reduce screening workloads without compromising the comprehensiveness of evidence-based inquiry. The findings presented aim to balance the strengths of LLM-driven automation with structured human checks, ensuring that systematic reviews retain their methodological integrity while leveraging the efficiency gains made possible by recent advances in artificial intelligence.

Large Language Models in Systematic Review Screening: Opportunities, Challenges, and Methodological Considerations / Galli, C.; Gavrilova, A. V.; Calciolari, E.. - In: INFORMATION. - ISSN 2078-2489. - 16:5(2025). [10.3390/info16050378]

Large Language Models in Systematic Review Screening: Opportunities, Challenges, and Methodological Considerations

Galli C.;Calciolari E.
2025-01-01

Abstract

Systematic reviews require labor-intensive screening processes—an approach prone to bottlenecks, delays, and scalability constraints in large-scale reviews. Large Language Models (LLMs) have recently emerged as a powerful alternative, capable of operating in zero-shot or few-shot modes to classify abstracts according to predefined criteria without requiring continuous human intervention like semi-automated platforms. This review focuses on the central challenges that users in the biomedical field encounter when integrating LLMs—such as GPT-4—into evidence-based research. It examines critical requirements for software and data preprocessing, discusses various prompt strategies, and underscores the continued need for human oversight to maintain rigorous quality control. By drawing on current practices for cost management, reproducibility, and prompt refinement, this article highlights how review teams can substantially reduce screening workloads without compromising the comprehensiveness of evidence-based inquiry. The findings presented aim to balance the strengths of LLM-driven automation with structured human checks, ensuring that systematic reviews retain their methodological integrity while leveraging the efficiency gains made possible by recent advances in artificial intelligence.
2025
Large Language Models in Systematic Review Screening: Opportunities, Challenges, and Methodological Considerations / Galli, C.; Gavrilova, A. V.; Calciolari, E.. - In: INFORMATION. - ISSN 2078-2489. - 16:5(2025). [10.3390/info16050378]
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11381/3028062
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact