
Cardiology knowledge assessment of retrieval-augmented open versus proprietary large language models.

PLOS Digital Health

Authors	Constantine Tarabanis, Shaan Khurshid, Areti Karamanou, Rodo Piperaki, Lucas Mavromatis, Aris Hatzimemos, Dimitrios Tachmatzidis, Constantinos Bakogiannis, Vassilios Vassilikos, Patrick Ellinor, Lior Jankelson, Evangelos Kalampokis
Abstract	Large language models (LLMs) are increasingly integrated into clinical workflows, yet their performance in cardiovascular medicine remains insufficiently evaluated. This study evaluated the performance of open-weight and proprietary LLMs, with and without Retrieval-Augmented Generation (RAG), on cardiology board-style questions and benchmarked them against the human average. We tested 14 LLMs (6 open-weight, 8 proprietary) on 449 multiple-choice questions from the American College of Cardiology Self-Assessment Program (ACCSAP). Accuracy was measured as percent correct. RAG was implemented using a knowledge base of 123 guideline and textbook documents. The open-weight model DeepSeek R1 achieved the highest accuracy at 86.9% (95% CI: 83.4-89.7%), outperforming proprietary models and the human average of 78%. GPT 4o (80.9%, 95% CI: 77.0-84.2%) and the commercial platform OpenEvidence (81.3%, 95% CI: 77.4-84.7%) demonstrated similar performance. A positive correlation between model size and performance was observed within model families, but across families, substantial variability persisted among models with similar parameter counts. With RAG, all models improved, and open-weight models like Mistral Large 2 (78.0%, 95% CI: 73.9-81.5%) performed comparably to proprietary alternatives like GPT 4o. Open-weight models can match or exceed proprietary systems in cardiovascular knowledge, with RAG particularly beneficial for smaller models. Given their transparency, configurability, and potential for local deployment, open-weight models, strategically augmented, represent viable, lower-cost alternatives for clinical applications. Open-weight LLMs demonstrate competency in cardiovascular medicine comparable to or exceeding that of proprietary models, with and without RAG depending on the model.
Year of Publication	2026
Journal	PLOS Digital Health
Volume	5
Issue	3
Pages	e0001029
Date Published	03/2026
ISSN	2767-3170
DOI	10.1371/journal.pdig.0001029
PubMed ID	41818295
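The abstract reports accuracy as percent correct with a 95% confidence interval. As a rough illustration of how such an interval can be obtained, the sketch below computes a Wilson score interval for a binomial proportion. Two things here are assumptions rather than facts from the record: the abstract does not say which interval method the authors used, and the count of 390 correct answers is inferred from 86.9% of 449 questions. Under those assumptions the result is close to the interval reported for DeepSeek R1.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for a proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical count: 390 correct out of 449 is inferred from the reported
# 86.9% accuracy; the record itself does not state the raw number.
correct, total = 390, 449
accuracy = correct / total
lo, hi = wilson_interval(correct, total)
print(f"accuracy = {accuracy:.1%}, 95% CI = {lo:.1%} to {hi:.1%}")
```

The Wilson interval is chosen here only because it behaves well for proportions near 0 or 1; other methods (for example, Clopper-Pearson or a bootstrap) would give slightly different bounds.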

Recent Publications

The role of frailty and comorbidities in severe infections and the risk of dementia: a prospective, multicohort, observational study.

Global trends and geographic variations of hypertension in childhood and adolescence 1990-2021: a systematic analysis of the Global Burden of Disease Study 2021.

Scalable biological-cognitive profiling for Alzheimer's disease in the population.

Resolving parameter uncertainty in SIR models through population-level serological surveillance: A synthetic study.

Extended precision cut liver slice culture models liver regeneration and ductular reaction.