Comparative Performance of Agentic AI and Physicians in Taking Clinical History across Leading Large Language Models (LLMs)
| Authors | |
| Keywords | |
| Abstract | Comprehensive clinical history taking is essential for high-quality care. We hypothesized that large language models (LLMs), guided by a structured agentic framework, can efficiently obtain clinically meaningful patient histories. We developed an iterative prompting system that evaluates relevance and completeness across standard history domains and generates targeted follow-up questions until sufficient detail is obtained. We built a patient-facing application and evaluated it using 52 published case reports and 20 constructed clinical scenarios with simulated patient interactions. The framework was implemented using GPT-4o, Gemini-2.5-Flash-Lite, or Grok-3. After each interaction, the system generated an EHR-ready clinical summary, differential diagnosis, and recommended investigations. Across models, relevant history elements were captured with >85% accuracy and F1 scores, as independently assessed by three blinded physicians, and recommended investigations aligned with those used to establish final diagnoses. These findings support the potential of agentic LLM systems for structured clinical history collection and justify prospective clinical evaluation. |
| Year of Publication | 2026 |
| Journal | medRxiv: the preprint server for health sciences |
| Date Published | 01/2026 |
| DOI | 10.64898/2026.01.23.26344723 |
| PubMed ID | 41646794 |
| Links | |