Ontology accelerates few-shot learning capability of large language model: A study in extraction of drug efficacy in a rare pediatric epilepsy.

International journal of medical informatics
Abstract

OBJECTIVE: Dravet Syndrome (DS) is a developmental and epileptic encephalopathy that is characterized by severe, prolonged motor seizures and high resistance to multiple antiseizure medications (ASMs) with multiple comorbidities. Evaluating the efficacy of new drugs in DS preclinical models and mapping them to human phenotypes of DS through analysis of published literature is an important goal for improving outcomes in this rare pediatric epilepsy.

MATERIALS AND METHODS: Large language models (LLMs) have demonstrated great promise in parsing published literature; however, the performance of LLMs falls short in medical applications. In this study, we investigate the effectiveness of domain ontology developed by human experts to optimize LLMs for medical text processing in a rare disease. Utilizing a benchmark dataset that describes the efficacy of 17 ASMs tested in preclinical models and DS patients, we define a new ontology-augmented phased in-context learning (PCL) approach to process 4935 full-text DS articles. We expand this analysis to 7 new drugs that demonstrate efficacy in reducing seizures to identify gaps in current knowledge for designing new experimental studies for drug discovery in DS.

RESULTS: Few-shot or in-context learning is a foundational capability of LLMs and the few-shot learning capability of the Gemini 1.0 Pro version LLM dramatically increases when we augment prompts with the DS epilepsy ontology. The DS epilepsy ontology is the largest epilepsy and seizure ontology in clinical use that was developed by DS basic scientists and clinical neurologists. The ontology-augmented PCL prompt achieves 100% accuracy in reproducing the benchmark drug efficacy dataset for 17 ASMs with only two examples for in-context learning.

CONCLUSION: The new ontology-augmented PCL approach significantly accelerates the few-shot learning capabilities of the Gemini LLM, thereby reducing the number of required examples and time needed to optimize LLMs for medical applications.
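
The general idea of augmenting a few-shot prompt with ontology terms can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the phases of PCL, the prompt wording, or the ontology format, so the sketch covers only the ontology augmentation and the two worked examples. The names `OntologyTerm`, `Example`, `select_terms`, and `build_prompt`, the ontology entries, the drug placeholders, and the task wording are all hypothetical. No LLM is called; the code only assembles the prompt text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyTerm:
    """Hypothetical ontology entry: a label, a definition, optional synonyms."""
    label: str
    definition: str
    synonyms: tuple = ()


@dataclass(frozen=True)
class Example:
    """One worked example used for in-context learning."""
    passage: str
    answer: str


def select_terms(passage, ontology):
    """Return ontology terms whose label or a synonym occurs in the passage."""
    text = passage.lower()
    return [
        term for term in ontology
        if any(name.lower() in text for name in (term.label, *term.synonyms))
    ]


def build_prompt(passage, ontology, examples):
    """Assemble ontology context, worked examples, and the query passage."""
    terms = select_terms(passage, ontology)
    lines = ["Task: state whether the drug reduced seizures in the passage."]
    if terms:
        lines.append("Ontology terms relevant to this passage:")
        lines += [f"- {t.label}: {t.definition}" for t in terms]
    for i, ex in enumerate(examples, start=1):
        lines += [
            f"Example {i} passage: {ex.passage}",
            f"Example {i} answer: {ex.answer}",
        ]
    lines += [f"Passage: {passage}", "Answer:"]
    return "\n".join(lines)


# Illustrative placeholder data, NOT taken from the paper or its ontology.
ONTOLOGY = [
    OntologyTerm(
        "hyperthermia-induced seizure",
        "A seizure provoked by raised body temperature.",
        ("febrile seizure",),
    ),
    OntologyTerm(
        "status epilepticus",
        "A prolonged seizure or series of seizures without recovery.",
    ),
]
EXAMPLES = [
    Example("DrugA lowered seizure frequency in the mouse model.", "DrugA: effective"),
    Example("DrugB did not change seizure frequency in patients.", "DrugB: not effective"),
]

prompt = build_prompt(
    "DrugC raised the threshold for febrile seizure in mice.", ONTOLOGY, EXAMPLES
)
print(prompt)
```

In this sketch only the terms that match the passage are included, which keeps the prompt short; a real pipeline might instead pass a larger portion of the ontology or use concept identifiers rather than string matching.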

Year of Publication
2025
Journal
International journal of medical informatics
Volume
201
Pages
105942
Date Published
09/2025
ISSN
1872-8243
DOI
10.1016/j.ijmedinf.2025.105942
PubMed ID
40311258