Natural language processing algorithm accurately classifies diverticulitis-related complications and predicts long-term outcomes.
| Authors | |
| Keywords | |
| Abstract | BACKGROUND & AIMS: Diagnostic codes lack the precision to identify specific complications of diverticulitis, limiting their utility in large-scale, real-world data. We developed a natural language processing (NLP) algorithm to classify diverticulitis and associated features using computed tomography (CT) reports. METHODS: Using data from Mass General Brigham Research Patient Data Registry (1979-2024), we identified patients with a diagnosis code for diverticular disease (ICD-9: 562; ICD-10: K57) and a prior abdominopelvic CT report. We developed and validated our NLP algorithm to detect diverticulitis and associated features. We subsequently investigated the associations between NLP-defined severity at first diagnosis (i.e., uncomplicated, mild, severe, or chronic complications) and risk of severe diverticulitis recurrence using a Cox proportional hazards regression model. We assessed the predictive value of NLP-detected features using random forest models. RESULTS: The NLP algorithm achieved positive and negative predictive values of 82.8% to 99.9%, outperforming both ICD codes and a generalist large language model. Among 16,349 patients with NLP-detected diverticulitis, 3,192 developed severe recurrence over 76,736 person-years. Compared to uncomplicated diverticulitis, the multivariable-adjusted hazard ratio (HR) for severe recurrence was 1.39 (95% confidence interval [CI]: 1.14-1.69) for mild complications, 3.02 (95% CI: 2.80-3.27) for severe complications, and 5.41 (95% CI: 4.78-6.13) for chronic complications. NLP-detected features significantly improved the prediction of severe diverticulitis recurrence compared to codified variables. CONCLUSION: Our NLP algorithm accurately classifies diverticulitis features, facilitating the construction of large and high-quality EHR-based cohorts. Severity at initial diagnosis predicts risk of severe recurrence, supporting the use of artificial intelligence for risk stratification and long-term management. |
| Year of Publication | 2026 |
| Journal | Clinical gastroenterology and hepatology : the official clinical practice journal of the American Gastroenterological Association |
| Date Published | 03/2026 |
| ISSN | 1542-7714 |
| DOI | 10.1016/j.cgh.2026.03.009 |
| PubMed ID | 41881290 |
| Links |
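
The abstract reports positive and negative predictive values and an event count over person-years. As a rough illustration of how such summary figures are computed, the Python sketch below defines two small helpers. The function names and the confusion-matrix counts are hypothetical and are not taken from the paper; only the 3,192 events and 76,736 person-years come from the abstract. The crude rate shown is simple arithmetic on those two numbers, not a result reported by the authors.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of positive calls that are correct.
    NPV = TN / (TN + FN): share of negative calls that are correct.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


def incidence_rate_per_1000(events, person_years):
    """Crude incidence rate expressed per 1,000 person-years."""
    return 1000 * events / person_years


if __name__ == "__main__":
    # Hypothetical validation counts, for illustration only.
    ppv, npv = predictive_values(tp=90, fp=10, tn=95, fn=5)
    print(f"PPV={ppv:.3f}, NPV={npv:.3f}")

    # Event count and follow-up time as stated in the abstract.
    rate = incidence_rate_per_1000(events=3192, person_years=76736)
    print(f"{rate:.1f} per 1,000 person-years")  # prints "41.6 per 1,000 person-years"
```

Note that a crude rate pools all severity groups together; the hazard ratios in the abstract come from a multivariable-adjusted Cox model, which this sketch does not attempt to reproduce.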