Original Article

Vol. 41 No. 1 (2026): Archives of Rheumatology

Evaluation of the Diagnostic Performance of ChatGPT in Radiographic Staging of Sacroiliitis According to the Modified New York Criteria


Uğur Güngör Demir
Ali Nail Demir
Alper Uysal

Abstract

 Background/Aims: This study aimed to evaluate the diagnostic performance of ChatGPT in grading sacroiliitis on pelvic radiographs according to the modified New York criteria.


Materials and Methods: This retrospective study included 266 individuals with or without radiographic sacroiliac joint involvement according to the modified New York criteria (231 with ankylosing spondylitis and 35 without radiographic evidence of sacroiliitis). Two experts independently graded all radiographs based on the modified New York criteria, with disagreements resolved by a third reviewer. ChatGPT-5o (OpenAI, 2025) was prompted to classify each radiograph using a standardized English-language instruction. ChatGPT’s grading outputs were compared with expert consensus.


Results: A statistically significant association was found between ChatGPT and expert gradings, but agreement remained slight (κ = 0.136). Multi-class performance was limited (overall accuracy = 30%), while binary analysis showed higher apparent accuracy (78%) due to a strong positive bias. Sensitivity was 0.796, specificity was 0.696, positive predictive value was 0.946, and negative predictive value was 0.338. Per-grade area under the curve (AUC) values ranged from 0.52 to 0.75, with the highest value for Grade 0.
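For readers less familiar with the metrics reported above, the following is a minimal sketch of how they are conventionally computed from confusion-matrix counts. It is not the authors' analysis code. The function names `binary_metrics` and `cohen_kappa` and the counts passed to them are hypothetical and chosen only for illustration, and the kappa shown is the unweighted form, which is an assumption because the abstract does not state which variant was used. The last line uses only the cohort sizes given in the abstract (231 and 35) and assumes a patient-level binary analysis.

```python
from typing import Dict, List


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all healthy
        "ppv": tp / (tp + fp),          # correct among positive calls
        "npv": tn / (tn + fn),          # correct among negative calls
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Unweighted Cohen's kappa; rows = rater A, columns = rater B."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (NOT the study's data).
example = binary_metrics(tp=80, fp=10, fn=20, tn=40)
example_kappa = cohen_kappa([[20, 5], [10, 15]])

# Class-imbalance illustration using the cohort sizes from the abstract:
# a rule that labels every radiograph as positive would be "correct"
# for all 231 positive cases and wrong for all 35 negative cases.
always_positive_accuracy = 231 / (231 + 35)
```

Under that patient-level assumption, an always-positive rule would reach roughly 87% accuracy on this cohort, which illustrates why binary accuracy alone can look favorable when positive cases dominate and why specificity and negative predictive value are reported alongside it.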


Conclusion: ChatGPT demonstrated only limited agreement with expert assessments and showed poor ability to distinguish between sacroiliitis stages, performing adequately only for normal joints. These findings suggest that large language models like ChatGPT are unsuitable for direct radiographic interpretation without integration into specialized, vision-based diagnostic frameworks.


Cite this article as: Güngör Demir U, Demir AN, Uysal A. Evaluation of the diagnostic performance of ChatGPT in radiographic staging of sacroiliitis according to the modified New York criteria. Arch Rheumatol. 2026;41(1):57-63.
