JRM

Journal of Radiology in Medicine is an international journal covering all areas of radiology. It publishes original research articles, review articles, case reports, editorial commentaries, letters to the editor, educational articles, and conference/meeting announcements.

Original Article
Feasibility of a multimodal large language model in interpreting plain radiographs of bone tumors: a pilot study
Aims: To evaluate the diagnostic accuracy and clinical feasibility of a multimodal large language model (ChatGPT-5) in interpreting plain radiographs of bone tumors and differentiating between benign and malignant lesions.
Methods: This retrospective pilot study utilized 50 verified bone tumor cases (27 benign and 23 malignant) sourced from the Radiopaedia database. Anonymized radiographs were processed by ChatGPT-5 using a standardized zero-shot prompt in independent sessions to prevent contextual bias. Model performance was assessed based on the accuracy of the most likely diagnosis, the inclusion of correct diagnoses within the top three differentials, and benign–malignant classification metrics. Statistical analysis included the Clopper–Pearson binomial method for confidence intervals and McNemar’s exact test to evaluate improvements in diagnostic accuracy and potential systematic error asymmetry.
Results: The model achieved 100% accuracy in identifying the imaging modality and the affected bone. The accuracy for the single most likely diagnosis was 56.0% (95% CI: 41.3–70.0), which significantly increased to 70.0% (95% CI: 55.4–82.1) when two differential diagnoses were included (p=0.016). For benign–malignant classification, the model demonstrated an overall accuracy of 76.0%, with a high specificity of 96.3% but a notably limited sensitivity for malignancy at 52.2%. A statistically significant error asymmetry indicated a systematic tendency toward benign classification (p=0.006).
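The exact statistics reported above can be reproduced from the underlying counts with only the standard library. This is a minimal sketch, not the authors' code: the counts 28/50 (top-1 correct), 35/50 (top-3 correct), 7 vs. 0 discordant top-1/top-3 cases, and 11 vs. 1 asymmetric misclassifications are inferred from the reported percentages and p-values, and the function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p), summed exactly from the pmf
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    # Exact (Clopper-Pearson) two-sided CI for a binomial proportion,
    # found by bisecting the binomial tail probabilities in p.
    def bisect(tail_ok):
        lo, hi = 0.0, 1.0
        for _ in range(200):
            mid = (lo + hi) / 2
            if tail_ok(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower bound: p at which P(X >= k | p) rises to alpha/2
    lower = 0.0 if k == 0 else bisect(lambda p: 1 - binom_cdf(k - 1, n, p) <= alpha / 2)
    # upper bound: p at which P(X <= k | p) falls to alpha/2
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p) >= alpha / 2)
    return lower, upper

def mcnemar_exact(b, c):
    # Exact McNemar test: two-sided binomial test on the b vs. c
    # discordant pairs under H0: p = 0.5.
    n, k = b + c, min(b, c)
    return min(1.0, 2 * binom_cdf(k, n, 0.5))

# Top-1 accuracy 28/50 = 56.0%, 95% CI ~ (0.413, 0.700)
print(clopper_pearson(28, 50))
# Top-3 accuracy 35/50 = 70.0%, 95% CI ~ (0.554, 0.821)
print(clopper_pearson(35, 50))
# 7 cases recovered by the differential list, 0 lost -> p ~ 0.016
print(mcnemar_exact(7, 0))
# 11 malignant-called-benign vs. 1 benign-called-malignant -> p ~ 0.006
print(mcnemar_exact(11, 1))
```

The bisection exploits the monotonicity of the binomial tails in p, so no beta-quantile routine (and hence no SciPy dependency) is needed; the exact McNemar p-value reduces to a two-sided binomial test on the discordant-pair counts.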
Conclusion: Although ChatGPT-5 demonstrates proficiency in foundational radiographic identification, its low sensitivity for malignancy remains a critical limitation for independent clinical use. The results suggest that multimodal LLMs may serve as promising educational or triage aids but currently require rigorous human expert oversight to maintain diagnostic safety in image interpretation.


Volume 3, Issue 1, 2026
Pages: 5-8