Cureus. 2025 Feb 3;17(2):e78433. doi: 10.7759/cureus.78433. eCollection 2025 Feb.
ABSTRACT
Background: The emergence of generative artificial intelligence, such as ChatGPT (OpenAI, San Francisco, CA, USA), offers significant potential for improving the delivery of patient information and aiding in clinical decision-making. The aim of this study was to investigate the accuracy and consistency of ChatGPT in providing patient information and answering orthopaedic clinical questions regarding Achilles tendon ruptures.

Methods: Eight questions regarding Achilles tendon rupture management were presented to ChatGPT twice, resulting in 16 responses. References were requested for all responses. Each response was evaluated for accuracy and consistency using a grading scale ranging from I (comprehensive) to IV (completely incorrect). Final grading was determined through consensus discussions among two orthopaedic registrars and two senior orthopaedic surgeons. Descriptive statistics were performed.

Results: All of the responses produced by ChatGPT were graded as containing both correct and incorrect information (grade III). Consistency was observed in six out of eight (75%) questions when comparing the two responses for each question. ChatGPT provided 47 references, of which 16 (34%) were correct, 19 (40%) were incorrect, and 12 (26%) were fabricated.

Conclusion: ChatGPT lacks accuracy and consistency in providing information on the management of Achilles tendon ruptures. All patient information and orthopaedic clinical decision-making recommendations contained inaccurate or fabricated information.
PMID:40046346 | PMC:PMC11882158 | DOI:10.7759/cureus.78433