The Future of AI in Healthcare: Bridging Potential and Practice

By: Lana Cheng

As artificial intelligence (AI) models like ChatGPT continue to evolve, the idea of a healthcare system deeply intertwined with AI feels less like science fiction and more like an impending reality. Researchers are investigating how these tools can transform medical decision-making. A recent study published in JAMA Network Open, a medical journal of the American Medical Association, examined whether large language models such as ChatGPT improve physicians’ diagnostic reasoning across various specialties. What the researchers discovered surprised even the experts.

The experiment recruited 50 physicians from various hospital systems across the U.S. Participants were divided into two groups and presented with real-world patient cases. One group relied on conventional diagnostic tools, such as medical manuals and internet resources. The other group had ChatGPT at their disposal as a diagnostic aid. At first glance, the results seemed modest: the conventional group scored a median of 74%, while the AI-assisted group performed slightly better at 76%.

But the real surprise came when researchers tested ChatGPT on its own. When tasked with analyzing the same cases independently, the AI achieved a remarkable median score of 92%, outperforming both groups of physicians.

This unexpected outcome raised an intriguing question: Why didn’t the physicians with access to ChatGPT perform better? After all, the tool at their fingertips was the same one that performed exceptionally well on its own.

Jonathan H. Chen, the study’s senior author and an assistant professor at Stanford’s School of Medicine, suggests that the answer lies in human behavior. “What is very possible is that once a human feels like they have got a diagnosis, they do not ‘waste time or space’ on explaining more of the steps for why,” Chen explains. This insight aligns with a broader medical phenomenon: Experienced practitioners often rely on intuition or pattern recognition, sometimes limiting their willingness to explore alternative explanations—even when aided by advanced tools.

Researchers delved deeper by examining the interactions between doctors and ChatGPT. They found that many physicians used the AI model like a search engine, providing fragmented inputs rather than full patient cases. Others dismissed ChatGPT’s recommendations when they conflicted with their own diagnoses. Few doctors fully harnessed the AI’s potential by giving it comprehensive cases to analyze and asking for detailed, case-specific insights.

References

Colón-Rodríguez, C. (2023, July 12). Shedding Light on Healthcare Algorithmic and Artificial Intelligence Bias. Office of Minority Health. https://minorityhealth.hhs.gov/news/shedding-light-healthcare-algorithmic-and-artificial-intelligence-bias

Goh, E., Gallo, R., Hom, J., Strong, E., Weng, Y., Kerman, H., Cool, J. A., Kanjee, Z., Parsons, A. S., Ahuja, N., Horvitz, E., Yang, D., Milstein, A., Olson, A. P. J., Rodman, A., & Chen, J. H. (2024). Large Language Model Influence on Diagnostic Reasoning. JAMA Network Open, 7(10), e2440969. https://doi.org/10.1001/jamanetworkopen.2024.40969

Hadhazy, A. (2024, October 28). Can AI Improve Medical Diagnostic Accuracy? Stanford HAI; Stanford University. https://hai.stanford.edu/news/can-ai-improve-medical-diagnostic-accuracy

Kolata, G. (2024, November 17). ChatGPT Defeated Doctors at Diagnosing Illness. The New York Times. https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors-diagnosis.html

Los Angeles Pacific University. (2023, December 21). Revolutionizing Healthcare: How Is AI Being Used in the Healthcare Industry? Los Angeles Pacific University. https://www.lapu.edu/ai-health-care-industry/

Garg, R. K., Urs, V. L., Agrawal, A., Chaudhary, S. K., Paliwal, V. K., & Kar, S. K. (2023). Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promotion Perspectives, 13(3), 183–191. https://doi.org/10.34172/hpp.2023.22

Shieh, A., Tran, B., He, G., Kumar, M., Freed, J. A., & Majety, P. (2024). Assessing ChatGPT 4.0’s test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports. Scientific Reports, 14(1), 9330. https://doi.org/10.1038/s41598-024-58760-x