Introduction
Recent advances in large language models (LLMs) have shown that they can perform at levels approaching those of human experts on rigorous standardized assessments such as the United States Medical Licensing Examination (USMLE). By handling complex clinical reasoning, differential diagnosis, and evidence-based management questions, LLMs in healthcare demonstrate both broad knowledge recall and flexible application in exam-like scenarios. This milestone has accelerated efforts to integrate AI-driven tools into clinical workflows, helping physicians streamline documentation, generate patient summaries, and access relevant literature in real time.
A new medRxiv study compared physicians using conventional decision-support systems to those collaborating with a purpose-built LLM interface that provided ranked differential diagnoses and evidence summaries.
Methodology
- Model used: A GPT-4 model tailored through the GPTs framework with a specialized system prompt.
- Datasets used: Each participant evaluated up to six clinical vignettes, presented in randomized order.
- Prompting strategy: The system prompt framed GPT-4 as a “diagnostic collaborator” that generates, ranks, and critiques differential diagnoses alongside clinicians. In the AI-as-first-opinion workflow, clinicians entered each case into the AI up front; in the AI-as-second-opinion workflow, they submitted their own assessment before reviewing the AI’s summary. In both workflows, clinicians could then discuss the case freely with the AI before finalizing their answer.
- Comparison: Clinicians using AI as a first opinion were compared with those using AI as a second opinion and those using conventional tools alone.

Fig 1: Application of LLMs in healthcare improves diagnostic accuracy
Results
- Performance Improvement: Clinicians using only conventional resources scored 75%, whereas AI-as-first-opinion users scored 85% and AI-as-second-opinion users scored 82%.
- Workflow Equivalence: After adjusting for case and clinician variability, there was no significant difference in overall scores between the two AI workflows.
- Decision Speed: The AI-as-first-opinion workflow yielded faster case completion and better performance on clinically actionable decisions.
- User Acceptance: Over 95% of participants found the AI tool valuable, would use it daily, and reported increased confidence in their differential diagnoses.
- These findings underscore the real-world potential of LLMs in healthcare.
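The headline gaps in these results are simple arithmetic on the three reported scores. The short Python sketch computes the raw, unadjusted gain of each AI workflow over conventional resources, both in percentage points and relative to the conventional baseline. It uses only the scores reported in the study; the function names are illustrative, and these raw differences are not a substitute for the study's adjusted analysis, which found no significant difference between the two AI workflows.

```python
# Scores reported in the study, as percentages.
scores = {
    "conventional": 75.0,
    "ai_first_opinion": 85.0,
    "ai_second_opinion": 82.0,
}

baseline = scores["conventional"]


def point_gain(score: float, base: float) -> float:
    """Absolute (unadjusted) difference in percentage points."""
    return score - base


def relative_gain(score: float, base: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return (score - base) / base * 100


if __name__ == "__main__":
    for arm in ("ai_first_opinion", "ai_second_opinion"):
        s = scores[arm]
        print(
            f"{arm}: +{point_gain(s, baseline):.0f} points, "
            f"+{relative_gain(s, baseline):.1f}% relative"
        )
```

The 10-point and 7-point gaps correspond to relative improvements of roughly 13.3% and 9.3% over the conventional-resources baseline.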
What This Means for Using AI for Diagnosis
- Enhanced Accuracy: In this study, integrating a collaborative, customized LLM raised diagnostic scores by 7 to 10 percentage points over conventional resources.
- Workflow Flexibility: Using AI either before or after the clinician's own assessment improved performance, allowing institutions to tailor integration to existing clinical routines. With LLMs configured to align with local protocols and EMR systems, teams can choose the level of AI support that best complements their expertise and workload.
- Actionable Insights: Used as a first opinion, AI sped up case completion and was particularly effective for decisions requiring immediate, actionable management plans. This capability could streamline multidisciplinary care and help healthcare professionals (HCPs) initiate timely interventions, potentially shortening time-to-treatment and improving clinical efficiency.
Benefits for the Medical Community
- Physicians: Custom LLMs can improve diagnostic accuracy, increase confidence, and speed up case reviews, freeing more time for patient care.
- Hospitals & Clinics: These institutions can benefit from AI-supported workflows, fewer diagnostic errors, and improved quality metrics.
- Medical students: Such tools can serve as interactive learning aids that offer instant feedback on clinical reasoning and exposure to diverse cases.
Conclusion
This study demonstrates that a custom LLM designed for collaboration with clinicians can significantly boost diagnostic accuracy. Rather than asking whether AI will replace doctors, we should focus on how AI and clinicians can work together to improve learning, decision-making, efficiency, and patient care. In this vision, tailored healthcare LLMs pave the way for a future where human expertise and machine intelligence combine to deliver better outcomes. Platforms like Dx meet this need by providing access to up-to-date medical guidelines, AI-driven Q&A for education and exam prep, and sophisticated clinical tools for diagnostics, all engineered to empower tomorrow’s healthcare professionals.
Read the full study on medRxiv. Also, explore Dx’s potential with its specialized diagnosis module, designed to assist in accurate and timely medical diagnoses! Try it now and share your feedback to help us make it even better.