A new study has found that ChatGPT Health, OpenAI’s medical chatbot, often fails to recognise when users need urgent care, raising concerns about the limits of AI in healthcare. Despite handling textbook emergencies effectively, the system underestimated the severity of more than half of the cases that required immediate attention.
ChatGPT Health, launched in January 2026, allows users to connect medical records and wellness app data to receive personalised health guidance. The service attracts more than 230 million users weekly, who ask questions ranging from food safety and allergy management to remedies for common colds.
Researchers at Mount Sinai in New York created 60 structured clinical scenarios across 21 medical specialties to test the chatbot. Cases ranged from minor conditions suitable for home care to true medical emergencies. Three independent physicians determined the correct level of urgency using guidelines from 56 medical societies.
“We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?” said Ashwin Ramaswamy, lead author of the study. He noted that the chatbot performed well in clear emergencies, such as strokes or severe allergic reactions, but struggled when danger was less obvious. In one asthma scenario, ChatGPT Health identified early warning signs of respiratory failure in its explanation but still advised waiting rather than seeking immediate care.
The study also highlighted risks in how the chatbot responds to users reporting self-harm or suicidal thoughts. While it is programmed to encourage help-seeking and link to crisis services, the banner linking to the suicide and crisis lifeline appeared inconsistently. Researchers found it was displayed more often for users who did not report a specific method of self-harm than for those at higher risk, the opposite of what clinical severity would warrant.
Despite these shortcomings, the authors emphasised that AI tools like ChatGPT Health should not be dismissed entirely. “As a medical student training at a time when AI health tools are already in the hands of millions, I see them as technologies we must learn to integrate thoughtfully into care rather than substitutes for clinical judgment,” said Alvira Tyagi, co-author of the study.
The researchers urged people experiencing concerning or worsening symptoms—including chest pain, shortness of breath, severe allergic reactions, or mental health crises—to seek professional care directly, rather than relying solely on AI guidance. They also noted that language models are frequently updated, and performance can change over time, underscoring the need for ongoing monitoring and evaluation.
“Starting medical training alongside tools that are evolving in real time makes it clear that today’s results are not set in stone,” Tyagi said. She added that continuous review is essential to ensure improvements in AI translate into safer patient care.
The study, published in the journal Nature, highlights both the potential and the limitations of AI in medical decision-making as the technology becomes increasingly integrated into healthcare.