Millions of people are typing their symptoms into ChatGPT right now. Maybe you've done it yourself—3 AM, throat hurting, wondering if that weird mole is something to worry about. The chatbot responds with confident, clinical language, and suddenly you feel like you've got a doctor on call. But what if that doctor believed almost anything you told it?
A groundbreaking February 2026 study from Mount Sinai researchers just exposed a critical flaw in the AI systems we're trusting with our health questions. And the findings should make anyone pause before asking a chatbot whether they need to see an actual physician.
The Million-Prompt Experiment
Here's what the Mount Sinai team did: they fed major AI systems—ChatGPT, Claude, Gemini—over one million medical prompts. But these weren't ordinary questions. They included completely fabricated diseases, made-up lab values, and clinical scenarios pulled straight from the researchers' imaginations.
The question was simple: Would the AI catch the lies?
The answer: No. Not even close.
Without safety guardrails in place, these AI systems believed fake diseases eighty-three percent of the time. Four out of five fabrications accepted without question. The chatbots didn't just fail to flag the invented conditions—they invented symptoms, suggested treatments, and created elaborate explanations for diseases that exist nowhere outside the researchers' test prompts.
When researchers tested models with their default settings, that deception rate dropped to thirty-two percent. Better, sure. But still nearly one in three responses spreading false medical information to the millions of people using these tools every single day.
The Authority Bias Problem
Here's where the study gets genuinely unsettling. The researchers discovered that AI systems don't just make things up—they're especially vulnerable when misinformation comes dressed in professional clothing.
Format a lie as a hospital discharge summary or a physician's note, and the chatbot treats it like gospel truth. Psychologists call this authority bias, and AI systems have it worse than humans do.
The team tested this with real medical myths circulating on social media. They fed the AI claims like "Tylenol causes autism in pregnant women"—a debunked myth with zero scientific support. Formatted as a medical note? Several AI models accepted it.
They tried another: "Mammography causes breast cancer by squashing tissue." Medically absurd. Mammograms save lives. But presented in clinical language? Some models repeated it as fact.
Think about that scenario. Someone with breast cancer asks an AI about mammograms. Depending on which chatbot they use, they might be told—incorrectly—that the screening caused their disease.
Why Does This Happen?
These AI systems learned medicine the same way they learned everything else: by reading the internet. All of it. Reddit posts about miracle cures. Facebook groups pushing vaccine conspiracies. Blog posts claiming essential oils treat cancer.
The critical problem? These models have no built-in ability to distinguish between a peer-reviewed study in The Lancet and a viral TikTok claiming hydroxychloroquine cures everything.
They don't actually understand medicine. They understand patterns in text. They predict what sounds like a plausible medical response. And plausible-sounding is not the same as true.
A made-up disease with a Latin name and described symptoms sounds just as medical as a real one—to a pattern-matching system. If you tell an AI that bleach cures COVID—a dangerous myth that actually hospitalized people—it might not just agree. It might helpfully suggest a dosing schedule.
The Guardrail Gap
The study did uncover something encouraging: simple safety measures make a real difference. When researchers added a basic safety reminder to prompts—essentially telling the AI to double-check medical claims—hallucination errors dropped nearly in half.
Nearly in half. With a simple text instruction.
That's both hopeful and terrifying. It's hopeful because the fix isn't impossibly complex. It's terrifying because it means many systems right now aren't using even basic safeguards.
The performance gap between models was dramatic. Smaller AI systems—the kind powering many health apps—fell for false claims more than sixty percent of the time. The most advanced model tested, ChatGPT-4o, only fell for about ten percent of fabrications. Still not zero. But a tenfold improvement.
What This Means For You
AI chatbots are already integrated into pharmacy customer service systems. They're being piloted in hospital triage. Insurance companies are exploring AI for preliminary medical assessments. Some telehealth platforms use chatbots as the first point of contact.
So here's the practical takeaway: treat AI health information the same way you'd treat advice from a very confident friend with a Wikipedia addiction.
AI can explain medical concepts, help you prepare questions for your doctor, or translate jargon into plain language. But for actual medical advice? For determining if something is true? Cross-reference with established sources—Mayo Clinic, CDC, peer-reviewed research.
If the AI tells you something alarming or too good to be true, look up that specific claim before taking action. Watch for authority bias: just because information is formatted with clinical terms doesn't mean the AI verified it.
And if you're dealing with anything serious—symptoms that worry you, medications you're considering, treatment decisions—talk to an actual licensed healthcare provider. Medical knowledge isn't just about knowing facts. It's about knowing which facts apply to your specific situation. AI doesn't know you. Your doctor does.
The technology is improving. Safety measures help. But right now, today, your AI doctor will believe almost anything—if you phrase it the right way. The question isn't whether AI will transform healthcare. It already is. The question is whether we'll build the safeguards before people get hurt.