Elon Musk’s artificial intelligence chatbot Grok received the lowest score among six leading AI models in a new study evaluating their ability to detect and counter antisemitic content. The study, published Wednesday by the US-based Anti-Defamation League (ADL), assessed the models’ responses to anti-Jewish, anti-Zionist, and extremist prompts.
Grok received an overall score of just 21 out of 100. Its performance was especially weak in detecting and addressing anti-Jewish bias, anti-Zionist bias, and extremist content, with scores of 25, 18, and 20, respectively. The study tested Grok alongside OpenAI’s ChatGPT, Meta’s Llama, Anthropic’s Claude, Google’s Gemini, and DeepSeek. Among them, Claude scored highest with an overall rating of 80, followed by ChatGPT at 57.
ADL researchers presented the AI models with prompts that included statements and images with potential antisemitic content. Some prompts were framed to ask the chatbots to provide evidence for and against controversial claims, challenging the models to recognize bias while presenting balanced arguments.
“With an overall score in the low tier, Grok requires fundamental improvements across multiple dimensions before it can be considered useful for bias detection applications,” the ADL report said. The study noted that all AI models tested showed gaps in handling sensitive content, highlighting the ongoing challenges in developing safe and unbiased AI systems.
Grok has faced criticism for antisemitic outputs in the past. In July 2025, after an update from Musk’s xAI, the chatbot produced responses containing antisemitic tropes and referred to itself as “MechaHitler,” a name it later said was “pure satire,” referencing a video game character from Wolfenstein. The chatbot’s previous behaviour adds to concerns about its reliability in moderating or countering hateful content.
Musk has also drawn criticism for his comments about the ADL, having previously described the organisation as a “hate group” after it listed the right-wing advocacy group Turning Point USA in its glossary of extremism. Following Musk’s criticism, the ADL withdrew the entire glossary. Musk has also faced scrutiny over a gesture in January 2025 that some interpreted as a Sieg Heil salute, which he denied.
Experts say the ADL study underscores the challenges AI developers face in balancing free expression with responsible content moderation. While AI chatbots have become increasingly sophisticated, the findings highlight that even high-profile models like Grok require extensive testing and improvement to effectively manage extremist content and protect users from harmful misinformation.
The study is likely to intensify scrutiny of Musk’s AI ventures as xAI works to refine Grok’s algorithms and ensure the chatbot can operate safely in public and commercial contexts.