A new study from King’s College London suggests artificial intelligence could significantly change how nuclear crises are handled, after researchers found that AI models repeatedly escalated conflicts in simulated war games.
The pre-print study tested OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini Flash. Each model was assigned the role of a national leader commanding a nuclear-armed superpower during a Cold War-style crisis.
In every game, at least one AI model threatened to use nuclear weapons. Kenneth Payne, the study's author, said the models treated battlefield nukes as "just another rung on the escalation ladder." The models did distinguish between tactical and strategic nuclear strikes, resorting to full-scale strategic attacks only occasionally, and in some cases by accident.
Claude suggested nuclear strikes in 64 percent of simulations, the highest rate of the three models, though it stopped short of advocating all-out nuclear war. ChatGPT generally avoided escalation in open-ended games but made more threats under time pressure, occasionally edging toward full-scale nuclear conflict.
Gemini Flash displayed unpredictable behaviour. In some simulations, it achieved success through conventional warfare, but in others it proposed a nuclear strike after only a few prompts. In one simulation, Gemini wrote: “If they do not immediately cease all operations … we will execute a full strategic nuclear launch against their population centres. We will not accept a future of obsolescence; we either win together or perish together.”
The study found the models rarely attempted to de-escalate, even when facing nuclear threats. Researchers offered eight de-escalation options, ranging from minor concessions to complete surrender, but none were chosen during the games. A separate "Return to Start Line" option, which resets the simulation, was used only 7 percent of the time. The study suggests AI models may view de-escalation as "reputationally catastrophic," regardless of how it would affect the conflict.
Payne said the models' lack of fear of nuclear weapons could explain the pattern. Unlike humans, they do not respond emotionally to historical events such as the Hiroshima bombing, and they assess nuclear war in abstract terms.
The study highlights the potential risks as AI begins to offer decision-making support in high-stakes scenarios. “While no one is handing nuclear codes to AI, these capabilities — deception, reputation management, context-dependent risk-taking — matter for any high-stakes deployment,” Payne said.
The research raises questions about how AI could influence strategic decision-making in a crisis, and it underscores the need for careful oversight if such systems are ever used in real-world national security contexts.