OpenAI has claimed victory over Elon Musk’s xAI in a high-profile tournament to determine which general-purpose language model plays the best chess. The competition, which concluded on Thursday, saw OpenAI’s “o3” model defeat xAI’s “Grok 4” in a decisive final, intensifying the ongoing rivalry between the two companies.
While chess has long been used to measure the progress of computing power—famously producing specialized engines that can beat even the best human players—this event took a different approach. Rather than dedicated chess software, the tournament pitted general-purpose language models, designed for everyday problem-solving, against each other in a battle of strategy and precision.
OpenAI’s o3 went through the competition undefeated, securing the title with a string of dominant performances in the final. In contrast, Grok 4, which had been the favourite throughout much of the event, faltered badly in its last games, losing its queen multiple times and making what experts described as “uncharacteristic blunders.”
Pedro Pinhata, a writer for Chess.com, noted, “Up until the semi-finals, it seemed like nothing would be able to stop Grok 4 on its way to winning the event. But the illusion fell through on the last day of the tournament.” Chess grandmaster Hikaru Nakamura, who provided live commentary, added, “Grok made so many mistakes in these games, but OpenAI did not.”
Google’s “Gemini” model claimed third place after defeating another OpenAI entrant in the consolation round. The competition also included systems from Anthropic, China’s DeepSeek, and Moonshot AI, showcasing the global scope of the contest.
Hosted on Google-owned Kaggle, the three-day tournament brought together eight large language models, each tested in head-to-head matches. Although Musk downplayed the result ahead of the final—saying Grok’s earlier success was a “side effect” and that xAI had “spent almost no effort on chess”—the event underscored the growing use of games as benchmarks for assessing advanced computational reasoning.
Historically, games like chess and Go have been used to test a program’s ability to learn and adapt to complex rule-based challenges. Google’s DeepMind made headlines in the late 2010s when its AlphaGo system defeated several world champions in Go, prompting South Korean legend Lee Se-dol to retire in 2019, saying, “There is an entity that cannot be defeated.”
The latest tournament marks a new chapter in that tradition, shifting the focus from human-versus-machine contests to AI-versus-AI battles. As these systems continue to evolve, competitions like this may offer both a measure of progress and a glimpse into the strategic capabilities of tomorrow’s most advanced digital minds.