Mastering Board Games by External and Internal Planning with Language Models


Presented at NeurIPS 

Paper Review by Jory Schossau — January 15, 2025


Researchers create a new hybrid large language model that combines intuitive single-move prediction with explicit planning, a powerful combination.

RPS
Kids playing the classic competition game of Rock, Paper, Scissors.

Remember playing Rock, Paper, Scissors? That momentary decision, trying to outthink your opponent, wondering if they're going to stick with their favorite move or try to surprise you. It's a fairly simple game, but it captures something fascinating about how we make decisions and plan ahead. Whether you're a casual player or an R-P-S magician, you're engaging in a form of planning — considering possibilities, predicting outcomes, and making choices based on what you think might happen.

From those playground games to the complex world of chess, computers have become increasingly sophisticated at making game-related decisions. But here's something that might surprise you: while artificial intelligence can now beat world champions at chess, the way it thinks about moves is fundamentally different from how current AI language models — the ones that can write essays and answer questions — approach decision-making. This gap between game-playing AI and language AI is what makes recent breakthroughs in combining these technologies so exciting.

LLM Chess Enthusiasm
Enthusiasm for LLMs playing chess around the time this paper was published.

Today's language models are fairly skilled at tasks like writing, answering questions, and programming computers. They can engage in witty conversation and explain complicated topics. However, when it comes to step-by-step planning and logical reasoning — like the kind needed to play a good game of chess — they often fall short. "Hallucinations in LLMs are often the result of the model’s attempt to fill in gaps in knowledge or context, with assumptions that are based on the patterns it has learned during training. This can lead to incorrect or misleading outputs, which can be particularly problematic in sensitive applications." 1 It's a bit like having a brilliant conversationalist who struggles to plan their next week's schedule. This limitation has been a significant challenge in AI development, particularly when it comes to games and strategic decision-making.

Historically, automating chess-playing was very difficult. Some people even won fame and fortune by showcasing a chess-playing mechanical intelligence called the Mechanical Turk — although it was really just a hidden human.2 Fast forward to today, and chess engines are both completely automated and virtually unbeatable. What's fascinating is that these specialized chess programs are completely different from the AI language models we interact with daily. It's like having two different types of intelligence: one that can discuss chess theory eloquently but can't play well, and another that can play brilliantly but can't explain its moves in human terms.

Mechanical Turk
An imagined cross-section of the Turk from Racknitz. By Joseph Racknitz - Humboldt University Library, Public Domain, Wikimedia

The challenge researchers face seems simple at the outset: how can we teach a language model to think ahead? While these models can process vast quantities of text and generate very human-like responses, they struggle with the kind of methodical, multi-step planning that games require. It's like the difference between describing how to ride a bike and actually balancing on two wheels — knowing and doing are two different things.

When it comes to making a language model plan ahead, researchers have developed two distinct approaches: external search and internal search. External search is similar to having a chess coach who helps you analyze different moves one by one, while internal search is more like teaching someone to think through multiple moves in their head. In this paper, Schultz and colleagues combined both approaches to create a more powerful system. External search involves the model evaluating different game states systematically, while internal search teaches the model to simulate this evaluation process within its own 'mind.'

External search is similar to planning a road trip using a map and GPS. Just as you might explore different routes, checking traffic conditions and distances at each intersection, the AI model explores different possible moves by evaluating each game state. It's a methodical process where the model acts as both the map (providing possible moves) and the navigator (evaluating which moves are promising). This uses a technique called Monte Carlo Tree Search (MCTS), and it allows the model to systematically explore different possibilities before we ask it to make a decision.

MCTS builds a tree of likely future game states, with each branch representing a different possible move. The model evaluates these states and assigns them values based on how likely they are to lead to a win condition. What makes this implementation special is that it doesn't require a traditional chess engine to determine legal moves or evaluate positions — the language model handles all of it internally because the authors trained it to do so.

Monte Carlo Tree Search
Monte Carlo Tree Search. Figure 3 from the paper.
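To make the external search loop concrete, here is a minimal Python sketch under stated assumptions: `mav_query`, `legal_moves`, and `apply_move` are hypothetical stand-ins for the trained model's single-call responses (stubbed with dummy values so the sketch runs end to end), not the paper's actual interface, and the search ignores details like player-to-move sign flips.

```python
import math
import random

def legal_moves(state):
    """Hypothetical move generator; the real model produces these itself."""
    return ["a", "b", "c"]

def mav_query(state):
    """Hypothetical single LLM call returning {move: estimated win probability}.
    Stubbed with random values here so the sketch runs end to end."""
    return {m: random.random() for m in legal_moves(state)}

def apply_move(state, move):
    """Hypothetical transition; the model predicts successor states as text."""
    return state + move

def external_search(root, simulations=100, c=1.4, horizon=3):
    """Toy MCTS-style search using the model as policy, value, and world model."""
    N, Q = {}, {}  # visit counts and running mean values per (state, move) edge
    for _ in range(simulations):
        state, path = root, []
        for _ in range(horizon):
            priors = mav_query(state)
            total = sum(N.get((state, m), 0) for m in priors) + 1
            # UCB-style selection, seeded with the model's own value estimates
            move = max(priors, key=lambda m: Q.get((state, m), priors[m])
                       + c * math.sqrt(math.log(total + 1) / (N.get((state, m), 0) + 1)))
            path.append((state, move))
            state = apply_move(state, move)
        leaf_value = max(mav_query(state).values())  # model as value function
        for s, m in path:  # backpropagate the leaf evaluation up the path
            N[(s, m)] = N.get((s, m), 0) + 1
            Q[(s, m)] = Q.get((s, m), 0.0) + (leaf_value - Q.get((s, m), 0.0)) / N[(s, m)]
    return max(mav_query(root), key=lambda m: N.get((root, m), 0))

print(external_search("", simulations=50))  # most-visited move at the root
```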

This model is what the authors call a Multi-Action-Value (MAV) model, which serves three crucial functions simultaneously: it understands the rules and current state of the game (world model), evaluates how good different moves are (value function), and suggests which moves to consider (policy function). Think of it as a single expert who can track the game, judge moves, and make suggestions all at once, rather than needing separate specialists for each task. This is a significant improvement over previous approaches that required multiple systems working together.
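As a rough illustration of those three roles bundled into one response, consider the following sketch. The field names and structure are invented for clarity; the actual model emits this information as structured text.

```python
from dataclasses import dataclass

@dataclass
class MAVResponse:
    """Illustrative container for one MAV query's output (field names invented)."""
    legal: set             # world model: which moves are valid in this position
    win_probability: dict  # value function: estimated chance of winning per move
    suggestions: list      # policy: moves the model proposes exploring first

def pick_move(response: MAVResponse) -> str:
    # All three roles come from a single model call, so no external engine
    # is needed: filter suggestions to legal moves, then take the best value.
    candidates = [m for m in response.suggestions if m in response.legal]
    return max(candidates, key=lambda m: response.win_probability[m])

resp = MAVResponse(legal={"e4", "d4", "Nf3"},
                   win_probability={"e4": 0.54, "d4": 0.53, "Nf3": 0.52},
                   suggestions=["e4", "d4"])
print(pick_move(resp))  # e4
```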

One of the most useful aspects of the MAV model is how it keeps track of the game state. The model represents the board as text in a way that preserves the spatial relationships between positions on the board, making it easier for the language model to understand the game's layout. When a move is made, the model can accurately predict the resulting position without needing an external chess engine. This is like having a player who can perfectly visualize the board and mentally keep track of the pieces and their legal moves — a skill that even many human players find difficult.
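For intuition, here is one plausible way such a text encoding could preserve spatial structure: expanding a compact FEN string into an 8x8 character grid so that squares adjacent on the board stay adjacent in the text. This is a sketch of the general idea, not necessarily the paper's exact format.

```python
def fen_board_to_grid(fen: str) -> str:
    """Expand the board field of a FEN string into a spatial text grid."""
    rows = fen.split()[0].split("/")  # first FEN field holds the board
    grid = []
    for row in rows:
        expanded = ""
        for ch in row:
            # digits encode runs of empty squares; letters are pieces
            expanded += "." * int(ch) if ch.isdigit() else ch
        grid.append(" ".join(expanded))
    return "\n".join(grid)

print(fen_board_to_grid(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
# r n b q k b n r
# p p p p p p p p
# . . . . . . . .
# ... (and so on down to White's back rank)
```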

As indicated earlier, the MAV model doesn't just identify possible moves — it assigns each one a probability of winning. Think of it like a weather forecast for chess moves, but instead of the probability of rain, it's predicting the chance of a win. The model discretizes the range from 0% to 100% into 64 'buckets' to classify these probabilities, making it easier to distinguish between slightly better and significantly better moves. As an aside, there are 64 bins ($2^6$) because powers of 2 work well with current computer hardware when trying to eke out speed. This is similar to how human players might categorize moves as 'terrible,' 'okay,' 'good,' or 'brilliant,' but with more granularity.

Chess board with 2 possible moves
Paper fig. 2a. Two possible moves being analyzed: blue Bh7, red Re1.
Histogram of win probabilities of the 2 moves
Paper fig. 2b. Histogram of win probabilities: blue Bh7, red Re1.
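The bucketing itself is simple to sketch. Assuming 64 uniform buckets over the interval from 0 to 1 (the paper's exact bucket boundaries may differ):

```python
def to_bucket(p: float, n_buckets: int = 64) -> int:
    """Map a win probability in [0, 1] to an integer bucket in [0, 63]."""
    return min(int(p * n_buckets), n_buckets - 1)

def bucket_to_prob(b: int, n_buckets: int = 64) -> float:
    """Recover a representative probability: the bucket's midpoint."""
    return (b + 0.5) / n_buckets

assert to_bucket(0.0) == 0 and to_bucket(1.0) == 63
print(bucket_to_prob(to_bucket(0.51)))  # 0.5078125, the midpoint of bucket 32
```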

Speaking of speed, one of the most important technical contributions of this research is how efficiently the model works. Unlike previous approaches that needed to evaluate each possible move separately — imagine having to ask about each move one at a time — this model can evaluate all possible moves from a single query. It's like the difference between asking a chess master about moves one by one versus having them look at the board once and tell you all the promising options in a ranked list. This makes the approach faster and thus more practical to use.
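The saving is easy to see in a sketch; `llm_call` below is a hypothetical stand-in where each invocation represents one round trip to the model.

```python
calls = {"count": 0}

def llm_call(prompt: str) -> str:
    calls["count"] += 1                      # tally model round trips
    return f"response to: {prompt[:40]}..."  # dummy response for the sketch

def evaluate_one_by_one(state, moves):
    # Earlier approaches: one model call per candidate move, so the cost
    # grows linearly with the number of legal moves.
    return {m: llm_call(f"Evaluate move {m} in position {state}") for m in moves}

def evaluate_all_at_once(state):
    # MAV-style: a single call returns values for every legal move.
    return llm_call(f"List and evaluate every legal move in position {state}")

evaluate_one_by_one("pos", ["e4", "d4", "Nf3"])
evaluate_all_at_once("pos")
print(calls["count"])  # 4 round trips total: 3 one-by-one versus 1 all-at-once
```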

The way this model works parallels how humans often make complex decisions. When we're faced with a difficult choice, we might first get a quick intuitive sense of our options (the model's initial evaluation), then carefully think through the consequences of each choice (external search), while also drawing on our past experience with similar situations and where those choices may lead (internal search). It's fascinating how these artificial systems are beginning to mirror human cognitive processes, even if they arrive at their decisions quite differently.

Internal search is where things get really interesting. Instead of relying on an external system to explore different possibilities, the researchers taught the model to simulate this exploration process within itself. It's like teaching someone not just the rules of chess, but also how to think several moves ahead systematically. The model learns to generate and evaluate possible future positions entirely within its own 'mind,' similar to how a human player might visualize potential sequences of moves.

Think about how you might teach a beginner chess player to think ahead. You'd start by showing them how to consider one move at a time, then gradually build up to thinking about sequences of moves and their consequences. That's essentially what the researchers did with this AI model. They trained it by showing it examples of good decision-making processes, including how to evaluate positions, consider multiple possibilities, and choose the best path forward.

The internal search capability is implemented through a clever training approach. The researchers taught the model by showing it examples of search trees — essentially maps of possible moves and their outcomes — represented as text. Think of it like teaching someone chess strategy by showing them annotated game trees, but in this case, the model learns to generate these trees itself. This allows it to perform a kind of 'mental simulation' of different move sequences and their potential outcomes.

Internal Search Diagram
Paper fig 6b. Diagram of Internal Search.
Internal Search Prompt Response
Paper fig 6a. Prompt response. External Search is mostly the black part near the top. Internal Search is the red, green, and blue colored text.
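A toy version of that linearization might look like the following: a nested tree of states, values, and moves flattened into indented text a model can be trained to reproduce. The paper's actual serialization differs in its details.

```python
def tree_to_text(node, depth=0):
    """node = (state, value, {move: child_node}); returns indented text."""
    state, value, children = node
    lines = [f"{'  ' * depth}state: {state} value: {value:.2f}"]
    for move, child in children.items():
        lines.append(f"{'  ' * (depth + 1)}move: {move}")
        lines.append(tree_to_text(child, depth + 2))
    return "\n".join(lines)

tree = ("start", 0.55, {"e4": ("after-e4", 0.58, {}),
                        "d4": ("after-d4", 0.54, {})})
print(tree_to_text(tree))
# state: start value: 0.55
#   move: e4
#     state: after-e4 value: 0.58
#   move: d4
#     state: after-d4 value: 0.54
```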

While the researchers effectively combined two promising prior ideas from the literature, what makes this research especially interesting is how the two search methods complement each other. External search provides systematic exploration of possibilities, while internal search allows for quick, intuitive evaluation of positions into the future. It's similar to how a human chess master might combine careful analysis of specific variations with intuitive pattern recognition. The combination proved more powerful than either method alone, demonstrating how different approaches to planning can work synergistically.

The results are quite good — Grandmaster-level performance in chess. Existing chess engines, like the open-source Stockfish,3 already play at this level, but the difference is in how the model got that good. Unlike previous chess engines that rely on brute-force calculation of millions of positions, this language model system achieved its inference-time performance while analyzing a similar number of positions as human grandmasters typically consider — around a hundred to a thousand positions per move. This efficiency suggests that the model has developed a more human-like ability to focus on the most relevant possibilities rather than exhaustively analyzing every option.

The researchers didn't stop at chess — they also tested their system on other games like Connect Four and Hex. This broader testing is important because it showed that the approach wasn't just a specialized chess-solver, but a general method for teaching language models how to plan and make decisions. Each game presented different challenges and strategic considerations, yet their model adapted well to each of them.


While this research focused on board games, there are broader implications. The combination of external and internal search methods could be valuable in any domain requiring complex decision-making and planning. For instance, consider business strategy, wherein multiple possible futures and their consequences must be considered, or urban planning, wherein different decisions can have long-lasting impacts. The ability to efficiently evaluate multiple possibilities while maintaining a broader strategic view, and training only from examples, could be revolutionary.

One of the most practical innovations in this research is the ability to control how much computation the model performs through parameters like the 'breadth' and 'depth' of search. Notably, these parameters are simply text in the initial query, but the model has been trained to respond to their values. It's similar to adjusting the difficulty level in a game — you can dial up the thoroughness of analysis when you need more careful consideration, or dial it down when you need quick decisions. This flexibility makes the system much more practical for real-world applications where different situations might require different levels of analysis. Crucially, from a scientific standpoint, this parameterization allowed the authors to empirically investigate the effects of planning.
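Because those knobs are just tokens in the query, controlling them amounts to templating the prompt. The literal wording below is invented; only the idea that breadth and depth appear as plain text the model was trained to respect matches the paper.

```python
def build_search_prompt(board_text: str, breadth: int, depth: int) -> str:
    """Hypothetical prompt template exposing search effort as plain text."""
    return (
        f"Board:\n{board_text}\n"
        f"Search with breadth {breadth} and depth {depth}, "
        f"then report the best move."
    )

quick = build_search_prompt("<board>", breadth=2, depth=1)    # fast, shallow look
careful = build_search_prompt("<board>", breadth=5, depth=4)  # slower, thorough
```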

As highlighted in the quick review, one limitation of this research is its focus on just one particular approach to improving planning in language models. While the combination of external and internal search is powerful, there are many other potential approaches to enhance planning capabilities. For instance, symbolic algorithms or hybrid approaches combining language models with other technologies could offer different advantages.4

Looking ahead, there's exciting potential in combining this approach with other technologies. Rather than relying solely on language models, future systems might integrate symbolic reasoning, traditional planning algorithms, or specialized domain knowledge in novel ways. This could lead to more robust and versatile planning systems that can handle an even wider range of challenges while still relying only on training examples. For example, compiling the language model's world model into runnable code could make inference much faster.

While this research focused on perfect information games like chess, where all players can see the complete state of the game, real-world applications often involve incomplete information and uncertainty. Future research could explore how these planning approaches might adapt to situations where not all information is available, such as business strategy or diplomatic negotiations.

The research demonstrates that combining external and internal search methods creates a system that's more powerful than either approach alone. As we watch these developments unfold, it's fascinating to see how AI systems are beginning to mirror human cognitive processes more closely. Just as we combine careful analysis with intuitive understanding, these systems are being imbued with a balance of systematic exploration and learned patterns. This suggests we're moving closer to AI systems that can truly reason about complex situations in ways that complement human thinking.

Remember that game of Rock, Paper, Scissors we started with? While it might seem worlds apart from the sophisticated AI system described here, they share a fundamental challenge: how to make good decisions by thinking ahead. What's the next planning problem you think AI research should tackle?


References

1. Hadi, Muhammad Usman, et al. "A survey on large language models: Applications, challenges, limitations, and practical usage." Authorea Preprints (2023).

2. Stephens, Elizabeth. "The Mechanical Turk: A short history of ‘artificial artificial intelligence’." Cultural Studies 37.1 (2023): 65-87.

3. The Stockfish developers. Stockfish chess engine. https://github.com/official-stockfish/Stockfish

4. Kelly, Stephen, et al. "Discovering adaptable symbolic algorithms from scratch." 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023.