Monte Carlo Tree Search - Problem

Implement a Monte Carlo Tree Search (MCTS) algorithm for a simple Tic-Tac-Toe game. MCTS is a heuristic search algorithm that combines the precision of tree search with the generality of random sampling.

Your implementation should include all four phases:

  • Selection: Starting from the root, repeatedly pick the child with the highest UCB1 score until reaching a leaf
  • Expansion: Add one or more child nodes for untried moves to grow the tree
  • Simulation: Run a random playout from the new node to a terminal game result
  • Backpropagation: Propagate the result back along the path, updating each node's win/visit statistics
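
The UCB1 score used in Selection balances exploitation (the observed win rate) against exploration (an uncertainty bonus that shrinks as a child is visited more). A minimal sketch, assuming the usual form with exploration constant c = √2; the function name `ucb1` is an illustrative choice:

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 = win_rate + c * sqrt(ln(parent_visits) / visits).

    Unvisited children score +inf so every move is tried at least once.
    """
    if visits == 0:
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

During Selection the parent evaluates this score for each child and descends into the maximizer; a larger parent visit count raises the bonus for rarely visited children, pulling the search back toward them.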

Given a board state and the number of simulations to run, return the best move coordinates [row, col] for the current player.

Board representation: 0 = empty, 1 = player 1 (X), 2 = player 2 (O)
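
Putting the four phases together, here is a minimal, self-contained sketch for the 3×3 board. The class and function names (`Node`, `mcts_best_move`), the exploration constant √2, and scoring draws as 0.5 are illustrative choices, not requirements of the problem:

```python
import math
import random

# All eight winning lines: rows, columns, two diagonals.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
         [[(r, c) for r in range(3)] for c in range(3)] +
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def winner(board):
    for line in LINES:
        v = board[line[0][0]][line[0][1]]
        if v and all(board[r][c] == v for r, c in line):
            return v
    return 0  # no winner (yet)

def moves(board):
    return [(r, c) for r in range(3) for c in range(3) if board[r][c] == 0]

class Node:
    def __init__(self, board, player, move=None, parent=None):
        self.board, self.player = board, player  # player: who moves next here
        self.move, self.parent = move, parent    # move: how we got here
        self.children, self.untried = [], moves(board)
        self.wins, self.visits = 0.0, 0

def mcts_best_move(board, simulations, current_player):
    root = Node([row[:] for row in board], current_player)
    for _ in range(simulations):
        node = root
        # Selection: descend via UCB1 while fully expanded and non-terminal.
        while not node.untried and node.children and winner(node.board) == 0:
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits +
                       math.sqrt(2 * math.log(node.visits) / ch.visits))
        # Expansion: add one child for a random untried move.
        if node.untried and winner(node.board) == 0:
            r, c = node.untried.pop(random.randrange(len(node.untried)))
            child_board = [row[:] for row in node.board]
            child_board[r][c] = node.player
            child = Node(child_board, 3 - node.player, (r, c), node)
            node.children.append(child)
            node = child
        # Simulation: random playout to a terminal state.
        sim = [row[:] for row in node.board]
        player = node.player
        while winner(sim) == 0 and moves(sim):
            r, c = random.choice(moves(sim))
            sim[r][c] = player
            player = 3 - player
        result = winner(sim)  # 0 = draw
        # Backpropagation: credit each node from its mover's perspective.
        while node:
            node.visits += 1
            mover = 3 - node.player  # the player who made node.move
            if result == mover:
                node.wins += 1
            elif result == 0:
                node.wins += 0.5     # draws count half
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return list(best.move)
```

The final answer is the root's most-visited child rather than the one with the highest win rate, a common robustness choice: visit counts are statistically more stable than win rates estimated from few playouts.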

Input & Output

Example 1 — Early Game Position
$ Input: board = [[1,0,2],[0,0,0],[0,0,0]], simulations = 1000, current_player = 1
Output: [1,1]
💡 Note: Player X has corner, O took edge. MCTS simulates many random games and finds center position [1,1] wins most often, giving strong strategic control.
Example 2 — Blocking Move Required
$ Input: board = [[1,1,0],[2,0,0],[0,0,0]], simulations = 500, current_player = 2
Output: [0,2]
💡 Note: Player O must block X's winning threat. MCTS quickly identifies that [0,2] prevents immediate loss, as failing to block results in X winning on next move.
Example 3 — Winning Move Available
$ Input: board = [[1,2,1],[0,2,0],[0,0,0]], simulations = 100, current_player = 2
Output: [2,1]
💡 Note: O can win immediately by completing the middle column. Even with few simulations, MCTS finds [2,1] has 100% win rate in all random playouts.
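
The tactical claims in Examples 2 and 3 can be verified without any search: a one-ply scan over empty cells finds every move that completes a line immediately. The helper name `immediate_wins` and this standalone framing are illustrative, useful mainly for sanity-checking MCTS output on forced positions:

```python
# All eight winning lines of the 3x3 board.
WIN_LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
             [[(r, c) for r in range(3)] for c in range(3)] +
             [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def immediate_wins(board, player):
    """Return every [row, col] where `player` completes a line on the spot."""
    found = []
    for r in range(3):
        for c in range(3):
            if board[r][c] != 0:
                continue
            board[r][c] = player  # try the move in place
            if any(all(board[i][j] == player for i, j in line)
                   for line in WIN_LINES):
                found.append([r, c])
            board[r][c] = 0       # undo
    return found
```

On Example 3 this returns [[2, 1]] for player 2, matching the expected output; on Example 2 the same scan for player 1 returns [[0, 2]], the threat that O must block.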

Constraints

  • board is a 3×3 matrix with values 0 (empty), 1 (player 1), or 2 (player 2)
  • 1 ≤ simulations ≤ 10000
  • current_player is 1 or 2
  • Board contains at least one empty position
  • Game is not already finished

Visualization

[Figure: the four MCTS phases (Selection via UCB1, Expansion, Simulation, Backpropagation) applied to the Example 1 position. The center square [1,1] emerges as the most-visited node (347 simulations, 68% win rate): it controls the center, opens multiple win paths, and forces an opponent response.]

Key insight: MCTS adaptively focuses search effort on promising game paths through smart sampling, avoiding the exponential explosion of exhaustive minimax while maintaining strong play quality through statistical confidence.

Source: TutorialsPoint, "Monte Carlo Tree Search | Game Theory Algorithm"
Asked in: Google (15), DeepMind (25), Facebook (8), Microsoft (12)
Difficulty: Medium frequency · ~45 min average time