Monte Carlo Tree Search - Problem

Implement a Monte Carlo Tree Search (MCTS) algorithm for a simple Tic-Tac-Toe game. MCTS is a heuristic search algorithm that combines the precision of tree search with the generality of random sampling.

Your implementation should include all four phases:

  • Selection: Starting from the root, repeatedly pick the child with the highest UCB1 score until reaching a leaf
  • Expansion: Add one or more child nodes for untried moves to grow the tree
  • Simulation: Run a random playout from the new node to a terminal game result
  • Backpropagation: Propagate the result back along the path, updating each node's win/visit statistics
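
The UCB1 score used in Selection balances exploitation (the observed win rate) against exploration (an uncertainty bonus that shrinks as a child is visited more). A minimal sketch, assuming the usual form with exploration constant c = √2; the function name `ucb1` is an illustrative choice:

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 = win_rate + c * sqrt(ln(parent_visits) / visits).

    Unvisited children score +inf so every move is tried at least once.
    """
    if visits == 0:
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

During Selection the parent evaluates this score for each child and descends into the maximizer; a larger parent visit count raises the bonus for rarely visited children, pulling the search back toward them.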

Given a board state and the number of simulations to run, return the best move coordinates [row, col] for the current player.

Board representation: 0 = empty, 1 = player 1 (X), 2 = player 2 (O)
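
Putting the four phases together, here is a minimal, self-contained sketch for the 3×3 board. The class and function names (`Node`, `mcts_best_move`), the exploration constant √2, and scoring draws as 0.5 are illustrative choices, not requirements of the problem:

```python
import math
import random

# All eight winning lines: rows, columns, two diagonals.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
         [[(r, c) for r in range(3)] for c in range(3)] +
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def winner(board):
    for line in LINES:
        v = board[line[0][0]][line[0][1]]
        if v and all(board[r][c] == v for r, c in line):
            return v
    return 0  # no winner (yet)

def moves(board):
    return [(r, c) for r in range(3) for c in range(3) if board[r][c] == 0]

class Node:
    def __init__(self, board, player, move=None, parent=None):
        self.board, self.player = board, player  # player: who moves next here
        self.move, self.parent = move, parent    # move: how we got here
        self.children, self.untried = [], moves(board)
        self.wins, self.visits = 0.0, 0

def mcts_best_move(board, simulations, current_player):
    root = Node([row[:] for row in board], current_player)
    for _ in range(simulations):
        node = root
        # Selection: descend via UCB1 while fully expanded and non-terminal.
        while not node.untried and node.children and winner(node.board) == 0:
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits +
                       math.sqrt(2 * math.log(node.visits) / ch.visits))
        # Expansion: add one child for a random untried move.
        if node.untried and winner(node.board) == 0:
            r, c = node.untried.pop(random.randrange(len(node.untried)))
            child_board = [row[:] for row in node.board]
            child_board[r][c] = node.player
            child = Node(child_board, 3 - node.player, (r, c), node)
            node.children.append(child)
            node = child
        # Simulation: random playout to a terminal state.
        sim = [row[:] for row in node.board]
        player = node.player
        while winner(sim) == 0 and moves(sim):
            r, c = random.choice(moves(sim))
            sim[r][c] = player
            player = 3 - player
        result = winner(sim)  # 0 = draw
        # Backpropagation: credit each node from its mover's perspective.
        while node:
            node.visits += 1
            mover = 3 - node.player  # the player who made node.move
            if result == mover:
                node.wins += 1
            elif result == 0:
                node.wins += 0.5     # draws count half
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return list(best.move)
```

The final answer is the root's most-visited child rather than the one with the highest win rate, a common robustness choice: visit counts are statistically more stable than win rates estimated from few playouts.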

Input & Output

Example 1 — Early Game Position
$ Input: board = [[1,0,2],[0,0,0],[0,0,0]], simulations = 1000, current_player = 1
Output: [1,1]
💡 Note: Player X has corner, O took edge. MCTS simulates many random games and finds center position [1,1] wins most often, giving strong strategic control.
Example 2 — Blocking Move Required
$ Input: board = [[1,1,0],[2,0,0],[0,0,0]], simulations = 500, current_player = 2
Output: [0,2]
💡 Note: Player O must block X's winning threat. MCTS quickly identifies that [0,2] prevents immediate loss, as failing to block results in X winning on next move.
Example 3 — Winning Move Available
$ Input: board = [[1,2,1],[0,2,0],[0,0,0]], simulations = 100, current_player = 2
Output: [2,1]
💡 Note: O can win immediately by completing the middle column. Even with few simulations, MCTS finds [2,1] has 100% win rate in all random playouts.
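
The tactical claims in Examples 2 and 3 can be verified without any search: a one-ply scan over empty cells finds every move that completes a line immediately. The helper name `immediate_wins` and this standalone framing are illustrative, useful mainly for sanity-checking MCTS output on forced positions:

```python
# All eight winning lines of the 3x3 board.
WIN_LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
             [[(r, c) for r in range(3)] for c in range(3)] +
             [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def immediate_wins(board, player):
    """Return every [row, col] where `player` completes a line on the spot."""
    found = []
    for r in range(3):
        for c in range(3):
            if board[r][c] != 0:
                continue
            board[r][c] = player  # try the move in place
            if any(all(board[i][j] == player for i, j in line)
                   for line in WIN_LINES):
                found.append([r, c])
            board[r][c] = 0       # undo
    return found
```

On Example 3 this returns [[2, 1]] for player 2, matching the expected output; on Example 2 the same scan for player 1 returns [[0, 2]], the threat that O must block.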

Constraints

  • board is a 3×3 matrix with values 0 (empty), 1 (player 1), or 2 (player 2)
  • 1 ≤ simulations ≤ 10000
  • current_player is 1 or 2
  • Board contains at least one empty position
  • Game is not already finished

Visualization

[Figure: the four MCTS phases (Selection via UCB1, Expansion, Simulation, Backpropagation) applied to the Example 1 position. The center square [1,1] emerges as the most-visited node (347 simulations, 68% win rate): it controls the center, opens multiple win paths, and forces an opponent response.]

Key insight: MCTS adaptively focuses search effort on promising game paths through smart sampling, avoiding the exponential explosion of exhaustive minimax while maintaining strong play quality through statistical confidence.

Source: TutorialsPoint, "Monte Carlo Tree Search | Game Theory Algorithm"
Asked in: Google (15), DeepMind (25), Facebook (8), Microsoft (12)
Difficulty: Medium frequency · ~45 min average time