MapReduce Framework for Distributed Computing

Certification: Advanced Level | Accuracy: 50% | Submissions: 2 | Points: 20

Implement a simplified version of the MapReduce framework with Mapper and Reducer classes for parallel processing and aggregation.

Example 1
  • Input:
    data = ["hello world", "big data"]
    mapper_function = lambda x: ...
    reducer_function = lambda x, y: ...
  • Output:
    [("hello", 1), ("world", 1), ("big", 1), ("data", 1)]
  • Explanation:
    • Map words into key-value (word, 1).
    • Reduce by summing word counts.
    • Return aggregated word frequencies.
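
The problem statement elides the two lambdas, so the following is only a minimal sequential sketch of how Example 1 could work. The names `word_mapper`, `sum_reducer`, and `word_count` are hypothetical and chosen for illustration; splitting on whitespace is an assumption about what counts as a word.

```python
from collections import defaultdict
from functools import reduce


def word_mapper(line):
    """Hypothetical mapper: emit a (word, 1) pair per whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def sum_reducer(x, y):
    """Hypothetical reducer: combine two partial counts into one."""
    return x + y


def word_count(data):
    """Map every line, group the values by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in data:
        for key, value in word_mapper(line):
            grouped[key].append(value)
    # Dicts keep insertion order, so keys come out in first-seen order.
    return [(key, reduce(sum_reducer, values)) for key, values in grouped.items()]


if __name__ == "__main__":
    print(word_count(["hello world", "big data"]))
```

Keys are returned in the order they are first seen, which matches the ordering of the expected output above.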
Example 2
  • Input:
    data = [10, 20, 30, 40, 50]
    mapper_function = lambda x: ...
    reducer_function = lambda x, y: ...
  • Output:
    [("even", 150)]
  • Explanation:
    • Tag numbers as even or odd.
    • Sum values for each tag.
    • Return totals for each category.
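
As a rough illustration of Example 2, the sketch below tags each number and folds values into a running total per key instead of storing every value first. The names `parity_mapper` and `reduce_streaming` are hypothetical, and the streaming fold is one possible design, not necessarily the intended one; it assumes the reducer is associative, as addition is.

```python
def parity_mapper(n):
    """Hypothetical mapper: tag a number as "even" or "odd"."""
    return [("even" if n % 2 == 0 else "odd", n)]


def reduce_streaming(data, mapper, reducer):
    """Apply the reducer as pairs arrive, keeping one accumulator per key."""
    totals = {}
    for item in data:
        for key, value in mapper(item):
            if key in totals:
                totals[key] = reducer(totals[key], value)
            else:
                totals[key] = value  # first value seen for this key
    return list(totals.items())


if __name__ == "__main__":
    print(reduce_streaming([10, 20, 30, 40, 50], parity_mapper, lambda x, y: x + y))
```

Because every input in Example 2 is even, no "odd" key is ever emitted, which is why the expected output contains a single pair.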
Constraints
  • 1 ≤ len(data) ≤ 10^6
  • Mapper returns list of key-value pairs
  • Reducer returns one pair per key
  • Time Complexity: O(n * m)
  • Space Complexity: O(n * k)
Tags: Map, Functions / Methods, Microsoft, Goldman Sachs

Solution Hints

  • Use Python's multiprocessing module to parallelize the mapping phase
  • Implement a partitioning mechanism to distribute intermediate key-value pairs
  • Group values by keys after the mapping phase
  • Use multiple processes for both mapping and reducing phases
  • Implement proper exception handling and resource management
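
The partitioning and grouping hints can be sketched as below. This is an assumption-laden illustration: `partition_for`, `partition_pairs`, and `group_by_key` are hypothetical helper names, and a CRC32 checksum of `repr(key)` stands in for a partition function because Python's built-in `hash()` for strings is randomized per interpreter process by default, which would make partition assignments differ between runs.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick a partition index that stays the same across processes and runs."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_pairs(pairs, num_partitions):
    """Distribute (key, value) pairs so that equal keys share one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def group_by_key(pairs):
    """Collect all values that belong to the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)
```

Since every pair with a given key lands in the same partition, each partition can be grouped and reduced independently by a separate worker.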

Steps to solve by this approach:

 Step 1: Initialize the MapReduce framework with a configurable number of worker processes
 Step 2: Implement the execute method that orchestrates the entire MapReduce workflow
 Step 3: Create the map phase function that applies the mapper function to each data chunk in parallel
 Step 4: Develop the shuffle and sort phase to group values by keys
 Step 5: Implement the reduce phase function that applies the reducer function to each key-value group
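
The five steps above might be put together as in the following sketch. It is not the site's reference solution: the class name `MapReduce`, the `pool_factory` parameter, and the helpers `_map_chunk` and `_reduce_group` are assumptions made for illustration. Keys are kept in first-seen order rather than sorted, since that is the order the example outputs use.

```python
from collections import defaultdict
from functools import reduce
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool  # thread-backed pool with the same API


def _map_chunk(args):
    """Run inside a worker: apply the mapper to every item of one chunk."""
    mapper, chunk = args
    pairs = []
    for item in chunk:
        pairs.extend(mapper(item))
    return pairs


def _reduce_group(args):
    """Run inside a worker: fold all values of one key into a single pair."""
    reducer, key, values = args
    return key, reduce(reducer, values)


class MapReduce:
    def __init__(self, num_workers=2, pool_factory=Pool):
        # Step 1: configurable worker count (pool_factory is an assumed extra knob).
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        self.num_workers = num_workers
        self.pool_factory = pool_factory

    def execute(self, data, mapper, reducer):
        # Step 2: orchestrate map -> shuffle -> reduce.
        data = list(data)
        if not data:
            return []
        # The context manager shuts the pool down even if a phase raises.
        with self.pool_factory(self.num_workers) as pool:
            mapped = self._map_phase(pool, data, mapper)
            grouped = self._shuffle(mapped)
            return self._reduce_phase(pool, grouped, reducer)

    def _map_phase(self, pool, data, mapper):
        # Step 3: split the data into about one chunk per worker and map in parallel.
        size = -(-len(data) // self.num_workers)  # ceiling division
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        return pool.map(_map_chunk, [(mapper, chunk) for chunk in chunks])

    def _shuffle(self, mapped):
        # Step 4: group values by key, preserving first-seen key order.
        grouped = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                grouped[key].append(value)
        return grouped

    def _reduce_phase(self, pool, grouped, reducer):
        # Step 5: reduce every key group in parallel; map() preserves task order.
        tasks = [(reducer, key, values) for key, values in grouped.items()]
        return pool.map(_reduce_group, tasks)
```

With the default `multiprocessing.Pool`, the mapper and reducer must be picklable, so module-level functions work but lambdas do not, and on platforms that start workers with `spawn` the calling code belongs under an `if __name__ == "__main__":` guard. Passing `pool_factory=ThreadPool` accepts lambdas, at the cost of running in threads rather than separate processes.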
