Count M-length substrings occurring exactly K times in a string


In this article, we look at a classic string problem: counting the M-length substrings that occur exactly K times in a string. Problems of this type often appear in programming competitions and interviews. Before we get started, let's define the terms −

  • Substring − A contiguous sequence of characters found within another string.

  • M-Length − The length of the substrings that we're interested in.

  • K Times − The exact number of times a substring must appear in the original string. Overlapping occurrences count separately.
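
To make these definitions concrete, the following minimal Python sketch lists every M-length substring (window) of a short string. The helper name `windows` is an illustrative assumption, not part of any library.

```python
def windows(s, M):
    """Return every contiguous M-length substring of s, left to right."""
    # A string of length n has n - M + 1 windows of length M (none if M > n).
    return [s[i:i + M] for i in range(len(s) - M + 1)]


print(windows("abcab", 3))  # ['abc', 'bca', 'cab']
```

These windows are the items whose frequencies the rest of the article counts.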

Algorithm Explanation

To solve this problem, we will use a hash map (called an unordered map in C++). A hash map stores data as key-value pairs and offers constant time complexity on average for search and insert operations, which makes it a good fit for counting problems like this one.

The algorithm for counting M-length substrings occurring exactly K times in a string is as follows −

  • Initialize an empty hash map.

  • Iterate over the string, creating all possible M-length substrings.

  • For each substring, insert it into the hash map with a count of 1 if it is new. If it already exists, increment its count.

  • After all substrings have been counted, iterate over the hash map to find all substrings that occurred exactly K times.
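
The steps above can be sketched compactly in Python with `collections.Counter`, a standard-library dictionary subclass that tallies hashable items. This is a minimal sketch rather than the article's reference implementation, and the function name `count_exactly_k` is hypothetical.

```python
from collections import Counter


def count_exactly_k(s, M, K):
    """Count distinct M-length substrings of s that occur exactly K times."""
    # Tally every M-length window; Counter treats missing keys as 0.
    freq = Counter(s[i:i + M] for i in range(len(s) - M + 1))
    # Keep only the substrings whose tally equals K.
    return sum(1 for c in freq.values() if c == K)


# Overlapping windows count separately: "aa" starts at indices 0, 1 and 2.
print(count_exactly_k("aaaa", 2, 3))  # 1
```

The example call also shows that overlapping occurrences are counted, because every starting position contributes one window.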

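Implementation

Example

The programs below solve the problem in C, C++, Java, and Python, in that order. The C++, Java, and Python versions implement the hash map algorithm described above. The C version has no built-in hash map and compares substrings directly instead.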

#include <stdio.h>
#include <string.h>

int countSubstrings(const char *s, int M, int K) {
   int n = (int)strlen(s);   // Length of the input string
   int count = 0;            // Number of distinct substrings with frequency K

   if (M <= 0 || M > n) {
      return 0; // No M-length substrings exist
   }

   // Treat each starting position i as a candidate substring s[i..i+M-1]
   for (int i = 0; i <= n - M; i++) {
      int seenBefore = 0;

      // Skip the candidate if the same substring already started earlier,
      // so that each distinct substring is evaluated only once
      for (int j = 0; j < i; j++) {
         if (strncmp(s + i, s + j, M) == 0) {
            seenBefore = 1;
            break;
         }
      }
      if (seenBefore) {
         continue;
      }

      int freq = 1; // The occurrence at position i itself

      // Count later occurrences (overlapping ones included)
      for (int j = i + 1; j <= n - M; j++) {
         if (strncmp(s + i, s + j, M) == 0) {
            freq++;
         }
      }

      if (freq == K) {
         count++; // This distinct substring occurs exactly K times
      }
   }

   return count;
}
int main() {
   char s[] = "abcabcabc";
   int M = 3;
   int K = 3;

   int result = countSubstrings(s, M, K);
   printf("The number of M-length substrings occurring exactly K times is: %d\n", result);
   return 0;
}

Output

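The number of M-length substrings occurring exactly K times is: 1

#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

int countSubstrings(const string &s, int M, int K) {
   unordered_map<string, int> count_map; // Maps each substring to its frequency
   int n = static_cast<int>(s.length());

   if (M <= 0 || M > n) {
      return 0; // No M-length substrings exist
   }

   // Count every M-length substring, including overlapping ones
   for (int i = 0; i <= n - M; i++) {
      count_map[s.substr(i, M)]++;
   }

   // Count the distinct substrings whose frequency equals K
   int count = 0;
   for (const auto &entry : count_map) {
      if (entry.second == K)
         count++;
   }

   return count;
}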

int main() {
   string s = "abcabcabc";
   int M = 3;
   int K = 3;
   
   int result = countSubstrings(s, M, K);
   cout << "The number of M-length substrings occurring exactly K times is: " << result << endl;
   
   return 0;
}

Output

The number of M-length substrings occurring exactly K times is: 1

import java.util.HashMap;
import java.util.Map;

public class Main {
   public static int countSubstrings(String s, int M, int K) {
      Map<String, Integer> countMap = new HashMap<>(); // Map to store substring frequencies
      int n = s.length(); // Length of the input string

      // Loop through the string to find substrings
      for (int i = 0; i <= n - M; i++) {
         String substring = s.substring(i, i + M); // Extract the current substring
         countMap.put(substring, countMap.getOrDefault(substring, 0) + 1); // Update substring frequency
      }

      int count = 0;
      // Count the substrings with frequency equal to K
      for (int freq : countMap.values()) {
         if (freq == K) {
            count++;
         }
      }

      return count;
   }

   public static void main(String[] args) {
      String s = "abcabcabc";
      int M = 3;
      int K = 3;

      int result = countSubstrings(s, M, K);
      System.out.println("The number of M-length substrings occurring exactly K times is: " + result);
   }
}

Output

The number of M-length substrings occurring exactly K times is: 1

def count_substrings(s, M, K):
   count_map = {}  # Dictionary to store substring frequencies
   n = len(s)  # Length of the input string
   
   # Loop through the string to find substrings
   for i in range(n - M + 1):
      substring = s[i:i+M]  # Extract the current substring
      if substring in count_map:
         count_map[substring] += 1
      else:
         count_map[substring] = 1
   
   count = 0
   # Count the substrings with frequency equal to K
   for freq in count_map.values():
      if freq == K:
         count += 1
   
   return count

def main():
   s = "abcabcabc"
   M = 3
   K = 3
   
   result = count_substrings(s, M, K)
   print("The number of M-length substrings occurring exactly K times is:", result)

if __name__ == "__main__":
   main()

Output

The number of M-length substrings occurring exactly K times is: 1

In the hash map versions above, the countSubstrings function (count_substrings in Python) takes the input string s, the substring length M, and the required number of occurrences K as arguments. It initializes a hash map (count_map in the C++ code) to track each substring and its number of occurrences. It then iterates over the string to extract every substring of length M and increments that substring's count in the map. Once all substrings are counted, it iterates over the map and counts the substrings that occurred exactly K times. The C version takes the same arguments but compares substrings directly instead of using a hash map.

The main function is where the code execution starts. It initializes a string s, and values for M and K. It then calls the countSubstrings function and prints the result.
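
The programs above are only exercised with one well-formed input, so it is worth considering degenerate arguments. The sketch below is a hypothetical defensive variant of the Python function, named `count_substrings_checked` for illustration. It assumes that a window length below 1, a window longer than the string, or K below 1 should all yield 0; the original problem statement does not specify this behavior.

```python
def count_substrings_checked(s, M, K):
    """Hash map counting with explicit handling of degenerate arguments."""
    # Assumption: no valid M-length substring exists when M < 1 or M > len(s),
    # and no substring can occur fewer than 1 time.
    if M < 1 or M > len(s) or K < 1:
        return 0
    count_map = {}
    for i in range(len(s) - M + 1):
        window = s[i:i + M]
        count_map[window] = count_map.get(window, 0) + 1
    return sum(1 for c in count_map.values() if c == K)


print(count_substrings_checked("abcabcabc", 3, 3))  # 1
print(count_substrings_checked("abc", 5, 1))        # 0
print(count_substrings_checked("abc", 0, 4))        # 0
```

Without the guard, slicing with M = 0 in Python produces len(s) + 1 empty strings, so the last call would report one "substring" occurring 4 times.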

Testcase Example

Let's consider the string "abcabcabc", with M=3 and K=3.

Here, the M-length substrings are "abc", "bca", "cab", "abc", "bca", "cab", "abc". The substring "abc" appears exactly 3 times, while "bca" and "cab" each appear only twice. Only one distinct substring meets the condition, so the output of the program is 1.
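
The tally for this test case can be inspected directly. The short sketch below relies only on Python's standard `collections.Counter`; it prints the frequency of each window and shows how the answer would change for other values of K. The variable names are illustrative.

```python
from collections import Counter

s, M = "abcabcabc", 3
freq = Counter(s[i:i + M] for i in range(len(s) - M + 1))

# Frequency of each distinct 3-length substring, in order of first appearance.
print(dict(freq))  # {'abc': 3, 'bca': 2, 'cab': 2}

# Number of distinct substrings occurring exactly K times, for K = 1, 2, 3.
answers = {K: sum(1 for c in freq.values() if c == K) for K in (1, 2, 3)}
print(answers)  # {1: 0, 2: 2, 3: 1}
```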

This problem-solving approach, where we use a hash map to count substrings, is a good example of the time-space tradeoff in computer science. We spend extra space to store the substrings and their counts, and in return each count update takes constant time on average once the substring has been hashed, so we never rescan the string for each candidate substring.

Time and Space Complexity

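The hash map algorithm makes a single pass over the string and processes n - M + 1 substrings, where n is the length of the string. Extracting and hashing each substring takes O(M) time, so the total time complexity is O(n · M) on average. It reduces to O(n) only when M is treated as a constant. The C version, which compares substrings pairwise, takes O(n² · M) time in the worst case.

The space complexity of the hash map algorithm is O(n · M) in the worst case: if every substring is unique, the map holds n - M + 1 keys of M characters each. The C version uses no map and needs at most O(M) extra space.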
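
The worst case for memory is easy to observe. The sketch below, which uses a hypothetical helper `distinct_windows`, counts how many keys the hash map would hold for two inputs of the same length.

```python
def distinct_windows(s, M):
    """Return how many different M-length substrings s contains."""
    # A set keeps one copy of each window, like the keys of the hash map.
    return len({s[i:i + M] for i in range(len(s) - M + 1)})


# All windows distinct: the map would hold n - M + 1 = 6 keys.
print(distinct_windows("abcdefgh", 3))  # 6
# Highly repetitive input: 6 windows but only one key.
print(distinct_windows("aaaaaaaa", 3))  # 1
```

Memory use therefore depends on how repetitive the input is, not only on its length.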

Conclusion

In this article, we examined a common string problem: counting the number of M-length substrings that occur exactly K times in a string. We implemented solutions in C, C++, Java, and Python. The hash map versions rely on search and insert operations that take constant time on average. This problem is a good example of how data structures and algorithms can be used together to solve problems efficiently.

Updated on: 16-Oct-2023
