Find distinct characters in distinct substrings of a string


A string is a sequence of characters, which may include letters, digits and special characters. A substring of a given string is a sequence of characters taken from it in their original order, and a string may contain many substrings. A substring must satisfy the following properties −

  • All the characters should be taken from the input string

  • The indexes taken from the original string should be contiguous; no character in between may be skipped.
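The two properties above can be illustrated with a short sketch. The helper name `allSubstrings` is an assumption made for this illustration; it relies only on the standard `std::string::substr(pos, count)` call and keeps duplicates, so "TART" yields "T" twice.

```cpp
#include <string>
#include <vector>

// Returns every contiguous substring of s, ordered by start index and then by length.
// Duplicates are kept on purpose: no de-duplication happens at this stage.
// (allSubstrings is a hypothetical helper name used only for this illustration.)
std::vector<std::string> allSubstrings(const std::string& s) {
    std::vector<std::string> result;
    for (std::size_t start = 0; start < s.size(); ++start) {
        for (std::size_t len = 1; start + len <= s.size(); ++len) {
            // substr(pos, count) copies count consecutive characters starting at pos
            result.push_back(s.substr(start, len));
        }
    }
    return result;
}
```

Calling `allSubstrings("TART")` gives ten strings: T, TA, TAR, TART, A, AR, ART, R, RT and T.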

A substring may omit characters from the start or the end of the original string, but the characters it keeps must be consecutive. A substring may contain repeated characters or all different ones. In this article, we develop C++ code that counts the distinct characters in each distinct substring of a given string and adds these counts together. Here is an example to give a clear idea of the task −

Sample Example

Consider the sample string “TART”. The last column of the table keeps a running total of the distinct-character counts.

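| Substring | Distinct Characters | Running Total |
|-----------|---------------------|---------------|
| T         | 1                   | 1             |
| TA        | 2                   | 3             |
| TAR       | 3                   | 6             |
| TART      | 3                   | 9             |
| A         | 1                   | 10            |
| AR        | 2                   | 12            |
| ART       | 3                   | 15            |
| R         | 1                   | 16            |
| RT        | 2                   | 18            |
| T         | NA (this substring has already been evaluated) | 18 |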

Therefore, the total number of distinct characters across the distinct substrings of the string “TART” is 18.

The following std::set methods are used to solve this problem −

set.find(substr) checks whether the specified substr exists in the set; it returns set.end() when it does not.

set.insert(item) adds the item to the set if it is not already present.

set.size() returns the number of items in the set.
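As a minimal sketch of how these calls combine, the helper below adds a string to a set only when it is new. The name `addIfNew` is hypothetical and introduced only for this illustration; `find`, `end`, `insert` and `size` are standard `std::set` members.

```cpp
#include <set>
#include <string>

// Inserts s into seen only if it is not already present.
// Returns true when s was new, false when it was a repeat.
// (addIfNew is a hypothetical helper name, not part of any library.)
bool addIfNew(std::set<std::string>& seen, const std::string& s) {
    if (seen.find(s) == seen.end()) {  // find returns end() when s is absent
        seen.insert(s);
        return true;
    }
    return false;
}
```

After adding "T", "TA" and then "T" again to an empty set, the third call returns false and `seen.size()` is 2. The same check can also be written with `insert` alone, since `std::set::insert` returns a pair whose second member reports whether the insertion took place.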

Algorithm

  • Step 1 − A set is maintained to store all the distinct substrings of the given string. In addition, a counter tracks the total number of distinct characters.

  • Step 2 − An outer loop iterates over each starting position in the string.

  • Step 3 − An inner loop starts at the position of the outer loop and reads the characters one by one to build substrings.

  • Step 4 − A second set, of characters, is maintained for each starting position to hold the distinct characters encountered in the current substring.

  • Step 5 − A temporary substring variable is created, and during each inner loop iteration the current character is appended to it.

  • Step 6 − The current character is added to the character set if it is not already present.

  • Step 7 − If the current substring is not yet in the set of substrings, it is inserted and the counter is incremented by the size of the character set, which holds exactly the distinct characters of that substring.

  • Step 8 − Otherwise, the substring is skipped and the next one is evaluated.

  • Step 9 − After both loops finish, the value of the counter is returned.

Example

The following C++ program takes a string as input and computes the total number of distinct characters across its unique substrings −

//including the standard headers that the program uses
#include <iostream>
#include <set>
#include <string>
using namespace std;

//called function to calculate the different characters
int getDistinctChar(string str){

   //declaring a set to store substrings of the given string
   set<string> substrset;

   //maintaining a variable to store the count of distinct char
   int cnt = 0;
   int len = str.length();
   for (int i = 0; i < len; i++) {

      //getting the current substring
      string substr = "";

      //another set to maintain the track of characters encountered in a substring
      set<char> charset;
      for (int j = i; j < len; j++) {
         char ch = str[j];

         //adding character to the substring
         substr += ch;

         //adding the character to the char set
         charset.insert(ch);

         //check if substring exists in the given set of substrings
         if (substrset.find(substr) == substrset.end()) {

            //incrementing the counter of distinct characters with the number of characters stored in the charset
            int distinctchar = charset.size();
            cnt += distinctchar;

            //add the new substring to the set
            substrset.insert(substr);
         }
      }
   }
   return cnt;
}
int main(){

   //declaring a sample string
   string str = "TART";

   //getting a counter
   int cnt = getDistinctChar(str);
   cout<<"Entered String : "<<str;

   //printing the count of total distinct characters
   cout << "\nTotal count of distinct characters in substrings of string :" << cnt;
   return 0;
}

Output

Entered String : TART
Total count of distinct characters in substrings of string :18
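To see where the total of 18 comes from, the per-substring contributions can be recorded as well. The following is a minimal sketch rather than part of the original program; the `Row` struct and the `tabulate` function are hypothetical names introduced here for illustration. It follows the same two-loop approach but uses the result of `insert` to detect new substrings.

```cpp
#include <set>
#include <string>
#include <vector>

// One line of the worked table: a distinct substring, its number of
// distinct characters, and the running total so far.
// (Row and tabulate are hypothetical names used only in this sketch.)
struct Row {
    std::string substring;
    int distinct;
    int runningTotal;
};

std::vector<Row> tabulate(const std::string& s) {
    std::set<std::string> seen;  // distinct substrings found so far
    std::vector<Row> rows;
    int total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::set<char> chars;    // distinct characters of the current substring
        std::string current;
        for (std::size_t j = i; j < s.size(); ++j) {
            current += s[j];
            chars.insert(s[j]);
            // insert(...).second is true only when the substring was not stored before
            if (seen.insert(current).second) {
                int distinct = static_cast<int>(chars.size());
                total += distinct;
                rows.push_back({current, distinct, total});
            }
        }
    }
    return rows;
}
```

For "TART" this yields nine rows, from T (1, running total 1) to RT (2, running total 18); the second T produces no row because it is already in the set.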

Conclusion

Enumerating the substrings of a string takes polynomial time. To count each substring only once, and each character only once per substring, set data structures are used so that repetitions are ignored. The two nested loops generate O(n^2) substrings, but every lookup or insertion in the substring set compares strings of length up to n and therefore costs O(n log n), so the worst-case time complexity of this approach is O(n^3 log n), not O(n^2). The substring set may hold O(n^2) strings with a total length of O(n^3) characters, which dominates the space complexity; the character set never holds more than n characters.
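As a rough empirical check on memory use, the sketch below counts how many strings, and how many characters in total, end up in the substring set. The function name `storedSize` is an assumption made for this illustration. For a string of n distinct characters the counts work out to n(n+1)/2 substrings and n(n+1)(n+2)/6 characters.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Returns {number of distinct substrings, total characters stored across them}.
// (storedSize is a hypothetical helper used only to illustrate memory growth.)
std::pair<std::size_t, std::size_t> storedSize(const std::string& s) {
    std::set<std::string> seen;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::string current;
        for (std::size_t j = i; j < s.size(); ++j) {
            current += s[j];
            seen.insert(current);  // repeats are ignored by the set
        }
    }
    std::size_t chars = 0;
    for (const auto& sub : seen) {
        chars += sub.size();
    }
    return {seen.size(), chars};
}
```

For "ABCD" this returns 10 substrings holding 20 characters, and for "TART" it returns 9 substrings holding 19 characters.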

Updated on: 15-Mar-2023
