Remove Duplicates from a String in O(1) Extra Space


In this problem, the task is to remove every duplicate character from a string, keeping only the first occurrence of each character. The string must be modified in place, so the extra space used must be O(1). This article covers two approaches. The first uses a Boolean array in which each index is mapped to one letter of the alphabet and records whether that letter has already appeared. The second uses bit manipulation and stores the same information in the bits of a single integer. Both approaches assume that the input consists only of the lowercase English letters 'a' to 'z'.

Let's see how this problem can be solved in the C++ programming language.

Here are some instances

Instance-1

If String = “tutorialspoint”

After applying the algorithm -

Resultant String = “tuorialspn”

Instance-2

If String = “learningmaterial”

After applying the algorithm -

Resultant String = “learnigmt”

Multiple Approaches

We provide solutions using two different approaches.

  • By using a Boolean array of size 26

  • By using single integer bits.

Approach-1: By Using a Boolean array of size 26

To solve this problem, we use a Boolean array of 26 elements. Each letter of the alphabet is mapped to an index of the array, and the value at that index tells us whether the letter has already appeared.

Mapping of the English Alphabet to a Boolean array

'a' ⇒ found[0]
'b' ⇒ found[1]
'c' ⇒ found[2]
'd' ⇒ found[3]
.
.
.
'z' ⇒ found[25]

Here, if found[i] = True, the character mapped to index i has already appeared to the left of the current character of the string, so the current character is a duplicate and is skipped.

If found[i] = False, the character mapped to index i has not appeared yet, so it is kept and written into the resultant string.
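The mapping and the found check described above can be sketched as follows. The helper names letterIndex and firstOccurrenceFlags are hypothetical and exist only for illustration. The sketch assumes the text contains only 'a' to 'z', and it returns a vector of flags purely to make the decisions visible, so it is not itself an O(1)-space solution.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: maps a lowercase letter to its index in found.
// 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
int letterIndex(char c) {
    return c - 'a';
}

// Hypothetical helper: for every position in text, reports whether the
// character there is a first occurrence (true) or a repeat (false).
// Assumes text contains only the letters 'a' to 'z'.
std::vector<bool> firstOccurrenceFlags(const std::string& text) {
    bool found[26] = {false};
    std::vector<bool> flags;
    for (char c : text) {
        int d = letterIndex(c);
        flags.push_back(!found[d]);  // kept only if not seen before
        found[d] = true;             // from now on this letter is a repeat
    }
    return flags;
}
```

For the string "tutor", the flags would be true, true, false, true, true: only the second 't' is marked as a repeat.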

Algorithm: removeDuplicatesFromStr(str)

Step-1: Create a Boolean array named found with 26 elements, all initialized to False.

Step-2: Declare a write pointer equal to -1.

Step-3: Start a for loop with a read pointer that traverses the string “str”.

Step-4: Set the integer d = ASCII(str[read]) – 97.

Step-5: IF found[d] = False, then SET found[d] = True.

Step-6: In the same case, increase the value of the write pointer by 1 and copy str[read] to str[write].

Step-7: After the loop ends, set str[write + 1] to the null character ('\0') to terminate the resultant string.

Step-8: Finally, print the result after eliminating the redundant characters.

Example

#include <iostream>
using namespace std;
// Assumes inputString contains only the lowercase letters 'a' to 'z'
void removeDuplicatesFromStr(char inputString[]) {
   // found is a Boolean array whose indices are mapped to the letters
   bool found[26] = {false};
   // write pointer: index of the last character written to the result
   int write = -1;
   // read pointer scans every character of the string
   for (int read = 0; inputString[read] != '\0'; ++read) {
       // index of the current letter in found ('a' is 97 in ASCII)
       int d = inputString[read] - 'a';
       // if the character has not appeared yet
       if (!found[d]) {
           found[d] = true;
           write += 1;
           inputString[write] = inputString[read];
       }
   }
   // terminate the resultant string with the null character
   inputString[write+1] = '\0';
}

int main() {
   char inputString[10000];
   cout << "Input String for removing duplicates: ";
   // limit the read to the buffer size to avoid an overflow
   cin.width(sizeof(inputString));
   cin >> inputString;
   removeDuplicatesFromStr(inputString);
   cout << "Output String after removing duplicates: " << inputString << endl;

   return 0;
}

Output

Input String for removing duplicates: learningmaterial
Output String after removing duplicates: learnigmt

Time Complexity of Program = O(N), where N is the length of the input string

Space Complexity of Program = O(26) = O(1)

Approach-2: By using single integer bits

To solve this problem, we create an integer whose bits tell us whether a character has already appeared.

Size of Integer in C++ = 4 Bytes = 4*8 bits = 32 bits on most common platforms (the C++ standard itself only guarantees at least 16 bits).

No. of characters in the English alphabet = 26

No. of letters in the English alphabet < No. of bits in Integer,

Therefore, we can use a separate bit of the Integer to represent each letter of the English alphabet.
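Because the width of an integer depends on the platform, the size argument above can be checked at compile time. The following is a minimal sketch; the constant name kMaskBits is hypothetical, and an unsigned int is assumed as the mask type so that every bit is a value bit.

```cpp
#include <limits>

// Number of value bits in unsigned int on the current platform.
constexpr int kMaskBits = std::numeric_limits<unsigned int>::digits;

// Compilation fails if the mask cannot hold one bit per letter.
static_assert(kMaskBits >= 26,
              "unsigned int is too narrow to hold 26 letter bits");
```

On a platform with a 32-bit unsigned int, the assertion passes and 6 bits of the mask stay unused.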

Mapping of English Alphabet to Integer

'a' ⇒ bit 0 ⇒ (int)'a' - 97
'b' ⇒ bit 1 ⇒ (int)'b' - 97
'c' ⇒ bit 2 ⇒ (int)'c' - 97
'd' ⇒ bit 3 ⇒ (int)'d' - 97
.
.
.
'z' ⇒ bit 25 ⇒ (int)'z' - 97

If 'a' appears in the string, bit-0 is SET.

The same is done for every other character.

Here, if bit-d of the Integer = 1, the character mapped to bit-d has already appeared to the left of the current character of the string, so the current character is a duplicate and is skipped.

If bit-d of the Integer = 0, the character mapped to bit-d has not appeared yet, so it is kept and written into the resultant string.
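The two bit operations described above, testing bit-d and setting bit-d, can be sketched as small helpers. The names isSeen and markSeen are hypothetical, d is assumed to lie in the range 0 to 25, and an unsigned int with at least 26 bits is assumed for the mask.

```cpp
// Hypothetical helper: true if bit-d of found is 1,
// i.e. the letter mapped to d has already appeared.
bool isSeen(unsigned int found, int d) {
    return (found & (1u << d)) != 0;
}

// Hypothetical helper: returns found with bit-d set to 1.
unsigned int markSeen(unsigned int found, int d) {
    return found | (1u << d);
}
```

Starting from found = 0, marking 'a' (d = 0) and then 'c' (d = 2) would leave found with bits 0 and 2 set, which is the value 5. Testing bit 1 for 'b' would then still report that 'b' has not appeared.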

Algorithm: removeDuplicates(str)

Step-1: Create an Integer variable named found, initialized to 0.

Step-2: Declare a write pointer equal to -1.

Step-3: Start a for loop with a read pointer that traverses the string “str”.

Step-4: Set the integer d = ASCII(str[read]) – 97.

Step-5: IF bit-d of found is 0, then SET bit-d of found to 1.

Step-6: In the same case, increase the value of the write pointer by 1 and copy str[read] to str[write].

Step-7: After the loop ends, set str[write + 1] to the null character ('\0') to terminate the resultant string.

Step-8: Finally, print the result after eliminating the redundant characters.

Example

#include <iostream>
using namespace std;
// Assumes inputStr contains only the lowercase letters 'a' to 'z'
void removeDuplicates(char inputStr[]) {
   // found is an integer whose bits are mapped to the letters
   unsigned int found = 0;
   // write pointer: index of the last character written to the result
   int write = -1;

   for (int read = 0; inputStr[read] != '\0'; ++read) {
       // bit position of the current letter ('a' is 97 in ASCII)
       int d = inputStr[read] - 'a';

       // if the character has not appeared yet
       if ((found & (1u << d)) == 0) {
           found = found | (1u << d);
           write += 1;
           inputStr[write] = inputStr[read];
       }
   }

   // terminate the resultant string with the null character
   inputStr[write+1] = '\0';
}

int main() {
   char inputStr[10000];

   cout << "Input String for removing duplicates: ";
   // limit the read to the buffer size to avoid an overflow
   cin.width(sizeof(inputStr));
   cin >> inputStr;

   removeDuplicates(inputStr);

   cout << "Output String after removing duplicates: " << inputStr << endl;

   return 0;
}

Output

Input String for removing duplicates: tutorialspoint
Output String after removing duplicates: tuorialspn

Time Complexity of Program = O(N)

Space Complexity of Program = O(1)

In this article, we presented two methods that solve the problem with a Time Complexity of O(N) and a Space Complexity of O(1).

Updated on: 23-Aug-2023
