C++ Program to Find if the Given String Has a Repeated Subsequence of Length 2 or More

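Given a string, determine whether some subsequence of length at least two occurs twice in it. The two occurrences may share characters of the string, but they must not use the same index at the same position of the subsequence: the k-th character of the first occurrence and the k-th character of the second occurrence must come from different indices.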

string s = "PNDPNSP";
cout << "Repeated subsequence of length 2 or more: " << (check(s) ? "Yes" : "No");

Let us look at a few examples below to see how the method works in different cases −

Example 1 − str = "PNDPNSP"

Explanation − Here, the answer is true because the subsequence "PN" is repeated in the string, at indices (0, 1) and (3, 4).

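Example 2 − str = "PPND"

Explanation − Here, the answer is false because no subsequence of length two or more is repeated in the string. "PN" can be formed from indices (0, 2) and from indices (1, 2), but both use index 2 for "N", so it does not count.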

Example 3 − str = "PPNP"

Explanation − Here, the answer is true because "PP" occurs at indices (0, 1) and again at indices (1, 3), and the two occurrences use different indices at each position of the subsequence (0-based indexing).
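The index rule used in the examples above can be checked mechanically. The following is a minimal sketch, not part of the original method: the helper name isValidRepeat and its parameters are assumptions made for illustration. It takes two lists of indices and reports whether they spell the same subsequence without sharing an index at any position.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are illustrative assumptions):
// returns true when index lists `a` and `b` pick out the same subsequence
// of `s`, of length at least 2, without sharing an index at any position.
bool isValidRepeat(const std::string& s,
                   const std::vector<std::size_t>& a,
                   const std::vector<std::size_t>& b) {
    if (a.size() != b.size() || a.size() < 2) return false;
    for (std::size_t k = 0; k < a.size(); ++k) {
        // Every index must lie inside the string.
        if (a[k] >= s.size() || b[k] >= s.size()) return false;
        // Each list must be strictly increasing to describe a subsequence.
        if (k > 0 && (a[k] <= a[k - 1] || b[k] <= b[k - 1])) return false;
        // Both occurrences must spell the same character at position k...
        if (s[a[k]] != s[b[k]]) return false;
        // ...but they may not reuse the same index at position k.
        if (a[k] == b[k]) return false;
    }
    return true;
}
```

With this helper, isValidRepeat("PPNP", {0, 1}, {1, 3}) holds, while isValidRepeat("PPND", {0, 2}, {1, 2}) does not, because both lists use index 2 in the second position.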

Brute force Approach

This approach generates all possible subsequences of length 2 (the minimum length) and checks each one against the subsequences already seen. If the same subsequence has already been seen at indices that differ in both positions, we return true and stop; otherwise, once the iteration completes without a match, we return false.

In the worst case, no such subsequence exists, and we end up generating and storing every two-character subsequence. The running time is therefore O(n^2), assuming the subsequences are hashed for O(1) insertion and lookup. There are O(n^2) such subsequences, so the storage is O(n^2) as well.
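As a rough illustration of the brute force idea, the sketch below compares index pairs directly instead of hashing them, so that the index rule stays explicit. This is a simplification chosen for clarity, not the hashed variant described above: it runs in O(n^4) time with O(1) extra space, and the function name hasRepeatedPairBruteForce is an assumption. A hashed version would need to remember the indices along with each two-character string, since hashing the strings alone would wrongly accept "PPND" ("PN" appears at (0, 2) and (1, 2)).

```cpp
#include <cstddef>
#include <string>

// Illustrative brute force (hypothetical name): searches for two index pairs
// (i, j) and (k, l), with i < j and k < l, that spell the same two characters
// while differing at both positions (i != k and j != l).
bool hasRepeatedPairBruteForce(const std::string& s) {
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            // By symmetry it is enough to try k > i, which also gives i != k.
            for (std::size_t k = i + 1; k < n; ++k)
                for (std::size_t l = k + 1; l < n; ++l)
                    if (l != j && s[i] == s[k] && s[j] == s[l])
                        return true;
    return false;
}
```

Checking only length-2 subsequences is sufficient here, because any longer repeated subsequence contains a repeated subsequence of length 2.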

Modified Longest Common Subsequence (LCS)

The LCS algorithm finds the longest common subsequence of two strings. It is a standard dynamic programming technique that fills a 2D matrix iteratively, in O(n^2) time for two strings of length n. In our modified approach, we run LCS on the given string against itself, with one extra condition: two characters count as a match only if they are equal and their indices differ. If the resulting length is at least 2, the string has a repeated subsequence.


The C++ code below implements the modified longest common subsequence algorithm to detect repeated subsequences of length 2 or more −

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

bool modifiedLongestCommonSubsequence(const string& s) {
    int n = static_cast<int>(s.length());
    // dp[i][j] = length of the longest subsequence common to the first i and
    // the first j characters of s, where matched characters never share an index.
    vector<vector<int>> dp(n + 1, vector<int>(n + 1, 0));
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= n; j++) {
            if (s[i-1] == s[j-1] && i != j) {
                dp[i][j] = 1 + dp[i-1][j-1];   // equal characters at different indices
            } else {
                dp[i][j] = max(dp[i][j-1], dp[i-1][j]);
            }
        }
    }
    return dp[n][n] > 1;
}

int main() {
    string str = "PNDPNSP";
    cout << "Repeated subsequence of length 2 or more: "
         << (modifiedLongestCommonSubsequence(str) ? "Yes" : "No");
    return 0;
}


Repeated subsequence of length 2 or more: Yes

The time and space complexity are still O(n^2), as in the brute force approach, but this method is more elegant and does not rely on the O(1) hashing assumption made there.

Improved Solution

In this approach, we build on the previous approaches by making a few observations.

Observation 1 − If a character occurs more than twice, the answer is always true. Let's see why.

Example − In the string str = "AVHJFBABVNHFA", the character "A" occurs at positions 0, 6 and 12. So we can take "AA" from indices 0 and 6 as one occurrence and "AA" from indices 6 and 12 as the other.
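Observation 1 can be sketched in code as follows. The helper name firstTriple is an assumption made for illustration; it returns the first three positions of the first character to reach a third occurrence, or an empty list if no character occurs three times.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the first three positions of the first
// character that reaches a third occurrence, or an empty vector if no
// character occurs three times.
std::vector<std::size_t> firstTriple(const std::string& s) {
    // One position list per possible byte value, so any character can be indexed.
    std::array<std::vector<std::size_t>, 256> positions;
    for (std::size_t i = 0; i < s.size(); ++i) {
        auto& p = positions[static_cast<unsigned char>(s[i])];
        p.push_back(i);
        if (p.size() == 3) return p;  // positions are in increasing order
    }
    return {};
}
```

If the result is {p0, p1, p2}, the two occurrences are (p0, p1) and (p1, p2), which differ at both positions. For "AVHJFBABVNHFA" the result is {0, 6, 12}, matching the example.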

Observation 2 − If a character occurs only once, it cannot contribute to a repeated subsequence, because both occurrences would have to use its single index at the same position, which is not allowed. So we can remove or ignore all characters that occur only once.
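The removal step from Observation 2 might look like the sketch below. The function name dropSingleOccurrences is hypothetical, and the 256-entry table is an assumption made so that any byte value can be counted safely.

```cpp
#include <array>
#include <string>

// Hypothetical helper: returns a copy of `s` without the characters that
// occur exactly once, preserving the order of the remaining characters.
std::string dropSingleOccurrences(const std::string& s) {
    std::array<int, 256> count{};  // zero-initialised, one slot per byte value
    for (unsigned char c : s) ++count[c];
    std::string kept;
    for (unsigned char c : s)
        if (count[c] > 1) kept += static_cast<char>(c);
    return kept;
}
```

For example, dropSingleOccurrences("PPND") yields "PP", since "N" and "D" each occur only once.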

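Observation 3 − If, after applying the first two observations, the remaining string is a palindrome, the answer is always false. At that point every remaining character occurs exactly twice, so the remaining string always has even length and no odd-length special case arises. Let's see why.

Example − String str = "PNDDNP"

Explanation − The second half of the string mirrors the first half, so every pair of characters that appears in one order on the left appears in the reverse order on the right. No subsequence can therefore be found twice in the same order, and the answer is false.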


Combining the three observations: if any character occurs more than twice, the answer is true. Otherwise, we remove all characters that occur only once and check the remaining string: if it is not a palindrome, the answer is true; if it is a palindrome, the answer is false. Let us look at the implementation of the improved solution in C++ −

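#include <iostream>
#include <string>
using namespace std;

bool isPalindrome(const string& s) {
    for (size_t i = 0; i < s.size() / 2; i++) {
        if (s[i] != s[s.size() - 1 - i]) {
            return false;
        }
    }
    return true;
}

bool check(const string& s) {
    // One counter per possible byte value, so upper- and lowercase input is safe.
    int count[256] = {0};
    for (size_t i = 0; i < s.size(); i++) {
        unsigned char c = s[i];
        count[c]++;
        if (count[c] > 2) {   // Observation 1: a character occurs three times
            return true;
        }
    }
    // Observation 2: keep only the characters that occur more than once.
    string mstr = "";
    for (size_t i = 0; i < s.size(); i++) {
        if (count[(unsigned char)s[i]] > 1) {
            mstr += s[i];
        }
    }
    // Observation 3: a palindrome here means no repeated subsequence.
    return !isPalindrome(mstr);
}

int main() {
    string s = "PNDPNSP";
    cout << "Repeated subsequence of length 2 or more: "
         << (check(s) ? "Yes" : "No");
    return 0;
}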


Repeated subsequence of length 2 or more: Yes


We conclude that the problem is best solved using observation and hashing. The time complexity is O(n). The space complexity is also O(n), for the new string plus a constant-size table of character counts.