Java Program to Implement the String Search Algorithm for Short Text Sizes

In this problem, we need to find the index of every occurrence of a pattern in a string. An efficient text search matters whenever users search large bodies of text. For example, suppose you are writing a blog post in Microsoft Word or code in VSCode, and the document contains more than 100,000 words. If the search algorithm is inefficient, showing the results for a word or sentence can take noticeably long.

We will learn two different approaches to implementing the string search algorithm. One is the naïve approach, and the other is the KMP algorithm.

Problem statement - We are given a string str and a search value, which may have different lengths. We need to find the starting index of every occurrence of the search value in the string, or report -1 if it does not occur.

Sample examples

Input

str = "welcome to Tutorialspoint for well organized tutorials and libraries!", 
searchValue = "wel";

Output

0, 30

Explanation - It prints the starting positions of the search value in str.

Input

str = "Apple is good! Apple is Awesome! Apples are amazing!", searchValue = "Apple is"

Output

0, 15

Explanation - 'Apple is' appears twice in the given string, at indices 0 and 15.

Input

str = "Hello! Are you fine?", searchValue = "How"

Output

-1

Explanation - As the search value is not found in the string, it prints -1.
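As a quick cross-check of the sample outputs above, Java's built-in String.indexOf(String, int) can collect every starting index. This is only a reference sketch and not one of the two approaches discussed in this article. The class name IndexOfReference and the helper findAll are made-up names for illustration, and returning an empty list for "not found" (the -1 case above) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfReference {
    // Hypothetical helper: collects every starting index of searchValue in str.
    public static List<Integer> findAll(String str, String searchValue) {
        List<Integer> positions = new ArrayList<>();
        if (searchValue.isEmpty()) {
            return positions; // assumption: an empty search value has no matches
        }
        int from = str.indexOf(searchValue);
        while (from != -1) {
            positions.add(from);
            // Restart one character later so overlapping matches are also found
            from = str.indexOf(searchValue, from + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll(
            "welcome to Tutorialspoint for well organized tutorials and libraries!", "wel")); // [0, 30]
        System.out.println(findAll(
            "Apple is good! Apple is Awesome! Apples are amazing!", "Apple is"));             // [0, 15]
        System.out.println(findAll("Hello! Are you fine?", "How"));                           // []
    }
}
```

An empty list here corresponds to the -1 shown in the third sample.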

Approach 1

This is the naïve approach, in which we compare the search value against every substring of the same length to find a match.

Algorithm

Step 1 - Initialize the length variables and 'matches' to store the total number of matches.

Step 2 - Traverse the string from the 0th index to (len_str - len_search)th index.

Step 3 - Use a nested loop to traverse the search pattern.

Step 4 - If the character at the (p + q)th index in the string and the character at the qth index in the search value don't match, break the loop.

Step 5 - If q equals len_search, every character matched, so increment matches and print the value of p, which is the starting index of the pattern.

Step 6 - Finally, if matches is equal to 0, print -1 (or a not-found message, as the example below does), since no occurrence of the search value was found in the string.

Example

public class Main {
   // Prints every index at which searchValue occurs in str (naive search)
   public static void findMatch(String str, String searchValue) {
      int len_str = str.length();
      int len_search = searchValue.length();
      int matches = 0;
      // Try every starting position p where the search value can still fit
      for (int p = 0; p <= (len_str - len_search); p++) {
         int q;
         // Compare the search value with the substring starting at p
         for (q = 0; q < len_search; q++) {
            if (str.charAt(p + q) != searchValue.charAt(q))
               break;
         }
         // q reaches len_search only if every character matched
         if (q == len_search) {
            matches++;
            System.out.println("Pattern position is : " + p);
         }
      }
      if (matches == 0)
         System.out.println("No Pattern Found in the given string!");
      else
         System.out.println("Total search patterns found in the string are = " + matches);
   }
   public static void main(String[] args) {
      String str = "welcome to Tutorialspoint for well organized tutorials and libraries!";
      String searchValue = "wel";
      findMatch(str, searchValue);
   }
}

Output

Pattern position is : 0
Pattern position is : 30
Total search patterns found in the string are = 2

Time complexity - O(N*M) in the worst case, where N is the string length and M is the search value length.

Space complexity - O(1) as we don't use any extra space.
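To make the O(N*M) bound concrete, the sketch below counts the character comparisons that the naïve search makes on a worst-case style input: a text of repeated 'a' characters and a pattern that mismatches only on its last character. The class name NaiveComparisonCount and the method countComparisons are illustrative names and are not part of the example above.

```java
public class NaiveComparisonCount {
    // Runs the naive search and returns how many character comparisons it made.
    public static long countComparisons(String str, String searchValue) {
        int lenStr = str.length();
        int lenSearch = searchValue.length();
        long comparisons = 0;
        for (int p = 0; p <= lenStr - lenSearch; p++) {
            for (int q = 0; q < lenSearch; q++) {
                comparisons++; // one comparison per inner-loop step
                if (str.charAt(p + q) != searchValue.charAt(q)) {
                    break;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String text = "a".repeat(1000); // String.repeat requires Java 11 or later
        String pattern = "aaab";
        System.out.println(countComparisons(text, pattern)); // 3988
    }
}
```

With N = 1000 and M = 4, each of the N - M + 1 = 997 starting positions needs all 4 comparisons before the mismatch is found, which gives 997 * 4 = 3988 comparisons.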

Approach 2

The KMP algorithm, named after its inventors Knuth, Morris, and Pratt, is a very efficient approach for string searching. It helps us avoid unnecessary backtracking while searching for the pattern. In the naïve approach, we restart the comparison at every substring of length M, but here the index into the given string never moves backward.

We will prepare an array that stores, for each position in the search value, the length of the longest proper prefix that is also a suffix (commonly called the LPS array), and use it to make the search efficient.
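To illustrate what this array holds, here is a small sketch that builds it for a sample pattern. Entry i is the length of the longest proper prefix of the first i + 1 characters that is also a suffix of them. The class name LpsDemo, the method buildLps, and the pattern "aabaaab" are chosen only for illustration.

```java
import java.util.Arrays;

public class LpsDemo {
    // Builds the longest-proper-prefix-that-is-also-a-suffix array for pattern.
    public static int[] buildLps(String pattern) {
        int[] lps = new int[pattern.length()]; // lps[0] is 0 by default
        int prevLen = 0; // length of the prefix matched so far
        int p = 1;
        while (p < pattern.length()) {
            if (pattern.charAt(p) == pattern.charAt(prevLen)) {
                prevLen++;
                lps[p] = prevLen;
                p++;
            } else if (prevLen != 0) {
                // Fall back to the next shorter candidate prefix; p stays put
                prevLen = lps[prevLen - 1];
            } else {
                lps[p] = 0;
                p++;
            }
        }
        return lps;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildLps("aabaaab"))); // [0, 1, 0, 1, 2, 2, 3]
        System.out.println(Arrays.toString(buildLps("wel")));     // [0, 0, 0]
    }
}
```

For example, the last entry for "aabaaab" is 3 because "aab" is both a proper prefix and a suffix of the whole pattern. For "wel", no prefix repeats, so every entry is 0.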

Algorithm

Step 1 - In the findMatch() function, define the required variables and the longest_pref_suff[] array, which stores, for each index of the search value, the length of the longest proper prefix that is also a suffix (LPS).

Step 2 - Call the processSuffPref() function to fill the LPS array.

Step 2.1 - In the processSuffPref() function, set the first element to 0. Also, initialize prev_len with 0 and p with 1.

Step 2.2 - Iterate while p is less than search_len. If the character at the pth index of the search pattern is the same as the character at the prev_len index, increment prev_len, store it at the pth index of the array, and then increment p.

Step 2.3 - If the characters don't match and prev_len is not zero, set prev_len to longest_pref_suf[prev_len - 1] without moving p. Otherwise, store 0 at the pth index of the array and increment p.

Step 3 - Next, initialize p and q with 0, and iterate over the string (index p) and the search value (index q) while p is less than the string length.

Step 4 - If search.charAt(q) == str.charAt(p), increment p and q by 1 to move ahead.

Step 5 - If q == search_len, print p - q, which is the starting index of the match. Then set q to longest_pref_suff[q - 1] so that the search can continue and find further occurrences.

Step 6 - Otherwise, if the characters at index p in str and index q in the search value differ, handle the mismatch as described in Step 7.

Step 7 - If q is not zero, set q to longest_pref_suff[q - 1]; otherwise, increment p by 1.

Example

public class Main {
   // Fills longest_pref_suf[i] with the length of the longest proper prefix
   // of search[0..i] that is also a suffix of it
   public static void processSuffPref(String search,
   int search_len, int[] longest_pref_suf) {
      // variable to store the length of the previous prefix
      int prev_len = 0;
      int p = 1;
      longest_pref_suf[0] = 0; // This is always 0
      while (p < search_len) {
         if (search.charAt(p) == search.charAt(prev_len)) {
            prev_len++;
            longest_pref_suf[p] = prev_len;
            p++;
         } else // If it doesn't match
         {
            if (prev_len != 0) {
               // Fall back to the next shorter candidate prefix
               prev_len = longest_pref_suf[prev_len - 1];
            } else {
               longest_pref_suf[p] = prev_len;
               p++;
            }
         }
      }
   }

   public static void findMatch(String str, String search) {
      // Initialize lengths
      int str_len = str.length();
      int search_len = search.length();
      // An empty search value has nothing to match; returning early also
      // avoids indexing into an empty array below
      if (search_len == 0)
         return;
      // array to store longest prefix and suffix values
      int[] longest_pref_suff = new int[search_len];
      // calculate the longest prefix and suffix values
      processSuffPref(search, search_len, longest_pref_suff);
      int p = 0; // string index
      int q = 0; // search index
      while (p < str_len) {
         // If characters at index q in search and index p in str match, increment both pointers
         if (search.charAt(q) == str.charAt(p)) {
            q++; p++;
         }
         if (q == search_len) {
            System.out.println("Index of search value is - " + (p - q));
            // Continue searching for further occurrences
            q = longest_pref_suff[q - 1];
         }
         // If a mismatch occurs after q matches
         else if (p < str_len && search.charAt(q) != str.charAt(p)) {
            if (q != 0)
               q = longest_pref_suff[q - 1];
            else
               p = p + 1;
         }
      }
   }
   public static void main(String[] args) {
      String str = "welcome to Tutorialspoint for well organized tutorials and libraries!";
      String searchValue = "wel";
      findMatch(str, searchValue);
   }
}

Output

Index of search value is - 0
Index of search value is - 30

Time complexity - O(N + M), as building the LPS array takes O(M) time and the index into the string never moves backward during the O(N) scan.

Space complexity - O(M) as we store the LPS array for the search pattern.
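To see the linear behavior in numbers, the following sketch counts the character comparisons made by a KMP scan on a text of 1000 'a' characters with the pattern "aaab". It is a simplified restatement of the scan loop with one comparison per iteration, not the exact listing above. The class name KmpComparisonCount and its method names are assumptions made for illustration.

```java
public class KmpComparisonCount {
    // Builds the LPS array: lps[i] is the length of the longest proper prefix
    // of pattern[0..i] that is also a suffix of it.
    public static int[] buildLps(String pattern) {
        int[] lps = new int[pattern.length()];
        int prevLen = 0;
        int p = 1;
        while (p < pattern.length()) {
            if (pattern.charAt(p) == pattern.charAt(prevLen)) {
                prevLen++;
                lps[p] = prevLen;
                p++;
            } else if (prevLen != 0) {
                prevLen = lps[prevLen - 1];
            } else {
                lps[p] = 0;
                p++;
            }
        }
        return lps;
    }

    // Scans str with KMP and returns the number of text/pattern comparisons.
    public static long countComparisons(String str, String search) {
        if (search.isEmpty()) {
            return 0; // assumption: nothing to compare for an empty pattern
        }
        int[] lps = buildLps(search);
        long comparisons = 0;
        int p = 0; // index into str, never decreases
        int q = 0; // index into search
        while (p < str.length()) {
            comparisons++;
            if (search.charAt(q) == str.charAt(p)) {
                p++;
                q++;
                if (q == search.length()) {
                    q = lps[q - 1]; // full match found, keep scanning
                }
            } else if (q != 0) {
                q = lps[q - 1]; // reuse the already matched prefix
            } else {
                p++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String text = "a".repeat(1000); // String.repeat requires Java 11 or later
        System.out.println(countComparisons(text, "aaab")); // 1997
    }
}
```

The scan needs 1997 comparisons here, which stays below 2N = 2000. On the same input, a naïve search that checks every starting position makes (N - M + 1) * M = 997 * 4 = 3988 comparisons, and that gap widens as the pattern gets longer.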

Programmers can observe the difference in the time complexity of the two approaches. In the worst case, the first approach takes O(N*M) time, whereas the KMP algorithm takes O(N + M), so KMP scales better when searching for patterns in large texts containing millions of words.

Shubham Vora

Updated on: 24-Aug-2023
