Java Program to Extract a Single Quote Enclosed String from a Larger String using Regex


Regex, short for regular expression, is a language used for pattern matching and string manipulation. A regular expression is a sequence of characters and symbols that defines a search pattern, which can then be used to search, replace, or validate text input.

In this article, we will see how to write a Java program that extracts a single quote enclosed string from a larger string using regex.

Java provides regex support through the java.util.regex package. The Pattern class represents a compiled regular expression, and the Matcher class matches a pattern against a given input string.

Single Substring Enclosed in Single Quotes

In the example below, we first define the input string and the regex pattern we want to match. The pattern '(.*?)' matches any sequence of characters enclosed within single quotes. The part .*? matches any character zero or more times, but as few times as possible, so that the rest of the pattern (the closing quote) can match at the earliest opportunity.

We then create a Matcher object from the pattern and call its find() method to search the input string. If the pattern matches, we extract the matched text with group(1), which returns the contents of the first capture group in the pattern. The drawback of this approach is that it returns only the first single quote enclosed substring, not every one in the input.

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StringExtractor {
   public static void main(String[] args) {
      String input = "This is a 'single quote' enclosed string";
      Pattern pattern = Pattern.compile("'(.*?)'");
      Matcher matcher = pattern.matcher(input);
        
      if (matcher.find()) {
         String extractedString = matcher.group(1);
         System.out.println(extractedString);
      }
   }
}

Output

single quote
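To see why the reluctant quantifier matters, the following minimal sketch runs both the reluctant pattern '(.*?)' and its greedy counterpart '(.*)' against an input with two quoted substrings. The class name GreedyVsReluctantDemo, the helper firstGroup(), and the input string are assumptions made for illustration and are not part of the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
   // Hypothetical helper: returns capture group 1 of the first match,
   // or null when the pattern does not match at all.
   static String firstGroup(String regex, String input) {
      Matcher matcher = Pattern.compile(regex).matcher(input);
      return matcher.find() ? matcher.group(1) : null;
   }

   public static void main(String[] args) {
      String input = "Both 'first' and 'second' are quoted";

      // Reluctant: stops at the nearest closing quote.
      System.out.println(firstGroup("'(.*?)'", input)); // first

      // Greedy: runs on to the last quote in the input.
      System.out.println(firstGroup("'(.*)'", input));  // first' and 'second
   }
}
```

With the greedy form, everything between the first and the last quote is captured as one string, which is rarely what is wanted when the input contains more than one quoted substring.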

Multiple Single Quote Enclosed Substrings

The previous method has one major drawback: it extracts only the first single quote enclosed substring and ignores any others in the input string. The version below extracts every occurrence. It uses a while loop that keeps calling find() until no matches are left in the input string. Each extracted string is stored in the matches list, which the method returns. The main method shows how to use extractStringsWithRegex() to extract all single quote enclosed strings.

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.ArrayList;
import java.util.List;
public class StringExtractor {    
   public static List<String> extractStringsWithRegex(String input) {
      // Takes a string as input, repeatedly searches it for regex matches,
      // and collects the captured text of each match in the returned list
      Pattern pattern = Pattern.compile("'(.*?)'");
      Matcher matcher = pattern.matcher(input);
      List<String> matches = new ArrayList<>();
      while (matcher.find()) {
         matches.add(matcher.group(1));
      }
      return matches;
   }   
   public static void main(String[] args) {
      String input = "This is a 'test' string with 'multiple' 'single quote' enclosed 'words'";
      List<String> matches = extractStringsWithRegex(input);
      for (String match : matches) {
         System.out.println(match);
      }
   }
}

Output

test
multiple
single quote
words

Using regex to extract a single quote enclosed string from a larger string in Java has the following advantages and disadvantages.

Advantages

  • Regex is powerful: the same approach that matches single quote enclosed strings can be extended to match more complicated patterns.

  • The Matcher class provides additional methods for working with a match, such as start() and end() for finding the start and end indices of the match.
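As an illustration of the index methods mentioned above, here is a minimal sketch that reports where each quoted substring sits in the input. It uses Matcher.start(int) and Matcher.end(int) on capture group 1; the class name MatchPositionsDemo, the helper describeMatches(), and the sample input are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositionsDemo {
   // Hypothetical helper: describes each quoted substring together with
   // the half-open index range [start, end) of capture group 1.
   static List<String> describeMatches(String input) {
      Matcher matcher = Pattern.compile("'(.*?)'").matcher(input);
      List<String> descriptions = new ArrayList<>();
      while (matcher.find()) {
         // start(1) is the index of the first captured character;
         // end(1) is the index just past the last captured character.
         descriptions.add(matcher.group(1)
               + " [" + matcher.start(1) + ", " + matcher.end(1) + ")");
      }
      return descriptions;
   }

   public static void main(String[] args) {
      for (String line : describeMatches("Say 'hi' to 'Bob'")) {
         System.out.println(line);
      }
      // Prints:
      // hi [5, 7)
      // Bob [13, 16)
   }
}
```

Calling start() and end() without an argument would instead return the range of the whole match, including the surrounding quotes.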

Disadvantages

  • Regex can be harder to write and understand than other methods.

  • Regex may be slower than other methods, especially for large input strings or complex patterns.
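One common way to limit regex overhead is to compile the pattern once and reuse it, rather than calling Pattern.compile() on every invocation. The sketch below does that and also swaps in a negated character class, '([^']*)', in place of '(.*?)'. This is a substituted technique, not the pattern used in the examples above, and it behaves slightly differently: [^'] also matches line terminators, which . does not by default. The class name PrecompiledPatternDemo and the constant QUOTED are assumptions for illustration, and any speed benefit should be measured on real inputs rather than taken for granted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledPatternDemo {
   // Compiled once when the class is loaded and reused by every call.
   // Pattern instances are immutable and safe to share between threads.
   private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

   static List<String> extract(String input) {
      // A Matcher is not thread-safe, so a new one is created per call.
      Matcher matcher = QUOTED.matcher(input);
      List<String> matches = new ArrayList<>();
      while (matcher.find()) {
         matches.add(matcher.group(1));
      }
      return matches;
   }

   public static void main(String[] args) {
      System.out.println(extract("a 'b' c 'd e'")); // [b, d e]
   }
}
```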

Conclusion

There are several ways to extract single quote enclosed strings; common approaches include regex, split(), and substring(). Regex is a powerful and flexible option because it can handle complex patterns, but it can be slow on very large strings. When using regex in Java, the Pattern class represents the pattern, and the Matcher class applies it to an input string and extracts the matching text. Regex has many use cases, ranging from validating user input to manipulating text. Whenever you work with regex, design and test the pattern carefully to make sure it matches the desired text and handles the relevant edge cases.
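For comparison with the regex versions, the following is a minimal sketch of a non-regex approach built on String.indexOf() and String.substring(). The class name IndexOfExtractorDemo and the method extractQuoted() are hypothetical, and the sketch assumes quotes are simply paired left to right, with no escaping and no special handling of apostrophes.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfExtractorDemo {
   // Hypothetical helper: pairs up single quotes from left to right and
   // collects the text between each opening and closing quote.
   static List<String> extractQuoted(String input) {
      List<String> result = new ArrayList<>();
      int open = input.indexOf('\'');
      while (open >= 0) {
         int close = input.indexOf('\'', open + 1);
         if (close < 0) {
            break; // unmatched opening quote: nothing more to extract
         }
         result.add(input.substring(open + 1, close));
         open = input.indexOf('\'', close + 1);
      }
      return result;
   }

   public static void main(String[] args) {
      String input = "This is a 'test' string with 'multiple' 'single quote' enclosed 'words'";
      for (String match : extractQuoted(input)) {
         System.out.println(match);
      }
   }
}
```

An input such as "it's" contains a single unpaired quote, so this sketch returns an empty list for it, which is one example of the edge cases worth testing whichever approach is chosen.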

Updated on: 06-Apr-2023
