Extract substrings between any pair of delimiters


Delimiters are characters that mark where one piece of text ends and another begins. In an ordinary sentence, for example, we tell the words apart because they are separated by spaces. In mathematical and regular expressions, parentheses () are the most common delimiters.

Working with substrings is an important part of programming, especially in C, a language often used to write compilers and assemblers. To extract a substring, the program locates the starting delimiter in the string and copies the characters that follow it into another variable until it reaches the ending delimiter.

The == and != operators can be used to compare each character in the string with the chosen delimiter character.

If the string is read from the user with scanf() and the %s format, it cannot contain spaces, because %s stops at the first whitespace character. Reading the whole line with fgets() removes that restriction. The examples below use predefined input values.

The programs rely on basic array and string handling. The library functions for string comparison and string copying could do most of the work, but as an exercise in simple logic the delimiter search is written with plain loops. Only strlen() and, in Method 1, strncpy() are taken from string.h.
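One way to accept input that contains spaces is to read a whole line with fgets() and then remove the newline it keeps. The sketch below is only an illustration: the helper names strip_newline and read_line are hypothetical and do not appear in the programs of this article.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes the trailing newline that fgets() keeps. */
void strip_newline(char *s) {
   /* strcspn() returns the index of the first '\n', or the length of s
      when there is none, so this is safe in both cases. */
   s[strcspn(s, "\n")] = '\0';
}

/* Hypothetical helper: reads one line (spaces included) from in into buf.
   Returns 1 on success, 0 on end of input or error. */
int read_line(FILE *in, char *buf, int size) {
   if (fgets(buf, size, in) == NULL)
      return 0;
   strip_newline(buf);
   return 1;
}
```

A call such as read_line(stdin, str1, sizeof str1) would then fill str1 with a line like "Hello [big world]!", which scanf("%s", ...) would cut off after "Hello".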

Methods Used

Method 1: Using substring()

Method 2: Using functions

Both approaches have their benefits. Method 1 is a straightforward solution that helps users understand the process of string manipulation, while Method 2 promotes better software design and maintainability through the use of functions.

Syntax

Extracting substrings between a pair of delimiters is a frequent task in C, and the method can vary with the requirements and constraints of the problem. One widely used building block is the strtok() function from the C standard library (declared in string.h), which breaks a string into a series of tokens. It takes the string and a second string containing the set of delimiter characters, and returns a pointer to the first token. Subsequent tokens are obtained by calling the function again with a null pointer as the first argument, and a null pointer is returned when no tokens remain. Note that strtok() modifies the original string by overwriting delimiters with null characters.

char *strtok(char *str, const char *delim);
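As a rough illustration of the prototype above, the following sketch splits a string on the characters "[]" and returns one of the resulting tokens. The helper name nth_token, its parameters and the 128-byte working buffer are assumptions made for this example. The input is copied first because strtok() writes into the string it scans.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the token_index-th token (0-based) of src,
   split on any character in delims, into out.
   Returns 1 on success, 0 if there is no such token or it does not fit. */
int nth_token(const char *src, const char *delims, int token_index,
              char *out, size_t out_size) {
   char copy[128];   /* assumed upper bound on the input length */
   if (strlen(src) >= sizeof copy || out_size == 0)
      return 0;
   strcpy(copy, src);

   /* First call passes the string, later calls pass NULL. */
   char *tok = strtok(copy, delims);
   for (int i = 0; tok != NULL && i < token_index; i++)
      tok = strtok(NULL, delims);

   if (tok == NULL || strlen(tok) >= out_size)
      return 0;
   strcpy(out, tok);
   return 1;
}
```

For "Hello[world]!" with the delimiter set "[]", the tokens are "Hello", "world" and "!", so token 1 is the bracketed text. This is not a general pair extractor: strtok() skips leading delimiters, so for an input such as "[world]" the bracketed text would be token 0.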

Algorithm

Step 1 − Declare str1, str2, delim1 and delim2, and initialize them to null.

Step 2 − Declare the integer variables len, n, i and subs.

Step 3 − Accept str1, delim1 and delim2 from the console.

Step 4 − Find the length of str1 and store it in len.

Step 5 − While n < len, check whether str1[n] == delim1.

Step 6 − If yes, set subs = n and break out of the loop.

Step 7 − Set n = 0 and increment subs so that it points just past delim1, then loop while str1[subs] != delim2.

Step 8 − Inside the loop, copy the characters that follow delim1 into str2: str2[n] = str1[subs], then increment n and subs.

Step 9 − Print str2, which holds the text between the delimiters without the delimiters themselves.
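The search and copy steps above can be sketched as a single function. This is a minimal sketch rather than the article's program: the name extract_between, the out_size parameter and the -1 error value are assumptions added here so that a missing delimiter or a small destination buffer is reported instead of running past the end of an array.

```c
#include <stddef.h>

/* Hypothetical helper following Steps 5-8: copies the text between the first
   delim1 and the next delim2 into out. Returns the number of characters
   copied, or -1 if either delimiter is missing or out is too small. */
int extract_between(const char *str1, char delim1, char delim2,
                    char *out, size_t out_size) {
   size_t subs = 0, n = 0;

   /* Steps 5-6: find the starting delimiter */
   while (str1[subs] != '\0' && str1[subs] != delim1)
      subs++;
   if (str1[subs] == '\0' || out_size == 0)
      return -1;
   subs++;   /* skip delim1 itself */

   /* Steps 7-8: copy until the ending delimiter */
   while (str1[subs] != '\0' && str1[subs] != delim2) {
      if (n + 1 >= out_size)
         return -1;   /* no room left for the terminator */
      out[n++] = str1[subs++];
   }
   if (str1[subs] == '\0')
      return -1;      /* ending delimiter never appeared */

   out[n] = '\0';
   return (int)n;
}
```

Returning the length lets the caller tell an empty match such as "()" (result 0) apart from a failed search (result -1).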

Method 1: Using substring()

The simple step-by-step implementation using array manipulation has several benefits. It is straightforward and easy to understand, which helps beginners, and it lets users see the exact process the program follows to manipulate the string. It also has limitations: input read with scanf() cannot contain spaces, and the result is limited by the fixed size of the destination array. gets() would accept spaces, but it was removed from the C standard in C11 because it cannot prevent buffer overflows; fgets() is the safe alternative.

Example

This program extracts a portion of a string based on two delimiters. The first delimiter marks the start of the substring and the second marks its end. The input string is stored in str1, the two delimiters in delim1 and delim2, and the extracted substring in str2. The program first finds the position of the first delimiter, then calculates the length of the substring by counting the characters from that position up to the second delimiter. The Substring function is then called to copy the substring from the original string into str2, and the result is displayed on the screen.

#include <stdio.h>
#include <string.h>

// Copies n characters of str1, beginning at index start, into str2
void Substring(char *str2, const char *str1, int start, int n) {
   strncpy(str2, str1 + start, n);
   // Adding null character at the end (strncpy does not add one here)
   str2[n] = '\0';
}
int main() {
   // Predefined input values
   char str1[] = "Hello[world]!";
   char delim1 = '[';
   char delim2 = ']';

   char str2[100] = "";
   int len1 = strlen(str1);
   int start = -1, subs, n = 0;

   // Getting the position of substring based on delimiter
   while (n < len1) {
      if (str1[n] == delim1) {
         start = n;
         break;
      }
      n++;
   }

   // Getting the length of substring (only if delim1 was found)
   if (start >= 0) {
      n = 0;
      subs = start + 1;
      // Stop at delim2 or at the end of the string, whichever comes first
      while (subs < len1 && str1[subs] != delim2) {
         subs++;
         n++;
      }
      Substring(str2, str1, start + 1, n);
   }
   printf("\nSub string is %s", str2);

   return 0;
}

Output

Sub string is world 

Method 2: Using functions

Implementing the program using functions can provide a more modular and organized solution. It breaks the code into smaller, reusable pieces that can be tested and debugged independently. This approach promotes better software design principles and code readability. By creating functions, you can also easily extend the program's functionality and improve its maintainability.

Example

This C program extracts a portion of a predefined string. The string is declared as a character array and the delimiters are specified in the main function. The Getpos function finds the position of the first delimiter (delim1) in the string. The Copystr function copies the characters between the two delimiters (delim1 and delim2) into a new string. The length of the original string is calculated with the strlen function from string.h, and the substring is displayed on the screen with printf.

#include <stdio.h>
#include <string.h>

// Stores the index of the first delim1 in *subs, or -1 if it is absent
void Getpos(const char *str1, int len1, char delim1, int *subs) {
   int n = 0;
   *subs = -1;
   while (n < len1) {
      if (str1[n] == delim1) {
         *subs = n;
         break;
      }
      n++;
   }
}

// Copies the characters between delim1 (at index subs) and delim2 into str2
void Copystr(const char *str1, char *str2, char delim1, char delim2, int subs) {
   int n = 0;
   if (subs >= 0 && str1[subs] == delim1) {
      subs++;
      // Stop at delim2 or at the end of the string, whichever comes first
      while (str1[subs] != '\0' && str1[subs] != delim2) {
         str2[n] = str1[subs];
         subs++;
         n++;
      }
   }
   // Adding null character at the end
   str2[n] = '\0';
}

int main() {
   // Predefined input values
   char str1[] = "Hello[world]!";
   char delim1 = '[';
   char delim2 = ']';

   char str2[100];
   int len1, subs;

   len1 = strlen(str1);

   Getpos(str1, len1, delim1, &subs);
   Copystr(str1, str2, delim1, delim2, subs);

   printf("\nSub string is %s", str2);

   return 0;
}

Output

Sub string is world

Conclusion

Strings in C are stored in memory as arrays of characters, so each character of a string can be accessed and processed separately. This array-based handling makes it easy to perform operations such as concatenating strings, reversing them or checking for a palindrome. The same flexibility is useful when processing text from files and when memory usage must be kept low.

Updated on: 20-Jul-2023
