Python Program to Extract String Till First Non-Alphanumeric Character


Python strings are sequences of characters that represent information or data. A general string can contain any characters, enclosed within single or double quotes, whereas an alphanumeric string consists only of letters and digits. Both alphanumeric and non-alphanumeric strings are used in scenarios such as password protection, data processing, validation and formatting.

Specific patterns can be identified in such strings and extracted from them. In this article, our task is to extract the part of a string that comes before the first non-alphanumeric character.

Understanding the Problem

We have to extract a substring from an original string before we encounter a non-alphanumeric character. Let’s understand this with the help of an example.

Input Output Scenarios

Let us consider a string with the following value −

Input: Inp_STR = "Sales18@22!Roam"

The given string consists of letters, digits and special characters. We have to retrieve the substring that precedes the first non-alphanumeric character.

Output: Sales18

As we can see, the substring “Sales18” is returned because the next character, “@”, is the first non-alphanumeric character in the original string. Now that we have understood the problem statement, let’s discuss a few solutions.

Using Iterations

This is the most basic approach. We start with an empty result string and a reference string that holds all the alphanumeric characters, i.e., letters (upper and lower case) and digits. We then traverse the original string one character at a time.

For each character, we check whether it is present in the reference string. Characters that pass the check are appended to the result. Once a non-alphanumeric character is encountered, the loop breaks and the result holds the desired substring.

Example

Following is an example to extract string till first non-alphanumeric character −

Inp_STR = "Sales18@22Roam"
print(f"The original string is: {Inp_STR}")

ExSTR = ""
alphaNum = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
for x in Inp_STR:
   if x not in alphaNum:
      break
   else:
      ExSTR += x
print(f"The extracted string till 1st Non-Alphanumeric character: {ExSTR}")

Output

The original string is: Sales18@22Roam
The extracted string till 1st Non-Alphanumeric character: Sales18
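
The reference string does not have to be typed by hand, because the standard “string” module provides constants for ASCII letters and digits. The following is a minimal sketch of the same loop wrapped in a function. The name extract_prefix is a hypothetical helper introduced only for illustration.

```python
import string

# Set of ASCII letters and digits, built from standard-library constants
ALPHANUM = set(string.ascii_letters + string.digits)

def extract_prefix(text):
    """Return the part of text before its first non-alphanumeric character."""
    # extract_prefix is a hypothetical helper name used only in this sketch
    result = []
    for ch in text:
        if ch not in ALPHANUM:
            break
        result.append(ch)
    return "".join(result)

print(extract_prefix("Sales18@22Roam"))
```

Collecting the characters in a list and joining them once avoids building a new string on every iteration, which can matter for long inputs.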

Using Regex module + Search()

The “re” (regular expression) module is a powerful tool used to search and match patterns. These patterns are passed in the form of regular expressions. Using this module, we will detect the first non-alphanumeric pattern in the original string and retrieve the index at which it begins. We use the “search()” function to search the string for a non-alphanumeric pattern, which is represented by the expression “\W+”.

The “\W” class matches any non-word character. Note that word characters include the underscore and, for text patterns in Python 3, Unicode letters and digits, so “\W” does not stop at “_”. Use “[^A-Za-z0-9]” if only ASCII letters and digits should count as alphanumeric. The “+” matches one or more such characters in a row. The “.start()” method of the match object returns the starting index of the match, and this index is used to slice out the desired substring. If the string contains no such character, “search()” returns None, so the result should be checked before calling “.start()”.

Example

Following is an example −

import re
Inp_STR = "Sales18@22Roam"
print(f"The original string is: {Inp_STR}")

# search() returns None when the string has no non-word character
match = re.search(r"\W+", Inp_STR)
idx = match.start() if match else len(Inp_STR)
if match:
   print(f"The 1st non-alphanumeric character is encountered at: {idx}")
ExSTR = Inp_STR[:idx]

print(f"The extracted string till 1st Non-Alphanumeric character: {ExSTR}")

Output

The original string is: Sales18@22Roam
The 1st non-alphanumeric character is encountered at: 7
The extracted string till 1st Non-Alphanumeric character: Sales18
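
Two edge cases are worth keeping in mind with “search()”: a string with no special character at all, and a string containing an underscore. The sketch below handles both by using an explicit character class and checking the match object. It assumes that only ASCII letters and digits count as alphanumeric, and the name extract_until_special is a hypothetical helper, not part of the re module.

```python
import re

# Assumption: only ASCII letters and digits count as alphanumeric here
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")

def extract_until_special(text):
    """Return text up to (not including) the first non-alphanumeric character."""
    # extract_until_special is a hypothetical helper name for this sketch
    match = NON_ALNUM.search(text)
    if match is None:
        # No special character found: the whole string qualifies
        return text
    return text[:match.start()]

print(extract_until_special("Sales18@22Roam"))
print(extract_until_special("Sales_18@22"))
print(extract_until_special("Sales18"))
```

With this character class the second call stops at the underscore, whereas the “\W+” pattern would treat “_” as a word character and continue up to “@”.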

Using Regex module + Findall()

This is an alternative approach to extracting a string till the first non-alphanumeric character. Here, we use the “findall()” function from the re module to find all the substrings that consist of alphanumeric characters.

A list of matching substrings is obtained and we retrieve the 1st one using the “0” index value. We use the regular expression “[\dA-Za-z]*”, which represents zero or more alphanumeric characters in a row. Because “*” also permits an empty match, the first element of the list is always the match at the start of the string. It is an empty string when the original string begins with a non-alphanumeric character.

The regex symbol “\d” matches any digit (0 to 9; for text patterns in Python 3 it also matches other Unicode decimal digits unless the re.ASCII flag is set), “A-Z” matches any uppercase letter from A to Z, and “a-z” matches any lowercase letter from a to z.

Example

Following is an example −

import re
Inp_STR = "Sales18@22Roam"
print(f"The original string is: {Inp_STR}")

ExSTR = re.findall(r"[\dA-Za-z]*", Inp_STR)[0]
print(f"The extracted string till 1st Non-Alphanumeric character: {ExSTR}")

Output

The original string is: Sales18@22Roam
The extracted string till 1st Non-Alphanumeric character: Sales18
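
Since only the first element of the list is needed, the same pattern can also be anchored at the start of the string with “re.match()”, which avoids scanning the rest of the input. This is a swapped-in alternative to “findall()”, shown as a minimal sketch. The name extract_with_match is hypothetical, and the pattern assumes ASCII letters and digits only.

```python
import re

def extract_with_match(text):
    """Return the leading run of ASCII letters and digits in text."""
    # extract_with_match is a hypothetical helper name for this sketch.
    # "*" allows an empty match, so match() never returns None for this pattern.
    return re.match(r"[A-Za-z0-9]*", text).group()

print(extract_with_match("Sales18@22Roam"))
```

For a string that starts with a special character, the function returns an empty string, just as indexing the “findall()” result with “0” does.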

Using Isalnum() Method

In this approach, we iterate over the indices of the original string and check whether the character at index “x” is not alphanumeric. This is done with the help of the “isalnum()” method, which returns True when the string it is called on (here, a single character) is alphanumeric. As soon as the check fails, we use string slicing to extract everything before the 1st non-alphanumeric character.

Example

Following is an example −

Inp_STR = "Sales18@22Roam"
print(f"The original string is: {Inp_STR}")

for x in range(len(Inp_STR)):
   if not Inp_STR[x].isalnum():
      # Keep everything before the first non-alphanumeric character
      ExSTR = Inp_STR[:x]
      print(f"The 1st non-alphanumeric character is encountered at: {x}")
      break
else:
   # No break occurred: every character is alphanumeric (or the string is empty)
   ExSTR = Inp_STR
print(f"The extracted string till 1st Non-Alphanumeric character: {ExSTR}")

Output

The original string is: Sales18@22Roam
The 1st non-alphanumeric character is encountered at: 7
The extracted string till 1st Non-Alphanumeric character: Sales18
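
The same “isalnum()” check can be expressed without explicit indices by using “takewhile()” from the standard itertools module, which yields items from an iterable for as long as a predicate holds. This is a minimal sketch rather than a replacement for the example above, and the name extract_takewhile is a hypothetical helper.

```python
from itertools import takewhile

def extract_takewhile(text):
    """Return the leading characters of text for which isalnum() is True."""
    # extract_takewhile is a hypothetical helper name for this sketch
    return "".join(takewhile(str.isalnum, text))

print(extract_takewhile("Sales18@22Roam"))
```

Note that “isalnum()” is Unicode-aware, so accented letters such as “é” are treated as alphanumeric, unlike the ASCII-only reference string used in the first approach.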

Conclusion

In this article, we discussed several ways of extracting the substring that precedes the first non-alphanumeric character of a string. We started with a simple loop-based solution, then used the “search()” and “findall()” functions of the re module to extract the relevant string. Finally, we discussed a string-slicing solution based on the “isalnum()” method.

Updated on: 12-Jul-2023
