How do we use Python regular expression to match a date string?


Introduction

Programming languages frequently employ date inputs to obtain user data, such as birthdates, travel dates, reservation dates, etc. These dates given by the user may be immediately verified as legitimate using regular expressions. To determine whether a text has a valid date format and to extract a valid date from a string, utilize regular date expressions.

When checking dates, a regular expression for dates (YYYY-MM-DD) should look for four digits at the beginning of the expression, a hyphen, a two-digit month between 01 and 12, another hyphen, and then a two-digit day between 01 and 31. This is how the regex code works −

/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/

This code supports most dates; however, incorrect dates like 2021-04-31 and 2021-02-29 (because 2021 is not a leap year) are not included (April has only 30 days). Use the tools your chosen programming language provides to carry out these tests.

Date Format Criteria And Algorithm

The general date format that should be used is YYYY-MM-DD, easily legible by both people and computers, according to the international date standard, ISO 8601. It is very simple to arrange this type chronologically.

Algorithm

  • import re
  • Store the date string
  • Use re.match to match the date string
  • Print the str.group()

The shorter format YYYYMMDD, which we'll explore later, is also accepted by the ISO standard. So let's create a regular expression that meets these requirements.

Syntax Used

A date must begin with a four-digit year, ranging from 0000 to 9999. This may be explained by using the following −

/\d{4}/

The quantifier "4" specifies that we want exactly four characters, but the numeric character "d" accepts any digit from 0 to 9. (no more and no less).

/\d{4}-/

This is followed by a month with two digits, padded with leading zeros if necessary, ranging from 01 to 12. Using "d2" in this place, which stands for two digits, could be tempting, but any month representation between 00 and 99 is acceptable.

Example

#importing re functions import re #storing the value of datestring in a variable datestring = '31-08-2022' #use re.match() functions to match the datestring str =re.match('(\d{2})[/.-](\d{2})[/.-](\d{4})$', datestring) #printing the str.group() print ("The first input date string is", str.group()) #again declaring the datestring variable with different date format datestring = '2022-08-31' #matching the datestring with re.match() functions. str=re.match('(\d{2})[/.-](\d{2})[/.-](\d{4})$', datestring) #printing the str print ("Matching both the date input if it's in the same format or not:", str)

Output

The first input date string is 31-08-2022
Matching both the date input if it's in the same format or not: None

Code Explanation

/0[1-9]/

The 1-9 enclosed square brackets indicate that we'll take any number from 1 to 9, but the 0 in front indicates that we want a literal match for the 0 characters.

We have a somewhat different layout for the months beginning with 1, which are October (10) through December (12). Only 0 or 1, or 2 characters can follow one character. The way we do that is as follows −

/1[0-2]/

The 0-2 in square brackets will accept one character from 0 to 2, whereas the 1 in front indicates a literal match for the one character.

These two month-representations may be combined using the pipe character (|) or the OR sign (|), and we can surround them in round brackets to show that they work as a unit.

/(0[1-9]|1[0-2])/

Adding this to our 4-digit year regex produces the following −

/\d{4}-(0[1-9]|1[0-2])/

This is then followed by another hyphen character (-) −

/\d{4}-(0[1-9]|1[0-2])-/

Finally, if required, we can construct the code that will take a two-digit day representation, padded with leading zeros ranging from 01 to 31. We'll divide the day into halves, akin to how the month is represented.

Day 1 through Day 9, which begin with a 0, will be our first focus. A single digit may follow these from 1 to 9 (notice that we have omitted the number 0 since it is not a valid day representation; see below) −

/0[1-9]/

Then, by stating that we can have either a 1 or a 2 followed by any one number from 0 to 9, we'll combine the days 10 through 19 and 20 through 29 −

/[12][0-9]/

The [12] square brackets denote that either 1 or 2 will be accepted.

Additionally, we need the number 3 to be followed by either a 0 or a 1 for the days 30 to 31.

/3[01]/

Now that we have enclosed these three day-representations in round brackets and separated them using the OR character (|), we can group them as follows −

/(0[1-9]|[12][0-9]|3[01])/

Finally, we may rejoin them by continuing to use our expression −

/\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])/

We need to place the start-of-string character and end-or-string character at the beginning and end of the expression, respectively, to make sure that we only match the date and nothing else before or after it −

/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/

There you have it, then! With some cleverness, this regex code will accept the YYYY-MM-DD date format.

Conclusion

Python regular expression can explicitly locate dates with the formats day, month, and year. The day is a one-digit integer or a zero followed by a one-digit integer, a one-digit integer, a two-digit integer, a three-digit integer, or a one-digit integer. The month is a one-digit integer, a zero followed by a one-digit integer, a 1 followed by a 0, 1, or 2, or a 2 followed by a 0. The year is represented by the number 20 and any number between 00 and 99.

Updated on: 02-Nov-2023

4K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements