What are Regular Expressions?


Regular expressions are an important notation for defining patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.

They provide a convenient notation for describing tokens. Regular expressions define the languages accepted by finite automata (transition diagrams).

Regular expressions are defined over an alphabet $\Sigma$.

If R is a regular expression, then L(R) represents the language denoted by that regular expression.

Language − A language is a set of strings over some fixed alphabet. The empty string is denoted by ε.

Example − If L is the set of strings of 0’s and 1’s of length two

then L = {00, 01, 10, 11}
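The same language can be enumerated mechanically. The following is a minimal Python sketch; the helper name `strings_of_length` is an illustrative assumption and not part of any standard notation.

```python
from itertools import product

def strings_of_length(alphabet, n):
    """Return the set of all strings of length n over the given alphabet."""
    # product(alphabet, repeat=n) yields every n-tuple of symbols.
    return {"".join(symbols) for symbols in product(alphabet, repeat=n)}

L = strings_of_length("01", 2)
print(sorted(L))  # ['00', '01', '10', '11']
```

With n = 0 the helper returns the set containing only the empty string, which corresponds to {ε}.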

Example − If L = {1}

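then $L^{*} = L^{0} \cup L^{1} \cup L^{2} \cup \dots$ Here the exponent $i$ in $L^{i}$ ranges over 0, 1, 2, 3, … and $L^{i}$ is the set of strings formed by concatenating $i$ strings from L.
∴ L* = {ε} ∪ {1} ∪ {11} ∪ …
∴ L* = {ε, 1, 11, 111, …}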
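Because L* is infinite, a program can only list a finite part of it. The sketch below builds the powers of L one at a time and unions them up to a chosen bound; the function names `power` and `kleene_prefix` and the bound `max_power` are assumptions made for illustration.

```python
def power(language, i):
    """Return L^i: all concatenations of exactly i strings drawn from L."""
    result = {""}  # L^0 = {ε}; the empty string is "" in Python
    for _ in range(i):
        result = {x + y for x in result for y in language}
    return result

def kleene_prefix(language, max_power):
    """Union of L^0 .. L^max_power, a finite approximation of L*."""
    closure = set()
    for i in range(max_power + 1):
        closure |= power(language, i)
    return closure

print(sorted(kleene_prefix({"1"}, 3), key=len))  # ['', '1', '11', '111']
```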

Operations on Regular Languages

The operations on regular languages are as follows. The examples use L1={00,10} and L2={01,11} −

• Union − The union of two languages L1 and L2 is the set of strings that are in L1, in L2, or in both. L1∪L2 = {strings in L1 or in L2}. Example: L1∪L2 = {00, 10, 01, 11}

• Concatenation − The concatenation of two languages L1 and L2 is the set of strings formed by taking a string from L1 and following it with a string from L2. L1L2 = {a string in L1 followed by a string in L2}. Example: L1L2 = {0001, 0011, 1001, 1011}

• Kleene closure of L1, ${L^{*}_{1}}$ − The Kleene closure is the set of strings formed by concatenating zero or more strings from L1. It always includes the empty string ε.
${L^{*}_{1}}$ = ${L^{0}_{1}}$∪${L^{1}_{1}}$∪${L^{2}_{1}}$∪…
${L^{*}_{1}}=\displaystyle\bigcup\limits_{i=0}^{\infty} {L^{i}_{1}}$
Example: ${L^{*}_{1}}$ = {ε, 00, 10, 1010, 0010, 1000, 0000, 000000, 001000, …}

• Positive closure of L1, ${L^{+}_{1}}$ − The positive closure is the set of strings formed by concatenating one or more strings from L1. It excludes the empty string ε unless ε is itself in L1.
${L^{+}_{1}}$ = ${L^{1}_{1}}$∪${L^{2}_{1}}$∪…
${L^{+}_{1}}=\displaystyle\bigcup\limits_{i=1}^{\infty} {L^{i}_{1}}$
Example: ${L^{+}_{1}}$ = {00, 10, 1010, 0010, 1000, 0000, 000000, 001000, …}
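The four operations can be tried out on finite sets of strings. The following Python sketch represents a language as a `set` of `str`; the helpers `concat` and `closure` and the `max_power` bound are assumptions introduced here, and the closures are necessarily truncated because the true closures are infinite.

```python
L1 = {"00", "10"}
L2 = {"01", "11"}

def concat(a, b):
    """Every string of a followed by every string of b."""
    return {x + y for x in a for y in b}

def closure(language, max_power, include_empty=True):
    """Union of L^i for i up to max_power.

    Starts at i = 0 (Kleene closure) when include_empty is True,
    and at i = 1 (positive closure) otherwise.
    """
    current = {""}                          # L^0 = {ε}
    result = {""} if include_empty else set()
    for _ in range(max_power):
        current = concat(current, language)  # next power of L
        result |= current
    return result

print(sorted(L1 | L2))         # ['00', '01', '10', '11']
print(sorted(concat(L1, L2)))  # ['0001', '0011', '1001', '1011']
kleene = closure(L1, 2)
positive = closure(L1, 2, include_empty=False)
print(kleene - positive)       # {''}
```

For these inputs the truncated Kleene and positive closures differ only in the empty string, which mirrors the difference between starting the union at i = 0 and at i = 1.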

Extensions of Regular Expressions

Kleene introduced regular expressions in the 1950s with the basic operations of union, concatenation, and Kleene closure.

Several notational extensions have since been added and are in common use −

• One or more instances − The unary postfix operator + denotes the positive closure of a regular expression and its language. That is, if r is a regular expression, then (r)+ denotes the language (L(r))+. Two algebraic laws, $r^{*}$ = $r^{+}$|ε and $r^{+}$ = r$r^{*}$ = $r^{*}$r, relate the positive closure and the Kleene closure.

• Zero or one instance − The unary postfix operator ? denotes zero or one occurrence. That is, r? is equivalent to r|ε, or L(r?) = L(r) ∪ {ε}. This operator has the same precedence and associativity as * and +.
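These laws can be spot-checked with Python's standard `re` module, which supports the `+`, `*`, and `?` operators. This is only a sketch: the helper `same_language` and the sample strings are assumptions chosen for illustration, and agreement on a handful of samples is evidence, not a proof of equivalence. The empty alternative in a pattern such as `(?:ab|)` plays the role of ε.

```python
import re

def same_language(p, q, samples):
    """True if patterns p and q accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples
    )

samples = ["", "ab", "abab", "ababab", "a", "aba", "ba"]
r = "(?:ab)"  # the regular expression r, grouped so operators apply to all of it

print(same_language(f"{r}*", f"(?:{r}+|)", samples))  # r* = r+|ε  -> True
print(same_language(f"{r}+", f"{r}{r}*", samples))    # r+ = rr*   -> True
print(same_language(f"{r}+", f"{r}*{r}", samples))    # r+ = r*r   -> True
print(same_language(f"{r}?", f"(?:ab|)", samples))    # r? = r|ε   -> True
```

By contrast, `same_language(f"{r}*", f"{r}+", samples)` is false, because only the Kleene closure accepts the empty string.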

Published on 23-Oct-2021 12:03:20