 
Applications of Regular Expression in Automata
In automata theory and in programming languages alike, regular expressions are a powerful tool for working with text. Their applications range from validating user input to extracting data from websites.
In this chapter, we will look at three common applications of regular expressions (REs), with examples that show how they work in practice.
Let us understand these three applications in detail.
Regular Expressions in UNIX
Let us look at regular expressions in UNIX. UNIX is a family of operating systems, and Linux is a UNIX-like system. The regular expressions used in UNIX tools extend the notation of formal regular expressions. Most of these extensions are only shorthand, but some, such as the ability to refer back to a previously matched string, allow non-regular languages to be recognized.
Character Classes in UNIX Regular Expressions
UNIX Regular Expressions have specific rules for defining character classes, which are sets of characters that can match a single position in the input string.
| Character Classes | Description | 
|---|---|
| The Dot Symbol (.) | The dot symbol in UNIX REs is a wildcard that matches any single character. For example, the expression "a.b" would match "aab", "acb", and "a1b", but not "ab" or "a12b". | 
| Explicit Character Lists | You can explicitly list the allowed characters within square brackets. The expression [a1b] matches a single character that can be 'a', '1', or 'b'. This is equivalent to expressing it as "a + 1 + b". | 
| Ranges | Using a hyphen (-) within a character class signifies a range. For example, "[0-9]" matches any single digit from 0 to 9. This range notation simplifies the expression and is equivalent to writing "0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9". Similarly, "[A-Z]" matches any uppercase letter. | 
| Combined Ranges and Lists | You can combine explicit character lists and ranges within the same character class. For example, "[A-Za-z0-9]" matches any letter (uppercase or lowercase) or digit. | 
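| Characters for Signed Numbers | The character class "[+\-.,0-9]" lists the characters that can appear in a signed decimal number: plus (+), minus (-), decimal point (.), comma (,), and the digits 0 to 9. In regex dialects that allow escapes inside brackets, the backslash keeps the hyphen from being read as a range operator. | 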
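The character classes in the table above can be tried out with Python's `re` module, whose syntax for the dot, explicit lists, and ranges agrees with the UNIX notation described here. The following is a minimal sketch; the helper name `matches` is our own and is used only for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# The dot matches exactly one character, so "ab" and "a12b" are rejected.
dot_results = {s: matches(r"a.b", s) for s in ["aab", "acb", "a1b", "ab", "a12b"]}

# An explicit list matches one character drawn from the listed ones.
list_results = {s: matches(r"[a1b]", s) for s in ["a", "1", "b", "c"]}

# A range and a combined class also match a single character each.
digit_ok = matches(r"[0-9]", "7")
underscore_ok = matches(r"[A-Za-z0-9]", "_")

print(dot_results)
print(list_results)
print(digit_ok, underscore_ok)
```

Using `re.fullmatch` rather than `re.search` makes each test ask whether the entire string belongs to the language of the expression, which is how acceptance is defined in automata theory.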
Special Notations
UNIX Regular Expressions provide special notations for commonly used character classes. These notations shorten expressions and help keep patterns consistent across UNIX commands.
| Special Notations | Description | 
|---|---|
| [:digit:] | The "[:digit:]" notation names the set of digits. Placed inside a bracket expression, as in "[[:digit:]]", it matches the same characters as "[0-9]". | 
| [:alpha:] | This notation names the set of letters. "[[:alpha:]]" matches the same characters as "[A-Za-z]". | 
| [:alnum:] | This notation names the set of letters and digits. "[[:alnum:]]" matches the same characters as "[A-Za-z0-9]". | 
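Python's `re` module does not support these POSIX class names, so a pattern written with them has to be rewritten using explicit ranges before Python can use it. The sketch below does this with a simple text substitution. The table `POSIX_CLASSES` and the function `expand_posix_classes` are hypothetical names introduced for illustration, and only the three classes from the table above are covered.

```python
import re

# Hypothetical lookup table: POSIX class name -> equivalent explicit ranges.
POSIX_CLASSES = {
    "[:digit:]": "0-9",
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace each known POSIX class name with its explicit ranges."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ranges)
    return pattern

# A letter followed by any number of letters or digits.
expanded = expand_posix_classes("[[:alpha:]][[:alnum:]]*")
print(expanded)

starts_with_letter = re.fullmatch(expanded, "x9") is not None
starts_with_digit = re.fullmatch(expanded, "9x") is not None
print(starts_with_letter, starts_with_digit)
```

Tools such as `grep` accept the class names directly, so this translation is needed only when moving a pattern into a dialect that lacks them.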
UNIX Operators
UNIX Regular Expressions use specific operators to construct complex patterns.
| Operators | Description | 
|---|---|
| Pipe (\|) | The pipe symbol functions as a union operator. It allows for matching one pattern or another. In formal regular expression notation, union is written with the plus (+) symbol. For example, "cat\|dog" would match either "cat" or "dog". | 
| Question Mark (?) | The question mark indicates "zero or one of" the preceding pattern. In regular expression notation, this would be equivalent to "ε + R" (epsilon plus R), where R represents the preceding pattern. So, "a?" matches either "a" or an empty string (no "a"). | 
| Plus (+) | The plus symbol indicates "one or more of" the preceding pattern. In regular expression notation, this translates to "RR*" (R concatenated with R star), or equivalently "R+". So, "a+" matches "a", "aa", "aaa", and so on. | 
| Braces ({n}) | Braces with a number 'n' inside indicate "n copies of" the preceding pattern. For example, "a{5}" matches exactly five consecutive "a" characters ("aaaaa"). This is equivalent to writing "R⁵" in regular expression notation, where R is the preceding pattern. | 
| Star (*) | The star operator, representing "zero or more of" the preceding pattern, keeps its usual meaning in UNIX REs. | 
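These operators carry the same meaning in Python's `re` module, so their behaviour can be checked directly. The sketch below tests each operator against a few strings; the helper `full` is our own name. The last line checks, on a small sample only, that "a+" and "aa*" accept the same strings, as the equivalence of R+ and RR* suggests.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Union: one pattern or the other.
union = [full(r"cat|dog", s) for s in ["cat", "dog", "cow"]]

# Zero or one of the preceding pattern.
optional = [full(r"a?", s) for s in ["", "a", "aa"]]

# One or more of the preceding pattern.
one_or_more = [full(r"a+", s) for s in ["", "a", "aaa"]]

# Exactly five copies of the preceding pattern.
exactly_five = [full(r"a{5}", s) for s in ["aaaa", "aaaaa", "aaaaaa"]]

# Zero or more of the preceding pattern.
zero_or_more = [full(r"a*", s) for s in ["", "a", "aaa"]]

# R+ and RR* agree on every string in this small sample.
same_language = all(full(r"a+", s) == full(r"aa*", s) for s in ["", "a", "aa", "b"])

print(union, optional, one_or_more, exactly_five, zero_or_more, same_language)
```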
Regular Expressions in Lexical Analysis
In compiler design, lexical analysis breaks the input text into meaningful units called tokens. Regular expressions are widely used to describe tokens in lexical analysis and in text processing generally.
Consider the following regular expression −
$$\mathrm{[A \:-\: Z][a \:-\: z]^{*} \:|\: [A \:-\: Za \:-\: z0 \:-\: 9]^{*} \:|\: [A \:-\: Za \:-\: z][A \:-\: Za \:-\: z]}$$
This expression can be translated into regular expression notation as −
$$\mathrm{(A \:+\: \dotso \:+\: Z)(a \:+\: \dotso \:+\: z)^{*} \:+\: (A \:+\: \dotso \:+\: Z \:+\: a \:+\: \dotso \:+\: z \:+\: 0 \:+\: \dotso \:+\: 9)^{*} \:+\: (A \:+\: \dotso \:+\: Z \:+\: a \:+\: \dotso \:+\: z)(A \:+\: \dotso \:+\: Z \:+\: a \:+\: \dotso \:+\: z)}$$
Where "…" stands for the omitted characters of each range.
This regular expression can be used to recognize the parts of addresses like "Ithaca NY" and "Buffalo NY". It defines three alternative patterns −
| Patterns | Description | 
|---|---|
| [A-Z][a-z]* | This pattern matches a word starting with a capital letter followed by zero or more lowercase letters. This would match "Ithaca" and "Buffalo" in our address examples. | 
| [A-Za-z0-9]* | This pattern matches an alphanumeric string of any length, including the empty string. This would match house numbers or street names that contain digits. | 
| [A-Za-z][A-Za-z] | This pattern matches a two-letter string, such as the state code "NY" in our examples. | 
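To see how a lexical analyzer might use such alternatives, here is a minimal Python sketch that splits the input on whitespace and labels each word with the first pattern that matches it in full. It assumes the three classes are meant as "[A-Z][a-z]*", "[A-Za-z0-9]*", and "[A-Za-z][A-Za-z]". The token names, the priority order, and the functions `classify` and `tokenize` are our own choices for illustration, and a real lexer would normally scan character by character instead of splitting on spaces.

```python
import re

# Token names and their order of priority are illustrative assumptions.
TOKEN_PATTERNS = [
    ("CAPITALIZED_WORD", r"[A-Z][a-z]*"),
    ("TWO_LETTER_CODE", r"[A-Za-z][A-Za-z]"),
    ("ALPHANUMERIC", r"[A-Za-z0-9]*"),
]

def classify(word: str) -> str:
    """Return the name of the first pattern that matches the whole word."""
    for name, pattern in TOKEN_PATTERNS:
        if re.fullmatch(pattern, word):
            return name
    return "UNKNOWN"

def tokenize(text: str) -> list[tuple[str, str]]:
    """Split on whitespace and pair each word with its token name."""
    return [(classify(word), word) for word in text.split()]

tokens = tokenize("Ithaca NY 14850")
print(tokens)
```

The order of the list matters because several patterns can match the same word. For instance, "Ny" satisfies all three patterns, and the first one listed decides its label.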
Finding Patterns in Text
Let us look at finding patterns in text. Another common application of regular expressions is locating specific patterns in text, particularly for tasks like text search and data extraction.
Consider the following incomplete Regular Expression for finding addresses within a web page −
$$\mathrm{[0 \:-\: 9]{+}[\:][A \:-\: Z][a \:-\: z]^{*}[\:] \:|\: [A \:-\: Z][a \:-\: z]^{*}[\:] \:|\: [A \:-\: Z][a \:-\: z]^{*}[\:][A \:-\: Z][a \:-\: z]^{*}[\:]}$$
$$\mathrm{(Street|ST|street|st) \:|\: (Avenue|AVE|avenue|ave) \:|\: (Road|RD|road|rd) \:|\: (Blvd|BLVD|blvd)}$$
Let us understand the terms −
House Numbers and Street Names
The first part of the expression matches different combinations of numbers and street names −
| Patterns | Description | 
|---|---|
| [0-9]+[ ][A-Z][a-z]*[ ] | Matches a house number followed by a street name starting with a capital letter. | 
| [A-Z][a-z]*[ ] | Matches a one-word street name starting with a capital letter. | 
| [A-Z][a-z]*[ ][A-Z][a-z]*[ ] | Matches a two-word street name in which both words start with a capital letter. | 
The second part matches common street types and their abbreviations, such as "Street", "Avenue", "Road", and "Blvd", in several capitalizations.
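The two parts can be combined into one pattern and run over a piece of text. The following Python sketch does this under a few assumptions of our own: the second part is concatenated directly after the first, a word boundary (`\b`) is added at the end so that the street type is not matched in the middle of a longer word, and the sample sentence is invented. Like the expression in the text, it is incomplete and would miss many real addresses.

```python
import re

# A word starting with a capital letter, as in the tables above.
NAME = r"[A-Z][a-z]*"

# First part: number plus name, two-word name, or one-word name.
PREFIX = rf"(?:[0-9]+[ ]{NAME}[ ]|{NAME}[ ]{NAME}[ ]|{NAME}[ ])"

# Second part: street types in the capitalizations listed in the text.
SUFFIX = (
    r"(?:Street|ST|street|st|Avenue|AVE|avenue|ave"
    r"|Road|RD|road|rd|Blvd|BLVD|blvd)"
)

# The trailing \b is our addition, not part of the original expression.
ADDRESS = re.compile(PREFIX + SUFFIX + r"\b")

# Invented sample text for illustration.
text = "Visit us at 123 Main Street or at Park Avenue. Old Mill Road is closed."
found = ADDRESS.findall(text)
print(found)
```

Because every group is non-capturing, `findall` returns the full text of each match rather than the contents of a group.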
Conclusion
Regular expressions are a useful technique in automata theory as well as in other domains. Their applications are diverse, ranging from operating system commands to lexical analysis in compilers and pattern searching in text.
In this chapter, we covered the character classes, special notations, and operators of UNIX regular expressions, and then looked at examples of their use in lexical analysis and in finding patterns in text.