Basics of String in Automata



Automata theory builds on several concepts such as sets, languages, grammars, and regular expressions. Finite automata (finite state machines) read strings formed from the symbols of an alphabet and either accept or reject them. Related notions such as substrings, prefixes, and suffixes describe the parts of a string.

Some basic mathematical terminology is required to learn the fundamentals of formal languages and automata theory. In this chapter, we will go through the concepts of strings and their use in automata theory.

Fundamental Concepts of Strings in Automata

Strings are the basic units of a language, and each string is built from the symbols of an alphabet. In automata theory, various automata are used to accept or generate strings based on their inputs.

The fundamental concepts of strings can be classified into the following three parts −

  • Symbol − Symbols are the basic building blocks of a string. We can compare them to the letters of an alphabet.
  • Alphabet (∑) − An alphabet is a finite, non-empty set of symbols from which strings are built.
  • String (w) − A string is a finite sequence of symbols drawn from the alphabet. In automata theory, strings are commonly denoted by w.

String Properties in Automata Theory

Automata theory is not concerned with the complex string manipulation or transformation found in everyday programming tasks. Instead, it focuses on how strings are built and recognized.

Let's consider the following string properties −

  • String Length − The number of symbols in a string, where every symbol belongs to the alphabet.
  • Finiteness − Strings in automata theory are always finite in length; infinitely long strings are not considered.
  • Order of Strings − The order in which strings are combined matters. For example, if w is a string and w^T is its reverse, then a palindrome that starts with w is written ww^T, with w first.
  • Concatenation − Multiple strings can be combined through concatenation. For example, concatenating "ab" with "cd" gives "abcd".
  • Empty String (∧) − The empty string is the unique string that contains no symbols at all. Its length is 0.
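The properties listed above can be illustrated with a short Python sketch. It assumes strings are modelled as ordinary Python `str` values and the empty string as `""`; the helper names `reverse` and `is_palindrome` are hypothetical and chosen only for this illustration.

```python
def reverse(w: str) -> str:
    """Return w^T, the reverse of w."""
    return w[::-1]


def is_palindrome(w: str) -> bool:
    """A string is a palindrome if it equals its own reverse."""
    return w == reverse(w)


w = "ab"
w_then_reverse = w + reverse(w)   # w followed by w^T
reverse_then_w = reverse(w) + w   # w^T followed by w: a different string

print(len(w))           # 2
print(len(""))          # 0, the empty string has no symbols
print(w_then_reverse)   # abba
print(reverse_then_w)   # baab
```

Both results are palindromes, but only the first one starts with w, which is why the order of the two parts matters.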

String Components

A string can be broken into several parts. These can be classified as follows −

1. Prefix

A prefix is the starting portion of a string. For example, if "banana" is a string, then "ba" is one of its prefixes. The whole string itself, i.e., "banana", is also a prefix.

Note − If there are "n" characters in a string, then it has n + 1 prefixes, including the empty string and the string itself.

2. Proper Prefix

A proper prefix is similar to a prefix, but here we do not consider the string itself. If "banana" is a string, then "banana" itself is not a proper prefix. So, a string of "n" characters has n proper prefixes, counting the empty string.
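As a minimal sketch, prefixes can be enumerated in Python with slicing. The function names `prefixes` and `proper_prefixes` are hypothetical. The sketch assumes the empty string counts as a proper prefix, so a string of length n has n + 1 prefixes and n proper prefixes.

```python
def prefixes(w: str) -> list:
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def proper_prefixes(w: str) -> list:
    """All prefixes of w except w itself (the empty string is kept)."""
    return [w[:i] for i in range(len(w))]


print(prefixes("banana"))
# ['', 'b', 'ba', 'ban', 'bana', 'banan', 'banana']
print(len(proper_prefixes("banana")))  # 6
```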

3. Suffix

A suffix is the ending portion of a string. For example, if "automata" is a string, then "ta", "ata", and "mata" are some of its suffixes. The whole word "automata" is also a suffix of itself.

Note − If there are "n" characters in a string, then it has n + 1 suffixes, including the empty string and the string itself.

4. Proper Suffix

A proper suffix is just like a suffix, but here we do not consider the string itself. If "automata" is a string, then "automata" itself is not a proper suffix. So, a string of "n" characters has n proper suffixes, counting the empty string.
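Suffixes can be enumerated the same way, slicing from each position to the end of the string. This is only a sketch: the names `suffixes` and `proper_suffixes` are hypothetical, and the empty string is assumed to count as a proper suffix.

```python
def suffixes(w: str) -> list:
    """All suffixes of w, from w itself down to the empty string."""
    return [w[i:] for i in range(len(w) + 1)]


def proper_suffixes(w: str) -> list:
    """All suffixes of w except w itself (the empty string is kept)."""
    return [w[i:] for i in range(1, len(w) + 1)]


word = "automata"
print(len(suffixes(word)))         # 9, i.e. n + 1 for n = 8
print(len(proper_suffixes(word)))  # 8
print("mata" in suffixes(word))    # True
```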

Applications of Strings and Components

We use automata to determine whether a given string belongs to a specific language or not. An automaton reads its input one symbol at a time, so at every step it has processed some prefix of the string, and its current state records what it needs to remember about that prefix in order to decide whether the whole string follows the defined pattern.
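To make the membership check concrete, here is a minimal sketch of a deterministic finite automaton in Python. The language it recognizes (binary strings with an even number of 1s), the state names, and the function `accepts` are all assumptions chosen for illustration; they do not come from the text above.

```python
# Hypothetical DFA over the alphabet {"0", "1"} that accepts
# exactly the strings containing an even number of 1s.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}


def accepts(w: str) -> bool:
    """Run the DFA on w; after each symbol, `state` summarizes the prefix read so far."""
    state = START
    for symbol in w:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol is not in the alphabet
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


print(accepts("1001"))  # True: two 1s
print(accepts("10"))    # False: one 1
print(accepts(""))      # True: the empty string has zero 1s
```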

Suffixes are less commonly used in automata, but they are central to certain constructions such as suffix trees, which support efficient pattern search within a large text or set of strings.

Examples of Basic Strings in Automata

A string over an alphabet is a finite sequence of letters from the alphabet.

  • toc, money, c, and adedwxq are strings over the alphabet ∑ = {a, b, c, . . . , z}.
  • 84029 is a string over the alphabet ∑ = {0, 1, 2, . . . , 9}.
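Whether a string is "over" an alphabet can be checked symbol by symbol. The sketch below assumes an alphabet is modelled as a Python set of single-character symbols; the name `is_string_over` is hypothetical.

```python
import string

LOWERCASE = set(string.ascii_lowercase)  # {a, b, c, ..., z}
DIGITS = set(string.digits)              # {0, 1, 2, ..., 9}


def is_string_over(w: str, alphabet: set) -> bool:
    """True if every symbol of w belongs to the alphabet."""
    return all(symbol in alphabet for symbol in w)


print(is_string_over("money", LOWERCASE))  # True
print(is_string_over("84029", DIGITS))     # True
print(is_string_over("84029", LOWERCASE))  # False
```

Note that the empty string is a string over every alphabet, since it contains no symbols that could fall outside it.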

Empty String

The empty string or null string, denoted by ∧, is the string consisting of no letters, no matter what type of language we are considering.

String Concatenation

Given two strings w1 and w2, we define the concatenation of w1 and w2 to be the string w1w2, that is, the symbols of w1 followed by the symbols of w2.

Example 1

  • If w1 = pq and w2 = r, then w1w2 = pqr.
  • If w1 = acc and w2 = ac, then w1w2 = accac and w2w1 = acacc.
  • If w1 = ∧ and w2 = cd, then w1w2 = cd.
  • If w1 = aa and w2 = ∧, then w1w2 = aa.
  • If w1 = ∧ and w2 = ∧, then w1w2 = ∧; because ∧∧ = ∧.
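The cases above can be reproduced in Python, assuming the `+` operator on `str` plays the role of concatenation and `""` stands in for ∧. The names `EMPTY` and `concat` are hypothetical.

```python
EMPTY = ""  # stands in for the empty string ∧


def concat(w1: str, w2: str) -> str:
    """Concatenation: the symbols of w1 followed by the symbols of w2."""
    return w1 + w2


print(concat("pq", "r"))    # pqr
print(concat("acc", "ac"))  # accac
print(concat("ac", "acc"))  # acacc, so concatenation is not commutative
print(concat(EMPTY, "cd"))  # cd, the empty string acts as an identity
print(concat(EMPTY, EMPTY) == EMPTY)  # True
```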

For any string w, we can define w^n for n ≥ 0 inductively as follows −

w^0 = ∧;

w^(n+1) = w^n w for any n ≥ 0.

Example 2

If w = mam, then w^0 = ∧, w^1 = mam, w^2 = mammam, w^3 = mammammam, and so on.
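The inductive definition of string powers translates directly into a recursive function. This is a sketch only: the name `power` is hypothetical, and `""` again stands in for ∧.

```python
def power(w: str, n: int) -> str:
    """Return w^n, following the inductive definition."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ""               # base case: w^0 is the empty string
    return power(w, n - 1) + w  # inductive step: w^n = w^(n-1) w


for n in range(4):
    print(n, repr(power("mam", n)))
# 0 ''
# 1 'mam'
# 2 'mammam'
# 3 'mammammam'
```

In everyday Python the same result is available as `w * n`; the recursive form is shown only because it mirrors the definition.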

Substring

Given a string s, a substring of s is any contiguous part of s. Formally, w is a substring of s if there exist strings x and y (either or both possibly empty) such that s = xwy.

Example

Take the string 472828. Then ∧, 282, 4, and 472828 are all substrings of 472828.

48 is not a substring of 472828, because the symbols 4 and 8 never appear next to each other in it.
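The definition s = xwy can be checked directly by searching for every position where w occurs in s. In the sketch below, the function names `is_substring` and `decompositions` are hypothetical; Python's built-in `in` operator already performs the basic test.

```python
def is_substring(w: str, s: str) -> bool:
    """True if s = x + w + y for some strings x and y."""
    return w in s


def decompositions(w: str, s: str) -> list:
    """All pairs (x, y) such that s == x + w + y."""
    return [
        (s[:i], s[i + len(w):])
        for i in range(len(s) - len(w) + 1)
        if s[i:i + len(w)] == w
    ]


s = "472828"
print(is_substring("282", s))    # True
print(decompositions("282", s))  # [('47', '8')]
print(is_substring("48", s))     # False
print(is_substring("", s))       # True: the empty string is a substring of every string
```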

Conclusion

In automata theory, strings are one of the fundamental components we work with when designing automata. Languages and grammars are defined in terms of strings: a language is a set of strings, and each string is a concrete member of that language.

We can design automata to check whether a string satisfies a given rule, that is, whether it belongs to a given language. In this chapter, we explained the basics of strings and how they are used in automata, including components such as prefixes, suffixes, and substrings, with examples.
