Natural Language Processing - Inception

In this chapter, we discuss the linguistic foundations on which Natural Language Processing builds. To begin with, let us first understand what natural language grammar is.

Natural Language Grammar

In linguistics, language is described as a system of arbitrary vocal signs. We may say that language is creative, governed by rules, innate and universal at the same time. It is also distinctively human. Language differs from speaker to speaker, and there are many misconceptions about its nature. That is why it is important to understand the meaning of the ambiguous term ‘grammar’. In linguistics, grammar may be defined as the rules or principles by which a language works. In a broad sense, we can divide grammar into two categories −

Descriptive Grammar

The set of rules that linguists and grammarians formulate to describe the speaker’s grammar is called descriptive grammar.

Prescriptive Grammar

It is a very different sense of grammar, which attempts to maintain a standard of correctness in the language. This category has little to do with the actual working of the language.

Components of Language

The study of language is divided into interrelated components, which are conventional as well as arbitrary divisions of linguistic investigation. These components are explained below −


Phonology

The first component of language is phonology. It is the study of the speech sounds of a particular language. The word can be traced to Greek, where ‘phone’ means sound or voice. Phonetics, a closely related field, is the study of the speech sounds of human language from the perspective of their production, perception or physical properties. The IPA (International Phonetic Alphabet) is a tool for representing human speech sounds in a regular way. In the IPA, every written symbol represents one and only one speech sound and vice versa.


Phonemes

A phoneme may be defined as one of the units of sound that differentiate one word from another in a language. In linguistics, phonemes are written between slashes. For example, the phoneme /k/ occurs in words such as kit and skit.
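The contrastive role of a phoneme can be sketched in code: two words form a minimal pair when their phoneme sequences differ in exactly one position. The transcriptions and the function name below are simplified, hand-written assumptions for illustration, not the output of a pronunciation dictionary.

```python
# Hand-written, simplified phonemic transcriptions (illustrative assumptions).
TRANSCRIPTIONS = {
    "kit": ("k", "ɪ", "t"),
    "bit": ("b", "ɪ", "t"),
    "skit": ("s", "k", "ɪ", "t"),
}


def is_minimal_pair(word_a, word_b):
    """Return True if the two words differ in exactly one phoneme."""
    a, b = TRANSCRIPTIONS[word_a], TRANSCRIPTIONS[word_b]
    if len(a) != len(b):
        return False
    # Count the positions at which the two phoneme sequences disagree.
    return sum(x != y for x, y in zip(a, b)) == 1


print(is_minimal_pair("kit", "bit"))   # /k/ vs /b/ distinguishes the words
print(is_minimal_pair("kit", "skit"))  # different lengths, not a minimal pair
```

Here kit and bit count as a minimal pair because only /k/ and /b/ differ, whereas kit and skit do not because the sequences have different lengths.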


Morphology

Morphology is the second component of language. It is the study of the structure and classification of the words in a particular language. The word comes from Greek, where ‘morphe’ means ‘form’. Morphology considers the principles by which words are formed in a language, in other words, how sounds combine into meaningful units such as prefixes, suffixes and roots. It also considers how words can be grouped into parts of speech.


Lexeme

In linguistics, the abstract unit of morphological analysis that corresponds to the set of forms taken by a single word is called a lexeme. The way in which a lexeme is used in a sentence is determined by its grammatical category. A lexeme can be a single word or a multiword unit. For example, talk is a single-word lexeme with grammatical variants such as talks, talked and talking. A multiword lexeme is made up of more than one orthographic word. For example, speak up and pull through are multiword lexemes.
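The relation between a lexeme and its forms can be sketched as a simple lookup. The lexicon below is a tiny hand-built assumption, and `lexeme_of` is a hypothetical helper name. A real system would use a morphological analyzer or a full lexicon instead.

```python
# Hypothetical hand-built lexicon: each lexeme maps to its inflected forms.
LEXEMES = {
    "talk": {"talk", "talks", "talked", "talking"},
    "speak up": {"speak up", "speaks up", "spoke up", "spoken up", "speaking up"},
}


def lexeme_of(form):
    """Return the lexeme a word form belongs to, or None if it is unknown."""
    for lexeme, forms in LEXEMES.items():
        if form.lower() in forms:
            return lexeme
    return None


print(lexeme_of("talked"))    # single-word lexeme
print(lexeme_of("spoke up"))  # multiword lexeme
```

Both single-word and multiword forms resolve to one abstract entry, which is the property that makes the lexeme a useful unit of analysis.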


Syntax

Syntax is the third component of language. It is the study of the order and arrangement of words into larger units. The word can be traced to Greek, where suntassein means ‘to put in order’. Syntax studies the types of sentences and the structure of sentences, clauses and phrases.


Semantics

Semantics is the fourth component of language. It is the study of how meaning is conveyed. The meaning can be related to the outside world or to the grammar of the sentence. The word can be traced to Greek, where semainein means ‘to signify’, ‘show’ or ‘signal’.


Pragmatics

Pragmatics is the fifth component of language. It is the study of the functions of language and its use in context. The word can be traced to Greek, where ‘pragma’ means ‘deed’ or ‘affair’.

Grammatical Categories

A grammatical category may be defined as a class of units or features within the grammar of a language. These units are the building blocks of language and share a common set of characteristics. Grammatical categories are also called grammatical features.

The inventory of grammatical categories is described below −


Number

Number is the simplest grammatical category. It has two terms − singular and plural. Singular is the concept of ‘one’, whereas plural is the concept of ‘more than one’. For example, dog/dogs, this/these.


Gender

In English, grammatical gender is expressed by variation in personal pronouns, and only in the 3rd person singular − he, she, it. The first and second person forms (I, we and you) and the 3rd person plural form they are not marked for gender. They are treated as either common gender or neuter gender.


Person

Another simple grammatical category is person. The following three terms are recognized −

  • 1st person − The person who is speaking is recognized as 1st person.

  • 2nd person − The person who is the hearer or the person spoken to is recognized as 2nd person.

  • 3rd person − The person or thing about whom we are speaking is recognized as 3rd person.


Case

Case is one of the most difficult grammatical categories. It may be defined as an indication of the function of a noun phrase (NP), or of the relationship of a noun phrase to a verb or to other noun phrases in the sentence. The following three cases are expressed in personal and interrogative pronouns −

  • Nominative case − It is the function of subject. For example, I, we, you, he, she, it, they and who are nominative.

  • Genitive case − It is the function of possessor. For example, my/mine, our/ours, his, her/hers, its, their/theirs, whose are genitive.

  • Objective case − It is the function of object. For example, me, us, you, him, her, them, whom are objective.
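The three pronoun lists above can be sketched as a lookup table. The table copies only the forms listed above, and `cases_of` is a hypothetical helper name. Note that some forms, such as you and her, belong to more than one case, so the lookup returns a list.

```python
# Pronoun forms copied from the three lists above.
CASE_FORMS = {
    "nominative": {"i", "we", "you", "he", "she", "it", "they", "who"},
    "genitive": {"my", "mine", "our", "ours", "his", "her", "hers",
                 "its", "their", "theirs", "whose"},
    "objective": {"me", "us", "you", "him", "her", "them", "whom"},
}


def cases_of(pronoun):
    """Return every case (alphabetically sorted) in which the pronoun form occurs."""
    word = pronoun.lower()
    return sorted(case for case, forms in CASE_FORMS.items() if word in forms)


print(cases_of("whom"))  # occurs in one case only
print(cases_of("her"))   # ambiguous between two cases
```

The ambiguity of forms such as her is one reason case is harder to handle automatically than number or person: the form alone does not determine the function.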


Degree

This grammatical category relates to adjectives and adverbs. It has the following three terms −

  • Positive degree − It expresses a quality. For example, big, fast, beautiful are positive degrees.

  • Comparative degree − It expresses greater degree or intensity of the quality in one of two items. For example, bigger, faster, more beautiful are comparative degrees.

  • Superlative degree − It expresses greatest degree or intensity of the quality in one of three or more items. For example, biggest, fastest, most beautiful are superlative degrees.

Definiteness and Indefiniteness

Both concepts are simple. Definiteness marks a referent that is known, familiar or identifiable to the speaker or hearer. Indefiniteness, on the other hand, marks a referent that is not known or is unfamiliar. The distinction can be seen in the co-occurrence of an article with a noun −

  • definite article − the

  • indefinite article − a/an


Tense

This grammatical category relates to verbs and can be defined as the linguistic indication of the time of an action. A tense establishes a relation because it indicates the time of an event with respect to the moment of speaking. Broadly, it is of the following three types −

  • Present tense − Represents the occurrence of an action in the present moment. For example, Ram works hard.

  • Past tense − Represents the occurrence of an action before the present moment. For example, it rained.

  • Future tense − Represents the occurrence of an action after the present moment. For example, it will rain.


Aspect

This grammatical category may be defined as the view taken of an event. It can be of the following types −

  • Perfective aspect − The event is viewed as whole and complete. For example, the English simple past in yesterday I met my friend is perfective in aspect, as it views the event as complete and whole.

  • Imperfective aspect − The event is viewed as ongoing and incomplete. For example, the English present progressive in I am working on this problem is imperfective in aspect, as it views the event as incomplete and ongoing.


Mood

This grammatical category is somewhat difficult to define, but it can be stated simply as the indication of the speaker’s attitude towards what he or she is talking about. It is a grammatical feature of verbs and is distinct from both tense and aspect. Examples of moods are indicative, interrogative, imperative, injunctive, subjunctive, potential and optative. Gerunds and participles, which are sometimes listed alongside them, are non-finite verb forms rather than moods.


Agreement

Agreement is also called concord. It happens when a word changes form depending on the other words to which it relates. In other words, it involves making the value of some grammatical category agree between different words or parts of speech. The following are the kinds of agreement based on other grammatical categories −

  • Agreement based on Person − It is the agreement between subject and the verb. For example, we always use “I am” and “He is” but never “He am” and “I is”.

  • Agreement based on Number − This agreement is between the subject and the verb. In this case, there are specific verb forms for first person singular, first person plural and so on. For example, 1st person singular: I really am, 1st person plural: We really are, 3rd person singular: The boy sings, 3rd person plural: The boys sing.

  • Agreement based on Gender − In English, there is agreement in gender between pronouns and antecedents. For example, He reached his destination. The ship reached her destination.

  • Agreement based on Case − This kind of agreement is not a significant feature of English. For example, who came first − he or his sister?
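Subject and verb agreement in person and number can be sketched as a table lookup for the present tense of the verb be. The feature table, the small pronoun inventory and the function name `agrees` are illustrative assumptions. The sketch covers only a handful of pronoun subjects.

```python
# Present-tense forms of "be", indexed by (person, number).
BE_PRESENT = {
    (1, "singular"): "am",
    (2, "singular"): "are",
    (3, "singular"): "is",
    (1, "plural"): "are",
    (2, "plural"): "are",
    (3, "plural"): "are",
}

# Person and number features for a few pronoun subjects (assumed inventory).
PRONOUN_FEATURES = {
    "i": (1, "singular"),
    "he": (3, "singular"),
    "she": (3, "singular"),
    "it": (3, "singular"),
    "we": (1, "plural"),
    "they": (3, "plural"),
}


def agrees(subject, verb):
    """Return True if the form of 'be' agrees with the pronoun subject."""
    features = PRONOUN_FEATURES[subject.lower()]
    return BE_PRESENT[features] == verb.lower()


print(agrees("I", "am"))   # acceptable: "I am"
print(agrees("He", "am"))  # unacceptable: "He am"
```

The check accepts “I am” and “He is” and rejects “He am” and “I is”, mirroring the person-based examples above.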

Spoken Language Syntax

Written and spoken English grammar have many features in common, but they also differ in a number of respects. The following features distinguish spoken from written English grammar −

Disfluencies and Repair

This striking feature sets spoken English grammar apart from written English grammar. The individual phenomena are known as disfluencies, and collectively they are known as the phenomenon of repair. Disfluencies include the following −

  • Filler words − Sometimes we insert filler words in the middle of a sentence. They are called fillers or filled pauses. Examples of such words are uh and um.

  • Reparandum and repair − The segment of words that the speaker abandons and then corrects is called the reparandum. The segment that replaces it is called the repair. Consider the following example to understand this −

Does ABC airlines offer any one-way flights uh one-way fares for 5000 rupees?

In the above sentence, one-way flights is the reparandum and one-way fares is the repair.


Restarts

A restart occurs after the filled pause. For example, in the above sentence, a restart occurs when the speaker starts asking about one-way flights, stops, corrects himself after a filled pause and then restarts by asking about one-way fares.
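The filler and restart pattern in the example sentence can be sketched with a naive heuristic: when a filler is found, look at the word that follows it, search backwards for the nearest earlier occurrence of that word, and drop everything from that point up to and including the filler. This is an illustrative assumption rather than an established algorithm, and `clean_disfluency` is a hypothetical name. Real disfluency detection typically relies on trained models.

```python
FILLERS = {"uh", "um"}


def clean_disfluency(utterance):
    """Remove fillers and, where detectable, the abandoned words before them."""
    tokens = utterance.split()
    cleaned = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.lower() in FILLERS:
            if i + 1 < len(tokens):
                # The word right after the filler starts the restart.
                restart_word = tokens[i + 1].lower()
                # Find the nearest earlier copy of that word and discard
                # everything from there on as the abandoned segment.
                for j in range(len(cleaned) - 1, -1, -1):
                    if cleaned[j].lower() == restart_word:
                        del cleaned[j:]
                        break
            i += 1  # skip the filler itself
            continue
        cleaned.append(token)
        i += 1
    return " ".join(cleaned)


sentence = "Does ABC airlines offer any one-way flights uh one-way fares for 5000 rupees?"
print(clean_disfluency(sentence))
```

For the example sentence this yields “Does ABC airlines offer any one-way fares for 5000 rupees?”. When no earlier copy of the restart word exists, only the filler is removed.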

Word Fragments

Sometimes we break off in the middle of a word and produce only fragments of it. For example, w-wha-what is the time? Here w- and wha- are word fragments.
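Word fragments written with hyphens can be detected with a simple prefix check: if every earlier piece of a hyphenated token is a prefix of the final piece, the earlier pieces are treated as fragments and dropped. This is a minimal sketch that assumes fragments are transcribed with hyphens as in the example above. The function names are hypothetical.

```python
def strip_fragments(token):
    """Collapse a token such as 'w-wha-what' to its final, complete piece."""
    pieces = token.split("-")
    final = pieces[-1]
    # Every earlier piece must be a non-empty prefix of the final piece;
    # this leaves ordinary hyphenated words such as 'one-way' untouched.
    if len(pieces) > 1 and all(
        piece and final.lower().startswith(piece.lower()) for piece in pieces[:-1]
    ):
        return final
    return token


def clean_utterance(utterance):
    """Apply fragment stripping to every whitespace-separated token."""
    return " ".join(strip_fragments(token) for token in utterance.split())


print(clean_utterance("w-wha-what is the time?"))
print(clean_utterance("any one-way fares"))
```

The prefix condition is what separates a stuttered start from an ordinary compound: w and wha are prefixes of what, whereas one is not a prefix of way.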