C - Tokens
A token is the smallest meaningful unit of source code in a computer language such as C. The term token is borrowed from linguistics: just as a piece of English text is made up of words (sequences of letters), digits, and punctuation marks, a C program is made up of tokens. A compiler breaks a C program into tokens and then passes them on to the next stages of the compilation process.
The first stage in the compilation process is the tokenizer (also called a lexer or scanner). It divides the source code into individual tokens, identifies the type of each token, and passes the tokens one at a time to the next stage of the compiler.
The parser is the next stage. It understands the language's grammar: it checks that the tokens form valid constructs, reports syntax errors, and builds a structured representation of the program that later stages translate into machine code.
C source code is written using the characters of the C character set, which the compiler groups into tokens of the following types:
- Keyword tokens
- Literal tokens
- Identifier tokens
- Operator tokens
- Special symbol tokens
Let us first look at the character set and then discuss each of these token types.
C Character set
The C language defines a character set that consists of the English letters in upper and lower case (A to Z and a to z), the digits 0 to 9, and certain other symbols that have special meanings. Certain combinations of characters also have a special meaning in C. For example, \n represents the newline character. Such combinations are called escape sequences.
Here is the character set of the C language:
- Uppercase: A to Z
- Lowercase: a to z
- Digits: 0 to 9
- Special characters: ! " # $ % & ' ( ) * + - . : , ; ? ` ~ = < > { } [ ] ^ _ | \ /
- Whitespace: space, horizontal tab, vertical tab, form feed, and newline
A sequence of these characters enclosed in a pair of double quotes (") represents a string literal. Digits are used to write numeric literals. Square brackets are used to declare arrays and to access their elements. Curly braces mark code blocks. The backslash is the escape character. Most of the remaining symbols are used as operators or punctuation.
C Keywords
In C, keywords are predefined words that are reserved by the language. Compared with the vocabulary of a human language, a programming language has very few keywords. The original ANSI C standard (C89) defined 32 keywords, and a few more were added in later revisions such as C99 and C11. Keywords are case-sensitive: the original 32 are all lowercase, while several added in C99 and C11, such as _Bool and _Static_assert, begin with an underscore followed by an uppercase letter. Each keyword must be used according to specific rules, which in programming are called its syntax.
The C compiler checks that each keyword has been used according to its syntax and then translates the source code into object code.
C Literals
In programming terminology, a literal is the textual representation of a fixed value written directly (hard-coded) in the source code, for example a value assigned to a variable.
A numeric literal consists of digits and may also contain a decimal point and/or the exponent character E or e, as in 1.5e3.
A string literal is any sequence of characters enclosed in a pair of double quotes. A character literal is a single character enclosed in single quotes.
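An array can be initialized with a comma-separated list of literals enclosed in curly braces, such as {1, 2, 3}. Since C99, a compound literal such as (int[]){1, 2, 3} creates an unnamed array directly inside an expression.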
Escape sequences can also be written as literals. An escape sequence consists of a backslash \ followed by one or more characters; enclosed in single quotes, as in '\n', it forms a character literal, and it can also appear inside a string literal. Each escape sequence has a predefined meaning.
C Identifiers
Unlike keywords, identifiers are user-defined names. You give each program element, such as a variable, constant, label, user-defined type, or function, an appropriate name. C prescribes certain rules for forming an identifier. One important restriction is that a reserved keyword cannot be used as an identifier. For example, for is a keyword in C, so it cannot be used as an identifier, i.e., the name of a variable, function, etc.
C Operators
C is a computational language, so a C program consists largely of expressions that perform arithmetic and comparison operations. Most of the special symbols in the C character set are used as operators. For example, the familiar symbols +, -, * and / are the arithmetic operators in C, and < and > are comparison (relational) operators.
C Special symbols
Apart from the symbols used as operators, C uses punctuation symbols such as commas, semicolons, and colons, and the same symbol can play different roles in different contexts. For example, a semicolon ends a statement, while a comma can separate the arguments of a function call or the variables in a declaration. Similarly, the parentheses ( and ) group sub-expressions in arithmetic expressions and also enclose the parameter list in function definitions. Curly braces mark the body of a function and the code blocks of conditional and looping statements.