Normal Forms in DBMS


Normalization is a process of organizing data in a database to reduce redundancy and improve data consistency. Primary keys are really important in organizing information in a database. They help to make sure that every row in a table has a unique identification so that nothing gets mixed up or lost.

In this article, we will discuss the different normal forms of database normalization.

Normalization

Normalization is a process of organizing the data in database to avoid data redundancy, insertion anomaly, update anomaly & deletion anomaly.

Normalization is the process of organizing data in a database to minimize redundancy and dependency. In database design, there are different normal forms based on the primary keys of a table. These include −

First Normal Form (1NF)

1NF requires that each column in a table contains atomic values and that each row is uniquely identified. This means that a table cannot have repeating groups or arrays as columns, and each row must have a unique primary key.

Example

A table is in 1NF if each column contains atomic values and each row is uniquely identified. For example, a table that lists customers and their phone numbers −

Customer ID

Name

Phone Numbers

1

John

555-1234, 555-5678

2

Jane

555-9876

3

Michael

555-5555

This violates 1NF because the Phone Numbers column contains repeating groups.

To normalize this table to 1NF, we can split the Phone Numbers column into separate rows and add a separate primary key column −

Customer ID

Name

Phone Number

1

John

555-1234

1

John

555-5678

2

Jane

555-9876

3

Michael

555-5555

Second Normal Form (2NF)

2NF builds on 1NF by requiring that each non-primary key column in a table is fully functionally dependent on the primary key. This means that a table should not have partial dependencies, where a non-primary key column depends on only part of the primary key.

Example

A table is in 2NF if each non-primary key column is fully functionally dependent on the primary key. For example, a table that lists orders and their line items:

Order ID

Customer ID

Customer Name

Item ID

Item Name

Quantity

1

1

John

1

Shirt

2

1

1

John

2

Pants

1

2

2

Jane

1

Shirt

1

2

2

Jane

3

Hat

3

This violates 2NF because the Customer Name column depends on only part of the primary key (Customer ID). To normalize this table to 2NF, we can split it into two tables −

Order ID

Customer ID

Item ID

Quantity

1

1

1

2

1

1

2

1

2

2

1

1

2

2

3

3

Customer ID

Customer Name

1

John

2

Jane

Third Normal Form (3NF)

3NF builds on 2NF by requiring that each non-primary key column in a table is not transitively dependent on the primary key. This means that a table should not have transitive dependencies, where a non-primary key column depends on another non-primary key column.

Example

To explain 3NF further, let's consider an example of a table that lists customer orders −

Order ID

Customer ID

Customer Name

Customer City

Order Date

Order Total

1

100

John Smith

New York

2022-01-01

100

2

101

Jane Doe

Los Angeles

2022-01-02

200

3

102

Bob Johnson

San Francisco

2022-01-03

300

In this example, the non-primary key column "Customer City" is transitively dependent on the primary key. That is, it depends on "Customer ID", which is not part of the primary key, instead of depending directly on the primary key "Order ID". To bring this table to 3NF, we can split it into two tables −

Table 1: Customers

Customer ID

Customer Name

Customer City

100

John Smith

New York

101

Jane Doe

Los Angeles

102

Bob Johnson

San Francisco

Table 2: Orders

Order ID

Customer ID

Order Date

Order Total

1

100

2022-01-01

100

2

101

2022-01-02

200

3

102

2022-01-03

300

Now, the "Customer City" column is no longer transitively dependent on the primary key and is instead in a separate table that has a direct relationship with the primary key. This makes the table 3NF-compliant.

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter form of 3NF that applies to tables with more than one candidate key. BCNF requires that each non-trivial dependency in a table is a dependency on a candidate key. This means that a table should not have non-trivial dependencies, where a non-primary key column depends on another non-primary key column. BCNF ensures that each table in a database is a separate entity and eliminates redundancies.

Example

A table is in BCNF if each determinant is a candidate key. In other words, every non-trivial functional dependency in the table must be on a candidate key. For example, consider a table that lists information about books and their authors −

Table: Books

Book ID

Title

Author ID

Author Name

Author Nationality

1

Crime and Punishment

100

Fyodor Dostoevsky

Russian

2

The Great Gatsby

101

F. Scott Fitzgerald

American

3

Pride and Prejudice

102

Jane Austen

British

In this example, the functional dependency between "Author ID" and "Author Name" violates BCNF because it is not on a candidate key. To bring this table to BCNF, we can split it into two tables −

Table 1: Authors

Author ID

Author Name

Author Nationality

101

Fyodor Dostoevsky

Russian

101

F. Scott Fitzgerald

American

102

Jane Austen

British

Table 2: Books

Book ID

Title

Author ID

1

Crime and Punishment

100

2

The Great Gatsby

101

3

Pride and Prejudice

102

Now, the "Author Name" and "Author Nationality" columns are not transitively dependent on the primary key, and the table is in BCNF.

Fourth Normal Form (4NF)

4NF builds on BCNF by requiring that a table should not have multi-valued dependencies. A multi-valued dependency occurs when a non-primary key column depends on a combination of other non-primary key columns. For example, a table that lists customer orders with a primary key of order ID and non-primary key columns for customer ID and order items violates 4NF because order items depend on both order ID and customer ID.

For example, a table that lists orders and their products, with columns for order ID, product ID, and product details, violates 4NF because the product details depend on the combination of order ID and product ID.

Example

Consider the following table of orders and products

Order ID

Product ID

Product Name

Product Description

1

100

Widget

Red Widget

1

200

Widget

Blue Widget

2

100

Widget

Red Widget

2

300

Thing

Green Thing

3

200

Widget

Blue Widget

3

300

Thing

Green Thing

In this table, the product name and description depend on both the order ID and product ID, creating a multi-valued dependency. To bring the table into 4NF, we can split it into three tables −

Order ID

Product ID

1

100

1

200

2

100

2

300

3

200

3

300

Product ID

Product Name

100

Widget

200

Widget

300

Thing

Product ID

Product Description

100

Red Widget

200

Blue Widget

300

Green Thing

Advantages and Disadvantages of Normalization

Advantages of Normalization

  • Reduced Data Redundancy

  • Improved Data Consistency

  • Simplified Database Maintenance

  • Improved Query Performance

Disadvantages of Normalization

  • Increased Complexity

  • Decreased Read Performance

  • Increased Write Performance

  • Increased Storage Space

  • Over-Normalization

Conclusion

So, we have explained the concept of normalization in database management systems (DBMS) and its importance in data management. We covered different normal forms of database normalization, including 1NF, 2NF, 3NF, BCNF, and 4NF. We explained how normalization helps eliminate data redundancy, insertion, update, and deletion anomalies. We have also provided examples of tables in different normal forms and how to normalize them to the desired normal form. We have written about the importance of normalization in ensuring data consistency and accuracy in DBMS.

Updated on: 18-May-2023

24K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements