Normalization Process in DBMS


Database normalization is a way of organizing information in a database so that it is efficient and easy to use. It involves breaking down big tables into smaller ones that relate to each other in a logical way. This helps to prevent errors and confusion when making changes to the data.

In this article, we will discuss the Normalization Process in DBMS with the help of an example.

Introduction

There are different levels of normalization, starting with First Normal Form (INF), Second Normal Form (2NF), and Third Normal Form (3NF). Each level builds on the previous one and helps to reduce errors and duplication in the data. There is also a stronger version of 3NF called Boyce-Codd Normal Form (BCNF), which is even better at preventing errors.

There are also higher levels of normalization like Fourth Normal Form (4NF) and Fifth Normal Form (5NF), but these are rarely used because they only apply to very specific situations.

Normalization is the process of organizing data in a database to minimize redundancy and dependency. In database design, there are different normal forms based on the primary keys of a table. These include:

First Normal Form (1NF)

1NF requires that each column in a table contains atomic values and that each row is uniquely identified. This means that a table cannot have repeating groups or arrays as columns, and each row must have a unique primary key.

Second Normal Form (2NF)

2NF builds on 1NF by requiring that each non-primary key column in a table is fully functionally dependent on the primary key. This means that a table should not have partial dependencies, where a non-primary key column depends on only part of the primary key.

Third Normal Form (3NF)

3NF builds on 2NF by requiring that each non-primary key column in a table is not transitively dependent on the primary key. This means that a table should not have transitive dependencies, where a non-primary key column depends on another non-primary key column.

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter form of 3NF that applies to tables with more than one candidate key. BCNF requires that each non-trivial dependency in a table is a dependency on a candidate key. This means that a table should not have non-trivial dependencies, where a non-primary key column depends on another non-primary key column. BCNF ensures that each table in a database is a separate entity and eliminates redundancies.

Fourth Normal Form (4NF)

4NF builds on BCNF by requiring that a table should not have multi-valued dependencies. A multi-valued dependency occurs when a non-primary key column depends on a combination of other non-primary key columns.

This is flow of achieving highest normal forms in DBMS:

Note that we do these operations on functional dependencies (FDs) (and multi-valued dependencies (MVDs) on 4NF and 5NF) to obtain highest normal forms.

Example

Book ID

Book Title

Author Name

Author Email

Author Phone

Publisher Name

Publisher Address

1

The Great Gatsby

F. Scott Fitzgerald

fitzgerald@gmail.com

555-123-4567

Scribner

123 Main St, New York, NY

2

Pride and Prejudice

Jane Austen

austen@hotmail.com

555-234-5678

Penguin

456 Broad St, London, UK

3

To Kill a Mockingbird

Harper Lee

hlee@gmail.com

555-345-6789

HarperCollins

789 Broadway, New York, NY

This table violates the first normal form (1NF) because the author information is not atomic. It also violates the second normal form (2NF) because the publisher information is not dependent on the book ID.

First Normal Form (1NF)

To convert this table into 1NF, we need to split the author and publisher information into separate tables, and ensure that each column has only one value.

Book ID

Book Title

1

The Great Gatsby

2

Pride and Prejudice

3

To Kill a Mockingbird

Author ID

Author Name

Author Email

Author Phone

1

F. Scott Fitzgerald

fitzgerald@gmail.com

555-123-4567

2

Jane Austen

austen@hotmail.com

555-234-5678

3

Harper Lee

hlee@gmail.com

555-345-6789

Publisher ID

Publisher Name

Publisher Address

1

Scribner

123 Main St, New York, NY

2

Penguin

456 Broad St, London, UK

3

HarperCollins

789 Broadway, New York, NY

Now each column has only one value, and the author and publisher information is split into separate tables.

Second Normal Form (2NF)

To convert this table into 2NF, we need to ensure that the publisher information is dependent on the book ID.

Book ID

Book Title

Author ID

Publisher ID

1

The Great Gatsby

1

1

2

Pride and Prejudice

2

2

3

To Kill a Mockingbird

3

3

Author ID

Author Name

Author Email

Author Phone

1

F. Scott Fitzgerald

fitzgerald@gmail.com

555-123-4567

2

Jane Austen

austen@hotmail.com

555-234-5678

3

Harper Lee

hlee@gmail.com

555-345-6789

Publisher ID

Publisher Name

Publisher Address

1

Scribner

123 Main St, New York, NY

2

Penguin

456 Broad St, London, UK

3

HarperCollins

789 Broadway, New York, NY

Now the publisher information is dependent on the book ID, and there are no partial dependencies

Third Normal Form (3NF)

To convert this table into 3NF, we need to eliminate any transitive dependencies. In this case, the author information has a transitive dependency on the book ID through the author ID.

Book ID

Book Title

Publisher ID

1

The Great Gatsby

1

2

Pride and Prejudice

2

3

To Kill a Mockingbird

3

Author ID

Author Name

Author Email

Author Phone

1

F. Scott Fitzgerald

fitzgerald@gmail.com

555-123-4567

2

Jane Austen

austen@hotmail.com

555-234-5678

3

Harper Lee

hlee@gmail.com

555-345-6789

Book ID

Author ID

1

1

2

2

3

3

Publisher ID

Publisher Name

Publisher Address

1

Scribner

123 Main St, New York, NY

2

Penguin

456 Broad St, London, UK

3

HarperCollins

789 Broadway, New York, NY

Now the author information is split into a separate table, and there are no transitive dependencies.

Boyce-Codd Normal Form (BCNF)

To convert this table into BCNF, we need to ensure that each determinant is a candidate key. In this case, the Book ID is the only determinant, and it is already a candidate key.

Therefore, the table is already in BCNF.

Fourth Normal Form (4NF)

To convert this table into 4NF, we need to ensure that there are no multivalued dependencies. In this case, there are no multivalued dependencies.

Therefore, the table is already in 4NF.

Final Tables

Book ID

Book Title

Publisher ID

1

The Great Gatsby

1

2

Pride and Prejudice

2

3

To Kill a Mockingbird

3

Author ID

Author Name

Author Email

Author Phone

1

F. Scott Fitzgerald

fitzgerald@gmail.com

555-123-4567

2

Jane Austen

austen@hotmail.com

555-234-5678

3

Harper Lee

hlee@gmail.com

555-345-6789

Book ID

Author ID

1

1

2

2

3

3

Publisher ID

Publisher Name

Publisher Address

1

Scribner

123 Main St, New York, NY

2

Penguin

456 Broad St, London, UK

3

HarperCollins

789 Broadway, New York, NY

These tables are now in 4NF as there are no multivalued dependencies and all the dependencies are either trivial or based on candidate keys. This concludes the normalization process for this example.

Updated on: 18-May-2023

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements