What is Normalization in Database Management System?
- by bytethebuzz
- 1605
Normalization is a method of breaking down complex table structures into simple table structures by using certain rules. Using this method, you can reduce redundancy in a table, and eliminate the problems of inconsistency and disk space usage. You can also ensure that there is no loss of information.
Normalization has the following benefits:
- It helps in maintaining data integrity.
- It helps in simplifying the structure of tables, therefore, making a database more compact.
- It helps in reducing the null values, which reduces the complexity of data operations.
The performance of an application is directly linked to the database design. A poor design results in poor
performance of the system. The logical design of the database lays the foundation for an optimal database.
Some rules that should be followed to achieve a good database design are:
- Each table should have a unique identifier.
- Each table should store data for a single type of entity.
- Columns that accept NULL values should be avoided.
- The repetition of values or columns should be avoided.
Normalization results in the formation of tables that satisfy certain specified rules and represent certain
normal forms. The normal forms are used to ensure that various types of abnormalities and inconsistenciesare not introduced in the database. A table structure is always in a certain normal form. Several normal formshave been identified.
The most important and widely used normal forms are:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- BCNF
First Normal Form
A table is said to be in 1NF when each cell of the table contains precisely one value.
The guidelines for converting a table to 1NF are:
Place the related data values in a table. Further, define similar data values with the column name.
There should be no repeating group in the table.
Every table must have a unique primary key.
Consider the following PROJECT table.
The preceding table already satisfies the two guidelines of the first normal form. The values in the PROJECT table are related, and similar data values have been defined by the common attribute name. The PROJECT table also has a primary key, ECODE, which uniquely defines the complete table. However, the data in the PROJECT table is still not in first normal form because the cells in PROJCODE and HOURS have more than one value. By applying 1NF definition to the PROJECT table, you arrive at the following table:
Now, consider the following STUDENT table:
The preceding table gives the details of students. Consider the following points about the STUDENT table:
- It consists of the student roll number (ROLL NUMBER), student first name (STUDENT FIRST NAME), student last name (STUDENT LAST NAME), and student residential phone numbers (PHONENUMBER1 and PHONE NUMBER2).
- It has a primary key, ROLL NUMBER, which uniquely defines the complete table.
The preceding table satisfies the two guidelines of the first normal form. However, it does contain null values and the number of telephone numbers per student is restricted to two. If a student has three telephone numbers, you are constrained to record only two and leave the third unrecorded. This means that the database design is imposing constraints on recording data in the tables. Therefore, there is a need of normalizing the STUDENT table.
By applying 1NF definition to the STUDENT table, you can arrive at the following tables.
Now, to convert the table to 2NF, you must first understand the concept of functional dependency.
Functional Dependency
The normalization theory is based on the fundamental notion of functional dependency. Given a relation (you may recall that a table is also called a relation) R, attribute A is functionally dependent on attribute B if each value of A in R is associated with precisely one value of B.
In other words, attribute A is functionally dependent on B if and only if, for each value of B, there is exactly one value of A.
Attribute B is called the determinant.
Consider the following REPORT table.
The preceding table has a composite key (ROLL_NUMBER+COURSE_CODE).
In the preceding table, for a particular value of ROLL_NUMBER+COURSE_CODE, there is precisely one
corresponding value for MARKS. Hence, MARKS is functionally dependent on ROLL_NUMBER+COURSE_CODE. This can be symbolically represented as (ROLL_NUMBER,COURSE_CODE)->MARKS. The following diagram shows the functional dependency between MARKS
and ROLL_NUMBER+COURSE_CODE.
The other functional dependencies in the preceding table are:
- COURSE_CODE->COURSE_NAME
- COURSE_CODE->T_NAME (Assuming one course is taught by only one teacher.)
- T_NAME->ROOM_NUMBER (Assuming each teacher has his/her own, unshared room.)
- MARKS->GRADE
- COURSE_CODE->ROOM_NUMBER
Further, the attribute COURSE_NAME is not functionally dependent on ROLL_NUMBER+COURSE_CODE. It is only dependent on COURSE_CODE. Similarly, T_NAME andROOM_NUMBER are not functionally dependent on the whole key. T_NAME and ROOM_NUMBER are only dependent on COURSE_CODE. Therefore, COURSE_NAME, T_NAME, and ROOM_NUMBERattributes are partially dependent on the whole key. This dependency is called partial dependency, as shown in the following diagram.
In the preceding diagram, ROOM_NUMBER is dependent on T_NAME, and T_NAME is dependent on COURSE_CODE. Therefore, ROOM_NUMBER is dependent on COURSE_CODE. This type of dependency is called transitive (indirect) dependency. The following diagram shows the transitive dependency.
Second Normal Form
A table is said to be in 2NF when:
- It is in 1NF.
- No partial dependency exists between non-key attributes and key attributes.
The guidelines for converting a table to 2NF are:
- Find and remove attributes that are functionally dependent on only a part of the key and not on the whole key. Place them in a different table.
- Group the remaining attributes.
Consider the following PROJECT table.
The preceding table could lead to the following problems:
- Insertion: The department of a particular employee cannot be recorded until the employee is assigned aproject.
- Updation: For a given employee, the employee code (ECODE), the department name (DEPT), and the department head (DEPTHEAD) are repeated several times. Hence, if an employee is transferred to another department, the change will have to be recorded in every row of the EMPLOYEE table. Any omission will lead to inconsistencies.
- Deletion: When an employee completes work on a project, the employee’s record is deleted. The information regarding the department to which the employee belongs will also be lost.
The primary key of the preceding table is composite (ECODE+PROJCODE). The table satisfies the
definition of 1NF. You need to check now if it satisfies 2NF by considering the following points:
- For each value of ECODE, there is more than one value of HOURS. For example, for ECODE, E101,there are three values of HOURS – 90, 101, and 60. Hence, HOURS is not functionally dependent on ECODE.
- For each value of PROJCODE, there is more than one value of HOURS. For example, for PROJCODEP27, there are three values of HOURS – 90, 10, and 72. Hence, HOURS is not functionally dependent onPROJCODE.
- For a combination of the ECODE and PROJCODE values, there is exactly one value of HOURS. Hence, HOURS is functionally dependent on the whole key, ECODE+PROJCODE.
Now, consider the following points to check if DEPT is functionally dependent on the whole key,ECODE+PROJCODE:
- For each value of ECODE, there is exactly one value of DEPT. For example, for ECODE, E101, there is exactly one value, the Systems department. Hence, DEPT is functionally dependent on ECODE.
- For each value of PROJCODE, there is more than one value of DEPT. For example, PROJCODE, P27 is associated with two values of DEPT, Systems and Finance. Hence, DEPT is not functionally dependent on PROJCODE.
Therefore DEPT is functionally dependent on part of the key, which is ECODE, and not functionally dependent on the whole key, ECODE+PROJCODE.
Similarly, DEPTHEAD is functionally dependent on ECODE; however, it is not dependent on the attribute,PROJCODE. Therefore, the PROJECT table is not in 2NF. For the table to be in 2NF, the non-key attributes must be functionally dependent on the whole key and not part of the key.
To convert the PROJECT table into 2NF, you must remove the attributes that are not functionally dependent on the whole key. You should place the removed attributes in a different table along with the attribute they are functionally dependent on.
In the preceding example, DEPT is not functionally dependent on the whole key, ECODE+PROJCODE. Therefore, you should place DEPT along with ECODE in a separate table called EMPLOYEEDEPT. You should also move the DEPTHEAD to the EMPLOYEEDEPT table. Now, the PROJECT table will contain ECODE, PROJCODE, and HOURS.
The following tables are in 2NF.
Now, consider the following STUD_SUBJECT_DETAILS table.
The preceding table gives the details of the subjects chosen by a student. Consider the following points about the STUD_SUBJECT_DETAILS table:
- The primary key of the preceding table is STUD_ID + SUBJECT_CODE.
- There are repeated values in columns, STUD_FIRST_NAME, STUD_LAST_NAME, andSUBJECT_NAME. Further, these attributes are dependent on only part of the primary key.
- The values in columns STUD_FIRST_NAME and STUD_LAST_NAME are dependent on STUD_ID.
- The values in SUBJECT_NAME are only dependent on SUBJECT_CODE, and all the non-primary fields are not functionally dependent on the whole primary key.
Therefore, to normalize the table in 2NF, you need to break the table into smaller tables.
The following tables are in 2NF.
Third Normal Form
A relation is said to be in 3NF if and only if:
- It is in 2NF.
- No transitive (indirect) dependency exists between non-key attributes and key attributes.
The guidelines for converting a table to 3NF are:
- Find and remove non-key attributes that are functionally dependent on attributes that are not the primary key. Place them in a different table.
- Group the remaining attributes.
Consider the following EMPLOYEE table.
The preceding table could lead to the following problems:
- Insertion: The department head of a new department that does not have any employees at present cannotbe entered in the DEPTHEAD column. This is because the primary key is unknown.
- Updation: For a given department, the code for a particular department head (DEPTHEAD) is repeated several times. Hence, if a department head moves to another department, the change will have to be made consistently across the table.
- Deletion: If the record of an employee is deleted, the information regarding the head of the department will also be deleted. Hence, there will be a loss of information.
The EMPLOYEE table contains a single value in each cell. Therefore, the table is in 1NF.
The primary key in the EMPLOYEE table is ECODE. For each value of ECODE, there is exactly one value of DEPT. Hence, the DEPT attribute is functionally dependent on the primary key, ECODE. Similarly, for each value of ECODE, there is exactly one value of DEPTHEAD. Therefore, DEPTHEAD is functionally dependent on the primary key, ECODE. Hence, all the attributes are functionally dependent on the whole key, ECODE. Hence, the table is in 2NF.
However, the attribute DEPTHEAD is dependent on the attribute DEPT also. As per 3NF, all non-key attributes have to be functionally dependent only on the primary key. This table is not in 3NF since DEPTHEAD is functionally dependent on DEPT, which is not a primary key.
To convert the Employee table into 3NF, you must remove the DEPTHEAD column because it is not functionally dependent on only the primary key, ECODE. You should place DEPTHEAD in another table called DEPARTMENT along with the attribute, DEPT.
The following tables are in 3NF.
Consider another table, STUD_DETAILS, which stores the student residential details. The following table
describes the structure of STUD_DETAILS.
The STUD_DETAILS table contains the values, as shown in the following table.
In the preceding table, street name (STREET), city (CITY), and state (STATE) are associated with the zip code. For example, if you want to know the number of students residing at a particular location in a city, then a zip code is enough for this. Therefore, you can move attributes, STATE, CITY, STREET, and ZIPCODE
into another table, ADDRESS.
The following table describes the structure of the ADDRESS table.
Then, you can move the rest of the attributes to another table, which must include ZIPCODE also.
The following table describes the structure of STUD_DETAILS.
BCNF
The original definition of 3NF was not sufficient in some situations. It was not satisfactory for the tables:
- That had multiple candidate keys.
- Where the multiple candidate keys were composite.
- Where the multiple candidate keys overlapped (had at least one attribute in common).
Therefore, a new normal form named BCNF was introduced. In tables, where the preceding three conditionsdo not apply, you can stop at the third normal form. In such cases, 3NF is the same as BCNF.
A relation is in BCNF if and only if every determinant is a candidate key.
The guidelines for converting a table to BCNF are:
- Find and remove the overlapping candidate keys. Place the part of the candidate key and the attribute itis functionally dependent on, in a different table.
- Group the remaining items into a table.
Consider the following PROJECT table.
The preceding table has redundancy. For example, if the name of an employee is modified, the change willhave to be made consistent across the table, otherwise there will be inconsistencies.
ECODE+PROJCODE is the primary key of the PROJECT table. You will notice that NAME+PROJCODEcould be chosen as the primary key and hence, is a candidate key.
The following points describe the functional dependencies in the PROJECT table:
- HOURS is functionally dependent on ECODE+PROJCODE.
- HOURS is also functionally dependent on NAME+PROJCODE.
- NAME is functionally dependent on ECODE.
- ECODE is functionally dependent on NAME.
You will notice that the PROJECT table has:
- Multiple candidate keys that are ECODE+PROJCODE and NAME+PROJCODE.
- Composite candidate keys.
- Candidate keys that overlap since the attribute PROJCODE is common between the two candidate keys.
This is a situation that requires conversion to BCNF. The table is essentially in the third normal form. The
only non-key item is HOURS, which is dependent on the whole key, ECODE+PROJCODE orNAME+PROJCODE.
ECODE and NAME are determinants since they are functionally dependent on each other. However, they arenot candidate keys by themselves. As per BCNF, the determinants have to be candidate keys.
Hence, remove NAME and ECODE and place them in a different table. You will arrive at the following
tables.
Normalization is a method of breaking down complex table structures into simple table structures by using certain rules. Using this method, you can reduce redundancy in a table, and eliminate the problems of inconsistency and disk space usage. You can also ensure that there is no loss of information. Normalization…
Normalization is a method of breaking down complex table structures into simple table structures by using certain rules. Using this method, you can reduce redundancy in a table, and eliminate the problems of inconsistency and disk space usage. You can also ensure that there is no loss of information. Normalization…