Unit 3--Section 3

Normalization

Objectives
Reading
Notes



Return to Study Guide--Unit 3 Directory



Objectives

After completing this section, you should be able to

  1. define the concept of normalization.

  2. define and discuss functional dependence.

  3. define and select different types of keys.

  4. apply the normal forms on relations to get normalized (i.e. well-structured) relations.

  5. transform an entity-relationship to relations.

  6. merge relations.


Return to Study Guide--Unit 3--Section 3 Directory


Reading

  1. Read pages 209-237 of Chapter 6, "Logical Design and the Relational Model", in Modern Database Management, by F.R. McFadden and J.A. Hoffer.


Return to Study Guide--Unit 3--Section 3 Directory


Notes:

A Comment on Normalization

Students often find normalization difficult as some of them do not have a sufficient mathematical background to help them to follow discussions in textbooks. The aim of the next few pages is to explain normalization and normal forms in simple words.

One phase in database development is mapping entity-relationship diagrams representing a business organization into relations. If this mapping is poor, then the resulting relational model will suffer from modification anomalies (drawbacks) when inserting or deleting data. Normalization is the process of applying rules (called normal forms) to the relational model to obtain a well-designed relational model. Theoretically, we have six rules called 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. But for practical purposes, putting relations in BCNF will result in a well-designed relational model, so we will discuss only the first four rules. These normal forms are interrelated in that before placing a relation in 2NF the relation should be in 1NF first.

To place a relation in 3NF, the relation should be in 1NF then 2NF. To place a relation in 4NF, then it should be in 1NF then 2NF, then 3NF, then BCNF, and so on.

Just for your information, there a single rule called DKNF (Domain Key Normal Form), which stands by itself. If we managed to put a relation in DKNF, then we guarantee that it is free from modification anomalies. But it is not known whether it is possible to put any relation in DKNF. Also, this rule is complex and more than we need to obtain a well designed relational model. So, to avoid confusion we will not discuss this rule here.

Now, lets return to normal forms. Since these are concerned with the relational model, we have to make sure that the tables we have really are relations, 1NF takes care of that:

A. 1NF: every relation is in 1NF. If the table we have is a relation then it is in 1NF.

Consider the following example:

A department offers courses for students who pay tuition fees. The following relation represents this situation.

DEPARTMENT (student-no, course-no, fee)

Student-no Course-no Fee
S101 C213 300
S201 C213 300
S301 C413 350

What will happen if S301decided to drop the course and s/he is the only student enrolled in course C413? If we deleted the third row (tuple), then we did not delete only that piece of information saying that S301 is enrolled in C413, but we also deleted that piece of information saying that course C413 is being offered by the department and it costs $350 (we deleted more information than we wanted to). This is an example of deleting anomalies.

Now consider another situation. If the department offers course C513 and no student is registered in the course, then we cannot enter that piece of information until a student enrolls in that course, because according to the definition of a relation we cannot insert a null value in an attribute. This is an example of inserting anomalies.

These anomalies occurred because as you see above the key is compound and the non-key attribute (i.e. fee) depends on only part of the key. The 2NF resolves this problem.

B. 2NF: a relation is in 2NF if

  1. it is in 1NF and,

  2. every non-key attribute depends on the whole key. If the key is a single attribute, then it will be in 2NF, but if the key is compound (i.e. more than one attribute) then all non-key attributes should depend on the whole key.

Obviously, in our example above, fee depends on course only (i.e. part of the key) and not on the whole compound key, so we split this relation into two relations, each in 2NF:

STUDENT (student-no, course-no), COURSE (course-no, fee)

STUDENT COURSE
Student-no Course-no Student-no Fee
S101 C213 C213 300
S201 C213 C413 350
S301 C413

Examine each of these two relations for the modification anomalies mentioned above and check yourself that each is in 2NF.

C. 3NF: a relation is in 3NF if
1. the relation is in 2NF
2. there is no transitive dependency

Consider the following situation:

Young children ride buses between home and school. Each student pays a rate according to bus type: basic (B), loaded (L), fully loaded (FL). We can represent this situation by the following relation:

TRANSPORTATION (student-no, bus-type, rate)

Student-no Bus-type Rate
S101 B 100
S201 B 100
S301 L 120

Now, this table is a relation, so it is in 1NF by definition. The key is a single attribute (i.e. there is no compound key) so it is in 2NF. But still this table suffers from modification anomalies. If student S301 moved from bus type L to B or changed schools and he/she is the only student using that bus, then we will have to delete that row (tuple). In this case we also deleted more information than we have to, we deleted a piece of information saying that there is a bus of type L and it costs $120 (a deleting anomaly). Also since no student uses the bus of type FL we cannot insert that piece of information in the relation because we cannot enter a null value in the student-no attribute (an inserting anomaly). These anomalies occurred because of transitive dependency. Student-no determines bus-type, bus-type determines rate, hence, student-no determines rate (since rate is associated mainly with bus-type but the key is student-no). To put this relation in 3NF we split it into:

STUDENT (student-no, bus-type) , BUS (bus-type, rate)

STUDENT BUS
Student-no Bus-type Bus-type Rate
S101 B B 100
S201 B L 120
S301 L FL 150

Examine these two relations yourself and make sure that they are in 1NF, 2NF and that each has no transitive dependency, making them in 3NF.

D. BYNF: a relation is in BCNF if
1. the relation is in 3NF
2. every determinant is a candidate key.

Consider the following situation:

Students in a school are asked to join activities supervised and trained by coaches. A student may join more than one activity. A coach may train students in only one activity. An activity may have more than one coach (if many students choose that activity, then they are divided into groups, each group supervised by a coach). The following relation represents this situation.

ST-AC-CO (student-no, activity, coach)

Student-no Activity Coach
S101 A1 C1
S101 A2 C2
S202 A1 C1
S303 A1 C3
S303 A2 C2
S303 A3 C4
S404 A3 C5

This relation is in 1NF by definition, in 2NF since it is in 1NF and every non-key attribute depends on the whole key, in 3NF since it is in 2NF and has no transitive dependency. But it still has modification anomalies. If S303 is not playing A1 any more and we deleted the 4th row (tuple), then we have also deleted the piece of information telling as that there is a coach C3 ready to train students who enroll in A1 (deleting anomaly). Also, if we have coach C6 who is ready is train A4 but no student is enrolled in A4, then we cannot insert that piece of information about C4 (inserting anomalies). The attribute COACH is a determinant since it determines activity (not the opposite). To put the above relation in BCNF we have to split it into:

ST-AC (student-no, activity), AC-CO (activity, coach)

ST-AC AC-CO
Student-no Activity Activity >Coach
S101 A1 A1 C1
S101 A2 A2 C2
S202 A1 A1 C3
S303 A1 A3 C4
S303 A2 A3 C5
S303 A3
S404 A3
S404 A2

Examine ST-AC and AC-CO for being in the 1NF, 2NF, 3NF and BCNF and being free of the modification anomalies that exist in ST-AC-CO. Can you check to see whether dividing ST-AC-CO into ST-AC and ST-CO will be an acceptable alternative to ST-AC and AC-CO?

A. 4NF: a relation is in 4NF if
1. it is in BYNF
2. has no multi-valued independent attributes (i.e. there are no multi-valued attributes, or if multi-valued attributes exist, then they are functionally dependent).

Consider the following situation: Students in schools are encouraged to have hobbies outside their studies. We want to record the courses and hobbies of students. A student may have more than one course and more than one hobby. This situation is represented by the relation ST-CO-HO-A.

Several points may be noticed about this relation:

1. redundancy: for student S101 who has two courses and one hobby, we repeated the value of his/her hobby because we cannot leave this second row with a null value in the hobby-name column.
2. deleting anomaly: if S210 dropped course C213, and we deleted the row representing that, then we deleted also that he is playing football.
3. inserting anomaly: if S401 is enrolled in course C213 but does not have a hobby, we cannot record this information because we cannot enter a null value in the hobby-name attribute.
4. false impressions: for S301, one may have the impression that C413 encourages mental activities since chess is with it in the same row, while C213 encourages physical activities as basketball and swimming. In fact, there is no relationship between courses and hobbies, so to remove such a false impression, we have to repeat each course with each hobby to end with the relation ST-CO-HO-B.

ST-CO-HO-A (student-no, course-no, hobby-name), ST-CO-HO-B (student-no, course-no, hobby-name)

A
ST-CO-HO-A
Student-no Course-no Hobby-name
S101 C213 billiards
S101 C413 billiards
S201 C213 football
S301 C413 chess
S301 C213 basketball
S301 C213 swimming
S301 C213 chess
S301 C213 basketball

B
ST-CO-HO-B
Student-no Course-no Hobby-name
S101 C213 billiards
S101 C413 billiards
S201 C213 football
S301 C413 chess
S301 C413 basketball
S301 C413 swimming
S301 C213 chess
S301 C213 swimming

Look at the redundancy in ST-CO-HO-B. We still have the anomalies mentioned above in points 1,2,3. This redundancy occurred because the two attributes (course-no and hobby-name) may have multiple values for each student and there is no relationship between the values of the attributes. ST-CO-HO-B is in 1NF, 2NF, 3NF and BCNF, but still has anomalies. 4NF resolves these anomalies, when we split ST-CO-HO-B into:

STU-CO (student-no, course-no), STU-HO (student-no, hobby-name)

STU-CO STU-HO
Student-no Course-no Student-no Hobby-name
S101 C213 S101 billiards
S101 C413 S201 football
S201 C213 S301 chess
S301 C413 S301 basketball
S301 C213 S301 swimming

Examine these two relations yourself to make sure that they are free from the modification anomalies existing in ST-CO-HO-B. STU-CO and STU-HO are both in 4NF.



Return to Study Guide--Unit 3--Section 3 Directory


Created: January 1, 1997, 00:00:00 hrs.
Last Updated: August 11, 1997, 12:01:00 hrs.

[Comp378 Home Page] [AU Home Page]