bdfd1m0c | Database Administration |
Assume that you want to keep passenger and flight information on the database.
To do this, you could have a single table containing rows holding all the
necessary information resulting from your data analysis. A table of
this kind is shown in Table 1.
Table 1. Flight Table (Nonnormalized Form)
Date | Flight number | Start | Destination | Aircraft type | Seat number | Seat class | Passenger number | Passenger name | Passenger address | Passenger facts |
Da1 | Fl1 | St1 | De1 | At1 |
Se1 Se2 Se3 Se4 |
Cl2 Cl2 Cl2 Cl3 |
Pn1 Pn2 Pn3 Pn4 |
Na1 Na2 Na3 Na4 |
Ad1 Ad2 Ad3 Ad4 |
Ft1 Ft2 |
Da1 | Fl2 | St2 | De2 | At1 | Se5 | Cl2 | Pn3 | Na3 | Ad3 |
This table is not in normalized form and has several undesirable features. For example:
Restructuring the table into first normal form eliminates the first of these problems. You will need to progress to a higher level of normalization to eliminate the second.
To get a table into first normal form, remove multiple occurrences of attribute values from the same row by creating a new row for each value. This ensures that every attribute in the table has only one value for each row.
Table 2 is in first normal form and contains the same data as the
nonnormalized table.
Table 2. Flight Table (Nonnormalized Form)
Date | Flight number | Start | Destination | Aircraft type | Seat number | Seat class | Passenger number | Passenger name | Passenger address | Passenger facts |
Da1 | Fl1 | St1 | De1 | At1 |
Se1 Se2 Se3 Se4 |
Cl2 Cl2 Cl2 Cl3 |
Pn1 Pn2 Pn3 Pn4 |
Na1 Na2 Na3 Na4 |
Ad1 Ad2 Ad3 Ad4 |
Ft1 Ft2 |
Da1 | Fl2 | St2 | De2 | At1 | Se5 | Cl2 | Pn3 | Na3 | Ad3 |
To get a table into second normal form, remove all attributes that are not dependent, either directly or indirectly, on the value of the primary key and put them into other tables.
In Table 2, the attribute destination directly depends on the primary key. Given the value of the primary key, especially the date and flight number, you can see that the aircraft will be flying to a particular destination. Without the primary key value, it would not make sense to speak of a flight destination at all.
In contrast, the seat class attribute is only indirectly dependent on the primary key. It really depends on the aircraft type and seat number attributes. Aircraft type is not part of the primary key, though it is itself dependent on the primary key. Because seat class is dependent on aircraft type, it is said to be indirectly dependent on the primary key.
Looking back to the flight table in first normal form (Table 2), you can see that it contains some attributes that are not dependent, either directly or indirectly, on the primary key. The primary key in this table is the combination of the date, flight number, and seat number attributes.
Even if the flight table's primary key had no value at all, the following attributes would still be meaningful:
Because they are not dependent on the primary key, you can remove these attributes from the flight table and put them into a separate table of their own.
Table 3 shows an example of such a table. Note that the primary key for this table is passenger number. This has been included in the new table because its value is always unique. For example, there may be more than one passenger named Smith, or more than one vegetarian passenger, but there can never be more than one passenger number Pn1.
Because it is unique, passenger number becomes the primary key of the new table. It also remains in the flight table because it is dependent on the primary key.
Table 3 is already in second normal form because every attribute in
the table is either directly or indirectly dependent on the primary
key.
Passenger number | Passenger name | Passenger address | Passenger facts |
Pn1 | Na1 | Ad1 | Ft1 |
Pn2 | Na2 | Ad2 | Ft2 |
Pn3 | Na3 | Ad3 | |
Pn4 | Na4 | Ad4 |
Now that you have removed the attributes that are independent on the
primary key and have put them in the passenger table (Table 3), the flight table is in second normal form, as you can see
in Table 4. The primary key is still the combination of the
date, flight number, and seat number attributes that are
shown in bold type.
Table 4. Flight Table (Second Normal Form)
Date | Flight number | Start | Destination | Aircraft type | Seat number | Seat class | Passenger number |
Da1 | Fl1 | St1 | De1 | At1 | Se1 | Cl2 | Pn1 |
Da1 | Fl1 | St1 | De1 | At1 | Se2 | Cl2 | Pn2 |
Da1 | Fl1 | St1 | De1 | At1 | Se3 | Cl2 | Pn3 |
Da1 | Fl1 | St1 | De1 | At1 | Se4 | Cl3 | Pn4 |
Da1 | Fl2 | St2 | De2 | At1 | Se5 | Cl2 | Pn3 |
The flight table (Table 4) has been improved by being put into second normal form, but it still contains some indirect dependencies, and also some duplicate data.
For example, the seat class attribute depends on aircraft type and seat number. It does not depend directly on the other two attributes of the primary key. Because of this, you can remove seat class and record it in a separate table.
The new table will also include the aircraft type and seat
number attributes. Together, they form the primary key of the new
table, which is shown in Table 5.
Aircraft type | Seat number | Seat class |
At1 | Se1 | Cl2 |
At1 | Se2 | Cl2 |
At1 | Se3 | Cl2 |
At1 | Se4 | Cl3 |
At1 | Se5 | Cl2 |
The primary key here is the combination of aircraft type and seat number.
The indirect dependency has now been removed from the flight table, but
with the table in second normal form, there is still some duplication, as you
can see in Table 6.
Table 6. Flight Table (Second Normal Form)
Date | Flight number | Start | Destination | Aircraft type | Seat number | Passenger number |
Da1 | Fl1 | St1 | De1 | At1 | Se1 | Pn1 |
Da1 | Fl1 | St1 | De1 | At1 | Se2 | Pn2 |
Da1 | Fl1 | St1 | De1 | At1 | Se3 | Pn3 |
Da1 | Fl1 | St1 | De1 | At1 | Se4 | Pn4 |
Da1 | Fl2 | St2 | De2 | At1 | Se5 | Pn3 |
Because the values for seat number and passenger number must be unique for each row, the table must contain a separate row for each of these values. To accommodate this, the other values in the table must be duplicated. For example, flight number Fl1 is repeated four times when only once would be enough.
Because seat number is already held in the aircraft table (Table 5) and passenger number is held in the passenger table (Table 3), and there is no indirect dependency through either of these attributes, you can remove them both from the flight table.
Because you need a table to record the passengers who are on any particular
flight, you could place these removed attributes in a seat table. This
table would show the seats booked on the flight and the passenger number for
the person booked on each seat. Table 7 is an example of this kind of table:
Date | Flight number | Seat number | Passenger number |
Da1 | Fl1 | Se1 | Pn1 |
Da1 | Fl1 | Se2 | Pn2 |
Da1 | Fl1 | Se3 | Pn3 |
Da1 | Fl1 | Se4 | Pn4 |
Da1 | Fl2 | Se5 | Pn3 |
The primary key in this table is the combination of date, flight number, and seat number.
The flight table is now in third normal form. Each of its attributes is directly dependent on the primary key and there is no unnecessary duplication of data in the table.
As you can see in Table 8, the value for date is the same in both rows of
the table. This duplication is necessary because there will certainly
be more than one flight each day.
Table 8. Flight Table (Third Normal Form)
Date | Flight number | Start | Destination | Aircraft type |
Da1 | Fl1 | St1 | De1 | At1 |
Da1 | Fl2 | St2 | De2 | At1 |
The passenger number attribute has been removed from the flight table along with seat number and seat class because passenger number was dependent on seat number and, without the seat number, no passenger number can be assigned.
The primary key in the flight table is now the combination of the date and flight number attributes.
You now have four complete and related tables (Table 9-Table 12) in third normal form:
Table 9. Flight Table (Third Normal Form)
Date | Flight number | Start | Destination | Aircraft type |
Da1 | Fl1 | St1 | De1 | At1 |
Da1 | Fl2 | St2 | De2 | At1 |
Passenger number | Passenger name | Passenger address | Passenger facts |
Pn1 | Na1 | Ad1 | Ft1 |
Pn2 | Na2 | Ad2 | Ft2 |
Pn3 | Na3 | Ad3 | |
Pn4 | Na4 | Ad4 |
Aircraft type | Seat number | Seat class |
At1 | Se1 | Cl2 |
At1 | Se2 | Cl2 |
At1 | Se3 | Cl2 |
At1 | Se4 | Cl3 |
At1 | Se5 | Cl2 |
Date | Flight number | Seat number | Passenger number |
Da1 | Fl1 | Se1 | Pn1 |
Da1 | Fl1 | Se2 | Pn2 |
Da1 | Fl1 | Se3 | Pn3 |
Da1 | Fl1 | Se4 | Pn4 |
Da1 | Fl2 | Se5 | Pn3 |
Looking back at the four normalized tables (Table 9-Table 12), you may notice that there is some duplication of data across them. For example, the passenger number attribute is held in both the passenger table and the seat table. Seat number is held in the seat table and the aircraft table, and there are further duplications as well.
This duplication occurs in normalized tables because they are generally being developed to be used in a relational database. Relational systems use these common attributes as links between one table and the next. The links provide paths around the database so that every table can access the data held in every other table. This means that, apart from the linking attributes, no data needs to be duplicated in the database.
In the example hierarchical database discussed in this publication, links of this kind are not needed because the tables are joined by pointers. Pointers provide a means of linking one table or file with another. Use the DBIDX and DBDIX TPFDF macros to maintain pointers.