Description of Normalization
Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access because the path to find the data may be missing or broken.
There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.
As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.
The following descriptions include examples.
First Normal Form
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.
Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2.
What happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.
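A minimal sketch of this idea, using Python's built-in sqlite3 module. The table and column names are hypothetical, and the separate ItemVendors linking table is an assumption on top of the text: it is one common way to let an item have any number of vendors without repeating vendor details.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (
        item_number INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE Vendors (
        vendor_code TEXT PRIMARY KEY,
        vendor_name TEXT NOT NULL
    );
    -- One row per (item, vendor) pair replaces Vendor Code 1, Vendor Code 2, ...
    CREATE TABLE ItemVendors (
        item_number INTEGER NOT NULL REFERENCES Inventory(item_number),
        vendor_code TEXT NOT NULL REFERENCES Vendors(vendor_code),
        PRIMARY KEY (item_number, vendor_code)
    );
""")
conn.execute("INSERT INTO Inventory VALUES (1, 'Widget')")
conn.executemany("INSERT INTO Vendors VALUES (?, ?)",
                 [("V1", "Acme"), ("V2", "Globex"), ("V3", "Initech")])
# Adding a third vendor is just another row, not a schema change.
conn.executemany("INSERT INTO ItemVendors VALUES (?, ?)",
                 [(1, "V1"), (1, "V2"), (1, "V3")])

vendors_for_item = [row[0] for row in conn.execute(
    "SELECT vendor_code FROM ItemVendors "
    "WHERE item_number = ? ORDER BY vendor_code", (1,))]
print(vendors_for_item)
```

With this layout, a fourth or fifth vendor needs no change to the tables or to the query.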
Second Normal Form
- Create separate tables for sets of values that apply to multiple records.
- Relate these tables with a foreign key.
Third Normal Form
- Eliminate fields that do not depend on the key.
For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.
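The university example might be sketched as follows with Python's sqlite3 module. The column names and sample rows are invented for illustration; only the Universities and Candidates table names and the university code key come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Universities (
        university_code TEXT PRIMARY KEY,
        name            TEXT NOT NULL,
        address         TEXT NOT NULL
    );
    CREATE TABLE Candidates (
        candidate_id    INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        university_code TEXT REFERENCES Universities(university_code)
    );
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO Universities VALUES (?, ?, ?)", [
    ("U1", "University A", "1 First St"),
    ("U2", "University B", "2 Second St"),
])
conn.execute("INSERT INTO Candidates VALUES (1, 'Candidate X', 'U1')")

# Every university is listed, including those with no current candidates.
mailing_list = conn.execute("""
    SELECT u.name, COUNT(c.candidate_id)
    FROM Universities AS u
    LEFT JOIN Candidates AS c ON c.university_code = u.university_code
    GROUP BY u.university_code
    ORDER BY u.university_code
""").fetchall()
print(mailing_list)
```

University B appears in the result with a candidate count of zero, which would be impossible if university details lived only in the Candidates table.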
EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing. However, many small tables may degrade performance or exceed open file and memory capacities.
It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.
Other Normalization Forms
Boyce-Codd Normal Form (BCNF), a stricter variant of third normal form, as well as fourth and fifth normal forms do exist, but are rarely considered in practical design. Disregarding these rules may result in less than perfect database design, but should not affect functionality.
– Gathering user and business requirements
– Developing the E-R model based on those requirements
– Converting the E-R model into a set of relations (tables)
– Normalizing the relations to remove anomalies
– Implementing the normalized relations as tables in the database
– Normalization is the process of structuring a database so that most ambiguity is removed.
– Normalization proceeds in stages, from the least strict (1NF) to the strictest (5NF).
– Normally it is enough to reach 3NF or BCNF, since tables at these stages are already well designed.
Why is normalization needed?
– Optimizing table structures
– Increasing speed
– Reducing repeated entry of the same data
– Using storage media efficiently
– Reducing redundancy
– Removing anomalies (insertion, deletion, and update anomalies)
– Improving data integrity
• A table is considered good (efficient), or normal, if it meets three criteria:
1. Any decomposition of the table must be lossless (lossless-join decomposition). That is, after the table is broken down into new tables, joining the new tables must reproduce exactly the original table.
2. Functional dependencies are preserved by the decomposition (dependency preservation).
3. It does not violate Boyce-Codd Normal Form (BCNF).
• If the third criterion (BCNF) cannot be met, the tables must at least satisfy third normal form (3NF).
• A functional dependency describes a relationship between attributes in a relation.
• An attribute is functionally dependent on another attribute (or set of attributes) if the value of the latter determines the value of the former.
• The symbol used to represent functional dependency is →
→ is read 'functionally determines'
• Notation: A → B
A and B are attributes of a table. A functionally determines B, or B is dependent on A, if and only if any rows that have the same value of A also have the same value of B.
• Notation: A ↛ B or A x→ B
This is simply the negation of the notation above.
• Examples (NRP = student ID number, Nama = name, Mata_Kuliah = course, Nilai = grade):
Functional Dependency:
- NRP → Nama
- Mata_Kuliah, NRP → Nilai
Non Functional Dependency:
- Mata_Kuliah ↛ NRP
- NRP ↛ Nilai
Functional dependencies observed from the grade table:
– NRP → Nama
This holds because rows with the same value of NRP always have the same value of Nama.
– {Mata_Kuliah, NRP} → Nilai
This holds because Nilai depends on Mata_Kuliah and NRP together. In other words, for the same NRP and Mata_Kuliah, the value of Nilai is also the same, since Mata_Kuliah and NRP together form the (unique) key.
– Mata_Kuliah ↛ NRP
– NRP ↛ Nilai
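The definition of A → B can be tested mechanically against a set of rows. The sketch below is illustrative only: the holds function is a hypothetical helper and the grade rows are invented, since the original grade table is not reproduced here. Note that a check like this can only show that sample data does not violate a dependency; whether the dependency holds in general is a design decision.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical grade rows; the values are invented for illustration.
grades = [
    {"NRP": "001", "Nama": "Ani",  "Mata_Kuliah": "Databases", "Nilai": "A"},
    {"NRP": "001", "Nama": "Ani",  "Mata_Kuliah": "Networks",  "Nilai": "B"},
    {"NRP": "002", "Nama": "Budi", "Mata_Kuliah": "Databases", "Nilai": "B"},
]

print(holds(grades, ["NRP"], ["Nama"]))                  # NRP -> Nama
print(holds(grades, ["Mata_Kuliah", "NRP"], ["Nilai"]))  # {Mata_Kuliah, NRP} -> Nilai
print(holds(grades, ["Mata_Kuliah"], ["NRP"]))           # Mata_Kuliah -> NRP
print(holds(grades, ["NRP"], ["Nilai"]))                 # NRP -> Nilai
```

On this sample the first two checks succeed and the last two fail, matching the functional and non-functional dependencies listed above.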
A table is in first normal form if it is not an unnormalized table, meaning one in which the same kind of field is repeated and some fields are left null.
The following are not permitted:
– Multivalued attributes
– Composite attributes, or a combination of the two
So:
The domain of each attribute must contain only atomic values.
The decomposed form becomes:
– Students’ table:
– Table of hobbies:
– Second normal form is satisfied when a table is already in 1NF and every non-key attribute is functionally dependent on the whole primary key.
– A table does not satisfy 2NF if any attribute has a partial functional dependency, that is, it depends on only part of the primary key.
– An attribute that does not depend on the primary key at all must be moved out of the table.
– A functional dependency X → Y is full if removing any attribute (e.g. A) from X means that Y is no longer functionally dependent on what remains.
– A functional dependency X → Y is partial if some attribute (e.g. A) can be removed from X and Y is still functionally dependent on what remains.
– A relation schema R is in 2NF if every non-key attribute A ∈ R is fully functionally dependent on the primary key of R.
– The table below satisfies 1NF, but cannot be considered 2NF.
– It does not satisfy 2NF because the primary key is {NIM, KodeMk}, while:
{NIM, KodeMk} → NamaMhs
{NIM, KodeMk} → Alamat
{NIM, KodeMk} → Matakuliah
{NIM, KodeMk} → Sks
{NIM, KodeMk} → NilaiHuruf
– Of these, only NilaiHuruf depends on the whole key; the other attributes depend on NIM alone or on KodeMk alone, so the table needs to be decomposed into several tables that satisfy 2NF.
– The functional dependencies are as follows:
{NIM, KodeMk} → NilaiHuruf (fd1)
NIM → {NamaMhs, Alamat} (fd2)
KodeMk → {Matakuliah, Sks} (fd3)
– Therefore:
fd1 (NIM, KodeMk, NilaiHuruf) → Table Nilai
fd2 (NIM, NamaMhs, Alamat) → Table Mahasiswa
fd3 (KodeMk, Matakuliah, Sks) → Table MataKuliah
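The decomposition into Nilai, Mahasiswa, and MataKuliah can be sketched as three projections of the original table, followed by a check that joining them back gives exactly the original rows (the lossless-join criterion). The helper functions and sample rows below are hypothetical; only the attribute names come from the text.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]

def natural_join(left, right):
    """Join two lists of row dicts on every attribute name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def as_set(rows):
    """Order-independent view of a list of row dicts."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical 1NF rows; the values are invented for illustration.
original = [
    {"NIM": "001", "KodeMk": "DB1", "NamaMhs": "Ani", "Alamat": "Denpasar",
     "Matakuliah": "Databases", "Sks": 3, "NilaiHuruf": "A"},
    {"NIM": "001", "KodeMk": "NW1", "NamaMhs": "Ani", "Alamat": "Denpasar",
     "Matakuliah": "Networks", "Sks": 2, "NilaiHuruf": "B"},
    {"NIM": "002", "KodeMk": "DB1", "NamaMhs": "Budi", "Alamat": "Singaraja",
     "Matakuliah": "Databases", "Sks": 3, "NilaiHuruf": "B"},
]

nilai = project(original, ["NIM", "KodeMk", "NilaiHuruf"])        # fd1
mahasiswa = project(original, ["NIM", "NamaMhs", "Alamat"])       # fd2
mata_kuliah = project(original, ["KodeMk", "Matakuliah", "Sks"])  # fd3

rebuilt = natural_join(natural_join(nilai, mahasiswa), mata_kuliah)
lossless = as_set(rebuilt) == as_set(original)
print(len(nilai), len(mahasiswa), len(mata_kuliah), lossless)
```

Each student and each course is now stored once, and the join still reconstructs the original rows on this sample.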
– Third normal form is satisfied when a table is already in 2NF and no non-key attribute depends on another non-key attribute (a transitive dependency).
– The following table satisfies 2NF, but not 3NF.
– This is because some non-key attributes (Kota and Provinsi) depend on another non-key attribute (KodePos):
– KodePos → {Kota, Provinsi}
– Hence, the table needs to be decomposed into:
– Mahasiswa (NIM, NamaMhs, Jalan, KodePos)
– KodePos (KodePos, Provinsi, Kota)
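One practical effect of this decomposition is that a city name is stored once per postal code, so correcting it is a single-row update. The following sqlite3 sketch assumes text columns and invented sample rows; the table and column names are taken from the two relations above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE KodePos (
        KodePos  TEXT PRIMARY KEY,
        Provinsi TEXT NOT NULL,
        Kota     TEXT NOT NULL
    );
    CREATE TABLE Mahasiswa (
        NIM     TEXT PRIMARY KEY,
        NamaMhs TEXT NOT NULL,
        Jalan   TEXT NOT NULL,
        KodePos TEXT NOT NULL REFERENCES KodePos(KodePos)
    );
""")
# Hypothetical sample data: two students share one postal code.
conn.execute("INSERT INTO KodePos VALUES ('80111', 'Bali', 'Denpasar')")
conn.executemany("INSERT INTO Mahasiswa VALUES (?, ?, ?, ?)", [
    ("001", "Ani", "Jalan A", "80111"),
    ("002", "Budi", "Jalan B", "80111"),
])

# The city name is stored once, so a correction touches a single row.
updated_rows = conn.execute(
    "UPDATE KodePos SET Kota = 'Kota Denpasar' WHERE KodePos = '80111'"
).rowcount

cities = conn.execute("""
    SELECT m.NIM, k.Kota
    FROM Mahasiswa AS m
    JOIN KodePos AS k ON k.KodePos = m.KodePos
    ORDER BY m.NIM
""").fetchall()
print(updated_rows, cities)
```

Both students see the corrected city after one update, which avoids the update anomaly that the 2NF table would allow.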
– Boyce-Codd Normal Form is stricter than 3NF. To be in BCNF, a relation must be in 1NF, and the determinant (left-hand side) of every non-trivial functional dependency must be a superkey.
– In the example below, the Seminar relation has the attributes NPM, Seminar, and Pembimbing, and its primary key is NPM + Seminar.
– Students may take one or two seminars. Each seminar has two supervisors, and each student is supervised by one of the two. Each supervisor may lead only one seminar, so NPM and Seminar together determine one Pembimbing (supervisor).
– The Seminar relation is in 3NF but not in BCNF, because Seminar is still functionally dependent on Pembimbing, given that one Pembimbing (supervisor) can lead only one seminar.
– Seminar thus depends on an attribute that is not a superkey, which BCNF does not allow.
– Therefore the Seminar relation should be decomposed into two tables: one relating each student to a supervisor (NPM, Pembimbing) and one relating each supervisor to a seminar (Pembimbing, Seminar).
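A BCNF violation of this kind can be found by computing attribute closures: a left-hand side is a superkey only if its closure covers every attribute of the relation. The sketch below is a minimal illustration; the closure and bcnf_violations helpers are hypothetical names, and the two functional dependencies are the ones described for the Seminar relation.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List non-trivial FDs whose left side is not a superkey of relation."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= set(relation)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

relation = ("NPM", "Seminar", "Pembimbing")
fds = [
    (("NPM", "Seminar"), ("Pembimbing",)),  # a student in a seminar has one supervisor
    (("Pembimbing",), ("Seminar",)),        # a supervisor leads only one seminar
]

violations = bcnf_violations(relation, fds)
print(violations)
```

Only Pembimbing → Seminar is reported, because the closure of Pembimbing does not include NPM, so Pembimbing is not a superkey.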
– A relation is in 4NF if it is already in BCNF and contains no non-trivial multivalued dependencies. To remove a multivalued dependency, the relation is divided into two new relations, each of which holds the attributes involved in one of the multivalued dependencies.
– 5NF deals with the property of joining without loss of information (lossless join). 5NF is also known as PJNF (Project-Join Normal Form). Violations occur rarely and are hard to detect in practice.
1. ER Ngurah Agus Sanjaya Slide Part 6 - NORMALIZATION
2. http://support.microsoft.com/kb/283878