Tuesday, January 8, 2008

SQL: Database Normalization

Normalization is a technique for organizing relational databases to minimize duplication of data. In doing so, it also prevents certain logical and structural data anomalies.

Higher degrees of normalization tend to involve the creation of more tables, and more joins when retrieving functional datasets.
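
For instance, as a minimal sketch using hypothetical orders and customers tables, listing orders together with customer names from a normalized schema requires a join:

    -- Hypothetical normalized tables: customer data lives only in customers,
    -- so listing orders with customer names requires a join.
    SELECT o.order_id, o.order_date, c.customer_name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;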

Data anomaly types addressed by normalization
  • Update Anomaly - the same data is stored in multiple rows. When the data is updated, some copies may be missed, so later queries return multiple records with inconsistent values.
  • Insertion Anomaly - because unrelated facts are combined in one table, a new fact cannot be recorded until unrelated fields (such as the table's key) can be supplied, and the insert is refused.
  • Deletion Anomaly - the reverse of insertion: deleting the row that holds the last instance of one fact also removes an unrelated fact stored in the same row. A sketch after this list illustrates all three anomalies on a single table.
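
As a sketch, assume a single unnormalized table (hypothetical names) that mixes employee and department facts; all three anomalies show up on it:

    -- One table holding both employee and department facts.
    CREATE TABLE employee_dept (
        employee_id   INT PRIMARY KEY,
        employee_name VARCHAR(100),
        dept_name     VARCHAR(100),
        dept_phone    VARCHAR(20)
    );

    -- Update anomaly: the department phone is repeated on every employee row,
    -- so an update that misses some rows leaves inconsistent copies behind.
    UPDATE employee_dept SET dept_phone = '555-0200'
    WHERE dept_name = 'Sales' AND employee_id = 1;  -- other Sales rows keep the old phone

    -- Insertion anomaly: a new department cannot be recorded until it has at
    -- least one employee, because employee_id is the table's primary key.

    -- Deletion anomaly: deleting the last employee in a department also
    -- deletes that department's phone number.
    DELETE FROM employee_dept WHERE employee_id = 1;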

The normal forms of a database determine the degree of vulnerability to data inconsistencies. The higher the form, the less vulnerable the table is.
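
As a sketch of a higher normal form, splitting the example table above into two tables (hypothetical names) removes all three anomalies:

    -- Departments exist independently of employees.
    CREATE TABLE departments (
        dept_id    INT PRIMARY KEY,
        dept_name  VARCHAR(100),
        dept_phone VARCHAR(20)
    );

    CREATE TABLE employees (
        employee_id   INT PRIMARY KEY,
        employee_name VARCHAR(100),
        dept_id       INT REFERENCES departments(dept_id)
    );

    -- The phone number is stored once, a department can be inserted with no
    -- employees, and deleting an employee never removes department data.
    UPDATE departments SET dept_phone = '555-0200' WHERE dept_id = 10;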

Purposeful denormalization can be used for read-only or Business Intelligence applications, which are characterized as OLAP (Online Analytical Processing). An implementation of this is the Amazon database written about previously, which allows a data set within a single field, more like hash table/key-value pairs.
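
As a minimal sketch of purposeful denormalization for read-heavy reporting (hypothetical names, not the Amazon schema mentioned above):

    -- A wide, pre-joined table built for read-only reporting (OLAP-style).
    -- Duplication is accepted in exchange for join-free queries.
    CREATE TABLE sales_report (
        sale_id       INT PRIMARY KEY,
        sale_date     DATE,
        customer_name VARCHAR(100),
        dept_name     VARCHAR(100),
        dept_phone    VARCHAR(20),
        amount        DECIMAL(10,2)
    );

    -- Reads need no joins; the table is rebuilt by a periodic load job.
    SELECT dept_name, SUM(amount) FROM sales_report GROUP BY dept_name;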
