Improve Your Data Modeling Skills
Improving data modeling skills increases a data professional's
productivity and efficiency. Certifications can demonstrate these
skills, which also improves marketability.
Motivation
o Essential data modeling skills:
Data Modeling
o process of analyzing data-oriented structures
o includes variety of specific model types
- types range from models for physical data to models for high-level
concepts
o similar to class modeling for object-oriented (OO) design
- data modelers versus OO developers:
. model entity types versus classes
. assign attributes to entity types versus attributes and
operations to classes
. define associations between entities versus between classes;
the two kinds of association are similar
Entity types
o understanding entity types is fundamental skill for data models
o entity types represent:
- collection of similar objects (such as people, places, and things)
- non-physical concepts (such as events)
- example: in order entry database, Customer, Order, and Item are
common entity types
o entity types only represent data whereas classes also describe
object's behavior
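The contrast between entity types and classes can be sketched in Python. This is a minimal illustration, not a prescribed design: the field names, the prices in cents, and the OrderWithBehavior wrapper are all assumptions made for the example. The dataclasses carry data only, as an entity type would, while the wrapper class adds behavior.

```python
from dataclasses import dataclass
from typing import Tuple


# Entity types: data only, no behavior (field names are illustrative).
@dataclass(frozen=True)
class Customer:
    first_name: str
    last_name: str


@dataclass(frozen=True)
class Item:
    name: str
    unit_price_cents: int


@dataclass(frozen=True)
class Order:
    customer: Customer
    items: Tuple[Item, ...]


# An OO class also describes behavior, here a method that totals the order.
class OrderWithBehavior:
    def __init__(self, order: Order) -> None:
        self.order = order

    def total_cents(self) -> int:
        return sum(item.unit_price_cents for item in self.order.items)


customer = Customer("Ada", "Lovelace")
order = Order(customer, (Item("Pen", 150), Item("Notebook", 450)))
order_total = OrderWithBehavior(order).total_cents()
```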
Attributes
o entity types have at least one attribute
o example: entity type Customer typically includes attributes
First Name and Last Name
o developers typically implement attributes as columns in database tables
- achieving optimum level of detail is often challenging
o expressing single attributes with multiple columns:
- can provide greater control over data
- incurs development and maintenance costs
o example: phone number in North America
- has three components: Area Code, Prefix, and Line Number
- rarely need to assign each component to separate columns
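The phone number trade-off can be sketched as follows. The helper split_phone, the column names, and the sample number are hypothetical and assume a 10-digit number with no country code; the point is only to show what storing one attribute as one column versus three columns might look like.

```python
import re
from typing import NamedTuple


class PhoneParts(NamedTuple):
    area_code: str
    prefix: str
    line_number: str


def split_phone(phone: str) -> PhoneParts:
    """Split a 10-digit North American number into its three components."""
    digits = re.sub(r"\D", "", phone)  # keep digits only
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return PhoneParts(digits[:3], digits[3:6], digits[6:])


# Option 1: one attribute stored in one column.
single_column = {"phone_number": "(555) 123-4567"}

# Option 2: the same attribute stored in three columns
# (more control over the data, more to develop and maintain).
parts = split_phone(single_column["phone_number"])
multi_column = parts._asdict()
```

The single-column form is usually enough; the split form pays off only when an application actually needs to query or validate the components separately.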
Naming Conventions
o naming conventions for data modeling:
- typically maintained by enterprise administrators
- essential for making code easy to understand and modify
- physical and logical data models typically have different naming
conventions since they have different purposes
o example:
- for logical data models: give greater priority to human readability
- for physical models: focus more on technical considerations
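As a small illustration of the difference between logical and physical names, the sketch below maps readable logical attribute names to snake_case column names. The snake_case rule and the function to_physical_name are assumptions chosen for the example; a real convention would come from whoever maintains the organization's standards.

```python
import re


def to_physical_name(logical_name: str) -> str:
    """Convert a readable logical name into a snake_case column name."""
    words = re.findall(r"[A-Za-z0-9]+", logical_name)
    return "_".join(word.lower() for word in words)


# Logical names favor human readability.
logical_attributes = ["First Name", "Last Name", "Address Identifier"]

# Physical names follow a technical rule (here: lowercase, underscores).
physical_columns = [to_physical_name(name) for name in logical_attributes]
```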
Relationships
o relationships between entities:
- key requirement for developing data modeling skills
- conceptually identical to associations between objects in OO
programming
- example: order entry system:
. Customers place Orders, so placement is typical relationship
between customers and orders
. Customers live at Address, and Zip Code is part of Address
o naming relationships often becomes unnecessary when entities' roles
in relationships are specified with sufficient clarity
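The "Customers place Orders" relationship can be sketched with Python's built-in sqlite3 module. The table and column names below are illustrative assumptions, and a foreign key is only one common way to implement such a relationship in a relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL
    );
    -- Each order is placed by exactly one customer.
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Lovelace')")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")
conn.execute("INSERT INTO customer_order VALUES (101, 1)")

# Follow the relationship from one customer to that customer's orders.
orders_for_customer = conn.execute(
    "SELECT order_id FROM customer_order WHERE customer_id = ? ORDER BY order_id",
    (1,),
).fetchall()

# An order that points at a missing customer is rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (102, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```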
Key Assignment
o data modeling uses two basic strategies to assign keys to tables:
- assigning natural key:
. usually best option when table has at least one attribute that is
unique to table’s business concept
- creating surrogate key:
. data modelers need to add new column for tables without such
attribute
~ no business meaning
~ merely serves to identify each entity
. example:
~ addresses do not have obvious natural key because entire
address is needed to identify one
~ data modelers often identify addresses with surrogate key called
something like Address Identifier
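The two key strategies can be compared in a short sqlite3 sketch. The country table with an ISO code is an assumed example of a natural key and does not come from the order entry scenario; address_id plays the role of the Address Identifier surrogate key. All table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural key: the code already identifies the business concept.
    CREATE TABLE country (
        iso_code TEXT PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- Surrogate key: address_id has no business meaning.
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city       TEXT NOT NULL,
        zip_code   TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO country VALUES ('CA', 'Canada')")

# The database generates the surrogate value; the caller never supplies it.
cur = conn.execute(
    "INSERT INTO address (street, city, zip_code) VALUES (?, ?, ?)",
    ("1 Main St", "Springfield", "12345"),
)
generated_address_id = cur.lastrowid

country_name = conn.execute(
    "SELECT name FROM country WHERE iso_code = ?", ("CA",)
).fetchone()[0]
```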
Normalization
o process of organizing data within data models
o make entity types of data models more cohesive
o generally involves reducing data redundancy
- highly beneficial for application development
- storing objects in relational databases becomes much easier when
information about those objects is maintained in only one place
o first three levels of normalization are most common
o higher levels are possible
o progressive hierarchy: each level meets all requirements of
previous level
o example:
- entity type in first normal form (1NF):
. does not contain repeating data groups
- entity type in second normal form (2NF):
. in 1NF
. its non-key attributes fully dependent on its primary key
- entity type in third normal form (3NF):
. in 2NF
. its non-key attributes directly dependent on primary key, not on
other non-key attributes
o normalization can incur performance cost
- denormalization is also important skill for data modelers
- after denormalization, data models often bear little resemblance to
their normalized schema
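The normalization steps described in this section can be sketched with plain Python data. The sample orders, field names, and values are invented for illustration. The starting records hold a repeating group of items and repeat the customer name in every order; the sketch removes the repeating group and then moves the customer name so that it is kept in only one place.

```python
# Unnormalized: repeating item groups and customer data inside each order.
raw_orders = [
    {"order_id": 100, "customer_id": 1, "customer_name": "Ada Lovelace",
     "items": ["Pen", "Notebook"]},
    {"order_id": 101, "customer_id": 1, "customer_name": "Ada Lovelace",
     "items": ["Ink"]},
]

# Remove the repeating group: one row per (order, item).
order_items = [
    (order["order_id"], item)
    for order in raw_orders
    for item in order["items"]
]

# customer_name depends on customer_id, not on order_id, so keep it
# in one place, keyed by customer_id.
customers = {o["customer_id"]: o["customer_name"] for o in raw_orders}

# Orders now hold only a reference to the customer.
orders = [(o["order_id"], o["customer_id"]) for o in raw_orders]
```

Reading an order together with its customer name now takes a lookup across two structures, which is the kind of cost that can motivate denormalization.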
In addition to proper training, the key to improving data modeling skills
is practice.