When you create a physical data model, you map the logical
data model to the physical structures of a database that hosts the
data warehouse.
When you create a physical data model, you must define physical
structures, such as tables and data types to use when data is stored.
You may also define new data structures that can enhance query performance.
However, you must do define new structures without changing the meaning
of the logical data model schema.
Keep in mind the following considerations when you create a physical
data model:
- How scalable is your design? How scalable is the physical database
management system (DBMS)?
- What queries, ETL processes, and other applications does the data
warehouse require?
- Is there an abstracted data model that you can use to improve
performance?
- How will you operate or maintain the data warehouse?
Note: Physical data modeling in online transaction processing (OLTP)
does not differ much from physical modeling for the data warehouse.
At the conceptual model level, physical data modeling for OLTP differs
primarily in performance design. In OLTP, the design is focused on
data and transaction volumes, where with a data warehousing physical
data model is focused on load performance, population of the analytical
areas and the summary tables by batch or real-time applications, and
on performance of the analytical queries.
To create a physical dimensional data model, you perform the following
steps:
- Model the entities and attributes of the physical data model:
- Define a table for each entity that is in the logical data model.
Assign a name to each table.
- Create columns for each of the attributes of the entities that
are in the logical data model. Assign a name and data type to each
column.
- Define the primary and foreign keys of each table.
- Build the DDL for the physical data model:
- Create the target database.
- Connect to the target database.
- Generate the DDL.
- Implement the DDL.
- Design and tune the performance of the physical data model. Tune
the entities and relationships that you get from the logical data
model and focus on how those objects are populated. You tune the population
performance by using one of two methods:
- Batch population
- Use custom applications, ETL tools, or database utilities that
deliver good performance.
- Real-time population
- Use processes and techniques that enable data to become available
faster. For example, rather than using the typical ETL process, use
an extract, load, and transform (ELT) process. In the ELT process,
data is extracted and loaded before the transformation takes place,
which may improve performance.
Note: Performance depends on the physical data structures.
Altering or adding more appropriate physical structures may improve
the performance of queries, data extractions, or replications. However,
adding more physical structures may also increase the load time of
the data warehouse. Performance tuning is a cost minimization issue.
For example, performance can always be improved by adding more CPU
and I/O resources, but you should find a compromise between acceptable
performance and total cost of the system.
- Verify the physical design by making sure that you have addressed
the following areas:
- The physical model DDL script should properly define the physical
structures, including performance enhancements.
- The physical design should be fully documented.
- Each entity in the logical data model should represent a physical
table, including the appropriate attributes and relations.
- Each relationship should describe correct cardinalities (one-to-one,
one-to-many, and many-to-many).
- Properly describe each entity and attribute in the data dictionary.
- Validate all capacity estimates.