Data storage areas
The RIDW contains three data storage areas:
Table name | Purpose of the table |
---|---|
SYS_CONFIG | Contains system configuration information, such as the date dimension calendar type. |
ETL_INFO | Contains ETL (extract, transform, and load) configuration information, such as ETL schedule (daily, weekly, or monthly) |
Two additional control columns are also added: ISSOFDELETED and REC_DATETIME.
Surrogate keys
Surrogate keys are the primary keys created by the ETL process. They are required to override differences between the formats of the primary keys from different sources. In every extracted table, RIDW creates its own primary keys and retains the original key in the EXTERNAL_KEY1 or EXTERNAL_KEY2 columns. Surrogate keys are 0-based integers.
Classification tables
These are tables that represents commonly used artifacts such as projects, requests, requirements, tasks, activities, and components. They are recognizable by their name, which contains the word “CLASSIFICATION”. Every classification table has a pre-defined record type that represents a common understanding of the artifact. Converting extracted tables into classification tables is managed by the ETL process.
Dummy record
All classifications and artifact tables contain a dummy row identifiable by a primary key value of -1. Dummy rows support the normalization of data in the event of null values for the referential constraints defined.
The fact and dimension area implements a star schema design, which is a set of facts, dimensions, and historical traceability tables. A fact table contains the measures or ‘facts' of a particular business process. For instance, if project management is a process that you want to measure then a corresponding fact table might contain the number of requirements. Dimension tables contain attributes used to constrain and group data when performing data warehousing queries. If number of requirements is a fact table then you might want to use requirement type as a dimension.
Design conventions
Dimension table names start with “D_” and have surrogate keys to identify records. Fact table names with “F_”, while tables that provide historical insight on measures found in the fact tables start with “H_”. The primary key is usually a combination of surrogate key of the measure and the surrogate key of the artifact that contributed to the measure. Fact table names with “F_C_”, are equivalent to their original trending fact table, with the exception that they only store the most recent snapshot of the data, whereas their counterparts accumulate data to allow trending over time. The data mart also comprises a set of views that provide a window to the operational data store, for the purpose of drill down for more details or for the possibility of providing list reports.
Control columns
All fact tables have a column that references the date dimension D_DATE. For trending tables, the date in this context indicates the date that the snapshot of the data was taken. For the other fact tables, the date reflects the occurrence of an event such as the creation of a record. The snapshot of the data can be tagged to indicate a ‘per day', ‘per week', or ‘per month' basis, depending on the value stored in the SYS_CONFIG table in the configuration area. Other control columns include the REC_DATEIME column, which serves as a record stamp for the insert or update of a record.
NULL indicator record
As in the operational data store, dimension tables contain a NULL indicator row identifiable by a primary key value of -1. This represents the NULL value for tables with constraints.