Concept: Safety and Reliability View
This architectural view represents the strategic design decisions regarding the identification, isolation, and correction of faults at run-time. This typically includes the redundant architectural substructures and their management in the event of faults.
Main Description

Safety is defined as "freedom from accidents or losses" whereas reliability is "a stochastic measure of the availability of services from a system." Both of these concerns are managed through redundancy. The safety and reliability architecture is concerned with correct functioning in the presence of faults and errors. Heterogeneous redundancy (also known as diverse redundancy) is used to provide protection from both failures and errors, since independently designed channels are less likely to share the same defect.

Reliability is a measure of the up-time or availability of a system. Specifically, it is the probability that a computation will successfully complete before the system fails. It is normally estimated with mean time between failures (MTBF).
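As a rough illustration of how an MTBF figure relates to the probability that a computation completes before the system fails, the sketch below assumes a constant failure rate (an exponential failure model). That assumption is not stated in this view, and the name `mission_reliability` and its parameters are hypothetical.

```python
import math


def mission_reliability(mission_time_h: float, mtbf_h: float) -> float:
    """Probability that a run of mission_time_h hours completes before a failure.

    Assumes a constant failure rate of 1 / mtbf_h (exponential model). This is
    an illustrative assumption, not something the view itself prescribes.
    """
    if mtbf_h <= 0:
        raise ValueError("mtbf_h must be positive")
    if mission_time_h < 0:
        raise ValueError("mission_time_h must be non-negative")
    return math.exp(-mission_time_h / mtbf_h)
```

Under this model, a computation that runs as long as the MTBF itself completes with probability e^-1 (about 0.37), so MTBF should not be read as a guaranteed failure-free interval.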

Redundancy is one design approach that increases availability because if one component fails, another takes its place. Of course, redundancy only improves reliability when the failures of the redundant components are independent. The reliability of a component does not depend on what happens after the component fails. Whether the system fails safely or not, the reliability of the system remains the same.
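The independence condition above can be made concrete with a small calculation. The sketch below assumes that any single working channel is enough to deliver the service and that channel failures are statistically independent; the function name `parallel_reliability` is hypothetical.

```python
from math import prod
from typing import Iterable


def parallel_reliability(channel_reliabilities: Iterable[float]) -> float:
    """Reliability of redundant channels when any one working channel suffices.

    Valid only if the channels fail independently of one another.
    """
    rs = list(channel_reliabilities)
    if not rs:
        raise ValueError("at least one channel is required")
    if any(not 0.0 <= r <= 1.0 for r in rs):
        raise ValueError("each reliability must lie in [0, 1]")
    # The system fails only if every channel fails; with independent
    # failures the joint failure probability is the product of the
    # individual failure probabilities.
    return 1.0 - prod(1.0 - r for r in rs)
```

Two independent channels with reliability 0.9 each give roughly 0.99. If both channels always fail together (for example, from a shared defect), the pair is no more reliable than a single channel at 0.9, which is why independence matters.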

Safety is distinct from reliability. A safe system is one that does not incur too much risk to persons or equipment. A risk is an event or condition that can occur but is undesirable. Risk is the product of the severity of the incident and its probability.
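The risk calculation described above can be sketched as follows. The `Incident` class, the severity scale, and the `rank_by_risk` helper are hypothetical names chosen for illustration; real projects usually take their severity and probability scales from the applicable safety standard.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Incident:
    name: str
    severity: float     # hypothetical scale, e.g. 1 (minor) to 10 (catastrophic)
    probability: float  # likelihood of occurrence over the period of interest

    def __post_init__(self) -> None:
        if self.severity < 0:
            raise ValueError("severity must be non-negative")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")

    @property
    def risk(self) -> float:
        # Risk is the product of severity and probability.
        return self.severity * self.probability


def rank_by_risk(incidents: Iterable[Incident]) -> List[Incident]:
    """Return incidents ordered from highest to lowest risk."""
    return sorted(incidents, key=lambda i: i.risk, reverse=True)
```

One consequence of the product form is that a frequent, minor incident can carry more risk than a rare, severe one, so both factors need to be assessed.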

The key to managing both safety and reliability is redundancy. For improving reliability, redundancy allows the system to continue to work in the presence of faults because other system elements can take up the work of the broken one. For improving safety, additional elements are needed to monitor the system to ensure that it is operating properly; other elements may be needed to either shut down the system in a safe way or take over the required functionality.
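The two roles of redundancy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed design: `control_step`, `SAFE_STATE`, and the channel and monitor signatures are all hypothetical. Redundant channels address reliability (another channel takes over), while the monitor and the fallback to a safe state address safety.

```python
from typing import Callable, Sequence, Union

SAFE_STATE = "SAFE_STATE"  # hypothetical marker meaning "shut down safely"

Channel = Callable[[float], float]        # computes a command from a reading
Monitor = Callable[[float, float], bool]  # True if the command looks plausible


def control_step(
    reading: float,
    channels: Sequence[Channel],
    monitor: Monitor,
) -> Union[float, str]:
    """Return the first command the monitor accepts, else SAFE_STATE."""
    for channel in channels:
        try:
            command = channel(reading)
        except Exception:
            # Reliability: a faulted channel is skipped and the next one
            # takes up its work.
            continue
        if monitor(reading, command):
            return command
        # Safety: the monitor rejected this command, so it is never
        # passed on; fall through to the next channel.
    # No channel produced an acceptable command: go to the safe state.
    return SAFE_STATE
```

Keeping the monitor separate from the channels is a deliberate choice in this sketch: a check that shares code with the thing it checks is likely to share its defects.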

Normally, this view is represented structurally with class or structure diagrams. The interaction of the system elements to achieve safety and/or reliability goals is shown on sequence diagrams, and the behavior of individual classes or objects is depicted with state machines. The logical relation between faults, conditions, and failure events is shown as a Fault Tree Analysis (FTA) diagram; the reliability as a function of failure modes is shown as a Failure Mode and Effects Analysis (FMEA); and the relation between the faults, corrective measures, risks, and hazards is shown in a Hazard Analysis.
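To show the kind of logical relation an FTA diagram captures, the sketch below evaluates AND and OR gates over basic-event probabilities. It assumes the basic events are independent, and the function names, the example tree, and all numbers are hypothetical.

```python
from math import prod
from typing import Sequence


def and_gate(probabilities: Sequence[float]) -> float:
    """Output event occurs only if all input events occur (independent inputs)."""
    return prod(probabilities)


def or_gate(probabilities: Sequence[float]) -> float:
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the top event "unsafe output" occurs if the primary
# channel faults AND the monitor misses the fault, OR if power is lost.
p_channel_fault = 0.01
p_monitor_miss = 0.1
p_power_loss = 0.001

p_top = or_gate([and_gate([p_channel_fault, p_monitor_miss]), p_power_loss])
```

The AND gate is where a monitor pays off in such a tree: the undetected-fault branch is only as likely as the channel fault and the monitor miss occurring together.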