3. Hazard Analysis
Hazard analysis occurs throughout the life cycle of a safety-critical software project. The essential elements of each hazard are captured in a hazard report that is built up in stages during the development of the system/software project. Hazard analysis generally needs to be a joint effort between the safety personnel from both the systems and software safety teams. A good hazard analysis requires a thorough understanding of the system and how it will operate, as well as an understanding of the software that may cause a hazard or act to monitor, mitigate or control the hazard.

3.1 Relationship of the Safety Phases to Systems and Software Development:
The development of the hazard report is generally tied to the Safety Review phases, which are intended to monitor the progress in identifying and addressing all of the hazards in the safety-critical system. There are four Safety Reviews, held at the end of Phases 0, 1, 2, and 3. Diagram 1.0 shows how these Safety Review Phases typically line up with the system and software life cycle phases.

Diagram 1.0
The following is a general description of what is expected in each of the System Safety Phases:
Phase 0: During Phase 0, the Preliminary Hazard Analysis (PHA) is developed for the system.
- The PHA uses a description of the system
- The PHA identifies major high-level hazards, considering loss of control, loss of mission or loss of facilities
- The Phase 0 Safety Review occurs at about the point where the system concept is complete and system requirements are being developed
Phase 1: During Phase 1, the PHA is updated and expanded into the initial Hazard Analysis (HA), and hazards are recorded in Hazard Reports.
- At this point, most of the system hazards have been identified, using potential causes and contributors
- The risk of the identified hazards has been assessed
- The Phase 1 Safety Review occurs at about the point where the system and software requirements are being completed
Phase 2: The Hazard Reports are updated.
- Mitigations and controls for each hazard are identified
- Methods of verification for each mitigation and control are specified to ensure they will eliminate or reduce the impact of the hazard
- By the Phase 2 Safety Review, most of the system and software design is complete and implementation is underway
Phase 3: Hazard Reports are completed.
- By the Phase 3 Safety Review, the tests and hazard verifications identified in Phase 2 have been completed and the results have shown that the hazards are controlled to an acceptably safe level
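The staged build-up of a hazard report across Phases 0 through 3 can be pictured as a record whose fields are filled in phase by phase. The following is only a minimal sketch under stated assumptions: the class names, field names, and the `highest_phase_supported` helper are hypothetical and are not taken from any hazard report template in this Handbook.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Control:
    """One mitigation or control (hypothetical fields, for illustration only)."""
    description: str
    verification_method: Optional[str] = None   # expected by the Phase 2 review
    verification_passed: Optional[bool] = None  # expected by the Phase 3 review


@dataclass
class HazardReport:
    """Illustrative hazard report record; the field names are assumptions."""
    hazard: str                                           # Phase 0: high-level hazard
    causes: List[str] = field(default_factory=list)       # Phase 1: causes and contributors
    risk: Optional[str] = None                            # Phase 1: assessed risk
    controls: List[Control] = field(default_factory=list)  # Phase 2: mitigations and controls


def highest_phase_supported(report: HazardReport) -> int:
    """Return the latest Safety Review phase (0-3) whose expected content is present."""
    if not (report.causes and report.risk):
        return 0
    if not report.controls or any(c.verification_method is None for c in report.controls):
        return 1
    if any(c.verification_passed is not True for c in report.controls):
        return 2
    return 3


# Example with made-up content: the report matures as information is added.
report = HazardReport(hazard="Loss of control of the vehicle")
print(highest_phase_supported(report))  # 0

report.causes.append("Software issues a thruster command at the wrong time")
report.risk = "high"
print(highest_phase_supported(report))  # 1

report.controls.append(
    Control("Inhibit thruster commands outside the firing window",
            verification_method="unit test"))
print(highest_phase_supported(report))  # 2

report.controls[0].verification_passed = True
print(highest_phase_supported(report))  # 3
```

A real hazard report carries considerably more detail than this sketch; the point is only that the content expected at each Safety Review builds on the content from the previous one.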
3.2 Software Involvement in the Hazard Reports/Software Hazard Reports:
(Also, see the detailed information found in SWE-205 - Determination of Safety-Critical Software, Tab 7.4)

3.2.1 Understanding the System:
The first step in identifying software-related hazards or functions is developing a thorough understanding of the system to be built. Software Safety personnel should work with the Systems Safety personnel during the Concepts Phase and Requirements Development Phase to get a better understanding of how the system or software could fail, how the failure might be prevented, and how to mitigate or prevent an accident if a failure occurs. The systems personnel will work with the documentation initially available to make determinations of how the system might fail and document the initial results in a Preliminary Hazard Analysis. Early documentation that might be reviewed includes:
- The Concept of Operation
- Generic Hazard Lists (including generic software causes)
- Critical Items List
- Preliminary System Reliability Analysis
- Project/System Risk Analyses
- Request for Proposals
- Computing System Safety Analysis
- Software Security Assessment
- Science Requirements Document
- Requirements and Specification Documents
- Safety analysis from previous similar projects (Often similar projects will have many of the same types of hazards.)
- Checklists
Establish a scope for the hazard analysis. Are there operational boundaries to be included? What phases need to be considered? What other items should be considered (e.g., human actions, software interfaces, utilities)? Break the Preliminary Hazard Analysis into manageable groups and analyze the interaction between the sub-elements.

Typically, the PHA is done in teams including members from different roles in the project who have become familiar with the project operations. Using the system understanding they have gained, the team brainstorms possible hazards and records them as hazard statements. Hazard statements are often recorded in one of two forms: exposure to “something” causes “something undesirable to happen”, or failure of “something” causes “something undesirable to happen”. The Preliminary Hazard Analysis will result in a list of hazard causes and a set of possible hazard controls, which are used as inputs to develop the initial safety requirements. There is a list of potential software causes in 8.21 - Software Hazard Causes of the Assurance and Safety Topics in this Handbook.

3.2.2 Determine Software’s Role in the Hazards
Once there is an initial list of causes and initial safety requirements, more specific hazards can be defined at the system level. Software safety personnel will help determine the role software has in the defined hazards: will the software monitor the hazard, or is the software a cause, a control or a mitigation of the hazard? Generally, the initial preliminary hazard analysis is followed by the identification of the high-level system hazards and their causes and controls. At this point, the software safety team reviews the identified hazards with their causes and controls so they can help identify additional software safety-related hazards, causes and controls. The software safety team assures that all the software controls identified are included in the set of requirements.
These controls may include monitoring the health of equipment, sending alarms or warnings to the operators, identifying faults that are about to occur or have occurred and taking mitigating actions, lockouts, verifying input values, error handling, barriers, procedures and many others. The software-related hazards at this point may be documented with the System Hazard Analysis reports or they may be documented separately in Software Hazard Reports. Each hazard is documented in a separate hazard report along with its potential risk to the system.

The software safety team often does a Fault Tree Analysis (FTA) at this point to try to identify any software-related hazards that have been overlooked. A Fault Tree Analysis is a top-down analysis to help identify the causes of presupposed hazards and is described in detail in Topic 8.07 - Software Fault Tree Analysis of the Assurance and Safety Topics in this Handbook. Another method that can be used to help identify software hazards and failures is the Software Failure Modes and Effects Analysis (FMEA), which is a bottom-up structured analysis method. This method is covered in detail in Topic 8.05 - SW Failure Modes and Effects Analysis of the Assurance and Safety Topics in this Handbook. Software FMEA is more time-consuming and can be overwhelming in very large systems.

Hazard Analysis must consider the software’s ability, by design, to cause or control a given hazard. It is a best practice to include the software within the system hazard analysis. The general hazard analysis must consider software common-mode failures that can occur in instances of redundant flight computers running the same software. A common-mode failure is a specific type of common-cause failure where several subsystems fail in the same way for the same reason. The failures may occur at different times, and the common cause could be a design defect or a repeated event.
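The top-down nature of a fault tree, and the way a software common-mode failure can defeat hardware redundancy, can be illustrated with a very small example. This is a sketch under stated assumptions, not the method described in Topic 8.07: the `Basic` and `Gate` classes, the `occurs` function, and all event names are hypothetical, and only simple AND/OR gates are modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Basic:
    """A basic event (leaf of the tree), identified by name."""
    name: str


@dataclass(frozen=True)
class Gate:
    """An intermediate or top event that combines its inputs with AND or OR."""
    kind: str                  # "AND" or "OR"
    inputs: Tuple["Node", ...]


Node = Union[Basic, Gate]


def occurs(node: Node, failed: FrozenSet[str]) -> bool:
    """Evaluate top-down whether the event at `node` occurs, given the failed basic events."""
    if isinstance(node, Basic):
        return node.name in failed
    results = [occurs(child, failed) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: two redundant flight computers run the same software.
shared_defect = Basic("shared_sw_defect")
computer_a_fails = Gate("OR", (Basic("hw_fault_A"), shared_defect))
computer_b_fails = Gate("OR", (Basic("hw_fault_B"), shared_defect))
loss_of_control = Gate("AND", (computer_a_fails, computer_b_fails))

# A single hardware fault is tolerated by the redundancy.
print(occurs(loss_of_control, frozenset({"hw_fault_A"})))        # False
# A single defect in the shared software takes out both computers.
print(occurs(loss_of_control, frozenset({"shared_sw_defect"})))  # True
```

In this made-up tree the shared software defect appears under both branches of the AND gate, which is why one basic event is enough to cause the top event even though the hardware is redundant.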
There are several different perspectives that may be used to think about the specific hazards that may occur in a system. From a system perspective, there are 3 points of view to consider:
1) Physical: the architectural view, which shows the system and how it is to be built
2) Functional: describes what the system is supposed to do to obtain the required system behavior. This looks at the system broken into functions with inputs and outputs
3) Operational: considers the operator interface and the operation of the system, including conditions, limitations, parameters, etc.
Other perspectives that should be considered when identifying hazards are:
1) Software: the software that controls the computer systems
2) Environment: looks at the various environments encountered by the software
3) Human: considers the human performance in the system and how any errors might affect the system
4) Organizational: considers any organizational or management actions that might affect the hazards

3.2.3 Analysis Updates During Design
By the time the majority of the design has been completed, the safety teams will be focusing on two primary activities. Each of the hazards must have a verification method identified that can be used to show the hazard has been eliminated or mitigated to an acceptable safety level, depending on the risk associated with the hazard. During the design period, each hazard report should be updated to include a verification method that can be used to ensure that the hazard can be controlled by the software mitigation or control. The design should be carefully reviewed to assure that all the hazards identified have been eliminated or controlled by the design. Any changes in the design should be examined to determine whether the changes have caused or exposed any new hazards that had not been captured previously, or whether any of the changes would allow a control or mitigation to be bypassed or to fail.
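The two design-phase checks described above (every control has a verification method, and every design change is examined for its effect on controls) lend themselves to simple traceability queries. The sketch below assumes a hypothetical data layout in which each hazard report ID maps to its controls and each control lists the design elements that implement it; the IDs, control descriptions, element names, and function names are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical traceability data: hazard report ID -> controls, each tied to the
# design elements that implement it and the verification method (None if missing).
HAZARD_CONTROLS: Dict[str, List[dict]] = {
    "HR-001": [
        {"control": "Range-check commanded valve position",
         "design_elements": {"valve_cmd_handler"},
         "verification": "unit test with out-of-range inputs"},
        {"control": "Alarm operator on over-pressure",
         "design_elements": {"pressure_monitor", "alarm_manager"},
         "verification": None},
    ],
    "HR-002": [
        {"control": "Lock out heater while door is open",
         "design_elements": {"door_interlock"},
         "verification": "integrated test on hardware simulator"},
    ],
}


def controls_missing_verification(hazards: Dict[str, List[dict]]) -> List[Tuple[str, str]]:
    """List (hazard ID, control) pairs that have no verification method yet."""
    return sorted(
        (hazard_id, c["control"])
        for hazard_id, controls in hazards.items()
        for c in controls
        if not c["verification"]
    )


def controls_touched_by_change(hazards: Dict[str, List[dict]],
                               changed_elements: Set[str]) -> List[Tuple[str, str]]:
    """List (hazard ID, control) pairs implemented by any changed design element."""
    return sorted(
        (hazard_id, c["control"])
        for hazard_id, controls in hazards.items()
        for c in controls
        if c["design_elements"] & changed_elements
    )


print(controls_missing_verification(HAZARD_CONTROLS))
# [('HR-001', 'Alarm operator on over-pressure')]

print(controls_touched_by_change(HAZARD_CONTROLS, {"alarm_manager", "door_interlock"}))
# [('HR-001', 'Alarm operator on over-pressure'), ('HR-002', 'Lock out heater while door is open')]
```

A query like this only flags controls for re-examination. Deciding whether a change introduces a new hazard or allows a control to be bypassed still requires review by the safety teams.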
Software is often relied on to work around hardware problems that are encountered, which results in additions and/or changes to functionality. The Hazard Analysis Reports should be updated with any new information.

3.2.4 Hazard Analysis Updates During Implementation and Testing
Each Hazard Analysis Report will be updated again during the implementation and testing phases to capture the results of running the verification methods and to determine whether the results show that the controls, mitigations, barriers, etc. are adequate to eliminate or control the hazards they were designed for. Many of the verifications will probably be run during unit testing, since many of the functions being tested would be difficult to test in integrated system testing. As in the design phase, any changes to the requirements, design or code during implementation and testing should be analyzed to determine if there is any impact to the safety features or if any new hazards have emerged. Such changes can easily ripple through the system and impact the safety requirements or features. Hazard reports (as well as any software documentation, e.g., requirements and design) need to be updated with any new information. The software should be reassessed during implementation and testing if there are any new concerns that need to be considered (e.g., previously unidentified security concerns). When determining if the verifications are adequate, the goal is to confirm that the accepted hazard controls produce the expected result and do not cause unexpected problems.

3.2.5 Hazard Analysis Report Contents
The minimum recommended content for a Hazard Analysis Report is found in Tab 4 of this topic.

3.3 Additional Guidance
Links to Additional Guidance materials for this subject have been compiled in the Relevant Links table in the Resources tab. The following SWEs also have links to this topic: