8.21 - Software Hazard Causes

1. Introduction - Software Hazard Causes

When a device or system can lead to injury, death, the destruction or loss of vital equipment, or damage to the environment, system safety is paramount. The system safety discipline focuses on “hazards” and the prevention of hazardous situations.

A hazard is the presence of a potential risk situation that can result in or contribute to a mishap. To ensure the system being developed is as safe as possible, it is important to begin identifying potential hazards as early as possible in the development. Thus, the software and system safety personnel generally look at the hazardous events that could happen and what could potentially cause them.

Every hazard has at least one cause, which in turn can lead to a number of effects (e.g., damage, illness, failure). A hazard cause may be a defect in hardware or software, a human operator error, or an unexpected input or event which results in a hazard. The table below provides a number of potential software causes to consider in the project when developing the list of hazards and their potential causes.

A hazard control is a method for preventing the hazard, reducing the likelihood of the hazard occurring, or reducing the impact of that hazard. Hazard controls use software (e.g., detection of a stuck valve and an automatic response to open a secondary valve), hardware (e.g., a pressure relief valve), operator procedures, or a combination of methods to avert the hazard. For every hazard cause, there must be at least one control method, usually a design feature (hardware and/or software) or a procedural step.
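The stuck-valve example can be sketched as a software control. This is a minimal illustration, not a prescribed design: `ValveMonitor`, `STUCK_LIMIT`, and the `open_secondary` hook are hypothetical names, and the threshold is arbitrary. A real control and its limits would come from the project's hazard analysis.

```python
from dataclasses import dataclass
from typing import Callable

STUCK_LIMIT = 3  # hypothetical: consecutive mismatched samples before declaring the valve stuck


@dataclass
class ValveMonitor:
    """Detects a primary valve that does not follow its command and triggers a backup."""
    open_secondary: Callable[[], None]  # hypothetical hook that opens the secondary valve
    mismatches: int = 0
    tripped: bool = False

    def sample(self, commanded_open: bool, sensed_open: bool) -> bool:
        """Process one sensor sample; return True once the control has acted."""
        if commanded_open == sensed_open:
            self.mismatches = 0  # valve followed the command, clear the counter
        else:
            self.mismatches += 1
        if self.mismatches >= STUCK_LIMIT and not self.tripped:
            self.tripped = True
            self.open_secondary()  # automatic response, issued only once
        return self.tripped


# Usage: the primary valve is commanded open but the sensor keeps reporting closed.
actions = []
monitor = ValveMonitor(open_secondary=lambda: actions.append("secondary_open"))
for sensed in (False, False, False, False):
    monitor.sample(commanded_open=True, sensed_open=sensed)
```

Requiring several consecutive mismatches before acting is one way to avoid reacting to a single noisy sample, at the cost of a slower response.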

2. Table of Software Causes

Potential Software Causes to Consider When Identifying Software Causes in Hazard Analysis

Software Cause Areas to Consider

Potential Software Causes

Data errors


1.      Asynchronous communications

2.      Single or double event upset/bit flip or Hardware induced error

3.      Communication to/from an unexpected system on the network

4.      An out-of-range input value, value above or below range

5.      Start-up or hardware initiation data errors

6.      Data from an antenna gets corrupted

7.      Failure of Software Interface to Memory

8.      Failure of flight software to suppress outputs from a failed component

9.      Failure of software to monitor Bus Controller Rates to ensure communication with all remote terminals on the bus schedule's avionics buses

10.   Ground or Onboard database error, error in tag name database I/O configuration

11.   Interface error

12.   Latent data (Data delayed or not provided in required time)

13.   Communication bus overload

14.   Missing or failed integrity checks on inputs, failure to check the validity of input/output data

15.   Noise, babbling Node - keeps the node so busy it inhibits communication from other nodes

16.   Sensors or actuators stuck at some value (all zeros, all ones, some other value)

17.   Wrong software state for the input
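Several of the data causes above (out-of-range values, latent data, missing integrity checks) can be guarded against with an explicit validity check on each input. The sketch below is illustrative only: `Reading`, `validate_reading`, and the limits passed in are hypothetical, and a project would take its real ranges and latency limits from its requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    value: float      # engineering units
    timestamp: float  # seconds, on the same clock as `now`


def validate_reading(reading: Reading, now: float, low: float, high: float,
                     max_age: float) -> Optional[str]:
    """Return None if the reading is usable, else a short rejection reason."""
    if reading.value != reading.value:  # NaN never equals itself
        return "not_a_number"
    if not (low <= reading.value <= high):
        return "out_of_range"
    age = now - reading.timestamp
    if age < 0:
        return "timestamp_in_future"
    if age > max_age:
        return "stale"  # latent data: delivered later than the allowed time
    return None


# Usage with hypothetical limits: 0..100 units, at most 0.5 s old.
ok = validate_reading(Reading(50.0, 10.0), now=10.2, low=0.0, high=100.0, max_age=0.5)
late = validate_reading(Reading(50.0, 9.0), now=10.2, low=0.0, high=100.0, max_age=0.5)
```

Returning a reason rather than a bare boolean lets the caller report why an input was rejected.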

Commanding errors


1.      Command buffer error or overflow

2.      Corrupted software load

3.      Error in real-time command build or sequence build

4.      Failure to command during hazardous operations

5.      Failure to perform prerequisite checks before the execution of safety-critical software commands.

6.      Ground or Onboard database error for command structure

7.      Error in Command Data introduced by Command Server error

8.      Wrong commands are given by the operator

9.      Wrong command or a miscalculated command sent

10.   Sequencing error (Failure to issue commands in the correct sequence)

11.   Command sent in wrong software state or software in an incorrect or unanticipated state

12.   An incorrect timestamp on the command

13.   Missing software error handling on incorrect commands

14.   Status messages on command execution not provided

15.   Memory corruption, critical data variables overwritten in memory

16.   Inconsistent syntax

17.   Inconsistent command options  

18.   Similarly named commands

19.   Inconsistent error handling rules
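A few of the commanding causes above (commands sent in the wrong software state, missing prerequisite checks, missing error handling, no status on execution) can be illustrated with a small state-based command filter. This is a sketch under assumptions: the modes, the command names, and the `ALLOWED` table are all hypothetical.

```python
from enum import Enum, auto


class Mode(Enum):
    SAFE = auto()
    ARMED = auto()


# Hypothetical table: which commands are accepted in which software mode.
ALLOWED = {
    Mode.SAFE: {"ARM", "STATUS"},
    Mode.ARMED: {"FIRE", "DISARM", "STATUS"},
}


class CommandProcessor:
    def __init__(self) -> None:
        self.mode = Mode.SAFE
        self.log = []  # status message for every command, accepted or not

    def execute(self, command: str) -> bool:
        """Run the prerequisite check, then execute; always record a status."""
        if command not in ALLOWED[self.mode]:
            # Unknown or wrong-state commands are rejected and reported.
            self.log.append(f"REJECTED {command} in {self.mode.name}")
            return False
        if command == "ARM":
            self.mode = Mode.ARMED
        elif command in ("FIRE", "DISARM"):
            self.mode = Mode.SAFE
        self.log.append(f"EXECUTED {command}")
        return True


# Usage: FIRE is refused until ARM has been accepted.
cp = CommandProcessor()
results = [cp.execute(c) for c in ("FIRE", "ARM", "FIRE")]
```

Keeping the allowed commands in one table makes the prerequisite rules easy to review against the hazard report.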

Flight computer errors

1.      Board support package software error

2.      Boot load software error

3.      Boot PROM corruption preventing reset

4.      Buffer overrun – A buffer overflow (or buffer overrun) occurs when the volume of data exceeds the storage capacity of the memory buffer. As a result, the program attempting to write the data to the buffer overwrites adjacent memory locations.

5.      CPU overload

6.      Cycle jitter

7.      Cycle over-run

8.      Deadlock (trying to write to the same memory at the same time or trying to update while reading it)

9.      Reset during program upload (PROM corruption)

10.   Reset with no restart

11.   Single or double event upset/bit flip or Hardware induced error

12.   Time to reset greater than time to failure

13.   Unintended persistent data/configuration on reset

14.   Watchdog active during reboot causing infinite boot loop

15.   Watchdog failure

16.   Failure to detect and transition to redundant or backup computer

17.   Incorrect or stale data in redundant or backup computer

Operating systems errors

1.      Application software incompatibility with upgrades/patches to an operating system

2.      Cyclomatic Complexity levels, the complexity of software components (preventing thorough testing and increasing likelihood of coding errors)

3.      Defects in Real-Time Operating System (RTOS) Board Support software

4.      Defects in the Real-Time Operating System (RTOS) Commercial-Off-The-Shelf (COTS) Software

5.      Missing or incorrect software error handling

6.      Partitioning errors

7.      Shared resource errors

8.      Single or double event upset/bit flip

9.      Doesn't do what the user expects

10.   Excessive functionality

11.   Missing function

12.   Wrong function

13.   Inadequate protection against operating system bugs

Programmable logic device errors

1.      Cyclomatic Complexity levels, the complexity of software components (preventing thorough testing and increasing likelihood of coding errors)

2.      Errors in programming and simulation tools used for PLC development

3.      Errors in the Programmable Logic Device interfaces

4.      Errors in the Software/Firmware Logic Design

5.      Missing software error handling in the Software/Firmware Logic Design

6.      Programmable Logic Controller (PLC) logic/sequence error

7.      Single or double event upset/bit flip or hardware induced error

8.      Timing errors

9.      Doesn't do what the user expects

10.   Excessive functionality

11.   Missing function

12.   Wrong function

Flight system time management errors


1.      Incorrect data latency/sampling rates

2.      Failure to terminate/complete process in a given time

3.      Incorrect time sync

4.      Latent data (Data delayed or not provided in required time); MET timing issues and distribution

5.      Incorrect function execution, performing a function at the wrong time, out of sequence, or when the program is in the wrong state

6.      Race Conditions

7.      The software cannot respond to an off-nominal condition within the time needed to prevent a hazardous event.

8.      Time function runs fast/slow

9.      Time skips (e.g., Global Positioning System time correction)

10.   Loss of, or incorrect, time sync across flight system components

11.   Loss of, or incorrect, time synchronization between ground and spacecraft interfaces

12.   Unclear software timing requirements

13.   Asynchronous systems or components
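Time skips and clocks that run fast or slow, as listed above, can be detected by checking each time step against the expected cycle period. The following is a minimal sketch: `classify_time_step`, its labels, and the tolerance are hypothetical and not drawn from any particular flight system.

```python
def classify_time_step(previous: float, current: float, expected_dt: float,
                       tolerance: float) -> str:
    """Classify the step between two consecutive time samples (seconds)."""
    dt = current - previous
    if dt <= 0:
        return "backward_or_frozen"  # time went backward or did not advance
    if abs(dt - expected_dt) <= tolerance:
        return "nominal"
    # Outside tolerance: either a forward jump or a step shorter than one cycle.
    return "jump_forward" if dt > expected_dt else "short_step"


# Usage with a hypothetical 1 s cycle and 0.25 s tolerance.
labels = [
    classify_time_step(10.0, 11.0, 1.0, 0.25),
    classify_time_step(10.0, 28.0, 1.0, 0.25),  # e.g., an external time correction
    classify_time_step(10.0, 9.0, 1.0, 0.25),
]
```

How the software should respond to each label (hold the last value, resynchronize, or safe the system) is a design decision for the project.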

Coding, logic, and algorithm failures, algorithm specification errors


1.      Auto-Coding errors as a cause

2.      Bad configuration data/no checks on external input files and data

3.      Division by zero

4.      Wrong sign

5.      Syntax errors

6.      Error coding software algorithm

7.      Error in positioning algorithm

8.      Case/type/conversion error/unit mismatch

9.      Buffer overflows

10.   High Cyclomatic Complexity levels (above 15), the complexity of software components (preventing thorough testing and increasing likelihood of coding errors)

11.   Dead code or unused code

12.   Endless do loops

13.   Erroneous Outputs

14.   Failure of Flight Computer Software to Transition to or Operate in a Correct Mode or State

15.   Failure to check safety-critical outputs for reasonableness and hazardous values, and correct timing.

16.   Failure to generate a process error upon detection of arithmetic error (such as divide-by-zero).

17.   Failure to open a software problem report when an unexpected event occurs

18.   Inadvertent memory modification

19.   Incorrect "if-then" and incorrect "else."

20.   Missing default case in switch statement

21.   Incorrect implementation of a software change, software defect, or software non-conformance

22.   Incorrect passes (too many or too few or not at the correct time)

23.   Incorrect software operation if no commands are received or if a loss of commanding capability exists (Inability to issue commands)

24.   Insufficient or poor coding reviews, inadequate software peer reviews

25.   Insufficient use of coding standards

26.   Interface errors

27.   Missing or inadequate static analysis checks on code

28.   Missing or incorrect parameter range & boundary checking

29.   Non-functional loops

30.   Overflow or underflow in the calculation

31.   Precision mismatch

32.   Resource contention (e.g., thrashing: two or more processes accessing a shared resource)

33.   Rounding or truncation fault

34.   Sequencing error (Failure to issue commands in the correct sequence)

35.   Software is initialized to an unknown state; failure to properly initialize all system and local variables upon startup, including clocks.

36.   Too many or too few parameters for the called function

37.   Undefined or non-initialized data

38.   Unfound defects in the software

39.   Untested COTS, MOTS, or reused code

40.   Incomplete end-to-end testing

41.   Incomplete or missing software stress test

42.   Errors in the data dictionary or data dictionary processes

43.   Confusing feature names

44.   More than one name for the same feature

45.   Repeated code modules

46.   Failure to initialize a loop-control

47.   Failure to initialize (or reinitialize) pointers

48.   Failure to initialize (or reinitialize) registers

49.   Failure to clear a flag
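Three of the causes above (division by zero, failure to generate a process error on an arithmetic error, and failure to check outputs for reasonableness) can be addressed together in one guarded calculation. This is an illustrative sketch: `ArithmeticFault`, `safe_ratio`, and the `limit` argument are hypothetical names.

```python
import math


class ArithmeticFault(Exception):
    """Raised instead of letting an invalid result propagate silently."""


def safe_ratio(numerator: float, denominator: float, limit: float) -> float:
    """Divide with explicit checks for a zero divisor and an unreasonable result."""
    if denominator == 0.0:
        raise ArithmeticFault("division by zero")
    result = numerator / denominator
    # Reasonableness check on the output before anything else uses it.
    if not math.isfinite(result) or abs(result) > limit:
        raise ArithmeticFault(f"result {result!r} outside +/-{limit}")
    return result


# Usage: a nominal call, then a fault that the caller must handle explicitly.
nominal = safe_ratio(6.0, 3.0, limit=100.0)
try:
    safe_ratio(1.0, 0.0, limit=100.0)
    fault_seen = False
except ArithmeticFault:
    fault_seen = True
```

Raising a dedicated error type forces the caller to decide how to recover, rather than continuing with an undefined value.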

Fault tolerance and fault management errors

1.      Missing software error handling

2.      Missing or incorrect fault detection logic

3.      Missing or incorrect fault recovery logic

4.      Problems with the execution of emergency safing operations

5.      Failure to halt all hazard functions after an interlock failure

6.      The software cannot respond to an off-nominal condition within the time needed to prevent a hazardous event.

7.      Common mode software faults

o   Lack of backup flight software

o   Lack of dissimilar software

o   Lack of dissimilar control algorithms

8.      A hazard causal factor occurrence isn't detected

o   Software fails to sense a change in the condition of the hardware

o   Failure to detect a problem

o   Failure to detect and react to a system failure (Examples of system failures: Aural Warning System Failure, Communication System Failure, Facility Fire and Gas Detection System Failure, Critical Sensor Failure, Facility Control System Failure, Control System Failure, Common Control System Failure)

9.      False positives in Fault Detection Algorithms

10.   Failure to perform prerequisite checks before the execution of safety-critical software commands.

o   Failure to verify operator input commands

11.   Failure to terminate/complete process in a given time

12.   Memory corruption, critical data variables overwritten in memory

13.   Single or double event upset/bit flip or Hardware induced error

14.   Incorrect interfaces, errors in interfaces

15.   Missing self-test capabilities

16.   Failing to consider stress on the hardware

17.   Incomplete end-to-end testing

18.   Incomplete or missing software stress test

19.   Errors in the data dictionary or data dictionary processes

20.   Failure to provide or ensure secure access for input data, commanding, and software modifications.
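For the memory corruption and bit-flip causes above, one commonly used software mitigation is to keep three copies of a critical variable and take a bitwise majority vote when reading it. The sketch below is offered as an example of that general technique, not as a method this document prescribes, and `vote` is a hypothetical helper name.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies; masks bit flips confined to one copy."""
    return (a & b) | (a & c) | (b & c)


# Usage: the third copy has one flipped bit, and the vote recovers the original.
stored = 0b1010
recovered = vote(stored, stored, stored ^ 0b0100)
```

Voting does not help when the same fault corrupts two copies, which is one reason the list above also calls out common mode faults.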

Software processes errors

1.      Failure to implement software development processes or implementing inadequate processes

2.      Inadequate software assurance support and reviews

3.      Missing or inadequate software assurance audits

4.      Failure to follow the documented software development processes

5.      Missing, tailored, or incomplete implementation of the safety-critical software requirements in NPR 7150.2

6.      Missing, tailored, or incomplete implementation of the safety-critical software requirements in SSP 50038, Computer Based Control System Safety Requirements

7.      Incorrect or incomplete HW/SW Lab Testing

8.      Inadequate testing of reused or heritage software

9.      Failure to open a software problem report when an unexpected event occurs

10.   Failure to include hardware personnel in reviews of software changes, software implementation, peer reviews, and software testing

11.   Failure to perform a safety review on all software changes and software defects.

12.   Defects in commercial and modified Off-The-Shelf (OTS) software; failure to assess the bug fixes and updates available for Commercial-Off-The-Shelf (COTS) software (consider: operating systems, runtime libraries, device drivers packaged with the OS, third-party FPGA libraries)

13.   Insufficient use of coding standards

14.   Missing or inadequate static analysis checks on code

15.   Incorrect version loaded

16.   Incorrect configuration values or data

17.   No checks on external input files and data

18.   Errors in configuration data changes being uploaded to spacecraft

19.   Software/avionics simulator/emulator errors and defects

20.   Unverified software

21.   Unverified COTS software

22.   High Cyclomatic Complexity levels (over 15), the complexity of software components (preventing thorough testing and increasing likelihood of coding errors)

23.   Incomplete or inadequate software requirements analysis

24.   Compound software requirements

25.   Incomplete or inadequate software hazard analysis

26.   Incomplete or inadequate software safety analysis

27.   Incomplete or inadequate software static analyses

28.   Incomplete or inadequate software test data analysis

29.   Unrecorded software defects found during informal and formal software testing

30.   Auto-Coding tool faults and defects

31.   Errors in UML design models

32.   Software errors in hardware simulators due to lack of understanding of hardware requirements

33.   Incomplete or inadequate software test data analysis

34.   Inadequate BIT (Built-in-Test) coverage

35.   Inadequate regression testing and unit test coverage of flight software application-level source code (especially safety critical software)

36.   Failure to test all nominal and planned contingency scenarios (breakout and re-rendezvous, launch abort) and the complete mission duration (launch to docking to splashdown) in the hardware-in-the-loop environment

o   Incomplete testing of unexpected conditions, boundary conditions, and software/interface inputs.

37.   Use or persistence of test data, files, or config files in an operational scenario

38.   Failure to provide multiple paths or triggers from safe states to hazardous states

39.   Interface Control Documents (ICD) and Interface Requirements Documents (IRD) Errors

40.   System requirements errors

41.   Misunderstanding of hardware configuration and operation

42.   Hardware requirements and interface errors, Incorrect description of the software/hardware functions and how they are to perform

43.   Missing or incorrect software requirements or specifications

44.   Missing software error handling

45.   Requirements/design errors not fully defined, detected, and corrected

46.   Failure to identify the safety-critical software items

47.   Cyclomatic Complexity levels

48.   Failure to perform a function, performing the wrong function, performing the function incompletely

49.   An inadvertent/unauthorized event, an unexpected, unwanted event, an out-of-sequence event, the failure of a planned event to occur

50.   The magnitude or direction of an event is wrong

51.   Out-of-sequence event protection

52.   Multiple events/actions trigger simultaneously (when not expected)

53.   Error/Exception handling missing or incomplete

54.   Inadvertent or incorrect mode transition for required vehicle functional operation; undefined or incorrect mode transition criteria; unauthorized mode transition

55.   Failure of flight software to correctly initiate proper transition mode

56.   Software state transition error

57.   Software termination in an unknown state

58.   Errors in the software data dictionary values. Typical components of a data dictionary entry are:

o   Channelization data (e.g., bus mapping, vehicle wiring mapping, hardware channelization)

o   Description of each I/O variable, including format, unit of measure, and definition

o   Rate group data

o   Calibrated sensor data: a description of the format, units of measure, and definition of each sensor; the data reduction for transforming raw data into calibrated data; sensor data qualification criteria and sensor data disqualification criteria

o   Telemetry format/layout and data: a description of the telemetry mode, format, packetization, and definition; the types of packets allowed to be sent in each telemetry mode; the maximum data rate for each mode

o   Data recorder format/layout and data

o   Command definition (e.g., onboard, ground, test specific): a description of each command processed by the software work package

o   Effecter command information: information about the setting of names or flags that cause command executions in the software work products

o   Operational limits

o   Scheduling procedures

o   Partitioning methods, means of preventing partition breaches, partitioning faults, and partitioning metric data

Human-machine interface errors



1.      Incorrect Data (unit conversion, incorrect variable type)

2.      Stale Data

3.      Poor design of Human Machine Interface (HMI)

o   Warnings for system failures are not evident on HMI

o   Timing issues result in warnings not being displayed soon enough to allow for human intervention

o   Operator data display overload

4.      Too much, too little, incorrect data displayed

5.      Ambiguous or incorrect messages

6.      User display locks up/fails

7.      Missing software error handling

o   Failure to check the validity of input/output data

o   Failure to check for constraints in algorithms/specifications and valid boundaries

8.      Unsolicited command (Command issued inadvertently, cybersecurity issue or without cause)

9.      Wrong Command or a Miscalculated Command to be Sent

10.   Failure of human interface software to check operator inputs

11.   Failure to pass along information or messages

12.   Display refresh rate leads to an incorrect operator response

13.   Lack of ordering scheme for hazardous event queues (such as alerts) in the human-computer interface (i.e., priority versus time of arrival, for example, when an abort must go to the top of the queue)

14.   Incorrect labeling of operator controls in the human interface software

15.   Failure to check for constraints in algorithms/specifications and valid boundaries

16.   Failure of human interface software to check operator inputs

17.   Failure to pass along information or messages

18.   Stale data

19.   Display refresh rate leads to an incorrect operator response

20.   Lack of ordering scheme for hazardous event queues (such as alerts) in the human-computer interface (i.e., priority versus time of arrival, for example, when an abort must go to the top of the queue)

21.   No onscreen instructions

22.   Undocumented features

23.   States that appear impossible to exit

24.   No cursor

25.   Failure to acknowledge an input

26.   Failure to advise when a change will take effect

27.   Wrong, misleading, or confusing information

28.   Poor aesthetics in the screen layout

29.   Menu layout errors

30.   Dialog box layout errors

31.   Obscured instructions

32.   Misuse of color
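The alert-ordering cause above (priority versus time of arrival) can be illustrated with a queue that sorts by priority first and arrival order second, so that an abort goes to the top. This is a sketch under assumptions: `AlertQueue` and the convention that a lower number is more urgent are hypothetical.

```python
import heapq
import itertools


class AlertQueue:
    """Orders alerts by priority first, then by arrival order."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, priority: int, message: str) -> None:
        # Lower number = more urgent (hypothetical convention, e.g., 0 for abort).
        heapq.heappush(self._heap, (priority, next(self._arrival), message))

    def pop(self) -> str:
        """Remove and return the most urgent, earliest-arrived alert."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Usage: the abort arrives second but is displayed first.
queue = AlertQueue()
queue.push(2, "cabin temp high")
queue.push(0, "ABORT")
queue.push(2, "battery low")
order = [queue.pop() for _ in range(len(queue))]
```

The arrival counter keeps alerts of equal priority in the order they were raised, which avoids reordering that could confuse an operator.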

Security and virus errors

1.      Denial/Interruption of Service

2.      Spoofed/Jammed inputs

3.      Missing capabilities to detect insider threat activities

4.      Inadvertent or intentional memory modification

5.      Inadvertent or unplanned mode transition

6.      Missing software error handling or defect handling

7.      Unsolicited command (Command issued inadvertently, cybersecurity issue or without cause)

8.      Stack-based buffer overflows, which are common and exploit stack memory that exists only during execution

9.      Heap-based buffer overflows, which are harder to carry out and involve flooding the memory space allocated for a program beyond the memory used for current runtime operations

10.   Cybersecurity vulnerability or computer virus

o   Virus infection in console PC

11.   Inadvertent access to ground system software

12.   Destruct commands incorrectly allowed in a hands-off zone

Unknown/Unknowns errors

1.      Undetected software defects

2.      Unknown limitations for COTS (operational, environmental, stress)

3.      COTS extra capabilities

4.      Incomplete or inadequate software safety analysis for COTS components

5.      Compiler behavior errors or undefined compiler behavior

6.      Software defects and investigations that are unresolved before the flight

Note:  Software is classified as safety-critical if the software is determined by and traceable to hazard analysis. See appendix A for guidelines associated with addressing software in hazard definitions. See SWE-205. Consideration for other independent means of protection (software, hardware, barriers, or administrative) should be a part of the system hazard definition process. 

Note:  Fault-tolerant systems are built to handle the most probable, and some less probable but hazardous, faults. Taking care of the faults will usually help prevent the software, or the system, from going into failure. The downside to fault tolerance is that it requires multiple checks and monitoring at very low levels. If a system is failure tolerant, it will ignore most faults and only respond to higher-level failures. The presumption is that it requires less work and is simpler to detect, isolate, stop, or recover from the failures. A project must weigh the costs and benefits of each approach and determine what will provide the most safety for the least cost and effort.

3. Resources

3.1 References

3.2 Tools


Tools to aid in compliance with this SWE, if any, may be found in the Tools Library in the NASA Engineering Network (NEN). 

NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN. 

The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool.  The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.


