This version of SWEHB is associated with NPR 7150.2B.
3.7.2 When a project is determined to have safety-critical software, the project manager shall implement the following items in the software:
a. Safety-critical software is initialized, at first start and at restarts, to a known safe state.
b. Safety-critical software safely transitions between all predefined known states.
c. Termination performed by software of safety-critical functions is performed to a known safe state.
d. Operator overrides of safety-critical software functions require at least two independent actions by an operator.
e. Safety-critical software rejects commands received out of sequence, when execution of those commands out of sequence can cause a hazard.
f. Safety-critical software detects inadvertent memory modification and recovers to a known safe state.
g. Safety-critical software performs integrity checks on inputs and outputs to/from the software system.
h. Safety-critical software performs prerequisite checks prior to the execution of safety-critical software commands.
i. No single software event or action is allowed to initiate an identified hazard.
j. Safety-critical software responds to an off-nominal condition within the time needed to prevent a hazardous event.
k. Software provides error handling of safety-critical functions.
l. Safety-critical software has the capability to place the system into a safe state.
m. Safety-critical elements (requirements, design elements, code components, and interfaces) are uniquely identified as safety-critical.
n. Requirements are incorporated in the coding methods, standards, and/or criteria to clearly identify safety-critical code and data within source code comments.
These requirements are applicable to components that reside in a safety-critical system and that control, mitigate, or contribute to a hazard, as well as to software used to command hazardous operations/activities.
1.2 Applicability Across Classes
| Class | A | B | C | CSC | D | DSC | E | F | G | H |
| ----- | - | - | - | --- | - | --- | - | - | - | - |
| Applicable? | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |

Key: ✓ = Applicable | ✗ = Not Applicable

A & B = Always Safety Critical; C & D = Not Safety Critical; CSC & DSC = Safety Critical; E–H = Never Safety Critical.
2. Rationale

Software and its computer-based control systems are integral parts of the system safety program. The software safety requirements listed in SWE-134 are the application of software engineering and assurance principles, criteria, and techniques to provide failure and error tolerance to minimize risks and control hazards.
Early planning and implementation dramatically ease the developmental burden of these requirements. Depending on the failure philosophy used (fault tolerance, control-path separation, etc.), design and implementation trade-offs will be made. Trying to incorporate these requirements late in the life cycle will impact project cost, schedule, and quality, and can also impact safety: an integrated design that incorporates software safety features such as those above allows the system perspective to be taken into account and gives the design a better chance of being implemented as needed to meet the requirements in an elegant, simple, and more reliable way.
3. Guidance

The sub-requirements and notes included in the requirement are a collection of best practices for the implementation of safety-critical software. These sub-requirements are applicable to components that reside in a safety-critical system and that control, mitigate, or contribute to a hazard, as well as to software used to command hazardous operations/activities. The requirements contained in this section complement the processes identified in NASA-STD-8719.13, NASA Software Safety Standard. 571 Software engineering and software assurance disciplines each have specific responsibilities for providing project management with work products that meet the engineering, safety, quality, and reliability requirements on a project. A detailed explanation of the rationale for each of the notes can be found in CxP 70065. 017
Additional specific clarifications for a few of the requirement notes include:
Item a: Aspects to consider when establishing a known safe state include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.
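Items a–c together amount to a guarded state machine: initialize to a known safe state, allow only predefined transitions, and terminate back to the safe state. A minimal sketch in C; the state names and transition table are illustrative assumptions, not taken from the standard:

```c
#include <stdbool.h>

/* Hypothetical system states; names and transitions are illustrative. */
typedef enum { STATE_SAFE, STATE_STANDBY, STATE_OPERATE, STATE_COUNT } sys_state_t;

static sys_state_t current_state = STATE_SAFE;

/* Predefined transition table (item b): allowed[from][to].
 * Anything not listed is rejected and the state is unchanged. */
static const bool allowed[STATE_COUNT][STATE_COUNT] = {
    /* from SAFE:    to SAFE, STANDBY, OPERATE */ { true, true, false },
    /* from STANDBY                            */ { true, true, true  },
    /* from OPERATE                            */ { true, true, true  },
};

/* Item a: initialize to a known safe state at first start and at restarts. */
void sys_init(void) { current_state = STATE_SAFE; }

/* Item b: attempt a transition; returns false if it is not predefined. */
bool sys_transition(sys_state_t to) {
    if (to >= STATE_COUNT || !allowed[current_state][to])
        return false;
    current_state = to;
    return true;
}

/* Item c: software-initiated termination returns to the known safe state. */
void sys_terminate(void) { current_state = STATE_SAFE; }

sys_state_t sys_state(void) { return current_state; }
```

Keeping the legal transitions in one table makes the "predefined known states" of item b reviewable and testable in a single place.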
Item d: Multiple independent actions by the operator help to reduce potential operator mistakes.
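One common way to satisfy item d is a two-step arm/execute pattern, in which no single operator action can trigger the override. A minimal sketch in C; the function names and single-flag design are assumptions for illustration:

```c
#include <stdbool.h>

/* Two independent operator actions (item d): ARM, then EXECUTE.
 * A single command can never fire the override on its own. */
static bool override_armed = false;

void override_arm(void)    { override_armed = true;  }
void override_disarm(void) { override_armed = false; }

/* Executes only if previously armed; the arm is consumed so each
 * execution requires a fresh, independent arm action. */
bool override_execute(void) {
    if (!override_armed)
        return false;
    override_armed = false;   /* one arm, one execution */
    /* ... perform the override of the safety-critical function here ... */
    return true;
}
```

In a real system the arm would typically also expire after a timeout and be tied to the specific function being overridden, so a stale or mismatched arm cannot enable an unintended action.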
Item f: Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes so the computing system must be able to detect the problem and recover to a safe state. As an example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to provide protection against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.
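One hedged illustration of the scrub approach: store a checksum when a critical table is written, recompute it periodically, and safe the system on mismatch. Fletcher-16 stands in here for whatever EDAC or CRC protection the platform actually provides:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 as a stand-in for platform EDAC/CRC (item f). */
static uint16_t fletcher16(const uint8_t *data, size_t len) {
    uint16_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (uint16_t)((a + data[i]) % 255);
        b = (uint16_t)((b + a) % 255);
    }
    return (uint16_t)((b << 8) | a);
}

/* Hypothetical safety-critical parameter table and its stored checksum. */
static uint8_t  critical_table[16];
static uint16_t table_checksum;

/* Record the checksum whenever the table is legitimately written. */
void table_commit(void) {
    table_checksum = fletcher16(critical_table, sizeof critical_table);
}

/* Periodic scrub: returns false on inadvertent modification, at which
 * point the caller would command a transition to the known safe state. */
bool table_scrub_ok(void) {
    return fletcher16(critical_table, sizeof critical_table) == table_checksum;
}
```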
Item g: Software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required.
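As a sketch of such an integrity check, the input message below carries a sequence counter and a checksum, and values outside specification are rejected rather than propagated. The message layout, limits, and XOR checksum are assumptions for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sensor message: value plus sequence counter and checksum. */
typedef struct {
    int32_t value_mdeg;   /* temperature in millidegrees C */
    uint8_t seq;          /* incrementing sequence counter */
    uint8_t checksum;     /* XOR of the value bytes and seq */
} sensor_msg_t;

#define TEMP_MIN_MDEG (-40000)   /* assumed specification limits */
#define TEMP_MAX_MDEG ( 85000)

static uint8_t xor8(const uint8_t *p, size_t n) {
    uint8_t x = 0;
    while (n--) x ^= *p++;
    return x;
}

/* Item g: reject corrupted, out-of-sequence, or out-of-spec inputs. */
bool sensor_msg_valid(const sensor_msg_t *m, uint8_t expected_seq) {
    uint8_t sum = (uint8_t)(xor8((const uint8_t *)&m->value_mdeg,
                                 sizeof m->value_mdeg) ^ m->seq);
    if (m->checksum != sum)     return false;  /* corrupted in transit */
    if (m->seq != expected_seq) return false;  /* dropped or repeated sample */
    return m->value_mdeg >= TEMP_MIN_MDEG && m->value_mdeg <= TEMP_MAX_MDEG;
}
```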
Item h: The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and by the conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider not only inappropriate sequencing of commands (as described in the original note) but also the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must not work).
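A prerequisite check of this kind can be sketched as a guard that rejects a hazardous command unless the mode and the preceding steps permit it; the mode names and "open valve" command below are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical vehicle modes and prerequisite state. */
typedef enum { MODE_SAFE, MODE_PRELAUNCH, MODE_FLIGHT } sys_mode_t;

typedef struct {
    sys_mode_t mode;
    bool tank_pressurized;   /* prerequisite step completed (item h) */
} vehicle_state_t;

/* Must-work / must-not-work: the hazardous command executes only in the
 * proper mode with its prerequisites met; otherwise it is rejected
 * outright rather than queued (items e and h). */
bool cmd_open_valve(const vehicle_state_t *s) {
    if (s->mode != MODE_FLIGHT) return false;  /* wrong mode or state */
    if (!s->tank_pressurized)   return false;  /* prior step not complete */
    /* ... actuate valve ... */
    return true;
}
```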
Item j: The intent is to establish a safe state following detection of an off-nominal indication. The safety mitigation must complete between the time that the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continuing with normal operations. The intent is to design in the ability of software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the ability for the software to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality, but will operate with this reduced functionality.
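The timing requirement in item j reduces to a budget: worst-case detection latency plus worst-case safing time, plus margin, must not exceed the hazard's time to effect. A sketch in C with assumed numbers standing in for a project's actual timing analysis:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed figures for the sketch: a 500 ms hazard fuse and worst-case
 * latencies that would come from (hypothetical) timing analysis. */
#define HAZARD_TIME_TO_EFFECT_MS 500u
#define WCET_DETECTION_MS         50u   /* worst-case monitor period + filter */
#define WCET_SAFING_MS           120u   /* worst-case safe-state transition  */

/* True when worst-case detection plus response, with the stated margin,
 * still beats the hazard (item j). This kind of budget check belongs in
 * design analysis and can also be asserted at build or init time. */
bool response_budget_ok(uint32_t margin_ms) {
    return WCET_DETECTION_MS + WCET_SAFING_MS + margin_ms
           <= HAZARD_TIME_TO_EFFECT_MS;
}
```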
Item k: Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected, isolated, and recovered to allow for correct run-time program execution. The software error handling features that support safety-critical functions must detect and respond to hardware and operational faults and/or failures as well as faults in software data and commands from within a program or from other software programs.
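A minimal sketch of such error handling in C: each safety-critical step returns a status, the caller retries once as a recovery attempt, and a persistent fault drives the system to the safe state. The retry budget and function names are illustrative assumptions (the fail counter exists only as a test hook):

```c
#include <stdbool.h>

typedef enum { STATUS_OK, STATUS_FAULT } status_t;

/* Test hook standing in for a real sensor: faults this many times. */
static int fail_count = 0;
static status_t read_sensor(void) {
    return (fail_count-- > 0) ? STATUS_FAULT : STATUS_OK;
}

static bool safed = false;
static void enter_safe_state(void) { safed = true; }

/* Item k: detect the fault, attempt recovery within a bounded retry
 * budget, and on persistent failure place the system in the safe state
 * rather than continuing with bad data. */
bool critical_step(void) {
    for (int attempt = 0; attempt < 2; attempt++)
        if (read_sensor() == STATUS_OK)
            return true;            /* recovered within retry budget */
    enter_safe_state();
    return false;
}
```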
Item l: The design of the system must provide sufficient sensors and effectors, as well as self-checks within the software, to enable the software to detect and respond to potential system hazards.
4. Small Projects
This requirement applies to all projects regardless of size.
5. Resources

Tools relevant to this SWE may be found in the table below. You may wish to reference the Tools Table in this handbook for an evolving list of these and other tools in use at NASA. Note that this table should not be considered all-inclusive, nor is it an endorsement of any particular tool. Check with your Center to see what tools are available to facilitate compliance with this requirement.

No tools have currently been identified for this SWE. If you wish to suggest a tool, please leave a comment below.
6. Lessons Learned
Early planning and coordination among the software engineering, software safety, and software assurance disciplines on the applicability and implementation of the SWE-134 software safety requirements will reduce schedule impacts.
The NASA Lesson Learned database contains the following lessons learned related to safety-critical software:
- Deficiencies in Mission Critical Software Development for Mars Climate Orbiter (MCO) (1999). Lesson Number 0740: "The root cause of the MCO mission loss was an error in the "Sm_forces" program output files, which were delivered to the navigation team in English units (pounds-force seconds) instead of the specified metric units (Newton-seconds). Comply with preferred software review practices, identify software that is mission critical (for which staff must participate in major design reviews, walkthroughs and review of acceptance test results), train personnel in software walkthroughs, and verify consistent engineering units on all parameters." 521
- Demonstration of Autonomous Rendezvous Technology (DART) spacecraft Type A Mishap: "NASA has completed its assessment of the DART MIB (Mishap Investigation Board) report, which included a classification review by the Department of Defense. The report was found to be NASA-sensitive, but unclassified, because it contained information restricted by International Traffic in Arms Regulations (ITAR) and Export Administration Regulations (EAR). As a result, the DART mishap investigation report was deemed not releasable to the public." The LL also "provides an overview of publicly releasable findings and recommendations regarding the DART mishap." 432