- 1. The Requirement
- 2. Rationale
- 3. Guidance
- 4. Small Projects
- 5. Resources
- 6. Lessons Learned
- 7. Software Assurance
1. Requirements
3.7.3 If a project has safety-critical software or mission-critical software, the project manager shall implement the following items in the software:
a. The software is initialized, at first start and restarts, to a known safe state.
b. The software safely transitions between all predefined known states.
c. Termination performed by software functions is performed to a known safe state.
d. Operator overrides of software functions require at least two independent actions by an operator.
e. Software rejects commands received out of sequence when execution of those commands out of sequence can cause a hazard.
f. The software detects inadvertent memory modification and recovers to a known safe state.
g. The software performs integrity checks on inputs and outputs to/from the software system.
h. The software performs prerequisite checks prior to the execution of safety-critical software commands.
i. No single software event or action is allowed to initiate an identified hazard.
j. The software responds to an off-nominal condition within the time needed to prevent a hazardous event.
k. The software provides error handling.
l. The software can place the system into a safe state.
1.1 Notes
These requirements apply to components that reside in a mission-critical or safety-critical system and that control, mitigate, or contribute to a hazard, as well as to software used to command hazardous operations or activities.
1.2 History
1.3 Applicability Across Classes
2. Rationale
Implementing safety-critical software or mission-critical software design requirements helps ensure that the systems are safe and that the safety-critical software or mission-critical software requirements and processes are followed.
3. Guidance
3.1 Safety-Critical Software and Mission-Critical Software
This requirement applies to safety-critical software and mission-critical software. These items are design practices that should be followed when developing safety-critical software and mission-critical software.
Derived from NPR 7150.2D para 3.7.3 SWE 134: Table 1, SA Tasks 1 - 6
1. Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l."
2. Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.
3. Confirm that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.
4. Analyze the software design to ensure:
a. Use of partitioning or isolation methods in the design and code,
b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.
5. Participate in software reviews affecting safety-critical software products.
6. Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.
See the software assurance tab for additional guidance material.
See also SWE-023 - Software Safety-Critical Requirements.
3.2 Requirement Notes
Additional specific clarifications for a few of the requirement notes include:
Item a: (The software is initialized, at first start and restarts, to a known safe state.)
When establishing a known safe state, the items to inspect include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.
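As a minimal sketch of item a, the following Python fragment forces every hazardous output to a safe value on each start or restart. The `Controller` class, its fields, and the `restart` helper are hypothetical names chosen for illustration; flight code would typically be written in a language such as C and would also inspect the hardware and memory items listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SAFE = "safe"
    OPERATIONAL = "operational"


@dataclass
class Controller:
    # Hypothetical controller state; field names are illustrative only.
    mode: Mode = Mode.SAFE
    heater_on: bool = False
    valve_open: bool = False
    boot_count: int = 0

    def initialize(self) -> None:
        """Force every hazardous output to its safe value on any start or restart."""
        self.mode = Mode.SAFE
        self.heater_on = False
        self.valve_open = False
        self.boot_count += 1


def restart(controller: Controller) -> Controller:
    # A restart never trusts state left over from before the reset.
    controller.initialize()
    return controller
```

The design choice illustrated here is that the same initialization path runs for the first start and for every restart, so no restart can inherit a hazardous setting.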
Item d: (Operator overrides of software functions require at least two independent actions by an operator.)
Multiple independent actions by the operator help to reduce potential operator mistakes.
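One possible way to realize item d is an arm-then-confirm pattern, sketched below in Python. The `OverrideGuard` class and the function names passed to it are assumptions made for this example, not part of the requirement.

```python
class OverrideGuard:
    """Requires an 'arm' action followed by a separate 'confirm' action."""

    def __init__(self) -> None:
        self._armed_for = None  # name of the function currently armed, if any

    def arm(self, function_name: str) -> None:
        # First independent operator action.
        self._armed_for = function_name

    def confirm(self, function_name: str) -> bool:
        # Second independent operator action; succeeds only if it
        # matches the function that was armed.
        granted = self._armed_for == function_name
        # Any confirm attempt clears the arm, so the next override
        # needs a fresh arm action.
        self._armed_for = None
        return granted
```

A confirm without a preceding arm, or a confirm for a different function than the one armed, is refused, so a single mistaken action cannot override a software function.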
Item f: (The software detects inadvertent memory modification and recovers to a known safe state.)
Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes. The computing system must be able to detect the problem and recover to a safe state. For example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to protect against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.
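The detect-and-recover idea in item f can be sketched with a checksum over a critical data table and a periodic scrub. This is only an illustration under stated assumptions: `ProtectedTable`, `SAFE_DEFAULTS`, and the table contents are hypothetical, and the sketch uses the CRC-32 function from Python's standard `zlib` module in place of whatever error detection and correction a real computing system provides.

```python
import zlib

SAFE_DEFAULTS = bytes([0, 0, 0, 0])  # hypothetical safe parameter table


class ProtectedTable:
    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.crc = zlib.crc32(self.data)  # reference checksum taken at load time

    def is_intact(self) -> bool:
        # Recompute the checksum and compare it with the stored reference.
        return zlib.crc32(self.data) == self.crc

    def scrub(self) -> bool:
        """Return True if corruption was found and the table was reset to safe defaults."""
        if self.is_intact():
            return False
        self.data = bytearray(SAFE_DEFAULTS)
        self.crc = zlib.crc32(self.data)
        return True
```

Calling `scrub()` on a fixed schedule corresponds to the periodic memory scrub mentioned above; a checksum only detects the modification, so recovery here falls back to known safe values rather than attempting correction.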
Item g: (The software performs integrity checks on inputs and outputs to/from the software system.)
The software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required. The software needs to accommodate start-up transient inputs from the sensors. Specify system interfaces clearly and thoroughly. Include, as part of the documentation, the required action or actions should the interface fail.
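A small sketch of an input integrity check for item g follows. The sensor, its valid range, and the number of start-up samples to ignore are all assumed values chosen for illustration; a project would take them from the interface specification.

```python
import math

PRESSURE_MIN_KPA = 0.0       # hypothetical valid range for this sensor
PRESSURE_MAX_KPA = 500.0
STARTUP_SAMPLES_TO_SKIP = 3  # hypothetical number of start-up transient samples


def validate_pressure(raw, sample_index, last_good):
    """Return (value, ok). On a rejected sample, hold the last good value."""
    if sample_index < STARTUP_SAMPLES_TO_SKIP:
        return last_good, False          # ignore start-up transients
    if not isinstance(raw, (int, float)) or isinstance(raw, bool) or math.isnan(raw):
        return last_good, False          # wrong type or not a number
    if not (PRESSURE_MIN_KPA <= raw <= PRESSURE_MAX_KPA):
        return last_good, False          # outside the specified range
    return float(raw), True
```

Returning an explicit `ok` flag alongside the value lets the caller count rejected samples and decide when repeated off-nominal inputs should trigger a recovery action.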
Item h: (The software performs prerequisite checks prior to the execution of safety-critical software commands.)
The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider the inappropriate sequencing of commands (as described in the original note) and the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must-not work).
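The must-work and must-not-work behavior described for item h can be sketched as a table-driven prerequisite check that runs before any command executes. The command names, modes, and the `PREREQUISITES` table below are hypothetical; a project would derive them from its own hazard analysis.

```python
# Hypothetical command table: required mode and the command that must precede it.
PREREQUISITES = {
    "ARM_THRUSTER": {"mode": "MANEUVER", "after": None},
    "FIRE_THRUSTER": {"mode": "MANEUVER", "after": "ARM_THRUSTER"},
}


def check_command(command, current_mode, last_command):
    """Return (accepted, reason) without executing anything."""
    rule = PREREQUISITES.get(command)
    if rule is None:
        return False, "unknown command"
    if current_mode != rule["mode"]:
        return False, "wrong mode"        # must-not-work case
    if rule["after"] is not None and last_command != rule["after"]:
        return False, "out of sequence"   # inappropriate sequencing
    return True, "ok"                      # must-work case
```

Keeping the check separate from execution means the same function can be exercised in tests with every combination of mode and preceding command.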
Item j: (The software responds to an off-nominal condition within the time needed to prevent a hazardous event.)
The intent is to establish a safe state following the detection of an off-nominal indication. The safety mitigation must complete between the time the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continue with normal operations. The intent is to design software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the software's ability to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality but will operate with this reduced functionality.
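The timing argument in item j can be expressed as a simple budget comparison. In the sketch below, all three durations are assumed to be measured in seconds from the moment the off-nominal condition is detected; the function name and the returned labels are hypothetical, and a real analysis would also include design margin.

```python
def select_response(time_to_hazard_s, recover_time_s, safe_state_time_s):
    """Pick a response that completes before the hazard would occur.

    All arguments are seconds measured from detection of the off-nominal condition.
    """
    if recover_time_s <= time_to_hazard_s:
        return "RECOVER_AND_CONTINUE"   # correct the fault and stay in normal operations
    if safe_state_time_s <= time_to_hazard_s:
        return "ENTER_SAFE_STATE"       # reduced functionality, recoverable later
    return "NO_TIMELY_RESPONSE"         # the time budget cannot be met by either option
```

For example, with 0.3 s until the hazard, a 0.5 s recovery, and a 0.1 s safing action, only the transition to a safe state fits within the available time. A result of `NO_TIMELY_RESPONSE` points to a design issue to resolve before flight, not a run-time option.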
Item k: (The software provides error handling.)
Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected, isolated, and recovered from so that correct run-time program execution can continue. The software error handling features that support safety-critical functions must detect and respond to hardware, software, and operational faults and failures, as well as faults in software data and commands from within a program or from other software programs. Minimize common failure modes.
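The detect, isolate, and recover sequence for item k might look like the following sketch. `SensorFault`, `read_with_fallback`, and the primary and backup sources are hypothetical names; the example assumes the backup is independent enough of the primary that the two do not share a common failure mode.

```python
class SensorFault(Exception):
    """Hypothetical fault raised by a sensor driver."""


def read_with_fallback(primary, backup, fault_log):
    """Try the primary source, isolate it on failure, and recover using the backup."""
    try:
        return primary(), "primary"
    except SensorFault as fault:
        fault_log.append(f"primary isolated: {fault}")   # detect and record
    try:
        return backup(), "backup"                         # recover
    except SensorFault as fault:
        fault_log.append(f"backup isolated: {fault}")
        return None, "none"                               # caller must command a safe state
```

Only the specific fault type is caught, so unexpected errors are not silently swallowed, and every isolation decision leaves a record for later analysis.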
Item l: (The software can place the system into a safe state.)
The system's design must provide sufficient sensors and effectors, and self-checks within the software, to detect and respond to potential system hazards. Identify safe states early in the design. Have these fully checked and verified for completeness. A safe state is a system state in which hazards are inhibited, and all hazardous actuators are in a non-hazardous state. The system can have more than one safe state. Ensure that failures of dynamic system activities result in the system achieving a known and identified safe state within a specified time.
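The definition of a safe state in item l (all hazardous actuators in a non-hazardous state) can be captured as a table plus a check, as in this sketch. The actuator names and safe values are assumptions for illustration, and a system with more than one safe state would carry one such table per state.

```python
# Hypothetical actuators and their non-hazardous values.
SAFE_VALUES = {"heater": "OFF", "valve": "CLOSED", "pyro_bus": "DISARMED"}


def enter_safe_state(actuators):
    """Drive every hazardous actuator to its non-hazardous value."""
    for name, safe_value in SAFE_VALUES.items():
        actuators[name] = safe_value
    return actuators


def is_safe(actuators):
    """A state is safe only if all hazardous actuators are at their safe values."""
    return all(actuators.get(name) == value for name, value in SAFE_VALUES.items())
```

Because `is_safe` treats a missing actuator entry as unsafe, an incomplete state report cannot be mistaken for a verified safe state.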
Additional Safety-Critical Software Design guidelines include:
- Minimize complexity - For safety-critical code, any component with a cyclomatic complexity over 15 should be assessed for testability, maintainability, and code quality.
- Avoid complex flow constructs, such as goto and recursion.
- All loops must have fixed bounds. This prevents runaway code.
- Avoid heap memory allocation.
- Use a minimum of two runtime assertions per function.
- Restrict the scope of data to the smallest possible.
- Check the return value of all non-void functions, or cast to void to indicate that the return value is intentionally ignored.
- Use the preprocessor sparingly.
- Limit pointer use to a single dereference, and do not use function pointers.
- Compile with all possible warnings active; all warnings should then be addressed before the release of the software.
- Appropriate security posture and mindset should be applied to all levels of development.
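Several of the guidelines above (fixed loop bounds, at least two runtime assertions per function, and a return value the caller is expected to check) can be shown together in one short sketch. The function and constant names are hypothetical, and the example is in Python for brevity; guidelines about heap allocation, the preprocessor, and pointers apply to languages such as C and are not illustrated here.

```python
MAX_RETRIES = 5  # fixed upper bound on the loop


def wait_until_ready(poll):
    """Poll a device at most MAX_RETRIES times; return the attempt count or -1."""
    assert callable(poll), "poll must be callable"     # runtime assertion 1
    attempts_used = -1
    for attempt in range(1, MAX_RETRIES + 1):          # loop bound is fixed
        if poll():
            attempts_used = attempt
            break
    assert -1 <= attempts_used <= MAX_RETRIES          # runtime assertion 2
    return attempts_used
```

A caller would compare the result against -1 and treat that case as a fault, which is the return-value check the list calls for.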
See also Topic 8.01 - Off Nominal Testing and Topic 8.04 - Additional Requirements Considerations for Use with Safety-Critical Software. If software is acquired from a supplier, see Topic 7.03 - Acquisition Guidance. See also Topic 7.21 - Multi-condition Software Requirements and Topic 7.23 - Software Fault Prevention and Tolerance.
See also SWE-184 - Software-related Constraints and Assumptions.
3.3 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook:
3.4 Center Process Asset Libraries
SPAN - Software Processes Across NASA
SPAN contains links to Center-managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki (SWEREF-197)
See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).
4. Small Projects
This requirement applies to all projects regardless of size.
5. Resources
5.1 References
- (SWEREF-014) SSP 50038, Revision B, NASA International Space Station Program, 1995.
- (SWEREF-017) Constellation Computing Safety Requirements, CxP 70065, Revision A, 2005.
- (SWEREF-197) Software Processes Across NASA (SPAN) web site in NEN. SPAN is a compendium of Processes, Procedures, Job Aids, Examples and other recommended best practices.
- (SWEREF-260) This NASA-only resource is available to NASA-users at https://nen.nasa.gov/web/faultmanagement.
- (SWEREF-271) NASA-STD-8719.13 (Rev C), Document Date: 2013-05-07
- (SWEREF-276) NASA-GB-8719.13, NASA, 2004. Access NASA-GB-8719.13 directly: https://swehb-pri.msfc.nasa.gov/download/attachments/16450020/nasa-gb-871913.pdf?api=v2
- (SWEREF-278) NASA-STD-8739.8B, NASA Technical Standard, Approved 2022-09-08, Superseding NASA-STD-8739.8A.
- (SWEREF-375) IEC 62304:2006, Medical device software — Software life cycle processes A copy of this standard is available from https://www.iso.org/standard/38421.html
- (SWEREF-376) ISO 26262-1:2011, Road vehicles — Functional safety — Part 1: Vocabulary A copy of this standard is available from: https://www.iso.org/standard/43464.html
- (SWEREF-432) For Public Release. (2006). Lessons Learned Reference.
- (SWEREF-521) Public Lessons Learned Entry: 740.
- (SWEREF-603) Carnegie Mellon University course 18-642 updated Fall 2020, Koopman, Phil
5.2 Tools
NASA users can find the list of tools in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
6. Lessons Learned
6.1 NASA Lessons Learned
Early planning and coordination between software engineering, software safety, and software assurance on the applicability and implementation of the SWE-134 software safety requirements will reduce schedule impacts.
The NASA Lesson Learned database contains the following lessons learned related to safety-critical software:
Deficiencies in Mission Critical Software Development for Mars Climate Orbiter (MCO) (1999). Lesson Number 0740 (SWEREF-521): "The root cause of the MCO mission loss was an error in the 'Sm_forces' program output files, which were delivered to the navigation team in English units (pounds-force seconds) instead of the specified metric units (Newton-seconds). Comply with preferred software review practices, identify mission-critical software (for which staff must participate in major design reviews, walkthroughs, and review of acceptance test results), train personnel in software walkthroughs, and verify consistent engineering units on all parameters."
6.2 Other Lessons Learned
Demonstration of Autonomous Rendezvous Technology (DART) spacecraft Type A Mishap (SWEREF-432): "NASA has completed its assessment of the DART MIB (Mishap Investigation Board) report, which included a classification review by the Department of Defense. The report was NASA-sensitive but unclassified because it contained information restricted by International Traffic in Arms Regulations (ITAR) and Export Administration Regulations (EAR). As a result, the DART mishap investigation report was deemed not releasable to the public." The LL also "provides an overview of publicly releasable findings and recommendations regarding the DART mishap."
7. Software Assurance
7.1 Tasking for Software Assurance
Derived from NPR 7150.2D para 3.7.3 SWE 134: Table 1, SA Tasks 1 - 6
1. Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l."
2. Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.
3. Confirm that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.
4. Analyze the software design to ensure:
a. Use of partitioning or isolation methods in the design and code,
b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.
5. Participate in software reviews affecting safety-critical software products.
6. Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.
7.2 Software Assurance Products
- Software Assurance Status Reports
- Software Design Analysis
- SA analysis of software requirements and design to implement items "a" through "l."
- SA analysis of the design to satisfy "a" and "b" in task 4.
- Source Code Analysis
- Verification Activities Analysis
- SA assessment that source code meets "a" through "l" at inspections and reviews, including any risks and issues.
- Evidence of confirmation that requirements for test code coverage, complexity, and testing of support files affecting hazardous systems have been met.
- SA risk assessment of any software developers' rationale if requirements are not met.
Objective Evidence
- Evidence of confirmation that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.
- NPR 7150.2 and NASA-STD-8739.8 requirements mapping matrices signed by the engineering and SMA technical authorities for each development organization.
7.3 Metrics
- # of software work product Non-Conformances identified by life cycle phase over time
- # of Non-Conformances from reviews (Open vs. Closed; # of days Open)
- # of safety-related requirement issues (Open, Closed) over time
- # of safety-related non-conformances identified by life cycle phase over time
- # of Hazards containing software that have been tested vs. total # of Hazards containing software
- # of Source Lines of Code (SLOC) tested vs. total # of SLOC
See also Topic 8.18 - SA Suggested Metrics.
7.4 Guidance
The sub-requirements and notes included in the requirement are a collection of best practices for implementing safety-critical software. These sub-requirements apply to components that reside in a safety-critical system and that control, mitigate, or contribute to a hazard, as well as to software used to command hazardous operations or activities. Software engineering and software assurance disciplines each have specific responsibilities for providing project management with work products that meet the engineering, safety, quality, and reliability requirements of a project.
Steps 1 and 2: Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l" and assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.
Additional specific clarifications for a few of the requirement notes include:
Item a: (The software is initialized, at first start and restarts, to a known safe state.) When establishing a known safe state, the items to inspect include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.
Item d: (Operator overrides of software functions require at least two independent actions by an operator.) Multiple independent actions by the operator help to reduce potential operator mistakes.
Item f: (The software detects inadvertent memory modification and recovers to a known safe state.) Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes, so the computing system must detect the problem and recover to a safe state. For example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to protect against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.
Item g: (The software performs integrity checks on inputs and outputs to/from the software system.) The software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required. The software needs to accommodate start-up transient inputs from the sensors. Specify system interfaces clearly and thoroughly. Include, as part of the documentation, the required action or actions should the interface fail. See also 8.01 - Off Nominal Testing.
Item h: (The software performs prerequisite checks prior to the execution of safety-critical software commands.) The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider the inappropriate sequencing of commands (as described in the original note) and the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must-not work).
Item j: (The software responds to an off-nominal condition within the time needed to prevent a hazardous event.) The intent is to establish a safe state following the detection of an off-nominal indication. The safety mitigation must complete between the time the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continue with normal operations. The intent is to design software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the software's ability to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality but will operate with this reduced functionality. See also 8.01 - Off Nominal Testing.
Item k: (The software provides error handling.) Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected, isolated, and recovered from so that correct run-time program execution can continue. The software error handling features that support safety-critical functions must detect and respond to hardware, software, and operational faults and failures, as well as faults in software data and commands from within a program or from other software programs. Minimize common failure modes.
Item l: (The software can place the system into a safe state.) The system's design must provide sufficient sensors and effectors, and self-checks within the software, to detect and respond to potential system hazards. Identify safe states early in the design. Have these fully checked and verified for completeness. A safe state is a system state in which hazards are inhibited, and all hazardous actuators are in a non-hazardous state. The system can have more than one safe state. Ensure that failures of dynamic system activities result in the system achieving a known and identified safe state within a specified time.
Step 3: Confirm that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.
Step 4: Analyze the software design to ensure:
a. Use of partitioning or isolation methods in the design and code,
b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.
Step 5: Participate in software reviews affecting safety-critical software products.
Early planning and implementation dramatically ease the developmental burden of these requirements. Depending on the failure philosophy used (fault tolerance, control-path separation, etc.), design and implementation trade-offs will be made. Trying to incorporate these requirements late in the life cycle will impact the project cost, schedule, and quality. It can also impact safety. An integrated design that incorporates software safety features such as those above from the start allows the system perspective to be taken into account, and the design has a better chance of being implemented as needed to meet the requirements in an elegant, simple, and more reliable way.
Note that where conflicts with program safety requirements exist, program safety requirements take precedence.
Step 6: Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.
See also 8.04 - Additional Requirements Considerations for Use with Safety-Critical Software.
7.5 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook: