


- 1. The Requirement
- 2. Rationale
- 3. Guidance
- 4. Small Projects
- 5. Resources
- 6. Lessons Learned
- 7. Software Assurance
1. Requirements
4.3.5 The space system shall provide the capability to mitigate the hazardous behavior of critical software where the hazardous behavior would result in a catastrophic event.
1.1 Notes
According to current software standards, the software system will be designed, developed, and tested to:
- Prevent hazardous software behavior.
- Reduce the likelihood of hazardous software behavior.
- Mitigate the negative effects of hazardous software behavior.
However, for complex software systems, it is very difficult to definitively prove the absence of hazardous behavior. Therefore, the crewed system has the capability to mitigate this hazardous behavior if it occurs. The mitigation strategy will depend on the phase of flight and the time to effect of the potential hazard. Hazardous behavior includes erroneous software outputs or performance.
1.2 History
1.3 Applicability Across Classes
Class A B C D E F Applicable?
Key: - Applicable |
- Not Applicable
2. Rationale
For complex software systems, it is very difficult to definitively prove the absence of hazardous behavior and to anticipate all circumstances, and studies have shown a historical recurrence of software performing unexpectedly in flight. Therefore, the crewed system should not only be developed with best practices to minimize the chances of software/automation errors but also be able to mitigate hazardous behavior should it occur during flight, in line with basic fault tolerance. The mitigation strategy will depend on the phase of flight and the time to effect of the potential hazard. Hazardous behavior includes erroneous software outputs, poor performance, or ceasing to operate.
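The three categories of hazardous behavior named above (erroneous outputs, poor performance, and ceasing to operate) can each be detected by a runtime monitor. The Python sketch below shows one hypothetical way to do so: the `classify` function, the `OutputSample` type, the 0 to 100 valid range, the 50 ms latency budget, and the 0.5 s heartbeat timeout are illustrative assumptions, not values from this Handbook or NPR 7150.2. In practice such a monitor would ideally run independently of the software it watches and would feed mitigation logic rather than only return a label.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Behavior(Enum):
    NOMINAL = auto()
    ERRONEOUS_OUTPUT = auto()   # output value outside its valid range (or NaN)
    POOR_PERFORMANCE = auto()   # output produced, but later than its deadline
    CEASED_OPERATING = auto()   # heartbeat stopped: the software failed silent


@dataclass(frozen=True)
class OutputSample:
    value: float
    requested_at: float  # seconds: when the output was requested
    produced_at: float   # seconds: when the software produced it


def classify(sample: Optional[OutputSample], now: float, last_heartbeat: float,
             valid_range: Tuple[float, float] = (0.0, 100.0),
             latency_budget_s: float = 0.05,
             heartbeat_timeout_s: float = 0.5) -> Behavior:
    """Classify one monitoring cycle. All thresholds are hypothetical placeholders."""
    # Fail-silent check first: a stale heartbeat means the software stopped running.
    if now - last_heartbeat > heartbeat_timeout_s:
        return Behavior.CEASED_OPERATING
    if sample is None:
        return Behavior.NOMINAL  # heartbeat is fresh and no output was due this cycle
    lo, hi = valid_range
    # Chained comparison is False for NaN, so NaN is treated as erroneous output.
    if not (lo <= sample.value <= hi):
        return Behavior.ERRONEOUS_OUTPUT
    if sample.produced_at - sample.requested_at > latency_budget_s:
        return Behavior.POOR_PERFORMANCE
    return Behavior.NOMINAL
```

Checking the heartbeat before the output means a frozen task is reported as having ceased operating even if it left a stale but in-range value behind.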
3. Guidance
According to current software standards, the software system will be designed, developed, and tested to:
- Prevent hazardous software behavior (pre-flight).
- Reduce the likelihood of hazardous software behavior (pre-flight).
- Mitigate the negative effects of hazardous software behavior (in flight).
For complex software systems, it is very difficult to definitively prove the absence of hazardous behavior and to anticipate all circumstances, and studies have shown a historical recurrence of software performing unexpectedly in flight. Therefore, the crewed system should not only be developed with best practices, as reflected in NPR 7150.2, to minimize the chances of software/automation errors but also be able to mitigate hazardous behavior should it occur during flight, in accordance with basic fault tolerance. The mitigation strategy will depend on the phase of flight and the time to effect of the potential hazard. Hazardous behavior includes erroneous software outputs, poor performance, or ceasing to operate. More detailed considerations and strategies for mitigating the negative effects of hazardous software behavior, and for software failure tolerance specifically, are summarized in NESC Technical Bulletin 23-06: Considerations for Software Fault Prevention and Tolerance 687 and provided in the references.
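Because the mitigation strategy depends on the phase of flight and the time to effect, one way to screen candidate mitigations is to check whether detection plus response fits inside the time to effect. The sketch below assumes hypothetical timing fields, a hypothetical safety margin factor of 2, and illustrative numbers; it does not reflect any NASA-defined timing rule or a preferred order of mitigations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HazardTiming:
    phase: str               # phase of flight (illustrative label)
    time_to_effect_s: float  # hazardous behavior -> catastrophic effect
    detection_s: float       # time for a monitor to detect the behavior
    auto_response_s: float   # time for an automated onboard response to act
    crew_response_s: float   # time for crew or ground to recognize and act


def feasible_mitigations(h: HazardTiming, margin: float = 2.0) -> List[str]:
    """List mitigation types whose detect-plus-respond time, multiplied by a
    hypothetical safety margin, still fits within the time to effect."""
    options = []
    if (h.detection_s + h.auto_response_s) * margin <= h.time_to_effect_s:
        options.append("automated onboard response")
    if (h.detection_s + h.crew_response_s) * margin <= h.time_to_effect_s:
        options.append("crew or ground procedure")
    # An empty list flags a hazard that no in-flight response can reach in time.
    return options


ascent = HazardTiming("ascent", time_to_effect_s=3.0, detection_s=0.2,
                      auto_response_s=0.5, crew_response_s=10.0)
on_orbit = HazardTiming("on-orbit", time_to_effect_s=3600.0, detection_s=1.0,
                        auto_response_s=0.5, crew_response_s=300.0)
print(feasible_mitigations(ascent))    # ['automated onboard response']
print(feasible_mitigations(on_orbit))  # ['automated onboard response', 'crew or ground procedure']
```

With these illustrative numbers, a 3 s time to effect during ascent leaves room only for an automated response, while the same fault on orbit with an hour to effect also leaves time for crew or ground action.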
See Topic 7.24 - Human Rated Software Requirements for other Software Requirements related to Human Rated Software.
3.1 Software Tasks To Mitigate Hazardous Behavior
To ensure that the space system can mitigate the hazardous behavior of critical software where such behavior could result in a catastrophic event, the following software tasks should be implemented:
- Hazard Analysis: Conduct a comprehensive hazard analysis to identify potential hazards that could result from critical software behavior. The analysis should consider both erroneous behavior and failing silent, evaluate existing and potential hazards, and recommend mitigation strategies for identified hazards. This involves evaluating human-machine interface designs, operator training, operational procedures, checks, and error handling to identify the best places to insert mitigations (in code or in procedures) that could prevent a catastrophic event.
- Safety Analysis Techniques: Utilize safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - SW Failure Modes and Effects Analysis to identify safety risks and formulate effective controls. These techniques help identify hazards, hazard causes, and potential failure modes.
- Safety Reviews: Perform safety reviews on all software changes and defects. This ensures that the modifications do not introduce new vulnerabilities or increase the risk of failure due to hazardous software behavior.
- Safety-Critical Software Requirements: Ensure that safety-critical software requirements are implemented per the NPR 7150.2 Requirements Mapping Matrix and are tested and verified. This includes verifying that the software control functions identified in a system hazard provide mitigation for hazardous conditions.
- Error Handling and Recovery Mechanisms: Implement robust error handling and recovery mechanisms to address errors resulting from hazardous software behavior. This includes ensuring that error handling is adequate and that the system can recover from errors without leading to hazardous or catastrophic events.
- Configuration Management: Maintain strict configuration management to ensure that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations.
- Simulations and Testing: Develop, implement, and execute simulations to model and test the impact of hazardous software behavior. This includes conducting tests to verify that the software can handle off-nominal conditions without catastrophic consequences. The flight operations team should conduct simulations to thoroughly test the various scenarios.
- Code Coverage with MC/DC Criterion: Develop, implement, and execute test cases for all identified safety-critical software components to ensure that there is 100 percent code test coverage. Use the Modified Condition/Decision Coverage (MC/DC) criterion.
- Independent Verification and Validation (IV&V): Ensure that independent verification and validation is performed to confirm that the software meets its specified requirements and that any modifications do not introduce new vulnerabilities or increase the risk of failure due to hazardous software behavior.
  - IV&V Analysis Results: Assure that the software capability to mitigate hazardous behavior has been independently verified and validated to meet safety and mission requirements.
  - IV&V Participation: Involve the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
  - IV&V Management and Technical Measurements: Track and evaluate the performance and results of IV&V activities to ensure continuous improvement and risk management.
- Safety-Critical Software Implementation: Ensure that the software performs integrity checks on inputs and outputs, performs prerequisite checks before the execution of safety-critical software commands, and safely transitions between all predefined known states (including error handling).
- Cyclomatic Complexity: Ensure that all identified safety-critical software components have a cyclomatic complexity value of 15 or lower, as required by NPR 7150.2. If not, provide a technically acceptable risk assessment explaining why the complexity value needs to be higher and why the component cannot be structured to be lower. (See SWE-220 - Cyclomatic Complexity for Safety-Critical Software.)
- Training and Documentation: Provide comprehensive training and documentation for operators to minimize the chances of hazardous behavior when using software/automation. This includes clear instructions, warnings, and recovery procedures. This is best done by providing a User Manual with instructions and applicable information about each error and how to gracefully recover from it.
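The Safety-Critical Software Implementation and Error Handling and Recovery Mechanisms items above can be illustrated together. The Python sketch below is hypothetical: the SAFE/ARMED/FIRED/ERROR states, the command names, and the CRC-32 frame format are assumptions chosen for illustration, not a NASA-defined interface. It shows an input integrity check (CRC-32 via `zlib.crc32`), a prerequisite check (FIRE is accepted only in the ARMED state), and transitions limited to a predefined table, with corrupted input driving the handler into a known ERROR state.

```python
import zlib
from enum import Enum


class State(Enum):
    SAFE = "SAFE"
    ARMED = "ARMED"
    FIRED = "FIRED"
    ERROR = "ERROR"


# Predefined transitions: any (state, command) pair not listed is rejected.
ALLOWED = {
    (State.SAFE, "ARM"): State.ARMED,
    (State.ARMED, "DISARM"): State.SAFE,
    (State.ARMED, "FIRE"): State.FIRED,  # prerequisite: must already be ARMED
    (State.FIRED, "SAFE"): State.SAFE,
    (State.ERROR, "SAFE"): State.SAFE,   # only recovery path out of ERROR
}


class CommandHandler:
    def __init__(self) -> None:
        self.state = State.SAFE

    @staticmethod
    def frame(command: str) -> bytes:
        """Build a hypothetical frame: ASCII command followed by a 4-byte CRC-32."""
        payload = command.encode("ascii")
        return payload + zlib.crc32(payload).to_bytes(4, "big")

    def handle(self, frame: bytes) -> State:
        # Input integrity check: reject short or corrupted frames.
        if len(frame) < 5:
            return self._fault()
        payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
        if zlib.crc32(payload) != crc:
            return self._fault()
        try:
            command = payload.decode("ascii")
        except UnicodeDecodeError:
            return self._fault()
        # Prerequisite check: only predefined transitions are executed.
        next_state = ALLOWED.get((self.state, command))
        if next_state is None:
            return self.state  # command rejected; remain in the current known state
        self.state = next_state
        return self.state

    def _fault(self) -> State:
        # Error handling: enter the predefined ERROR state (an illustrative policy).
        self.state = State.ERROR
        return self.state
```

Keeping every legal transition in one table makes the set of known states reviewable and lets tests enumerate every rejected (state, command) pair.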
By implementing these tasks, the space system can be designed to mitigate the hazardous behavior of critical software, ensuring safety and reliability.
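The MC/DC criterion listed above requires that each condition in a decision be shown to independently affect the decision's outcome. The Python sketch below checks a test set against the unique-cause form of MC/DC (other forms, such as masking MC/DC, also exist) for a hypothetical three-condition decision `a and (b or c)`; the decision and the function name `mcdc_satisfied` are illustrative.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Vector = Tuple[bool, ...]


def mcdc_satisfied(decision: Callable[..., bool], n_conditions: int,
                   tests: Sequence[Vector]) -> bool:
    """Unique-cause MC/DC: for every condition, the test set must contain a pair of
    vectors that differ only in that condition and give different decision outcomes."""
    for i in range(n_conditions):
        shown = False
        for a, b in combinations(tests, 2):
            # True only if a and b differ at index i and agree everywhere else.
            differs_only_in_i = all((a[k] == b[k]) != (k == i) for k in range(n_conditions))
            if differs_only_in_i and decision(*a) != decision(*b):
                shown = True
                break
        if not shown:
            return False
    return True


def decision(a: bool, b: bool, c: bool) -> bool:
    return a and (b or c)


# n + 1 = 4 vectors are enough to show each of the 3 conditions independently.
tests = [(True, True, False), (False, True, False),
         (True, False, False), (True, False, True)]
print(mcdc_satisfied(decision, 3, tests))      # True
print(mcdc_satisfied(decision, 3, tests[:3]))  # False: c is never shown to matter
```

A single test vector already gives 100 percent statement coverage of this one-line decision, which is why MC/DC is the more demanding criterion for safety-critical code.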
3.2 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook:
3.3 Center Process Asset Libraries
SPAN - Software Processes Across NASA
SPAN contains links to Center managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki 197
See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).
SPAN Links |
---|
To be developed later. |
4. Small Projects
No additional guidance is available for small projects. The community of practice is encouraged to submit guidance candidates for this paragraph.
5. Resources
5.1 References
5.2 Tools
NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
6. Lessons Learned
6.1 NASA Lessons Learned
No Lessons Learned have currently been identified for this requirement.
6.2 Other Lessons Learned
No other Lessons Learned have currently been identified for this requirement.
7. Software Assurance
7.1 Tasking for Software Assurance
- Confirm that the hazard reports or safety data packages contain all known software contributions or events where software, either by its action, inaction, or incorrect action, leads to a hazard.
- Assess that the hazard reports identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A.
- Assess that hazard analyses (including hazard reports) identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A.
- Confirm that the traceability between software requirements and hazards with software contributions exists.
- Develop and maintain a software safety analysis throughout the software development life cycle.
- Ensure that safety-critical software requirements are implemented per the NPR 7150.2 Requirements Mapping Matrix and tested or verified.
- Perform or analyze Cyclomatic Complexity metrics on all identified safety-critical software components.
- Confirm that all identified safety-critical software components have a cyclomatic complexity value of 15 or lower. If not, assure that software developers provide a technically acceptable risk assessment, accepted by the proper technical authority, explaining why the cyclomatic complexity value needs to be higher than 15 and why the software component cannot be structured to be lower than 15 or why the cost and risk of reducing the complexity to below 15 are not justified by the risk inherent in modifying the software component.
- Confirm that 100% code test coverage is addressed for all identified safety-critical software components or that software developers provide a technically acceptable rationale or a risk assessment explaining why the test coverage is not possible or why the risk does not justify the cost of increasing coverage for the safety-critical code component.
- Analyze that the software test plans and software test procedures cover the software requirements and provide adequate verification of hazard controls, specifically the off-nominal scenarios to mitigate the impact of hazardous behaviors. (See SWE-071 - Update Test Plans and Procedures tasks.) Ensure that the project has developed and executed test cases to test the impact of hazardous behaviors.
- Perform safety reviews on all software changes and software defects.
- Perform test witnessing for safety-critical software to ensure the impact of hazardous behavior is mitigated.
- Confirm that strict configuration management is maintained to ensure that the correct software versions and configurations are used.
- Ensure comprehensive training and documentation for operators is available.
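The cyclomatic complexity tasking above sets a limit of 15 for safety-critical components. McCabe complexity can be approximated as one plus the number of decision points. The Python sketch below applies that approximation to Python source using the standard `ast` module; which constructs count as decisions is an assumption here (static-analysis tools differ in the details), and projects would normally rely on their chosen analysis tool rather than this sketch.

```python
import ast
from typing import Dict

THRESHOLD = 15  # safety-critical limit cited in this section

# Constructs counted as one decision point each (elif parses as a nested If).
_BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.
    Note: ast.walk also visits nested functions, so they count toward the parent."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two short-circuit decisions.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            complexity += 1 + len(node.ifs)
    return complexity


def over_threshold(source: str, threshold: int = THRESHOLD) -> Dict[str, int]:
    """Return {function name: complexity} for functions above the threshold."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            cc = cyclomatic_complexity(node)
            if cc > threshold:
                results[node.name] = cc
    return results
```

A function flagged by such a check is a candidate either for restructuring or for the technically acceptable risk assessment described above.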
7.2 Software Assurance Products
- 8.52 - Software Assurance Status Reports
- 8.54 - Software Requirements Analysis
- 8.55 - Software Design Analysis
- 8.56 - Source Code Quality Analysis
- 8.57 - Testing Analysis
- 8.58 - Software Safety and Hazard Analysis
- 8.59 - Audit Reports
- Test Witnessing Signatures (See SWE-066 - Perform Testing)
Objective Evidence
- Completed Hazard Analyses and Hazard Reports identifying all of the potential hazards and their associated mitigations
- Audit reports, specifically the Functional Configuration Audit (FCA) and Physical Configuration Audit (PCA)
- Completed safety reviews on all software changes and software defects
- Results from the use of automated tools for code coverage, cyclomatic complexity, and other verification and validation activities
- SWE work product assessments for Software Test Plan, Software Test Procedures, Software Test Reports, and User Manuals
- Completed software safety and hazard analysis results
7.3 Metrics
For the requirement that the space system shall provide the capability to mitigate the hazardous behavior of critical software where the hazardous behavior would result in a catastrophic event, the following software assurance metrics are necessary:
- Verification and Validation Metrics:
  - Test Coverage: Ensure comprehensive test coverage for all scenarios that could lead to hazardous behavior, including normal operations, failure modes, and recovery procedures.
  - Defect Density: Track the number of defects identified during testing per thousand lines of code to ensure software reliability and robustness.
  - Requirements Traceability: Ensure each requirement, including those for mitigating hazardous behavior, is traced to its implementation and corresponding test cases to maintain comprehensive coverage and validation.
- Safety Metrics:
  - Hazard Analysis: Identify and evaluate potential hazards related to hazardous software behavior, ensuring adequate mitigation strategies are in place.
  - Safety-critical Requirements Compliance: Verify that all safety-critical requirements related to hazardous behavior mitigation are met and adequately tested to prevent failures during mission-critical operations.
- Quality Metrics:
  - Code Quality: Use metrics such as cyclomatic complexity and static analysis results to ensure the code is maintainable and less prone to errors. Specifically, ensure that safety-critical software components have a cyclomatic complexity value of 15 or lower, or provide a technically acceptable rationale if this value is exceeded.
  - Code Churn: Measure changes in the codebase to monitor stability and identify areas of frequent modification that may need more rigorous testing.
- Performance Metrics:
  - Response Time: Measure the time taken for the system to detect and mitigate hazardous software behavior to ensure timely and accurate execution of mitigation procedures.
  - System Uptime: Ensure the system is available and operational when needed, especially during critical mission phases, to support hazardous behavior mitigation.
- Configuration Management Metrics:
  - Version Control: Ensure proper version control for all software components involved in hazardous behavior mitigation to track changes and maintain consistency.
  - Change Requests: Monitor the number of change requests and their impact on the system's reliability and safety.
- Training Metrics:
  - Personnel Training Completion: Ensure that all personnel involved in the development, testing, and operation of the hazardous behavior mitigation system have completed the necessary training.
- Independent Verification and Validation (IV&V) Metrics:
  - IV&V Analysis Results: Assure that the hazardous behavior mitigation capabilities have been independently verified and validated to meet safety and mission requirements.
  - IV&V Participation: Involve the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
  - IV&V Management and Technical Measurements: Track and evaluate the performance and results of IV&V activities to ensure continuous improvement and risk management.
Examples of potential SA metrics are:
- # of potential hazards that could lead to catastrophic events
- # of Non-Conformances identified during each testing phase (Open, Closed, Severity)
- Code coverage data: % of code that has been executed during testing
- % of traceability completed for all hazards to software requirements and test procedures
- # of hazards with completed test procedures/cases vs. total # of hazards over time
- # of Non-Conformances identified while confirming hazard controls are verified through test plans/procedures/cases
- # of Hazards containing software that has been tested vs. total # of Hazards containing software
- # of safety-related Non-Conformances
- # of Safety Critical tests executed vs. # of Safety Critical tests witnessed by SA
- Software code/test coverage percentages for all identified safety-critical components (e.g., # of paths tested vs. total # of possible paths)
- # of safety-critical requirement verifications completed vs. total # of safety-critical requirement verifications
- Test coverage data for all identified safety-critical software components
- Software cyclomatic complexity # for all identified safety-critical software components
- # of Software Requirements that do not trace to a parent requirement
- % of traceability completed in each area: System Level requirements to Software requirements; Software Requirements to Design; Design to Code; Software Requirements to Test Procedures
- Defect trends for trace quality (# of circular traces, orphans, widows, etc.)
- # of Configuration Management Audits conducted by the project – Planned vs. Actual
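Several of the example metrics above (hazard-to-requirement-to-test traceability and requirements that do not trace to a parent) can be computed from a project's trace data. The Python sketch below assumes a hypothetical `TraceData` structure and hypothetical identifiers such as "HR-1" and "SRS-1"; its definition of a "fully traced" hazard (at least one software requirement, each verified by at least one test procedure) is an illustrative assumption, not a Handbook definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class TraceData:
    hazard_to_reqs: Dict[str, Set[str]]     # hazard ID -> software requirement IDs
    req_to_tests: Dict[str, Set[str]]       # requirement ID -> test procedure IDs
    req_parent: Dict[str, Optional[str]]    # requirement ID -> parent ID (None if missing)


def pct_hazards_fully_traced(t: TraceData) -> float:
    """% of hazards traced to at least one software requirement, where every
    such requirement is in turn traced to at least one test procedure."""
    if not t.hazard_to_reqs:
        return 0.0
    traced = sum(
        1 for reqs in t.hazard_to_reqs.values()
        if reqs and all(t.req_to_tests.get(r) for r in reqs)
    )
    return 100.0 * traced / len(t.hazard_to_reqs)


def orphan_requirements(t: TraceData) -> List[str]:
    """Software requirements that do not trace to a parent requirement."""
    return sorted(r for r, parent in t.req_parent.items() if not parent)


trace = TraceData(
    hazard_to_reqs={"HR-1": {"SRS-1", "SRS-2"}, "HR-2": {"SRS-3"},
                    "HR-3": set(), "HR-4": {"SRS-4"}},
    req_to_tests={"SRS-1": {"TP-1"}, "SRS-2": {"TP-2"}, "SRS-3": set(), "SRS-4": {"TP-3"}},
    req_parent={"SRS-1": "SYS-1", "SRS-2": "SYS-1", "SRS-3": None, "SRS-4": "SYS-2"},
)
print(pct_hazards_fully_traced(trace))  # 50.0
print(orphan_requirements(trace))       # ['SRS-3']
```

Tracking these values over time, rather than as single snapshots, matches the "over time" wording used in several of the example metrics.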
These metrics help ensure that the software supporting hazardous behavior mitigation is reliable, safe, and meets the specified requirements. For detailed guidance, refer to the Software Assurance and Software Safety Standard (NASA-STD-8739.8) and the NASA Procedural Requirements (NPR 7150.2), which together provide a comprehensive framework.
See also Topic 8.18 - SA Suggested Metrics.
7.4 Guidance
To ensure that the space system can mitigate the hazardous behavior of critical software where such behavior could result in a catastrophic event, the following software assurance and software safety tasks should be implemented:
- Software Safety and Hazard Analysis: Develop and maintain a Software Safety Analysis throughout the software development life cycle. Assess that the Hazard Analyses (including hazard reports) identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A. (See SWE-205 - Determination of Safety-Critical Software tasks.) Perform these analyses on all new requirements, requirement changes, and software defects to determine their impact on the software system's reliability and safety. Confirm that all safety-critical requirements related to mitigating hazardous software behavior are met and adequately tested to prevent failures during mission-critical operations. It may be necessary to discuss these findings during the Safety Review so the reviewers can weigh the impact of implementing the changes. (See Topic 8.58 - Software Safety and Hazard Analysis.)
- Hazard Analysis/Hazard Reports: Confirm that a comprehensive hazard analysis has been conducted to identify potential hazards that could result from critical software behavior. This analysis should include evaluating existing and potential hazards and recommending mitigation strategies for identified hazards. The Hazard Reports should contain the results of the analyses and proposed mitigations. (See Topic 5.24 - Hazard Report Minimum Content.)
- Software Safety Analysis: To develop this analysis, utilize safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - SW Failure Modes and Effects Analysis to identify safety risks and formulate effective controls. These techniques help in identifying hazards, hazard causes, and potential failure modes. When generating this SA product, see Topic 8.09 - Software Safety Analysis for additional guidance.
- Safety Reviews: Perform safety reviews on all software changes and software defects. This ensures that any modifications do not introduce new vulnerabilities or increase the risk of a hazardous behavior leading to a catastrophic event.
- Peer Reviews: Participate in peer reviews on all software changes and software defects affecting safety-critical software and hazardous functionality. (See SWE-134 - Safety-Critical Software Design Requirements tasks.)
- Change Requests: Monitor the number of software change requests and software defects and their impact on the system's reliability and safety. Increases in the number of changes may be indicative of requirements issues or code quality issues resulting in potential schedule slips. (See SWE-053 - Manage Requirements Changes, SWE-080 - Track and Evaluate Changes.)
- Test Witnessing: Perform test witnessing for safety-critical software to ensure the impact of hazardous software behavior is mitigated. (See SWE-066 - Perform Testing.) This includes witnessing tests to:
  - Confirm that the system can recover from hazardous behaviors without resulting in catastrophic consequences. This could include:
    - Measuring the time taken for the system to detect and respond to hazardous behaviors to ensure timely and accurate execution of mitigation procedures. A prolonged response could cause catastrophic consequences.
    - Ensuring the system is available and operational when needed, especially during critical mission phases.
  - Uncover unrecorded software defects and confirm they are documented and recorded.
  - Confirm that robust error handling and recovery mechanisms are implemented to address errors resulting from hazardous software behavior. This includes ensuring that error handling is adequate and that the system can recover from errors without leading to catastrophic events.
- Safety-Critical Software Requirements: Ensure that safety-critical software requirements are implemented per the NPR 7150.2 Requirements Mapping Matrix and tested or verified. This includes verifying that the software control functions identified in a system hazard provide mitigations for hazardous conditions or behaviors.
- Configuration Management: Confirm that strict configuration management is maintained so that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations, tracks changes, and maintains consistency. This includes performing the SWE-187 - Control of Software Items tasking; see SWE-187 for more information.
- Assess that the software safety-critical items, including the hazard reports and safety analysis, are configuration-managed (See SWE-081 - Identify Software CM Items tasking).
- Simulations and Testing: Ensure that the project has developed and executed simulations to model and test the impact of hazardous software behavior. This includes conducting tests to verify that the software system can handle these scenarios (including off-nominal conditions) without resulting in catastrophic consequences.
- Test Results Assessment: Confirm that test results are assessed and recorded and that the test results are sufficient verification artifacts for the hazard reports. (See SWE-068 - Evaluate Test Results.)
- Safety-Critical Software Implementation: Ensure that the software performs integrity checks on inputs and outputs, performs prerequisite checks before the execution of safety-critical software commands, and safely transitions between all predefined known states.
- Cyclomatic Complexity: Perform or analyze the Cyclomatic Complexity metrics to ensure that all identified safety-critical software components have a cyclomatic complexity value of 15 or lower. If not, confirm that software engineering provides a technically acceptable risk assessment explaining why the complexity value needs to be higher and why the software component cannot be structured to be lower. (See SWE-220 - Cyclomatic Complexity for Safety-Critical Software.)
- Code Coverage: Confirm that 100% code test coverage is addressed for all identified safety-critical software components or ensure that software developers provide a risk assessment explaining why the test coverage is impossible for the safety-critical code component. (See SWE-189 - Code Coverage Measurements, SWE-219 - Code Coverage for Safety Critical Software.)
- Training and Documentation: Ensure that comprehensive training and documentation are available to operators to minimize the chances of hazardous behavior. This includes clear instructions, warnings, and recovery procedures.
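The configuration management tasks above aim to ensure that the correct software versions and configurations are used. One common technique is to compare cryptographic digests of the loaded software items against a configuration-managed baseline. The Python sketch below assumes a hypothetical manifest format (item name mapped to an expected SHA-256 hex digest) and hypothetical item names; it uses only the standard `hashlib` module.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def config_mismatches(baseline: Dict[str, str], loaded: Dict[str, bytes]) -> List[str]:
    """Compare loaded software items against a configuration-managed baseline.

    baseline: item name -> expected SHA-256 hex digest (hypothetical manifest format)
    loaded:   item name -> bytes of the item actually loaded
    Returns a description of every missing, unexpected, or modified item, by name.
    """
    problems = []
    for name in sorted(baseline.keys() | loaded.keys()):
        if name not in loaded:
            problems.append(f"{name}: missing")
        elif name not in baseline:
            problems.append(f"{name}: not in baseline")
        elif sha256_hex(loaded[name]) != baseline[name]:
            problems.append(f"{name}: digest mismatch")
    return problems


gnc = b"gnc flight build"
params = b"abort_limits=v3"
baseline = {"gnc.bin": sha256_hex(gnc), "params.tbl": sha256_hex(params)}
print(config_mismatches(baseline, {"gnc.bin": gnc, "params.tbl": params}))  # []
print(config_mismatches(baseline, {"gnc.bin": gnc, "params.tbl": b"abort_limits=v2"}))
# ['params.tbl: digest mismatch']
```

Checking parameter tables as well as executables matters because an incorrect configuration can cause hazardous behavior even when the code itself is the approved version.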
By implementing these tasks, the space system can be designed to mitigate the hazardous behavior of critical software, ensuring safety and reliability.
7.5 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook: