


1. Requirements
4.3.6 The space system shall provide the capability to detect and annunciate faults that affect critical systems, subsystems, or crew health.
1.1 Notes
NASA-STD-8719.29, NASA Technical Requirements for Human-Rating, does not include any notes for this requirement.
1.2 History
1.3 Applicability Across Classes
[Table: applicability for Classes A through F; the per-class applicable / not applicable markers are not recoverable from this copy.]
2. Rationale
It is necessary to alert the crew to faults (not just failures) that affect critical functions. A fault is defined as an undesired system state. A failure is an actual malfunction of a hardware or software item's intended function. The term fault encompasses the term failure, since faults also include other undesired events such as software anomalies and operational anomalies.
3. Guidance
3.1 Software Tasks to Detect and Annunciate Faults
To ensure that the space system can detect and annunciate faults that affect critical systems, subsystems, or crew health, the following software tasks should be implemented:
- Fault Detection and Reporting Mechanisms: Implement robust fault detection and reporting mechanisms in the software. This includes mechanisms for detecting faults in critical systems, subsystems, and health monitoring systems for the crew.
- Safety Analysis Techniques: Utilize safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - Software Failure Modes and Effects Analysis to identify potential faults and formulate effective controls. These techniques help identify hazards, hazard causes, and potential failure modes that could impact critical systems and crew health.
- Safety Reviews: Perform safety reviews on all software changes and defects to verify that the software detects and annunciates faults that affect safety-critical components. This ensures that each fault has a fault detection mechanism and the modifications do not introduce new vulnerabilities or increase the risk of failure due to the fault.
- Configuration Management: Maintain strict configuration management to ensure that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations that could affect fault detection and annunciation.
- Independent Verification and Validation (IV&V): Have independent verification and validation performed to confirm that the software meets its specified requirements and that any modifications do not introduce new vulnerabilities or increase the risk of failure in fault detection systems.
  - IV&V Analysis Results: Assure that fault detection and reporting mechanisms have been implemented in the software and independently verified and validated to meet safety and mission requirements.
  - IV&V Participation: Involve the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
  - IV&V Management and Technical Measurements: Track and evaluate the performance and results of IV&V activities to ensure continuous improvement and risk management.
- Error Handling and Recovery Mechanisms: Implement robust error handling and recovery mechanisms to address errors resulting from detected faults. This includes ensuring that error handling is adequate and that the system can recover from errors without leading to hazardous or catastrophic events.
- Simulations and Testing: Develop, implement, and execute simulations to model and test the impact of detected faults on critical systems and crew health. This includes conducting tests to verify that the software can accurately detect and report faults without catastrophic consequences. The flight operations team should conduct simulations to thoroughly test the various scenarios.
- Code Coverage with MC/DC Criterion: Develop, implement, and execute test cases for all identified safety-critical software components to ensure that there is 100 percent code test coverage. Use the Modified Condition/Decision Coverage (MC/DC) criterion.
- Safety-Critical Software Requirements: Ensure that safety-critical software requirements are implemented per the NPR 7150.2 Requirements Mapping Matrix and tested and verified. This includes verifying that the software controls the functions identified in a system hazard and provides mitigation for hazardous conditions.
- Training and Documentation: Provide comprehensive training and documentation for operators to minimize the chances of faults going undetected. This includes clear instructions, warnings, and recovery procedures to ensure prompt and accurate fault reporting. This is best done by providing a User Manual with instructions and applicable information about each error/fault and how to gracefully recover from it.
By implementing these tasks, the space system can be designed to effectively detect and annunciate faults that affect critical systems, subsystems, or crew health, ensuring safety and reliability.
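The fault detection and reporting task above can be illustrated with a simple limit-check monitor. This is a minimal sketch under stated assumptions, not flight code: the parameter name `cabin_pressure_kpa`, the limits, the persistence count, and the `annunciate` callback are all hypothetical and do not come from NASA-STD-8719.29 or NPR 7150.2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Limit:
    """Hypothetical limit definition for one monitored parameter."""
    low: float
    high: float
    persistence: int = 3  # consecutive out-of-limit samples before a fault is declared


@dataclass
class FaultMonitor:
    limits: Dict[str, Limit]
    annunciate: Callable[[str, float], None]  # hypothetical crew alert hook
    _counts: Dict[str, int] = field(default_factory=dict)
    _active: Dict[str, bool] = field(default_factory=dict)

    def sample(self, name: str, value: float) -> None:
        """Check one sample; annunciate once when a fault is first declared."""
        limit = self.limits[name]
        if limit.low <= value <= limit.high:
            # Back in limits: reset the persistence counter and clear the fault.
            self._counts[name] = 0
            self._active[name] = False
            return
        self._counts[name] = self._counts.get(name, 0) + 1
        if self._counts[name] >= limit.persistence and not self._active.get(name, False):
            self._active[name] = True
            self.annunciate(name, value)


alerts: List[str] = []
monitor = FaultMonitor(
    limits={"cabin_pressure_kpa": Limit(low=95.0, high=105.0)},
    annunciate=lambda name, value: alerts.append(f"FAULT {name}={value}"),
)
for reading in [101.0, 94.0, 93.5, 93.0, 92.5]:
    monitor.sample("cabin_pressure_kpa", reading)
```

The persistence counter is one possible way to avoid annunciating on a single noisy sample. Whether a declared fault should clear automatically or stay latched until the crew acknowledges it is a design decision for the project's hazard analysis; this sketch clears it automatically only to stay short.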
See Topic 7.24 - Human Rated Software Requirements for other Software Requirements related to Human Rated Software.
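To make the MC/DC criterion named in the code coverage task concrete, the sketch below checks a set of test vectors against unique-cause MC/DC: for every condition in a decision, some pair of tests must differ only in that condition and produce different decision outcomes. The decision `should_annunciate` and its three conditions are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence, Set, Tuple

Vector = Tuple[bool, ...]


def should_annunciate(sensor_valid: bool, over_temp: bool, over_pressure: bool) -> bool:
    """Hypothetical decision under test."""
    return sensor_valid and (over_temp or over_pressure)


def conditions_shown_independent(
    decision: Callable[..., bool], vectors: Sequence[Vector]
) -> Set[int]:
    """Return the indices of conditions for which some pair of vectors
    differs only in that condition and flips the decision outcome."""
    shown: Set[int] = set()
    for a, b in combinations(vectors, 2):
        diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diff) == 1 and decision(*a) != decision(*b):
            shown.add(diff[0])
    return shown


# Four vectors (n + 1 for n = 3 conditions) are enough for this decision.
test_vectors = [
    (True, True, False),   # outcome True
    (False, True, False),  # outcome False: pairs with the first vector for sensor_valid
    (True, False, False),  # outcome False: pairs with the first vector for over_temp
    (True, False, True),   # outcome True: pairs with the third vector for over_pressure
]
covered = conditions_shown_independent(should_annunciate, test_vectors)
mcdc_achieved = covered == {0, 1, 2}
```

In practice a qualified coverage tool would measure this on the real code; the sketch only shows what the criterion demands of a test set beyond plain statement or branch coverage.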
3.2 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook:
3.3 Center Process Asset Libraries
SPAN - Software Processes Across NASA
SPAN contains links to Center-managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki
See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).
SPAN Links |
---|
To be developed later. |
4. Small Projects
No additional guidance is available for small projects. The community of practice is encouraged to submit guidance candidates for this paragraph.
5. Resources
5.1 References
5.2 Tools
NASA users can find the tools list in the Tools Library on the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
6. Lessons Learned
6.1 NASA Lessons Learned
No Lessons Learned have currently been identified for this requirement.
6.2 Other Lessons Learned
No other Lessons Learned have currently been identified for this requirement.
7. Software Assurance
7.1 Tasking for Software Assurance
- Confirm that the hazard reports or safety data packages contain all known software contributions or events where software, either by its action, inaction, or incorrect action, leads to a hazard.
- Assess that hazard analyses (including hazard reports) identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A.
- Confirm that the traceability between software requirements and hazards with software contributions exists.
- Develop and maintain a software safety analysis throughout the software development life cycle.
- Ensure that safety-critical software requirements are implemented per the NPR 7150.2 Requirements Mapping Matrix and tested or verified.
- Perform safety reviews on all software changes and software defects.
- Confirm that 100% code test coverage is addressed for all identified safety-critical software components or that software developers provide a technically acceptable rationale or a risk assessment explaining why the test coverage is not possible or why the risk does not justify the cost of increasing coverage for the safety-critical code component.
- Analyze whether the software test plans and software test procedures cover the software requirements and adequately verify hazard controls, specifically the off-nominal scenarios, to mitigate the impact of hazardous behaviors. (See SWE-071 - Update Test Plans and Procedures tasks.) Ensure that the project has developed and executed test cases to test the detection and annunciation of faults.
- Analyze the software test procedures for the following:
a. Coverage of the software requirements,
b. Acceptance or pass/fail criteria,
c. The inclusion of operational and off-nominal conditions, including boundary conditions,
d. Requirements coverage and hazards per SWE-066 - Perform Testing and SWE-192 - Software Hazardous Requirements, respectively.
- Perform test witnessing for safety-critical software to ensure that all faults that affect critical systems are detected and annunciated.
- Confirm that test results are sufficient verification artifacts for the hazard reports.
- Confirm that strict configuration management is maintained to ensure that the correct software versions and configurations are used.
- Ensure comprehensive training and documentation for operators is available.
7.2 Software Assurance Products
- 8.52 - Software Assurance Status Reports
- 8.54 - Software Requirements Analysis
- 8.55 - Software Design Analysis
- 8.56 - Source Code Quality Analysis
- 8.57 - Testing Analysis
- 8.58 - Software Safety and Hazard Analysis
- 8.59 - Audit Reports
- Test Witnessing Signatures (See SWE-066 - Perform Testing)
Objective Evidence
- System design showing the required levels of failure reporting or annunciation
- Software design that shows how the system design meets the required levels of failure reporting or annunciation
- Completed Hazard Analyses and Hazard Reports identifying all of the potential hazard faults with their associated annunciations
- Completed software safety and hazard analysis results
- Software Fault Tree Analysis (FTA) and Software Failure Modes and Effects Analysis (FMEA)
- Audit reports, specifically the Functional Configuration Audit (FCA) and Physical Configuration Audit (PCA)
- SWE work product assessments for Software Test Plan, Software Test Procedures, Software Test Reports, and User Manuals
- Results from the use of automated tools for code coverage and other verification and validation activities
7.3 Metrics
For the requirement that the space system shall provide the capability to detect and annunciate faults that affect critical systems, subsystems, or crew health, the following software assurance metrics are necessary:
- Verification and Validation Metrics:
  - Test Coverage: Ensure comprehensive test coverage for all fault detection and annunciation scenarios, including normal operations, failure modes, and recovery procedures.
  - Defect Density: Track the number of defects identified during testing per thousand lines of code to ensure software reliability and robustness.
  - Requirements Traceability: Ensure each requirement, including those for fault detection and annunciation, is traced to its implementation and corresponding test cases to maintain comprehensive coverage and validation.
- Safety Metrics:
  - Hazard Analysis: Identify and evaluate potential hazards related to fault detection and annunciation, ensuring adequate mitigation strategies are in place.
  - Safety-critical Requirements Compliance: Verify that all safety-critical requirements related to fault detection and annunciation are met and adequately tested to prevent failures during mission-critical operations.
- Quality Metrics:
  - Code Quality: Use metrics such as cyclomatic complexity and static analysis results to ensure the code is maintainable and less prone to errors.
  - Code Churn: Measure changes in the codebase to monitor stability and identify areas of frequent modification that may need more rigorous testing.
- Performance Metrics:
  - Response Time: Measure the time taken for the system to detect and annunciate faults to ensure timely and accurate execution of annunciation procedures.
  - System Uptime: Ensure the system is available and operational when needed, especially during critical mission phases, to support fault annunciation.
- Configuration Management Metrics:
  - Version Control: Ensure proper version control for all software components involved in fault detection and annunciation to track changes and maintain consistency.
  - Change Requests: Monitor the number of change requests and their impact on the system's reliability and safety.
- Training Metrics:
  - Personnel Training Completion: Ensure that all personnel involved in the development, testing, and operation of the fault detection and annunciation system have completed the necessary training.
- Independent Verification and Validation (IV&V) Metrics:
  - IV&V Analysis Results: Provide assurance that the fault detection and annunciation capabilities have been independently verified and validated to meet safety and mission requirements.
  - IV&V Participation: Involve the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
  - IV&V Management and Technical Measurements: Track and evaluate the performance and results of IV&V activities to ensure continuous improvement and risk management.
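Two of the metrics above, defect density and response time, reduce to simple arithmetic. The sketch below shows one way they might be computed; the defect counts, line counts, fault IDs, timestamps, and the 1.0 s latency limit are invented for illustration and are not values from any NASA standard.

```python
from typing import List, Tuple


def defect_density(defects: int, source_lines: int) -> float:
    """Defects per thousand source lines of code (KSLOC)."""
    if source_lines <= 0:
        raise ValueError("source_lines must be positive")
    return defects / (source_lines / 1000.0)


def late_annunciations(
    events: List[Tuple[str, float, float]], limit_s: float
) -> List[str]:
    """Return the fault IDs whose annunciation latency exceeds limit_s.

    Each event is (fault_id, time_injected_s, time_annunciated_s).
    """
    return [fid for fid, t_inj, t_ann in events if (t_ann - t_inj) > limit_s]


# Hypothetical data: 12 defects found in 48,000 lines is 0.25 defects per KSLOC.
density = defect_density(defects=12, source_lines=48_000)

# Hypothetical fault-injection log; F-002 took 3.0 s against a 1.0 s limit.
late = late_annunciations(
    [("F-001", 10.0, 10.5), ("F-002", 20.0, 23.0)],
    limit_s=1.0,
)
```

The latency limit itself would come from the hazard analysis (for example, a time-to-effect for the hazard), not from the measurement code.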
Examples of potential SA metrics are:
- # of potential hazards that could lead to catastrophic events
- # of Non-Conformances identified during each testing phase (Open, Closed, Severity)
- Code coverage data: % of code that has been executed during testing
- % of traceability completed for all hazards to software requirements and test procedures
- # of hazards with completed test procedures/cases vs. total # of hazards over time
- # of Non-Conformances identified while confirming hazard controls are verified through test plans/procedures/cases
- # of Hazards containing software that has been tested vs. total # of Hazards containing software
- # of safety-related Non-Conformances
- # of Safety Critical tests executed vs. # of Safety Critical tests witnessed by SA
- Software code/test coverage percentages for all identified safety-critical components (e.g., # of paths tested vs. total # of possible paths)
- # of safety-critical requirement verifications completed vs. total # of safety-critical requirement verifications
- Test coverage data for all identified safety-critical software components
- # of Software Requirements that do not trace to a parent requirement
- % of traceability completed in each area: System Level requirements to Software requirements; Software Requirements to Design; Design to Code; Software Requirements to Test Procedures
- Defect trends for trace quality (# of circular traces, orphans, widows, etc.)
- # of Configuration Management Audits conducted by the project – Planned vs. Actual
These metrics help ensure that the software supporting fault detection and annunciation is reliable, safe, and meets the specified requirements. For detailed guidance, see the Software Assurance and Software Safety Standard (NASA-STD-8739.8) and the NASA Procedural Requirements (NPR 7150.2).
See also Topic 8.18 - SA Suggested Metrics.
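The traceability metrics in the list above (percentage of hazards traced to software requirements and test procedures, and the number of requirements with no parent) can be derived from trace data as in the following sketch. The hazard, requirement, and test procedure identifiers are hypothetical, and a real project would pull them from its requirements management tool rather than hard-code them.

```python
from typing import Dict, List, Optional

# Hypothetical trace data.
hazard_to_reqs: Dict[str, List[str]] = {
    "HZ-01": ["SRS-10", "SRS-11"],
    "HZ-02": ["SRS-12"],
    "HZ-03": [],  # hazard with no software requirement traced yet
}
req_to_tests: Dict[str, List[str]] = {
    "SRS-10": ["TP-100"],
    "SRS-11": ["TP-101"],
    "SRS-12": [],  # requirement with no test procedure yet
}
req_parent: Dict[str, Optional[str]] = {
    "SRS-10": "SYS-1",
    "SRS-11": "SYS-1",
    "SRS-12": None,  # orphan: no parent requirement
}


def hazard_fully_traced(hazard: str) -> bool:
    """A hazard counts as traced only if it has at least one requirement
    and every one of those requirements has at least one test procedure."""
    reqs = hazard_to_reqs.get(hazard, [])
    return bool(reqs) and all(req_to_tests.get(r) for r in reqs)


traced = [h for h in hazard_to_reqs if hazard_fully_traced(h)]
percent_traced = 100.0 * len(traced) / len(hazard_to_reqs)
orphans = sorted(r for r, parent in req_parent.items() if parent is None)
```

With this data, one of the three hazards is fully traced and `SRS-12` is reported as an orphan. Tracking both numbers over time gives the trend that the metric list asks for.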
7.4 Guidance
To ensure that the space system can detect and annunciate faults that affect critical systems, subsystems, or crew health, the following software assurance and software safety tasks should be implemented:
- Fault Detection and Reporting Mechanisms: Ensure that robust fault detection and reporting mechanisms are implemented in the software. This includes mechanisms for detecting faults in critical systems, subsystems, and health monitoring systems for the crew.
- Software Safety and Hazard Analysis: Develop and maintain a Software Safety Analysis throughout the software development life cycle. Assess that the Hazard Analyses (including hazard reports) identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A. (See SWE-205 - Determination of Safety-Critical Software.) Perform these analyses on all new requirements, requirement changes, and software defects to determine their impact on the software system's reliability and safety. Confirm that all safety-critical requirements related to the detection and reporting of faults that affect critical systems, subsystems, or crew health are met and adequately tested to prevent failures during mission-critical operations. It may be necessary to discuss these findings during the Safety Review so the reviewers can weigh the impact of implementing the changes. (See Topic 8.58 - Software Safety and Hazard Analysis.)
- Hazard Analysis/Hazard Reports: Confirm that a comprehensive hazard analysis was conducted to identify potential hazards that could result from critical software behavior. This analysis should include evaluating existing and potential hazards and recommending mitigation strategies for identified hazards. The Hazard Reports should contain the results of the analyses and proposed mitigations. (See Topic 5.24 - Hazard Report Minimum Content.)
- Software Safety Analysis: To develop this analysis, use safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - SW Failure Modes and Effects Analysis to identify potential faults and formulate effective controls. These techniques help identify hazards, hazard causes, and potential failure modes that could impact critical systems and crew health and therefore require fault detection. When generating this SA product, see Topic 8.09 - Software Safety Analysis for additional guidance.
- Safety Reviews: Perform safety reviews on all software changes and defects to verify that the software detects and annunciates faults that affect safety-critical components. This ensures that each fault has a fault detection mechanism and the modifications do not introduce new vulnerabilities or increase the risk of failure due to the fault.
- Peer Reviews: Participate in peer reviews on all software changes and software defects affecting safety-critical software and hazardous functionality to verify that the software detects and annunciates faults that affect safety-critical components. (See SWE-134 - Safety-Critical Software Design Requirements tasks.)
- Change Requests: Monitor the number of software change requests and software defects and their impact on the system's reliability and safety. Increases in the number of changes may be indicative of requirements issues or code quality issues resulting in potential schedule slips. (See SWE-053 - Manage Requirements Changes, SWE-080 - Track and Evaluate Changes.)
- Test Witnessing: Perform test witnessing for safety-critical software to ensure that all faults that affect critical systems are detected and annunciated. (See SWE-066 - Perform Testing.) This includes witnessing tests to:
  - Confirm that the system can recover from hazardous behaviors without resulting in catastrophic consequences. This could include:
    - Measuring the time taken for the system to detect and report faults to ensure timely and accurate execution of mitigation procedures. A prolonged period could cause catastrophic consequences.
    - Ensuring the system is available and operational when needed, especially during critical mission phases.
  - Uncover unrecorded software defects and confirm they get documented and recorded.
  - Confirm that robust error handling and recovery mechanisms are implemented to address errors resulting from detected faults. This includes ensuring that error handling is adequate and that the system can recover from errors without leading to hazardous or catastrophic events.
- Configuration Management: Confirm that strict configuration management is maintained so that the correct software versions and configurations are used. (See SWE-187 - Control of Software Items for more information.) This reduces the risk of errors due to incorrect or inconsistent configurations that could affect fault detection and annunciation. This also includes performing the SWE-187 tasking.
  - Assess that the software safety-critical items, including the hazard reports and safety analysis, are configuration-managed. (See SWE-081 - Identify Software CM Items tasking.)
- Simulations and Testing: Ensure that the project developed and executed simulations to model and test the impact of detected faults on critical systems and crew health. This includes conducting tests to verify that the software can accurately detect and report faults without catastrophic consequences.
- Test Results Assessment: Confirm that test results are assessed and recorded and that the test results are sufficient verification artifacts for the hazard reports. (See SWE-068 - Evaluate Test Results.)
- Safety-Critical Software Requirements: Ensure that safety-critical software requirements are implemented per the NPR 7150.2 Requirements Mapping Matrix and tested or verified. This includes verifying that the software controls the functions identified in a system hazard and provides mitigation for hazardous conditions.
- Code Coverage: Confirm that 100% code test coverage is addressed for all identified safety-critical software components, or ensure that software developers provide a risk assessment explaining why the test coverage is impossible for the safety-critical code component. (See SWE-189 - Code Coverage Measurements, SWE-219 - Code Coverage for Safety Critical Software.)
- Training and Documentation: Ensure that comprehensive training and documentation are available to operators to minimize the chances of faults going undetected. This includes clear instructions, warnings, and recovery procedures to ensure prompt and accurate fault reporting.
By implementing these tasks, the space system can be designed to effectively detect and annunciate faults that affect critical systems, subsystems, or crew health, ensuring safety and reliability.
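The code coverage task above amounts to a simple check that software assurance could automate: every safety-critical component either reaches 100% coverage or carries a documented rationale or risk assessment. The following is a minimal sketch assuming coverage results are already available per component; the component names, percentages, and rationale text are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComponentCoverage:
    name: str
    percent_covered: float
    rationale: Optional[str] = None  # documented justification for any shortfall


def coverage_findings(components: List[ComponentCoverage]) -> List[str]:
    """Return the names of components below 100% coverage that have
    no documented rationale or risk assessment."""
    return [
        c.name
        for c in components
        if c.percent_covered < 100.0 and not (c.rationale and c.rationale.strip())
    ]


# Hypothetical coverage report for three safety-critical components.
report = [
    ComponentCoverage("fault_monitor", 100.0),
    ComponentCoverage(
        "caution_warning_display",
        97.5,
        "Defensive branch unreachable on target hardware; risk assessed",
    ),
    ComponentCoverage("crew_health_telemetry", 92.0),
]
findings = coverage_findings(report)
```

Here only `crew_health_telemetry` is flagged. The check confirms that a rationale exists, not that it is technically acceptable; that judgment remains with the reviewer.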
7.5 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook: