


- 1. The Requirement
- 2. Rationale
- 3. Guidance
- 4. Small Projects
- 5. Resources
- 6. Lessons Learned
- 7. Software Assurance
1. Requirements
4.4.2 The crewed space system shall provide the capability for the crew to manually override higher level software control and automation (such as automated abort initiation, configuration change, and mode change) when the transition to manual control of the system will not cause a catastrophic event.
1.1 Notes
NASA-STD-8719.29, NASA Technical Requirements for Human-Rating, does not include any notes for this requirement.
1.2 History
1.3 Applicability Across Classes
Class A B C D E F Applicable?
Key: - Applicable |
- Not Applicable
2. Rationale
This is a specific capability necessary for the crew to control the crewed space system. While this capability should be derived by the program per HR-31 - Single Failure Tolerance (paragraph 4.3.1 of NASA-STD-8719.29), the critical nature of software control and automation at the highest system level dictates specific mention in this standard. Therefore, the crew has the capability to control automated configuration changes and mode changes, including automated aborts, at the system level as long as the transition to manual control is feasible and will not cause a catastrophic event. The program and Technical Authorities will determine the appropriate implementation of this requirement, which is documented in the program’s Human Rating Certification Plan (HRCP) and evidenced by HRCP deliverables.
3. Guidance
This is a specific capability necessary for the crew to control the crewed space system. While this capability should be derived by the program per HR-31 - Single Failure Tolerance (paragraph 4.3.1 of NASA-STD-8719.29 458), the critical nature of software control and automation at the highest system level dictates specific mention in this standard. Therefore, the crew has the capability to control and override automation, automated configuration changes, and mode changes, including automated aborts and flight paths, at the system level as long as the transition to manual control is feasible and will not cause a catastrophic event. The program and Technical Authorities will determine the appropriate implementation of this requirement, which is documented in the program’s Human Rating Certification Plan (HRCP) and evidenced by HRCP deliverables. Because automation is generally implemented in software, overriding automation implies overriding software. The crew's involvement and interactions with vehicle and software automation should be clearly defined so that the crew has the insight needed to override the automation without being overburdened by the information required to maintain situational awareness. Crew or ground involvement, combined with the ability to override automation, can be considered an integral part of software fault-tolerant design: crew override and manual control can provide part of a dissimilar software fault tolerance approach.
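The override decision described above can be pictured as a small arbitration state machine. The sketch below is a minimal illustration under stated assumptions, not a flight design: the names `ControlMode`, `OverrideArbiter`, and `request_manual_override`, and the phase labels, are hypothetical, and the `transition_is_safe` predicate stands in for whatever criteria the program and Technical Authorities derive from hazard analysis and record in the HRCP.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class ControlMode(Enum):
    AUTOMATED = "automated"
    MANUAL = "manual"


@dataclass
class OverrideArbiter:
    """Hypothetical arbiter deciding whether a crew override is granted.

    transition_is_safe is a placeholder for program-defined criteria
    (from hazard analysis and the HRCP) that say whether handing control
    to the crew in a given flight phase could cause a catastrophic event.
    """
    transition_is_safe: Callable[[str], bool]
    mode: ControlMode = ControlMode.AUTOMATED
    event_log: List[str] = field(default_factory=list)

    def request_manual_override(self, flight_phase: str) -> bool:
        """Handle a crew request to take control from the automation."""
        if self.mode is ControlMode.MANUAL:
            self.event_log.append("manual control already active")
            return True
        if not self.transition_is_safe(flight_phase):
            # Refuse, but record why so the reason can be shown to the crew.
            self.event_log.append(f"override inhibited in phase '{flight_phase}'")
            return False
        self.mode = ControlMode.MANUAL
        self.event_log.append(f"manual control granted in phase '{flight_phase}'")
        return True


if __name__ == "__main__":
    # Hypothetical rule: hazard analysis flags one phase where handover is unsafe.
    unsafe_phases = {"abort_motor_burn"}
    arbiter = OverrideArbiter(transition_is_safe=lambda phase: phase not in unsafe_phases)
    print(arbiter.request_manual_override("abort_motor_burn"))  # False
    print(arbiter.request_manual_override("coast"))             # True
    print(arbiter.mode)                                         # ControlMode.MANUAL
    print(arbiter.event_log)
```

The arbiter refuses an unsafe request and logs the reason instead of ignoring it silently, which is consistent with the guidance that the crew always be able to see which control mode is active and why.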
See Topic 7.24 - Human Rated Software Requirements for other Software Requirements related to Human Rated Software.
3.1 Software Tasks for Manual Override Capabilities
To ensure that the crewed space system provides the capability for the crew to manually override higher-level software control and automation without causing a catastrophic event, the following software tasks should be implemented:
- Manual Override Mechanisms: Design and implement robust manual override mechanisms that allow the crew to take control of automated systems. These mechanisms should be thoroughly tested to ensure they function correctly and safely.
- Safety Analysis: Perform comprehensive safety analysis, including 8.07 - Software Fault Tree Analysis and 8.05 - Software Failure Modes and Effects Analysis, to identify potential hazards associated with manual overrides and ensure that transitioning to manual control will not cause a catastrophic event.
- Human-Machine Interface (HMI): Develop and implement an intuitive and user-friendly HMI that allows the crew to easily and effectively engage manual overrides. The interface should indicate the status of both automated and manual control modes and provide feedback to the crew on the current system state. The HMI design should take into consideration the Display Standards in Appendix F of NASA Spaceflight Human-System Standard, Volume 2: Human Factors, Habitability, And Environmental Health (NASA-STD-3001, Vol 2, Rev D) 498.
- Real-time Monitoring and Alerts: Develop and implement real-time monitoring systems that provide the crew with up-to-date information on system status and performance. This includes alerting the crew to any conditions that may necessitate a manual override and ensuring they have the information needed to make informed decisions.
- Redundancy and Fault Tolerance: Design and implement a system with redundancy and fault tolerance to ensure that critical functions remain operational during and after a transition to manual control. This includes backup systems and failover mechanisms to maintain control and prevent catastrophic events.
- Safety Reviews: Perform safety reviews on all software changes and defects related to manual override mechanisms to verify that transitioning to manual control will not cause a catastrophic event. These reviews should confirm that each fault has a fault detection and recovery mechanism and that the modifications do not introduce new vulnerabilities or increase the risk of failure due to the fault.
- Independent Verification and Validation (IV&V): Have independent verification and validation performed to confirm that manual override systems meet specified requirements and function correctly under all operational conditions. IV&V activities should include rigorous testing and analysis of these systems, including:
  - IV&V Analysis Results: Providing assurance that the manual override capabilities have been independently verified and validated to meet safety and mission requirements.
  - IV&V Participation: Involving the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
  - IV&V Management and Technical Measurements: Tracking and evaluating the performance and results of IV&V activities to ensure continuous improvement and risk management.
- Simulation and Testing: Perform extensive simulations and testing to verify that manual override systems can handle all nominal and off-nominal scenarios without causing catastrophic events. This includes testing for unexpected conditions and boundary conditions. The flight operations team should participate in these simulations to thoroughly test the various conditions and scenarios.
- Code Coverage with MC/DC Criterion: Develop, implement, and execute test cases for all identified safety-critical software components to ensure that there is 100% code test coverage. This includes normal operations, failure modes, fault detection, isolation, and recovery procedures. Use the Modified Condition/Decision Coverage (MC/DC) criterion.
- Error Handling and Recovery Mechanisms: Implement robust error handling and recovery mechanisms to address errors and faults detected during manual overrides. This includes ensuring that error handling is adequate and that the system can recover from errors and faults without leading to hazardous or catastrophic events.
- Configuration Management: Maintain strict configuration management to ensure that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations that could affect manual override capabilities.
- Training and Documentation: Provide comprehensive training and documentation for the crew on how to use the manual override systems. This includes detailed procedures, troubleshooting guides, and emergency protocols to ensure the crew is well-prepared to handle any situation. This is best done by providing a User Manual with instructions and applicable information about each error/fault and how the system autonomously recovers from it.
By implementing these tasks, the crewed space system can be designed to provide the necessary capabilities for the crew to manually override higher-level software control and automation safely, ensuring mission success and preventing catastrophic events.
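The MC/DC criterion named in the code coverage task requires showing that each condition in a decision independently changes the decision's outcome. The sketch below illustrates the unique-cause form of that criterion on an abstract decision; it is not a coverage tool (real MC/DC measurement is done on the flight code with a structural coverage analyzer), and the decision `grant_override` and its condition names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[bool, ...]
Decision = Callable[..., bool]


def independence_pairs(decision: Decision, n_conditions: int) -> Dict[int, List[Tuple[Vector, Vector]]]:
    """For each condition, list the unique-cause MC/DC pairs: two input
    vectors that differ only in that condition and give different outcomes."""
    pairs: Dict[int, List[Tuple[Vector, Vector]]] = {i: [] for i in range(n_conditions)}
    for vec in product((False, True), repeat=n_conditions):
        for i in range(n_conditions):
            if vec[i]:
                continue  # visit each pair once, starting from its False side
            flipped = vec[:i] + (True,) + vec[i + 1:]
            if decision(*vec) != decision(*flipped):
                pairs[i].append((vec, flipped))
    return pairs


def achieves_mcdc(decision: Decision, n_conditions: int, tests: Sequence[Vector]) -> bool:
    """True if, for every condition, some independence pair is fully in `tests`."""
    chosen = set(tests)
    pairs = independence_pairs(decision, n_conditions)
    return all(
        any(a in chosen and b in chosen for a, b in pairs[i])
        for i in range(n_conditions)
    )


def grant_override(crew_request: bool, phase_permits: bool, backup_available: bool) -> bool:
    """Hypothetical override-permission decision with three conditions."""
    return crew_request and (phase_permits or backup_available)


if __name__ == "__main__":
    tests = [
        (False, True, False),  # crew_request toggles the outcome vs. the next vector
        (True, True, False),
        (True, False, False),  # phase_permits toggles vs. previous; backup_available vs. next
        (True, False, True),
    ]
    print(achieves_mcdc(grant_override, 3, tests))      # True: n + 1 = 4 tests suffice here
    print(achieves_mcdc(grant_override, 3, tests[:3]))  # False: backup_available not shown independent
```

Exhaustive enumeration is only practical for small decisions; it is used here because it makes the independence-pair definition explicit.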
3.2 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook:
3.3 Center Process Asset Libraries
SPAN - Software Processes Across NASA
SPAN contains links to Center managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki 197
See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).
| SPAN Links |
|---|
| To be developed later. |
4. Small Projects
No additional guidance is available for small projects. The community of practice is encouraged to submit guidance candidates for this paragraph.
5. Resources
5.1 References
5.2 Tools
NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
6. Lessons Learned
6.1 NASA Lessons Learned
No Lessons Learned have currently been identified for this requirement.
6.2 Other Lessons Learned
No other Lessons Learned have currently been identified for this requirement.
7. Software Assurance
7.1 Tasking for Software Assurance
- Ensure the development, implementation, and testing of robust control algorithms capable of managing critical functions with crew overrides. These algorithms must undergo thorough testing to guarantee their reliability and safety in all operational scenarios.
- Ensure that redundancy and fault tolerance are included in the design so that critical functions can continue to operate autonomously, or the crew can perform overrides, even in the presence of faults or failures. This includes implementing backup systems and failover mechanisms.
- Ensure that integrated real-time monitoring and diagnostic tools are used to continuously assess the health and status of critical systems and subsystems. These tools should detect anomalies and trigger autonomous responses to mitigate potential catastrophic events.
- Employ safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - Software Failure Modes and Effects Analysis to identify potential hazards and failure modes. This helps in designing controls and mitigations to allow the crew to manually override higher level software controls and automations to prevent the system from causing a catastrophic event.
- Ensure extensive simulations and testing are conducted to verify that the manual override of systems can handle all nominal and off-nominal scenarios without causing catastrophic events. This includes testing for unexpected situations and boundary conditions.
- Confirm that strict configuration management is maintained so that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations that could impact crew operations.
- Ensure that robust error handling and recovery mechanisms are implemented to address errors stemming from detected faults or failures. Error handling should be adequate, and the crew should be able to manually override the system to prevent it from executing autonomous functions that could lead to hazardous or catastrophic events.
- Perform safety reviews on all software changes and software defects.
- Confirm that 100% code test coverage is addressed for all identified safety-critical software components or that software developers provide a technically acceptable rationale or a risk assessment explaining why the test coverage is not possible or why the risk does not justify the cost of increasing coverage for the safety-critical code component.
- Analyze the software test plans and software test procedures to confirm that they cover the software requirements and provide adequate verification of hazard controls, specifically that the crew is able to manually override the system under various conditions, including nominal and off-nominal scenarios, without causing catastrophic events. (See SWE-071 - Update Test Plans and Procedures tasks.) Ensure that the project has developed and executed test cases that test the software system’s recovery from faults and failures.
- Analyze the software test procedures for the following:
  - Coverage of the software requirements.
  - Acceptance or pass/fail criteria.
  - The inclusion of operational and off-nominal conditions, including boundary conditions.
  - Requirements coverage and hazards per SWE-066 - Perform Testing and SWE-192 - Software Hazardous Requirements, respectively.
- Perform test witnessing for safety-critical software to ensure that the crew can manually override systems under various conditions, including nominal and off-nominal scenarios.
- Confirm that test results are sufficient verification artifacts for the hazard reports.
- Ensure that comprehensive training and documentation for operators are available.
7.2 Software Assurance Products
- 8.52 - Software Assurance Status Reports
- 8.54 - Software Requirements Analysis
- 8.55 - Software Design Analysis
- 8.56 - Source Code Quality Analysis
- 8.57 - Testing Analysis
- 8.58 - Software Safety and Hazard Analysis
- 8.59 - Audit Reports
- Test Witnessing Signatures (See SWE-066 - Perform Testing)
Objective Evidence
- System design showing the crew can manually override the system under various conditions, including nominal and off-nominal scenarios
- Software design that allows the crew to manually override the system under various conditions, including nominal and off-nominal scenarios
- Completed Hazard Analyses and Hazard Reports identifying all of the potential crew overrides with their associated override instructions
- Completed software safety and hazard analysis results
- Software Fault Tree Analysis (FTA) and Software Failure Modes and Effects Analysis (FMEA)
- Audit reports, specifically the Functional Configuration Audit (FCA) and Physical Configuration Audit (PCA)
- SWE work product assessments for Software Test Plan, Software Test Procedures, Software Test Reports, and User Manuals
- Results from the use of automated tools for code coverage and other verification and validation activities
7.3 Metrics
For the requirement that the crewed space system shall provide the capability for the crew to manually override higher level software control and automation (such as automated abort initiation, configuration change, and mode change) when the transition to manual control of the system will not cause a catastrophic event, the following software assurance metrics are necessary:
- Verification and Validation Metrics:
  - Test Coverage: Ensuring comprehensive test coverage for all scenarios, including manual override of automated controls, to verify that these transitions do not lead to catastrophic events.
  - Defect Density: Tracking the number of defects found during testing per thousand lines of code to ensure software reliability and robustness.
  - Requirements Traceability: Ensuring that each requirement, including the manual override capabilities and conditions for non-catastrophic transitions, is traced to its implementation and corresponding test cases.
- Safety Metrics:
  - Hazard Analysis: Identifying and evaluating potential hazards related to the manual override functionality and ensuring adequate mitigation strategies are in place.
  - Safety-critical Requirements Compliance: Verifying that all safety-critical requirements are followed and adequately tested to prevent any failure during manual override operations.
- Quality Metrics:
  - Code Quality: Metrics such as cyclomatic complexity and static analysis results to ensure that the code is maintainable and less prone to errors.
  - Code Churn: Measuring changes in the codebase to monitor stability and identify areas of frequent modification that may need more rigorous testing.
- Performance Metrics:
  - Response Time: Measuring the time taken for the system to respond to manual override inputs from the crew to ensure timely and accurate execution.
  - System Uptime: Ensuring that the system is available and operational when needed, especially during critical mission phases.
- Configuration Management Metrics:
  - Version Control: Ensuring proper version control for all software components involved in manual override capabilities to track changes and maintain consistency.
  - Change Requests: Monitoring the number of change requests and their impact on the system's reliability and safety.
- Training Metrics:
  - Personnel Training Completion: Ensuring that all personnel involved in the development, testing, and operation of the manual override system have completed the necessary training.
- Independent Verification and Validation (IV&V) Metrics:
  - IV&V Analysis Results: Providing assurance that the manual override capabilities have been independently verified and validated to meet safety and mission requirements.
  - IV&V Participation: Involving the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
  - IV&V Management and Technical Measurements: Tracking and evaluating the performance and results of IV&V activities to ensure continuous improvement and risk management.
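Two of the metrics above, Defect Density and Response Time, reduce to simple arithmetic once the raw data is collected. The sketch below shows one way to compute them; it assumes defect density is expressed per thousand source lines (KSLOC) and that response time is judged at a nearest-rank percentile (99th by default). The percentile choice, the limit value, and all function names are hypothetical and would be set by the project.

```python
from math import ceil
from typing import Sequence


def defect_density(defects_found: int, source_lines: int) -> float:
    """Defects per thousand source lines of code (KSLOC)."""
    if source_lines <= 0:
        raise ValueError("source_lines must be positive")
    return defects_found / (source_lines / 1000.0)


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with 0 < pct <= 100."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = ceil(pct / 100.0 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def response_time_ok(latencies_ms: Sequence[float], limit_ms: float, pct: float = 99.0) -> bool:
    """Check that the pct-th percentile override response time meets limit_ms."""
    return percentile(latencies_ms, pct) <= limit_ms


if __name__ == "__main__":
    print(defect_density(12, 48_000))  # 0.25 defects per KSLOC
    # Hypothetical latencies (ms) from crew override input to commanded mode change.
    latencies = [40.0, 10.0, 30.0, 20.0]
    print(percentile(latencies, 99))          # 40.0
    print(response_time_ok(latencies, 50.0))  # True
```

A percentile is used rather than an average because a single slow override response can matter more than typical behavior; the appropriate statistic and limit are program decisions.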
Examples of potential SA metrics are:
- # of potential hazards that could lead to catastrophic events
- # of Non-Conformances identified during each testing phase (Open, Closed, Severity)
- Code coverage data: % of code that has been executed during testing
- % of traceability completed for all hazards to software requirements and test procedures
- # of hazards with completed test procedures/cases vs. total # of hazards over time
- # of Non-Conformances identified while confirming hazard controls are verified through test plans/procedures/cases
- # of Hazards containing software that has been tested vs. total # of Hazards containing software
- # of safety-related Non-Conformances
- # of Safety Critical tests executed vs. # of Safety Critical tests witnessed by SA
- Software code/test coverage percentages for all identified safety-critical components (e.g., # of paths tested vs. total # of possible paths)
- # of safety-critical requirement verifications completed vs. total # of safety-critical requirement verifications
- Test coverage data for all identified safety-critical software components
- # of Software Requirements that do not trace to a parent requirement
- % of traceability completed in each area: System Level requirements to Software requirements; Software Requirements to Design; Design to Code; Software Requirements to Test Procedures
- Defect trends for trace quality (# of circular traces, orphans, widows, etc.)
- # of Configuration Management Audits conducted by the project – Planned vs. Actual
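Several of the traceability metrics in the list above (software requirements without a parent, % traceability, and trace-quality defects such as orphans, widows, and circular traces) can be computed from a simple list of trace links. The sketch below assumes links are stored as (parent, child) identifier pairs; the identifiers are hypothetical. It takes "orphan" to mean a software requirement with no parent trace and "widow" to mean a system requirement with no child trace; projects may define these terms differently.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (parent_id, child_id); identifiers are hypothetical


def trace_metrics(system_reqs: Set[str], sw_reqs: Set[str], links: List[Link]) -> Dict[str, object]:
    """Orphans, widows, and % of software requirements traced to a parent."""
    parents_of: Dict[str, Set[str]] = {r: set() for r in sw_reqs}
    children_of: Dict[str, Set[str]] = {r: set() for r in system_reqs}
    for parent, child in links:
        if child in parents_of:
            parents_of[child].add(parent)
        if parent in children_of:
            children_of[parent].add(child)
    orphans = sorted(r for r, ps in parents_of.items() if not ps)
    widows = sorted(r for r, cs in children_of.items() if not cs)
    pct_traced = 100.0 * (len(sw_reqs) - len(orphans)) / len(sw_reqs) if sw_reqs else 100.0
    return {"orphans": orphans, "widows": widows, "pct_sw_reqs_traced": pct_traced}


def has_circular_trace(links: List[Link]) -> bool:
    """Detect a cycle in the trace graph with a depth-first search."""
    graph: Dict[str, List[str]] = {}
    for parent, child in links:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, [])
    state = {node: "unvisited" for node in graph}

    def visit(node: str) -> bool:
        state[node] = "in_progress"
        for nxt in graph[node]:
            if state[nxt] == "in_progress":
                return True  # back edge: the trace loops back on itself
            if state[nxt] == "unvisited" and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "unvisited" and visit(n) for n in graph)


if __name__ == "__main__":
    system = {"SYS-1", "SYS-2"}
    software = {"SW-1", "SW-2", "SW-3"}
    links = [("SYS-1", "SW-1"), ("SYS-1", "SW-2")]
    metrics = trace_metrics(system, software, links)
    print(metrics["orphans"])  # ['SW-3']
    print(metrics["widows"])   # ['SYS-2']
    print(has_circular_trace(links))                         # False
    print(has_circular_trace(links + [("SW-1", "SYS-1")]))  # True
```

The same approach extends to the other trace levels listed (requirements to design, design to code, requirements to test procedures) by supplying the corresponding parent and child sets.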
These metrics help ensure that the software supporting the manual override capabilities is reliable, safe, and meets the specified requirements. For detailed guidance, refer to the Software Assurance and Software Safety Standard (NASA-STD-8739.8) and the NASA Procedural Requirements (NPR 7150.2), which together provide a comprehensive framework.
See also Topic 8.18 - SA Suggested Metrics
7.4 Guidance
To guarantee that the crewed space system effectively empowers the crew to manually override automated systems without risking catastrophic failure, the following software assurance and safety tasks should be implemented:
- Manual Override Mechanisms: Ensure that robust manual override mechanisms, enabling the crew to take control of automated systems, are designed and implemented. These mechanisms must undergo rigorous testing to confirm their reliability and safe operation.
- Human-Machine Interface (HMI): Ensure that an intuitive and user-friendly HMI is designed and implemented so that the crew can engage manual overrides seamlessly. The interface must convey the status of both automated and manual control modes while providing real-time feedback on the system’s current state. The HMI design should take into consideration the Display Standards in Appendix F of NASA Spaceflight Human-System Standard, Volume 2: Human Factors, Habitability, And Environmental Health (NASA-STD-3001, Vol 2, Rev D) 498.
- Real-time Monitoring and Alerts: Ensure advanced real-time monitoring systems that deliver up-to-the-minute information about system status and performance are developed and implemented. This includes promptly alerting the crew to conditions that necessitate a manual override, ensuring they have the crucial information needed for informed decision-making.
- Redundancy and Fault Tolerance: Ensure the system is designed to include built-in redundancy and fault tolerance to guarantee that critical functions remain operational during and after any transition to manual control. This strategy includes the integration of backup systems and failover mechanisms to maintain control and avert catastrophic events.
- Software Safety and Hazard Analysis: Develop and maintain a Software Safety Analysis throughout the software development life cycle. Assess that the Hazard Analyses (including hazard reports) identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A. (See SWE-205 - Determination of Safety-Critical Software.) Perform these assessments on all new requirements, requirement changes, and software defects to determine their impact on the software system's reliability and safety. Confirm that all safety-critical requirements related to the manual override mechanisms that enable the crew to take control of automated systems have been implemented and adequately tested to prevent failures during mission-critical operations. It may be necessary to discuss these findings during the Safety Review so the reviewers can weigh the impact of implementing the changes. (See Topic 8.58 - Software Safety and Hazard Analysis.)
- Hazard Analysis/Hazard Reports: Confirm that a comprehensive hazard analysis was conducted to identify potential hazards that could result from critical software behavior. This analysis should include evaluating existing and potential hazards and recommending mitigation strategies for identified hazards. The Hazard Reports should contain the results of the analyses and proposed mitigations. (See Topic 5.24 - Hazard Report Minimum Content.)
- Software Safety Analysis: Develop this analysis using comprehensive safety analyses, including 8.07 - Software Fault Tree Analysis and 8.05 - Software Failure Modes and Effects Analysis, to identify potential hazards associated with manual overrides. This critical step helps ensure that transitioning to manual control will not lead to catastrophic outcomes. When generating this SA product, see Topic 8.09 - Software Safety Analysis for additional guidance.
- Safety Reviews: Perform safety reviews on all software changes and defects to verify that the crew can effectively monitor, operate, and control the space system under various conditions, including nominal and off-nominal scenarios that could result in a catastrophic event. These reviews should confirm that each fault has a fault detection mechanism and that the modifications do not introduce new vulnerabilities or increase the risk of failure due to the fault.
- Peer Reviews: Participate in peer reviews on all software changes and software defects affecting safety-critical software and hazardous functionality to verify that the crew can manually override the system under various conditions, including nominal and off-nominal scenarios, without causing catastrophic events. (See SWE-134 - Safety-Critical Software Design Requirements)
- Change Requests: Monitor the number of software change requests and software defects and their impact on the system's reliability and safety. Increases in the number of changes may be indicative of requirements issues or code quality issues resulting in potential schedule slips. (See SWE-053 - Manage Requirements Changes, SWE-080 - Track and Evaluate Changes.)
- Test Witnessing: Perform test witnessing for safety-critical software to verify that the crew can manually override the system under various conditions, including nominal and off-nominal scenarios, without causing catastrophic events. (See SWE-066 - Perform Testing.) This includes witnessing tests to:
  - Confirm that the crew is able to manually override the system under various conditions, including nominal and off-nominal scenarios, without causing catastrophic events. This could include:
    - Measuring the time taken for the system to detect and report faults to the crew so they can implement mitigation procedures in a timely and accurate manner. A prolonged detection or reporting time could have catastrophic consequences.
    - Ensuring the system is available and operational when needed, especially during critical mission phases.
  - Uncover unrecorded software defects and confirm they are documented and recorded.
  - Confirm that robust error handling and recovery mechanisms are implemented to effectively address errors and faults encountered during manual overrides. This confirms that the system can recover from errors and faults without leading to hazardous or catastrophic events.
- Simulation and Testing: Ensure extensive simulations and testing are performed to confirm that manual override systems can effectively manage both nominal and off-nominal scenarios without resulting in catastrophic events. Testing must cover unexpected and boundary conditions comprehensively.
- Test Results Assessment: Confirm that test results are assessed and recorded and that the test results are sufficient verification artifacts for the hazard reports. (See SWE-068 - Evaluate Test Results.)
- Configuration Management: Ensure strict configuration management is maintained to guarantee the use of correct software versions and configurations. (See SWE-187 - Control of Software Items for more information.) This practice minimizes the risk of errors due to incorrect or inconsistent configurations that could compromise manual override capabilities. This also includes performing the SWE-187 tasking.
- Assess that the software safety-critical items, including the hazard reports and safety analysis, are configuration-managed (See SWE-081 - Identify Software CM Items tasking).
- Code Coverage: Confirm that 100% code test coverage is addressed for all identified safety-critical software components or ensure that software developers provide a risk assessment explaining why the test coverage is impossible or why the risk does not justify the cost of increasing coverage for the safety-critical code component. This includes normal operations, failure modes, fault detection, isolation, and recovery procedures. (See SWE-189 - Code Coverage Measurements, SWE-219 - Code Coverage for Safety Critical Software.)
- Training and Documentation: Ensure comprehensive training and documentation for the crew on utilizing manual override systems are available. This should include detailed procedures, troubleshooting guides, and emergency protocols, ensuring the crew is fully equipped to handle any situation.
By executing these tasks, the project helps ensure that the crewed space system provides the necessary capabilities for the crew to safely and effectively override higher-level software control and automation manually, supporting mission success and preventing catastrophic events.
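Part of the Configuration Management task in the guidance above, confirming that the software versions actually loaded match the approved baseline, can be automated by comparing cryptographic hashes against a baseline manifest. The sketch below assumes a manifest that maps relative file paths to SHA-256 digests recorded when the baseline was approved; the manifest format and file names are hypothetical. A check like this confirms artifact identity only and does not replace the Functional and Physical Configuration Audits.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline_mismatches(baseline: Dict[str, str], root: Path) -> List[str]:
    """Compare files under root with a baseline manifest {relative path: SHA-256}.

    Returns one message per file that is missing or whose contents differ.
    """
    problems: List[str] = []
    for rel_path, expected in sorted(baseline.items()):
        candidate = root / rel_path
        if not candidate.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected:
            problems.append(f"changed: {rel_path}")
    return problems


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical build artifact for the override manager.
        (root / "override_mgr.bin").write_bytes(b"approved build")
        manifest = {"override_mgr.bin": hashlib.sha256(b"approved build").hexdigest()}
        print(baseline_mismatches(manifest, root))  # []
        (root / "override_mgr.bin").write_bytes(b"unreviewed patch")
        print(baseline_mismatches(manifest, root))  # ['changed: override_mgr.bin']
```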
7.5 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook: