HR-39 - Autonomous Operation

1. Requirements

4.3.9 The crewed space system shall provide the capability for autonomous operation of system and subsystem functions which, if lost, would result in a catastrophic event.

1.1 Notes

This capability means that the crewed system does not depend on communication with Earth (e.g., mission control) to perform functions that are required to keep the crew alive (refer to the definition for Autonomous in Section 3.2).

Autonomous. The ability of a space system to perform operations independent from any Earth-based system. This includes no communication with, or real-time support from, mission control or other Earth systems. (Source: NPR 8705.2)

1.2 History

HR-39 - First published in NASA-STD-8719.29. First used in Software Engineering Handbook Version D.

SWEHB Rev | HR Rev | Requirement Statement
D | Baseline | 4.3.9 The crewed space system shall provide the capability for autonomous operation of system and subsystem functions which, if lost, would result in a catastrophic event.

1.3 Applicability Across Classes

Class | A | B | C | D | E | F
Applicable? | | | | | |

Key: - Applicable | - Not Applicable



2. Rationale

For critical functions on a crewed vehicle, if a loss of communication occurs or the crew is unavailable or incapacitated, the system must have the capability to safe itself autonomously, without crew or ground intervention, in order to keep the crew alive.

3. Guidance

For critical functions on a crewed vehicle, if a loss of communication occurs or the crew is unavailable, the system must have the capability to safe itself autonomously without crew or ground intervention. This capability means that the crewed system does not depend on communication with Earth (e.g., mission control) or crew intervention to perform functions that are required to keep the crew alive. The possibility of an absent or incapacitated crew should also be considered within the scope of this requirement.

Autonomous. The ability of a space system to perform operations independent from any Earth-based system. This includes no communication with, or real-time support from, mission control or other Earth systems. (Source: NPR 8705.2)

See Topic 7.24 - Human Rated Software Requirements for other Software Requirements related to Human Rated Software. 
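The loss-of-communication case described in the guidance above can be sketched as a small onboard watchdog. This is a minimal illustration, not a flight design: the class name `CommLossWatchdog`, the `Mode` values, and the 30-second timeout are all assumptions made for the example. The safe mode is latched on purpose, on the assumption that leaving it is a deliberate decision handled elsewhere.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NOMINAL = "nominal"                  # ground link considered healthy
    AUTONOMOUS_SAFE = "autonomous_safe"  # vehicle has safed itself


@dataclass
class CommLossWatchdog:
    """Hypothetical monitor that safes the vehicle when the ground link goes quiet."""
    timeout_s: float                     # assumed threshold, not taken from any standard
    last_contact_s: float = 0.0
    mode: Mode = Mode.NOMINAL

    def ground_contact(self, now_s: float) -> None:
        """Record the time of the latest valid ground message."""
        self.last_contact_s = now_s

    def tick(self, now_s: float) -> Mode:
        """Periodic onboard check; requires no crew or ground input."""
        if now_s - self.last_contact_s > self.timeout_s:
            # Latched: once safed, the mode is not cleared by this sketch.
            self.mode = Mode.AUTONOMOUS_SAFE
        return self.mode


watchdog = CommLossWatchdog(timeout_s=30.0)
watchdog.ground_contact(now_s=100.0)
print(watchdog.tick(now_s=120.0))  # still within the timeout
print(watchdog.tick(now_s=131.0))  # timeout exceeded, the vehicle safes itself
```

The same pattern could be extended with a crew-input timer to cover the absent or incapacitated crew case.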

3.1 Software Tasks for Autonomous Operations

To ensure that the crewed space system can autonomously operate system and subsystem functions which, if lost, would result in a catastrophic event, the following software tasks should be implemented:

  1. Autonomous Control Algorithms: Develop and implement robust autonomous control algorithms capable of managing critical functions without human intervention. These algorithms should be thoroughly tested to ensure reliability and safety in all operational scenarios.
  2. Redundancy and Fault Tolerance: Design and implement a system with redundancy and fault tolerance to ensure that critical functions can continue operating autonomously even in the presence of faults or failures. This includes implementing backup systems and failover mechanisms.
  3. Real-time Monitoring and Diagnostics: Incorporate real-time monitoring and diagnostic tools to continuously assess the health and status of critical systems and subsystems. These tools should be able to detect anomalies and trigger autonomous responses to mitigate potential catastrophic events.
  4. Safety Analysis Techniques: Utilize safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - SW Failure Modes and Effects Analysis to identify potential hazards and failure modes. This helps in designing controls and mitigations for autonomous operations of critical functions.
  5. Safety Reviews: Perform safety reviews on all software changes and defects to verify that the system has redundancy and fault tolerance to ensure critical functions can continue operating autonomously if the crew is unavailable or incapacitated. This ensures that each fault has a fault detection and recovery mechanism and the modifications do not introduce new vulnerabilities or increase the risk of failure due to the fault.
  6. Independent Verification and Validation (IV&V): Ensure that independent verification and validation is performed to confirm that autonomous control systems meet specified requirements and function correctly under all conditions. IV&V activities should include rigorous testing and analysis to validate the effectiveness of these systems.
    1. IV&V Analysis Results: Assure that the autonomous operation capabilities have been implemented in the software and independently verified and validated to meet safety and mission requirements.
    2. IV&V Participation: Involve the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
    3. IV&V Management and Technical Measurements: Track and evaluate the performance and results of IV&V activities to ensure continuous improvement and risk management.
  7. Simulation and Testing: Conduct extensive simulations and testing to verify that the autonomous control systems can handle all nominal and off-nominal scenarios without human intervention. This includes testing for unexpected conditions and boundary conditions. The flight operations team should participate in these simulations to thoroughly test the various conditions and scenarios.
    1. Code Coverage with MC/DC Criterion: Develop, implement, and execute test cases for all identified safety-critical software components to ensure that there is 100 percent code test coverage. This includes normal operations, failure modes, fault detection, isolation, and recovery procedures. Use the Modified Condition/Decision Coverage (MC/DC) criterion.
  8. Configuration Management: Maintain strict configuration management to ensure that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations that could affect autonomous operations.
  9. Error Handling and Recovery Mechanisms: Implement robust error handling and recovery mechanisms to address errors resulting from detected faults. This includes ensuring that error handling is adequate and that the system can recover autonomously from errors without leading to hazardous or catastrophic events.
  10. Training and Documentation: Provide comprehensive documentation and training for operators to understand the autonomous systems and their operations. This includes guidelines on interpreting system status, identifying potential issues, and understanding the autonomous responses. This is best done by providing a User Manual with instructions and applicable information about each error/fault and how the system autonomously recovers from it.

By implementing these tasks, the crewed space system can be designed to autonomously operate critical functions, ensuring safety and reliability even in the event of faults or failures.
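Tasks 2, 3, and 9 above (redundancy, real-time monitoring, and autonomous recovery) can be illustrated together with a mid-value voter over redundant sensor channels. The sketch below is only an assumption-laden example: the function names, the cabin-pressure scenario, the 2.0 kPa tolerance, and the 90.0 kPa limit are hypothetical and do not come from this Handbook.

```python
from statistics import median
from typing import Optional


def vote(readings: list[Optional[float]], tolerance: float) -> tuple[Optional[float], list[int]]:
    """Mid-value select over redundant channels.

    Returns (selected value, indices of channels flagged as faulty).
    A channel is flagged if it reports None or if it disagrees with the
    median of the live channels by more than `tolerance`.
    """
    live = [(i, r) for i, r in enumerate(readings) if r is not None]
    faulty = [i for i, r in enumerate(readings) if r is None]
    if not live:
        return None, faulty
    mid = median(r for _, r in live)
    faulty += [i for i, r in live if abs(r - mid) > tolerance]
    good = [r for i, r in live if i not in faulty]
    selected = median(good) if good else None
    return selected, sorted(faulty)


def cabin_pressure_response(readings: list[Optional[float]],
                            tolerance: float = 2.0,
                            low_limit_kpa: float = 90.0) -> tuple[str, list[int]]:
    """Hypothetical autonomous response chosen without crew or ground input."""
    value, faulty = vote(readings, tolerance)
    if value is None:
        return "SAFE_MODE", faulty     # no trustworthy data: take the conservative action
    if value < low_limit_kpa:
        return "ISOLATE_LEAK", faulty  # hazard detected: trigger the mitigation
    return "NOMINAL", faulty


# One channel has failed low; the voter masks it and reports it as faulty.
print(cabin_pressure_response([101.0, 102.0, 55.0]))
```

Reporting the faulty channel indices alongside the response keeps the fault visible to operators even when the system has already masked it.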

3.2 Additional Guidance

Additional guidance related to this requirement may be found in the following materials in this Handbook:

3.3 Center Process Asset Libraries

SPAN - Software Processes Across NASA
SPAN contains links to Center-managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki

See the following link(s) in SPAN for process assets from contributing Centers (NASA Only). 

SPAN Links

To be developed later. 

4. Small Projects

No additional guidance is available for small projects. The community of practice is encouraged to submit guidance candidates for this paragraph.

5. Resources

5.1 References


5.2 Tools

Tools to aid in compliance with this SWE, if any, may be found in the Tools Library in the NASA Engineering Network (NEN). 

NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN. 

The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool.  The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.

6. Lessons Learned

6.1 NASA Lessons Learned

No Lessons Learned have currently been identified for this requirement.

6.2 Other Lessons Learned

No other Lessons Learned have currently been identified for this requirement.

7. Software Assurance

HR-39 - Autonomous Operation
4.3.9 The crewed space system shall provide the capability for autonomous operation of system and subsystem functions which, if lost, would result in a catastrophic event.

7.1 Tasking for Software Assurance

  1. Ensure the development, implementation, and testing of robust autonomous control algorithms capable of managing critical functions without human intervention. These algorithms must undergo thorough testing to guarantee their reliability and safety in all operational scenarios.
  2. Ensure redundancy and fault tolerance are included in the design so that critical functions continue to operate autonomously, even in the presence of faults or failures. This includes implementing backup systems and failover mechanisms.
  3. Ensure that integrated real-time monitoring and diagnostic tools are used to continuously assess the health and status of critical systems and subsystems. These tools should detect anomalies and trigger autonomous responses to mitigate potential catastrophic events.
  4. Employ safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - SW Failure Modes and Effects Analysis to identify potential hazards and failure modes. This helps in designing controls and mitigations for the autonomous operation of critical functions.
  5. Ensure comprehensive test coverage for all autonomous operation scenarios, including normal operations, failure modes, and recovery scenarios, to verify that these functions operate correctly and reliably. This includes testing for unexpected situations and boundary conditions.
  6. Confirm that strict configuration management is maintained to ensure that the correct software versions and configurations are used. This reduces the risk of errors due to incorrect or inconsistent configurations that could impact autonomous operations.
  7. Ensure robust error handling and recovery mechanisms to address errors stemming from detected faults. This ensures that error handling is adequate and that the system can autonomously recover from errors without leading to hazardous or catastrophic events.
  8. Perform safety reviews on all software changes and software defects.
  9. Confirm that 100% code test coverage is addressed for all identified safety-critical software components or that software developers provide a technically acceptable rationale or a risk assessment explaining why the test coverage is not possible or why the risk does not justify the cost of increasing coverage for the safety-critical code component.
  10. Analyze that the software test plans and software test procedures cover the software requirements and provide adequate verification of hazard controls, specifically that the autonomous control systems can handle all nominal and off-nominal scenarios without human intervention. (See SWE-071 - Update Test Plans and Procedures tasks). Ensure that the project has developed and executed test cases to test the software system’s recovery from faults.
  11. Analyze the software test procedures for the following:
    1. Coverage of the software requirements.
    2. Acceptance or pass/fail criteria.
    3. The inclusion of operational and off-nominal conditions, including boundary conditions.
    4. Requirements coverage and hazards per SWE-066 - Perform Testing and SWE-192 - Software Hazardous Requirements, respectively.
  12. Perform test witnessing for safety-critical software to ensure that the autonomous control systems can handle all nominal and off-nominal scenarios without human intervention.
  13. Confirm that test results are sufficient verification artifacts for the hazard reports.
  14. Ensure comprehensive training and documentation for operators is available.
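Task 9 above calls for 100% code test coverage, and Section 3.1 names the Modified Condition/Decision Coverage (MC/DC) criterion. The sketch below shows one way to check a test set against unique-cause MC/DC for a single decision: each condition needs a pair of tests that differ only in that condition and produce different outcomes. The `decision` function and its condition names are hypothetical, and a real project would rely on a qualified coverage tool rather than a script like this.

```python
from itertools import combinations


def decision(comm_lost: bool, timer_expired: bool, pressure_low: bool) -> bool:
    """Hypothetical safing decision used only to illustrate MC/DC."""
    return (comm_lost and timer_expired) or pressure_low


def mcdc_uncovered(func, tests: list[tuple[bool, ...]]) -> list[int]:
    """Return indices of conditions that lack a unique-cause independence pair."""
    n = len(tests[0])
    covered = set()
    for a, b in combinations(tests, 2):
        diff = [i for i in range(n) if a[i] != b[i]]
        # The pair counts only if exactly one condition changed and the outcome flipped.
        if len(diff) == 1 and func(*a) != func(*b):
            covered.add(diff[0])
    return [i for i in range(n) if i not in covered]


mcdc_tests = [
    (True, True, False),   # baseline: decision is True
    (False, True, False),  # only comm_lost changed: decision becomes False
    (True, False, False),  # only timer_expired changed: decision becomes False
    (True, False, True),   # only pressure_low changed from the previous row: decision becomes True
]

print(mcdc_uncovered(decision, mcdc_tests))      # all three conditions are covered
print(mcdc_uncovered(decision, mcdc_tests[:3]))  # pressure_low lacks an independence pair
```

Four tests suffice here for three conditions, which is why MC/DC is usually far cheaper than exhaustive testing of all condition combinations.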

7.2 Software Assurance Products

  1. 8.52 - Software Assurance Status Reports
  2. 8.54 - Software Requirements Analysis
  3. 8.55 - Software Design Analysis
  4. 8.56 - Source Code Quality Analysis
  5. 8.57 - Testing Analysis 
  6. 8.58 - Software Safety and Hazard Analysis 
  7. 8.59 - Audit Reports
  8. Test Witnessing Signatures (See SWE-066 - Perform Testing)


Objective Evidence

  1. System design showing the autonomous control algorithms capable of managing critical functions without human intervention 
  2. Software design that shows how the system design meets the required levels of autonomous operation for system and subsystem functions that require intervention to prevent a catastrophic failure
  3. Completed Hazard Analyses and Hazard Reports identifying all of the potential hazard faults with their associated autonomous operations
  4. Completed software safety and hazard analysis results
  5. Software Fault Tree Analysis (FTA) and Software Failure Modes and Effects Analysis (FMEA)
  6. Audit reports, specifically the Functional Configuration Audit (FCA) and Physical Configuration Audit (PCA)
  7. SWE work product assessments for Software Test Plan, Software Test Procedures, Software Test Reports, and User Manuals
  8. Results from the use of automated tools for code coverage and other verification and validation activities

Objective evidence is an unbiased, documented fact showing that an activity was confirmed or performed by the software assurance/safety person(s). The evidence for confirmation of the activity can take any number of different forms, depending on the activity in the task. Examples are:

  • Observations, findings, issues, and risks found by the SA/safety person, which may be expressed in an audit or checklist record, email, memo, or entry into a tracking system (e.g., Risk Log).
  • Meeting minutes with attendance lists, or SA meeting notes or assessments of the activities, recorded in the project repository.
  • Status report, email or memo containing statements that confirmation has been performed with date (a checklist of confirmations could be used to record when each confirmation has been done!).
  • Signatures on SA reviewed or witnessed products or activities, or
  • Status report, email or memo containing a short summary of information gained by performing the activity. Some examples of using a “short summary” as objective evidence of a confirmation are:
    • To confirm that: “IV&V Program Execution exists”, the summary might be: IV&V Plan is in draft state. It is expected to be complete by (some date).
    • To confirm that: “Traceability between software requirements and hazards with SW contributions exists”, the summary might be x% of the hazards with software contributions are traced to the requirements.
  • The specific products listed in the Introduction of 8.16 are also objective evidence as well as the examples listed above.


7.3 Metrics

For the requirement that the crewed space system shall provide the capability for autonomous operation of system and subsystem functions which, if lost, would result in a catastrophic event, the following software assurance metrics are necessary:

  1. Verification and Validation Metrics:
    • Test Coverage: Ensure comprehensive test coverage for all autonomous operation scenarios, including normal operations, failure modes, and recovery scenarios, to verify that these functions operate correctly and reliably.
    • Defect Density: Track the number of defects identified during testing per thousand lines of code to ensure the reliability and robustness of the software.
    • Requirements Traceability: Ensure each requirement, including those for autonomous operation capabilities, is traced to its implementation and corresponding test cases to maintain comprehensive coverage and validation.
  2. Safety Metrics:
    • Hazard Analysis: Identify and evaluate potential hazards related to the loss of autonomous functions, ensuring adequate mitigation strategies are in place.
    • Safety-critical Requirements Compliance: Verify that all safety-critical requirements are met and adequately tested to prevent failures that could lead to catastrophic events.
  3. Quality Metrics:
    • Code Quality: Use metrics such as cyclomatic complexity and static analysis results to ensure the code is maintainable and less prone to errors.
    • Code Churn: Measure changes in the codebase to monitor stability and identify areas of frequent modification that may need more rigorous testing.
  4. Performance Metrics:
    • Response Time: Measure the time taken for the system to respond to autonomous control inputs to ensure the timely and accurate execution of commands.
    • System Uptime: Ensure the system is available and operational when needed, especially during critical mission phases, to avoid catastrophic events.
  5. Configuration Management Metrics:
    • Version Control: Ensure proper version control for all software components involved in autonomous operation capabilities to track changes and maintain consistency.
    • Change Requests: Monitor the number of change requests and their impact on the system's reliability and safety.
  6. Training Metrics:
    • Personnel Training Completion: Ensure that all personnel involved in the development, testing, and operation of the autonomous control system have completed the necessary training.
  7. Independent Verification and Validation (IV&V) Metrics:
    • IV&V Analysis Results: Provide assurance that the autonomous operation capabilities have been independently verified and validated to meet safety and mission requirements.
    • IV&V Participation: Involve the IV&V provider in reviews, inspections, and technical interchange meetings to provide real-time feedback and ensure thorough assessment.
    • IV&V Management and Technical Measurements: Track and evaluate the performance and results of IV&V activities to ensure continuous improvement and risk management.

Examples of potential SA metrics are: 

  • # of potential hazards that could lead to catastrophic events  
  • # of Non-Conformances identified during each testing phase (Open, Closed, Severity)  
  • Code coverage data: % of code that has been executed during testing  
  • % of traceability completed for all hazards to software requirements and test procedures  
  • # of hazards with completed test procedures/cases vs. total # of hazards over time  
  • # of Non-Conformances identified while confirming hazard controls are verified through test plans/procedures/cases  
  • # of Hazards containing software that has been tested vs. total # of Hazards containing software  
  • # of safety-related Non-Conformances  
  • # of Safety Critical tests executed vs. # of Safety Critical tests witnessed by SA  
  • Software code/test coverage percentages for all identified safety-critical components (e.g., # of paths tested vs. total # of possible paths)   
  • # of safety-critical requirement verifications completed vs. total # of safety-critical requirement verifications
  • Test coverage data for all identified safety-critical software components  
  • # of Software Requirements that do not trace to a parent requirement  
  • % of traceability completed in each area: System Level requirements to Software requirements; Software Requirements to Design; Design to Code; Software Requirements to Test Procedures
  • Defect trends for trace quality (# of circular traces, orphans, widows, etc.)
  • # of Configuration Management Audits conducted by the project – Planned vs. Actual

These metrics help ensure that the software supporting autonomous operation capabilities is reliable, safe, and meets the specified requirements. For detailed guidance, refer to the Software Assurance and Software Safety Standard (NASA-STD-8739.8) and the NASA Procedural Requirements (NPR 7150.2).

See also Topic 8.18 - SA Suggested Metrics
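Several of the metrics above are simple ratios, and a short sketch may make the arithmetic concrete. The helper names and all numbers below are made up for illustration; they are not project data or Handbook-defined formulas beyond what the metric descriptions state (defects per thousand lines of code, and percentages of completed items).

```python
def defect_density(defects: int, sloc: int) -> float:
    """Defects per thousand source lines of code (KSLOC)."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return defects / (sloc / 1000)


def percent(part: int, total: int) -> float:
    """Percentage helper; reports 0.0 when there is nothing to count (an assumption)."""
    return 100.0 * part / total if total else 0.0


# Hypothetical project snapshot.
print(defect_density(defects=12, sloc=48_000))  # defects per KSLOC
print(percent(part=18, total=24))               # % of hazards traced to requirements and tests
print(percent(part=30, total=40))               # % of safety-critical tests witnessed by SA
```

Tracking these values per build or per test phase, rather than as one-off numbers, is what makes trends such as rising defect density or stalled traceability visible.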

7.4 Guidance

To ensure that the crewed space system can autonomously manage its systems and subsystems, particularly those that, if lost, could lead to catastrophic events, the following software assurance and software safety tasks should be implemented:

  1. Autonomous Control Algorithms: Ensure robust autonomous control algorithms capable of managing critical functions without human intervention are developed and implemented. These algorithms must undergo thorough testing to guarantee their reliability and safety in all operational scenarios.
  2. Redundancy and Fault Tolerance: Ensure the system is designed with redundancy and fault tolerance to ensure that critical functions can continue to operate autonomously, even in the presence of faults or failures. This includes implementing backup systems and failover mechanisms.
  3. Real-Time Monitoring and Diagnostics: Ensure that real-time monitoring and diagnostic tools are developed and implemented to continuously assess the health and status of critical systems and subsystems. These tools should detect anomalies and trigger autonomous responses to mitigate potential catastrophic events.
  4. Software Safety and Hazard Analysis: Develop and maintain a Software Safety Analysis throughout the software development life cycle. Assess that the Hazard Analyses (including hazard reports) identify the software components associated with the system hazards per the criteria defined in NASA-STD-8739.8, Appendix A. (See SWE-205 - Determination of Safety-Critical Software.) Perform these on all new requirements, requirement changes, and software defects to determine their impact on the software system's reliability and safety. Confirm that all safety-critical requirements related to the autonomous detection, isolation, and recovery from faults that affect critical systems, subsystems, or crew health are met and adequately tested to prevent failures during mission-critical operations. It may be necessary to discuss these findings during the Safety Review so the reviewers can weigh the impact of implementing the changes. (See Topic 8.58 - Software Safety and Hazard Analysis.)
    1. Hazard Analysis/Hazard Reports: Confirm that a comprehensive hazard analysis was conducted to identify potential hazards that could result from critical software behavior. This analysis should include evaluating existing and potential hazards and recommending mitigation strategies for identified hazards. The Hazard Reports should contain the results of the analyses and proposed mitigations (See Topic 5.24 - Hazard Report Minimum Content.)
    2. Software Safety Analysis: To develop this analysis, utilize safety analysis techniques such as 8.07 - Software Fault Tree Analysis and 8.05 - SW Failure Modes and Effects Analysis to identify potential hazards and failure modes. This helps in designing controls and mitigations for the autonomous operation of critical functions. When generating this SA product, see Topic 8.09 - Software Safety Analysis for additional guidance.
  5. Safety Reviews: Perform safety reviews on all software changes and defects to verify that autonomous control systems can handle, without human intervention, all nominal and off-nominal scenarios that could result in a catastrophic event. This ensures that each fault has a fault detection mechanism and that the modifications do not introduce new vulnerabilities or increase the risk of failure due to the fault.
  6. Peer Reviews: Participate in peer reviews on all software changes and software defects affecting safety-critical software and hazardous functionality to verify that the autonomous control systems can handle all nominal and off-nominal scenarios without human intervention. (See SWE-134 - Safety-Critical Software Design Requirements tasks.) 
    1. Change Requests: Monitor the number of software change requests and software defects and their impact on the system's reliability and safety. Increases in the number of changes may be indicative of requirements issues or code quality issues resulting in potential schedule slips. (See SWE-053 - Manage Requirements Changes and SWE-080 - Track and Evaluate Changes.)
  7. Test Witnessing: Perform test witnessing for safety-critical software to verify that the autonomous control systems can handle all nominal and off-nominal scenarios without human intervention. (See SWE-066 - Perform Testing.) This includes witnessing tests to:
    1. Confirm that the system can autonomously detect, isolate, and recover from faults without resulting in catastrophic consequences. This could include:
      1. Measuring the time taken for the system to detect and report faults to ensure timely and accurate execution of mitigation procedures. A prolonged period could cause catastrophic consequences.
      2. Ensuring the system is available and operational when needed, especially during critical mission phases.
    2. Uncover unrecorded software defects and confirm they get documented and recorded.
    3. Confirm robust error handling and recovery mechanisms to address errors resulting from detected faults are implemented. This includes ensuring that error handling is adequate and that the system can autonomously recover from errors without leading to hazardous or catastrophic events.
  8. Simulation and Testing: Ensure extensive simulations and testing are performed to verify that the autonomous control systems can handle all nominal and off-nominal scenarios without human intervention. This includes testing for unexpected situations and boundary conditions.
  9. Test Results Assessment: Confirm that test results are assessed and recorded and that the test results are sufficient verification artifacts for the hazard reports. (See SWE-068 - Evaluate Test Results.)
  10. Configuration Management: Ensure strict configuration management is maintained to ensure that the correct software versions and configurations are used. (See SWE-187 - Control of Software Items for more information.) This reduces the risk of errors due to incorrect or inconsistent configurations that could impact autonomous operations. This also includes performing the SWE-187 tasking.
    1. Assess that the software safety-critical items, including the hazard reports and safety analysis, are configuration-managed (See SWE-081 - Identify Software CM Items tasking).
  11. Code Coverage: Confirm that 100% code test coverage is addressed for all identified safety-critical software components or ensure that software developers provide a risk assessment explaining why the test coverage is impossible or why the risk does not justify the cost of increasing coverage for the safety-critical code component. This includes normal operations, failure modes, fault detection, isolation, and recovery procedures. (See SWE-189 - Code Coverage Measurements and SWE-219 - Code Coverage for Safety Critical Software.)
  12. Training and Documentation: Ensure that comprehensive documentation and training are available to help operators understand the autonomous systems and their operations. This should include guidelines for interpreting system status, identifying potential issues, and understanding the autonomous responses.

By implementing these tasks, the crewed space system can be designed to autonomously manage critical functions, thereby ensuring safety and reliability even in the event of faults or failures.
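The test-witnessing task above includes measuring how long the system takes to detect and recover from a fault. One way to support that during a witnessed fault-injection test is to compute latencies from a timestamped event log, as sketched below. The `Event` record, the event kinds, and the sample log are assumptions for illustration; real test rigs will have their own logging formats and timing budgets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    t_s: float      # timestamp in seconds on the test rig clock
    fault_id: str   # hypothetical identifier of the injected fault
    kind: str       # "injected", "detected", or "recovered"


def fault_latency(events: list[Event], fault_id: str) -> dict[str, Optional[float]]:
    """Detection and recovery latency, measured from fault injection.

    A latency is None when the corresponding event never appears in the log,
    which is itself a finding worth recording.
    """
    first: dict[str, float] = {}
    for e in sorted(events, key=lambda e: e.t_s):
        if e.fault_id == fault_id:
            first.setdefault(e.kind, e.t_s)  # keep the earliest event of each kind
    injected = first.get("injected")

    def since(kind: str) -> Optional[float]:
        t = first.get(kind)
        return None if injected is None or t is None else t - injected

    return {"detect_s": since("detected"), "recover_s": since("recovered")}


log = [
    Event(10.0, "F1", "injected"),
    Event(10.5, "F1", "detected"),
    Event(12.0, "F1", "recovered"),
    Event(20.0, "F2", "injected"),
    Event(20.25, "F2", "detected"),  # F2 never recovers in this sample log
]

print(fault_latency(log, "F1"))
print(fault_latency(log, "F2"))
```

Comparing these latencies against the time-to-effect documented in the hazard reports gives the witness an objective pass/fail basis rather than a visual impression.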

7.5 Additional Guidance

Additional guidance related to this requirement may be found in the following materials in this Handbook:
