...

Set Data

hidden	true
name	reftab

Show If

spacePermission	edit

Panel

borderColor	red
title	Visible to editors only

Expand

Page built from "Software Data Needed for Certification in Human-Rated Missions v8.docx"

Content updates needed on this page:

PAT-082 - Software Certification in Human-Rated Missions Checklist - Done

Tabsetup

0	1. Introduction
1	2. Generative AI Metrics / Compliance Data Needs
2	3. Safety Case
3	4. Resources

Div

id	tabs-1

1. Introduction

Excerpt
As NASA organizations use Artificial Intelligence (AI) to make decisions, the need for robust Software Assurance (SA) becomes important.

AI has emerged as a project option, changing the way we interact with and use data. At this time, we recommend limiting the use of AI to non-safety-critical applications.

Software Assurance plays a crucial role in the AI development lifecycle. It involves a systematic process of monitoring, assessing, and improving the software development and AI implementation processes. In the context of AI, SA aims to:

evaluate the use of AI in software development activities,
validate the accuracy of algorithms,
ensure model robustness, and
ensure that the software meets the highest standards of performance.

AI introduces unique challenges to traditional SA methodologies. Unlike conventional software, many AI systems continue to learn and adapt based on new data. This dynamic nature makes it difficult to establish fixed criteria for testing and validation. SA for AI must evolve alongside the models it scrutinizes, which calls for a more iterative and adaptive approach.

1.1 Data Quality

The quality of the data used to train and test AI models directly influences their performance and reliability. Garbage in, garbage out (GIGO) holds in the world of AI, emphasizing the critical importance of high-quality data. SA should also ensure the security and configuration control aspects of the data used to train and test AI. Data quality encompasses several dimensions, including accuracy, completeness, consistency, timeliness, and relevance. In the context of AI, accuracy is particularly vital, as even small inaccuracies in the training data can lead to significant errors in predictions. Ensuring that data is representative and unbiased is also crucial to avoid reinforcing existing biases within AI models.

There are a number of key considerations for software assurance on AI models:

1.1.1 Validating The Training Data

SA needs to review (evaluate and approve) how the project and engineering are testing the data used to train the AI model to ensure it is accurate, representative, and unbiased. This helps identify and mitigate potential issues with the model's performance and predictions. Effective data preprocessing and cleaning are foundational steps in ensuring data quality for AI. This involves identifying and addressing missing values, handling outliers, and normalizing data to create a standardized and reliable dataset. SA processes must include rigorous checks at these stages to guarantee that the data fed into AI models is of the highest quality. SA should analyze and think through all possible scenarios to make sure that the AI actions are correct.
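As an illustration of the preprocessing checks described above, the following minimal sketch reports missing values, flags outliers, and normalizes one numeric feature column. The function name, the z-score threshold, and the sample values are assumptions made for this example, not project-defined values.

```python
import math
import statistics


def check_training_column(values, z_limit=3.0):
    """Summarize basic data-quality findings for one numeric feature column.

    `values` may contain None or NaN for missing entries. `z_limit` is an
    assumed outlier threshold for this sketch, not a mandated value.
    The column must contain at least one present value.
    """
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    missing = len(values) - len(present)
    mean = statistics.fmean(present)
    stdev = statistics.pstdev(present)
    # Flag values whose distance from the mean exceeds z_limit standard deviations.
    outliers = [v for v in present
                if stdev > 0 and abs(v - mean) / stdev > z_limit]
    lo, hi = min(present), max(present)
    # Min-max normalization to [0, 1]; a constant column maps to 0.0.
    normalized = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in present]
    return {
        "missing_fraction": missing / len(values),
        "outliers": outliers,
        "normalized": normalized,
    }


# Hypothetical column with one missing entry and one suspicious reading.
report = check_training_column([1.0, 2.0, None, 3.0, 2.0, 100.0], z_limit=1.5)
```

A report like this gives SA objective evidence to review. Whether a flagged value is an error or a legitimate rare case remains an engineering judgment.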

1.1.2 Continuous Testing

Unlike traditional software, AI models that are generative or continually learning may require ongoing testing after deployment, because the model continues to learn and evolve with new real-world data. SA needs to ensure that the software engineering process includes repeatedly testing the model's behavior and performance over time and evaluating the results.

Testing AI models is a multifaceted process. It involves validating the accuracy of predictions, assessing model generalization to new data, and evaluating performance under diverse conditions. Rigorous testing not only ensures the reliability of AI applications but also helps build trust among end users.

The dynamic nature of some AI techniques also calls for continuous monitoring and feedback loops throughout the development lifecycle. SA processes should include mechanisms for monitoring AI applications in development, in testing, and in production. This ongoing evaluation helps identify and address issues promptly, so that the AI system remains accurate and effective as it encounters new data.
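One possible shape for such a monitoring mechanism is a rolling accuracy check over recent labeled predictions. This is a minimal sketch: the class name, window size, and accuracy threshold are assumptions chosen for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window=100, min_accuracy=0.9):
        # deque with maxlen silently drops the oldest result when full.
        self.results = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def needs_review(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.results) == self.results.maxlen
        return full and self.accuracy() < self.min_accuracy


# Hypothetical stream of (predicted, actual) pairs.
monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
```

Here `monitor.needs_review()` becomes true because only two of the last four predictions were correct. In practice the trigger would feed the project's anomaly or problem-reporting process.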

1.1.3 Documentation And Traceability

Comprehensive documentation and traceability are essential aspects of AI applications. Documenting the entire development and testing process, along with the data used, facilitates transparency, allowing for effective debugging and auditing. In the event of issues or unexpected outcomes, traceability enables developers and software assurance teams to identify and rectify problems efficiently. SA needs to ensure that sufficient documentation and traceability exist for the AI application and associated data. SA should also ensure that, if the model ever needs to be recreated, all of its inputs are documented so that they can be fed back into the AI tool and produce the same, correct results.
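A minimal sketch of how the inputs to a training run might be pinned for later recreation is shown below. The manifest fields, function name, and sample data are assumptions for this example. A real project would also record items such as tool versions and the code baseline.

```python
import hashlib
import json


def build_manifest(dataset_bytes, hyperparameters, random_seed):
    """Return a JSON manifest that pins the inputs of one training run."""
    manifest = {
        # A content hash detects any change to the training data.
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "hyperparameters": hyperparameters,
        "random_seed": random_seed,
    }
    # sort_keys gives a stable serialization, so equal inputs yield equal text.
    return json.dumps(manifest, sort_keys=True, indent=2)


# Hypothetical inputs: the same data and settings produce the same manifest.
first = build_manifest(b"sensor,value\n1,0.5\n", {"epochs": 3, "lr": 0.01}, 42)
second = build_manifest(b"sensor,value\n1,0.5\n", {"lr": 0.01, "epochs": 3}, 42)
changed = build_manifest(b"sensor,value\n1,0.6\n", {"epochs": 3, "lr": 0.01}, 42)
```

Comparing manifests shows whether two runs used identical inputs: `first` and `second` match, while `changed` differs because one data value was edited.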

1.1.4 Leveraging AI/ML For SA

Automated tools and machine learning models can be used to analyze large codebases, identify patterns, and predict potential issues more efficiently than manual review. SA needs to analyze the engineering data or perform independent static code analysis to check for code defects, software quality objectives, code coverage objectives, software complexity values, and software security objectives. SA needs to confirm the static analysis tool(s) are used with checkers to identify security and coding errors and defects. SA needs to confirm that the project addresses the errors and defects and assesses the results from the static analysis tools used by software assurance, software safety, engineering, or the project. SA should confirm that Software Quality Objectives or software quality threshold levels are defined and set for static code analysis defects, or software security objectives.

1.1.5 Defining Quality Criteria

It's important to establish clear quality criteria and variables to assess the AI model's performance, quality, risk, security, maintainability, and compliance with requirements. The SA process should confirm that these criteria are defined and aim to continuously improve the code quality towards all of the criteria.

1.1.6 Balancing Quality Goals

While the goal is to maximize code quality, projects must also consider factors like features, costs, and schedule. The SA process should help identify the critical vulnerabilities to address based on objectives, risks, and schedules.

1.1.7 Ensuring Data Security And Privacy

Protecting sensitive data used to train and operate the AI model is important. Appropriate access controls and security measures must be in place. SA should confirm that the project has proper security controls and configuration management in place.

1.1.8 Addressing Bias Considerations

One of the most significant challenges in AI software assurance is addressing bias. Biased training data can lead to incorrect outcomes. Software assurance should look at how the project identifies and mitigates bias in the data and the algorithms. SA should also confirm that ongoing monitoring and adjustment are used to maintain objectivity in AI applications.

1.1.9 Requirements Verification And Validation

AI applications introduce the unique challenge of nondeterministic behavior. Because the behavior of AI systems may not be predictably repeatable, verifying and validating that behavior is harder than for conventional software. SA should work to develop verification and validation criteria that address the probabilistic nature of AI systems and establish measures for test acceptance, pass, and fail criteria.
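One way to express a pass/fail criterion for a probabilistic system is to run a test many times and accept only if a statistical lower bound on the pass rate meets the required rate. The sketch below uses the Wilson score interval. The choice of interval, the 95% confidence level (z = 1.96), and the required rates are assumptions for illustration, not criteria taken from this guidance.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for an observed pass rate."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom


def accept(successes, trials, required_rate):
    """Accept only if the lower bound on the pass rate meets the requirement."""
    return wilson_lower_bound(successes, trials) >= required_rate
```

Under these assumptions, 100 passes in 100 runs supports a required rate of 0.95, but 19 passes in 20 runs does not, even though the observed rate is 0.95, because 20 runs leave too much statistical uncertainty.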

1.2 Approach

In summary, software assurance for AI models requires a comprehensive approach that focuses on validating training data, continuous testing, leveraging AI/ML tools, defining quality benchmarks, balancing project goals and risks, ensuring data security and privacy, and addressing ethical considerations. This helps ensure the AI system is accurate, robust, secure, and compliant.

In the era of artificial intelligence, software quality assurance emerges as a linchpin for ensuring the reliability, accuracy, and ethicality of AI applications. By focusing on data quality as a foundational element, SA plays a pivotal role in mitigating challenges such as bias, ensuring transparency, and building trust in AI systems. As technology continues to advance, a proactive and adaptive approach to SA will be crucial in unlocking the full potential of AI while safeguarding against unintended consequences. Through continuous improvement, collaboration, and a commitment to ethical standards, the marriage of SA and AI promises a future where intelligent systems enhance our lives responsibly and reliably.

1.3 Additional Guidance

Links to Additional Guidance materials for this subject have been compiled in the Relevant Links table. See Additional Guidance in the Resources tab.

Div

id	tabs-2

2. Generative AI Metrics

Defining metrics for the complexity of generative AI models requires tailoring your evaluation to the unique characteristics of these systems—such as their architecture, size, computational requirements, capabilities, and usability. Unlike traditional software complexity metrics (e.g., cyclomatic complexity), generative AI complexity is often evaluated through model-specific engineering, mathematical, and operational characteristics. Below is a comprehensive guide to key metrics that you can use:

2.1 Architectural Complexity

Metrics:

Number of Layers: The depth or number of layers in the neural network architecture (e.g., 12 layers for the smallest GPT-2 model, 96 layers for the 175-billion-parameter GPT-3 model).
Parameters: Total number of trainable parameters (e.g., billions for most large language models).
Attention Heads: In transformer models, attention heads drive complexity. Evaluate the number of heads per layer and their interactions.
Non-Linearity: Measure the types and number of activation functions (e.g., ReLU, GELU).

Why Important:

These metrics indicate the model's capacity to learn complex patterns. Larger and deeper architectures typically have higher expressivity but come with increased computational cost.
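The relationship between depth, width, and parameter count can be approximated with simple arithmetic. The sketch below assumes a standard decoder-only transformer with a 4x feed-forward expansion and ignores biases, layer norms, and positional embeddings, so it is an estimate rather than an exact count. The function name is hypothetical.

```python
def estimate_transformer_parameters(layers, width, vocab_size):
    """Rough count of trainable weights in a decoder-only transformer.

    Assumes 4 * width^2 attention weights and 8 * width^2 feed-forward
    weights per layer, plus a vocab_size x width token embedding.
    """
    per_layer = 12 * width * width
    return layers * per_layer + vocab_size * width


# Smallest GPT-2 configuration: 12 layers, width 768, vocabulary of 50,257 tokens.
gpt2_small = estimate_transformer_parameters(layers=12, width=768, vocab_size=50257)
```

For this configuration the estimate is 123,532,032, close to the roughly 124 million parameters commonly cited for that model.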

2.2 Computational Complexity

Metrics:

Floating-Point Operations (FLOPs): The number of computations required to perform training and inference. This count is distinct from FLOPS (floating-point operations per second), which measures hardware throughput.
Memory Requirements: GPU or RAM usage during training and inference—especially significant for deployment on constrained systems.
Inference Time: Latency in generating outputs. Faster inference models are considered less complex and more efficient.
Power Consumption: Energy required for training and inference, relevant for sustainable AI practices.

Why Important:

These metrics determine the model's scalability and operational costs for deployment and training. For example, models with high FLOPs and memory requirements are often harder to scale.
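A widely used rule of thumb for dense transformer models estimates training cost as about 6 floating-point operations per parameter per training token, and inference cost as about 2 per parameter per generated token. The sketch below applies that approximation. The function names, the utilization factor, and the sample numbers are assumptions for illustration only.

```python
def training_flops(parameters, training_tokens):
    """Approximate total operations to train: about 6 per parameter per token."""
    return 6 * parameters * training_tokens


def inference_flops(parameters, generated_tokens):
    """Approximate forward-pass operations: about 2 per parameter per token."""
    return 2 * parameters * generated_tokens


def runtime_seconds(total_flops, hardware_flops_per_second, utilization=0.4):
    """Convert an operation count into wall-clock time on given hardware.

    `utilization` is the assumed fraction of peak throughput actually achieved.
    """
    return total_flops / (hardware_flops_per_second * utilization)


# Hypothetical model: 1 billion parameters trained on 20 billion tokens.
total = training_flops(10**9, 2 * 10**10)
seconds = runtime_seconds(total, hardware_flops_per_second=1e15)
```

The example separates the operation count (a property of the model and data) from throughput (a property of the hardware), which is the distinction that matters when estimating schedule and cost.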

2.3 Model Representational Complexity

Metrics:

Expressive Power: The ability of the model to learn and represent complex functions or dynamics.
Entropy of Outputs: Capturing the diversity and unpredictability of model outputs during inference.
Embedding Space Size: The dimensionality of the embeddings used internally (e.g., 768 for the smallest GPT-2 model, 4096 for LLaMA-7B).

Why Important:

These metrics highlight how effectively the model can generalize across diverse tasks and inputs while maintaining rich representations.
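The entropy of outputs can be computed directly from a model's next-token probability distribution. The following minimal sketch uses Shannon entropy in bits. The function name and sample distributions are made up for illustration.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with zero probability contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = shannon_entropy_bits([0.25, 0.25, 0.25, 0.25])  # maximally uncertain
peaked = shannon_entropy_bits([1.0, 0.0, 0.0, 0.0])       # fully predictable
```

Higher entropy means the model spreads probability across many candidate outputs. Lower entropy means its outputs are more predictable.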

2.4 Training Complexity

Metrics:

Dataset Size: The volume of training data required (e.g., tokens or examples in billions for large models).
Training Iterations: Number of epochs or updates needed to achieve convergence.
Learning Rate Dynamics: The adaptation of learning rates during training, which impacts convergence speed.
Optimization Complexity: Evaluate the type of optimizer used (e.g., Adam vs. AdaFactor) and its configuration.

Why Important:

High training complexity can imply longer development times and greater hardware requirements to train a model properly.

2.5 Fine-Tuning and Adaptation Complexity

Metrics:

Number of Parameters Adapted: How much of the model can or must be fine-tuned for specific tasks (e.g., fine-tuning full models vs. adapter layers in PEFT [Parameter-Efficient Fine-Tuning]).
Data Requirements for Fine-Tuning: The amount of task-specific data required to adapt the model.
Domain Generalization: The model’s ability to generalize across new domains without full retraining.

Why Important:

Assessing fine-tuning complexity helps determine the model’s usability for downstream applications.
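To make "number of parameters adapted" concrete, the sketch below uses LoRA, one common parameter-efficient fine-tuning technique, as an example. LoRA is not named in the text above and is chosen here only for illustration. The layer count, matrix sizes, rank, and total parameter count are assumed values.

```python
def lora_added_parameters(d_in, d_out, rank):
    """Trainable weights LoRA adds to one d_out x d_in weight matrix.

    LoRA trains two small matrices, A (rank x d_in) and B (d_out x rank),
    while the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


def adapted_fraction(adapted_parameters, total_parameters):
    """Share of the model's parameters that fine-tuning actually updates."""
    return adapted_parameters / total_parameters


# Assumed setup: 12 layers, 2 adapted 768 x 768 matrices per layer, rank 8.
added = 12 * 2 * lora_added_parameters(768, 768, rank=8)
fraction = adapted_fraction(added, total_parameters=124_000_000)
```

In this assumed setup, fewer than 1% of the parameters are trained. That is the kind of figure this metric is meant to capture when comparing adaptation approaches.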

2.6 Output Complexity

Metrics:

Sequence Length: The maximum number of tokens or characters the model can process or generate in a single inference step.
Coherence Score: How logically connected the outputs are over long sequences (subjective or algorithmic measures).
Temperature and Diversity: Configurations used during inference and their influence on creativity or randomness of generative outputs.

Why Important:

Output complexity impacts the quality and usability of generative and conversational results, especially for tasks requiring coherence, relevance, or creativity.
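The effect of the temperature setting can be shown with a small softmax calculation. This is a minimal sketch with made-up logits. The function name is hypothetical, and real inference stacks apply the same scaling inside their sampling code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more deterministic
flat = softmax_with_temperature(logits, temperature=2.0)   # more diverse
```

A lower temperature concentrates probability on the top-scoring token, while a higher temperature spreads it out and increases output randomness.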

2.7 Interpretability Complexity

Metrics:

Explainability: How easy it is to understand the internal workings of the model (e.g., decision-making pathways or attention distributions).
Saliency Maps: Highlights in the input that influence the outputs, which are useful for interpretability tools.
Layer Contribution Analysis: Understanding which layers contribute most to model performance.
Bias and Fairness Audits: The complexity of detecting and mitigating bias in the model outputs.

Why Important:

Interpretability metrics are crucial for ethical AI deployment and trust-building in sensitive applications.

2.8 Real-World Deployment Complexity

Metrics:

Scalability: How easy it is to scale up or down the model architecture for different hardware configurations.
Latency: The time taken for the model to respond or process input in real-world usage scenarios.
API Complexity: The ease or difficulty of integrating the model into applications (e.g., REST APIs vs. custom libraries).
Security and Robustness: Complexity of ensuring the model is robust to adversarial attacks or misuse.

Why Important:

Deployment complexity plays a significant role in practical utility, customer satisfaction, and security of generative AI solutions.

2.9 Best Practices for Defining Metrics

Task-Specific Design: Tailor metrics to your specific use case, whether it's text generation, image generation, or conversational AI.
Benchmarking: Use standard benchmarks such as GLUE and SuperGLUE, automatic metrics such as BLEU and ROUGE, or human evaluation to assess performance alongside complexity.
Holistic View: Combine several complexity metrics for a more complete picture (architectural, computational, and deployment complexity).
Comparative Analysis: Compare your model against others (e.g., GPT, BERT, DALL-E) to contextualize complexity scores.

2.10 Tools and Frameworks for Complexity Evaluation

Example: You can use tools like these for computation-heavy components:

Weights & Biases (W&B): For tracking FLOPs, memory use, and other training metrics.
Hugging Face Benchmarking Tools: For evaluating inference performance.
Explainability Libraries: Captum, SHAP, or LIME for interpretability complexity.
Energy Usage Estimators: Like CodeCarbon, to assess power consumption.

By defining and measuring these complexity metrics, you can assess generative AI models more effectively, ensure performance optimization, and improve deployment decisions.

This checklist provides comprehensive data and evidence required to certify software for human-rated missions.

It ensures compliance with applicable safety standards, regulatory requirements (NASA NPR 7150.2D [SWEREF-083], SSP 50038 [SWEREF-014], FAA 450.141 Computing systems, NASA-STD-8739.8B [SWEREF-278]), mission-critical functionality, and stakeholder acceptance of residual risks, demonstrating that the software is safe, reliable, and mission-ready for crewed spaceflight operations.

Excerpt Include

	SITE:PAT-082 - Software Certification in Human-Rated Missions Checklist
nopanel	true

Div

id	tabs-2

2. Key Software Compliance Data Needs

2.1 Summary Table of Key Software Compliance Data Needs

Category	Data Needs
Requirements	Access to the software requirements or software user stories, software acceptance criteria, software use cases, software functional specifications, software behaviors, software feature descriptions, or software tickets; Access to the System/Software requirements traceability data; Access to the hazard control software requirements traceability data; Explanation on software safety constraints and assumptions; Access to the software requirements analysis results
Design	Access to the software design; Explanation on software fault containment and management approach; Access to the software design analysis results; Access to the Interface Control Documents (ICDs) for software interfaces; Access to the data dictionary data; Data demonstrating prerequisite checks for all hazardous commands; Explanation on how the software design and requirements meet the safety-critical requirements; Explanation on how the safety-critical code is modifiable; Demonstrate that the applicable human factors design principles are in place in the software
Development	Access to the software operational plans or user’s manual; Explanation of the fault tolerance approach used in the design; Access to the source code; Approval data for all AI uses in flight software before deployment; Demonstrate that the displayed confidence levels are correct for all critical AI decisions and predictions; Demonstrate that the AI systems used in critical space flight software are managed with transparency to maintain safety, reliability, and accountability; Access to the software defect reports (open/closed with severity), anomaly logs, and residual risk acceptance with stakeholder sign-off; Access to all tailoring/waiver/deviation and TBR/TBD/FWD closure records; Evidence that the approved software processes were followed during all software development activities; Evidence of closure of peer review/inspection findings for requirements, architecture/design, code, and test artifacts; Evidence of qualification and pedigree of reused/third‑party software; Provide an understanding of the certification process for the software models & simulations used to qualify flight software; Data showing the Worst‑Case Execution Time (WCET), timing analysis, and scheduling evidence, including partitioning and multicore interference analysis where relevant
Verification & Validation	Access to the software test results; Access to the MC/DC coverage data for safety-critical components; Access to the static and dynamic analysis testing approach and results; Demonstrate the fault injection resilience under worst-case conditions; Demonstrate how the software handles spurious signals and invalid inputs; Provide an understanding of the certification process for the software test environments used to qualify flight software; Provide results of cyclomatic complexity analyses; Provide a list of identified software issues, anomalies, and defects, with clear and traceable justifications for acceptance of any unresolved items; Provide data showing the operational test results and data confirming system resource margins under nominal and worst-case scenarios, including CPU/memory/throughput margins data; Access to the IV&V reports and findings and an understanding of the IV&V scope
Hazard Analysis	Access to the software-related hazard analysis reports, fault tree analysis, and FMEA addressing all software hazards and mitigation strategies; Demonstrate that all AI systems generate warnings and recommendations for anomalies that could lead to hazards; Demonstrate the operator action validation; Explanation of the software safing procedures
Configuration Management	Access to the software documentation; Access to the software change logs; Access to the software version control, baseline definitions, and audit records for all changes
Operational Procedures	Explanation of the software control sequences; Explanation of any software manual safing data and processes
Cybersecurity	Demonstrate that the software has encryption, authentication, and access control and follows secure coding practices; Access to the penetration testing and vulnerability analysis results

2.2 Key Software Compliance Data Needs Rationale

Section: Requirements

Item	Rationale
Access to the software requirements or software user stories, software acceptance criteria, software use cases, software functional specifications, software behaviors, software feature descriptions, or software tickets	Demonstrates the software capability definition and software testing criteria. SWEHB emphasizes complete, correct, validated SRS content as the foundation for safe, verifiable software; early clarity prevents costly downstream defects and safety gaps.
Access to the System/Software requirements traceability data	Bi‑directional traceability is required to prove every design, code, and test artifact maps to a validated system need—preventing orphan functionality and ensuring hazards are controlled.
Access to the hazard control software requirements traceability data	Hazard controls must trace explicitly from hazard analysis to software requirements and back to verification evidence to avoid missing or partial mitigations.
Explanation on software safety constraints and assumptions	Documented constraints and assumptions prevent hidden coupling and incorrect operating expectations—frequent root causes in lessons learned for hazardous behavior.
Access to the software requirements analysis results	Structured requirements analysis (ambiguity checks, safety impacts, HSI considerations) provides objective evidence that requirements are testable, feasible, and complete before design.
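The bi-directional traceability called for above can be checked mechanically once requirement-to-test links are available in machine-readable form. The sketch below is illustrative only: the identifiers, data layout, and function name are hypothetical and do not represent any particular project's traceability tool.

```python
def trace_gaps(requirement_to_tests, all_tests):
    """Find requirements with no tests and tests linked to no requirement."""
    # Forward direction: every requirement should map to at least one test.
    untested = sorted(r for r, tests in requirement_to_tests.items() if not tests)
    # Backward direction: every test should trace to some requirement.
    linked = {t for tests in requirement_to_tests.values() for t in tests}
    orphan_tests = sorted(set(all_tests) - linked)
    return untested, orphan_tests


# Hypothetical trace data exported from a requirements tool.
links = {
    "SRS-001": ["TC-10", "TC-11"],
    "SRS-002": [],
    "HAZ-CTRL-003": ["TC-12"],
}
untested, orphans = trace_gaps(links, ["TC-10", "TC-11", "TC-12", "TC-99"])
```

In this example `SRS-002` has no verifying test and `TC-99` verifies no requirement. Both are findings to disposition before certification.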

Section: Design

Item	Rationale
Access to the software design	Architecture/design descriptions enable validation of partitioning, redundancy, timing, and safety mechanisms; undocumented design is a recurring integration‑failure source in SWEHB.
Explanation on software fault containment and management approach	A clear FDIR strategy limits fault propagation, preserves redundancy, and supports safe recovery for safety‑critical functions.
Access to the software design analysis results	Design reviews/analyses expose interface risks, safety gaps, and performance limits early, reducing costly rework during integration/test.
Access to the Interface Control Documents (ICDs) for software interfaces	ICDs prevent interface mismatch failures—one of the most common categories of integration defects called out in SWEHB checklists.
Access to the data dictionary data	Data dictionaries provide authoritative definitions and units, preventing misinterpretation and unit conversion defects across teams/tools. (SWEHB practice captured in certification checklist.)
Data demonstrating prerequisite checks for all hazardous commands	Prerequisite checks are explicitly required for safety‑critical design to block unsafe command execution unless the system state is validated.
Explanation on how the design and requirements meet safety‑critical requirements	Explicit mapping from safety‑critical requirements to design elements verifies that every control is implemented and testable—closing common gaps seen in hazard mitigations.
Explanation on how the safety‑critical code is modifiable	Modular, isolated safety‑critical code reduces regression risk during updates and enables focused assurance/maintenance. (Emphasized in SWEHB certification guidance.)
Demonstrate applicable human‑factors design principles in the software	HSI evidence (clear displays, alerts, error tolerance) mitigates operator error and automation surprise—frequent contributors to unsafe states in lessons learned and SWEHB checklists.

Section: Development

Item	Rationale
Access to the software operational plans or user’s manual	Operational manuals drive correct crew/operator actions; unclear procedures have historically led to unsafe system states—documented guidance is essential.
Explanation of the Fault Tolerance approach used in the design	Development evidence must confirm the implemented redundancy, monitoring, and recovery match the approved design/FDIR strategy.
Access to the source code	Assurance/IV&V require code access to assess correctness, safety patterns, and compliance; “code is the design” for verification in SWEHB practice.
Approval data for all AI uses in flight software before deployment	The human‑rated certification checklist in SWEHB requires governance of novel tech; AI decisions must be bounded, transparent, and verified before mission use.
Demonstrate correct AI confidence levels for critical decisions/predictions	Accurate confidence cues preserve appropriate human trust; misleading confidence can induce hazardous operator actions (SWEHB human‑rated checklist).
Demonstrate AI transparency/accountability in critical flight software	Transparency and auditable behavior enable assurance teams to verify AI outputs, edge cases, and failure handling consistent with safety requirements (SWEHB PAT guidance).
Access to defect reports, anomaly logs, and residual risk acceptance	Defect trends and residual‑risk approvals provide readiness evidence and ensure risks are visible and accepted prior to flight (SWEHB PAT‑082).
Access to tailoring/waiver/deviation and TBR/TBD/FWD closures	Tailoring/waiver records show what SWEHB/NPR expectations were modified and how verification gaps were closed—preventing undocumented process escapes.
Evidence that approved processes were followed during development	SWEHB lessons emphasize objective evidence (reviews, audits, checklists) to confirm disciplined execution—avoiding “process‑on‑paper” failures.
Evidence of closure of peer review/inspection findings	Peer reviews are high‑value defect prevention in SWEHB; closure evidence ensures findings are resolved, not deferred into flight risk.
Evidence of qualification/pedigree of reused/third‑party software	SWEHB certification content calls for assessing provenance, constraints, known defects, and verification completeness to control integration risk.
Certification process for models & simulations used to qualify flight software	SWEHB requires validated/accredited models/sims for qualification; unvalidated M&S can mask defects and create false confidence.
WCET/timing/scheduling evidence (partitioning & multicore interference)	Deterministic timing & schedulability evidence prevent deadline misses and resource starvation—common latent hazards noted in SWEHB certification checklists.
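As one illustration of schedulability evidence, the sketch below applies the classic Liu and Layland utilization bound for rate-monotonic scheduling. It assumes independent periodic tasks on a single core, with deadlines equal to periods. These assumptions and the task values are chosen for the example and are not taken from the checklist, and the test is sufficient but not necessary: a task set that fails it may still be schedulable under a more exact analysis.

```python
def utilization(tasks):
    """Total processor utilization for (wcet, period) pairs in the same time unit."""
    return sum(wcet / period for wcet, period in tasks)


def rate_monotonic_bound(task_count):
    """Liu and Layland bound: n * (2^(1/n) - 1)."""
    return task_count * (2 ** (1 / task_count) - 1)


def passes_rm_test(tasks):
    """Sufficient (not necessary) schedulability check for rate-monotonic priority."""
    return utilization(tasks) <= rate_monotonic_bound(len(tasks))


# Hypothetical task sets: (worst-case execution time, period) in milliseconds.
light_load = [(1, 4), (1, 5), (2, 10)]
heavy_load = [(2, 4), (2, 5)]
```

The first task set has utilization 0.65, below the three-task bound of about 0.78, so it passes. The second has utilization 0.9, above the two-task bound of about 0.83, so it needs further analysis.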

Section: Verification & Validation

Item	Rationale
Access to the software test results	Test evidence must trace to requirements and hazards to prove correctness and safety under nominal/off‑nominal conditions (SWEHB).
Access to MC/DC coverage data for safety‑critical components	For safety‑critical logic, MC/DC helps reveal untested decision paths—improving confidence that critical branches cannot hide defects (SWEHB practice).
Static and dynamic analysis approach/results	Static/dynamic analyses expose memory, concurrency, and API defects early, complementing functional tests with deeper correctness checks (SWEHB PAT‑082).
Fault‑injection resilience under worst‑case conditions	Fault‑injection validates robustness of error handling and safe‑state transitions under realistic fault scenarios (SWEHB).
Handling spurious signals and invalid inputs	Defensive input handling and integrity checks are SWEHB safety‑critical design requirements to prevent unsafe behavior from noise or corruption.
Certification of software test environments	Accredited test environments assure fidelity so test results represent flight behavior—preventing false positives/negatives (SWEHB PAT‑082).
Cyclomatic complexity analyses results	Complexity correlates with defect likelihood and test effort; tracking supports risk‑based testing and maintainability (SWEHB checklist).
Issues/anomalies/defects list with unresolved‑item justifications	Residual issues require documented operational mitigations and rationale to ensure informed risk acceptance before flight (SWEHB).
Operational test results & resource margins (CPU/memory/throughput)	Resource margin evidence demonstrates the system sustains mission loads without entering unsafe degraded states (SWEHB).
Access to IV&V reports/findings & IV&V scope understanding	Independent assessment adds rigor and identifies latent risks; scope clarity ensures findings are dispositioned (SWEHB/SMA IV&V discussion).
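To show what a cyclomatic complexity result measures, the sketch below counts decision points in Python source using the standard `ast` module. It is a simplified approximation written for illustration: flight software is often written in other languages and analyzed with qualified tools, and the set of node types counted here is an assumption of this example.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity: decision points plus one."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of and/or adds one more path.
            decisions += len(node.values) - 1
    return decisions + 1


SAMPLE = '''
def classify(x):
    if x < 0 and x != -1:
        return "neg"
    for i in range(x):
        if i == 3:
            return "three"
    return "other"
'''
score = cyclomatic_complexity(SAMPLE)
```

The sample function scores 5: two `if` statements, one `for` loop, and one `and` operator, plus one. Functions above a project-defined threshold are candidates for refactoring or additional testing.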

Section: Hazard Analysis

Item	Rationale
Hazard analysis, FTA, FMEA	Hazard analyses formally identify software causes/contributions and define controls; they are the backbone for safety‑critical requirements and tests (SWEHB).
AI warnings & recommendations for anomalies	AI components must surface anomaly cues and recommended mitigations in time for operator/software safing to prevent hazard escalation (SWEHB PAT‑082 scope).
Operator Action Validation	Operator action validation ensures human‑software interactions are unambiguous and error‑tolerant—reducing unsafe actions under stress (SWEHB checklist).
Software safing procedures	Safing procedures must reliably place the system into known safe states on detection of hazardous conditions—captured in SWEHB safety‑critical design items.

Section: Configuration Management

Item	Rationale
Access to the software documentation	CM control of documentation prevents divergence between what is built, tested, and flown—frequent contributors to integration defects (SWEHB PAT‑082).
Access to the software change logs	Change history enables traceability and impact assessment for safety‑critical components/interfaces (SWEHB).
Version control, baseline definitions, audit records	Version/baseline and audits ensure the exact tested configuration proceeds to flight and prevent unauthorized changes (SWEHB checklist).

Section: Operational Procedures

Item	Rationale
Explanation of the software control sequences	Clear control sequences reduce operator confusion and ensure predictable behavior during time‑critical operations (SWEHB).
Explanation of any software manual safing data/processes	Manual safing provides a human‑initiated fallback when automation fails; procedures must be explicit, tested, and accessible (SWEHB).

Section: Cybersecurity

Item	Rationale
Encryption, authentication, secure coding, access control	Security controls protect command authority, data integrity, and system availability—security weaknesses can become safety hazards (SWEHB).
Penetration testing and vulnerability analysis results	Pen/vuln testing finds exploitable weaknesses before flight, supporting remediation and risk reduction (SWEHB checklist practice).

Div

id	tabs-3

3. Example Potential Safety Case for Human-Rated Software Certification

This safety case demonstrates that the software used in this human-rated mission adheres to rigorous safety, quality, and regulatory standards. Based on the evidence provided, the software is flight-ready and capable of supporting critical mission operations while ensuring the safety of the crew and spacecraft under both nominal and adverse conditions.

1. Requirements and Traceability

Argument: The software requirements are clearly defined, traceable, and aligned with safety-critical mission needs.
Evidence:
- Comprehensive Software Requirements Specification (SRS) covering high-level mission-critical systems (e.g., navigation, propulsion, anomaly detection, life support, and abort operations).
- Verified safety requirements (fault tolerance, redundancy, and safe initialization/termination).
- Acceptable quality of detailed low-level safety-critical requirements, including specifics like algorithm designs and timing constraints.
- A completed and validated Requirements Traceability Matrix (RTM) showing bi-directional traceability from requirements through design, code, and test results.
- Reviewed system-level safety analyses to document "Must Work" (MWF) and "Must Not Work" (MNWF) requirements, prerequisite checks for hazardous commands, and mitigation strategies.
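The bi-directional traceability evidence above can be spot-checked mechanically. The following is a minimal sketch, assuming the RTM can be exported as rows that link each requirement to design, code, and test identifiers. The `TraceRow` structure, the `find_trace_gaps` helper, and all identifiers (`SRS-001`, `TC-101`, and so on) are hypothetical illustrations, not part of any NASA tool or template.

```python
from dataclasses import dataclass, field


@dataclass
class TraceRow:
    """One RTM row: a requirement and the artifacts it traces to (hypothetical layout)."""
    requirement: str
    design: list[str] = field(default_factory=list)
    code: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)


def find_trace_gaps(rows, known_tests):
    """Return (forward gaps, orphan tests).

    Forward gaps: requirements missing a design, code, or test link.
    Orphan tests: tests that exist but trace back to no requirement.
    """
    forward_gaps = {}
    for row in rows:
        missing = [name for name in ("design", "code", "tests") if not getattr(row, name)]
        if missing:
            forward_gaps[row.requirement] = missing
    traced_tests = {t for row in rows for t in row.tests}
    orphan_tests = sorted(set(known_tests) - traced_tests)
    return forward_gaps, orphan_tests


# Illustrative data only.
rows = [
    TraceRow("SRS-001", ["SDD-4.2"], ["abort_ctrl.c"], ["TC-101"]),
    TraceRow("SRS-002", ["SDD-4.3"], [], []),
]
forward_gaps, orphan_tests = find_trace_gaps(rows, ["TC-101", "TC-999"])
print(forward_gaps)   # requirements with missing downstream links
print(orphan_tests)   # tests with no upstream requirement
```

A check like this only confirms that links exist in both directions; it does not judge whether each link is technically correct, which still requires review.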

2. Software Design and Architecture

Argument: The software architecture is resilient, modular, and designed for fault tolerance and safety-critical operations.
Evidence:

Architecture documentation detailing modular fault isolation, redundancy, and resiliency mechanisms.
Block diagrams illustrating fault containment, fail-safe control paths, and separation of critical functions.
Documentation and analysis of safety-critical subsystems (e.g., propulsion, crew displays, navigation) with clearly defined responsibilities.
Verified Interface Control Documents (ICDs), ensuring compatibility between internal software, hardware systems, and external interactions.
Safety validation evidence for safeguards like fault containment, error detection, operator validation, integrity checks, and anomaly recovery processes.
Independent redundant system designs ensuring physical and logical separation to mitigate single points of failure.
Validation of fault-tolerant mechanisms, including cosmic radiation protection in CPU designs.

3. Hazard Analysis and Safety Evidence

Argument: All hazards associated with software functionality are identified, analyzed, and mitigated to acceptable levels of risk.
Evidence:

A complete Hazard Analysis Report (HAR) identifying software-driven hazards and the mitigation strategies in place.
Fault Tree Analysis (FTA) and Failure Mode and Effects Analysis (FMEA), or a completed System Theoretic Process Analysis (STPA), showing robust fault prevention and recovery mechanisms.
Time-to-effect (TTE) analyses ensuring hazardous conditions can be addressed by safing systems within operational thresholds.
Residual risk documentation showing resolution or acceptance of remaining risks by stakeholders.
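The time-to-effect evidence above comes down to a margin check: the time needed to detect a hazardous condition, decide on a response, and complete the safing action must not exceed the time before the hazard takes effect. The sketch below assumes those three contributions are available per hazard. The `SafingTimeline` class, the hazard names, and all timing values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafingTimeline:
    """Timing budget for one hazard (all fields are illustrative assumptions)."""
    hazard: str
    time_to_effect_s: float  # time from onset until the hazard takes effect
    detection_s: float       # worst-case time to detect the condition
    decision_s: float        # worst-case time to select the safing response
    actuation_s: float       # worst-case time for the safing action to complete

    def response_s(self) -> float:
        return self.detection_s + self.decision_s + self.actuation_s

    def margin_s(self) -> float:
        return self.time_to_effect_s - self.response_s()


def hazards_without_margin(timelines, required_margin_s=0.0):
    """List hazards whose safing response does not finish with the required margin."""
    return [t.hazard for t in timelines if t.margin_s() < required_margin_s]


timelines = [
    SafingTimeline("tank overpressure", 2.0, 0.25, 0.25, 0.5),
    SafingTimeline("thruster stuck on", 0.5, 0.25, 0.25, 0.5),
]
late = hazards_without_margin(timelines)
print(late)
```

In practice the worst-case figures would come from the timing analyses and tests cited in the V&V evidence, not from estimates.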

4. Verification and Validation (V&V) Evidence

Argument: Rigorous testing, validation, and coverage analyses demonstrate software compliance with safety-critical requirements.
Evidence:

Unit testing, system integration testing, end-to-end validation, and operational flight simulations confirming that expected functional performance aligns with safety goals.
Validation of reused components (COTS, GOTS, OSS, MOTS) to ensure compatibility and reliable integration into human-rated environments.
Coverage analysis demonstrating:
- 100% Statement Coverage.
- 100% Decision Coverage.
- 100% Modified Condition/Decision Coverage (MC/DC) for safety-critical components.
Static analysis reports showing compliance with coding standards and identification/remediation of software defects.
Fault injection testing results validating responses to corrupted data, anomalies during power disruptions, and memory errors.
Worst-case response timing analysis confirming safing systems meet TTE requirements under degraded conditions.
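To show what the MC/DC item in the coverage evidence asks for, the sketch below enumerates unique-cause independence pairs for a small decision: pairs of test vectors that differ in exactly one condition and produce different outcomes. MC/DC requires at least one such pair to be exercised for every condition. The decision `abort_enabled` and the helper `mcdc_pairs` are hypothetical examples. Qualified coverage tools measure this on the actual flight code, so this is an illustration of the criterion, not a substitute for them.

```python
from itertools import product


def mcdc_pairs(decision, n_conditions):
    """Map each condition index to the vector pairs that show its independent effect.

    A pair qualifies when the two vectors differ only in that condition
    and the decision outcome changes (unique-cause MC/DC).
    """
    pairs = {i: [] for i in range(n_conditions)}
    for vec in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if vec[i]:
                continue  # visit each pair once, starting from the False side
            flipped = vec[:i] + (True,) + vec[i + 1:]
            if decision(*vec) != decision(*flipped):
                pairs[i].append((vec, flipped))
    return pairs


def abort_enabled(armed, overpressure, manual_request):
    """Hypothetical safety-critical decision with three conditions."""
    return armed and (overpressure or manual_request)


pairs = mcdc_pairs(abort_enabled, 3)
for index, found in pairs.items():
    print(index, found)
```

A condition with no qualifying pair cannot be shown to affect the outcome independently, which usually points to redundant logic or a decision that needs restructuring.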

5. Configuration Management and Change Tracking

Argument: Configuration management processes ensure version control and traceability for all software changes.
Evidence:
- Documentation showing version-controlled baselines for flight-ready software and data loads, including configuration hashes and release notes.
- Audit records verifying modifications, regression testing, impact analyses, and stakeholder approvals.
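The configuration hashes mentioned above can be produced and rechecked with standard tooling. The following is a minimal sketch, assuming the baseline is a directory of build products and data loads. The file names and the helpers `build_manifest` and `changed_files` are hypothetical, and SHA-256 is an assumed choice rather than a stated project requirement.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): digest_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline: dict[str, str], root: Path) -> list[str]:
    """List files that were added, removed, or modified since the baseline."""
    current = build_manifest(root)
    return sorted(
        name
        for name in baseline.keys() | current.keys()
        if baseline.get(name) != current.get(name)
    )


# Demonstration with throwaway files standing in for a flight baseline.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "fsw.bin").write_bytes(b"flight software image")
    (root / "params.bin").write_bytes(b"data load v1")
    baseline = build_manifest(root)
    (root / "params.bin").write_bytes(b"data load v2")  # simulated unapproved change
    changed = changed_files(baseline, root)

print(changed)
```

Storing the manifest alongside the release notes gives auditors a direct way to confirm that the tested configuration is the one proceeding to flight.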

6. Cybersecurity and Security Validation

Argument: The software architecture incorporates robust cybersecurity measures to mitigate threats in operational environments.
Evidence:
- Security validation reports demonstrating encryption protocols, authentication mechanisms, access control, and secure coding practices.
- Penetration testing results validating resilience against cyberattacks and unauthorized system access during pre-launch and flight.
- Vulnerability analysis reports confirming detection, resolution, and closure of security-related risks.

7. Defect Management and Residual Risks

Argument: All software defects have been resolved or mitigated to acceptable levels of residual risk.
Evidence:

Defect reports showing all open and closed defects categorized by severity and justifications for acceptance of residual risks.
Logs documenting defect resolutions and testing data validating the outcomes of mitigation measures.
Residual risk acceptance documentation signed off by stakeholders, with sufficient evidence showing safe system behavior despite unresolved minor risks.

8. Resource Utilization and Performance Metrics

Argument: The software demonstrates sufficient resource margins and acceptable performance under normal and worst-case conditions.
Evidence:

Validation test results confirming acceptable command execution timing (e.g., abort triggers).
Operating analysis showing CPU utilization below 80% even under maximum load conditions.
Methods for anomaly detection and recovery to safe states outlined and validated.

9. Team Training and Software Process Compliance

Argument: Development teams adhere to validated processes and are properly trained in safety-critical mission standards.
Evidence:
- Records of team training addressing human-rated software workflows, defect management, and compliance with coding guidelines.
- Process compliance reports documenting adherence to validated development processes.
- Operator manuals confirming that deliberate, independent actions are required to execute safety-critical commands.

10. Certification and Regulatory Compliance

Argument: The software complies with all applicable standards and safety regulations for human-rated missions.
Evidence:

Certification artifacts for compliance with standards like NASA NPR 7150.2D (SWEREF-083), NASA SSP 50038 (SWEREF-014), FAA requirements, and NASA-STD-8739.8B (SWEREF-278).
IV&V certification reports confirming operational maturity and compliance with safety standards by independent entities.
Regulatory compliance statements from authorities certifying readiness for human-rated missions.
Validation of software updates (patched or upgraded) ensuring continued compliance with safety requirements.

11. Flight Readiness Review (FRR) Certification

Argument: The software is flight-ready and capable of safely supporting mission operations.
Evidence:

Software Version Description Document (VDD) completion demonstrating proper documentation of the deployed software.
Final test results confirming readiness for flight operations in all mission environments.
FRR exit criteria signed off by stakeholders, certifying acceptance or resolution of all known risks, hazards, defects, and anomalies.

12. Flight Software Structural Quality

Argument: The software architecture and implementation are structurally sound and meet all quality standards for safety-critical applications.
Evidence:

Cyclomatic complexity analysis showing all safety-critical components meet thresholds (≤ 15).
Documentation verifying fault-tolerant mechanisms for error handling, failure recovery, and system operation under degraded conditions.
Maintainability analysis supporting modular coding practices for long-term sustainability and easy updates.
Code quality reports validating compliance with architecture, standards, security, and testability requirements.
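The cyclomatic complexity threshold above (≤ 15) can be illustrated with a simple counter: one plus the number of decision points in a function. The sketch below uses Python's standard `ast` module on Python source purely for illustration. Flight software is typically written in other languages and measured with dedicated analyzers, and the counting rules here are a simplified approximation. The function names and the `SAMPLE` source are hypothetical.

```python
import ast

# Node types counted as one decision point each (a simplified approximation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 plus the number of decision points found inside func.

    Nested functions are counted as part of their enclosing function.
    """
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra and/or operand adds a path
    return score


def complexity_by_function(source: str) -> dict[str, int]:
    """Compute the approximate complexity of every function in the source text."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def over_threshold(source: str, threshold: int = 15) -> dict[str, int]:
    """Return only the functions whose complexity exceeds the threshold."""
    return {name: c for name, c in complexity_by_function(source).items() if c > threshold}


SAMPLE = '''
def select_mode(a, b, c):
    if a and b:
        return 1
    for i in range(3):
        if c:
            return 2
    return 0
'''

scores = complexity_by_function(SAMPLE)
print(scores)
print(over_threshold(SAMPLE))
```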

Div

id	tabs-4

4. Resources

4.1 References

refstable-topic

Show If

group	confluence-users

Panel

titleColor	red
title	Instructions for Editors

Expand

Enter the necessary modifications to be made in the table below:

SWEREFs to be added	SWEREFS to be deleted
7150.2D - SWEREF-039 - C2PA AI ML Specification: Guidelines for trustworthy AI development	Deleted per Tim
EU GDPR: Data privacy regulations applicable to AI systems	Deleted per Tim
-083
SSP 50038 - SWEREF-014
8739.8B - SWEREF-278	IEC 62150: International standard for functional safety of electrical/electronic/programmable electronic safety-related systems	Deleted per Tim

SWEREFs called out in text: 014, 083, 278

SWEREFs NOT called out in text but listed as germane: none

4.2 Tools

Include Page

	Tools Table Statement

4.3 Additional Guidance

Additional guidance related to this requirement may be found in the following materials in this Handbook:

4.4 Center Process Asset Libraries

Excerpt Include

	SITE:SPAN
nopanel	true

See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).

SPAN Links

Show If

label	activity

4.5 Related Activities

This Topic is related to the following Life Cycle Activities:
