...

Set Data

hidden	true
name	reftab

Tabsetup

0	1. Introduction
1	2. Generative AI Metrics
2	3. Resources

Div

id	tabs-1

1. Introduction

Excerpt
As NASA organizations use Artificial Intelligence (AI) to make decisions, the need for robust Software Assurance (SA) becomes increasingly important.

AI has emerged as a project option, changing the way we interact with and use data. At this time, we recommend limiting the use of AI to non-safety-critical applications.

Software Assurance plays a crucial role in the AI development lifecycle. It involves a systematic process of monitoring, assessing, and improving the software development and AI implementation processes. In the context of AI, SA aims to:

evaluate the use of AI in software development activities,
validate the accuracy of algorithms,
ensure model robustness, and
ensure that the software meets the highest standards of performance.

AI introduces unique challenges to traditional SA methodologies. Unlike conventional software, many AI systems continue to learn and adapt based on new data. This dynamic nature makes it difficult to establish fixed criteria for testing and validation. SA for AI must evolve alongside the models it scrutinizes, which calls for a more iterative and adaptive approach.

1.1 Data Quality

The quality of the data used to train and test AI models directly influences their performance and reliability. Garbage in, garbage out (GIGO) holds in the world of AI, emphasizing the critical importance of high-quality data. SA should also ensure the security and configuration control aspects of the data used to train and test AI. Data quality encompasses several dimensions, including accuracy, completeness, consistency, timeliness, and relevance. In the context of AI, accuracy is particularly vital, as even small inaccuracies in the training data can lead to significant errors in predictions. Ensuring that data is representative and unbiased is also crucial to avoid reinforcing existing biases within AI models.

There are a number of key considerations for software assurance on AI models:

1.1.1 Validating The Training Data

SA needs to review (evaluate and approve) how the project and engineering are testing the data used to train the AI model to ensure it is accurate, representative, and unbiased. This helps identify and mitigate potential issues with the model's performance and predictions. Effective data preprocessing and cleaning are foundational steps in ensuring data quality for AI. This involves identifying and addressing missing values, handling outliers, and normalizing data to create a standardized and reliable dataset. SA processes must include rigorous checks at these stages to guarantee that the data fed into AI models is of the highest quality. SA should analyze and think through all possible scenarios to make sure that the AI actions are correct.
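The preprocessing checks described above (missing values, outliers, normalization) can be sketched as a small report that SA could ask the project to produce for each numeric input column. This is a minimal illustration, not a project procedure: the function name `validate_column`, the z-score rule, and the `z_limit` default are assumptions chosen for the example.

```python
import math
import statistics

def validate_column(values, z_limit=3.0):
    """Report missing values and outliers for one numeric column,
    and return a min-max normalized copy of the values that are present.

    z_limit is a hypothetical threshold; a project would define its own.
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(present)
    if not present:
        return {"missing": missing, "outliers": [], "normalized": []}

    mean = statistics.fmean(present)
    stdev = statistics.pstdev(present)
    # Outliers are reported for review, not silently removed.
    outliers = [v for v in present
                if stdev > 0 and abs(v - mean) / stdev > z_limit]

    # Min-max normalization to [0, 1]; a constant column maps to 0.0.
    lo, hi = min(present), max(present)
    span = hi - lo
    normalized = [(v - lo) / span if span else 0.0 for v in present]
    return {"missing": missing, "outliers": outliers, "normalized": normalized}

report = validate_column([1.0, 2.0, None, 3.0, 2.0, 100.0], z_limit=1.5)
```

In this example the report counts one missing value and flags `100.0` for review. Whether a flagged value is an error or a legitimate rare case is a judgment the project still has to make and document.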

1.1.2 Continuous Testing

Unlike traditional software, AI models that are generative or continually learning may require ongoing testing after deployment, because the model continues to learn and evolve with new real-world data. SA needs to ensure that the software engineering process includes repeatedly testing the model's behavior and performance over time and evaluating the results. Testing AI models is a multifaceted process: it involves validating the accuracy of predictions, assessing how well the model generalizes to new data, and evaluating performance under diverse conditions. Rigorous testing not only ensures the reliability of AI applications but also helps build trust among end users. SA processes should also include mechanisms for monitoring AI applications in development, in testing, and in production, with feedback loops so that issues are identified and addressed promptly and the AI system remains accurate and effective as it encounters new data.
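One possible form for such a monitoring mechanism is a rolling accuracy check against an agreed baseline. The sketch below is illustrative only: the class name `AccuracyMonitor`, the window size, and the tolerance are assumptions, and a real project would choose metrics suited to its model.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation.

    baseline and tolerance are hypothetical, project-defined values.
    """

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # keeps only the newest results

    def record(self, prediction, actual):
        self.results.append(prediction == actual)

    def accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def degraded(self):
        acc = self.accuracy()
        return acc is not None and acc < self.baseline - self.tolerance

monitor = AccuracyMonitor(baseline=0.9, tolerance=0.05, window=4)
for prediction, actual in [(1, 1), (0, 0), (1, 0), (1, 1)]:
    monitor.record(prediction, actual)
needs_review = monitor.degraded()  # accuracy 0.75 is below 0.85
```

A flag from a check like this is a trigger for investigation, not a verdict; the project still needs to determine whether the model, the data, or the labels changed.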

1.1.3 Documentation And Traceability

Comprehensive documentation and traceability are essential aspects of AI applications. Documenting the entire development and testing process, along with the data used, facilitates transparency, allowing for effective debugging and auditing. In the event of issues or unexpected outcomes, traceability enables developers and software assurance teams to identify and rectify problems efficiently. SA needs to ensure that sufficient documentation and traceability exist for the AI application and associated data. SA should also ensure that, if the model ever needs to be recreated, all of the inputs are documented so that they can be fed into the AI tool again and produce the same, correct results.
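One way to document the inputs needed to recreate a model is a manifest that records a cryptographic hash of each input alongside the training settings. The sketch below assumes the inputs are available as bytes; the function names, field names, and the sample artifact are hypothetical.

```python
import hashlib
import json

def build_manifest(model_name, artifacts, hyperparameters, seed):
    """Record what is needed to recreate a model.

    artifacts maps a (hypothetical) artifact name to its raw bytes.
    """
    return {
        "model": model_name,
        "seed": seed,
        "hyperparameters": hyperparameters,
        "artifacts": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())
        },
    }

def changed_artifacts(manifest, artifacts):
    """Return names whose current content no longer matches the manifest."""
    recorded = manifest["artifacts"]
    return sorted(
        name for name, digest in recorded.items()
        if name not in artifacts
        or hashlib.sha256(artifacts[name]).hexdigest() != digest
    )

inputs = {"train.csv": b"a,b\n1,2\n"}
manifest = build_manifest("demo-model", inputs, {"learning_rate": 0.001}, seed=42)
manifest_text = json.dumps(manifest, sort_keys=True, indent=2)  # store under CM
```

Placing the serialized manifest under configuration control lets an auditor confirm later that a retraining run used the same data and settings as the original.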

1.1.4 Leveraging AI/ML For SA

Automated tools and machine learning models can be used to analyze large codebases, identify patterns, and predict potential issues more efficiently than manual review. SA needs to analyze the engineering data or perform independent static code analysis to check for code defects and to assess software quality objectives, code coverage objectives, software complexity values, and software security objectives. SA needs to confirm that the static analysis tool(s) are used with checkers that identify security and coding errors and defects. SA also needs to confirm that the project assesses the results from the static analysis tools used by software assurance, software safety, engineering, or the project, and addresses the errors and defects found. Finally, SA should confirm that Software Quality Objectives or software quality threshold levels are defined and set for static code analysis defects and for software security objectives.

1.1.5 Defining Quality Criteria

It's important to establish clear quality criteria and variables to assess the AI model's performance, quality, risk, security, maintainability, and compliance with requirements. The SA process should confirm that these criteria are defined and aim to continuously improve the code quality towards all of the criteria.

1.1.6 Balancing Quality Goals

While the goal is to maximize code quality, projects must also consider factors like features, costs, and schedule. The SA process should help identify the critical vulnerabilities to address based on objectives, risks, and schedules.

1.1.7 Ensuring Data Security And Privacy

Protecting sensitive data used to train and operate the AI model is important. Appropriate access controls and security measures must be in place. SA should confirm that the project has proper security controls and configuration management in place.

1.1.8 Addressing Bias Considerations

One of the most significant challenges in AI software assurance is addressing bias. Biased training data can lead to incorrect outcomes. Software assurance should look at how the project identifies and mitigates bias in the data and the algorithms. SA should also confirm that ongoing monitoring and adjustment are used to maintain objectivity in AI applications.

1.1.9 Requirements Verification And Validation

AI applications introduce the unique challenge of nondeterministic behavior: the results of verifying and validating an AI system may not be predictably repeatable. SA should work to develop verification and validation criteria that address the probabilistic nature of AI systems, establishing qualitative measures for test acceptance and for pass and fail criteria.
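One possible way to make a pass/fail criterion workable for probabilistic behavior is to repeat a test many times and require that a statistical lower bound on the success rate meets the required rate. The sketch below uses the Wilson score interval, which is a choice made for this example and not a method prescribed by this topic; the function names and the required rates are assumptions.

```python
import math

def wilson_lower_bound(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval for a success rate.

    z=1.96 corresponds to roughly 95% two-sided confidence.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom

def accept(successes, trials, required_rate, z=1.96):
    """Pass only if the lower bound meets the required success rate."""
    return wilson_lower_bound(successes, trials, z) >= required_rate

# 95 passes in 100 runs does not demonstrate a 90% success rate
# with this level of confidence, even though the observed rate is 95%.
meets_90 = accept(95, 100, required_rate=0.90)
meets_85 = accept(95, 100, required_rate=0.85)
```

The design point is that the number of repetitions becomes part of the acceptance criterion: a higher required rate or a higher confidence level needs more test runs.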

1.2 Approach

In summary, software assurance for AI models requires a comprehensive approach that focuses on validating training data, continuous testing, leveraging AI/ML tools, defining quality benchmarks, balancing project goals and risks, ensuring data security and privacy, and addressing ethical considerations. This helps ensure the AI system is accurate, robust, secure, and compliant.

In the era of artificial intelligence, software quality assurance emerges as a linchpin for ensuring the reliability, accuracy, and ethicality of AI applications. By focusing on data quality as a foundational element, SA plays a pivotal role in mitigating challenges such as bias, ensuring transparency, and building trust in AI systems. As technology continues to advance, a proactive and adaptive approach to SA will be crucial in unlocking the full potential of AI while safeguarding against unintended consequences. Through continuous improvement, collaboration, and a commitment to ethical standards, the marriage of SA and AI promises a future where intelligent systems enhance our lives responsibly and reliably.

1.3 Additional Guidance

Links to Additional Guidance materials for this subject have been compiled in the Relevant Links table. Click here to see the

Tablink2

tab	3
linktext	Additional Guidance

in the Resources tab.

Div

id	tabs-2

2. Generative AI Metrics

Defining metrics for the complexity of generative AI models requires tailoring your evaluation to the unique characteristics of these systems—such as their architecture, size, computational requirements, capabilities, and usability. Unlike traditional software complexity metrics (e.g., cyclomatic complexity), generative AI complexity is often evaluated through model-specific engineering, mathematical, and operational characteristics. Below is a comprehensive guide to key metrics that you can use:

2.1 Architectural Complexity

Metrics:

Number of Layers: The depth or number of layers in the neural network architecture (e.g., 12 layers for the smallest GPT-2 model, 96 layers for the largest GPT-3 model).
Parameters: Total number of trainable parameters (e.g., billions for most large language models).
Attention Heads: In transformer models, attention heads drive complexity. Evaluate the number of heads per layer and their interactions.
Non-Linearity: Measure the types and number of activation functions (e.g., ReLU, GELU).

Why Important:

These metrics indicate the model's capacity to learn complex patterns. Larger and deeper architectures typically have higher expressivity but come with increased computational cost.
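The relationship between layer count, hidden size, and parameter count can be approximated with a common rule of thumb for decoder-only transformers: about 12 × layers × d_model² for the transformer blocks, plus the embedding tables. The sketch below is an estimate under that assumption (it ignores biases and layer-norm weights), and the function name is hypothetical.

```python
def estimate_transformer_params(n_layers, d_model, vocab_size, context_len=0):
    """Rough parameter count for a decoder-only transformer.

    Per block: about 4*d_model^2 for attention (query, key, value, output)
    plus about 8*d_model^2 for a feed-forward layer four times as wide.
    Biases and layer-norm parameters are ignored.
    """
    block_params = 12 * n_layers * d_model ** 2
    embedding_params = (vocab_size + context_len) * d_model
    return block_params + embedding_params

# Published configuration of the smallest GPT-2 model:
# 12 layers, hidden size 768, vocabulary 50257, context length 1024.
gpt2_small = estimate_transformer_params(12, 768, 50257, 1024)
# gpt2_small is 124,318,464, close to the roughly 124 million usually quoted.
```

An estimate like this is useful for sanity-checking a vendor's stated model size; it is not a substitute for counting the parameters of the actual model.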

2.2 Computational Complexity

Metrics:

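Floating-Point Operations (FLOPs): The total number of floating-point computations required to perform training or inference. This is a count of operations and is distinct from FLOPS (floating-point operations per second), which measures hardware throughput.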
Memory Requirements: GPU or RAM usage during training and inference—especially significant for deployment on constrained systems.
Inference Time: Latency in generating outputs. For a given hardware configuration, lower latency generally indicates a less computationally demanding, more efficient model.
Power Consumption: Energy required for training and inference, relevant for sustainable AI practices.

Why Important:

These metrics determine the model's scalability and operational costs for deployment and training. For example, models with high FLOPs and memory requirements are often harder to scale.
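For a first-order feel for these quantities, widely used approximations for dense transformer models put training compute at about 6 floating-point operations per parameter per training token and inference at about 2 per parameter per generated token. The sketch below applies those rules of thumb; they are approximations, not measurements, and the function names and the example model size are assumptions. FLOPs here means a count of operations, not a per-second rate.

```python
def training_flops(n_params, n_tokens):
    """Approximate total training compute: about 6 operations per parameter per token."""
    return 6 * n_params * n_tokens

def inference_flops_per_token(n_params):
    """Approximate forward-pass compute per generated token: about 2 per parameter."""
    return 2 * n_params

def weight_memory_gib(n_params, bytes_per_param=2):
    """Memory for the weights alone (2 bytes per parameter for 16-bit formats).

    Activations, optimizer state, and caches are not included.
    """
    return n_params * bytes_per_param / 2**30

# Hypothetical model: 1 billion parameters trained on 20 billion tokens.
total_training = training_flops(1_000_000_000, 20_000_000_000)  # 1.2e20
per_token = inference_flops_per_token(1_000_000_000)            # 2e9
weights_gib = weight_memory_gib(1_000_000_000)                   # about 1.86 GiB
```

Measured values from profiling tools should replace these estimates wherever they are available.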

2.3 Model Representational Complexity

Metrics:

Expressive Power: The ability of the model to learn and represent complex functions or dynamics.
Entropy of Outputs: Capturing the diversity and unpredictability of model outputs during inference.
Embedding Space Size: The dimensionality of the embeddings used internally (e.g., 768 for the smallest GPT-2 model, 12288 for the largest GPT-3 model).

Why Important:

These metrics highlight how effectively the model can generalize across diverse tasks and inputs while maintaining rich representations.
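Of these metrics, entropy of outputs is the most direct to compute. A minimal sketch using Shannon entropy over a next-token probability distribution follows; the function name is an assumption, and the distributions are made up for illustration.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # 2.0 bits: maximally uncertain
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # 0.0 bits: fully predictable
```

Higher entropy means more diverse, less predictable outputs, which is relevant when defining how repeatable a test result is expected to be.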

2.4 Training Complexity

Metrics:

Dataset Size: The volume of training data required (e.g., tokens or examples in billions for large models).
Training Iterations: Number of epochs or updates needed to achieve convergence.
Learning Rate Dynamics: The adaptation of learning rates during training, which impacts convergence speed.
Optimization Complexity: Evaluate the type of optimizer used (e.g., Adam vs. AdaFactor) and its configuration.

Why Important:

High training complexity can imply longer development times and greater hardware requirements to train a model properly.

2.5 Fine-Tuning and Adaptation Complexity

Metrics:

Number of Parameters Adapted: How much of the model can or must be fine-tuned for specific tasks (e.g., fine-tuning full models vs. adapter layers in PEFT [Parameter-Efficient Fine-Tuning]).
Data Requirements for Fine-Tuning: The amount of task-specific data required to adapt the model.
Domain Generalization: The model’s ability to generalize across new domains without full retraining.

Why Important:

Assessing fine-tuning complexity helps determine the model’s usability for downstream applications.
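As an illustration of the "number of parameters adapted" metric, the sketch below uses low-rank adaptation (LoRA), one common PEFT method chosen here as an example. LoRA leaves a weight matrix of shape d_out × d_in frozen and trains two small matrices of shapes d_out × r and r × d_in. The function names and the dimensions are assumptions.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one weight matrix."""
    return rank * (d_in + d_out)

def adapted_fraction(d_in, d_out, rank):
    """Trainable LoRA parameters relative to the frozen full matrix."""
    return lora_trainable_params(d_in, d_out, rank) / (d_in * d_out)

# Hypothetical 768 x 768 projection matrix adapted with rank 8.
added = lora_trainable_params(768, 768, 8)   # 12,288 trainable parameters
fraction = adapted_fraction(768, 768, 8)     # 1/48, about 2% of the full matrix
```

A small adapted fraction reduces the data and compute needed for fine-tuning, but the adapted model still needs its own verification and validation.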

2.6 Output Complexity

Metrics:

Sequence Length: The maximum number of tokens or characters the model can process or generate in a single inference step.
Coherence Score: How logically connected the outputs are over long sequences (subjective or algorithmic measures).
Temperature and Diversity: Configurations used during inference and their influence on creativity or randomness of generative outputs.

Why Important:

Output complexity impacts the quality and usability of generative and conversational results, especially for tasks requiring coherence, relevance, or creativity.
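The effect of the temperature setting can be shown with a small sketch: the model's raw scores (logits) are divided by the temperature before being converted to probabilities. The function name and the example logits are assumptions.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, temperature=0.5)  # sharper distribution
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter distribution
```

Lower temperatures concentrate probability on the top-scoring token and make outputs more repeatable; higher temperatures spread probability and increase diversity. Recording the temperature used is therefore part of documenting a test configuration.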

2.7 Interpretability Complexity

Metrics:

Explainability: How easy it is to understand the internal workings of the model (e.g., decision-making pathways or attention distributions).
Saliency Maps: Highlights in the input that influence the outputs, which are useful for interpretability tools.
Layer Contribution Analysis: Understanding which layers contribute most to model performance.
Bias and Fairness Audits: The complexity of detecting and mitigating bias in the model outputs.

Why Important:

Interpretability metrics are crucial for ethical AI deployment and trust-building in sensitive applications.

2.8 Real-World Deployment Complexity

Metrics:

Scalability: How easy it is to scale up or down the model architecture for different hardware configurations.
Latency: The time taken for the model to respond or process input in real-world usage scenarios.
API Complexity: The ease or difficulty of integrating the model into applications (e.g., REST APIs vs. custom libraries).
Security and Robustness: Complexity of ensuring the model is robust to adversarial attacks or misuse.

Why Important:

Deployment complexity plays a significant role in practical utility, customer satisfaction, and security of generative AI solutions.

2.9 Best Practices for Defining Metrics

Task-Specific Design: Tailor metrics to your specific use case, whether it's text generation, image generation, or conversational AI.
Benchmarking: Use standard benchmarks such as GLUE and SuperGLUE, automatic metrics such as BLEU and ROUGE, or human evaluation to assess performance alongside complexity.
Holistic View: Combine several complexity metrics for a more complete picture (architectural, computational, and deployment complexity).
Comparative Analysis: Compare your model against others (e.g., GPT, BERT, DALL-E) to contextualize complexity scores.

2.10 Tools and Frameworks for Complexity Evaluation

Example: You can use tools like these for computation-heavy components:

Weights & Biases (W&B): For tracking FLOPs, memory use, and other training metrics.
Hugging Face Benchmarking Tools: For evaluating inference performance.
Explainability Libraries: Captum, SHAP, or LIME for interpretability complexity.
Energy Usage Estimators: Like CodeCarbon, to assess power consumption.

By defining and measuring these complexity metrics, you can assess generative AI models more effectively, ensure performance optimization, and improve deployment decisions.

This checklist provides comprehensive data and evidence required to certify software for human-rated missions.

It ensures compliance with applicable safety standards, regulatory requirements (NASA NPR 7150.2D, SSP 50038, FAA, NASA-STD-8739.8B), mission-critical functionality, and stakeholder acceptance of residual risks, demonstrating that the software is safe, reliable, and mission-ready for crewed spaceflight operations.

PAT for Comprehensive Checklist for Software Certification in Human-Rated Missions

Div

id	tabs-2

2. Key Compliance Data Needs

2.1 Summary Table of Key Compliance Data Needs

Category	Key Data/Documentation
Requirements	System/Software Requirements Traceability, Hazard Control Requirements
Design	Software Architecture, Fault Containment, Safeguard Integration
Development	Development Plans, Fault Tolerance Implementation Logs
Verification & Validation	Test Results: Initialization, Recovery, Redundancy, IV&V Reports
Hazard Analysis	Hazard Reports, Operator Action Validation, Safing Procedures
Configuration Management	Baseline Documentation, Change Logs
Operational Procedures	Control Sequences, OCAD Validation, Manual Safing Data

2.2 Key Compliance Data Needs

Software Requirements
1. High-level system/software requirements
2. Detailed software requirements (or the equivalent used by the developer)
3. All known software safety constraints
4. Software bi-directional traceability data
5. Specifications for internal and external software interfaces definition and testing
6. Encryption protocols, authentication mechanisms, secure coding practices, and access control procedures.
Software Design
1. Description of the software design
2. Hardware design data on safety-critical subsystems
3. Data Dictionary: input/output data formats, telemetry parameters, and command sequences.
Software Development
1. All software analyses results
2. Completed Time-to-effect (TTE) analysis
3. Completed Fault Tree Analyses
4. Completed Failure Mode and Effects Analysis
5. Software process audit results
6. Developer software process training records
Software Verification and Validation (software testing)
1. Software test data
2. Safety-critical requirements test results
3. Fault injection test results
4. End-to-end integration testing results
5. Penetration testing results (resilience testing and telemetry plans against unauthorized system access and cyberattacks)
6. Test results and data showing command execution timing within acceptable limits
7. Test results and data confirming adequate system resource margins
8. Detailed description of the software test environments
9. Software interface (internal and external) test results
10. Code test coverage data
11. Software static analysis results reports
12. Number and types of static analysis tools used
13. Results of a Security Vulnerability Analysis: detected and resolved vulnerabilities in the software's security framework
14. All Independent Verification and Validation (IV&V) assessment results
15. Data showing that the safety-critical software components meet complexity thresholds
16. Evidence that the code structural quality has low risks
Hazards
1. Hazards and mitigation controls that include software
2. List of any unresolved hazards
Configuration Management (CM)
1. Processes used for version control, change tracking, and baseline management
2. Identification of flight-ready software configurations
Flight readiness and Operations
1. Clear understanding of the operational environment for the mission.
2. Operational procedures for updating the software and data
3. Any software-related threats that the operational environment poses to software operation
4. List of and access to all open software defects
5. List of and access to all open and closed high-risk software defects.
6. Stakeholder-approved sign-off on any unavoidable operational software-related risks.
7. Evidence of adherence to validated development processes, coding guidelines, and testing protocols.
8. Deliverables required for regulatory certification
9. Software Version Description Document (VDD)
10. FRR Exit Criteria Sign-Off for software
11. Crew software user guides, operational procedures, and troubleshooting documentation.
12. Documentation showing mechanisms to handle errors, recover failures, and preserve system operation under degraded conditions.

Div

id	tabs-3

3. Safety Case for Human-Rated Software Certification

This safety case demonstrates that the software used in this human-rated mission adheres to rigorous safety, quality, and regulatory standards. Based on the evidence provided, the software is flight-ready and capable of supporting critical mission operations while ensuring the safety of the crew and spacecraft under both nominal and adverse conditions.

1. Requirements and Traceability

Argument: The software requirements are clearly defined, traceable, and aligned with safety-critical mission needs.
Evidence:
- Comprehensive Software Requirements Specification (SRS) covering high-level mission-critical systems (e.g., navigation, propulsion, anomaly detection, life support, and abort operations).
- Verified safety requirements (fault tolerance, redundancy, and safe initialization/termination).
- Acceptable quality of detailed low-level safety-critical requirements, including specifics like algorithm designs and timing constraints.
- A completed and validated Requirements Traceability Matrix (RTM) showing bi-directional traceability from requirements through design, code, and test results.
- Reviewed system-level safety analyses to document "Must Work" (MWF) and "Must Not Work" (MNWF) requirements, prerequisite checks for hazardous commands, and mitigation strategies.
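A bi-directional traceability check of the kind a completed RTM supports can be sketched as follows. The requirement and test identifiers are hypothetical, and the function name and data layout are assumptions made for the example; real projects would export this data from their requirements tool.

```python
def trace_gaps(requirement_to_tests, test_to_requirements):
    """Find gaps in bi-directional traceability between requirements and tests."""
    untested = sorted(r for r, tests in requirement_to_tests.items() if not tests)
    orphan_tests = sorted(t for t, reqs in test_to_requirements.items() if not reqs)
    # Forward links (requirement -> test) with no matching backward link.
    one_way = sorted(
        (r, t)
        for r, tests in requirement_to_tests.items()
        for t in tests
        if r not in test_to_requirements.get(t, ())
    )
    return {"untested": untested, "orphan_tests": orphan_tests, "one_way": one_way}

# Hypothetical identifiers for illustration only.
forward = {"REQ-1": ["TEST-1"], "REQ-2": [], "REQ-3": ["TEST-2"]}
backward = {"TEST-1": ["REQ-1"], "TEST-2": [], "TEST-3": []}
gaps = trace_gaps(forward, backward)
```

In this example `REQ-2` has no test, `TEST-2` and `TEST-3` trace to no requirement, and the link from `REQ-3` to `TEST-2` exists in only one direction. Each gap would need to be resolved or justified before the RTM can be considered complete.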

2. Software Design and Architecture

Argument: The software architecture is resilient, modular, and designed for fault tolerance and safety-critical operations.
Evidence:

Architecture documentation detailing modular fault isolation, redundancy, and resiliency mechanisms.
Block diagrams illustrating fault containment, fail-safe control paths, and separation of critical functions.
Documentation and analysis of safety-critical subsystems (e.g., propulsion, crew displays, navigation) with clearly defined responsibilities.
Verified Interface Control Documents (ICDs), ensuring compatibility between internal software, hardware systems, and external interactions.
Safety validation evidence for safeguards like fault containment, error detection, operator validation, integrity checks, and anomaly recovery processes.
Independent redundant system designs ensuring physical and logical separation to mitigate single points of failure.
Validation of fault-tolerant mechanisms, including cosmic radiation protection in CPU designs.

3. Hazard Analysis and Safety Evidence

Argument: All hazards associated with software functionality are identified, analyzed, and mitigated to acceptable levels of risk.
Evidence:

A complete Hazard Analysis Report (HAR) identifying software-driven hazards and the mitigation strategies in place.
Fault Tree Analysis (FTA) and Failure Mode and Effects Analysis (FMEA) showing robust fault prevention and recovery mechanisms.
Time-to-effect (TTE) analyses ensuring hazardous conditions can be addressed by safing systems within operational thresholds.
Residual risk documentation showing resolution or acceptance of remaining risks by stakeholders.

4. Verification and Validation (V&V) Evidence

Argument: Rigorous testing, validation, and coverage analyses demonstrate software compliance with safety-critical requirements.
Evidence:

Unit testing, system integration testing, end-to-end validation, and operational flight simulations confirming that expected functional performance aligns with safety goals.
Validation of reused components (COTS, GOTS, OSS, MOTS) to ensure compatibility and reliable integration into human-rated environments.
Coverage analysis demonstrating:

100% Statement Coverage.
100% Decision Coverage.
100% Modified Condition/Decision Coverage (MC/DC) for safety-critical components.

Static analysis reports showing compliance with coding standards and identification/remediation of software defects.
Fault injection testing results validating responses to corrupted data, anomalies during power disruptions, and memory errors.
Worst-case response timing analysis confirming safing systems meet TTE requirements under degraded conditions.

5. Configuration Management and Change Tracking

Argument: Configuration management processes ensure version control and traceability for all software changes.
Evidence:
- Documentation showing version-controlled baselines for flight-ready software, including configuration hashes and release notes.
- Audit records verifying modifications, regression testing, impact analyses, and stakeholder approvals.

6. Cybersecurity and Security Validation

Argument: The software architecture incorporates robust cybersecurity measures to mitigate threats in operation environments.
Evidence:
- Security validation reports demonstrating encryption protocols, authentication mechanisms, access control, and secure coding practices.
- Penetration testing results validating resilience against cyberattacks and unauthorized system access during pre-launch and flight.
- Vulnerability analysis reports confirming detection, resolution, and closure of security-related risks.

7. Defect Management and Residual Risks

Argument: All software defects have been resolved or mitigated to acceptable levels of residual risk.
Evidence:

Defect reports showing all open and closed defects categorized by severity and justifications for acceptance of residual risks.
Logs documenting defect resolutions and testing data validating the outcomes of mitigation measures.
Residual risk acceptance documentation signed off by stakeholders, with sufficient evidence showing safe system behavior despite unresolved minor risks.

8. Resource Utilization and Performance Metrics

Argument: The software demonstrates sufficient resource margins and acceptable performance under normal and worst-case conditions.
Evidence:

Validation test results confirming acceptable command execution timing (e.g., abort triggers).
Operating analysis showing CPU utilization below 80% even under maximum load conditions.
Methods for anomaly detection and recovery to safe states outlined and validated.

9. Team Training and Software Process Compliance

Argument: Development teams adhere to validated processes and are properly trained in safety-critical mission standards.
Evidence:
- Records of team training addressing human-rated software workflows, defect management, and compliance with coding guidelines.
- Process compliance reports documenting adherence to validated development processes.
- Operator manuals ensuring that deliberate, independent actions are necessary to execute critical safety commands.

10. Certification and Regulatory Compliance

Argument: The software complies with all applicable standards and safety regulations for human-rated missions.
Evidence:

Certification artifacts for compliance with standards like NASA NPR 7150.2D, NASA SSP 50038, FAA requirements, and NASA-STD-8739.8B.
IV&V certification reports confirming operational maturity and compliance with safety standards by independent entities.
Regulatory compliance statements from authorities certifying readiness for human-rated missions.
Validation of software updates (patched or upgraded) ensuring continued compliance with safety requirements.

11. Flight Readiness Review (FRR) Certification

Argument: The software is flight-ready and capable of safely supporting mission operations.
Evidence:

Software Version Description Document (VDD) completion demonstrating proper documentation of the deployed software.
Final test results confirming readiness during flight operations in all mission environments.
FRR exit criteria signed off by stakeholders, certifying acceptance or resolution of all known risks, hazards, defects, and anomalies.

12. Flight Software Structural Quality

Argument: The software architecture and implementation are structurally sound and meet all quality standards for safety-critical applications.
Evidence:

Cyclomatic complexity analysis showing all safety-critical components meet thresholds (≤ 15).
Documentation verifying fault-tolerant mechanisms for error handling, failure recovery, and system operation under degraded conditions.
Maintainability analysis supporting modular coding practices for long-term sustainability and easy updates.
Code quality reports validating compliance with architecture, standards, security, and testability requirements.
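To illustrate how a cyclomatic complexity threshold such as the one above can be checked automatically, the sketch below counts decision points in Python source using the standard `ast` module. It is a simplified approximation for illustration only: flight software is typically written in other languages and checked with qualified tools, the set of counted constructs is an assumption, and nested functions are counted as part of their enclosing function.

```python
import ast

# Constructs counted as one decision point each (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(func_node):
    """Approximate McCabe complexity: 1 + number of decision points."""
    count = 1
    for node in ast.walk(func_node):
        if isinstance(node, DECISION_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            count += len(node.values) - 1
    return count

def functions_over_limit(source, limit=15):
    """Return {function name: complexity} for functions above the limit."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and cyclomatic_complexity(node) > limit
    }

SAMPLE = '''
def safe_mode(a, b):
    if a and b:
        return 1
    for i in range(3):
        if i == a:
            return i
    return 0
'''
flagged = functions_over_limit(SAMPLE, limit=4)  # safe_mode scores 5
```

With the limit of 15 quoted in the text, the sample function would pass; the lower limit in the usage line is only there to show a flagged result.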

Div

id	tabs-4

4. Resources

4.1 References

refstable-topic

4.2 Tools

Include Page

	Tools Table Statement
	Tools Table Statement

4.3 Additional Guidance

Additional guidance related to this requirement may be found in the following materials in this Handbook:

4.4 Center Process Asset Libraries

Excerpt Include

	SITE:SPAN
	SITE:SPAN
nopanel	true

See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).

SPAN Links

Show If

label	activity

4.5 Related Activities

This Topic is related to the following Life Cycle Activities:
