2.2.11 When a project is determined to have safety-critical software, the project shall ensure that the safety requirements of NASA-STD-8719.13, NASA Software Safety Standard, are implemented by the project.
Engineering and software assurance initially determine software safety criticality in the formulation phase per NASA-STD-8739.8, Software Assurance Standard; the results are compared and any differences are resolved. As the software is developed or changed and the software components, software models, and software simulations are identified, the safety-critical software determination can be reassessed and applied at lower levels. Further scoping and tailoring of the safety effort is found in NASA-STD-8719.13, Software Safety Standard and NASA-GB-8719.13, NASA Software Safety Guidebook.
1.2 Applicability Across Classes
In our modern world, software controls much of the hardware (equipment, electronics, and instruments). Sometimes hardware failure can lead to a loss of human life. When software controls, operates, or interacts with such hardware, software safety becomes a vital concern. Per NASA-STD-8719.13, "The task of developing safe software falls squarely on the shoulders of the software developer (also referred to as the software engineer), who creates the 'code' that must be safe."
It is important to have a systematic, planned approach for ensuring that safety is designed into developed or acquired software, and that safety is maintained throughout the software and system life cycle. NASA-STD-8719.13 "specifies the software safety activities, data, and documentation necessary for the acquisition or development of software in a safety-critical system. Safety-critical systems that include software must be evaluated for software's contribution to the safety of the system during the concept phase, and prior to the start, or in the early phases, of the acquisition or planning for the given software."
Topic 7.16 - Traceability of 7150.2 to other NPRs, NASA-STDs in this Handbook includes a table, "Mapping Spreadsheet 2," that shows the relationship between the software engineering practices described in the NPR 7150.2 requirements and the software safety requirements documented in NASA-STD-8719.13. Understanding these relationships can help project personnel see requests regarding software safety analysis as reasonable and know what to expect when software safety criticality assessments are conducted or repeated as the project evolves. Note that the requirements within NASA-STD-8719.13 must be met; how they are met is negotiable based on project criticality, size, complexity, etc.
When implementing the requirements of NASA-STD-8719.13, follow the basic steps, with guidance provided in NASA-GB-8719.13, which is summarized here:
- Identify safety-critical software (see SWE-133).
- Document identification efforts and results.
- If no safety-critical software is found, stop.
- Determine the software safety criticality (see SWE-133).
- Determine the safety effort and oversight required.
- Effort is based on the Software Risk Index defined in NASA-GB-8719.13.
- Revisit the effort as requirements change, because requirements changes can, for example, move safety-critical functions from hardware to software.
- Oversight effort is based on the System Risk Index defined in NASA-GB-8719.13.
- Tailor the software safety effort for the project.
- Safety activities need to be sufficient to match the software development effort and yet ensure that the overall system will be safe.
- Minimum effort includes, but is not limited to, project review and top-level analyses of all pertinent specifications, designs, implementations, tests, engineering change requests, and problem/failure reports to determine whether any hazards have been inadvertently introduced. Even the minimum effort must address all software hazard controls, contributions, and mitigations flowed down from the systems hazard analyses by meeting the requirements of NASA-GB-8719.13.
- Software assurance and Independent Verification and Validation (IV&V) also have tailorable efforts for software safety.
- The range of selected activities needs to be negotiated and approved by project management, software development, software quality assurance, and software systems safety personnel together.
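The first steps above (identify, document, stop if nothing is found, then determine effort) can be sketched as a small routine. This is only an illustration: `ScopingRecord`, `scope_safety_effort`, the `is_safety_critical` predicate, and the `effort_for` callable are hypothetical names, and `effort_for` merely stands in for the Software Risk Index lookup defined in NASA-GB-8719.13, whose actual values are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class ScopingRecord:
    """Documented result of one pass through the identification steps."""
    safety_critical: list
    effort: Optional[str] = None  # stays None when nothing safety-critical is found
    notes: list = field(default_factory=list)


def scope_safety_effort(
    components: Sequence[str],
    is_safety_critical: Callable[[str], bool],
    effort_for: Callable[[Sequence[str]], str],
) -> ScopingRecord:
    # Identify safety-critical software and document the result.
    critical = [c for c in components if is_safety_critical(c)]
    record = ScopingRecord(safety_critical=critical)
    record.notes.append(
        f"Assessed {len(components)} component(s); {len(critical)} safety-critical."
    )
    # If no safety-critical software is found, stop.
    if not critical:
        record.notes.append("No safety-critical software found; no further scoping.")
        return record
    # Determine the required effort (hypothetical stand-in for the risk index lookup).
    record.effort = effort_for(critical)
    return record


# Hypothetical example data, not taken from any real project.
hazard_related = {"thruster_control", "valve_driver"}
record = scope_safety_effort(
    ["thruster_control", "telemetry_formatter"],
    is_safety_critical=lambda name: name in hazard_related,
    effort_for=lambda critical: "effort level from risk index lookup",
)
```

Because requirements changes can move safety-critical functions between hardware and software, a routine like this would be rerun, and its record updated, whenever the requirements change.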
The appropriate project personnel perform the following development activities to fulfill the software safety requirements:
- Identify and clearly mark software safety requirements in requirements documents.
- Includes identifying requirements to be added due to safety-critical nature of the software or component.
- Includes analyzing or working with system safety to analyze software control of critical functions and the identification of software that causes, controls, mitigates, or contributes to hazards.
- Identify and clearly mark software safety design features and methods in design documents.
- Follow proper coding standards (which may include safety features).
- Identify safety-critical code and data (e.g., via comments).
- Use hazards analyses to identify failures and failure combinations to be tested.
- Conduct tests to ensure the following:
- Hazards are eliminated or controlled to an acceptable risk level.
- Correct and safe operation of software, hardware, and operator inputs in the presence of failures and faults and under system load, stress, and off-nominal conditions.
- Subject updates and reconfigurations of the operational system to the same safety requirements.
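As one way to turn hazard analysis results into "failures and failure combinations to be tested," the sketch below enumerates every combination of up to a chosen number of simultaneous failure modes. The function name, the `max_simultaneous` parameter, and the failure mode names are assumptions made for illustration; a real project would take the failure modes and the depth of combination from its own hazard analyses.

```python
from itertools import combinations


def failure_test_cases(failure_modes, max_simultaneous=2):
    """Yield each combination of 1..max_simultaneous distinct failure modes."""
    modes = sorted(set(failure_modes))  # de-duplicate and fix the ordering
    for size in range(1, max_simultaneous + 1):
        for combo in combinations(modes, size):
            yield combo


# Hypothetical failure modes from a hazard analysis.
cases = list(failure_test_cases(
    ["sensor_dropout", "valve_stuck_open", "bus_timeout"]
))
for case in cases:
    print(" + ".join(case))
```

The number of cases grows quickly with `max_simultaneous`, so the depth chosen is itself a tailoring decision to be weighed against risk.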
The following roles are involved in fulfilling the software safety requirements, as described in NASA-GB-8719.13. Note that where independence is not required, one person may fill multiple roles; alternatively, more than one person may cooperatively fill a single role such as that of the software safety engineer.
- Project manager: Maintains a high-level overview of the process, handles scheduling and budgeting of safety activities, and assists with negotiations among team members.
- Systems engineer: Designs the system and partitions elements between hardware and software; has the "big picture" view of the system and can make sure safety-critical elements are not overlooked.
- System safety engineer: Determines which system components are hazardous and makes sure appropriate controls and mitigations are in place and verified.
- Software safety engineer: Verifies that software safety is addressed in the requirements and is designed into the software, and verifies the safety of the software.
- Software developer: Designs, codes, and tests the system, and implements defensive programming techniques.
- Software assurance engineer: Works with developers and safety engineers to assure safe software is created.
- IV&V personnel: May perform their own review of project software and present findings to the project manager; IV&V occurs in addition to, not as a replacement for, software assurance and software safety.
When identifying software safety requirements applicable to a project, consult existing lists of software safety requirements to identify generic safety requirements. In addition, use techniques such as hazards analyses and design analyses to identify safety requirements specific to a particular project. NASA-GB-8719.13 provides a list of sources for generic requirements. Appendix H of that guidebook includes a checklist of generic software safety requirements from Marshall Space Flight Center (MSFC).
Remember to include risk as a factor when determining which requirements are more critical than others.
When developing safety-critical software, the project needs to:
- Design in a degree of fault tolerance, since not all faults can be prevented
- Choose a "safe" programming language: one that enforces good programming practices, finds errors at compile time, has strict data types, provides bounds checking on arrays, discourages use of pointers, etc.
- Appendix H of NASA-GB-8719.13 includes checklists of safe programming practices for several programming languages.
- Use coding standards that enforce "safe" programming practices.
- Implement defensive programming.
- Specifically look for unexpected interactions among units during integration testing.
- Evaluate complexity of software components and interfaces.
- Design for maintainability and reliability.
- Use formal inspections / software peer reviews.
- Use Design Logic Analysis (DLA).
- Use design data analysis, design interface analysis, and design traceability analysis.
- Use coding checklists and standards.
- Develop safety tests for safety-critical software units that cannot be fully tested once the units are integrated.
- Use code logic analysis, code data analysis, code interface analysis, and unused code analysis.
- Use interrupt analysis.
- Use test coverage analysis.
- Use stress testing, stability testing, resistance to failure tests, disaster testing.
- Evaluate operating systems for safety before choosing one for the project.
- Appendix H of NASA-GB-8719.13 includes a checklist for selecting a real-time operating system (RTOS).
- Review the Design for Safety checklist in Appendix H of the NASA Software Safety Guidebook
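Several of the practices above (fault tolerance, defensive programming, and marking safety-critical code) can be illustrated with a minimal sketch. Everything in it is hypothetical: the valve, the 0 to 100 percent range, and the choice of "fully closed" as the safe state are assumptions for the example, not values from any standard or guidebook.

```python
import math

# Assumed limits and safe state for a hypothetical valve (percent open).
MIN_POSITION = 0.0
MAX_POSITION = 100.0
SAFE_VALVE_POSITION = 0.0  # assumption: fully closed is the safe state


def command_valve(requested_percent):
    """SAFETY-CRITICAL: return the valve position to command, in percent open.

    Any input that cannot be trusted results in the safe position instead
    of an exception or an out-of-range command.
    """
    # Defensive check: accept only real numbers (bool is excluded on purpose).
    if isinstance(requested_percent, bool) or not isinstance(
        requested_percent, (int, float)
    ):
        return SAFE_VALVE_POSITION
    # Defensive check: reject NaN and infinities.
    if isinstance(requested_percent, float) and not math.isfinite(requested_percent):
        return SAFE_VALVE_POSITION
    # Bounds check against the assumed physical limits.
    if not MIN_POSITION <= requested_percent <= MAX_POSITION:
        return SAFE_VALVE_POSITION
    return float(requested_percent)
```

Falling back to a known safe value rather than raising an error is one design choice among several; whether a fallback, a retry, or a fault report is appropriate depends on the hazard analysis for the function in question.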
If the project involves programmable logic devices, consult NASA-HDBK-8739.23, NASA Complex Electronics Handbook for Assurance Professionals.
For tools that are used in the development of safety-critical software, including compilers, linkers, debuggers, test environments, simulators, code generators, etc., consider the following:
- Use tools previously validated for use in development of safety-critical software, but consider the differences in how those tools were used on the projects for which they were validated and their use on the new project to determine if re-validation is required.
- Tools previously validated for use in development of safety-critical software and which have been in use for many years in the same environment for the same purposes may not require re-validation.
- For tools not yet approved or for which re-validation is being considered:
- Consider the tool's maturity.
- Try to obtain any known bug lists for the tool.
- Try to obtain any existing tests, analyses, and results for the tool.
- Obtain an understanding of how the tool could fail and determine if those failures could negatively affect the safety of the software or system for which the tool is used.
- Perform safety testing and analysis to ensure that the tools do not influence known hazards or adversely affect the residual risk of the software.
- Consider independent validation for the tool.
- Review the guidance on tool selection in NASA-GB-8719.13.
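The first two considerations above amount to a simple screening rule, sketched here under stated assumptions. The `ToolUsage` fields and the function name are hypothetical, and the boolean inputs hide the real work of judging whether the environment, purpose, and service history are comparable; the result only flags tools for the fuller evaluation described in the list and does not replace it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolUsage:
    name: str
    previously_validated: bool  # validated for safety-critical development before
    long_service_history: bool  # in use for many years
    same_environment: bool      # same environment as on the validating projects
    same_purpose: bool          # used for the same purposes as before


def revalidation_should_be_considered(tool: ToolUsage) -> bool:
    """Return True when the tool needs the fuller validation evaluation."""
    if not tool.previously_validated:
        return True
    unchanged_use = (
        tool.long_service_history and tool.same_environment and tool.same_purpose
    )
    return not unchanged_use


# Hypothetical example: a validated compiler moved to a new target environment.
compiler = ToolUsage("cross_compiler", True, True, False, True)
flagged = revalidation_should_be_considered(compiler)
```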
If the project involves off-the-shelf (OTS) or reused software, the project needs to:
- Evaluate system differences that could affect safety.
- Look at the interfaces needed to incorporate the OTS or reused software into the system or to isolate it from critical or non-critical software, as appropriate.
- Perform analysis of the impacts of this software on the overall project, such as:
- Identifying extra functions that could cause safety hazards.
- Determining the effects of extra functionality needed to integrate the software with the rest of the system.
- Evaluate the cost of extra analyses and tests needed to ensure system safety due to the use of OTS or reused software.
- Seek insight into the practices used to develop the software.
- Evaluate the V&V results of OTS software to make sure that it is consistent with the level of V&V of developed software.
For other OTS considerations that could affect safety, see SWE-027.
- For checklists of considerations for OTS, see Appendix H of NASA-GB-8719.13.
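One small part of the OTS impact analysis, identifying extra functions that could cause safety hazards, can start from a comparison of what the package provides against what the project actually needs. The sketch below assumes both are available as lists of function names; the names themselves are invented for illustration.

```python
def extra_functions(provided, required):
    """Return the functions an OTS package provides that the project does not need."""
    return sorted(set(provided) - set(required))


# Hypothetical inventory of an OTS library and the project's needs.
provided = ["read_sensor", "write_log", "remote_firmware_update", "self_test"]
required = ["read_sensor", "self_test"]

for name in extra_functions(provided, required):
    print(f"Review for hazard contribution: {name}")
```

Each function on the resulting list would then be assessed for whether it can be disabled, isolated, or shown not to contribute to a hazard.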
For contractor-developed software, the project:
- Includes in the contract:
- Surveillance or insight activities for the contractor development process.
- Identification of responsibility for preparing and presenting the Safety Compliance Data Package to the Safety Review Panel.
- Safety analysis and test requirements.
- Requirements for delivery of software safety deliverables including software safety plan, all hazard analyses, audit reports, verification reports, etc.
- Evaluates contractor/provider track record, skills, capabilities, stability.
- Considers performing additional software testing beyond that conducted by the provider.
MSFC's Software Development Process Description Document on the NASA PAL notes in section 8.10.3:
If the product is classified as 'safety critical' per NASA-STD-8719.13, then the supplier agreement must include safety requirements. In addition, a process is established to monitor that the safety requirements are traced and satisfied.
Note: The safety requirements can be included with the other supplier requirements, but must be annotated as safety critical. Software safety activities to be performed by a supplier (e.g., an outside specialist) are noted in the appropriate supplier agreement and approved by Senior Management.
For additional considerations when acquiring safety-critical software, see Topic 7.3 - Acquisition Guidance.
Training in software safety is available in the NASA SMA Technical Excellence Program (STEP).
These topics and more are expanded in NASA-GB-8719.13. Consult the guidebook for additional guidance, techniques, analysis, references, resources, and more for software developers creating safety-critical software as well as guidance for project managers, software assurance personnel, system engineers, and safety engineers. Knowledge of the software safety tasks performed by persons in roles outside of software engineering will help engineering personnel understand requests from these persons for software engineering products and processes.
Additional guidance related to software safety may be found in the following related requirements in this Handbook:
- Develop a software safety plan
- Software Safety Determination
- Safety Critical Software Requirements
- Software Safety Plan Contents
4. Small Projects
While the requirements of NASA-STD-8719.13B must be met, they can be tailored to the project size and criticality. The specific activities and depth of analyses needed to meet the requirements can, and should, be tailored to the software safety risk. In other words, while the requirements must be met, the implementation and approach to meeting those requirements may and should vary to reflect the system to which they are applied. Substantial differences may exist when the same software safety requirements are applied to dissimilar projects.
Appendix A of NASA-STD-8719.13 provides guidance for how a sample, medium-sized project might meet the requirements.
For projects designated as small projects based on personnel or budget, the following options may be considered to assist in the fulfillment of this requirement:
- Utilize existing tools already validated and approved for use in development of safety-critical software.
- If a standard set of validated and approved tools does not exist, consider establishing them for future projects.
- Use an existing safety plan specifically developed for small projects.
- If such a plan does not exist, consider creating one so future projects do not have to create a new one.
- Use one person to fill multiple roles.
- The software safety engineer may have other project roles or fill similar roles for other projects.
- Keep in mind that safety, quality, and reliability analyses and activities must be either performed, or assessed, verified, and validated, by a party independent of those developing the product.
6. Lessons Learned
The NASA Lessons Learned database contains the following lessons learned related to software safety:
- Fault-Detection, Fault-Isolation and Recovery (FDIR) Techniques. Lesson Number 0839: "Apply techniques such as Built in Test (BIT), strategic placing of sensors, centralized architecture, and fault isolation and recovery to optimize system availability... Operating in such a critical environment as outer space, astronauts' lives and mission success are dependent on the integrity of a system. Since time and resources are limited, the sooner failures can be accurately detected and a failed system repaired and recovered, the more likely crew survival rate and mission success are to be improved."
- Fault Tolerant Design. Lesson Number 0707: "Incorporate hardware and software features in the design of spacecraft equipment which tolerate the effects of minor failures and minimize switching from the primary to the secondary string. This increases the potential availability and reliability of the primary string."
- Fault Protection. Lesson Number 0772: "Fault protection is the use of cooperative design of flight and ground elements (including hardware, software, procedures, etc.) to detect and respond to perceived spacecraft faults. Its purpose is to eliminate single point failures or their effects and to ensure spacecraft system integrity under anomalous conditions."
- Mars Observer Inappropriate Fault Protection Response Following Contingency Mode Entry due to a Postulated Propulsion Subsystem Breach. Lesson Number 0343: The Recommendations are: "(1) It is imperative that spacecraft designers consider the consequences of anomalies at all mission phases and ensure that fault protection takes proper action regardless of spacecraft state. (2) Fault responses should not be allowed to interrupt critical activities unless they have the ability to assure completion of these activities. Final, stable fault protection modes (such as contingency mode) should autonomously assure communications."
- Aero-Space Technology/X-34 In-Flight Separation from L-1011 Carrier. Lesson Number 1122: "The X-34 technology demonstrator program faces safety risks related to the vehicle's separation from the L-1011 carrier aircraft and to the validation of flight software. Moreover, safety functions seem to be distributed among the numerous contractors, subcontractors, and NASA without a clear definition of roles and responsibilities." The Recommendation is that "NASA should review and assure that adequate attention is focused on the potentially dangerous flight separation maneuver, the thorough and proper validation of flight software, and the pinpointing and integration of safety responsibilities in the X-34 program."