This version of SWEHB is associated with NPR 7150.2B.
3.7.1 When a project is determined to have safety-critical software, the project manager shall implement the requirements of NASA-STD-8719.13.
NPR 7150.2, NASA Software Engineering Requirements, does not include any notes for this requirement.
1.2 Applicability Across Classes
If Class C or D software is safety critical, this requirement applies to the safety-critical aspects of the software.
A & B = Always Safety-Critical; C & D = Not Safety-Critical; CSC & DSC = Safety-Critical; E - H = Never Safety-Critical.
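The key above can be read as a small decision rule. The following is a minimal sketch based only on that key line; the function and enum names are hypothetical and are not defined by NPR 7150.2 or this Handbook.

```python
from enum import Enum


class Applicability(Enum):
    ALWAYS = "always safety-critical"
    NEVER = "never safety-critical"
    IF_DESIGNATED = "safety-critical only when designated (CSC/DSC)"


def safety_critical_status(software_class: str) -> Applicability:
    """Map a software class letter (A-H) to the category given in the key."""
    cls = software_class.strip().upper()
    if cls in ("A", "B"):
        return Applicability.ALWAYS
    if cls in ("C", "D"):
        return Applicability.IF_DESIGNATED
    if cls in ("E", "F", "G", "H"):
        return Applicability.NEVER
    raise ValueError(f"unknown software class: {software_class!r}")


def requirement_applies(software_class: str,
                        designated_safety_critical: bool = False) -> bool:
    """Return True when this requirement applies under the key above.

    designated_safety_critical only matters for Class C and D software
    (that is, the CSC and DSC cases).
    """
    status = safety_critical_status(software_class)
    if status is Applicability.ALWAYS:
        return True
    if status is Applicability.NEVER:
        return False
    return designated_safety_critical
```

For example, `requirement_applies("C", designated_safety_critical=True)` returns `True`, while `requirement_applies("G")` returns `False`.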
“Software's effect on system safety can be through the commands executed, the data produced, or the effects on resources (e.g., computer memory; file space; bandwidth). Safety could potentially be compromised if software executes a command unexpectedly, executes the wrong command, generates the wrong data, uses unplanned resources, or uses resources incorrectly.” 271
Software safety requirements must cover “both action (must work) and inaction (must not work). There are two kinds of software safety requirements: process and technical. Both need to be addressed and properly documented within a program, project, or facility.” 271 The Standard required in this requirement was “developed by the NASA Office of Safety and Mission Assurance (OSMA) to provide the requirements for ensuring software safety across all NASA Centers, programs, projects, and facilities. It describes the activities necessary to ensure that safety is designed into the software that is acquired for or developed by NASA. All program, project, and facility managers, area safety managers, information technology managers, and other responsible managers are to assess the contribution of software to the inherent safety risk of the systems and operations software in their individual programs, projects, and facilities. The magnitude and depth of software safety activities should be commensurate with ... the risk posed by the software.” 271
Safety critical is a term “describing any condition, event, operation, process, equipment, or system that could cause or lead to severe injury, major damage, or mission failure if performed or built improperly, or allowed to remain uncorrected.” 271
Software safety is defined as “the aspects of software engineering and software assurance that provide a systematic approach to identifying, analyzing, tracking, mitigating, and controlling hazards and hazardous functions of a system where software may contribute either to the hazard or to its mitigation or control, to ensure safe operation of the system.” 271
It is important to have a systematic, planned approach for ensuring that safety is designed into developed or acquired software, and that safety is maintained throughout the software and system life cycle. NASA-STD-8719.13 "specifies the software safety activities, data, and documentation necessary for the acquisition and development of software in a safety-critical system... Safety-critical systems that include software are evaluated for software's contribution to the safety of the system during the concept phase, and repeated at each major milestone as the design matures." 271
Engineering and software assurance initially determine software safety criticality in the formulation phase per NASA-STD-8739.8, Software Assurance Standard; the results are compared and any differences are resolved. As the software is developed or changed and the software components, software models, and software simulations are identified, the safety-critical software determination can be reassessed and applied at lower levels. Further scoping and tailoring of the safety effort is found in NASA-STD-8719.13, Software Safety Standard 271 and NASA-GB-8719.13, NASA Software Safety Guidebook. 276
The Software Safety Standard defines “the requirements to implement a systematic approach to software safety as an integral part of system safety and the overall safety program of a program, project, or facility. This Standard specifies the software activities, data, and documentation necessary for the acquisition and development of software in a safety critical system. These activities may be performed by a collaboration of various personnel in the program, project, or facility, and Safety and Mission Assurance (SMA) organizations. Safety critical systems that include software are evaluated for software’s contribution to the safety of the system during the concept phase, and repeated at each major milestone as the design matures.
This Standard describes the activities required to ensure and promote safety processes that are utilized for software that is created, acquired, or maintained by or for NASA. The NASA-GB-8719.13, NASA Software Safety Guidebook, provides additional information on acceptable approaches for implementing software safety. While the requirements of this Standard must be met, the implementation and approach to meeting these requirements will vary to reflect the system to which they are applied...This Standard contains process-oriented requirements (what needs to be done to ensure software safety). Technical requirements are those that specify what the system includes or implements (e.g., two fault tolerance). Use of this Standard does not preclude the necessity to follow applicable technical standards. Some typical technical software safety requirements are provided as examples in Appendix [C] of this document. NPR 7150.2, NASA Software Engineering Requirements (requirement SWE-134) contains some minimum technical safety requirements.
Software safety requirements do more than prohibit unsafe system behavior. Software is used to command critical, must-work functions. Software can be used proactively to monitor the system, analyze critical data, look for trends, and signal when events occur that may be precursors to a hazardous state. Software can also be used in the control or mitigation of a hazard, event, or condition. Therefore, program, project, and facility software safety requirements include those requirements that will embody these behaviors, both proactive and reactive, and include the system and software states where they are valid.
The requirements specified in this Standard obligate the program, project, and facility, and safety and mission assurance organizations to:
- Identify when software plays a part in system safety and generate appropriate requirements to ensure safe operation of the system.
- Ensure that software is considered within the context of system safety, and that appropriate measures are taken to create safe software.
- Ensure that software safety is addressed in project acquisition, planning, management, and control activities.
- Ensure that software safety is considered throughout the system life-cycle, including mission concept, generation of requirements, design, coding, test, maintenance and operation of the software.
- Ensure that the acquisition of software, whether off-the-shelf or contracted, includes evaluation, assessment, and planning for addressing and mitigating risks due to the software’s contribution to safety and any limitations of the software.
- Ensure that software verification and validation activities include software safety verifications and validations.
- Ensure that the proper certification requirements are in place and accomplished prior to the actual operational use of the software.
- Ensure that changes and reconfigurations of the software, during development, testing, and operational use of the software, are analyzed for their impacts to system safety.” 271
The Engineering Technical Authority and S&MA Technical Authority shall jointly determine if the software is designated as “safety critical.” The “safety critical” designation defines additional requirements mapping within this NPR. Software Safety Critical Assessment Tool guidance is provided in NASA-HDBK-2203 as well as the litmus test guidance in NASA-STD-8719.13. Allocation of system safety requirements, hardware and risk need to be considered in the assessment. The Engineering Technical Authority and S&MA Technical Authority must reach agreement on safety critical designation of software. Disagreements are elevated via both the Engineering Technical Authority and Safety and Mission Assurance Technical Authority chains.
Basic Steps for Implementing NASA-STD-8719.13
- Identify safety-critical software (see SWE-133).
- Document identification efforts and results.
- If no safety-critical software is found, stop.
- Determine the software safety criticality (see SWE-133).
- Determine the safety effort and oversight required.
- Tailor the software safety effort for the project.
- Safety activities need to be sufficient to match the software development effort and yet ensure that the overall system will be safe.
- Minimum effort includes, but is not limited to, project review and top-level analyses of all pertinent specifications, designs, implementations, tests, engineering change requests, and problem/failure reports, to determine whether any hazards have been inadvertently introduced. The effort must also address all software hazard controls, contributions, and mitigations flowed down from the systems hazard analyses by meeting the requirements of NASA-GB-8719.13. 271
- Software assurance and Independent Verification and Validation (IV&V) also have tailorable efforts for software safety.
- The range of selected activities needs to be negotiated and approved by project management, software development, software quality assurance, and software systems safety personnel together.
The appropriate project personnel perform the following development activities to fulfill the software safety requirements:
- Identify and clearly mark software safety requirements in requirements documents.
- Includes identifying requirements to be added due to safety-critical nature of the software or component.
- Includes analyzing or working with system safety to analyze software control of critical functions and the identification of software that causes, controls, mitigates, or contributes to hazards.
- Identify and clearly mark software safety design features and methods in design documents.
- Follow proper coding standards (which may include safety features).
- Identify safety-critical code and data (e.g., via comments).
- Use hazards analysis to identify failures and failure combinations to be tested.
- Conduct tests to ensure the following:
- Hazards are eliminated or controlled to an acceptable risk level.
- Correct/safe operation of software, hardware, operator inputs in the presence of failures and faults, under system load, stress, and off-nominal conditions.
- Subject updates and reconfigurations of the operational system to the same safety requirements.
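One of the activities above is using hazards analysis to identify failures and failure combinations to be tested. The sketch below shows one possible way to enumerate such combinations; the failure names are hypothetical placeholders for the output of a real hazard analysis, and the limit of two simultaneous failures is an assumption chosen for illustration, not a value taken from NASA-STD-8719.13.

```python
from itertools import combinations


def failure_combinations(failures, max_simultaneous=2):
    """Return every combination of 1..max_simultaneous failures, as tuples."""
    cases = []
    for size in range(1, max_simultaneous + 1):
        cases.extend(combinations(failures, size))
    return cases


# Hypothetical failure modes standing in for hazard analysis results.
hazard_failures = ["sensor_stuck_high", "valve_command_lost", "watchdog_timeout"]

test_cases = failure_combinations(hazard_failures, max_simultaneous=2)
for case in test_cases:
    # Each line names one failure scenario that a safety test would inject.
    print(" + ".join(case))
```

The number of combinations grows quickly with the number of failure modes, so in practice a project would likely prioritize the list by risk rather than test every combination.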
Safety and Risk
When identifying software safety requirements applicable to a project, consult existing lists of software safety requirements to identify generic safety requirements. In addition, use techniques such as hazards analysis and design analysis to identify safety requirements specific to a particular project. NASA-GB-8719.13 276 provides a list of sources for generic requirements. Appendix H of that guidebook includes a checklist of generic software safety requirements from Marshall Space Flight Center (MSFC).
Remember to include risk as a factor when determining which requirements are more critical than others.
When developing safety-critical software, the project needs to:
- Design in a degree of fault tolerance, since not all faults can be prevented
- Choose a "safe" programming language: one that enforces good programming practices, finds errors at compile time, has strict data types, provides bounds checking on arrays, discourages use of pointers, etc.
- Appendix H of NASA-GB-8719.13 276 includes checklists of safe programming practices for several programming languages.
- Use coding standards that enforce "safe" programming practices.
- Implement defensive programming.
- Look specifically for unexpected interactions among units during integration testing.
- Evaluate complexity of software components and interfaces.
- Design for maintainability and reliability.
- Use formal inspections / software peer reviews.
- Use Design Logic Analysis (DLA).
- Use design data analysis, design interface analysis, and design traceability analysis.
- Use coding checklists and standards.
- Develop safety tests for safety-critical software units that cannot be fully tested once the units are integrated.
- Use code logic analysis, code data analysis, code interface analysis, and unused code analysis.
- Use interrupt analysis.
- Use test coverage analysis.
- Use stress testing, stability testing, resistance to failure tests, disaster testing.
- Evaluate operating systems for safety before choosing one for the project.
- Appendix H of NASA-GB-8719.13 276 includes a checklist for selecting a real-time operating system (RTOS).
- Review the Design for Safety checklist in Appendix H of the NASA Software Safety Guidebook. 276
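As an illustration of the defensive programming item in the list above, the following minimal sketch validates a command before it is executed, covering both the "must not work" case (the command is inhibited in the wrong state) and basic input checks. The state names, limits, and function names are all hypothetical; they are not taken from NASA-STD-8719.13 or NASA-GB-8719.13.

```python
import math


class CommandRejected(Exception):
    """Raised when a command fails a defensive check."""


# Hypothetical states and limits, for illustration only.
VALID_STATES = {"SAFE", "ARMED"}
MIN_THRUST_PCT = 0.0
MAX_THRUST_PCT = 100.0


def validate_thrust_command(system_state, thrust_pct):
    """Return the thrust value as a float if the command is safe, else raise."""
    # Must-not-work checks: reject commands in an unknown or unarmed state.
    if system_state not in VALID_STATES:
        raise CommandRejected(f"unknown system state: {system_state!r}")
    if system_state != "ARMED":
        raise CommandRejected("thrust commands are inhibited unless ARMED")
    # Type check (bool is a subclass of int, so exclude it explicitly).
    if isinstance(thrust_pct, bool) or not isinstance(thrust_pct, (int, float)):
        raise CommandRejected("thrust must be a number")
    # Reject NaN and infinity before the range comparison.
    if math.isnan(thrust_pct) or math.isinf(thrust_pct):
        raise CommandRejected("thrust must be finite")
    if not MIN_THRUST_PCT <= thrust_pct <= MAX_THRUST_PCT:
        raise CommandRejected(f"thrust {thrust_pct} outside allowed range")
    return float(thrust_pct)
```

The design choice here is to fail closed: any input the function does not positively recognize as safe is rejected, so an unexpected value cannot reach the code that executes the command.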
Programmable Logic Devices, Tools, and Off-the-Shelf (OTS) Software
If the project involves programmable logic devices, consult NASA-HDBK-8739.23, NASA Complex Electronics Handbook for Assurance Professionals. 034
For tools that are used in the development of safety-critical software, including compilers, linkers, debuggers, test environments, simulators, code generators, etc., consider the following:
- Use tools previously validated for use in development of safety-critical software, but consider the differences in how those tools were used on the projects for which they were validated and their use on the new project to determine if re-validation is required.
- Tools previously validated for use in development of safety-critical software and which have been in use for many years in the same environment for the same purposes may not require re-validation.
- For tools not yet approved or for which re-validation is being considered:
- Consider the tool's maturity.
- Try to obtain any known bug lists for the tool.
- Try to obtain any existing tests, analyses, and results for the tool.
- Obtain an understanding of how the tool could fail and determine if those failures could negatively affect the safety of the software or system for which they are used.
- Perform safety testing and analysis to ensure that the tools do not influence known hazards or adversely affect the residual risk of the software.
- Consider independent validation for the tool.
- Review the guidance on tool selection in NASA-GB-8719.13. 276
If the project involves off-the-shelf (OTS) or reused software, the project needs to:
- Evaluate system differences that could affect safety.
- Look at interfaces needed to incorporate it into the system or isolate it from critical or non-critical software, as appropriate.
- Perform analysis of the impacts of this software on the overall project, such as:
- Identifying extra functions that could cause safety hazards.
- Determining the effects of extra functionality needed to integrate the software with the rest of the system.
- Evaluate the cost of extra analysis and tests needed to ensure system safety due to the use of OTS or reused software.
- Seek insight into the practices used to develop the software.
- Evaluate the V&V results of OTS software to make sure that it is consistent with the level of V&V of developed software.
For other OTS considerations that could affect safety, see SWE-027.
- For checklists of considerations for OTS, see Appendix H of NASA-GB-8719.13. 276
For contractor-developed software, the project:
- Includes in the contract:
- Surveillance or insight activities for the contractor development process.
- Identification of responsibility for preparing and presenting the Safety Compliance Data Package to the Safety Review Panel.
- Safety analysis and test requirements.
- Requirements for delivery of software safety deliverables including software safety plan, all hazard analyses, audit reports, verification reports, etc.
- Evaluates contractor/provider track record, skills, capabilities, stability.
- Considers performing additional software testing beyond that conducted by the provider.
MSFC's Software Development Process Description Document, accessible to NASA users in Software Processes Across NASA (SPAN) via the SPAN tab in this Handbook, notes in section 8.10.3:
If the product is classified as 'safety critical' per NASA-STD-8719.13, then the supplier agreement must include safety requirements. In addition, a process is established to monitor that the safety requirements are traced and satisfied.
Note: The safety requirements can be included with the other supplier requirements, but must be annotated as safety critical. Software safety activities to be performed by a supplier (e.g., an outside specialist) are noted in the appropriate supplier agreement and approved by Senior Management.
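The quoted process calls for annotating safety requirements as safety critical and monitoring that they are traced and satisfied. A minimal sketch of such a check follows; the `Requirement` record, its field names, and the sample requirement IDs are hypothetical and do not reflect the MSFC document's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str
    text: str
    safety_critical: bool = False  # the "safety critical" annotation
    verified_by: list = field(default_factory=list)  # IDs of tests or analyses


def untraced_safety_requirements(requirements):
    """Return IDs of safety-critical requirements with no verification evidence."""
    return [r.req_id for r in requirements
            if r.safety_critical and not r.verified_by]


# Hypothetical supplier requirements, for illustration only.
supplier_requirements = [
    Requirement("SUP-001", "Log all telemetry frames."),
    Requirement("SUP-002", "Inhibit valve opening unless two enables are set.",
                safety_critical=True, verified_by=["TEST-017"]),
    Requirement("SUP-003", "Enter safe mode on watchdog timeout.",
                safety_critical=True),
]

print(untraced_safety_requirements(supplier_requirements))  # ['SUP-003']
```

A report like this could be run at each milestone review so that safety-critical requirements lacking verification evidence are visible early.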
Training and Additional Guidance
For additional considerations when acquiring safety-critical software, see Topic 7.3 - Acquisition Guidance.
Training in software safety is available in the NASA SMA Technical Excellence Program (STEP). 294
Additional guidance related to software safety may be found in the following related requirements in this Handbook:
4. Small Projects
While the requirements of NASA-STD-8719.13B 271 must be met, they can be tailored to the project's size and criticality. The specific activities and depth of analyses needed to meet the requirements can, and should, be scaled to the software safety risk. In other words, while the requirements must be met, the implementation and approach to meeting those requirements may and should vary to reflect the system to which they are applied. Substantial differences may exist when the same software safety requirements are applied to dissimilar projects.
For projects designated as small projects based on personnel or budget, the following options may be considered to assist in the fulfillment of this requirement:
- Utilize existing tools already validated and approved for use in development of safety-critical software.
- If a standard set of validated and approved tools does not exist, consider establishing them for future projects.
- Use an existing safety plan specifically developed for small projects.
- If such a plan does not exist, consider creating one so future projects do not have to create a new one.
- Use one person to fill multiple roles.
- The software safety engineer may have other project roles or fill similar roles for other projects.
- Keep in mind that safety, quality, and reliability analyses and activities must be either performed, or assessed, verified and validated by a party independent of those developing the product.
- Mars Observer Inappropriate Fault Protection Response Following Contingency Mode Entry due to a Postulated Propulsion Subsystem Breach. Public Lessons Learned Entry: 343.
Tools to aid in compliance with this SWE, if any, may be found in the Tools Library in the NASA Engineering Network (NEN).
NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
6. Lessons Learned
The NASA Lessons Learned database contains the following lessons learned related to software safety:
- Fault-Detection, Fault-Isolation and Recovery (FDIR) Techniques. Lesson Number 0839: "Apply techniques such as Built in Test (BIT), strategic placing of sensors, centralized architecture, and fault isolation and recovery to optimize system availability... Operating in such a critical environment as outer space, astronauts' lives and mission success are dependent on the integrity of a system. Since time and resources are limited, the sooner failures can be accurately detected and a failed system repaired and recovered, the more likely crew survival rate and mission success are to be improved." 527
- Fault Tolerant Design. Lesson Number 0707: "Incorporate hardware and software features in the design of spacecraft equipment which tolerate the effects of minor failures and minimize switching from the primary to the secondary string. This increases the potential availability and reliability of the primary string." 517
- Fault Protection. Lesson Number 0772: "Fault protection is the use of cooperative design of flight and ground elements (including hardware, software, procedures, etc.) to detect and respond to perceived spacecraft faults. Its purpose is to eliminate single point failures or their effects and to ensure spacecraft system integrity under anomalous conditions." 522
- Mars Observer Inappropriate Fault Protection Response Following Contingency Mode Entry due to a Postulated Propulsion Subsystem Breach. Lesson Number 0343: The Recommendations are: "(1) It is imperative that spacecraft designers consider the consequences of anomalies at all mission phases and ensure that fault protection takes proper action regardless of spacecraft state. (2) Fault responses should not be allowed to interrupt critical activities unless they have the ability to assure completion of these activities. Final, stable fault protection modes (such as contingency mode) should autonomously assure communications." 504
- Aero-Space Technology/X-34 In-Flight Separation from L-1011 Carrier. Lesson Number 1122: "The X-34 technology demonstrator program faces safety risks related to the vehicle's separation from the L-1011 carrier aircraft and to the validation of flight software. Moreover, safety functions seem to be distributed among the numerous contractors, subcontractors, and NASA without a clear definition of roles and responsibilities." The Recommendation is that "NASA should review and assure that adequate attention is focused on the potentially dangerous flight separation maneuver, the thorough and proper validation of flight software, and the pinpointing and integration of safety responsibilities in the X-34 program." 539