

SWE-134 - Safety-Critical Software Design Requirements

1. Requirements

3.7.3 If a project has safety-critical software or mission-critical software, the project manager shall implement the following items in the software: 

a. The software is initialized, at first start and restarts, to a known safe state.
b. The software safely transitions between all predefined known states.
c. Termination performed by software functions is performed to a known safe state.
d. Operator overrides of software functions require at least two independent actions by an operator.
e. Software rejects commands received out of sequence when execution of those commands out of sequence can cause a hazard.
f. The software detects inadvertent memory modification and recovers to a known safe state.
g. The software performs integrity checks on inputs and outputs to/from the software system.
h. The software performs prerequisite checks prior to the execution of safety-critical software commands.
i. No single software event or action is allowed to initiate an identified hazard.
j. The software responds to an off-nominal condition within the time needed to prevent a hazardous event.
k. The software provides error handling.
l. The software can place the system into a safe state.

1.1 Notes

These requirements apply to components that reside in a mission-critical or safety-critical system, and the components control, mitigate, or contribute to a hazard as well as software used to command hazardous operations/activities. 

1.2 History

SWE-134 - Last used in rev NPR 7150.2D

Rev    SWE Statement
A

2.2.12 When a project is determined to have safety-critical software, the project shall ensure the following items are implemented in the software:

a. Safety-critical software is initialized, at first start and at restarts, to a known safe state.
b. Safety-critical software safely transitions between all predefined known states.
c. Termination performed by software of safety-critical functions is performed to a known safe state.
d. Operator overrides of safety-critical software functions require at least two independent actions by an operator.
e. Safety-critical software rejects commands received out of sequence, when execution of those commands out of sequence can cause a hazard.
f.  Safety-critical software detects inadvertent memory modification and recovers to a known safe state.
g. Safety-critical software performs integrity checks on inputs and outputs to/from the software system.
h. Safety-critical software performs prerequisite checks prior to the execution of safety-critical software commands.
i.  No single software event or action is allowed to initiate an identified hazard.
j.  Safety-critical software responds to an off nominal condition within the time needed to prevent a hazardous event.
k. Software provides error handling of safety-critical functions.
l.  Safety-critical software has the capability to place the system into a safe state.
m. Safety-critical elements (requirements, design elements, code components, and interfaces) are uniquely identified as safety-critical.
n.  Incorporate requirements in the coding methods, standards, and/or criteria to clearly identify safety-critical code and data within source code comments.

Difference between A and B

No change

B

3.7.2 When a project is determined to have safety-critical software, the project manager shall implement the following items in the software:

a. Safety-critical software is initialized, at first start and at restarts, to a known safe state.
b. Safety-critical software safely transitions between all predefined known states.
c. Termination performed by software of safety-critical functions is performed to a known safe state.
d. Operator overrides of safety-critical software functions require at least two independent actions by an operator.
e. Safety-critical software rejects commands received out of sequence, when execution of those commands out of sequence can cause a hazard.
f. Safety-critical software detects inadvertent memory modification and recovers to a known safe state.
g. Safety-critical software performs integrity checks on inputs and outputs to/from the software system.
h. Safety-critical software performs prerequisite checks prior to the execution of safety-critical software commands.
i. No single software event or action is allowed to initiate an identified hazard.
j. Safety-critical software responds to an off nominal condition within the time needed to prevent a hazardous event.
k. Software provides error handling of safety-critical functions.
l. Safety-critical software has the capability to place the system into a safe state.
m. Safety-critical elements (requirements, design elements, code components, and interfaces) are uniquely identified as safety-critical.
n. Requirements are incorporated in the coding methods, standards, and/or criteria to clearly identify safety-critical code and data within source code comments.

Difference between B and C

Changed "When a project is determined to have" to "If a project has" safety-critical software;
Added mission-critical software to the requirement;
Removed "Safety-Critical" from items a. - l. as the entire requirement pertains to it;
Changed "has the capability to" to "can" in item l.;
Deleted items m. and n.

C

3.7.3 If a project has safety-critical software or mission-critical software, the project manager shall implement the following items in the software:

a. The software is initialized, at first start and restarts, to a known safe state.

b. The software safely transitions between all predefined known states.

c. Termination performed by the software functions is performed to a known safe state.

d. Operator overrides of software functions require at least two independent actions by an operator.

e. The software rejects commands received out of sequence when the execution of those commands out of sequence can cause a hazard.

f. The software detects inadvertent memory modification and recovers to a known safe state.

g. The software performs integrity checks on inputs and outputs to/from the software system.

h. The software performs prerequisite checks prior to the execution of safety-critical software commands.

i. No single software event or action is allowed to initiate an identified hazard.

j. The software responds to an off-nominal condition within the time needed to prevent a hazardous event.

k. The software provides error handling.    

l. The software can place the system into a safe state.

Difference between C and D

No change

D

3.7.3 If a project has safety-critical software or mission-critical software, the project manager shall implement the following items in the software: 

a. The software is initialized, at first start and restarts, to a known safe state.
b. The software safely transitions between all predefined known states.
c. Termination performed by software functions is performed to a known safe state.
d. Operator overrides of software functions require at least two independent actions by an operator.
e. Software rejects commands received out of sequence when execution of those commands out of sequence can cause a hazard.
f. The software detects inadvertent memory modification and recovers to a known safe state.
g. The software performs integrity checks on inputs and outputs to/from the software system.
h. The software performs prerequisite checks prior to the execution of safety-critical software commands.
i. No single software event or action is allowed to initiate an identified hazard.
j. The software responds to an off-nominal condition within the time needed to prevent a hazardous event.
k. The software provides error handling.
l. The software can place the system into a safe state.



1.3 Applicability Across Classes

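Class          A    B    C    D    E    F
Applicable?    (the per-class applicability marks were icons and are not preserved in this text)

Key: icons for "Applicable" and "Not Applicable" (not preserved in this text)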


2. Rationale

Implementing safety-critical software or mission-critical software design requirements helps ensure that the systems are safe and that the safety-critical software or mission-critical software requirements and processes are followed.

3. Guidance

3.1 Safety-Critical Software and Mission-Critical Software

This requirement applies to safety-critical software and mission-critical software.  These items are design practices that should be followed when developing safety-critical software and mission-critical software. 


Software safety requirements contained in NASA-STD-8739.8B

Derived from NPR 7150.2D para 3.7.3 SWE 134: Table 1, SA Tasks 1 - 6

1. Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l."

2. Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.

3. Confirm that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.

4. Analyze the software design to ensure the following:
   a. Use of partitioning or isolation methods in the design and code,
   b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.

5. Participate in software reviews affecting safety-critical software products.

6. Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.

See the software assurance tab for additional guidance material. 

See also SWE-023 - Software Safety-Critical Requirements

3.2 Requirement Notes

Additional specific clarifications for a few of the requirement notes include:

Item a: (The software is initialized, at first start and restarts, to a known safe state.)

When establishing a known safe state, the items to inspect include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.
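As an illustration only, the idea behind item a can be sketched in a few lines. Python is used here purely for readability; flight software would normally be written in the project's own language. The names `Mode`, `ControllerState`, `SAFE_STATE`, and `initialize` are hypothetical and do not come from NPR 7150.2. The sketch assumes that a restart deliberately ignores whatever state was active before it.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SAFE = "safe"
    OPERATIONAL = "operational"


@dataclass
class ControllerState:
    mode: Mode
    heater_enabled: bool
    valve_open: bool


# Hypothetical definition of the known safe state for this example.
SAFE_STATE = ControllerState(mode=Mode.SAFE, heater_enabled=False, valve_open=False)


def initialize(persisted=None):
    """Return the start-up state for both first start and restart.

    `persisted` is accepted only to show that the pre-restart state is
    deliberately not trusted: every start begins from the known safe state.
    """
    return ControllerState(SAFE_STATE.mode, SAFE_STATE.heater_enabled, SAFE_STATE.valve_open)
```

A real system would also inspect hardware state, configuration, and boot code before declaring the safe state reached; the sketch covers only the software state.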

Item d: (Operator overrides of software functions require at least two independent actions by an operator.)

Multiple independent actions by the operator help to reduce potential operator mistakes.
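One common way to obtain two independent actions is an arm-then-confirm pattern. The following is a minimal sketch under that assumption; the class `OverrideGuard`, its methods, and the 10-second timeout are hypothetical choices for illustration, not something the requirement prescribes.

```python
class OverrideGuard:
    """Requires two separate operator actions (arm, then confirm) to override."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self._armed_at = None
        self._armed_target = None

    def arm(self, target, now):
        # First action: record which function the operator intends to override.
        self._armed_at = now
        self._armed_target = target

    def confirm(self, target, now):
        # Second action: succeeds only if it matches a recent, unexpired arm.
        ok = (
            self._armed_at is not None
            and self._armed_target == target
            and 0.0 <= now - self._armed_at <= self.timeout_s
        )
        # Any confirm attempt, valid or not, clears the armed state.
        self._armed_at = None
        self._armed_target = None
        return ok
```

Because every confirm attempt disarms the guard, a repeated or stray second action cannot trigger the override again without a fresh first action.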

Item f: (The software detects inadvertent memory modification and recovers to a known safe state.)

Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes. The computing system must be able to detect the problem and recover to a safe state. For example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to protect against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.
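The periodic memory scrub mentioned above can be sketched as a checksum comparison. This example assumes a CRC-32 (via Python's standard `zlib.crc32`) recorded at load time and a pre-defined safe default table to fall back on. `ProtectedTable` and its methods are hypothetical names, and a real system would typically combine this with hardware error detection and correction.

```python
import zlib


class ProtectedTable:
    """Holds a parameter table together with a CRC-32 taken when it was loaded."""

    def __init__(self, data: bytes, safe_default: bytes):
        self._data = bytearray(data)
        self._crc = zlib.crc32(self._data)
        self._safe_default = bytes(safe_default)

    def scrub(self) -> bool:
        """Re-check the CRC; on mismatch, fall back to the safe default table.

        Returns True if the table was intact, False if recovery was needed.
        """
        if zlib.crc32(self._data) == self._crc:
            return True
        self._data = bytearray(self._safe_default)
        self._crc = zlib.crc32(self._data)
        return False

    def read(self) -> bytes:
        return bytes(self._data)
```

Calling `scrub()` on a fixed schedule bounds how long a corrupted table can remain in use before the software reverts to known-safe values.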

Item g: (The software performs integrity checks on inputs and outputs to/from the software system.)

The software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required. The software needs to accommodate start-up transient inputs from the sensors. Specify system interfaces clearly and thoroughly. Include, as part of the documentation, the required action or actions should the interface fail.
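A minimal sketch of an input integrity check follows, assuming a simple range check plus a fixed number of start-up samples that are treated as transient and not acted on. The function name, the three return labels, and the default of three settling samples are illustrative assumptions.

```python
def validate_reading(value, low, high, samples_since_boot, settle_samples=3):
    """Classify a sensor reading as 'settling', 'ok', or 'out_of_range'."""
    # Readings taken during the start-up transient are not acted on.
    if samples_since_boot < settle_samples:
        return "settling"
    # NaN fails both comparisons, so it is reported as out of range.
    if not (low <= value <= high):
        return "out_of_range"
    return "ok"
```

The caller decides what recovery an "out_of_range" result triggers, for example holding the last good value or requesting a safe state.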

Item h: (The software performs prerequisite checks prior to the execution of safety-critical software commands.)

The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider the inappropriate sequencing of commands (as described in the original note) and the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must-not work). 
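The must-work / must-not-work distinction can be illustrated with a small rule table that checks both the current mode and the preceding command before a safety-critical command is accepted. The command names, modes, and `check_command` function below are hypothetical; a project would derive the actual rules from its hazard analysis.

```python
# Hypothetical command table: each command lists the modes it may run in
# and the command that must have been accepted immediately before it.
COMMAND_RULES = {
    "ARM_PYRO": {"modes": {"FLIGHT"}, "requires_previous": None},
    "FIRE_PYRO": {"modes": {"FLIGHT"}, "requires_previous": "ARM_PYRO"},
}


def check_command(command, mode, previous_command):
    """Return (accepted, reason) for a safety-critical command."""
    rule = COMMAND_RULES.get(command)
    if rule is None:
        return False, "unknown command"
    if mode not in rule["modes"]:
        return False, "wrong mode"
    required = rule["requires_previous"]
    if required is not None and previous_command != required:
        return False, "out of sequence"
    return True, "accepted"
```

Returning the rejection reason alongside the decision lets the software report why a command was refused, which supports later review.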

Item j: (The software responds to an off-nominal condition within the time needed to prevent a hazardous event.)

The intent is to establish a safe state following the detection of an off-nominal indication. The safety mitigation must complete between the time the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continue with normal operations. The intent is to design software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the software's ability to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality but will operate with this reduced functionality. 
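The timing constraint can be written as a simple budget: detection latency plus mitigation time must be less than the time to hazard. The sketch below assumes all three durations are measured in seconds from the onset of the off-nominal condition. The function names and the optional `required_margin_s` parameter are illustrative.

```python
def mitigation_margin_s(time_to_hazard_s, detection_latency_s, response_time_s):
    """Seconds to spare once the fault is detected and the mitigation completes."""
    return time_to_hazard_s - (detection_latency_s + response_time_s)


def responds_in_time(time_to_hazard_s, detection_latency_s, response_time_s,
                     required_margin_s=0.0):
    """True if detection plus mitigation finishes before the hazard would occur."""
    margin = mitigation_margin_s(time_to_hazard_s, detection_latency_s, response_time_s)
    return margin > required_margin_s
```

For example, with a 2.0 s time to hazard, 0.5 s to detect, and 1.0 s to mitigate, the margin is 0.5 s. If the time to hazard were only 1.0 s, the same design would miss the deadline.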

Item k: (The software provides error handling.)

Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected and isolated, and by which the software recovers to correct run-time program execution. The software error handling features that support safety-critical functions must detect and respond to hardware, software, and operational faults and failures, as well as faults in software data and commands that originate within a program or in other software programs. Minimize common failure modes.
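The detect, isolate, and recover pattern can be sketched for faulty command data. The example assumes that bad input is rejected without changing the last good value, and that a run of consecutive faults raises a request for safe mode. `SetpointHandler`, its attributes, and the fault threshold are hypothetical.

```python
class SetpointHandler:
    """Applies setpoint updates and keeps the last good value on bad data."""

    def __init__(self, initial, low, high, max_faults=3):
        self.value = initial
        self.low, self.high = low, high
        self.max_faults = max_faults
        self.fault_count = 0
        self.safe_mode_requested = False

    def update(self, raw):
        """Return True if the new setpoint was applied, False if rejected."""
        try:
            candidate = float(raw)
            if not (self.low <= candidate <= self.high):
                raise ValueError("setpoint out of range")
        except (TypeError, ValueError):
            # Detect and isolate: the bad input never reaches self.value.
            self.fault_count += 1
            if self.fault_count >= self.max_faults:
                self.safe_mode_requested = True
            return False
        # Recover: a good update resets the consecutive-fault counter.
        self.value = candidate
        self.fault_count = 0
        return True
```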

Item l: (The software can place the system into a safe state.)

The system's design must provide sufficient sensors, effectors, and self-checks within the software to detect and respond to potential system hazards. Identify safe states early in the design, and have them fully checked and verified for completeness. A safe state is a system state in which hazards are inhibited and all hazardous actuators are in a non-hazardous state. The system can have more than one safe state. Ensure that failures of dynamic system activities result in the system achieving a known and identified safe state within a specified time.
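The definition of a safe state given above (all hazardous actuators in a non-hazardous state) lends itself to an explicit command-then-verify routine. This is a sketch only: the actuator names and settings are invented, and the dictionary stands in for whatever command and telemetry interfaces the real system provides.

```python
# Hypothetical actuator names mapped to their non-hazardous settings.
SAFE_SETTINGS = {"heater": "off", "main_valve": "closed", "motor": "stopped"}


def unsafe_actuators(actuators):
    """Names of actuators that are not at their non-hazardous setting."""
    return sorted(n for n, s in SAFE_SETTINGS.items() if actuators.get(n) != s)


def enter_safe_state(actuators):
    """Command every hazardous actuator to its safe setting, then verify."""
    for name, setting in SAFE_SETTINGS.items():
        actuators[name] = setting
    return unsafe_actuators(actuators) == []
```

Keeping the verification in its own function means the same check can be run at start-up, after a transition, and during periodic self-checks.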

Additional Safety-Critical Software Design guidelines include:

  • Minimize complexity - For safety-critical code, any cyclomatic complexity value over 15 should be assessed for testability, maintainability, and code quality.
  • Avoid complex flow constructs, such as goto and recursion.
  • All loops must have fixed bounds. This prevents runaway code.
  • Avoid heap memory allocation.
  • Use a minimum of two runtime assertions per function.
  • Restrict the scope of data to the smallest possible.
  • Check the return value of all non-void functions, or cast to void to indicate that the return value is intentionally ignored.
  • Use the preprocessor sparingly.
  • Limit pointer use to a single dereference, and do not use function pointers.
  • Compile with all possible warnings active; all warnings should then be addressed before the release of the software.
  • Appropriate security posture and mindset should be applied to all levels of development.
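Several of the guidelines above (fixed loop bounds, at least two runtime assertions per function, no recursion or heap-heavy constructs) can be seen together in a short polling routine. The guidelines are usually applied to C-like languages, so Python is used here only to keep the illustration compact. `MAX_POLL_ATTEMPTS` and `wait_until_ready` are hypothetical names.

```python
MAX_POLL_ATTEMPTS = 5  # fixed upper bound, so the loop cannot run away


def wait_until_ready(status_samples):
    """Return the attempt index at which the device reported ready, or -1."""
    # Two runtime assertions guard the function's assumptions.
    assert MAX_POLL_ATTEMPTS > 0, "loop bound must be positive"
    assert len(status_samples) >= MAX_POLL_ATTEMPTS, "need one sample per attempt"
    for attempt in range(MAX_POLL_ATTEMPTS):
        if status_samples[attempt] == "READY":
            return attempt
    # Bound reached without success: the caller must handle this explicitly.
    return -1
```

The caller is expected to check the return value rather than assume success, in line with the guideline on return values.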

See also Topic 8.01 - Off Nominal Testing, 8.04 - Additional Requirements Considerations for Use with Safety-Critical Software. If software is acquired from a supplier, see Topic 7.03 - Acquisition Guidance.  See also Topic 7.21 - Multi-condition Software Requirements, 7.23 - Software Fault Prevention and Tolerance

See also SWE-184 - Software-related Constraints and Assumptions

3.3 Additional Guidance

Additional guidance related to this requirement may be found in the following materials in this Handbook:

3.4 Center Process Asset Libraries

SPAN - Software Processes Across NASA
SPAN contains links to Center-managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki

See the following link(s) in SPAN for process assets from contributing Centers (NASA Only). 

SPAN Links

4. Small Projects

This requirement applies to all projects regardless of size.

5. Resources

5.1 References

5.2 Tools


Tools to aid in compliance with this SWE, if any, may be found in the Tools Library in the NASA Engineering Network (NEN). 

NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN. 

The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool.  The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.

6. Lessons Learned

6.1 NASA Lessons Learned

Early planning and coordination between software engineering, software safety, and software assurance on the applicability and implementation of the SWE-134 software safety requirements will reduce schedule impacts.

The NASA Lesson Learned database contains the following lessons learned related to safety-critical software:

Deficiencies in Mission Critical Software Development for Mars Climate Orbiter (MCO) (1999). Lesson Number 0740: "The root cause of the MCO mission loss was an error in the 'Sm_forces' program output files, which were delivered to the navigation team in English units (pounds-force seconds) instead of the specified metric units (Newton-seconds). Comply with preferred software review practices, identify mission-critical software (for which staff must participate in major design reviews, walkthroughs, and review of acceptance test results), train personnel in software walkthroughs, and verify consistent engineering units on all parameters."

6.2 Other Lessons Learned

Demonstration of Autonomous Rendezvous Technology (DART) spacecraft Type A Mishap: "NASA has completed its assessment of the DART MIB (Mishap Investigation Board) report, which included a classification review by the Department of Defense. The report was NASA-sensitive but unclassified because it contained information restricted by International Traffic in Arms Regulations (ITAR) and Export Administration Regulations (EAR). As a result, the DART mishap investigation report was deemed not releasable to the public." The lesson learned also "provides an overview of publicly releasable findings and recommendations regarding the DART mishap."

7. Software Assurance

SWE-134 - Safety-Critical Software Design Requirements
3.7.3 If a project has safety-critical software or mission-critical software, the project manager shall implement the following items in the software: 

a. The software is initialized, at first start and restarts, to a known safe state.
b. The software safely transitions between all predefined known states.
c. Termination performed by software functions is performed to a known safe state.
d. Operator overrides of software functions require at least two independent actions by an operator.
e. Software rejects commands received out of sequence when execution of those commands out of sequence can cause a hazard.
f. The software detects inadvertent memory modification and recovers to a known safe state.
g. The software performs integrity checks on inputs and outputs to/from the software system.
h. The software performs prerequisite checks prior to the execution of safety-critical software commands.
i. No single software event or action is allowed to initiate an identified hazard.
j. The software responds to an off-nominal condition within the time needed to prevent a hazardous event.
k. The software provides error handling.
l. The software can place the system into a safe state.

7.1 Tasking for Software Assurance

Software safety requirements contained in NASA-STD-8739.8B

Derived from NPR 7150.2D para 3.7.3 SWE 134: Table 1, SA Tasks 1 - 6

1. Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l."

2. Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.

3. Confirm that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.

4. Analyze the software design to ensure the following:
   a. Use of partitioning or isolation methods in the design and code,
   b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.

5. Participate in software reviews affecting safety-critical software products.

6. Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.

7.2 Software Assurance Products

  • Software Assurance Status Reports
  • Software Design Analysis
  • SA analysis of software requirements and design to implement items "a" through "l."
  • SA analysis of the design to satisfy "a" and "b" in task 4.
  • Source Code Analysis
  • Verification Activities Analysis
  • SA assessment that source code meets "a" through "l" at inspections and reviews, including any risks and issues.
  • Evidence of confirmation that requirements for test code coverage, complexity, and testing of support files affecting hazardous systems have been met.
  • SA risk assessment of any software developers' rationale if requirements are not met.


    Objective Evidence

    • Evidence of confirmation that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.
    • NPR 7150.2 and NASA-STD-8739.8 requirements mapping matrices signed by the engineering and SMA technical authorities for each development organization.

    Objective evidence is an unbiased, documented fact showing that an activity was confirmed or performed by the software assurance/safety person(s). The evidence for confirmation of the activity can take any number of different forms, depending on the activity in the task. Examples are:

    • Observations, findings, issues, and risks found by the SA/safety person, which may be expressed in an audit or checklist record, email, memo, or entry into a tracking system (e.g., Risk Log).
    • Meeting minutes with attendance lists, or SA meeting notes or assessments of the activities, recorded in the project repository.
    • Status report, email, or memo containing statements that confirmation has been performed, with date (a checklist of confirmations could be used to record when each confirmation has been done).
    • Signatures on SA-reviewed or witnessed products or activities.
    • Status report, email, or memo containing a short summary of information gained by performing the activity. Some examples of using a “short summary” as objective evidence of a confirmation are:
      • To confirm that “IV&V Program Execution exists”, the summary might be: IV&V Plan is in draft state. It is expected to be complete by (some date).
      • To confirm that “Traceability between software requirements and hazards with SW contributions exists”, the summary might be: x% of the hazards with software contributions are traced to the requirements.
    • The specific products listed in the Introduction of 8.16 are also objective evidence, in addition to the examples listed above.

7.3 Metrics

  • # of software work product Non-Conformances identified by life cycle phase over time
  • # of Non-Conformances from reviews (Open vs. Closed; # of days Open)
  • # of safety-related requirement issues (Open, Closed) over time
  • # of safety-related non-conformances identified by life cycle phase over time
  • # of Hazards containing software that have been tested vs. total # of Hazards containing software
  • # of Source Lines of Code (SLOC) tested vs. total # of SLOC 

See also Topic 8.18 - SA Suggested Metrics.  

7.4 Guidance

The sub-requirements and notes included in the requirement are a collection of best practices for implementing safety-critical software. These sub-requirements apply to components that reside in a safety-critical system and that control, mitigate, or contribute to a hazard, as well as to software used to command hazardous operations/activities. Software engineering and software assurance disciplines each have specific responsibilities for providing project management with work products that meet the engineering, safety, quality, and reliability requirements of a project.

Steps 1 and 2: Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l" and assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.

Additional specific clarifications for a few of the requirement notes include:

Item a: (The software is initialized, at first start and restarts, to a known safe state.) When establishing a known safe state, the items to inspect include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.

Item d: (Operator overrides of software functions require at least two independent actions by an operator.) Multiple independent actions by the operator help to reduce potential operator mistakes.

Item f: (The software detects inadvertent memory modification and recovers to a known safe state.) Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes, so the computing system must detect the problem and recover to a safe state. For example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to protect against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.

Item g: (The software performs integrity checks on inputs and outputs to/from the software system.) The software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required. The software needs to accommodate start-up transient inputs from the sensors. Specify system interfaces clearly and thoroughly. Include, as part of the documentation, the required action or actions should the interface fail. See also 8.01 - Off Nominal Testing.

Item h: (The software performs prerequisite checks prior to the execution of safety-critical software commands.) The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider the inappropriate sequencing of commands (as described in the original note) and the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must-not work). 

Item j: (The software responds to an off-nominal condition within the time needed to prevent a hazardous event.) The intent is to establish a safe state following the detection of an off-nominal indication. The safety mitigation must complete between the time the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continue with normal operations. The intent is to design software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the software's ability to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality but will operate with this reduced functionality. See also 8.01 - Off Nominal Testing

Item k: (The software provides error handling.) Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected and isolated, and by which the software recovers to correct run-time program execution. The software error handling features that support safety-critical functions must detect and respond to hardware, software, and operational faults and failures, as well as faults in software data and commands that originate within a program or in other software programs. Minimize common failure modes.

Item l: (The software can place the system into a safe state.) The system's design must provide sufficient sensors, effectors, and self-checks within the software to detect and respond to potential system hazards. Identify safe states early in the design, and have them fully checked and verified for completeness. A safe state is a system state in which hazards are inhibited and all hazardous actuators are in a non-hazardous state. The system can have more than one safe state. Ensure that failures of dynamic system activities result in the system achieving a known and identified safe state within a specified time.

Step 3: Confirm that the values of the safety-critical loaded data, uplinked data, rules, and scripts that affect hazardous system behavior have been tested.

Step 4: Analyze the software design to ensure:

a. Use of partitioning or isolation methods in the design and code,

b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.

Step 5: Participate in software reviews affecting safety-critical software products.

Early planning and implementation dramatically ease the developmental burden of these requirements. Depending on the failure philosophy used (fault tolerance, control-path separation, etc.), design and implementation trade-offs will be made. Trying to incorporate these requirements late in the life cycle will impact the project cost, schedule, and quality. It can also impact safety: an integrated design that incorporates software safety features such as those above from the start allows the system perspective to be taken into account, so the design has a better chance of being implemented as needed to meet the requirements in an elegant, simple, and more reliable way.

Note that where conflicts with program safety requirements exist, program safety requirements take precedence.

Step 6: Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.

See also 8.04 - Additional Requirements Considerations for Use with Safety-Critical Software

7.5 Additional Guidance

Additional guidance related to this requirement may be found in the following materials in this Handbook:
