1. Introduction
This product section focuses on analyzing the design that has been developed from the requirements. The design process begins with a good understanding of the requirements and initially develops a basic architecture to use in coding the software. The basic architecture is then expanded into a more detailed design that can actually be used to code the desired system.
Since the design will be what primarily guides the coding, it is particularly important to make sure the design is correct, complete, understandable, and captures what the requirements intend for the system to do. The detailed design components capture the approach to implementing the software requirements, including the requirements associated with fault management, security, and safety. Analysis of the relationship between the detailed design and the software requirements provides evidence that all of the requirements are in the detailed design.
Thus, an important part of ensuring the final system is correct, safe and secure is making sure the design accurately represents all the requirements. The sections for this product describe some of the methods and techniques Software Assurance and Software Safety personnel can use to evaluate and improve the quality of the design elements that are being developed.
The second tab in this topic provides general guidance for doing software design analysis.
The third tab in this topic provides additional guidance for design analysis that may be done when there is safety critical software involved.
2. Software Design Analysis Guidance
In software design, software requirements are transformed into the software architecture and then into a detailed software design for each software component. The software design also includes databases and system interfaces (e.g., hardware, operator/user, software components, and subsystems). The design addresses software architectural design and software detailed design. The objective of doing design analysis is to ensure that:
- the design is a correct, accurate, and complete transformation of the software requirements that will meet the operational needs under nominal and off-nominal conditions,
- the design introduces no unintended features, and
- design choices do not result in unacceptable operational risk.
The design should also be created with modifiability and maintainability in mind so that future changes can be made quickly without significant redesign.
There are several design techniques described below that can help with the analysis of the design. Each of these may be used by Software Assurance and Software Safety personnel to help ensure a more robust design.
Tab 3 contains a more extensive list of analysis techniques that can be used by the Software Safety personnel.
Software Assurance and Software Safety tasks in NASA-STD-8739.8 that relate to design analysis are found in SWE-052, SWE-058, SWE-060, SWE-087, SWE-134, and SWE-157.
2.1 Use of Checklists
Consider the checklist below, from SADESIGN, when evaluating the software design. Another checklist that can be used for safety-critical software is found in this Handbook, under the Programming Checklists Topic: 6.1 - Design for Safety Checklist.
SADESIGN Checklist:
- Has the software design been developed at a low enough level for coding?
- Is the design complete and does it cover all the approved requirements?
- Have complex algorithms been correctly derived, do they provide the needed behavior under off-nominal conditions and assumed conditions, and is the derivation approach known and understood to support future maintenance?
- Has the design been examined to ensure that it does not introduce any undesirable behaviors or any capabilities not in the requirements?
- Have all requirements sources been considered when developing the design (for example, think about interface control requirements, databases, etc.)?
- Have the interfaces with COTS, MOTS, GOTS, and Open Source been designed?
- Have all internal and external software interfaces been designed for all (in-scope) interfaces with hardware, user, operator, software, and other systems and are they detailed enough to enable the development of software components that implement the interfaces?
- Are all safety features in the design (mitigations, controls, barriers, must-work requirements, must-not-work requirements)?
- Does the design provide the dependability and fault tolerance required by the system, and is the design capable of controlling identified hazards? Does the design create any hazardous conditions?
- Does the design adequately address the identified security requirements both for the system and security risks, including the integration with external components as well as information and data utilized, stored, and transmitted through the system?
- Does the design prevent, control, or mitigate any identified security threats and vulnerabilities? Are any unmitigated threats and vulnerabilities documented and addressed as part of the system and software operations?
- Have operational scenarios been considered in the design (for example, use of multiple individual programs to obtain one particular result may not be operationally efficient or reasonable; transfers of data from one program to another should be electronic, etc.)?
- Have users/operators been consulted during design to identify any potential operational issues?
- Maintainability: Has maintainability been considered? Is the design modular? Can additions and changes be made quickly?
- Is the design easy to understand?
- Is the design unnecessarily complicated?
- Is the design adequately documented for usability and maintainability?
- Has system performance been considered during design?
- Has the level of coupling (interactivity between modules) been kept to a minimum?
- Has software planned for reuse and OTS software in the system been examined to see that it meets the requirements and performs appropriately within the required limits for this system?
- Does this software introduce any undesirable capabilities or behaviors?
- Has the software design been peer reviewed?
2.2 Use of peer reviews or inspections
Design items designated in the software development plans should be peer reviewed or inspected. Some of the items to look for during these meetings are:
- Assess the design against the hardware and identify any gaps.
- Confirm that the detailed design is consistent with the architecture design and describes the units at a low enough level for coding.
- Confirm the design does not contain undesirable functionality.
- Confirm the requirements in SWE-134 have been taken into account for safety-critical software.
- Confirm the design addresses any possible unauthorized access.
2.3 Review of Traceability
Review the traces from requirements to design and design to requirements and ensure they are complete. As the project moves into implementation, the bi-directional trace matrices between design and code should also be checked.
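The completeness check described above can be partly automated when the trace data is available in machine-readable form. The following is a minimal sketch, not a prescribed tool: the function name `check_trace`, the ID formats (`SRS-n`, `SDD-x`), and the dictionary layout are all hypothetical and would need to match how a given project stores its trace matrix.

```python
from typing import Dict, List, Set


def check_trace(
    requirements: Set[str],
    design_elements: Set[str],
    req_to_design: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Report gaps in a requirements-to-design trace, in both directions.

    All IDs and the dictionary layout are hypothetical placeholders.
    """
    # Forward: every requirement should map to at least one design element.
    untraced_requirements = sorted(
        r for r in requirements if not req_to_design.get(r)
    )

    # Backward: every design element should be reachable from a requirement.
    # An element that is not may represent an unintended feature.
    traced_design: Set[str] = set()
    for targets in req_to_design.values():
        traced_design |= targets
    orphan_design = sorted(design_elements - traced_design)

    # Dangling links point at IDs that exist in neither baseline.
    dangling = sorted(
        {r for r in req_to_design if r not in requirements}
        | (traced_design - design_elements)
    )
    return {
        "untraced_requirements": untraced_requirements,
        "orphan_design_elements": orphan_design,
        "dangling_links": dangling,
    }


if __name__ == "__main__":
    report = check_trace(
        requirements={"SRS-1", "SRS-2", "SRS-3"},
        design_elements={"SDD-A", "SDD-B", "SDD-C"},
        req_to_design={"SRS-1": {"SDD-A"}, "SRS-2": {"SDD-A", "SDD-B"}},
    )
    print(report)
```

A script like this only shows that links exist. Whether a linked design element actually satisfies its requirement still has to be judged by a reviewer.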
2.4 Analysis by Software Architecture Review Board (SARB) - applies to NASA projects only
The SARB is a NASA-wide board that engages with flight projects in the formative stages of software architecture. The objectives of SARB are to manage and/or reduce flight software complexity through better software architecture and help improve mission software reliability and save costs. NASA projects that meet certain criteria (for example, large projects, ones with safety critical concerns, projects destined for considerable reuse, etc.) may request the SARB to do a review and assessment for their architecture.
2.5 Reporting of Results
Any design analysis done in the interim between status reports or prior to milestone reviews should be reported to management and the rest of the team. When a project has safety-critical software, any analysis done by Software Assurance should be shared with the Software Safety personnel. The results reporting should include:
- Identification of what was analyzed: Mission/Project/Application
- Person or group doing analysis
- Period/Timeframe/Phase during which the analysis was performed
- Documents used in analysis (e.g., requirements version, etc.)
- Description or identification of analysis techniques used
- Overall assessment of design, based on analysis
- Major findings and associated risk
- Current status of findings: open/closed; projection for closure timeframe
2.6 Problem/Issue Tracking System
Findings, issues, and concerns from all the different software and safety design analyses performed should be documented in a problem/issue tracking system and tracked to closure. These items should be communicated to the software development personnel and possible solutions discussed. The analysis done by Software Assurance and Software Safety can be reported in one combined report if desired.
3. Safety Design Analysis
3.1 Review Software Design Analysis
There are many considerations for analyzing the design with respect to safety. Most of the design analysis that is used for non-safety projects is still applicable for safety-critical software. So, to begin with, the Software Safety personnel should either review or ensure that the Software Assurance personnel have reviewed the set of items listed in Tab 2 - Software Design Analysis Guidance. The first of these is the SADESIGN checklist (previously in Topic 7.18). Another checklist that can be used for safety-critical software is found in this Handbook, under the Programming Checklists Topic: 6.1 - Design for Safety Checklist.
3.2 Design peer reviews or design walkthroughs
Design peer reviews or design walkthroughs are recommended for safety-critical components to identify design problems or other issues. One of the most important aspects of a software design for safety-critical software is to design for minimum risk. “Minimum risk” includes the hazard risk, the risk of software defects, the risk of human operator errors, and other types of risk such as programmatic, cost, schedule, etc. When possible, eliminate identified hazards and risks or reduce the associated risk through design. Some of the ways risk can be reduced through design are listed below. This list can be used by attendees of design peer reviews or walk-throughs to help evaluate the design with respect to safety and risk considerations.
Safety Considerations during Design Peer Reviews/Walk-throughs:
- Reduce the complexity of the software and interfaces.
- Design for user-safety instead of user-friendly.
- Design for testability during development and integration.
- Give more design “resources” (such as time, effort) to the higher risk aspects such as hazard controls.
- Include separation of commands, functions, files, and ports.
- Include design for Shutdown/Recovery/Safing.
- Plan for monitoring and detection.
- Isolate the components containing safety-critical requirements as much as possible.
- Interfaces between safety-critical components should be designed for minimum interaction.
- Document the positions and functions of safety critical components in the design hierarchy.
- Document how each safety-critical component can be traced back to the original safety requirements and how the requirements are implemented.
- Specify safety-related design and implementation constraints.
- Document execution control, interrupt characteristics, initialization, synchronization, and control of the components. For high risk systems, interrupts should be avoided since they may interfere with software safety controls. Any interrupts used should be priority-based.
- Specify any error detection or recovery schemes for safety-critical components.
- Consider hazardous operations scenarios.
- The design of safing and recovery actions should fully consider the real-world conditions and the corresponding time to criticality. Automatic safing is often required if the time to criticality is shorter than the realistic human operator response time, or if there is no human in the loop. This can be performed by either hardware or software or a combination depending on the best system design to achieve safing.
- Select a strategy for handling faults and failures. Some of the techniques that can be used in fault management are below:
- To prevent fault propagation (cascading of a software error from one component to another) safety-critical components must be fully independent of non-safety-critical components, be able to detect an error and not pass it along.
- Shadowing: A higher level process emulates lower level processes to predict expected performance and decides if failures have occurred in the lower processes. The higher level process implements appropriate redundancy switching when it detects a discrepancy.
- Built-in Test: Fault/Failure Detection, Isolation and Recovery (FDIR) can be based on self-test (BIT) of lower tier processors where the lower level units test themselves and report their status to the higher processor. The higher processor switches out units reporting a failed or bad status.
- Majority voting: Some redundancy schemes are based on majority voting. This technique is especially useful when the criteria for diagnosing failures are complicated (e.g., when an unsafe condition is defined by exceeding an analog value rather than simply a binary value). An odd number of parallel units is required to achieve majority voting.
- Fault Containment Regions: Establish a Fault Containment Region (FCR) to prevent fault propagation, such as from non-critical software to safety-critical components, from one redundant software unit to another, or from one safety-critical component to another. Techniques such as firewalling or “come from” checks should be used to provide sufficient isolation of FCRs to prevent hazardous fault propagation. FCRs can be best partitioned or firewalled by hardware. A typical method of obtaining independence between FCRs is to host them on different and independent hardware processors.
- Redundant architecture: In redundant architecture, there are two versions of the operational code which do not need to operate identically. The primary version is a high performance version with all required functionality and performance requirements. If problems occur with this version, the other version (called a safety kernel) will be given control. This version may have the same functionality, or it may have a more limited scope.
- Recovery blocks: These use multiple software versions to find and recover from faults. Outputs from a block will be checked against an acceptance test. If it fails, then another version computes the output and the process continues. Each version is more reliable but less efficient. If the last block fails, the program must determine some way to fail safe.
- Self-checks: This is a type of dynamic fault detection. Self-checks can include replication (copies must be identical if the data is to be considered correct), reasonableness (is the data reasonable, based on other data in the system), and structural (are components manipulating complex data correctly).
- Consider any potential issues with the use of COTS, Open Source, reused or inherited code.
- Select sampling rates with consideration for noise levels and expected variations of control system and physical parameters.
- Identify test and/or verification methods for each safety-critical design feature.
- Design for testability. Include ways that the internals of a component can be adequately tested to verify that they are working properly.
- Consider maintainability in the design (For example: anticipate potential changes in the software, use a modular design, object-oriented design, uniform conventions, and naming conventions, use coding standards that support safety practices, use documentation standards, common tool sets)
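As an illustration of one of the fault management techniques listed above, majority voting over an analog limit can be sketched as follows. This is a minimal sketch under stated assumptions, not a flight-ready voter: the function names, the three-unit configuration, and the limit value are hypothetical, and a real design would also have to handle stale, missing, or out-of-range readings.

```python
from typing import List, Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return the majority decision of an odd number of boolean votes."""
    if len(votes) % 2 == 0:
        # An even (or empty) set of units can tie, so it is rejected here.
        raise ValueError("majority voting needs an odd number of units")
    return sum(votes) > len(votes) // 2


def unsafe_by_vote(readings: Sequence[float], limit: float) -> bool:
    """Each redundant unit votes 'unsafe' if its analog reading exceeds limit."""
    return majority_vote([r > limit for r in readings])


def dissenting_units(readings: Sequence[float], limit: float) -> List[int]:
    """Indexes of units whose vote disagrees with the majority decision."""
    decision = unsafe_by_vote(readings, limit)
    return [i for i, r in enumerate(readings) if (r > limit) != decision]


if __name__ == "__main__":
    readings = [101.2, 99.8, 102.5]   # hypothetical sensor values
    print(unsafe_by_vote(readings, limit=100.0))
    print(dissenting_units(readings, limit=100.0))
```

Reporting the dissenting units alongside the decision lets a higher-level process decide whether a unit should be switched out, in the spirit of the redundancy management items above.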
A few more safety-specific design considerations are below:
- Are the design and its safety features appropriately flowed from the requirements and the evolving hazard analyses?
- Has the design been reviewed to ensure that the software design’s correct implementation of safety controls or processes does not compromise other system safety features or the functionality of the software?
- Have additional system hazards, causes, or contributions discovered during the software design analysis been documented in the required system safety documentation (e.g., Safety Data Package and/or Hazard Reports)?
- Have Safety reviews approved the controls, mitigations, inhibits, and safety design features to be incorporated into the design?
- Are any needed or identified safety conditions, constraints, parameters, trigger points, boundary conditions, environments, and other software circumstances for safe operation, in the appropriate modes and states, all flowed from the software requirements and incorporated into the design?
- Does the design maintain the system in a safe state during all modes of operation or can it transition to a safe state when and if necessary?
- Are the partitioning or isolation methods used in the design to logically isolate the safety-critical design elements from the non-safety-critical ones effective? This is particularly important with the incorporation of COTS or integration of legacy, heritage, and reuse software. Any software that can write or provide data to safety-critical software will also be considered safety critical unless there is isolation built in, and then the isolation design is considered safety critical.
- Is appropriate fault and/or failure tolerance incorporated into the software design as designated?
- If heritage code is being used, is there a clear understanding of the design and constraints associated with any fault management in the heritage code? Are they appropriate for the current system being developed?
3.3 Other types of design analysis can be done to analyze particular aspects of the design.
All of these design analyses would be useful to perform, but they require more time and effort so the safety team should choose those they feel would provide the most value, depending on the areas where risk is highest in the design. Some of the other available design analysis methods are below:
a. Acceptable Level of Safety: Once the design is fairly mature, a design safety analysis can be done to determine whether an acceptable level of safety will be attained by the designed system. This analysis involves analyzing the design of the safety components to ensure that all the safety requirements are specified correctly. The requirements may need to be updated once the design has determined exactly what safety features will be included in the system. Then review the design looking for the places and conditions that lead to unacceptable hazards. Consider the credible faults or failures that could occur and evaluate their effects on the designed system. Does the designed system produce the desired result with respect to the hazards?
b. Prototyping or simulating: Prototyping or simulating parts of the design may show where the software can fail. In addition, this can demonstrate whether the software can meet the constraints it might have, such as response time or data conversion speed. Prototypes can also be used to obtain operator input on the user interface. If the prototypes show that a requirement cannot be met, the requirement must be modified as appropriate or the design may need to be revised.
c. Independence Analysis: To perform this analysis, map the safety-critical functions to the software components, and then map the software components to the hardware hosts and FCRs. All the input and output of each safety-critical component should be inspected. Consider global or shared variables, as well as the directly passed parameters. Consider “side effects” that may be included when a component is run.
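The mapping step of an independence analysis lends itself to a simple mechanical check once the component-to-FCR mapping and the shared-variable usage have been captured. The sketch below is illustrative only: the `Component` record, its field names, and the example component names are hypothetical, and it flags just two conditions (a shared FCR, and a non-critical writer of data used by a safety-critical component). Side effects that are not visible in the recorded reads and writes still need manual inspection.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Component:
    """Hypothetical record of one software component in the design."""
    name: str
    fcr: str                      # fault containment region / hardware host
    safety_critical: bool
    reads: FrozenSet[str] = frozenset()    # global/shared variables read
    writes: FrozenSet[str] = frozenset()   # global/shared variables written


def independence_findings(components: List[Component]) -> List[str]:
    """List places where a non-critical component can affect a critical one."""
    findings: List[str] = []
    critical = [c for c in components if c.safety_critical]
    others = [c for c in components if not c.safety_critical]
    for c in critical:
        for o in others:
            # Sharing an FCR means a fault in one can propagate to the other.
            if c.fcr == o.fcr:
                findings.append(
                    f"{c.name} shares FCR '{c.fcr}' with non-critical {o.name}"
                )
            # Data written by a non-critical component and used by a critical one.
            shared = o.writes & (c.reads | c.writes)
            if shared:
                findings.append(
                    f"non-critical {o.name} writes {sorted(shared)} used by {c.name}"
                )
    return findings


if __name__ == "__main__":
    design = [
        Component("valve_ctrl", "cpu_a", True, reads=frozenset({"tank_pressure"})),
        Component("telemetry", "cpu_b", False, writes=frozenset({"tlm_buffer"})),
        Component("logger", "cpu_a", False, writes=frozenset({"tank_pressure"})),
    ]
    for finding in independence_findings(design):
        print(finding)
```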
d. Design Logic Analysis: The Design Logic Analysis (DLA) evaluates the equations, algorithms, and control logic of the software design. Logic analysis examines the safety-critical areas of a software component. A technique for identifying safety-critical areas is to examine each function performed by the software component. If it responds to or has the potential to violate one of the safety requirements, it should be considered critical and undergo logic analysis. A technique for performing logic analysis is to compare design descriptions and logic flows and note discrepancies. The most rigorous form of this analysis can be done using Formal Methods. Less formal DLA involves a human inspector reviewing a relatively small quantity of critical software products (e.g., PDL, prototype code) and manually tracing the logic. Safety-critical logic to be inspected can include failure detection and diagnosis, redundancy management, variable alarm limits, and command inhibit logical preconditions.
e. Design Data Analysis: The Design Data Analysis evaluates the description and intended use of each data item in the software design. Data analysis ensures that the structure and intended use of data will not violate a safety requirement. A technique used in performing design data analysis is to compare the description to the use of each data item in the design logic.
Interrupts and their effect on data must receive special attention in safety-critical areas. Analysis should verify that interrupts and interrupt handling routines do not alter critical data items used by other routines.
The integrity of each data item should be evaluated with respect to its environment and host. Shared memory and dynamic memory allocation can affect data integrity. Data items should also be protected from being overwritten by unauthorized applications.
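The comparison of each data item's description against its use can be sketched as a small consistency check. Everything named here is an assumption for illustration: the `DataItem` and `Use` records, the idea of an allowed-writer list, and the example routine names are hypothetical, and a project would derive the equivalent information from its own data dictionary and design logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DataItem:
    """Hypothetical data dictionary entry (the 'description' of a data item)."""
    name: str
    units: str
    critical: bool
    allowed_writers: Set[str]


@dataclass
class Use:
    """Hypothetical record of one use of a data item in the design logic."""
    item: str
    routine: str
    access: str                       # "read" or "write"
    units: str
    is_interrupt_handler: bool = False


def data_findings(dictionary: Dict[str, DataItem], uses: List[Use]) -> List[str]:
    """Compare each recorded use of a data item against its description."""
    findings: List[str] = []
    for u in uses:
        item = dictionary.get(u.item)
        if item is None:
            findings.append(f"{u.routine}: uses undescribed data item {u.item}")
            continue
        if u.units != item.units:
            findings.append(
                f"{u.routine}: treats {u.item} as {u.units}, described as {item.units}"
            )
        if u.access == "write":
            if u.routine not in item.allowed_writers:
                findings.append(f"{u.routine}: unauthorized write to {u.item}")
            if item.critical and u.is_interrupt_handler:
                findings.append(
                    f"{u.routine}: interrupt handler alters critical item {u.item}"
                )
    return findings


if __name__ == "__main__":
    dictionary = {
        "tank_pressure": DataItem("tank_pressure", "kPa", True, {"sensor_task"}),
    }
    uses = [
        Use("tank_pressure", "sensor_task", "write", "kPa"),
        Use("tank_pressure", "display", "read", "psi"),
        Use("tank_pressure", "timer_isr", "write", "kPa", is_interrupt_handler=True),
    ]
    for finding in data_findings(dictionary, uses):
        print(finding)
```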
f. Design Interface Analysis: The Design Interface Analysis verifies the proper design of a software component's interfaces with other components of the system. The interfaces can be with other software components, with hardware, or with human operators. This analysis will verify that the software component's interfaces, especially the control and data linkages, have been properly designed. Interface requirements specifications (which may be part of the requirements or design documents, or a separate document) are the sources against which the interfaces are evaluated.
Interface characteristics to be addressed should include inter-process communication methods, data encoding, error checking and synchronization.
The analysis should consider the validity and effectiveness of checksums, CRCs, and error correcting code. The sophistication of error checking or correction that is implemented should be appropriate for the predicted bit error rate of the interface. An overall system error rate should be defined and budgeted to each interface.
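To make the error-checking discussion concrete, the sketch below frames a message with a CRC-32 and verifies it on receipt, using Python's standard `zlib.crc32`. The framing layout (payload followed by a 4-byte big-endian CRC) and the function names are assumptions for illustration. Whether CRC-32 is adequate for a particular interface depends on that interface's predicted bit error rate and message length, which this example does not evaluate.

```python
import zlib

CRC_LEN = 4  # CRC-32 is 32 bits, sent here as 4 big-endian bytes (assumed layout)


def add_crc(payload: bytes) -> bytes:
    """Return the payload with its CRC-32 appended."""
    return payload + zlib.crc32(payload).to_bytes(CRC_LEN, "big")


def crc_ok(framed: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    if len(framed) < CRC_LEN:
        return False  # too short to contain a CRC at all
    payload, received = framed[:-CRC_LEN], framed[-CRC_LEN:]
    return zlib.crc32(payload).to_bytes(CRC_LEN, "big") == received


if __name__ == "__main__":
    framed = add_crc(b"VALVE_OPEN")
    print(crc_ok(framed))

    # Flip one bit in the first byte to simulate a transmission error.
    corrupted = bytes([framed[0] ^ 0x01]) + framed[1:]
    print(crc_ok(corrupted))
```

A CRC only detects corruption. If the interface needs to recover from errors without retransmission, an error correcting code is required instead, as noted above.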
g. Design Traceability Analysis: This analysis ensures that each safety-critical software requirement is included in the design. Tracing the safety requirements throughout the design (and eventually into the source code and test cases) is vital to making sure that no requirements are lost, that safety is “designed in”, that extra care is taken during the coding phase, and that all safety requirements are tested. A safety requirement traceability matrix is one way to implement this analysis.
3.4 Documenting and Reporting of Results of the Design Analysis
Any design analysis done in the interim between status reports or prior to milestone reviews should be reported to management and the rest of the team. When a project has safety-critical software, any analysis done by Software Assurance should be shared with the Software Safety personnel. The results reporting should include:
- Identification of what was analyzed: Mission/Project/Application
- Person or group doing analysis
- Period/Timeframe/Phase during which the analysis was performed
- Documents used in analysis (e.g., requirements version, etc.)
- Description or identification of analysis techniques used
- Overall assessment of design, based on analysis
- Major findings and associated risk
- Current status of findings: open/closed; projection for closure timeframe
3.5 Problem/Issue Tracking System
Findings, issues, and concerns from all the different safety design analyses performed should be documented in a problem/issue tracking system.
- These items should be communicated to the software development personnel and possible solutions discussed.
- A high-level summary along with an overall assessment of the design should be communicated to the project management.
- All items should be addressed and tracked to closure.
- The detailed reporting should include:
- the type of analysis where the finding, issue, or concern was discovered,
- the problem found, and
- an assessment of the amount of risk involved with this finding.
The results of the analysis done by Software Assurance personnel and that done by Software Safety personnel can be reported in one combined report if desired.
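The detailed reporting fields listed above map naturally onto a small record type. The following sketch shows one possible shape for such a record and a summary suitable for a high-level report. The class and field names, the three-level risk scale, and the example findings are hypothetical. A real project would use the fields of its own problem/issue tracking system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class Finding:
    """Hypothetical record of one design analysis finding."""
    analysis_type: str      # type of analysis where the finding was discovered
    problem: str            # the problem found
    risk: str               # assessed risk: "high", "medium", or "low" (assumed scale)
    status: Status = Status.OPEN


def summarize(findings: List[Finding]) -> Dict[str, int]:
    """Count findings for a high-level summary to project management."""
    open_items = [f for f in findings if f.status is Status.OPEN]
    return {
        "total": len(findings),
        "open": len(open_items),
        "open_high_risk": sum(1 for f in open_items if f.risk == "high"),
    }


if __name__ == "__main__":
    findings = [
        Finding("Design Interface Analysis", "No CRC on command link", "high"),
        Finding("Design Data Analysis", "Units mismatch on tank_pressure", "medium",
                Status.CLOSED),
        Finding("Independence Analysis", "Logger shares FCR with valve_ctrl", "high"),
    ]
    print(summarize(findings))
```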
4. Resources
4.1 References
No references have been currently identified for this Topic. If you wish to suggest a reference, please leave a comment below.
4.2 Tools
NASA users can find the list of tools in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.