1. Requirements

4.2.1 The project shall identify, analyze, plan, track, control, communicate, and document software risks in accordance with NPR 8000.4, Agency Risk Management Procedural Requirements.

1.1 Notes

A project needs to include an assessment of the risk that any untested code poses to the system or subsystem.

1.2 Applicability Across Classes

Classes C through E and Safety Critical are labeled "SO." This means that the requirement applies to the safety-critical aspects of the software.

Class G is labeled with "P (Center)." This means that an approved Center-defined process which meets a non-empty subset of the full requirement can be used to achieve this requirement.




2. Rationale

The purpose of risk management is to identify potential problems before they occur so that risk handling activities can be planned and invoked as needed across the life of the product or project. Risk handling activities are intended to mitigate adverse impacts on achieving the project's objectives. "Generically, risk management is a set of activities aimed at achieving success by proactively risk-informing the selection of decision alternatives and then managing the implementation risks associated with the selected alternative." Identification and management of risks provide a basis for systematically examining changing situations over time to uncover and correct circumstances that impact the ability of the project to meet its objectives. NPR 7150.2 states: "Identifying major risks, both technical and managerial, and determining how to lessen the risk helps keep the software development process under control."


3. Guidance


"Provide a description of the methods and procedures employed to identify, assess, monitor, and control areas of risk arising during the software assurance activities."


This recommendation could be extended to methods and procedures used to manage risk throughout all aspects of the software development life cycle.


To ensure consistent application of risk management, NPR 7150.2 requires that risk management follow NPR 8000.4, Agency Risk Management Procedural Requirements.


NPR 7150.2 also includes a list of steps to be included when a project plans its risk management strategy; these steps are common to continuous risk management. This diagram from NASA/SP-2007-6105, NASA Systems Engineering Handbook, provides an overview of a risk management process:

Guidance for each of the risk management steps is provided below. In addition to the guidance found in this Handbook, consult Center Process Asset Libraries (PALs) for Center-specific guidance and resources related to continuous risk management.


Risk management activities begin during the project concept phase and continue through project retirement. Larger projects may have project leads with responsibility for risk management, but all project team members need to look for and bring risks to management's attention.


Identify software risks

When identifying software risks, consider the following insights and suggestions:

  • "Identify risks before they become problems. Communication is the center of the Risk Management paradigm (see NPR 8000.4, Agency Risk Management Procedures and Guidelines). Brainstorming is often used to identify project risks. People from varying backgrounds and points-of-view see different risks. A diverse team, skilled in communication, will usually find better solutions to the problems."
  • Use a checklist to avoid "missing" risks that have been identified on previous projects.
  • Add new risks to existing risk checklists for future projects.
  • Review lessons learned from past projects.
  • Use existing reference lists; NASA/SP-2007-6105, NASA Systems Engineering Handbook, includes a list of example sources of risk.
  • Include software assurance and/or software safety personnel on change control boards (CCBs) as roles specifically assigned to identify risks.
  • Risk identification needs to be proactive and a continual process.
  • Risk identification needs to occur whenever there is a significant change in project circumstances that could result in new risks.
  • Software risk identification needs to include cost, performance, and schedule risks as well as technical or skill risks.

Analyze software risks

Once the team identifies the initial set of risks, analysis needs to be performed to determine the likelihood (probability) and severity of the consequences of each risk.

When performing this analysis, many risk management guidebooks suggest including the following:

  • Scenarios in which the risk could occur.
  • Likelihood of occurrence.
  • Consequences.


Keep in mind that "a rare but severe risk contributor may warrant a response different from that warranted by a frequent, less severe contributor, even though both have the same expected consequences."
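The point made in the quotation can be illustrated with a small worked example. The sketch below assumes a deliberately simple model in which expected consequence is likelihood multiplied by consequence. The risk names and numbers are hypothetical and are not taken from NPR 8000.4 or from any project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskContributor:
    name: str           # hypothetical label, for illustration only
    likelihood: float   # probability of occurrence over the period of interest (0..1)
    consequence: float  # cost of an occurrence, in arbitrary units (assumed scale)

    def expected_consequence(self) -> float:
        # Simplifying assumption: expected consequence = likelihood x consequence.
        return self.likelihood * self.consequence


rare_severe = RiskContributor("rare, severe", likelihood=0.01, consequence=1000.0)
frequent_minor = RiskContributor("frequent, minor", likelihood=0.5, consequence=20.0)

for risk in (rare_severe, frequent_minor):
    print(f"{risk.name}: expected consequence = {risk.expected_consequence():.1f}")
```

Both contributors have an expected consequence of 10.0 units, yet a single occurrence of the first costs 50 times as much as a single occurrence of the second. Recording likelihood and consequence separately, rather than only their product, keeps that difference visible when responses are chosen.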


One analysis method is a probabilistic risk assessment (PRA). Per NASA/SP-2007-6105, NASA Systems Engineering Handbook, "PRA is a scenario-based risk assessment technique that quantifies the likelihoods of various possible undesired scenarios and their consequences, as well as the uncertainties in the likelihoods and consequences.... For additional information on probabilistic risk assessments, refer to [NPR 8705.5A, Technical Probabilistic Risk Assessment (PRA) Procedures Guide for Safety and Mission Success for NASA Programs and Projects]." (Editor's Note: NPR 8705.3 has been updated to NPR 8705.5A in this quotation.)

Another recommendation is to model the scenarios and use those models to assess the consequences and determine the likelihood of a risk occurring.
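As one possible illustration of scenario modeling, the sketch below uses a simple Monte Carlo simulation to estimate both the likelihood and the consequence of a schedule risk. It is not a PRA as defined in the NASA documents cited above. The scenario, the event probabilities, the slip ranges, and the function names are all hypothetical assumptions chosen for the example.

```python
import random


def simulate_scenario(rng: random.Random) -> float:
    """Return the schedule slip (in days) for one simulated run of a hypothetical scenario."""
    # Assumed inputs, for illustration only: a late hardware delivery (30% chance)
    # and a failed integration test (20% chance), treated as independent events.
    late_delivery = rng.random() < 0.30
    failed_integration = rng.random() < 0.20
    slip_days = 0.0
    if late_delivery:
        slip_days += rng.uniform(5, 15)
    if failed_integration:
        slip_days += rng.uniform(10, 30)
    return slip_days


def estimate(trials: int, threshold_days: float, seed: int = 0) -> tuple[float, float]:
    """Estimate P(slip > threshold) and the mean slip over the given number of trials."""
    rng = random.Random(seed)
    slips = [simulate_scenario(rng) for _ in range(trials)]
    likelihood = sum(s > threshold_days for s in slips) / trials
    mean_slip = sum(slips) / trials
    return likelihood, mean_slip


likelihood, mean_slip = estimate(trials=100_000, threshold_days=20.0)
print(f"Estimated likelihood of slipping more than 20 days: {likelihood:.3f}")
print(f"Estimated mean slip: {mean_slip:.1f} days")
```

Under these assumptions the mean slip can also be computed by hand (0.30 x 10 + 0.20 x 20 = 7 days), which gives a quick check that the model behaves as intended before it is trusted for less tractable scenarios.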

The results of this step are used to rank the identified risks and the possible alternatives for those risks so that informed plans can be put into place to address those risks. NASA/SP-2007-6105, NASA Systems Engineering Handbook, describes tools and techniques for analyzing and managing risks, including:

  • Risk matrices - to facilitate discussions regarding "the status and effects of risk-handling efforts, and communicate risk status information."
  • FMEA (failure mode and effects analysis) and FMECA (failure modes, effects, and criticality analysis) - "an ongoing procedure by which each potential failure in a system is analyzed to determine the results or effects thereof on the system, and to classify each potential failure mode according to its consequence severity."
  • FTA (fault tree analysis) - "identify potential failure modes for a product or process, to assess the risk associated with those failure modes, to rank the issues in terms of importance, and to identify and carry out corrective actions to address the most serious concerns."
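To show how a risk matrix can support ranking, the sketch below scores each risk on an assumed 5 x 5 scale and sorts the results. The multiplicative score, the band thresholds, and the example risks are illustrative assumptions. A project would use the matrix and criteria defined in its own risk management plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    title: str
    likelihood: int   # 1 (very low) .. 5 (very high), assumed scale
    consequence: int  # 1 (minimal) .. 5 (severe), assumed scale

    @property
    def score(self) -> int:
        return self.likelihood * self.consequence


def matrix_cell(risk: Risk) -> str:
    """Map a risk to a color band; the thresholds are illustrative assumptions."""
    if risk.score >= 15:
        return "red"
    if risk.score >= 6:
        return "yellow"
    return "green"


def rank(risks: list[Risk]) -> list[Risk]:
    # Highest score first; ties broken by consequence so severe risks surface earlier.
    return sorted(risks, key=lambda r: (r.score, r.consequence), reverse=True)


risks = [
    Risk("Untested fault-protection code path", likelihood=2, consequence=5),
    Risk("Compiler upgrade late in the schedule", likelihood=4, consequence=2),
    Risk("Loss of key developer", likelihood=3, consequence=3),
    Risk("Requirements volatility", likelihood=4, consequence=4),
]

for r in rank(risks):
    print(f"{r.score:>2}  {matrix_cell(r):<6}  {r.title}")
```

Breaking ties on consequence is a design choice that reflects the earlier caution about rare but severe contributors. Other tie-breaking rules are equally defensible.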

Plan to address software risks

After the team identifies and analyzes the initial set of risks, a plan for managing those risks (and any risks identified later in the project life cycle) is needed. This plan may be standalone or be captured in the Software Development Plan/Software Management Plan (SDP/SMP) and updated throughout the project life cycle to reflect current risk management status. It is also important to inform providers, typically via their contract, that their risk management plans will be reviewed periodically by the acquirer.

Typical options for addressing risks include:

  • Accepting all or part of a risk.
  • Eliminating the risk.
  • Mitigating the risk (reducing the likelihood, reducing the negative effects).
  • Monitoring the risk.
  • Conducting further research on the risk.

The risk management plan needs to include topics such as:

  • Risk control and tracking steps describing what will be tracked.
  • Risk control actions.
  • Criteria for taking corrective actions.
  • The project's continuous risk management activities which will identify potential technical problems before they occur and mitigate the impact of those problems on the outcome of the project.
  • Risk owner (the role responsible for responding to the risk).
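One way to keep these topics from being overlooked is to give each risk a structured plan entry. The sketch below is a minimal example under assumed names: the class, the field names, and the completeness rules are hypothetical and are not prescribed by NPR 7150.2 or NPR 8000.4.

```python
from dataclasses import dataclass, field
from enum import Enum


class Handling(Enum):
    # The typical handling options listed earlier in this step.
    ACCEPT = "accept"
    ELIMINATE = "eliminate"
    MITIGATE = "mitigate"
    MONITOR = "monitor"
    RESEARCH = "research"


@dataclass
class RiskPlanEntry:
    risk_id: str
    statement: str
    owner: str                      # role responsible for responding to the risk
    handling: Handling
    tracked_items: list[str] = field(default_factory=list)    # what will be tracked
    control_actions: list[str] = field(default_factory=list)  # planned control actions
    corrective_action_criteria: str = ""                      # when to take corrective action

    def missing_topics(self) -> list[str]:
        """Return the plan topics that are still empty for this entry."""
        missing = []
        if not self.owner.strip():
            missing.append("owner")
        if not self.tracked_items:
            missing.append("tracked_items")
        # Assumed rule: only mitigated risks must list control actions.
        if self.handling is Handling.MITIGATE and not self.control_actions:
            missing.append("control_actions")
        if not self.corrective_action_criteria.strip():
            missing.append("corrective_action_criteria")
        return missing


entry = RiskPlanEntry(
    risk_id="SW-001",
    statement="Flight software build may exceed the processor memory budget.",
    owner="Software Lead Engineer",
    handling=Handling.MITIGATE,
    tracked_items=["memory margin per build"],
)
print(entry.missing_topics())
```

For this hypothetical entry the check reports that control actions and corrective-action criteria have not yet been written, which is the kind of gap a plan review would otherwise need to catch by inspection.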

The team needs to consider costs associated with managing, controlling, and mitigating risks when developing the risk management plan. This can be especially important for projects with limited or constrained budgets.

Project-level risk management plans need to describe coordination with program-level plans to ensure proper risk tracking and information sharing. Once the plan is created, it is reviewed and approved by an appropriate level of project management before it is implemented.

Track software risks

Risks that are not eliminated need to be tracked throughout the project life cycle to ensure their mitigation strategies remain effective. For low-risk items that are not formally included in the risk management plan, consider using a watch list so that they are not forgotten and do not escalate unnoticed to a higher-level risk later in the project.

Additionally, conditions that the team has identified as risk triggers are also monitored and tracked until those situations are no longer risk factors. Risk status also needs to be tracked and weighed against risk criteria to determine if corrective action needs to be taken.

If a risk management tool is in use for the project, risks need to be added to and tracked using this tool. A tracking tool could be a simple spreadsheet or database for a small project, a tool purchased specifically for tracking risks, or part of an integrated tool used to track multiple aspects of the project.
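For a small project that tracks risks in a spreadsheet, weighing risk status against criteria can be as simple as the sketch below. The column names, the sample rows, and the rule that a metric at or above its threshold triggers corrective action are assumptions made for illustration. The data is embedded as text so the example is self-contained.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are assumptions for this sketch.
TRACKING_SHEET = """\
risk_id,title,status,metric_value,action_threshold,watch_list
SW-001,Memory margin (percent used),open,92,90,no
SW-002,Open critical defects,open,3,5,no
SW-003,Vendor library end of support,open,0,1,yes
SW-004,Test lab availability,closed,0,1,no
"""


def needs_corrective_action(row: dict[str, str]) -> bool:
    """An open risk needs action once its tracked metric reaches the threshold."""
    return (
        row["status"] == "open"
        and float(row["metric_value"]) >= float(row["action_threshold"])
    )


rows = list(csv.DictReader(io.StringIO(TRACKING_SHEET)))
action_items = [r["risk_id"] for r in rows if needs_corrective_action(r)]
watch_list = [r["risk_id"] for r in rows if r["watch_list"] == "yes" and r["status"] == "open"]

print("Corrective action needed:", action_items)
print("Watch list:", watch_list)
```

With the sample data, SW-001 is flagged for corrective action and SW-003 remains on the watch list. Reading from a real file only requires replacing the `io.StringIO` wrapper with an opened file object.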

Control software risks

When a risk occurs, action needs to be taken. Those actions should have been included in the risk management plan and need to be implemented in this step. Their effectiveness also needs to be measured so adjustments to the plan can be made, if necessary.

Communicate software risk information

Risk information is communicated to all relevant stakeholders throughout the project life cycle. Stakeholders include project managers, project technical personnel, test team members, and anyone else affected by or with the need to know about risks, their impact, and their mitigations. Project life cycle reviews are one mechanism for risk communication.


Information, such as the effectiveness of risk mitigations and action plans, needs to be communicated to project managers, technical authorities, and other roles that make risk decisions and risk-based decisions throughout the project life cycle.


Document software risks

Documenting software risks is an activity that the team needs to do as part of all previous steps. Documentation could include:

  • Analysis records and decisions based on that analysis.
  • Records of risk acceptance (approval signatures and reasons for acceptance).
  • Records of planned mitigations and control mechanisms.
  • A list of identified risks.
  • A list of planned controls.
  • Risk acceptance rationale.


NASA-STD-8719.13, NASA Software Safety Standard, requires that "The software safety manager shall assure that risks affecting software safety are captured, addressed, and managed as part of program, project, and facility risk management processes, and those risks which could impose a system hazard are captured in the system hazard analyses."


The table below shows roles and responsibilities typical for continuous risk management:


Role | Responsibilities
Center SMA (Safety and Mission Assurance) organizations; software assurance and safety personnel | Provide risk management consultation, facilitation, and training to program/project organizations. Participate in CCBs (Configuration Control Boards) to help identify risks; assure safety risks are captured and managed by programs, projects, and facilities.
Software management | Review and approve the risk management plan; ensure continuous risk management is implemented; designate the risk manager; ensure that key decisions are risk-informed; coordinate management of risks across affected projects or project elements.
Software Risk Manager | Overall responsibility for software risk management; ensures the risk management plan is developed. Note: could be the Software Lead Engineer and is not necessarily a full-time role.
Project software team members | Bring risks to management's attention; support the Risk Manager in monitoring and controlling risks.


A recommended practice is that the Software Lead Engineer maintain a list of software risks independent of the program's risk list. Frequently, the program risks are larger than any given software risk item. The software risk data needs to be maintained in an organizational database.

Additional guidance related to risk management may be found in the following related requirement in this Handbook:


SWE-102 | Software Development/Management Plan



4. Small Projects

Projects with limited budgets may consider using spreadsheets or small databases to track their project risks rather than purchase a tool for this purpose. Small projects could also consider using tools available at the Center level since those may have no associated purchase or lease costs.


5. Resources



6. Lessons Learned

The NASA Lessons Learned database contains the following lessons learned related to risk management:

  • Lewis Spacecraft Mission Failure Investigation Board. Lesson Number 0625: Adopt Formal Risk Management Practices. "Faster, Better, Cheaper methods are inherently more risk prone and must have their risks actively managed. Disciplined technical risk management must be integrated into the program during planning and must include formal methods for identifying, monitoring and mitigating risks throughout the program. Individually small, but unmitigated risks on Lewis produced an unpredicted major effect in the aggregate."
  • Identification, Control, and Management of Critical Items Lists. Lesson Number 0803: The Use of Probabilistic Risks Assessments: "Probabilistic risk assessments have proven to be useful procedures in providing product development teams with an insight into factors of safety and to strengthen critical item or single failure point retention rationale. Margins of safety have a strong influence on the acceptability of retaining potential failure modes or critical items if it can be proven that risk of failure is reduced to an acceptably low level."
  • Flight Anomaly of Atmospheric Trace Molecule Spectroscopy (ATMOS) Instrument, Risk Assessment. Lesson Number 0272: Lesson Learned No. 2 states: "Low-cost STS (Space Transportation System)-borne experiments with plans for repeated flights, exemplified by the ATMOS (Atmospheric Trace Molecule Spectroscopy) spectrometer, require risk assessments different from those used for single launch experiments."