
1. Purpose

This topic provides minimum content guidance for software project plans, reports, and procedures. This guidance previously appeared in the version of this Handbook associated with NPR 7150.2A, which contained requirements for software documentation. Those requirements have since been removed from the NPR and incorporated into the guidance provided in this Handbook topic.

1.1 Introduction

Guidance for each document is provided via the linked pages below. Software documentation resources and lessons learned are also repeated on the respective pages.

The tab labels are abbreviated as follows:


To supplement and implement this guidance, NASA-specific documentation templates, examples, checklists, and more are available in Software Processes Across NASA (SPAN), accessible to NASA users from the SPAN tab in this Handbook.  Center-specific guidance and resources, such as templates, are available in Center Process Asset Libraries (PALs).



2. References

2.1 Tools

3. Lessons Learned

The NASA Lessons Learned database contains the following lessons learned related to software project documentation:

  • Computer Hardware-Software/International Space Station/Software Development (Plan for user involvement). Lesson Number 1132: "The lack of user involvement results in increased schedule and safety risk to the program... follow a concurrent engineering approach to building software that involves users and other key discipline specialists early in the software development process to provide a full range of perspectives and improve the understanding of requirements before code is developed."
  • Computer Software/Software Safety Policy Requirements/Potential Inadequacies (Cover essential requirements for the project). Lesson Number 1021: "NASA is committed to assuring that required program management plans and any subordinate plans such as software or safety management plans cover the essential requirements for programs where warranted by cost, size, complexity, lifespan, risk, and consequence of failure."
  • Kennedy Space Center (KSC) Projects and Resources Online (KPRO) Software Development and Implementation (Project team planning). Lesson Number 1384: "When planning and selecting team resources for a project, consider how the resources can work together and support each other, along with the skills required. This can be a factor in meeting or delaying software project milestones if an alternative resource has not been endorsed by the team members."
  • Place Flight Scripts Under Configuration Management Prior to ORT (Project attention to configuration control). Lesson Number 2476: "Project attention to the configuration control of flight scripts is likely to prevent the generation of unnecessary software iterations, improve the rigor of mission system engineering processes, and ensure consistency in the test and operations environments."
  • MPL Uplink Loss Timer Software/Test Errors (1998) (Plan to test against full range of parameters). Lesson Number 0939: "Unit and integration testing should, at a minimum, test against the full operational range of parameters. When changes are made to database parameters that affect logic decisions, the logic should be re-tested."
  • Deep Space 2 Telecom Hardware-Software Interaction (1999) (Plan to test as you fly). Lesson Number 1197: "To fully validate performance, test integrated software and hardware over the flight operational temperature range."
  • International Space Station (ISS) Program/Computer Hardware-Software/Software (Plan realistic but flexible schedules). Lesson Number 1062: "NASA should realistically reevaluate the achievable ... software development and test schedule and be willing to delay ... deployment if necessary rather than potentially sacrificing safety."
  • Thrusters Fired on Launch Pad (1975) (Plan for safe exercise of command sequences). Lesson Number 0403: "When command sequences are stored on the spacecraft and intended to be exercised only in the event of abnormal spacecraft activity, the consequences should be considered of their being issued during the system test or the pre-launch phases."
  • Quality Assurance Access to Critical Areas, Management (Quality Assurance Access). Lesson Number 0332: "If program or engineering management exercises the authority to deny QA access to critical areas, QA oversite will be severely [compromised]." [Note that, while this lesson does not specifically call out software, the lesson remains relevant regarding assurance personnel access to information and areas required to perform software assurance activities.]
  • Quality Assurance Expertise in Special Technical Areas (e.g., optics) (Quality Assurance Expertise). Lesson Number 0331: "If quality assurance personnel responsible for oversight of the quality of highly technical state-of-the-art development do not have a degree of expertise in that technical area, the likelihood of discovering QA problems decreases significantly." [Note that, while this lesson does not specifically call out software, the lesson remains relevant regarding assurance personnel access to information and areas required to perform software assurance activities.]
  • International Space Station (ISS) Program/Computer Hardware-Software/Software (Schedules). Lesson Number 1062: "The ISS software development schedule is almost impossibly tight. If something else does not cause a further delay in ISS deployment, software development may very well do so. The decision this year to add integrated testing of some modules ... is a very positive step for safety. However, there is no room in the schedule for required changes that may be discovered during this testing."
  • Lack of Education and Training in the Use and Processes of Independent Verification & Validation (IV&V) for Software Within NASA (2001). Lesson Number 1173: "While NASA has made major changes to emphasize the need to utilize IV&V on safety critical projects, the technology is not well understood by program managers and other relevant NASA personnel."
  • Erroneous Onboard Status Reporting Disabled IMAGE's Radio. Lesson Number 1799: "The loss of the IMAGE satellite was attributed to a Single Event Upset-induced "instant trip" of the Solid State Power Controller (SSPC) that supplies power to the single-string Transponder. The circuit breaker was not reset because this hybrid device incorrectly reported the circuit breaker as closed, and ground could not command a reset because the satellite's single telemetry receiver had been disabled by the SSPC. The SSPC's problematic state reporting characteristic was an intentional design feature that was not reflected in any part documentation, and three similar "instant trips" on other NASA satellites had not been reported in the GIDEP system. Consider hardwiring receiver power to the power bus, or build redundancy into the power switching or into the operational status sensing. Ensure that GIDEP reports or NASA Alerts are written and routed to mission operations (as well as to hardware developers), and that flight software responds to command loss with a set of timed spacecraft-level fault responses."
  • Probable Scenario for Mars Polar Lander Mission Loss (1998) (Effects of an incomplete software requirements specification). Lesson Number 0938: "All known hardware operational characteristics, including transients and spurious signals, must be reflected in the software requirements documents and verified by test."
  • Consider Language Differences When Conveying Requirements to Foreign Partners (1997) (Diagrams may be useful in requirements specifications). Lesson Number 0608: "It is especially important when working with foreign partners to document requirements in terms that describe the intent very clearly; include graphics where possible."
  • Develop and Test the Launch Procedure Early (1997). Lesson Number 0609: The Abstract states: "During the terminal countdown for the first attempted launch of Cassini, spacecraft telemetry channels indicated a false alarm condition that delayed verification of spacecraft readiness for launch, and contributed to a delay on the first launch day. The anomaly was traced to erroneous telemetry documentation. Develop and release the launch procedure early enough for comprehensive testing before launch. Rigorously test and verify all telemetry channels and their alarms and ensure documentation such as telemetry definitions is kept up-to-date."
  • Mars Climate Orbiter Mishap Investigation Board - Phase I Report. Lesson Number 0641: "The MCO Mishap Investigation Board (MIB)...determined that the root cause for the loss of the MCO spacecraft was the failure to use metric units in the coding of a ground software file." The data in the ... file was required to be in metric units per existing software interface documentation, and the trajectory modelers assumed the data was provided in metric units per the requirements. In fact, the angular momentum (impulse) data was in English units rather than metric units. The failure to properly communicate what was in the interface documentation is a warning that the effective implementation of the IDD requirement includes adequate communication of its contents, not just the writing and recording.
  • Interface Control and Verification. Lesson Number 0569: Problems occurred during the Mars Pathfinder spacecraft integration and test due to out-of-date or incomplete interface documentation. (While this lesson involves a hardware-related problem, it illustrates the need for accuracy in interface documentation.) Investigation showed that the main wiring harness was built in accordance with documentation that had not been updated after design changes were made. In part, this was due to independently prepared Mechanical Interface Control Drawings (MICDs) by the Government and the contractor. The MICDs should have had periodic verification for accuracy and compatibility.
  • Problem Reporting and Corrective Action System. Lesson Number 0738: Impact of Non-Practice states: "Hardware/software problems that require further investigation may not be identified and tracked. Development of corrective action and need for improvement will not be highlighted to engineering. Opportunities for early elimination of the causes of failures and valuable trending data can be overlooked." Practice states: "A closed-loop Problem (or Failure) Reporting and Corrective Action System (PRACAS or FRACAS) is implemented to obtain feedback about the operation of ground support equipment used for the manned spaceflight program."
  • Pre-Flight Problem/Failure Reporting Procedures. Lesson Number 0733: Impact of Non-Practice states: "Without ... formal reporting procedures, problems/failures, particularly minor glitches, may be overlooked or not considered serious enough to investigate or report to Project Management. This could result in recurrence of the problem/failure during the mission and result in significant degradation in performance."
  • Probable Scenario for Mars Polar Lander Mission Loss (1998) (Importance of including known hardware characteristics). Lesson Number 0938: "1. Project test policy and procedures should specify actions to be taken when a failure occurs during test. When tests are aborted, or known to have had flawed procedures, they must be rerun after the test deficiencies are corrected. When test article hardware or software is changed, the test should be rerun unless there is a clear rationale for omitting the rerun. 2. All known hardware operational characteristics, including transients and spurious signals, must be reflected in the software requirements documents and verified by test."
  • Take CM Measures to Control the Renaming and Reuse of Old Command Files. Lesson Number 1481: The Mars Odyssey mission ran into a version control issue when the team discovered an improperly named file call script. It was determined that the team had taken an old Mars Global Surveyor file to reuse. The file was renamed, but its code creation time captured in the header was not changed. This caused the system to label the file as an old file. As a result, the operations team had to manually specify the correct file to use until subsequent code fixes were implemented.
  • Computer Hardware-Software/Software Development Tools/Maintenance. Lesson Number 1128: "NASA concurs with the finding that no program-wide plan exists addressing the maintenance of COTS software development tools. A programmatic action has been assigned to develop the usage requirements for COTS/modified off-the-shelf software including the associated development tools. These guidelines will document maintenance and selection guidelines to be used by all of the applicable program elements."
  • International Space Station (ISS) Program/Computer Hardware-Software/International Partner Source Code (Maintenance Agreements). Lesson Number 1153: The Recommendation states to "Solidify long-term source code maintenance and incident investigation agreements for all software being developed by the International Partners as quickly as possible, and develop contingency plans for all operations that cannot be adequately placed under NASA's control."
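
The Mars Climate Orbiter lesson above (Lesson Number 0641) turns on a unit assumption that was documented but not communicated. One way to reduce that risk is to carry the unit with the value across the interface, so the consumer never has to assume it. The following is a minimal Python sketch of that idea, not a description of any NASA system: the `Impulse` record, the `parse_impulse` helper, the unit labels `N*s` and `lbf*s`, and the `<value> <unit>` field layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical interface record: an impulse value with an explicit unit."""
    value: float
    unit: str  # assumed labels: "N*s" (metric) or "lbf*s" (English)

    def to_newton_seconds(self) -> float:
        """Return the impulse in newton seconds, converting if needed."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def parse_impulse(field: str) -> Impulse:
    """Parse a field such as '2.5 lbf*s'; reject fields that omit the unit."""
    parts = field.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<value> <unit>', got {field!r}")
    return Impulse(float(parts[0]), parts[1])
```

With this layout, a producer that writes `2.5 lbf*s` and a consumer that expects metric data still agree after `parse_impulse("2.5 lbf*s").to_newton_seconds()`, while a bare `2.5` is rejected instead of being silently interpreted.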