9.01 Software Design Principles

Upon startup, flight systems need to autonomously enter a state that requires no immediate ground intervention to ensure their health and safety, and that preserves vital system resources, even in the presence of faults.

Flight software initialization spans three typical scenarios: nominal, multiple restart, and minimal boot. The minimal boot scenario results in a stable, commandable state in which downlinks are possible and which maximizes preservation of system resources: a safe state. Multiple restart is a type of fault response in which the software attempts a start after a prior failed attempt. Multiple restarts require the software system to preserve knowledge of a prior failed attempt so that it can invoke degraded-performance restart modes all the way down to minimal boot. Nominal boot is a safe state where certain capabilities are inactive by design. The definition of safe state depends on mission phase.

Initialization-Safe Mode may require incorporation of the following software design elements: software designed to detect off-nominal restarts and to successively reinitialize with less and less dependency on preserved state (e.g., inertial, temporal, device capability or configuration, file allocation tables, boot code in RAM) from before the most recent reset, until a fully known and tested initial configuration is obtained and stable operation has been restored.

See also Topic 8.01 - Off Nominal Testing.
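The restart escalation described in this topic (preserve knowledge of prior failed attempts, then fall back through degraded restart modes down to minimal boot) can be sketched roughly as follows. This is a minimal illustration, not a flight implementation: the names `BootMode`, `BootRecord`, `select_boot_mode`, `begin_boot`, and `confirm_stable`, and the thresholds of 1 and 3 failed attempts, are all assumptions introduced here. `BootRecord` stands in for a small record held in memory that survives a reset.

```python
from enum import IntEnum


class BootMode(IntEnum):
    """Hypothetical restart levels, ordered from most to least capable."""
    NOMINAL = 0
    DEGRADED = 1
    MINIMAL = 2


class BootRecord:
    """Stand-in for a record that survives reset (assumption for this sketch)."""

    def __init__(self) -> None:
        self.failed_attempts = 0


def select_boot_mode(failed_attempts: int,
                     degraded_after: int = 1,
                     minimal_after: int = 3) -> BootMode:
    """Map the preserved count of failed boot attempts to a restart level."""
    # A negative count cannot be legitimate, so treat the record as corrupt
    # and fall back to the least state-dependent mode.
    if failed_attempts < 0 or failed_attempts >= minimal_after:
        return BootMode.MINIMAL
    if failed_attempts >= degraded_after:
        return BootMode.DEGRADED
    return BootMode.NOMINAL


def begin_boot(record: BootRecord) -> BootMode:
    """Choose a mode, then count this attempt as failed until proven stable."""
    mode = select_boot_mode(record.failed_attempts)
    record.failed_attempts += 1
    return mode


def confirm_stable(record: BootRecord) -> None:
    """Clear the failure count once stable operation has been restored."""
    record.failed_attempts = 0
```

Counting each attempt as failed before initialization proceeds means a reset that occurs partway through boot is still remembered on the next attempt; only an explicit call to `confirm_stable` returns the system to nominal boot.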
1. Principle
1.1 Rationale
2. Examples and Discussion
2.1 Additional Guidance
3. Inputs
3.1 ARC
Note: Reset is commonly used as a means of autonomous recovery from serious software problems caused by errors or single event upsets. Reset is not effective unless the problematic software state is cleared during re-initialization. Ultimately, all software states must be presumed suspect and expendable, if prior re-initializations have failed to resolve a problem. A complete accounting of preserved state is essential, if effective measures are to be taken against it.
Note: A safe state is a state in which the spacecraft thermal condition and inertial orientation are stable, the spacecraft is commandable and is transmitting a downlink signal, and the state requires no immediate commanding to ensure spacecraft health and safety and preserves vital spacecraft resources. The safe state shall be power-positive.
3.2 GSFC
3.3 JPL
Note: The safing mode may be a single state or more than one state. The downlink signal need not be continuous, but must be predictable in its timing.
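As a rough illustration of the safe-state conditions named in these notes (stable thermal condition and inertial orientation, commandability, a downlink with predictable timing, and a power-positive state), the sketch below checks a status snapshot and computes the next downlink start for a fixed-period schedule. The `SpacecraftStatus` fields, the function names, and the fixed-period downlink model are assumptions made for this example only; a real project would define its own criteria per mission phase.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpacecraftStatus:
    """Hypothetical status snapshot used only for this sketch."""
    thermal_stable: bool
    attitude_stable: bool
    commandable: bool
    downlink_scheduled: bool  # downlink windows follow a predictable schedule
    net_power_w: float        # generated minus consumed power, in watts


def unmet_safe_state_conditions(status: SpacecraftStatus) -> List[str]:
    """Return the names of safe-state conditions that are not satisfied."""
    checks = [
        ("thermal_stable", status.thermal_stable),
        ("attitude_stable", status.attitude_stable),
        ("commandable", status.commandable),
        ("downlink_scheduled", status.downlink_scheduled),
        ("power_positive", status.net_power_w > 0.0),
    ]
    return [name for name, satisfied in checks if not satisfied]


def next_downlink_start(now_s: int, first_start_s: int, period_s: int) -> int:
    """Start time of the next downlink window at or after now_s.

    Assumes windows begin at first_start_s and repeat every period_s seconds,
    which is one simple way for a non-continuous downlink to stay predictable.
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    if now_s <= first_start_s:
        return first_start_s
    elapsed = now_s - first_start_s
    periods = -(-elapsed // period_s)  # ceiling division on integers
    return first_start_s + periods * period_s
```

Returning the list of unmet conditions, rather than a single boolean, keeps the result usable both as a pass/fail check (an empty list) and as diagnostic information.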
Rationale: The spacecraft must autonomously recover from a detected fault when the function(s) affected by the fault threaten spacecraft/instrument survival (e.g., functions necessary to maintain Safe mode). Ensure spacecraft survivability and viability by preserving vital spacecraft resources (e.g., thermal, power), while enabling ground interaction (e.g., command and downlink) for recovery operations. It is not enough merely to diagnose and isolate faults, or to restore lost functionality, if the resulting system state still threatens the rest of the mission (e.g., through stress, loss of consumables, or unresponsiveness to operator control).
Note: A missed tracking pass should not be reason to declare a spacecraft emergency, thus requiring rescheduling of tracking resources.
Note: 14 days is a typical duration based on the interval between ground contacts, but can be project and mission phase dependent.
Rationale: Transition to safing may be due to an operational mistake, and the system should still be single fault tolerant while awaiting ground recovery.
Note: Autonomous completion implies restoring the functionality needed to complete the mission-critical event. See 4.9.1.2 and 4.9.1.3 in the JPL Rules on Protection for Credible Single Faults and Protection for Multiple Faults, respectively.
Rationale: For certain mission critical events, ground response may not be possible and the autonomous fault protection design must ensure completion in the event of a single fault.
Note: Elements to consider when establishing state include inertial, temporal, device capability or configuration, file allocation tables, and boot code in RAM.
Note: Reset is commonly used as a means of autonomous recovery from serious software problems caused by errors or single event upsets. Reset is not effective unless the problematic software state is cleared during re-initialization. Ultimately, all software states must be presumed suspect and expendable, if prior re-initializations have failed to resolve a problem. A complete accounting of preserved state is essential, if effective measures are to be taken against it.
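One way to keep the "complete accounting of preserved state" explicit is a single inventory that records, for each preserved item, the deepest restart level at which it is still reused. The sketch below is illustrative only: the item names follow the elements listed in the notes, but the level assigned to each item, the integer level scheme (0 for a nominal restart, higher numbers for more degraded restarts), and the function names are assumptions, not values from any mission.

```python
from typing import Dict, List

# Item name -> highest restart level at which the item is still reused.
# The level assignments are hypothetical placeholders.
PRESERVED_STATE: Dict[str, int] = {
    "inertial_reference": 1,
    "spacecraft_time": 1,
    "device_configuration": 0,
    "file_allocation_tables": 0,
    "boot_code_in_ram": 0,
}


def state_to_reuse(restart_level: int,
                   inventory: Dict[str, int] = PRESERVED_STATE) -> List[str]:
    """Preserved items still trusted at the given restart level."""
    return sorted(name for name, max_level in inventory.items()
                  if restart_level <= max_level)


def state_to_clear(restart_level: int,
                   inventory: Dict[str, int] = PRESERVED_STATE) -> List[str]:
    """Preserved items that must be discarded and rebuilt at this level."""
    return sorted(name for name, max_level in inventory.items()
                  if restart_level > max_level)
```

Because every item appears in exactly one of the two lists at any level, adding a new piece of preserved state to the inventory forces a decision about when it stops being trusted. At a level above every entry, nothing from before the reset is reused.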
Note: This would include the ability to boot without resources that are of higher risk, or are not strictly required for safing. For example, some missions have included a separate flight software version that was capable of minimal operations without the file system.
3.4 MSFC
Rationale: For certain mission critical events, ground response may not be possible and the autonomous fault protection design must ensure completion in the event of a single fault.
Rationale: Diagnostic code is to be designed and incorporated into the software early, and be accessible through flight interfaces, so that problem resolution can be done rapidly and easily at element and flight system level in development and during flight operations. Mission critical event data and visibility of mission-critical errors should be available via real time telemetry for diagnostic use on the ground or during testing.
4. Resources
4.1 References
4.2 Additional Guidance
Related Links
4.3 Center Process Asset Libraries
SPAN contains links to Center managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki 197
SPAN Links
5. Lessons Learned
5.1 NASA Lessons Learned
9.10 Initialization - Safe Mode