9.01 Software Design Principles
Upon startup, flight systems need to autonomously enter a state that requires no immediate ground intervention to ensure their health and safety, and that preserves vital system resources, even in the presence of faults.
Flight software initialization spans three typical scenarios: nominal, multiple restart, and minimal boot. The minimal boot scenario results in a safe state: a stable, commandable state in which downlinks are possible and which maximizes preservation of system resources. Multiple restart is a type of fault response in which the software attempts a start after a prior failed attempt. It requires the software system to preserve knowledge of the prior failed attempt so that it can invoke degraded-performance restart modes, all the way down to minimal boot. Nominal boot is a safe state where certain capabilities are inactive by design. The definition of safe state depends on mission phase.
Initialization-Safe Mode may require incorporation of the following software design elements:
None
Lessons that appear in the NASA LLIS 439 or Center Lessons Learned Databases.
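The multiple-restart behavior described above can be sketched as a small boot-mode selector. This is only an illustrative sketch: the names `BootMode`, `BootRecord`, `select_boot_mode`, and `boot`, the single intermediate `DEGRADED` mode, and the thresholds are all assumptions, not part of any flight software baseline. The one idea it demonstrates is that the count of failed attempts must survive a reset, so each restart can step further down toward minimal boot.

```python
from enum import IntEnum


class BootMode(IntEnum):
    NOMINAL = 0
    DEGRADED = 1  # hypothetical intermediate restart mode
    MINIMAL = 2   # minimal boot: stable, commandable, downlink possible


class BootRecord:
    """Stand-in for storage that survives a reset (assumed, e.g. non-volatile memory)."""

    def __init__(self) -> None:
        self.failed_attempts = 0


def select_boot_mode(failed_attempts: int) -> BootMode:
    """Map the preserved count of prior failed starts to a boot mode."""
    if failed_attempts <= 0:
        return BootMode.NOMINAL
    if failed_attempts == 1:
        return BootMode.DEGRADED
    return BootMode.MINIMAL


def boot(record: BootRecord, start) -> BootMode:
    """Attempt one start; `start(mode)` is a callable that returns True on success."""
    mode = select_boot_mode(record.failed_attempts)
    # Record the attempt as failed before starting, so a crash or reset
    # during startup is still remembered on the next boot.
    record.failed_attempts += 1
    if start(mode):
        # Clear the counter only after a confirmed good start.
        record.failed_attempts = 0
    return mode
```

Incrementing the counter before the start attempt, rather than after a detected failure, is a deliberate choice in this sketch: a start that hangs or resets the processor never gets the chance to report its own failure.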
1. Principle
1.1 Rationale
2. Examples and Discussion
3. Inputs
3.1 ARC
Note: Reset is commonly used as a means of autonomous recovery from serious software problems caused by errors or single event upsets. Reset is not effective unless the problematic software state is cleared during re-initialization. Ultimately, all software states must be presumed suspect and expendable, if prior re-initializations have failed to resolve a problem. A complete accounting of preserved state is essential, if effective measures are to be taken against it.
Note: A safe state is a state in which the spacecraft thermal condition and inertial orientation are stable, the spacecraft is commandable and is transmitting a downlink signal, and that requires no immediate commanding to ensure spacecraft health and safety and preserves vital spacecraft resources. The safe state shall be power-positive.
3.2 GSFC
3.3 JPL
Note: The safing mode may be a single state or more than one state. The downlink signal need not be continuous, but must be predictable in its timing.
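One way to make a non-continuous downlink predictable is a fixed periodic schedule that both the spacecraft and the ground can compute from the same few parameters. The sketch below assumes such a scheme. The function name `next_downlink_window` and the parameters `epoch_s`, `period_s`, and `duration_s` are hypothetical, and a real mission may define its schedule quite differently.

```python
from typing import Tuple


def next_downlink_window(now_s: float, epoch_s: float,
                         period_s: float, duration_s: float) -> Tuple[float, float]:
    """Return (start, end) of the transmit window that is open now or opens next.

    Windows are assumed to open at epoch_s, epoch_s + period_s, and so on,
    each lasting duration_s seconds.
    """
    if period_s <= 0 or not 0 < duration_s <= period_s:
        raise ValueError("need period_s > 0 and 0 < duration_s <= period_s")
    if now_s < epoch_s:
        return (epoch_s, epoch_s + duration_s)
    # Index of the period that contains now_s.
    index = int((now_s - epoch_s) // period_s)
    start = epoch_s + index * period_s
    if now_s >= start + duration_s:
        # The window in this period has already closed; use the next one.
        start += period_s
    return (start, start + duration_s)
```

Because the result depends only on the current time and three constants, ground operators can predict the same windows without any telemetry from the spacecraft.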
Rationale: The spacecraft must autonomously recover from a detected fault when the function(s) affected by the fault threaten spacecraft/instrument survival (e.g., functions necessary to maintain Safe mode). Ensure spacecraft survivability and viability by preserving vital spacecraft resources (e.g., thermal, power), while enabling ground interaction (e.g., command and downlink) for recovery operations. It is not enough merely to diagnose and isolate faults, or to restore lost functionality, if the resulting system state still threatens the rest of the mission (e.g., through stress, loss of consumables, or unresponsiveness to operator control).
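The point that recovery is incomplete while the resulting state still threatens the mission can be sketched as a post-recovery check against the safe-state conditions named in this section (stable thermal condition and orientation, commandable, downlink available, power-positive). The `SpacecraftStatus` fields and the function `safe_state_violations` are hypothetical names for illustration. Treating power-positive as instantaneous generation exceeding consumption is a simplifying assumption; a real design would more likely use an averaged energy balance.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpacecraftStatus:
    thermally_stable: bool
    attitude_stable: bool
    commandable: bool
    downlink_available: bool
    power_generated_w: float
    power_consumed_w: float


def safe_state_violations(status: SpacecraftStatus) -> List[str]:
    """Return the safe-state conditions that are not met (empty list if all are met)."""
    violations = []
    if not status.thermally_stable:
        violations.append("thermal condition not stable")
    if not status.attitude_stable:
        violations.append("attitude not stable")
    if not status.commandable:
        violations.append("not commandable")
    if not status.downlink_available:
        violations.append("no downlink")
    # Simplified power-positive test (assumption): generation must exceed load.
    if status.power_generated_w <= status.power_consumed_w:
        violations.append("not power-positive")
    return violations
```

A fault response could run such a check after restoring functionality and continue escalating while the list is non-empty, instead of declaring success as soon as the original fault is isolated.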
Note: A missed tracking pass should not be reason to declare a spacecraft emergency, thus requiring rescheduling of tracking resources.
Note: 14 days is a typical duration based on the interval between ground contacts, but can be project and mission phase dependent.
Rationale: Transition to safing may be due to an operational mistake, and the system should still be single fault tolerant while awaiting ground recovery.
Note: Autonomous completion implies restoring the functionality needed to complete the mission-critical event. See 4.9.1.2 and 4.9.1.3 in the JPL Rules on Protection for Credible Single Faults and Protection for Multiple Faults, respectively.
Rationale: For certain mission critical events, ground response may not be possible and the autonomous fault protection design must ensure completion in the event of a single fault.
Note: Elements to consider when establishing state include inertial, temporal, device capability or configuration, file allocation tables, and boot code in RAM.
Note: Reset is commonly used as a means of autonomous recovery from serious software problems caused by errors or single event upsets. Reset is not effective unless the problematic software state is cleared during re-initialization. Ultimately, all software states must be presumed suspect and expendable, if prior re-initializations have failed to resolve a problem. A complete accounting of preserved state is essential, if effective measures are to be taken against it.
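The "complete accounting of preserved state" can be sketched as a registry in which every item that survives a reset is listed together with a way to clear it. This is a minimal sketch under assumptions: `PreservedStateRegistry`, `reinitialize`, and the rule of clearing everything after a single failed reset are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List


class PreservedStateRegistry:
    """Inventory of every item that survives a reset, each with a way to clear it."""

    def __init__(self) -> None:
        self._clearers: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, clear: Callable[[], object]) -> None:
        if name in self._clearers:
            raise ValueError(f"preserved state {name!r} registered twice")
        self._clearers[name] = clear

    def names(self) -> List[str]:
        """The full accounting: names of all registered preserved state."""
        return sorted(self._clearers)

    def clear_all(self) -> None:
        for clear in self._clearers.values():
            clear()


def reinitialize(registry: PreservedStateRegistry, prior_reset_failed: bool) -> None:
    """If an earlier reset did not resolve the problem, treat all preserved
    state as suspect and discard it before continuing."""
    if prior_reset_failed:
        registry.clear_all()
```

For example, with a dictionary standing in for storage that survives reset:

```python
store = {"attitude_estimate": (0.1, 0.2), "file_table": ["a.dat"]}
registry = PreservedStateRegistry()
registry.register("attitude_estimate", lambda: store.pop("attitude_estimate", None))
registry.register("file_table", lambda: store.pop("file_table", None))
reinitialize(registry, prior_reset_failed=True)  # store is now empty
```

State that is preserved but never registered is exactly what such an accounting is meant to expose, since no re-initialization would ever clear it.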
Note: This would include the ability to boot without resources that are of higher risk, or are not strictly required for safing. For example, some missions have included a separate flight software version that was capable of minimal operations without the file system.
3.4 MSFC
Rationale: For certain mission critical events, ground response may not be possible and the autonomous fault protection design must ensure completion in the event of a single fault.
Rationale: Diagnostic code is to be designed and incorporated into the software early, and be accessible through flight interfaces, so that problem resolution can be done rapidly and easily at the element and flight system levels, both in development and during flight operations. Mission-critical event data and visibility of mission-critical errors should be available via real-time telemetry for diagnostic use on the ground or during testing.
4. Resources
4.1 References
5. Lessons Learned
5.1 NASA Lessons Learned
9.10 Initialization - Safe Mode
Web Resources