9.01 Software Design Principles
The ability to continue operating in the presence of invalid data and to react appropriately can prevent responses that lead to hazardous conditions.
Invalid data can result from noisy signals, environmental conditions outside the expected range, malfunctions, software design errors that result in faults (e.g., function calls outside of the valid range), reuse of software in a system with different interface specifications, and other unforeseen situations.
Although the optimum solution is typically to design the software to prevent acceptance of invalid input data, robust software often must handle invalid input data without total loss of functionality (defensive programming). An analysis must be performed to determine how the software should respond to erroneous inputs. In some cases, the best response is simply to ignore the erroneous input (e.g., a non-critical temperature sensor). In other cases (e.g., navigation, as mentioned in the documents listed below), the best response is for the software to estimate the value of the missing or erroneous inputs. In all cases, an analysis must also be performed to determine how long the software should continue to operate without valid data from each component, in light of safety concerns.
One specific case is data that is received but is not valid in the current state of the control system. A common example is a command that is received when it is not valid in the current state. The software must validate not only the formatting of a command but also the state of the system when the command is received. If possible, command validity for the current state of the target system is checked by both the sender (e.g., Ground Station) and the receiver. This double check forces the sender to check the state of the target and the receiver to protect itself against invalid commands.
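The receiver-side check described above (formatting first, then validity in the current state) might be sketched as follows. This is a minimal illustration only: the modes, opcodes, argument ranges, and the `validate_command` function are hypothetical and are not taken from any flight system or from the documents referenced here.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple


class Mode(Enum):
    # Hypothetical system modes, for illustration only.
    SAFE = auto()
    STANDBY = auto()
    OPERATIONAL = auto()


# Hypothetical command table: opcode -> (argument range, modes where it is valid).
COMMAND_TABLE = {
    "HEATER_SETPOINT": ((-40.0, 60.0), {Mode.STANDBY, Mode.OPERATIONAL}),
    "FIRE_THRUSTER": ((0.0, 5.0), {Mode.OPERATIONAL}),
}


@dataclass(frozen=True)
class Command:
    opcode: str
    argument: float


def validate_command(cmd: Command, mode: Mode) -> Tuple[bool, str]:
    """Return (accepted, reason): format and range checks, then the state check."""
    entry = COMMAND_TABLE.get(cmd.opcode)
    if entry is None:
        return False, "unknown opcode"
    (low, high), valid_modes = entry
    # A NaN argument fails this comparison, so it is rejected as out of range.
    if not (low <= cmd.argument <= high):
        return False, "argument out of range"
    if mode not in valid_modes:
        return False, f"not valid in mode {mode.name}"
    return True, "accepted"
```

Returning a reason string alongside the decision lets the caller report the rejection to operators rather than discarding the command silently. A sender such as a ground station could run the same table against its own estimate of the target state to provide the second half of the double check.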
In the same way that a command can be received that is not valid in the current state, data can be received that is within the range of valid outputs from a component but is not valid in the current state of the system. For instance, the software might receive a message from a component that includes a set of sensor data before power has been applied to the subsystem.
Methods that have been used to mitigate the effects of invalid data include smoothing algorithms for noisy data, use of a previous valid value, use of a redundant data item, and discarding the invalid data. For discrete data, transient discretes can be suppressed by requiring a number of consecutive discretes to match before taking action. Reporting the receipt of invalid data and any action taken in response can alert operators to a potential problem.
Software should accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required.
The NASA Lessons Learned database (reference 439) contains the following lessons learned related to invalid data handling:
1. Principle
1.1 Rationale
2. Examples and Discussion
3. Inputs
3.1 ARC
3.2 GSFC
3.3 JPL
a. Flight software shall be designed to detect and respond safely to corrupted commands, data, or loads, and memory faults allocated to the software, such as stuck bits or single event effects (SEE).
Note: For example, flight computer designs have included Error Detection And Correction (EDAC) logic on EEPROMs, and the load process has been designed to detect and respond to failure if the EDAC detects an uncorrectable bit error. Software designs have included check sum logic and periodic verification of memory to detect command, data, or load, and memory faults.
b. Flight software shall be designed to detect and respond safely to commands, data, or loads that are incorrectly formatted, including invalid values or out-of-range parameters.
c. Flight software shall be designed to detect and respond safely to commands, data, or loads that are invalid in the current context.
Note: For example, a command handler should check whether a received command is appropriate for the current system mode, and a software module should check whether a command is appropriate for its local state.
3.4 MSFC
Rationale: Inputs to algorithms that fall outside the expected range are indicators of potential fault conditions, and the software must continue to function until the fault condition is detected and resolved.
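Two of the mitigation methods named in the discussion (use of a previous valid value, and requiring consecutive matching discretes) can be sketched as below. This is an illustrative sketch under stated assumptions: the class names, the status strings, and the idea of counting invalid cycles as the time limit are hypothetical choices, and the actual limit would come from the safety analysis the text calls for.

```python
class HeldSensorInput:
    """Range-checks samples and substitutes the last valid value for a limited time."""

    def __init__(self, low, high, max_invalid_cycles):
        self.low = low
        self.high = high
        # Assumed limit on how long a held value may be used (from analysis).
        self.max_invalid_cycles = max_invalid_cycles
        self.last_valid = None
        self.invalid_count = 0

    def update(self, sample):
        """Return (value, status); status is 'valid', 'held', or 'failed'."""
        if self.low <= sample <= self.high:
            self.last_valid = sample
            self.invalid_count = 0
            return sample, "valid"
        self.invalid_count += 1
        if self.last_valid is not None and self.invalid_count <= self.max_invalid_cycles:
            return self.last_valid, "held"
        return None, "failed"


class DebouncedDiscrete:
    """Changes its reported state only after n consecutive matching samples."""

    def __init__(self, initial, n):
        self.state = initial
        self.n = n
        self.candidate = initial
        self.count = 0

    def update(self, sample):
        if sample == self.state:
            # Sample agrees with the reported state: discard any pending change.
            self.candidate = self.state
            self.count = 0
        elif sample == self.candidate:
            self.count += 1
        else:
            self.candidate = sample
            self.count = 1
        if self.candidate != self.state and self.count >= self.n:
            self.state = self.candidate
            self.count = 0
        return self.state
```

The status returned by `HeldSensorInput.update` is what a caller would log or telemeter, so that operators see both the invalid input and the substitution made for it. Once the invalid-cycle limit is exceeded the input is reported as failed rather than held indefinitely.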
Rationale: The software design should ensure that only valid inputs and outputs are incorporated into the control system state. An integrity check ensures that a message is well-formed and not corrupted. Potential faults and the actions taken in response must be defined so that error handling does not set off a chain reaction leading to more serious fault conditions, e.g., issuing questionable commands to actuators in response to a fault condition, which exacerbates the problem.
Note: For example, flight computer designs have included Error Detection And Correction (EDAC) logic on EEPROMs, and the load process has been designed to detect and respond to failure if the EDAC detects an uncorrectable bit error. Software designs have included check sum logic and periodic verification of memory to detect command, data, or load, and memory faults.
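The checksum logic and periodic memory verification mentioned in the note could look roughly like the sketch below. It assumes CRC-32 (via Python's standard `zlib.crc32`) as the checksum, which the source does not specify, and the `ProtectedRegion` class and `scrub` function are hypothetical names introduced only for illustration.

```python
import zlib


class ProtectedRegion:
    """A memory image stored together with a CRC-32 recorded at load time."""

    def __init__(self, name, data):
        self.name = name
        self.data = bytearray(data)
        self.expected_crc = zlib.crc32(self.data)

    def is_intact(self):
        # Recompute the checksum and compare it with the value recorded at load.
        return zlib.crc32(self.data) == self.expected_crc


def scrub(regions):
    """Run one verification pass; return the names of regions that fail the check."""
    return [region.name for region in regions if not region.is_intact()]
```

Calling `scrub` on a fixed schedule gives the periodic verification; a non-empty result is the trigger for whatever safe response the fault analysis has defined, such as reloading the region from a protected copy.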
Rationale: Inputs to the software that fall outside the expected range are indicators of potential fault conditions, and the software must continue to function until the fault condition is detected and resolved. Incorrectly formatted inputs should be detected and handled by the software as part of the fault detection, isolation, and recovery functionality.
4. Resources
4.1 References
5. Lessons Learned
5.1 NASA Lessons Learned
9.11 Invalid Data Handling
Web Resources