Include a robust and well-thought-out response to resource oversubscription situations in the software design.
1.1 Rationale
Resource oversubscription is a severe fault condition that can lead to unpredictable behavior of the software system and render it inoperable. Timely detection of oversubscription and a planned response to it can preserve critical system capabilities.
2. Examples and Discussion
Many resources can become oversubscribed during system operation. Examples include buffer overflows, rate group time boundary overruns, and excessive inputs or interrupts. The usual response is to reduce the demand placed on the system by non-essential items, especially if they are the cause of the oversubscription. The system can also generate error messages, taking care not to overload itself further, and attempt to throttle the demand placed on it by external entities. Additionally, the system may be reconfigured to reduce or eliminate non-essential functionality, or to use less demanding (even if less accurate) algorithms. In severe cases, interrupts may be locked out, and processes or even the computer may have to be shut down.
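As an illustration of several of these responses working together, the following is a minimal sketch of a bounded input buffer. It is not taken from any flight system; the class name `BoundedBuffer`, the `throttle_fraction` parameter, and the warning text are all hypothetical. It sheds non-essential items on overflow, counts drops instead of logging each one (so the error reporting does not add to the overload), and exposes a flag that could be used to ask external producers to slow down.

```python
class BoundedBuffer:
    """Fixed-capacity FIFO that sheds non-essential items when full (illustrative only)."""

    def __init__(self, capacity, throttle_fraction=0.75):
        self.capacity = capacity
        # Fill level at which producers should be asked to slow down (assumed value).
        self.throttle_at = int(capacity * throttle_fraction)
        self.items = []       # list of (essential, item) pairs, oldest first
        self.dropped = 0      # drops are counted rather than logged one by one
        self.warnings = []    # at most one warning per overload episode
        self._in_overload = False

    def should_throttle(self):
        """True when external producers should be asked to reduce their demand."""
        return len(self.items) >= self.throttle_at

    def put(self, item, essential):
        """Enqueue an item; return False if the new item had to be discarded."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            self._warn_once()
            if not essential:
                return False  # shed the incoming non-essential item
            # Make room by evicting the oldest non-essential entry, if any.
            victim = next(
                (i for i, (ess, _) in enumerate(self.items) if not ess), None
            )
            if victim is None:
                return False  # only essential data queued; a real design would escalate
            del self.items[victim]
        self.items.append((essential, item))
        return True

    def get(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        if not self.items:
            return None
        _, item = self.items.pop(0)
        if not self.should_throttle():
            self._in_overload = False  # overload episode is over; re-arm the warning
        return item

    def _warn_once(self):
        if not self._in_overload:
            self._in_overload = True
            self.warnings.append("buffer overflow: shedding non-essential items")
```

In this sketch a single warning is recorded per overload episode, and the drop counter preserves the information that would otherwise be lost by suppressing repeated messages.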
An example of a graceful response to an overload is the Apollo Lunar Module (LM) onboard software, which correctly handled an unplanned scenario in which the LM's ascent stage rendezvous radar was incorrectly switched on, overloading the Apollo Guidance Computer with input data. In this case, task prioritization (the Guidance, Navigation and Control (GNC) functions had higher priority than the radar) prevented the critical GNC functions from being starved of processor time, saving the mission and crew.
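The idea of protecting critical functions through prioritization can be sketched as follows. This is a deliberately simplified illustration, not a reconstruction of the Apollo Guidance Computer software; the function `run_cycle`, the task names, and the costs in milliseconds are hypothetical. Tasks are run in priority order within a fixed cycle budget, and once a task no longer fits, it and all lower-priority tasks are deferred.

```python
def run_cycle(tasks, budget_ms):
    """Run tasks in priority order within one cycle's time budget.

    tasks: list of (priority, name, cost_ms); a lower priority number is more critical.
    Returns (executed, deferred) lists of task names.
    """
    executed, deferred = [], []
    remaining = budget_ms
    for _, name, cost_ms in sorted(tasks, key=lambda t: t[0]):
        # Strict shedding: after the first task that does not fit,
        # every lower-priority task is deferred as well.
        if not deferred and cost_ms <= remaining:
            remaining -= cost_ms
            executed.append(name)
        else:
            deferred.append(name)
    return executed, deferred


# Hypothetical workload: the radar input load exceeds what the cycle can absorb.
tasks = [
    (2, "radar_input", 30),
    (0, "guidance", 40),
    (1, "display", 20),
    (3, "housekeeping", 5),
]
executed, deferred = run_cycle(tasks, budget_ms=70)
```

With this workload the critical guidance task always runs first, and the overload falls on the lower-priority work.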
It is recommended that the system design include monitoring of resource usage, with thresholds set to trigger carefully designed escalation responses, and that the detection method and response protocol be explicitly documented in the requirements and verified.
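A threshold-based monitor with escalating responses might look like the sketch below. The threshold fractions, the response names, and the persistence filter are assumptions chosen for illustration; actual values and responses would come from the project's requirements. A response level is reported only after it has been reached on several consecutive samples, so a single transient spike does not trigger a drastic action.

```python
from collections import deque
from enum import IntEnum


class Response(IntEnum):
    NOMINAL = 0
    WARN = 1
    SHED_LOAD = 2
    SAFE_MODE = 3


# (fraction of capacity, response), checked from most to least severe.
# These values are hypothetical placeholders.
THRESHOLDS = [
    (0.95, Response.SAFE_MODE),
    (0.85, Response.SHED_LOAD),
    (0.70, Response.WARN),
]


def classify(usage, capacity):
    """Map a single usage sample to the response level its threshold calls for."""
    fraction = usage / capacity
    for limit, response in THRESHOLDS:
        if fraction >= limit:
            return response
    return Response.NOMINAL


class ResourceMonitor:
    """Reports a response level only once it has persisted for N consecutive samples."""

    def __init__(self, capacity, persistence=3):
        self.capacity = capacity
        self.recent = deque(maxlen=persistence)

    def sample(self, usage):
        self.recent.append(classify(usage, self.capacity))
        if len(self.recent) < self.recent.maxlen:
            return Response.NOMINAL  # not enough history yet
        # The lowest level seen in the window is the one sustained by every sample.
        return min(self.recent)
```

Because the thresholds and responses live in one table, they can be traced directly to documented requirements and exercised in verification tests.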
This design principle is closely related to the 9.12 Resource Margins principle. Implementing run-time measurements enables margins to be monitored during development and protects against oversubscription once the software has been deployed operationally.
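One simple form of run-time measurement is a high-water-mark tracker, sketched below under the assumption that margin is defined as the unused fraction of capacity at peak usage. The class name `MarginTracker` and that definition are illustrative; a project's own margin definition takes precedence.

```python
class MarginTracker:
    """Tracks peak usage of a resource and reports the remaining margin."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.peak = 0

    def record(self, usage):
        """Record one usage sample, keeping only the high-water mark."""
        self.peak = max(self.peak, usage)

    def margin(self):
        """Unused fraction of capacity at peak; negative means oversubscribed."""
        return (self.capacity - self.peak) / self.capacity
```

The same measurement can feed margin reports during development and an oversubscription monitor in operations.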
3. Inputs
Excerpts from two documents are included below but no information on the documents that the excerpts were taken from is available. These documents should be properly referenced.
3.1 ARC
3.7.2.4.5 Response to Resource Over-Subscription - The software design should accommodate unintended situations where resource usage is oversubscribed. The action to be taken in such situations should be specified as part of the requirements on the design.
Note: Examples of these situations include buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. There are several common methods for tolerating these situations, most of which relate to reducing demand from non-essential items, especially if they are the source of over subscription:
a. Generate warning messages when appropriate.
b. Instruct external systems to reduce their demands.
c. Lock out interrupts.
d. Change operational behavior to handle the load. For example, the software may use faster but less accurate algorithms to keep up with the load.
e. Reduce the functionality of the software, or even halt or suspend a process or shutdown a computer.
3.2 GSFC
None
3.3 JPL
4.11.4.5 Response to resource over-subscription - The software design shall contain a robust response to situations where computer resources are oversubscribed. The action to be taken in such situations shall be specified as part of the requirements on the design.
Note: Examples of these situations include buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. There are several common methods for tolerating these situations, most of which relate to reducing demand from non-essential items, especially if they are the source of over subscription:
a. Generate warning messages when appropriate.
b. Instruct external systems to reduce their demands.
c. Lock out interrupts.
d. Change operational behavior to handle the load. For example, the software may use faster but less accurate algorithms to keep up with the load.
e. Reduce the functionality of the software, or even halt or suspend a process or shutdown a computer.
3.4 MSFC
None
4. Resources
4.1 References
SWEREFs called out in the text: 439, 675
SWEREFs NOT called out in text but listed as germane: NONE
5. Lessons Learned
5.1 NASA Lessons Learned
The NASA Lessons Learned database (SWEREF-439) contains the following lessons learned related to resource oversubscription:
Science Data Downlink Process Must Address Constraints Stemming from Fixed Deep Space Network (DSN) Assets. Lesson Learned 1843 (SWEREF-675):
"Given their minimal ability to mitigate DSN resource limitations, flight projects must consider mission design and mission operations improvements that may help to achieve Level 1 requirements, such as the 9 measures effectively employed by the Spitzer project."