bannerc

9.13 Resource Oversubscription

1. Principle

Include a robust and well thought out response to resource oversubscription situations in the software design.

1.1 Rationale

Resource oversubscription is a severe fault condition that can lead to unpredictable behavior of the software system and render it inoperable. Timely detection and planned response to oversubscriptions can preserve critical system capabilities.









2. Examples and Discussion

Many resources can become oversubscribed during system operation. Examples include: buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. The usual response consists of reducing the demand presented on the system by non-essential items, especially if they are the cause of the oversubscription. The system can also generate error messages, being careful not to overload the system further, and attempt to throttle the demand presented to it by external entities. In severe cases interrupts may be locked out. Additionally, the system may be reconfigured to reduce/eliminate non-essential functionality, or to allow use of less-demanding (even if less-accurate) algorithms. In severe cases processes and even the computer may have to be shut down.

An example of a graceful response to an overload is the Apollo Lunar Module onboard software, which correctly handled an unplanned scenario in which the LM's ascent stage rendezvous radar was incorrectly switched on, overloading the Apollo Guidance Computer with input data. In this case the use of task prioritization (the Guidance, Navigation and Control (GNC) had higher priority than the radar), prevented the critical GNC functions from being starved of processor time, saving the mission and crew.

It is recommended that the system design include the monitoring of resource usage with appropriate thresholds set to trigger carefully designed escalation responses, and that the method for detection and response protocol be explicitly documented and verified in the requirements.

This design principle is closely related to the 9.12 Resource Margins principle. Implementing run time measurements enables monitoring of margins during development as well as protecting against oversubscription once the software has been deployed operationally.

3. Inputs

3.1 ARC

  • 3.7.2.4.5 Response to Resource Over-Subscription - The software design should accommodate unintended situations where resource usage is oversubscribed. The action to be taken in such situations should be specified as part of the requirements on the design.


Note: Examples of these situations include buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. There are several common methods for tolerating these situations, most of which relate to reducing demand from non-essential items, especially if they are the source of over subscription:

     a. Generate warning messages when appropriate.
     b. Instruct external systems to reduce their demands.
     c. Lock out interrupts.
     d. Change operational behavior to handle the load. For example, the software may use faster but less accurate algorithms to keep up with the load.
     e. Reduce the functionality of the software, or even halt or suspend a process or shutdown a computer.

3.2 GSFC

None

3.3 JPL

  • 4.11.4.5 Response to resource over-subscription - The software design shall contain a robust response to situations where computer resources are oversubscribed. The action to be taken in such situations shall be specified as part of the requirements on the design.


Note: Examples of these situations include buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. There are several common methods for tolerating these situations, most of which relate to reducing demand from non-essential items, especially if they are the source of over subscription:

     a. Generate warning messages when appropriate.
     b. Instruct external systems to reduce their demands.
     c. Lock out interrupts.
     d. Change operational behavior to handle the load. For example, the software may use faster but less accurate algorithms to keep up with the load.
     e. Reduce the functionality of the software, or even halt or suspend a process or shutdown a computer.

3.4 MSFC

None

4. Resources

4.1 References



5. Lessons Learned

5.1 NASA Lessons Learned

The NASA Lesson Learned  439 database contains the following lessons learned related to resource oversubscription:

  • Science Data Downlink Process Must Address Constraints Stemming from Fixed Deep Space Network (DSN) Assets.  Lesson Learned 1843: 675  "Given their minimal ability to mitigate DSN resource limitations, flight projects must consider mission design and mission operations improvements that may help to achieve Level 1 requirements, such as the 9 measures effectively employed by the Spitzer project."

  • No labels