3. Recommended Contents
3.1 Architecture Terminology
Where appropriate, architects are encouraged to use terminology from international standard ISO/IEC/IEEE 42010, Systems and software engineering — Architecture description, for the obvious reason that the terminology is well-defined in a widely used international standard. Important terms include system, environment, stakeholder, concern, view, viewpoint, and rationale. The figure below shows a partial concept map of how different terms relate to each other. For example, every system has one or more stakeholders, and every stakeholder has one or more concerns, and an architecture description selects viewpoints that are addressed to those concerns.
3.2 Mission Overview
As the preceding figure shows, a system is intended to fulfill a mission, and software is largely responsible for the behavior of the system (flight and ground systems). As such, it is helpful to readers to provide context for the architectural drivers that follow. Most readers won't be scientists, and external reviewers might come from outside of the aerospace industry, so it's not necessary to go deep into the science objectives. However, it's very helpful to highlight aspects of the mission that give rise to the driving requirements.
3.3 Context Diagram, Context Description
An architecture description is about a system, and that system always exists in a larger context. Clements et al describe this very well:
- "A top-level context diagram (TLCD) establishes the scope for the system whose architecture is being documented, defining the boundaries around the system that shows what's in and what's out. A TLCD shows how the system under consideration interacts with the outside world. Entities in that outside world may be humans, other computer systems, or physical objects, such as sensors or controlled devices. A TLCD identifies sources of data to be processed by the system, destinations of data produced by the systems, and other systems with which it must interact."
A context diagram helps readers avoid confusion about what is in scope and what is not. Also, a context description allows the architect to describe the internal boundaries between the system and the software, and discuss the system architecture as well as the software architecture, and the degree to which the system in which the software resides is a part of the architecture description.
3.4 Architectural Drivers
One of the most important aspects of architecture description is to identify the major architectural challenges. These challenges may appear in the form of functional requirements, quality attribute requirements (sometimes called non-functional requirements), critical resources, and constraints, and are identified as architectural drivers when they have a major influence on design. Explanation of architectural drivers is an opportunity to educate non-software stakeholders about software challenges.
When discussing requirements, it's useful to make a distinction between key requirements (important to the customer) and driving requirements (challenging to meet; will drive cost, schedule, or some other aspect of the system). Some requirements are both, but the distinction is important since the latter shape the architecture.
- EDL (Entry, Descent, and Landing) requires completely autonomous control and exact timing of pyrotechnic events.
- Space Shuttle primary avionics system had to allow astronauts to make most critical decisions, including failover to the backup flight system.
- Fault tolerance design almost always impacts software architecture.
3.5 Critical Resources and Margins
Flight system software often must deal with resources that are severely limited. Such critical resources may include power or energy, nonvolatile storage, bus data rate or latency, uplink or downlink rate, and processor memory or speed. A critical resource is often what makes a requirement "driving". It's important to identify critical resources for the benefit of stakeholders who might not otherwise understand the difficulties that software must face and the tradeoffs to be made.
Critical resources usually have associated margins, and development organizations usually have required margins at each phase to accommodate unforeseen growth. An architecture description shows, for each critical resource, the current best estimate of usage along with the required margin for the current phase.
Example Critical Resources:
- Non-volatile memory. The development team for Mars Exploration Rover (MER) had to do a lot of work to manage what was stored in a relatively small non-volatile memory.
- Power/energy. Lack of power and energy on MER required careful operational planning with overnight sleep periods and recharge periods.
3.6 Stakeholders and Concerns
Every system has many stakeholders. Among these are: customer, owner, user, operator, architect, systems engineer, designer, developer, tester, installer, maintainer, vendor, and subcontractor. A software architecture description identifies major stakeholders, lists their software-related concerns (sometimes called "care-abouts"), and addresses those concerns through appropriate views. Failure to identify all the concerns can lead to unplanned events, rework, and schedule delay.
Amount of risk due to new technology
Project Systems Engineer
Interactions between flight computer and payload
Want ability to run various tests starting from tester-defined starting states
Want telemetry data organized in ways that facilitate state determination
3.7 Quality Attribute Analysis
Quality attributes that are important in mission software often include availability, modifiability, performance, safety, security, testability, and usability. As Bass et al describe:
"Business considerations determine qualities that must be accommodated in a system's architecture. These qualities are over and above that of functionality ... Systems are frequently redesigned not because they are functionally deficient — the replacements are often functionally identical — but because they are difficult to maintain, port, or scale, or are to slow, or have been compromised by network hackers."
The quality attributes that have the greatest influence on one mission are not necessarily the same on other missions. An architecture description will say several things about quality attributes. For each quality attribute it will define what the attribute means and describe how the attribute will be measured, often in terms of a usage scenario. The attributes are often ranked in importance and difficulty, and the architecture description will explain how the architecture satisfactorily achieves each desired quality.
- Quality attribute: Availability/Failure Recovery.
- Scenario: "A user reports an issue with accessing mission data. The ground team determines that there has been a disk failure. The faulty disk is replaced and mission data restored from backup within 24 hours of the problem report."
- Quality attribute: Replicability.
- Description: When the Hubble Space Telescope upgraded a flight processor, the software had to be recoded in C, and its behavior had to replicate the previous software behavior — potentially with the same anomalies — so that operations would not be affected and not require new training.
3.8 Measures of Performance
NPR 7123.1A, NASA Systems Engineering Processes and Requirements, recommends that projects establish Measures of Performance (MOPs), a selected set of which is extended to Technical Performance Measures (TPMs). MOPs are quantitative measures of the system's fitness to satisfy stakeholder expectations. TPMs have the additional characteristic of monitoring performance by comparing a TPM's actual value against a time or event-based model of expected values (which are usually derived from histories of previous projects). The architecture description identifies MOPs and TPMs (if any) that relate to the architecture and identify those elements or attributes of the architecture that address these MOPs and TPMs.
3.9 Architectural Decisions and Rationale
Architecting is a process of understanding the problem, evaluating possible solutions, and making decisions about design. A good approach to developing a software architecture does not present it as a fait accompli, a fact. An architecture description will identify the big decisions and substantiate them. Rationale is hugely important to those who come aboard a project later, and is hugely important to future architects who may consider reusing an architecture.
- Architects for the Core Flight Executive (CFE) software framework decided to provide a publish/subscribe software bus and to clearly distinguish its application program interface (API) from possible implementations. It's important that an architecture description identifies and explains major decisions such as this, including limitations as well as benefits.
- One flight project designed a component-based software architecture in order to make software subsystems more testable and maintainable by virtue of architecturally-prescribed interfaces. The decision facilitated on-orbit FSW updates by reducing the amount of data to uplink.
3.10 Architectural Alternatives (Trade Studies)
While it's good to state the big architectural decisions, it's even better to describe what alternatives were considered. This may occur naturally in providing rationale, especially if a decision is the result of a careful trade study. Descriptions of alternatives don't have to be elaborate; often, readers simply want to know what plausible alternatives were considered and why they were found to be less suitable.
Sometimes an alternative is so appealing — but so radical — that a prototype must be built and demonstrated in order to get serious consideration from stakeholders. In those cases the architectural description typically describes the prototype and its results, as an objective comparison between the old way and new way. See 3.14 Prototypes.
- The choice between centralized and distributed processing often has wide-ranging effects on quality attributes, and is therefore often subject to a trade study.
- The Radiation Belt Storm Probe (RBSP) project conducted a flight software trade study to compare a software bus architecture (using CFE) to a more "traditional" architecture having tightly coupled inter-task communication.
- Some missions have conducted trade studies concerning the software-based portion of fault protection. Does fault protection go in the "main" processor, or does part of it reside in a separate, simpler processor (or even in a Field Programmable Gate Array (FPGA))?
3.11 Multiple Views
An architecture description must address the diverse concerns of its stakeholders. Certainly, an architecture description must address how the design satisfies functional requirements, but there are many other concerns such as cost, schedule, assembly, integration, verifiability, operability, and maintainability. Different concerns require different views of the architecture — views such as structure, behavior, deployment, and operation. The key is to create views that not only address stakeholder concerns but also clearly convey the ideas to the stakeholders. Some views can be well described in the diagrams of UML (Unified Modeling Language) , OMG (Object Management Group), SysML (Systems Modelling Language), and AADL (Architecture Analysis & Design Language), but architecture descriptions need not be restricted to such diagrams.
Examples of useful views:
- Run-time: Components with Data Flows.
- Compile-time: module structure (source code tree), layers.
- Fault containment regions.
- Bus traffic shown in terms of average and peak message rate and data volume.
- Deployment: diagrams to document the deployment of software components onto the hardware and/or OS of the target system. This would include an indication of processes utilized, threading, and partitions (if applicable). It would cover process and thread creation and scheduling to document how the software will be deployed on the target system and meet any timing, availability, or other system requirements.
See 3.14 Architecture Frameworks (Views and Viewpoints) for additional ideas.
3.12 Diagrams and Legends
The purpose of every diagram is to visually convey important information with a minimum amount of associated explanation. To that end, a diagram will contain a legend that explains what the boxes and lines and other symbols mean. When standard diagrams are used, such as UML or SysML diagrams, it is still helpful to include a notation summary (perhaps as a reference page) for readers and reviewers who are not as familiar with the notations. Also, when a diagram contains acronyms, it's helpful to include acronym definitions in the legend, even if they are repeated in the glossary.
3.13 Architecture Frameworks (Views and Viewpoints)
As noted earlier, an architecture description addresses numerous stakeholder concerns, and many architecture thought leaders have thought about how to organize their thoughts — and architecture descriptions — from different viewpoints. As a result, there are now many architecture frameworks for architects to draw upon, such as DoDAF (Department of Defense Architecture Framework), MoDAF (British Ministry of Defence Architecture Framework), TOGAF (The Open Group Architecture Forum), RASDS (Reference Architecture for Space Data Systems), Krutchen 4+1, RM-OPD (Reference Model of Open Distributed Processing), and Zachman ,. Our purpose in mentioning architecture frameworks is not to recommend adherence to a particular framework but simply to point to them as aids in developing views and viewpoints that clearly communicate to stakeholders. In reality, most of this document's recommendations can be mapped into existing viewpoints.
The Department of Defense Architecture Framework, DoDAF 2.0 , organizes architecture descriptions into eight viewpoints named: Project, Capability, Operational, Services, Systems, Standards, Data and Information, and All, as shown in the following figure.
Whenever an architecture contains an idea that's new, or at least novel within the community, it's natural for some stakeholders to be concerned about risk, whether technology risk, verification risk, schedule risk, or operational risk. An effective way to address such concerns is to build and demonstrate an executable prototype.
In the Mars Pathfinder mission one engineer advocated a new idea of prioritized telemetry packets, in contrast to conventional channelized telemetry. The idea was not well understood by some stakeholders and was resisted until that engineer built a simple working prototype. The prototype illustrated how the idea worked in a live demo, and allowed observers to see how it behaved in different scenarios. Suddenly the light bulb went on, and the idea was accepted enthusiastically. The moral of this story is that a prototype can be an essential part of an architecture description, in some cases. Also see 3.10 Architectural Alternatives (Trade Studies).
3.15 Heritage Analysis
Most software built for today's missions involve significant "heritage" or "legacy" from an earlier mission, whether in architecture, design, or code. Such inheritance can be a smart move for projects, but it is critical to understand the differences between the new mission and the earlier mission. The well-known story of Ariane 5 Flight 501 offers a cautionary tale. Ariane 5 reused code from Ariane 4 that contained a limitation on the forces it was designed to process, and the greater forces in Ariane 5 caused an arithmetic overflow that resulted in the loss of four spacecraft. The lesson here is that architects must carefully examine any heritage with respect to differences between prior usage and planned usage, and document that analysis in the architecture description. This can be especially challenging since designs are often shaped by unstated assumptions.
Some questions to address in a heritage analysis:
- If there is major re-use in flight software, to what extent is the avionics hardware being re-used?
- Are the same operations concepts being re-used?
- What is the organizational experience with re-use? Has it produced the claimed benefits in the past? Has re-use in the past required a re-use of people?
- What's different from what was done previously?
- Does the area of re-use make sense, given differences between the new mission and the old mission?
See SWE-027 for a broader discussion on heritage or reused software and additional items to consider.
3.16 Assumptions and Limitations
Architects recognize that the systems they are architecting might well provide heritage for a future project. As such, it is valuable to document any assumptions underlying an architecture and any inherent limitations. Admittedly, this can be difficult because many assumptions are unconsciously made based on mission specifics, resulting in hidden limitations of the system's suitability for future missions. In essence, this is the flip side of Heritage Analysis; if Ariane 4 had documented its force limitation, Ariane 5 Flight 501 might have avoided disaster.
3.17 Architectural Principles, Patterns, Invariants, and Rules
Although a system to be built may be large, its architecture can often be described compactly, and more readily understood, in terms of principles that shape the design and architectural patterns that are applied consistently. While it is extremely useful to document an architecture in terms of principles and patterns, there's no guarantee that downstream designers and developers will consistently adhere to them, meaning that the built system may not possess all the characteristics promised by the architect. As such, it's often important to provide some means for checking adherence to the principles and patterns.
3.18 Fault Management
Due to its complexity, fault management has often been a source of problems in the integration and testing phase of flight systems. Because of this fault management deserves to be given suitable attention in architecture descriptions. Fault management encompasses topics that often appear under a variety of names such as fault protection; failure modes and effects analysis (FMEA); fault detection, isolation, and recovery (FDIR); fault detection, diagnostics and response (FDDR ); fault detection, notification and response (FDNR), integrated vehicle health management (IVHM ); integrated systems health management (ISHM); caution & warning; and aborts. At a minimum, fault management is to be examined in terms of its interactions with the nominal control system, its behavior in the face of concurrent and/or nearly simultaneous faults and responses, and it's ability to support diagnosis, whether automated or human-in-the-loop. For more information, see the Fault Management Community of Practice on NASA Engineering Network (NEN).
Systems often have some easy-to-meet requirements, and often build upon mature heritage designs and software. For example, consider a mission that has modest uplink and downlink data rates relative to the proven capabilities of a heritage design that they are reusing and adapting. In that context, downlink data handling might be considered a "non-concern." It is reasonable, and even desirable, to treat these areas of non-concern lightly in the architecture description. However, good practice indicates that such areas first be identified as non-concerns, with good explanations as to why they are non-concerns. (See 3.15 Heritage Analysis.)
3.20 Glossary and Acronyms
Architecture descriptions will often be reviewed by people outside the project, or even by people outside NASA. These people won't necessarily understand all the numerous technical terms and acronyms in use, so a glossary and list of acronyms can be very helpful to readers.