futz fʌts v.i. To tinker experimentally; to change something just to see what it does.
Getting things done with today's computer systems requires a lot of futzing. This is both expensive and aggravating. The mission of the No-Futz Computing project is to learn how to build general-purpose computer systems where less tinkering is required.
In 1999, the OS researchers at HotOS identified "futz" as the most important problem facing the community. Progress since then (as of late 2003):
- State of the Art
- System State Space
- Current Approaches
- Tackling the State Problem
- Building Lower-Futz Systems
- Computer-Assisted State Management
- Evaluation and Measurement
...and read our paper from HotOS 2001 that discusses the research agenda for no-futz computing.
Futzing is, in general, how systems must be handled today. When you install a system, or it stops working, you have to update its configuration. Even though you may have a fairly good idea of what you want the system to do or how you want it to behave, there usually isn't any way to express that directly. Instead, you must change the configuration experimentally until the observed behavior appears to match your expectations.
Usually, there isn't any easy way to predict what change you need to make. Sometimes, the thing you need to change may be hidden off in a corner someplace. In many cases, the state variables that you might need to adjust are deliberately hidden from you: the system designer, in a futile attempt to make the system "easy" to use, has buried switches and knobs that really need to be accessible.
Experience with a particular system can streamline this process. By the time you've configured 3-4 PCMCIA network cards in Red Hat Linux, for instance, you have probably learned what the trick is, and doing another one will usually be straightforward. Unfortunately, most users only have one system to fiddle with (their own) and so they only ever do these things once. If they have to fiddle with the same thing again, it's probably months later and they've forgotten the details.
Worse, such knowledge is deeply platform-specific: what you learned about network configuration in Red Hat may not even transfer effectively to Debian, much less to FreeBSD or Solaris, let alone Windows.
The system state space is
- Enormous. It's exponential in the number of switches present. Furthermore, we as systems designers have gotten into a bad habit: when a question arises about system functionality, we tend to avoid making a decision; instead we add switches to push the decision down to the user. Then we pat ourselves on the back for "empowering" the user, without thinking about whether the user will be able to actually make the decision usefully.
- Poorly designed. The state variables that exist are rarely orthogonal; that is, they're not independent of each other. Changing one affects the others, and not always in ways anyone has thought about carefully. Similarly, the states of whole subsystems, which from a naive point of view ought to be independent, are usually entangled, and so subsystems interact in ways that nobody understands.
- Poorly expressed. State variables are normally scattered all over the system, often in unexpected or inappropriate places. State variables are rarely grouped by semantic category or by importance. Often the variables affecting a single program will be stored together, but how that program relates to the tasks a user has in mind is unclear at best. Attempts in some existing systems (Windows, AIX) to systematize the handling of various kinds of system state have been hampered to the point of ineffectiveness by implementation difficulties.
- Incompletely expressed. Important state variables are often deliberately hidden from the user in the name of "ease" of use.
- Unmanageable. Good tools for state manipulation do not exist. Nor is there, typically, any support for automated integrity checking.
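To make "exponential in the number of switches" concrete: if every switch can be set independently, the number of possible configurations is the product of the number of settings each switch allows. A minimal sketch, assuming a small hypothetical inventory (the switch names and counts below are invented for illustration, not taken from any real system):

```python
from math import prod
from typing import Dict

# Hypothetical switches and their number of settings, invented for illustration.
switches: Dict[str, int] = {
    "dhcp_enabled": 2,      # on/off
    "ipv6_enabled": 2,      # on/off
    "firewall_profile": 3,  # e.g. open / home / strict
    "dns_mode": 4,          # four alternative resolver setups
}


def state_space_size(domains: Dict[str, int]) -> int:
    """Distinct configurations = product of every switch's number of settings."""
    return prod(domains.values())


def binary_state_space(n: int) -> int:
    """n independent on/off switches give 2**n configurations."""
    return 2 ** n


print(state_space_size(switches))  # 2 * 2 * 3 * 4 = 48
print(binary_state_space(40))      # 1099511627776
```

Under the independence assumption, every new on/off switch "pushed down to the user" doubles the count; forty of them already yield over a trillion configurations.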
Until recently, state management had received little attention from researchers.
- Specialize. Special-purpose systems have fewer switches and thus much smaller state spaces. This makes them more manageable.
- Centralize. Putting all the state in one place makes it easier to get to. It doesn't make it any simpler, though.
- Standardize. Deploying 1000 identical machines reduces the state space of a LAN immensely, but only at a huge cost in flexibility and robustness.
These approaches do not generalize.
In our HotOS paper we went through a bunch of state we found on our systems and classified it. Based on what we found, and later experience, we believe that the problem is tractable.
1. Identify and eliminate state that
- ...is completely unused
- ...supports impossible or meaningless configurations
- ...supports configurations that are completely irrelevant today
- ...should be probed or autoconfigured
- ...should be derived from other state
It's ok to cache derived or probed state, but it's important to mark it clearly, to maintain cache consistency, and to regenerate it if it becomes corrupted.
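One way to honor those three requirements (mark cached state clearly, keep it consistent, regenerate it when corrupted) is to store a fingerprint of the primary state it was computed from plus a checksum of the cached value itself. A minimal sketch, assuming state is a JSON-serializable dictionary; `CachedState`, `fingerprint`, and the example keys are hypothetical names chosen for illustration:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


def fingerprint(obj: Any) -> str:
    """Stable SHA-256 of JSON-serializable data, used to detect change or corruption."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class CachedState:
    """A cached value explicitly marked as derived or probed, never as primary state."""
    origin: str                                # "derived" or "probed": the clear mark
    compute: Callable[[Dict[str, Any]], Any]   # regenerates the value from primary state
    value: Any = None
    inputs_hash: Optional[str] = None          # fingerprint of the primary state used
    value_hash: Optional[str] = None           # checksum of the cached value itself

    def get(self, primary: Dict[str, Any]) -> Any:
        stale = self.inputs_hash != fingerprint(primary)   # primary state changed
        corrupted = self.value_hash != fingerprint(self.value)  # cache was tampered with
        if stale or corrupted:
            # Regenerate from primary state rather than trusting the cache.
            self.value = self.compute(primary)
            self.inputs_hash = fingerprint(primary)
            self.value_hash = fingerprint(self.value)
        return self.value


# Hypothetical example: a fully qualified host name derived from two primary variables.
primary = {"hostname": "alpha", "domain": "example.org"}
fqdn = CachedState("derived", lambda s: f"{s['hostname']}.{s['domain']}")
```

Calling `fqdn.get(primary)` computes `"alpha.example.org"` once; if someone overwrites `fqdn.value` by hand, or `primary["hostname"]` changes, the next `get` notices and rebuilds the value instead of serving stale or corrupted state.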
2. Represent state in its natural form. State should not be represented as computation. It is not necessary or appropriate for the configuration system to support Towers of Hanoi or other programs.
Programmable state is useful when a (sub)system is new and its operating environment is not yet clearly understood. However, the general-purpose computer system and most of its components have matured to the point where this should no longer be necessary.
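The difference between state as data and state as computation can be enforced mechanically by the loader. A minimal sketch, assuming a hypothetical `key = value` file format whose values are written as Python literals; it relies on the standard library's `ast.literal_eval`, which accepts only literal values (strings, numbers, tuples, lists, dicts, sets, booleans, `None`) and rejects function calls and general expressions:

```python
import ast
from typing import Any, Dict


def load_declarative_config(text: str) -> Dict[str, Any]:
    """Parse hypothetical 'key = literal' lines into plain data.

    ast.literal_eval evaluates literals only, so a line that tries to
    compute something is rejected instead of executed.
    """
    config: Dict[str, Any] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no state
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        try:
            config[key.strip()] = ast.literal_eval(value.strip())
        except (ValueError, SyntaxError, TypeError) as exc:
            raise ValueError(
                f"line {lineno}: not a plain value: {value.strip()!r}"
            ) from exc
    return config
```

A file containing `interface = 'eth0'` and `mtu = 1500` loads as an ordinary dictionary, while `x = __import__('os').getcwd()` or `y = 2 ** 10` is refused: the configuration can say *what* the setting is, but not *how to compute* it.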
3. Catalogue the system state. Once the gunk has been flushed out, it's important to get a complete inventory of everything. As we argued in our paper, working on just a subset of the state doesn't get you anywhere, because it's not clear that the "proper" structure of the subset will or should remain unchanged when the rest of the state is factored in.
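An inventory is only useful if it is complete, so one practical check is to compare what is actually found on a machine against the catalogue. A minimal sketch, assuming each variable is recorded with its location, owning subsystem, and whether it is user-set, probed, or derived (the classification from step 1); `StateVar`, `Kind`, and the example entries are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Kind(Enum):
    USER = "user-set"   # genuinely needs a human decision
    PROBED = "probed"   # discoverable from hardware or environment
    DERIVED = "derived"  # computable from other state


@dataclass(frozen=True)
class StateVar:
    name: str
    location: str       # where it lives, e.g. a file path (illustrative)
    kind: Kind
    subsystem: str


def uncatalogued(observed: Iterable[str], catalogue: Dict[str, StateVar]) -> List[str]:
    """Names found on the system that have no catalogue entry, sorted."""
    return sorted(set(observed) - set(catalogue))


def needs_user(catalogue: Dict[str, StateVar]) -> List[str]:
    """The variables a user must actually decide; the rest can be machine-managed."""
    return sorted(n for n, v in catalogue.items() if v.kind is Kind.USER)


# Hypothetical catalogue entries for illustration only.
catalogue = {
    "hostname": StateVar("hostname", "/etc/hostname", Kind.USER, "network"),
    "mac_address": StateVar("mac_address", "(hardware)", Kind.PROBED, "network"),
    "fqdn": StateVar("fqdn", "(computed)", Kind.DERIVED, "network"),
}
```

Anything `uncatalogued` reports is exactly the kind of hidden state that keeps a subset-only analysis from being trustworthy.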
4. and 5. Orthogonalize and decompose the state. Just how one does this properly (other than "have a smart person look at it for six months") is one of the key research questions facing us.
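This sketch does not answer that research question; it only illustrates what decomposition buys once dependencies are known. Assuming a hypothetical list of pairs of variables known to affect each other, a union-find pass splits the variables into groups with no known dependencies between them, so each group can be reasoned about on its own:

```python
from typing import Dict, Iterable, List, Set, Tuple


def independent_groups(variables: Iterable[str],
                       depends: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Partition variables into groups with no known dependencies between groups.

    Union-find: variables linked by a dependency, directly or transitively,
    land in the same group. Every name in `depends` must appear in `variables`.
    """
    parent: Dict[str, str] = {v: v for v in variables}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in depends:
        parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical variables and dependencies, for illustration only.
groups = independent_groups(
    ["ip", "netmask", "gateway", "printer", "paper_size", "locale"],
    [("ip", "netmask"), ("netmask", "gateway"), ("printer", "paper_size")],
)
```

Here the network settings, the printer settings, and the locale fall into three separate groups. The hard part the text points to is upstream of this: discovering the true dependencies and reshaping the variables so that there are fewer of them.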
Fixing the state space is only the beginning. To build truly lower-futz systems, one needs other support as well:
- Better tools for managing the system state space
- Techniques for developing true subsystems
- Computer-assisted state management
We think programming-by-contract, if applied at the system level rather than at the subroutine level, has promise for helping to keep subsystems separate.
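As an illustration of a contract at the subsystem level rather than the subroutine level, a subsystem operation can be wrapped with a precondition on the state it receives and a postcondition relating the state before and after, including the rule that it must not touch state belonging to other subsystems. A minimal sketch, assuming system state is a flat dictionary whose keys are namespaced by subsystem (the `net.` and `print.` prefixes and all function names are hypothetical):

```python
import copy
import functools
from typing import Any, Callable, Dict

State = Dict[str, Any]


class ContractViolation(Exception):
    pass


def contract(pre: Callable[[State], bool],
             post: Callable[[State, State], bool]) -> Callable:
    """Check a precondition on the incoming state and a before/after postcondition."""
    def wrap(op: Callable[[State], State]) -> Callable[[State], State]:
        @functools.wraps(op)
        def checked(state: State) -> State:
            if not pre(state):
                raise ContractViolation(f"{op.__name__}: precondition failed")
            before = copy.deepcopy(state)
            after = op(copy.deepcopy(state))  # caller's state is never mutated
            if not post(before, after):
                raise ContractViolation(f"{op.__name__}: postcondition failed")
            return after
        return checked
    return wrap


def only_touches(prefix: str) -> Callable[[State, State], bool]:
    """Postcondition: state outside this subsystem's namespace is unchanged."""
    def outside(s: State) -> State:
        return {k: v for k, v in s.items() if not k.startswith(prefix)}

    def check(before: State, after: State) -> bool:
        return outside(before) == outside(after)
    return check


@contract(pre=lambda s: "net.interface" in s, post=only_touches("net."))
def enable_dhcp(state: State) -> State:
    state["net.dhcp"] = True
    return state


@contract(pre=lambda s: True, post=only_touches("net."))
def sloppy_enable_dhcp(state: State) -> State:
    state["net.dhcp"] = True
    state["print.paper_size"] = "A4"  # reaches into another subsystem
    return state
```

`enable_dhcp` succeeds, while `sloppy_enable_dhcp` is stopped by its postcondition because it silently changed printer state; that entanglement is exactly what the contract is meant to catch.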
Computer-assisted state management has three parts:
- Monitoring - the system should monitor itself and alert the operator if invalid states are encountered.
- Diagnosis - the system should be able to analyze its state and answer user queries about its behavior.
- Repair/Recovery - the system should be able to extricate itself from invalid or unwanted states that it may enter.
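The three parts can be tied to one shared notion: a set of named invariants over the system state. A minimal sketch, assuming state is a dictionary and using rollback to the last known-good snapshot as the (deliberately naive) recovery strategy; `Invariant`, `StateManager`, and the example network keys are hypothetical:

```python
import copy
import ipaddress
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Invariant:
    name: str
    holds: Callable[[State], bool]
    explanation: str  # what a violation means, in user terms


@dataclass
class StateManager:
    invariants: List[Invariant]
    snapshots: List[State] = field(default_factory=list)  # known-good states

    def monitor(self, state: State) -> List[Invariant]:
        """Monitoring: every invariant the current state violates."""
        return [inv for inv in self.invariants if not inv.holds(state)]

    def checkpoint(self, state: State) -> None:
        """Record the state as known-good, but only if it is actually valid."""
        if not self.monitor(state):
            self.snapshots.append(copy.deepcopy(state))

    def diagnose(self, state: State) -> List[str]:
        """Diagnosis: explain in words why the state is invalid."""
        return [f"{inv.name}: {inv.explanation}" for inv in self.monitor(state)]

    def repair(self, state: State) -> State:
        """Recovery: fall back to the most recent known-good snapshot."""
        if not self.monitor(state):
            return state
        if not self.snapshots:
            raise RuntimeError("no known-good state to recover to")
        return copy.deepcopy(self.snapshots[-1])


# Hypothetical invariant: the default gateway must lie on the configured subnet.
def gateway_on_subnet(s: State) -> bool:
    return ipaddress.ip_address(s["net.gateway"]) in ipaddress.ip_network(s["net.subnet"])


mgr = StateManager([Invariant("gateway_on_subnet", gateway_on_subnet,
                              "the default gateway is not on the configured subnet")])
```

A real system would want finer-grained repair than wholesale rollback, but even this much lets the machine, rather than the user, notice an invalid state, say why it is invalid, and get back out of it.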
How do we know we've succeeded?
FutzMark fʌtsmärk n. a unit of measure that indicates the "futziness" of a system.
- Descriptions of the system state
- User studies
- Psychological studies
- Physiological studies
The time to act is now!