Ephraim P. Glinert, R. Lindsay Todd and G. Bowden Wise
Computer Science Department, Rensselaer Polytechnic Institute,
Troy NY 12180, USA
{glinee, toddr, wiseb}@rpi.edu
A position paper prepared for the Working Group on Human Computer Interaction (Isabel Cruz and Brad Myers, Co-chairs), as part of the ACM Workshop on Strategic Directions in Computing Research, to be held in Cambridge, MA, on June 14-15, 1996.
Supporting many different platforms is one way software developers can increase market share. To control costs, the differences in the code needed to support these platforms must be minimized. Yet, users rightly expect an application to fully exploit system capabilities, so that it is as powerful and easy to use as possible. They also expect each new application to possess a ``look and feel'' that blends with the style of other applications on the same platform.
Thus, an ASCII terminal version of a program would not be acceptable to users whose platform supports the X Window System. Then again, there is no single ``X Window System'' style, and in any case applications that use the X Window System do not work with other window systems, such as Microsoft Windows or Apple's Macintosh. The bottom line is that no universally accepted, standard interface style currently exists, nor do we foresee such a development in the near future.
With multimedia computing now emerging as the next step beyond graphical interfaces, the myriad applications that live in today's window systems must be ``retooled'' if they are to effectively use the new technology. But many existing applications do not even take advantage of window systems! This is because updating ``legacy'' code to incorporate new display technologies takes considerable effort. Someday, multimedia will, in its turn, be the legacy technology giving way to a new generation of systems that exploit still-unknown features. The retooling cycle will then repeat itself.
Returning to today's graphical interface technology, some applications run continuously or perform long-running operations that need to be monitored; others have much internal state, accumulated through use over a period of time. Such applications are usually attached to a single terminal, and someone using one of them can interact with it only through that one terminal. It is not possible, for example, to ``disconnect'' an X Window System display from a GNU EMACS session and later ``connect'' to that session with an ASCII terminal, even though GNU EMACS can support both interface styles. Yet advances in networking and mobile computing make these reasonable expectations of future applications.
Taking all of the above into consideration, we see a need for environments for writing applications which will: (a) support different styles of user interfaces, by isolating interface code from the remainder of the application; (b) allow terminals supporting different interface styles to be connected to and disconnected from an application while it is running; and (c) enable an application to exploit new interface technologies, without destroying its existing interface support or modifying the application itself.
To achieve these goals, we must define a framework for coding applications so that they interact, at an appropriately high level of abstraction, with networked user interface servers rather than directly with the user. These servers would then handle all style and device dependencies. The architecture we envisage would concurrently support multiple applications and multiple presenters (the user interface servers), where the number of applications and presenters might change over time.
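The connect and disconnect behavior described above can be illustrated with a minimal sketch. It is not the proposed framework itself: the class names `Application` and `Presenter`, the methods `attach`, `detach` and `publish`, and the in-process calls standing in for a network protocol are all assumptions made for illustration.

```python
class Presenter:
    """Hypothetical user interface server managing one terminal in one style."""

    def __init__(self, style):
        self.style = style
        self.shown = []  # what this terminal has rendered so far

    def present(self, name, value):
        # A real presenter would drive a device; here we only record the text.
        self.shown.append(f"[{self.style}] {name} = {value}")


class Application:
    """Holds abstract state and never talks to a terminal directly."""

    def __init__(self):
        self.state = {}
        self.presenters = []

    def attach(self, presenter):
        # A newly connected presenter is brought up to date with current state.
        self.presenters.append(presenter)
        for name, value in self.state.items():
            presenter.present(name, value)

    def detach(self, presenter):
        # The application keeps running, with its state intact.
        self.presenters.remove(presenter)

    def publish(self, name, value):
        # Record the new state, then notify every attached presenter.
        self.state[name] = value
        for presenter in self.presenters:
            presenter.present(name, value)


app = Application()
x11 = Presenter("x11")
app.attach(x11)
app.publish("buffer", "draft.txt")
app.detach(x11)
app.publish("cursor", 42)   # no terminal attached; state still accumulates
tty = Presenter("ascii")
app.attach(tty)             # the later terminal sees the whole session
```

In this sketch the application code is identical whichever presenter is attached, which is the separation that goal (a) asks for; replaying state on `attach` is one simple way to meet goal (b).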
An application would specify objects to be presented to users. Presenters, each of which manages a single terminal with a particular interface style, would refer to the type of the presented object to select an editor for it. To make this practical (i.e., to allow a single editor to be used with multiple types, and to allow presenters to attempt to construct an editor for an object when none is otherwise available), we would allow presenters to construct objects from parameterized types and to impose a ``projection'' type on an object that may be different from its true type.
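One way to picture type-driven editor selection is the following sketch. It rests on several assumptions: the `Presenter` class with its `register` and `edit` methods is hypothetical, Python's class hierarchy stands in for the object type system, and element-by-element handling of lists is only a loose stand-in for constructing an editor from a parameterized type.

```python
class Presenter:
    """Hypothetical presenter that maps object types to editors."""

    def __init__(self):
        self.editors = {}  # type -> editor function

    def register(self, typ, editor):
        self.editors[typ] = editor

    def edit(self, obj, projection=None):
        # A projection type, if given, overrides the object's true type.
        typ = projection if projection is not None else type(obj)
        # Walk the type's ancestors so one editor can serve many types.
        for candidate in typ.__mro__:
            if candidate in self.editors:
                return self.editors[candidate](obj)
        # No editor registered: try to construct one for a container
        # by editing each element in turn.
        if isinstance(obj, (list, tuple)):
            return "[" + ", ".join(self.edit(item) for item in obj) + "]"
        raise LookupError(f"no editor for {typ.__name__}")


presenter = Presenter()
presenter.register(bool, lambda b: f"checkbox({b})")
presenter.register(int, lambda n: f"spinner({int(n)})")

as_true_type = presenter.edit(True)                  # "checkbox(True)"
as_projection = presenter.edit(True, projection=int) # "spinner(1)"
constructed = presenter.edit([True, 2])              # "[checkbox(True), spinner(2)]"
```

The same value is shown as a checkbox under its true type and as a spinner under the imposed `int` projection, while the list gets an editor assembled from the editors of its elements.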
Mechanisms such as these for combining object types would extend the power of distributed object systems much as templates in C++ have added power beyond that afforded by inheritance alone. We end up with a user interface management system that has the following advantages over existing technology:
This last point is especially relevant at the present time, as multimedia environments which include sophisticated audio output capabilities are fast becoming the standard platform for most users, both at work and at home. No longer restricted to a visual medium, these systems allow users not only to see the information presented to them but also to hear it. Research into so-called virtual reality hints that it may not be long before our repertoire of standard interaction techniques is further augmented to include touch, gestures, voice and 3D sound. The expanded palette of interaction technologies is attractive, in that it may enable users to communicate with their computers in a more ``natural'' way.
But this additional freedom also presents new challenges to software designers, who must now develop applications which deliver information to end users in the most effective manner possible in a multisensory realm that encompasses text, graphics, speech, nonspeech audio, etc. Successful concurrent exploitation of several modalities requires careful planning; otherwise, some information may not be perceived by the user. Today's multimedia applications typically prescribe the modality in which any given information is presented in a hard-coded (predetermined) manner. The drawback of this strategy is that, no matter how much or how well the designer wrestles with the issue of how best to design the output, he or she is fighting a losing battle.
The reason is simple. Inflexible assignment of information to a particular modality by the designer must eventually lead to situations in which the chosen output modality is unacceptable to a given user and/or the circumstances at hand. For example, environmental conditions such as a noisy factory floor can preclude the use of sound. Even disregarding such ``external'' sources of interference, users often run several applications simultaneously and must absorb information from all of them. Perhaps most importantly, from our viewpoint, many users have sensory impairments and cannot interpret information in one or more modalities.
The solution is for applications to become multimodal at a high (abstract) level, so that they are able to display all information by exploiting alternative and/or complementary sensory modalities, as individual users' needs and preferences, (changes in) the working environment, and other factors (both extra- and intra-system) dictate. To make this work, the environment must possess a repository in which alternative representations for information can be stored, and from which one of the available representations may be selected (by the system, with minimal or no user intervention) when needed. A resource monitor would be responsible for determining the contribution to the total cognitive load by each application running in the environment, on the basis of its consumption of cognitive resources relevant to the user. A presentation manager would then coordinate the presentation of information so that the user can assimilate all of it in the optimal manner.
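The interplay of repository, resource monitor and presentation manager might look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the `repository` contents, the integer load costs, the per-modality capacities, and the simple first-fit selection rule in `present` are not taken from any existing system.

```python
from collections import defaultdict

# Repository: alternative representations of each piece of information,
# keyed by modality. Each entry is (rendering, assumed load cost).
repository = {
    "build_failed": {
        "audio": ("play alarm tone", 3),
        "visual": ("show red banner", 2),
        "text": ("append log line", 1),
    },
}


class ResourceMonitor:
    """Tracks how much of each modality's capacity is already in use."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)   # modality -> maximum load
        self.load = defaultdict(int)     # modality -> current load

    def available(self, modality, cost):
        return self.load[modality] + cost <= self.capacity.get(modality, 0)

    def consume(self, modality, cost):
        self.load[modality] += cost


def present(info, monitor, preferences):
    """Presentation manager: pick the first preferred modality that fits."""
    for modality in preferences:
        if modality in repository[info]:
            rendering, cost = repository[info][modality]
            if monitor.available(modality, cost):
                monitor.consume(modality, cost)
                return modality, rendering
    return None  # nothing fits; a real system might queue or summarize


# A noisy factory floor: audio capacity is zero, so sound is never chosen.
monitor = ResourceMonitor({"audio": 0, "visual": 2, "text": 5})
prefs = ["audio", "visual", "text"]
first = present("build_failed", monitor, prefs)   # falls back to visual
second = present("build_failed", monitor, prefs)  # visual is full, uses text
```

The application only names the information to be shown; the choice of modality is deferred to run time, so the same request yields different renderings as the environment and the current load change.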