ACM Computing Surveys
28A(4), December 1996, http://www.acm.org/surveys/1996/ManolaOpen/.
Copyright © 1996 by the Association for Computing Machinery, Inc. See the permissions statement below.
Abstract: This paper identifies a number of strategic directions for database-related computing research. A particularly important requirement, in order to maximize the utility of database technology within large-scale distributed systems, is technology to make database-like organization and processing facilities available as database services to arbitrary data and processing resources accessible via networks, without requiring that these resources be specifically inserted into a DBMS, or significantly modified in any other way.
Categories and Subject Descriptors: H.2.4 [Database Management]: Systems - distributed systems; H.2.5 [Database Management]: Heterogeneous Databases; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems - Hypertext navigation and maps
General Terms: Design, Architecture
Additional Key Words and Phrases: Internet, CORBA
Individual organizations are increasingly attempting to provide environments that allow integrated access to more and more of their data. For decentralized organizations, this requires distributed database capabilities and, since these organizations frequently have data that cannot be converted to a common format, heterogeneous database capabilities as well. At the same time, applications such as virtual enterprises involve interconnection of heterogeneous, autonomous, and distributed (HAD) data and processing resources from multiple organizations, for limited purposes and time periods, in cooperative activities. The Internet increasingly illustrates these ideas, with individuals and organizations providing open access to data (and processing) resources, and others providing separate indexing, search, and manipulation services to facilitate access by both internal and external users.
These issues are further complicated as database technology is applied to types of data not previously stored in databases, and used in support of the specialized programs that require and process such data. Such data frequently has highly specialized storage and processing needs that must be met to adequately support end users. The aim throughout is to further integrate, and provide flexible access to, information resources of all kinds. At the same time, the security and integrity of these resources become increasingly serious concerns. This general situation suggests a number of strategic areas for database-related computing research.
Technology will be required to make database-like organization and processing facilities available as database services to arbitrary data and processing resources accessible via networks, without requiring that these resources be specifically inserted into a DBMS, or significantly modified in any other way. The use of a conventional DBMS should be viewed as a possible performance enhancement, but not as a basic requirement to provide access and related functionality. Developments in open middleware architectures, such as the OMG's CORBA architecture, increasingly emphasize the provision of services, such as query and transaction services, that make database-like facilities available for use with any computing resource attached to the network. Similarly, it will be necessary to provide such database services for data as it is moved or copied within the network, e.g., whether the data is in a database or file system, in a replicated database or file, or in an application or system cache, and wherever it is located in the network. This requires improvements in such things as notification and concurrency control mechanisms.
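As a rough illustration of a query facility offered as a service over resources that are never loaded into a DBMS, consider the following minimal Python sketch. Everything named here (csv_source, cache_source, query, and the record layout) is hypothetical and chosen only for illustration; it is not CORBA's query service or any vendor's interface. Each resource is wrapped by a small adapter that yields records, and a single query function applies the same selection and projection to all of them.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]


def csv_source(text: str) -> Iterator[Record]:
    """Adapter (hypothetical): expose CSV file contents as records."""
    yield from csv.DictReader(io.StringIO(text))


def cache_source(entries: Dict[str, str]) -> Iterator[Record]:
    """Adapter (hypothetical): expose an application cache as records."""
    for name, dept in entries.items():
        yield {"name": name, "dept": dept}


def query(
    sources: Iterable[Iterable[Record]],
    where: Callable[[Record], bool],
    fields: List[str],
) -> List[Record]:
    """Select and project over every source, wherever the data lives."""
    results = []
    for source in sources:
        for record in source:
            if where(record):
                results.append({f: record[f] for f in fields})
    return results


file_data = "name,dept\nada,eng\nbob,ops\n"
cached = {"cy": "eng"}
engineers = query(
    [csv_source(file_data), cache_source(cached)],
    where=lambda r: r["dept"] == "eng",
    fields=["name"],
)
```

In this sketch a conventional DBMS would simply be one more adapter, possibly a faster one, which is consistent with treating it as a performance enhancement rather than a prerequisite.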
Technology will also be required to support more flexible organizations of data. For example, support should be provided to maintain ad hoc data collections and data relationships created by different users, including relationships between data maintained on home computers and on centralized servers. This requires not only the notification and other technologies mentioned above, but improvements in what would conventionally be thought of as view technology. Particularly in the case of network-accessible data, this requires facilities to deal with resources that move, or are attached and detached or reorganized autonomously, and improved facilities to alert users to changes in the contents, location, or other aspects of resources they depend on.
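The kind of change notification described above can be sketched, under simplifying assumptions, as a directory that records where each resource currently lives and informs interested users when it is attached, moved, or detached. The class ResourceDirectory, its method names, and the event labels are all hypothetical; a real facility would also have to cope with distribution, failures, and autonomy, which this single-process sketch ignores.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A listener receives (resource_id, event, detail).
Listener = Callable[[str, str, str], None]


class ResourceDirectory:
    """Hypothetical directory that tracks resource locations and notifies users."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, resource_id: str, listener: Listener) -> None:
        """Register interest in changes to one resource."""
        self._listeners[resource_id].append(listener)

    def _notify(self, resource_id: str, event: str, detail: str) -> None:
        for listener in list(self._listeners.get(resource_id, [])):
            listener(resource_id, event, detail)

    def attach(self, resource_id: str, location: str) -> None:
        self._locations[resource_id] = location
        self._notify(resource_id, "attached", location)

    def move(self, resource_id: str, new_location: str) -> None:
        if resource_id not in self._locations:
            raise KeyError(resource_id)
        self._locations[resource_id] = new_location
        self._notify(resource_id, "moved", new_location)

    def detach(self, resource_id: str) -> None:
        old_location = self._locations.pop(resource_id)
        self._notify(resource_id, "detached", old_location)


events = []
directory = ResourceDirectory()
directory.subscribe("report", lambda rid, ev, detail: events.append((rid, ev, detail)))
directory.attach("report", "home-pc")
directory.move("report", "server")
directory.detach("report")
```

Only subscribers to a given resource are alerted, so a user's ad hoc collection can depend on a handful of resources without being flooded by unrelated changes.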
Since support for extended data types is increasingly a requirement, enhanced database services must include management of code as well as data. Moreover, technology is needed to enable more flexible linking of data and code to form meaningful "objects" (instances of extended data types). Current object technology is too rigid to support the full generality of these requirements. For example, it should be possible to easily attach code to existing data to create such types. The criteria for associating data and code should be explicitly represented, e.g., in extended data type concepts or other associated metadata. The World Wide Web provides a simple illustration of these facilities in the way different helper applications can be used to process data based on the information provided by file extensions. Improvements in the definition of metadata, and the components that use it, will also be vital in using and combining data from diverse network-accessible sources.
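The helper-application idea can be sketched as a small registry in which the criterion for associating code with data is itself explicit metadata. In the Python sketch below the criterion is a file extension, as in the Web example; the names HANDLERS, attach, and process, and the two sample handlers, are hypothetical and stand in for whatever richer type metadata a real system would use.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict

# Hypothetical registry: association criterion (a file extension) -> code.
HANDLERS: Dict[str, Callable[[bytes], str]] = {}


def attach(extension: str):
    """Attach a function to existing data identified by this extension."""
    def register(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        HANDLERS[extension.lower()] = func
        return func
    return register


@attach(".txt")
def describe_text(data: bytes) -> str:
    return f"text, {len(data.decode('utf-8').split())} words"


@attach(".csv")
def describe_csv(data: bytes) -> str:
    rows = data.decode("utf-8").strip().splitlines()
    return f"table, {len(rows)} rows"


def process(name: str, data: bytes) -> str:
    """Look up the code attached to this data and apply it."""
    extension = PurePosixPath(name).suffix.lower()
    handler = HANDLERS.get(extension)
    if handler is None:
        return f"no code attached for {extension or 'untyped data'}"
    return handler(data)
```

Because the data itself is untouched, new code can be attached to existing data simply by registering another handler, which is the flexibility the paragraph above asks for.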
Requirements for supporting extended data types and more flexible data organizations merge in the need to support the organization of accesses to attached network resources into meaningful transactions. The semantics of these transactions must be increasingly general, and tailored to specific application requirements and more complex data semantics. Additional research is required in understanding the correctness properties of these extended transactions, particularly when several autonomously-created extended transactions are applied to the same data.
Technologies such as Java illustrate the beginnings of more flexible code movement and manipulation within distributed networks. The combination of extended data type facilities and easy data (and code) migration makes the design space for distributed systems much larger; hence, additional research is required to support the design of such systems, and the corresponding optimization of queries and resource migration.
At the implementation level, components whose primary role is providing access to data must be better able to cooperate with other software components as parts of a coherent distributed environment. This requires more detailed interaction between database and other software components, and possibly different architectures for these components. Well-defined lower-level interfaces and techniques such as computational reflection are needed to enable components to interact at the required points. Similarly, the ability of these components to interact with advanced network facilities to obtain specified bandwidth and quality of service on demand provides increasing optimization opportunities and support for real-time data access requirements, but also involves increasingly complex design decisions and optimization mechanisms. System implementations must also deal with increasing problems of scale (both in space and time). Systems must be more flexible in adapting to change, and support more stringent availability requirements. In addition to providing the required data access, research must increasingly consider requirements for data security and integrity. Addressing these requirements will involve the design of basic aspects of system implementations, as opposed to access controls added externally.
Dealing with some of these issues might well require revisiting older research problems which conventionally are thought of as "solved", since many of these problems are apparently not sufficiently "solved" in current commercial products. Query optimization and view technology are examples. The research community needs to determine whether the problem is just that the vendors have not gotten around to using the research that has already been done, or whether the research has not gone far enough to make the technology ready (or cheap enough) for practical use.
Last modified: Mon Nov 04 11:04:36 EST 1996
Frank Manola <fmanola@objs.com>