Database management system (DBMS) technology is at a crossroads, perhaps the first since its successful entry into the information processing marketplace. On the one hand, relational systems have been enormously successful, creating a multi-billion dollar industry over the last two decades. On the other hand, current technological developments and application demands are severely testing the limits of current commercial systems. Failure to address these changes and demands may result in the marginalization of database management, with more and more data stored elsewhere and managed by systems without the typical database functionality (e.g., querying, transactional support, integrity enforcement). The following are some of my views on the challenges and the directions that seem important to investigate.
Changing Applications
The unsuitability of existing (relational) DBMSs for the data management requirements of complex (aka "advanced") application domains is a well known (and much recited) fact. A majority of the problems arise from the multiplicity and complexity of data types and the uncertainty in accessing them. Existing systems are optimized to manage structured data of a few, relatively simple, types. Users are expected to pose well-formed queries and/or simple transactions to access the database. All of this changes in the new applications that now demand DBMS services. Most of these applications deal with multiple types of quite complex data that are not well structured; user access to these data is ill-defined, requiring partial match searches; and transactional access to the data has the complexity of a workflow. Since many of these applications exhibit multimedia characteristics, I'll use multimedia information systems to demonstrate some of the issues. What are the characteristics of multimedia data that differentiate these databases from traditional ones?
Current relational DBMSs cannot meet these requirements for a number of model and architectural reasons. This is perhaps why DBMS technology has not had a significant impact on the data management requirements of applications with similar requirements. The main use of database technology has been as the holder of meta-data, while the actual multimedia data, in particular continuous media data such as audio and video, are stored in ordinary files. This, unfortunately, eliminates the possibility of posing queries on these data. Ultimately, we should be able to pose queries such as "Find all multimedia documents which show Bill Clinton standing next to John Chretien and uttering the words 'Canada is our best neighbor'" or "Show me all the images which contain an object that looks like ... (depicting a shape)" and have these queries executed efficiently.
What needs to be done to enable DBMSs to fulfill the requirements of these applications? The needs are both architectural and model-dependent. For example, since audio and video (assuming they are delivered on different streams) are both time-dependent and need to be synchronized with each other, either the communication between the client and the server has to be adjusted to meet these real-time synchronization requirements, or the server interface of the system needs to be opened to enable synchronization routines to access multimedia objects in the server buffer. From a modeling perspective, more sophisticated models are necessary to properly capture the application objects. Object DBMSs are, in my view, the most promising systems to meet these requirements. However, many of the problems in engineering high performance, full functionality object DBMSs have yet to be resolved. The existing systems, by and large, are persistent object repositories with limited DBMS functionality following simple distribution strategies. It is hard to extend them (how do you store JPEG encoded images in native mode in an object DBMS such that you can interpret the encoded images?), they don't offer query models that can be extended with multimedia constructs, their query optimization capabilities are severely limited, and I don't know how they would scale with increasing data volume and user community. Furthermore, querying these databases is significantly more complicated, as one has to deal with quality of service concerns and handle both fuzzy queries that can be answered by partial matches and queries that require the discovery of ill-defined (or undefined) patterns in the data. Thus, for example, data mining techniques can be used on image databases to answer some of the queries.
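To make the notion of a fuzzy query answered by partial matches more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any existing system: each image is assumed to be summarized by a small numeric feature vector, similarity is measured with cosine similarity, and the catalog contents, the function names, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def partial_match_query(catalog, query_features, threshold=0.8, limit=10):
    """Return (image_id, score) pairs scoring at least `threshold`, best first.

    Unlike an exact-match predicate, the answer is a ranked list of
    candidates that only partially match the query.
    """
    scored = [
        (image_id, cosine_similarity(features, query_features))
        for image_id, features in catalog.items()
    ]
    matches = [(image_id, score) for image_id, score in scored if score >= threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


# Hypothetical catalog: image id -> made-up three-component feature vector.
CATALOG = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.7, 0.3, 0.0],
    "img_c": [0.0, 0.2, 0.8],
}

# "Find images that look like this": the query is itself a feature vector.
results = partial_match_query(CATALOG, [1.0, 0.0, 0.0], threshold=0.8)
for image_id, score in results:
    print(image_id, round(score, 3))
```

With this made-up data, img_a and img_b qualify (in that order) and img_c does not. A real system would also have to decide how features are extracted and indexed, which this sketch leaves out.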
As I indicated above, multimedia systems are representative (in terms of their data management requirements) of many other applications such as electronic commerce, digital libraries, and engineering design environments. I believe distributed object management [2,3] will be a major technology in addressing these requirements, and R&D in this technology is a fundamental strategic direction. Research in this area is in its infancy.
The foregoing discussion may have left the impression that it is essential (or, at least, desirable) to collect all of this data under the control of a DBMS. This is not my claim, as I recognize that most of this data is already stored in various other places. What I propose, however, is that DBMS-like access to this data be provided in an interoperable environment. This raises the second important strategic issue, namely interoperable systems. Early research in this area concentrated on multidatabases (or federated databases). More recent research has started to address the problem in greater generality, with emphasis on wrapper-mediator systems [4]. The wrapper-mediator approach, coupled with object-orientation, seems to be the correct paradigm to deal with interoperability. However, there is currently no well-defined methodology for constructing these systems, and there is little or no support for (semi-)automatic generation of wrappers and mediators for different functions.
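The wrapper-mediator idea can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the architecture of [4] or of any product: the two sources, their native formats, the class names, and the common record layout (title, year, source) are all hypothetical. Each wrapper translates one source's native representation into the common form, and the mediator gives the user a single query interface over all wrapped sources.

```python
import csv
import io


class RelationalWrapper:
    """Wraps a source whose rows already look like relational tuples."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, predicate):
        for row in self._rows:
            # Translate native column names into the common record layout.
            record = {"title": row["TITLE"], "year": int(row["YR"]), "source": "relational"}
            if predicate(record):
                yield record


class CsvFileWrapper:
    """Wraps data kept outside any DBMS, here as 'year,title' text lines."""

    def __init__(self, text):
        self._text = text

    def query(self, predicate):
        for row in csv.reader(io.StringIO(self._text)):
            if not row:
                continue  # skip blank lines
            year, title = row
            record = {"title": title.strip(), "year": int(year), "source": "file"}
            if predicate(record):
                yield record


class Mediator:
    """Sends one query to every wrapper and merges the answers."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def query(self, predicate):
        results = []
        for wrapper in self._wrappers:
            results.extend(wrapper.query(predicate))
        return sorted(results, key=lambda r: (r["year"], r["title"]))


# Hypothetical sources with made-up contents.
table = [{"TITLE": "Design notes", "YR": "1994"}, {"TITLE": "Survey", "YR": "1991"}]
flat_file = "1996,Video catalog\n1990,Old memo\n"

mediator = Mediator([RelationalWrapper(table), CsvFileWrapper(flat_file)])
recent = mediator.query(lambda r: r["year"] >= 1992)
for record in recent:
    print(record)
```

The data stays where it is; only the access path is unified. Writing each wrapper by hand, as done here, is exactly the step for which (semi-)automatic generation support is lacking.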
Technological Developments
Perhaps the most important technological development that affects database management is the emergence of distributed and parallel computing as a mainstream computing paradigm. Stonebraker claimed in 1988 that within the subsequent decade centralized database managers would become an antique curiosity as more organizations moved toward distributed database managers [5]. By and large, this has turned out to be an accurate forecast. Most, if not all, of the commercial DBMSs provide some sort of distribution. Practically every product can be configured as a single-server client-server system, and some go beyond that as well. In my view this trend will continue at an accelerated pace, and the only obstacle to this growth that I can see is our inability to manage highly distributed systems effectively. One might question whether this is a computer science problem or an issue that organizational and management science should tackle, but it remains an issue.
We claimed in a 1991 paper [6] that we do not have a handle on the effects of computer network protocols on the performance of distributed DBMSs. This is largely true today as well, and the problem is becoming more serious. Communications and data management are converging, and the synergistic effect of their combination provides both challenges and opportunities. There are three major developments in networking that will have a profound effect on database management, and I am not convinced that we know how to deal with their effects. These are (1) the emergence of high bandwidth, high speed broadband networks, (2) mobile computing environments, and (3) the explosion of the Internet.
Broadband networks violate almost all of the assumptions that we used to make in designing distributed database systems. The network is no longer the bottleneck, since network speeds can exceed I/O speeds. Some have suggested that the emergence of broadband networks signals the death of distributed databases, since they make access to a remote centralized database feasible. These arguments miss the point, in my view, since bandwidth and latency are different things and there are motivating factors other than bandwidth and speed for the distribution of storage and maintenance of data. However, there is no question that an important architectural re-evaluation is necessary. There is some work, for example, that investigates the tradeoffs of a site accessing data from a neighbor's cache rather than retrieving it from its own disk when network speeds make this advantageous. More work of this kind, which might turn some of the underlying assumptions of DBMSs on their head, is necessary.
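The distinction between bandwidth and latency, and the neighbor's-cache tradeoff, can be illustrated with a back-of-the-envelope cost model. The Python sketch below assumes that the time to fetch a page is simply a fixed latency plus the page size divided by bandwidth. All figures are made-up illustrative values, not measurements, and the source names are hypothetical.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: fixed start-up latency plus size over bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


PAGE_BYTES = 8 * 1024  # hypothetical page size

# Illustrative (assumed) characteristics of three ways to obtain a page.
SOURCES = {
    "local_disk": {"latency_s": 0.010, "bandwidth_bytes_per_s": 5e6},
    "neighbor_cache": {"latency_s": 0.001, "bandwidth_bytes_per_s": 15e6},
    # Same bandwidth as the neighbor link, but a much longer delay.
    "high_latency_link": {"latency_s": 0.250, "bandwidth_bytes_per_s": 15e6},
}


def cheapest_source(size_bytes, sources):
    """Return (name, seconds) of the source with the lowest estimated fetch time."""
    costs = {
        name: transfer_time(size_bytes, **params) for name, params in sources.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


for name, params in SOURCES.items():
    print(name, round(transfer_time(PAGE_BYTES, **params), 6))

best_name, best_time = cheapest_source(PAGE_BYTES, SOURCES)
print("cheapest:", best_name)
```

Under these assumed numbers the neighbor's cache beats the local disk, while a link with identical bandwidth but high latency is by far the slowest for a single page: more bandwidth alone does not make remote access cheap.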
Mobility is emerging as a major force in the marketplace. Most mobile data management research assumes an environment where data are located on computers in the wireline network, with the mobile stations, which have limited capabilities, "downloading" data as they need them. This is a realistic scenario for a limited number of applications and one that doesn't pose major challenges for data management, since data resides primarily on wireline computers. What is more interesting is the environment where the mobile stations are more powerful and store native data that may need to be shared by others (the so-called "walkstation" case [7]). This case poses significant difficulties in data management due to the characteristics of the mobile environment. Mobile computing environments are characterized by three issues [8]: communication characteristics, mobility, and portability. Communication is over wireless networks, which are prone to disconnections, noise, echo, and low bandwidth. Mobility of some of the equipment on the network causes data that are static in wireline networks to become dynamic and volatile in wireless networks. Mobility raises issues such as address migration, maintenance of directories, and difficulty in locating stations. Finally, portability places restrictions on the type of equipment that can be used in these environments. For example, easy portability and the desire for long operation between battery recharges usually restrict the type and size of storage that can be used. Dealing with the effects of these is a major R&D issue.
A particular difficulty is that these two technologies are entering the fray at the same time. Thus, the networks of tomorrow will likely be broadband backbones with wireless networks connected to them. Furthermore, some of the broadband backbone may itself be wireless, going over satellite channels. These networks pose other difficulties, since the bandwidth availability is offset by communication latency between earth stations and satellites. In this case, query processing, for example, has to take into account quality of service considerations. This environment is not too distant in the future. Even today, Canada is equipped with a country-wide ATM based broadband test network. There are many such trials all over the world, and the wide scale emergence of these networks will make distributed data management over wide area networks both feasible and an R&D challenge.
The explosion of the Internet is now the topic of daily newspaper articles and TV programs. Putting aside the hype, Internet activity is important from a database management perspective simply because of the diversity of repositories that it introduces. Most of the existing Internet access tools are browsing-based. However, there is a demand to be able to perform complex queries over Internet sites, and this poses significant challenges. One of the fundamental problems is the inherent heterogeneity of the information sources and the lack of a schema to guide the querying process. The other difficulty is the variance in the capabilities of the various sites in processing these queries.
Conclusion
In my view, the strategic direction for database system research and development efforts can be summarized as follows: We should be addressing the requirements of new application domains by building DBMSs with sufficiently powerful models and flexible and extensible architectures that can exploit and adapt to the technological changes. This is a generic statement which requires fleshing out. The specifics of an R&D agenda along these lines should include in my view the following:
References
[1] E.A. Fox. "Advances in interactive digital multimedia systems," Computer, 24(10): 9-21, October 1991.
[2] M.T. Özsu, U. Dayal, and P. Valduriez. Distributed Object Management, Morgan Kaufmann, 1994.
[3] R. Orfali, D. Harkey, and J. Edwards. The Essential Distributed Objects Survival Guide, John Wiley, 1996.
[4] G. Wiederhold. "Mediators in the Architecture of Future Information Systems", IEEE Computer, March 1992, 38-49.
[5] M. Stonebraker. Readings in Database Systems, Morgan Kaufmann, 1988.
[6] M.T. Özsu and P. Valduriez. "Distributed Database Systems: Where are we now?", IEEE Computer, August 1991, 68-78.
[7] T. Imielinski and B.R. Badrinath. "Data Management Issues in Mobile Computing," Communications of the ACM, October 1994.
[8] G.H. Forman and J. Zahorjan. "The Challenges of Mobile Computing", IEEE Computer, April 1994.