A Role for Database Research in the Database Industry

David Lomet

Microsoft Research, Database Group
One Microsoft Way, Redmond, WA 98052
lomet@microsoft.com, http://www.research.microsoft.com/research/db/lomet
(206)703-1853

Abstract: We argue that research in the database industry is not primarily a matter of discovering new problems. Rather, the main role of database research is to formulate the basic abstractions, provide the elegant and general algorithms, and characterize performance.
Additional Key Words and Phrases: database research

Publication Information

Citation: Lomet, D., 1996. A Role for Database Research in the Database Industry ACM
Submission date: November 15, 1996
Acceptance date: November 15, 1996

Research in the Database Industry

A database industry would be alive and well in the US and elsewhere, even if researchers had never entered the database arena. This area, first and foremost, copes with the business data processing problem. Businesses were and are willing to spend money on this. Hence, the existence of the industry is no accident, and certainly did not require researchers to identify the problem.

In addition, a good bit of the database technology would have evolved without research input. Something close to the notion of a transaction existed in IMS around 1970. Data models, both hierarchical and network (looking very much like extended relational and OO navigation models) already existed by the early 70's without research input. Tree indexing and hashing were in use in a similar time frame. We need to understand how research contributed to the evolution of database technology so that we can understand the role it might play in the future.

The research contribution, in my view, consisted of providing two fundamental abstractions, transactions and the relational model. Working with these abstractions, one could enormously expand the scope of the algorithm solution space, hence improving functionality, performance, indeed many desirable attributes of database systems. These abstractions gave database users models with which they could cope. And researchers leapt into the database technology enterprise by exploiting these abstractions with technical solutions that were both general and elegant. With transactions, it was concurrency control, recovery, and availability techniques that resulted. With the relational model, it was normalization, query processing, optimization, data independence, indexing and storage organizations.

The point here is that industry identified the problems and provided the early impetus. Researchers came along later and provided the clean abstractions and the elegant solutions. These are what enable database technology to be readily transmitted to new practitioners and to become solid engineering, not just arcane craft. This has served our field well, giving researchers important problems to ponder, and has returned to industry elegant abstractions, algorithms, and understanding. This is likely to continue to be the model, and I believe it should be.

Areas of Current Industry Interest

So what are the areas on which industry has just begun to focus? I think it is these areas that cry out, or should cry out, to researchers as the golden opportunities. I am unconvinced that the research community is likely to anticipate the next area with a huge impact. So the areas that I cite below (only some of the possible areas) have, for the most part, already had some preliminary industrial exploration. But fundamental understandings are in short supply, as are elegant and generalizable algorithms.

"Transactions" Everywhere
We need to generalize our notion of transaction and apply it in a much wider context. This has been an ongoing research activity, and I think it should continue. Transactions have a role to play in workflow systems, in business data processing over the Internet, and in efforts to improve reliability and availability more generally. None of this is new, but the game is not over. Most systems do not deal with applications, most workflow is not reliable, and high-availability techniques tend to be isolated methods. Deeper understanding would translate into systems with better performance, greater generality, and higher availability.
Query Processing and Optimization
We have just begun to deal with complex queries over enormous volumes of data. This will require parallelism to succeed. It will require effective indexing. It will require a much better handle on how to estimate query costs, and how to transform query expressions, especially over diverse data types, including extended types. We have only begun to support materialized views and to understand how to optimize in the context of multiple queries. Decision support systems, data warehousing, on-line analytic processing, multimedia data, all these are areas impacted by this technology.
Information Discovery and Integration
Somewhere between converting monetary units and fully solving the artificial intelligence problem, there surely exist techniques for making more sense of data. Each success permits us to transform data into something closer to information (the data mining problem) and to bring together the information from diverse data sources. Whether these techniques are simply a collection of ad hoc engineering efforts or ones that have real generality has yet to be established. But this is an area with enormous potential leverage for the next deep insight. This is an area where "information modelling" should play an important role.
Distribution and the Web
No list of research areas would be complete without mentioning the Web. It will transform our lives and the nature of our research. I regard it not as a new area, however, but rather as the coming of age of wide-scale distribution. This will stress our abilities in all the preceding areas. The new fundamentals here are long response time, autonomy, and security. For long response time, we need to understand when and how to cache information and how to (in)validate it. Coping with autonomy requires cooperative, non-intrusive protocols that Web sites will want to sign up for. Security for distributed systems has long been a "black hole," but the Web makes its solution more pressing.

Industry is moving fast in all these areas, but research should be able to play its customary role of understanding and generalizing, and hence providing the foundation upon which superior technology can and will be built.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.