1. Data Warehousing (General)

 

Background:

 

Surajit Chaudhuri, Umeshwar Dayal, Venkatesh Ganti, Database Technology for Decision Support Systems, IEEE Computer, December, 2001.

 

Jennifer Widom, Research problems in data warehousing, Int'l Conf. on Information and Knowledge Management, 1995.

 

 

2. Data Warehouse Design

 

Background:

 

Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, John Wiley and Sons, 2002.

 

3. Column Stores

Main paper:

 

M. Stonebraker et al., C-Store: A Column-oriented DBMS, in Proceedings of the Very Large Database (VLDB) Conference, Trondheim, Norway, 2005.

 

Background:

 

P. Boncz and M. Kersten, Monet: An Impressionist Sketch of an Advanced Database System, Proc. BIWIT'95, 1995.

 

P. A. Boncz, M. L. Kersten. MIL Primitives for Querying a Fragmented World. The VLDB Journal, 8(2):101-119, October 1999.

 

Peter Boncz, Marcin Zukowski, Niels Nes, MonetDB/X100: Hyper-Pipelining Query Execution, Proceedings of CIDR, Jan., 2005.

 

 

4. Indexing

 

Main paper:

 

P. O’Neil and D. Quass, Improved Query Performance with Variant Indexes, in Proceedings of the ACM Conference on the Management of Data (SIGMOD), Tucson, Arizona, May, 1997.

 

Background:

 

M.C. Wu and A.P. Buchmann, Encoded bitmap indexing for data warehouses, ICDE, 220-230, 1998.

 

C.Y. Chan and Y.E. Ioannidis, Bitmap index design and evaluation, ACM SIGMOD, 355--366, 1998.

 

H. Lei and K. A. Ross, Faster joins, self-joins and multi-way joins using join indices, Next Generation Information Technologies and Systems, 1997 (Extended version to appear in Data and Knowledge Engineering).

 

W. Labio, D. Quass, and B. Adelberg, Physical database design for data warehousing, ICDE, 1997.

 

P. O'Neil and G. Graefe, Multi-table Joins Through Bitmapped Join Indices, ACM SIGMOD Record, Volume 24, Issue 3 (September 1995), pp. 38--41.

 

5. Compression

 

Main paper:

 

D. Abadi, S. Madden, M. Ferreira, Integrating Compression and Execution in Column-Oriented Database Systems, in Proceedings of the ACM SIGMOD Conference, Chicago, IL, June, 2006.

 

Background:

 

Z. Chen, J. Gehrke, F. Korn, Query optimization in compressed database systems, in Proceedings of the 2001 ACM SIGMOD international conference on Management of data, June, 2001.

 

G. Graefe and L. Shapiro, Data Compression and Database Performance, in Proceedings of ACM/IEEE-CS Symposium on Applied Computing, Kansas City, MO, April, 1991.

 

S. O’Connell and N. Winterbottom, Performing Joins Without Decompression in a Compressed Database System, SIGMOD Record, Vol. 32, No. 1, March 2003.

 

J. Goldstein, R. Ramakrishnan, U. Shaft, Compressing Relations and Indexes, Proceedings of the Fourteenth International Conference on Data Engineering (ICDE), pp. 370-379, 1998.

 

6. Query Evaluation

 

Main paper:

 

W. Han, J. Ng, V. Markl, H. Kache, M. Kandil, Progressive Optimization in a Shared-Nothing Parallel Database, in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, June, 2007.

 

Background:

 

H. V. Jagadish, Laks V. S. Lakshmanan, and Divesh Srivastava, Snakes and sandwiches: optimal clustering strategies for a data warehouse, ACM SIGMOD, 1999.

 

P. Haas and J. Hellerstein, Ripple Joins for Online Aggregation, SIGMOD, 1999.

 

(good background) G. Graefe, Query Evaluation Techniques for Large Databases, in ACM Computing Surveys, Volume 25, Issue 2 (June 1993), pp. 73-169.

 

A. Gupta, V. Harinarayan, D. Quass. "Aggregate-Query Processing in Data Warehousing Environments." In Proceedings of the 21st VLDB Conference, Zürich, Switzerland, September 1995.

 

 

7. Automatic DB Design 1 - Index Selection

 

Main papers:

 

S. Chaudhuri and V. Narasayya, An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server, Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.

 

Agrawal S., Chaudhuri S. and Narasayya V., Automated Selection of Materialized Views and Indexes for SQL Databases. Proceedings of the 26th International Conference on Very Large Databases (VLDB00), Cairo, Egypt, pp. 496-505, 2000.

 

Background:

 

S. Chaudhuri and V. Narasayya, Index Merging, 15th International Conference on Data Engineering (ICDE'99).

 

N. Bruno and S. Chaudhuri, Automatic Physical Database Tuning: A Relaxation-based Approach, in Proceedings of Conference on Management of Data (SIGMOD), Baltimore, MD, June, 2005.

 

Chaudhuri, S. and Narasayya V., AutoAdmin "What-If" Index Analysis Utility. Proceedings of ACM SIGMOD, Seattle, 1998.

 

S. Agrawal, N. Bruno, S. Chaudhuri, V. Narasayya, Auto-Admin: Self-Tuning Database Systems Technology, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.

 

8. Automatic DB Design 2 - Materialized View Selection

 

Main paper:

 

H. Gupta and I.S. Mumick, Selection of views to materialize under a maintenance-time constraint, International Conference on Database Theory, 1999.

 

Background:

 

D. Theodoratos and T. Sellis, Designing data warehouses, Data and Knowledge Engineering, 31:3, 279--301, 1999.

 

E. Baralis, S. Paraboschi, and E. Teniente, Materialized view selection in a multidimensional database, VLDB, 156--165, 1997.

 

9. Automatic DB Design 3 - Partitioning

 

Main paper:

 

S. Papadomanolakis and A. Ailamaki, AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning. Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM), Santorini, Greece, June 2004.

 

Background:

 

Agrawal S., Narasayya V., and Yang, B., Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design. Proceedings of the ACM SIGMOD, Paris, France, 2004.

 

 

10. Automatic DB Design 4 - IBM

 

Main papers:

 

D. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, S. Fadden, DB2 Design Advisor: Integrated Automatic Physical Database Design, Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.

 

J. Rao, C. Zhang, G. Lohman, N. Megiddo, Automating Physical Database Design in a Parallel Database, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, 2002.

 

Background:

 

R. Telford, R. Horman, S. Lightstone, N. Markov, S. O'Connell, G. Lohman, Usability and Design Considerations for an Autonomic Relational Database System, IBM Systems Journal, Vol. 42, No. 4, 2003.

 

S. Lightstone, G. Lohman, D. Zilio, Toward Autonomic Computing with DB2 Universal Database, SIGMOD Record, Vol. 31, No. 3, September, 2002.

 

S. Lightstone, G. Lohman, P. Haas, V. Markl, Making DB2 Products Self-Managing: Strategies and Experience, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.

 

B. Dageville and K. Dias, Oracle's Self-Tuning Architecture and Solutions, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.

 

L. Qiao, B. Iyer, D. Agrawal, A. El Abbadi, Automated Storage Management with QoS Guarantee in Large-Scale Virtualized Storage Systems, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.

 

M. Abd-El-Malek, et al., Early Experience on the Journey towards self-* Storage, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.

 

 

11. Other Approaches

 

Main papers:

 

J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.

 

F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, R. Gruber, Bigtable: A Distributed Storage System for Structured Data, OSDI 2006: 7th USENIX Symposium on Operating Systems Design and Implementation, 2006.

 

 

 

Other Interesting Papers on Data Warehousing and Related Topics

 

S. Sarawagi and M. Stonebraker, Efficient organization of large multidimensional arrays, ICDE, 1994.

 

P. Deshpande, K. Ramasamy, A. Shukla, and Jeffrey F. Naughton, Caching multidimensional queries using chunks, ACM SIGMOD, 259--270, 1998.

 

P. Deshpande and J. Naughton, Aggregate aware caching for multi-dimensional queries, EDBT, 2000.

 

S. Ghemawat, H. Gobioff, and S. Leung, The Google File System, in SOSP, 2003.

 

* Wilburt Juan Labio, Ramana Yerneni, and Hector Garcia-Molina, Shrinking the warehouse update window, ACM SIGMOD, 1999.

 

* D. Quass and J. Widom, On-line warehouse view maintenance for batch updates, ACM SIGMOD, 393--404, 1997.

 

M. Staudt and M. Jarke, Incremental maintenance of externally materialized views, VLDB, 75--86, 1996.

 

Yannis Kotidis, Aggregate View Management in Data Warehouses, in Handbook of Massive Datasets, 2002.

 

Y. Cui and J. Widom. "Lineage Tracing in a Data Warehousing System." In Proceedings of the Sixteenth International Conference on Data Engineering, San Diego, California, February 2000. Demonstration Description.

 

* Y. Cui and J. Widom. "Practical Lineage Tracing in Data Warehouses." In Proceedings of the Sixteenth International Conference on Data Engineering, San Diego, California, February 2000.

 

Y. Cui, J. Widom, and J. L. Wiener. "Tracing the Lineage of View Data in a Data Warehousing Environment." Technical Report, Stanford University, 1997 (Revised 1999).

 

W. J. Labio, J. Wiener, H. Garcia-Molina, V. Gorelik. "Efficient Resumption of Interrupted Warehouse Loads." Technical Report, Stanford University, 1998.

 

* J. Widom, "Trio: A System for Integrated Management of Data, Accuracy, and Lineage", in Proceedings of the Second Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, January, 2005.

 

 

 

Resources

 

Industry

 

1. Larry Greenfield, Data Warehousing Information Center. (Web site)

 

2. Data Warehousing Online. (Web site)

 

3. Data Warehousing Knowledge Center. (Web site)

 

Example Projects

 

N. Roussopoulos, C.M. Chen, S. Kelley, A. Delis, and Y. Papakonstantinou, The Maryland ADMS Project: Views R Us. IEEE Data Engineering Bulletin, 18(2):19-28, June 1995.

 

G. Zhou, R. Hull, R. King, and J.C. Franchitti, Supporting Data Integration and Warehousing Using H2O. IEEE Data Engineering Bulletin, 18(2):29-40, June 1995.

 

J. Hammer, H. Garcia-Molina, W. Labio, J. Widom, and Y. Zhuge. The Stanford Data Warehousing Project. IEEE Data Engineering Bulletin, 18(2):41-48, June 1995.

 

M. Jarke, Y. Vassiliou. Data Warehouse Quality Design: A Review of the DWQ Project. Invited Paper, Proc. 2nd Conference on Information Quality. Massachusetts Institute of Technology, Cambridge, 1997.

 

H. Gupta and D. Srivastava. The Data Warehouse of Newsgroups. International Conference on Database Theory, Jerusalem, Israel, January 1999.

 

Stefano Trisolini, Maurizio Lenzerini, Daniele Nardi. Data Integration and Warehousing in Telecom Italia Proc.

 

 

Vendors

 

·  Hyperion Software

·  IBM

·  Oracle

·  Red Brick/Informix

·  MicroStrategy

·  Microsoft OLE DB for OLAP

·  Pilot Software

·  Business Objects

·  Comshare

·  Information Discovery

·  OLAP Council

·  Cognos

·  DISC's OMNIDEX

 

Scientific Data Warehousing

 

Karl Aberer, Klemens Hemm, “A Methodology for Building a Data Warehouse in a Scientific Environment”, First IFCIS International Conference on Cooperative Information Systems, Brussels, Belgium, June 19–21, 1996.

 

S. Maniatis, P. Vassiliadis, S. Skiadopoulos, and Y. Vassiliou, Advanced visualization for OLAP, DOLAP, 2003.

 

Y.-W. Choong, D. Laurent, and P. Marcel, Computing appropriate representations for multidimensional data, DOLAP, 2001.