1. Data Warehousing (General)
Background:
Surajit Chaudhuri, Umeshwar Dayal, Venkatesh Ganti, Database Technology for Decision Support Systems, IEEE Computer, December, 2001.
Jennifer Widom, Research problems in data warehousing, Int'l Conf. on Information and Knowledge Management, 1995.
2. Data Warehouse Design
Background:
Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, John Wiley and Sons, 2002.
3. Column Stores
Main paper:
M. Stonebraker et al., C-Store: A Column-oriented DBMS, in Proceedings of the Very Large Database (VLDB) Conference,
Background:
P. Boncz and M. Kersten, Monet: An Impressionist Sketch of an Advanced Database System, Proc. BIWIT'95, 1995.
P. A. Boncz, M. L. Kersten. MIL Primitives for Querying a Fragmented World. The VLDB Journal, 8(2):101-119, October 1999.
Peter Boncz, Marcin Zukowski, Niels Nes, “MonetDB/X100: Hyper-Pipelining Query Execution”, Proceedings of CIDR, Jan., 2005.
4. Indexing
Main Paper:
P. O’Neil and D. Quass, Improved Query Performance with Variant Indexes, in Proceedings of the ACM Conference on the Management of Data (SIGMOD), Tucson, Arizona, May, 1997.
Background:
M.C. Wu and A.P. Buchmann, Encoded bitmap indexing for data warehouses, ICDE, 220-230, 1998.
C.Y. Chan and Y.E. Ioannidis, Bitmap index design and evaluation, ACM SIGMOD, 355--366, 1998.
H. Lei and K. A. Ross, Faster joins, self-joins and multi-way joins using join indices, Next Generation Information Technologies and Systems, 1997 (Extended version to appear in Data and Knowledge Engineering).
P. O'Neil and G. Graefe, Multi-table Joins Through Bitmapped Join Indices, ACM SIGMOD Record, Volume 24, Issue 3 (September 1995), pp. 38--41.
5. Compression
Main paper:
D. Abadi, S. Madden, M. Ferreira, Integrating Compression and Execution in Column-Oriented Database Systems, in Proceedings of the ACM SIGMOD Conference,
Background:
Z. Chen, J. Gehrke, F. Korn, Query optimization in compressed database systems, in Proceedings of the 2001 ACM SIGMOD international conference on Management of data, June, 2001.
G. Graefe and L. Shapiro, Data Compression and Database Performance, in Proceedings of ACM/IEEE-CS Symposium on Applied Computing, Kansas City, MO, April, 1991.
S. O’Connell and N. Winterbottom, Performing Joins Without Decompression in a Compressed Database System, SIGMOD Record, Vol. 32, No. 1, March 2003.
J. Goldstein, R. Ramakrishnan, U. Shaft, Compressing Relations and Indexes, Proceedings of the Fourteenth International Conference on Data Engineering (ICDE), pp. 370-379, 1998.
6. Query Evaluation
Main Paper:
W. Han, H. Kache, M. Kandil, J. Ng, V. Markl, Progressive Optimization in a Shared-Nothing Parallel Database, in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, June, 2007.
Background:
W. Han, J. Ng, V. Markl, H. Kache, M. Kandil, Progressive Optimization in a Shared-Nothing Parallel Database, in Proceedings of the ACM Conference on the Management of Data (SIGMOD), Beijing, China, June, 2007.
H. V. Jagadish, Laks V. S. Lakshmanan, and Divesh Srivastava, Snakes and sandwiches: optimal clustering strategies for a data warehouse, ACM SIGMOD, 1999.
P. Haas and J. Hellerstein, Ripple Joins for Online Aggregation, SIGMOD, 1999.
(good background) G. Graefe, Query Evaluation Techniques for Large Databases, in ACM Computing Surveys, Volume 25, Issue 2 (June 1993), Pages: 73 – 169.
Gupta, V. Harinarayan, D. Quass. "Aggregate-Query Processing in Data Warehousing Environments." In Proceedings of the 21st VLDB Conference,
7. Automatic DB Design 1 - Index Selection
Main papers:
S. Chaudhuri and V. Narasayya, An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server, Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.
Agrawal S., Chaudhuri S. and Narasayya V., Automated Selection of Materialized Views and Indexes for SQL Databases. Proceedings of the 26th International Conference on Very Large Databases (VLDB00),
Background:
N. Bruno and S. Chaudhuri, Automatic Physical Database Tuning: A Relaxation-based Approach, in Proceedings of Conference on Management of Data (SIGMOD), Baltimore, MD, June, 2005.
Chaudhuri, S. and Narasayya V., AutoAdmin "What-If" Index Analysis Utility. Proceedings of ACM SIGMOD,
S. Agrawal, N. Bruno,
8. Automatic DB Design 2 - Materialized View Selection
Main paper:
H. Gupta and I.S. Mumick, Selection of views to materialize under a maintenance-time constraint, International Conference on Database Theory, 1999.
Background:
D. Theodoratos and T. Sellis, Designing data warehouses, Data and Knowledge Engineering, 31:3, 279--301, 1999.
E. Baralis, S. Paraboschi, and
9. Automatic DB Design 3 - Partitioning
Main paper:
S. Papadomanolakis and A. Ailamaki, AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning. Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM),
Background:
Agrawal S., Narasayya V., and Yang, B., Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design. Proceedings of the ACM SIGMOD,
10. Automatic DB Design 4 - IBM
Main papers:
D. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, S. Fadden, DB2 Design Advisor: Integrated Automatic Physical Database Design, Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.
J. Rao, C. Zhang, G. Lohman, N. Megiddo, Automating Physical Database Design in a Parallel Database, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, 2002.
Background:
R. Telford, R. Horman, S. Lightstone, N. Markov, S. O'Connell, G. Lohman, Usability and Design Considerations for an Autonomic Relational Database System, IBM Systems Journal, Vol. 42, No. 4, 2003.
S. Lightstone, G. Lohman, D. Zilio, Toward Autonomic Computing with DB2 Universal Database, SIGMOD Record, Vol. 31, No. 3, September, 2002.
S. Lightstone, G. Lohman, P. Haas, V. Markl, Making DB2 Products Self-Managing: Strategies and Experience, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.
B. Dageville and K. Dias, Oracle's Self-Tuning Architecture and Solutions, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.
L. Qiao, B. Iyer, D. Agrawal, A. El Abbadi, Automated Storage Management with QoS Guarantee in Large-Scale Virtualized Storage Systems, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.
M. Abd-El-Malek, et al., Early Experience on the Journey towards self-* Storage, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2006.
11. Other Approaches
Main papers:
J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI'04: Sixth Symposium on Operating System Design and Implementation,
F. Chang, J. Dean,
Other Interesting Papers on Data Warehousing and Related Topics
S. Sarawagi and M. Stonebraker, Efficient organization of large multidimensional arrays, ICDE, 1994.
P. Deshpande, K. Ramasamy, A. Shukla, and Jeffrey F. Naughton, Caching multidimensional queries using chunks, ACM SIGMOD, 259--270, 1998.
P. Deshpande and J. Naughton, Aggregate aware caching for multi-dimensional queries, EDBT, 2000.
S. Ghemawat, H. Gobioff, and
* Wilburt Juan Labio, Ramana Yerneni, and Hector Garcia-Molina, Shrinking the warehouse update window, ACM SIGMOD, 1999.
* D. Quass and J. Widom, On-line warehouse view maintenance for batch updates, ACM SIGMOD, 393--404, 1997.
M. Staudt and M. Jarke, Incremental maintenance of externally materialized views, VLDB, 75--86, 1996.
Yannis Kotidis, Aggregate View Management in Data Warehouses in Handbook of Massive Datasets, 2002.
Y. Cui and J. Widom. "Lineage Tracing in a Data Warehousing System." In Proceedings of the Sixteenth International Conference on Data Engineering,
* Y. Cui and J. Widom. "Practical Lineage Tracing in Data Warehouses." In Proceedings of the Sixteenth International Conference on Data Engineering,
Y. Cui, J. Widom, and J. L. Wiener. "Tracing the Lineage of View Data in a Data Warehousing Environment." Technical Report,
W. J. Labio, J. Wiener, H. Garcia-Molina, V. Gorelik. "Efficient Resumption of Interrupted Warehouse Loads." Technical Report,
* J. Widom, “Trio: A System for Integrated Management of Data, Accuracy, and Lineage”, in Proceedings of the Second Conference on Innovative Data Systems Research (CIDR),
Resources
Industry
1. Larry Greenfield, Data Warehousing Information Center. (Web site)
2. Data Warehousing Online. (Web site)
3. Data Warehousing Knowledge Center. (Web site)
Example Projects
N. Roussopoulos, C.M. Chen, S. Kelley, A. Delis, and Y. Papakonstantinou, The
G. Zhou, R. Hull, R. King, and J.C. Franchitti, Supporting Data Integration and Warehousing Using H2O. IEEE Data Engineering Bulletin, 18(2):29-40, June 1995.
J. Hammer, H. Garcia-Molina, W. Labio, J. Widom, and Y. Zhuge. The Stanford Data Warehousing Project. IEEE Data Engineering Bulletin, 18(2):41-48, June 1995.
M. Jarke, Y. Vassiliou. Data Warehouse Quality Design: A Review of the DWQ Project. Invited Paper, Proc. 2nd Conference on Information Quality. Massachusetts Institute of Technology,
H. Gupta and D. Srivastava. The Data Warehouse of Newsgroups. International Conference on Database Theory,
Stefano Trisolini, Maurizio Lenzerini, Daniele Nardi. Data Integration and Warehousing in Telecom Italia. Proc.
Vendors
· IBM
· Oracle
· Comshare
· Cognos
Scientific Data Warehousing
Karl Aberer, Klemens Hemm, “A Methodology for Building a Data Warehouse in a Scientific Environment”, First IFCIS International Conference on Cooperative Information Systems,
S. Maniatis, P. Vassiliadis,
Y.-W. Choong, D. Laurent, and P. Marcel, Computing appropriate representations for multidimensional data, DOLAP, 2001.