In this course, we will discuss various issues arising in the context of data management. The course will begin with a review of such issues as file systems, architecture of database management systems, data models, and relational databases. We will also examine logical and physical design of databases, hardware and software implementation of database systems, and distributed databases. The bulk of the class will consist of reading papers drawn from the research literature.
Prerequisites:
Students must have taken a course in databases.
Class times:
Mondays, Wednesdays & Fridays 9:10 am - 10:00 am.
The class meets in MSE 003.
Office hours:
Mondays & Wednesdays 10:00 am – 11:00 am.
Tel: 827-5318
E-mail: ravi@cs.ucr.edu
Grading:
Class participation: 15%, project: 40%, exams: 45%.
About the project
The project or research paper is a major part of the class grade, and you should therefore expect to spend quite a bit of effort on it. You have the choice of doing either the systems project that is assigned, or working on a research paper. Ideally, a research paper should be publishable. However, a project that lays the groundwork for what may be publishable would also be acceptable. The project may take several forms, but in all cases, its value depends on the new contributions it makes. A project could be a software (or hardware) system that implements and examines a new idea. Alternatively, it could be a theoretical contribution that combines or extends existing ideas in novel or interesting ways. To give you a sense of what to shoot for, take a look at this link.
Research paper progress
Since projects are open-ended, you need to meet these deadlines to make sure you can finish on time.
- Week 2: Initial half-page description of interest area.
- Week 4: Specifics of the topic to be researched, with a list of references.
- Week 5: Initial detailed report on the state of the art in the field, and outline of initial results.
- Week 8: Updated report on results obtained.
- Week 10: Final version of project report due.
Project progress
Please have a look at the Project Overview.
Books
The bulk of the readings are expected to be from the research literature. A list of readings from the literature will be made available. No textbook is specifically required, but the following books are likely to be useful:
- “Database Management Systems”, R. Ramakrishnan and J. Gehrke, McGraw Hill
- “Fundamentals of Database Systems”, R. Elmasri and S. Navathe, Pearson Publishing. PTR.
Papers
R-tree indices
- A. Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD, 1984
- N. Beckmann, H-P. Kriegel, R. Schneider, and B. Seeger, The R*-tree: An Efficient and Robust Access Method For Points and Rectangles. SIGMOD, 1990
- J. Nievergelt, H. Hinterberger, and K.C. Sevcik, The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems, 1984
Space Filling Curves
- H.V. Jagadish, Linear clustering of objects with multiple attributes. SIGMOD, 1990
Join Processing
- Leonard D. Shapiro, Join Processing in Database Systems with Large Main Memories. ACM Transactions on Database Systems, 1986
Spatial Joins
- T. Brinkhoff, H-P. Kriegel, and B. Seeger, Efficient Processing of Spatial Joins using R-trees. SIGMOD, 1993
- Ming-Ling Lo and C.V. Ravishankar, Spatial Joins using Seeded Trees. SIGMOD, 1994
- Ming-Ling Lo and C.V. Ravishankar, Spatial Hash-Joins. SIGMOD, 1996
- Nick Koudas and Kenneth C. Sevcik, Size Separation Spatial Join. SIGMOD, 1997
Nearest Neighbors
- N. Roussopoulos, S. Kelley, and F. Vincent, Nearest Neighbor Queries. SIGMOD, 1995
- G.R. Hjaltason and H. Samet, Ranking in Spatial Databases. SSD 1995
Skyline Queries
- S. Börzsönyi, D. Kossmann, and K. Stocker, The Skyline Operator. ICDE 2001
- J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, Skyline with Presorting. ICDE 2003
- D. Papadias, Y. Tao, G. Fu, and B. Seeger, An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD, 2003
Data Intensive Applications
- J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters. OSDI 2004
Aggregation for Data Intensive Applications
- J. Wen, V.R. Borkar, M.J. Carey, V.J. Tsotras, Revisiting Aggregation for Data Intensive Applications: A Performance Study
Top-K Queries
- R. Fagin, Combining fuzzy information: an overview. SIGMOD Record, 2002
Temporal Databases And Indexing
- B. Salzberg and V.J. Tsotras, Comparison of Access Methods for Time-Evolving Data. ACM Computing Surveys, 1999
- V.J. Tsotras and N. Kangelaris, The Snapshot Index: An I/O-optimal access method for timeslice queries. Information Systems, 1995
- B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer, An Asymptotically Optimal Multiversion B-Tree. VLDB Journal, 1996
Data Outsourcing and Security
- H. Hacigümüs, B. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service-provider model. SIGMOD, 2002
- B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu. Secure multidimensional range queries over outsourced data. VLDB, 2011
- J.L. Dautrich and C.V. Ravishankar, Compromising Privacy in Precise Query Protocols. EDBT, 2013
- P. Wang and C.V. Ravishankar, Secure and efficient range queries on outsourced databases using Rp-trees. ICDE, 2013