You will write a 3-page paper (double-spaced) relating the topics in two papers from the literature. The two papers will be selected from the following list of pairs. If you would like to write on a topic that does not appear below, let us know and we can help you pick papers.
Your paper should compare and contrast the problems being addressed and the solutions offered. Your paper must be more than just a summary of the two papers. Three pages is not a lot of space, so we ask that you focus on exactly three points of contrast or comparison.
There will be three milestones related to the overall writing assignment. The purpose of the milestones is to encourage (read: force) you to start working on it early. The milestones are:
- (2 points) The choice of topic from the list below, due Thursday, July 27 (Week 4).
- (4 points) A half-page summary of each paper, one page total, due Thursday, August 17 (Week 7).
- (4 points) A list of three points of comparison or contrast, 1 or 2 complete sentences EACH, due Tuesday, September 5 (Week 10).
- (15 points) Final Paper, 3 pages, due Thursday, September 14 (Week 11).
The final draft is due at the end of the course.
Here are the topics listed so far:
Arrays in Databases
A database array algebra for spatiotemporal data and beyond. P. Baumann. In Next Generation Information Technologies and Systems, pages 76–93, 1999.
RAM: Array Processing over a Relational DBMS (http://www.cwi.nl/ftp/CWIreports/INS/INS-R0301.pdf). A. R. van Ballegooij, A. P. de Vries, M. L. Kersten. CWI Tech. Report, 2003.
Object Database Case Studies
Lessons learned from managing a petabyte. J. Becla and D. L. Wang. CIDR ’05: 2nd Biennial Conference on Innovative Data Systems Research, 2005
Migrating a multiterabyte archive from object to relational. A. Thakar, P. Kunszt, A. Szalay, and J. Gray. Computing in Science and Engineering, 5(5), 2003.
Data-centric Workflow Management
GridDB: A Data-centric Overlay for Scientific Grids. D. Liu and M. Franklin, VLDB 2004.
Scientific Workflow Management by Database Management. A. Ailamaki, Y. Ioannidis, and M. Livny, SSDBM 1998.
Biological data case studies
Database Challenges in the Integration of Biomedical Data Sets. R. Nagarajan, M. Ahmed, A. Phatak, VLDB 2004.
The Integrated Microbial Genomes (IMG) System: A Case Study in Biological Data Management. V. Markowitz, F. Korzeniewski, K. Palaniappan, E. Szeto, N. Ivanova, N. Kyrpides, VLDB 2005.
High-Energy Physics Case Studies
Lessons learned from managing a petabyte. J. Becla and D. L. Wang. CIDR ’05: 2nd Biennial Conference on Innovative Data Systems Research, 2005
Scientific Data Repositories: Designing for a Moving Target. Etzard Stolte, Christoph von Praun, Gustavo Alonso, Thomas Gross, SIGMOD 2003.
Replication and Transfer of Large Datasets
DataMover: Robust Terabyte-Scale Multi-file Replication over Wide-Area Networks. A. Sim, J. Gu, A. Shoshani, V. Natarajan, SSDBM 2004.
A Framework for Reliable and Efficient Data Placement in Distributed Computing Systems. T. Kosar and M. Livny, Journal of Parallel and Distributed Computing, 2005.
Scientific Workflow Tools
Taverna: a tool for the composition and enactment of bioinformatics workflows. T. Oinn et al., Bioinformatics 20(17), 2004.
Resource Management of Triana P2P services. I. Taylor, M. Shields, I. Wang, Grid Resource Management, 2003.
Executing Workflows on the Grid
The GrADS Project: Software Support for High-Level Grid Application Development. F. Berman et al., International Journal of High Performance Computing Applications, 2001.
Pegasus: Mapping Scientific Workflows onto the Grid. E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, S. Patil, M. Su, K. Vahi, M. Livny. Across Grids Conference 2004.
Storage Systems for Science
IBM Storage Tank - A Heterogeneous scalable SAN file system. J. Menon, D. A. Pease, R. Rees, L. Duyanovich, B. Hilsberg, IBM Systems Journal 42(2), 2003.
A High-Performance Cluster Storage Server, Keith Bell, Andrew Chien and Mario Lauria, The 11th International Symposium on High Performance Distributed Computing (HPDC-11) Edinburgh, Scotland, July 24-26, 2002.
Lineage for Visualization Applications
Supporting Fine-Grained Data Lineage in a Database Visualization Environment. A. Woodruff, M. Stonebraker, ICDE 1997.
Managing Rapidly-Evolving Scientific Workflows (by Juliana Freire, Claudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger and Huy T. Vo) Invited paper, in the proceedings of the International Provenance and Annotation Workshop (IPAW), 2006
Uncertainty in Databases
An Introduction to ULDBs and the Trio System, Benjelloun, Omar; Das Sarma, Anish; Hayworth, Chris; Widom, Jennifer. IEEE Data Engineering Bulletin, March 2006
ORION: Concepts, Usage, and Installation. Reynold Cheng, Sarvjeet Singh and Sunil Prabhakar. Orion Project Homepage.
Pairs in progress
High-dimensional indexes
Moving Objects
?Modelling Biological Data
Keet, C.M. Biological Data and Conceptual Modelling Methods. Journal of Conceptual Modeling, Issue 29, October 2003
?Generic Metadata Models
epubs.cclrc.ac.uk/bitstream/485/csmdm.version-2.pdf
?Efficient IO for Scientific Applications
X. Ma, M. Winslett, J. Norris, X. Jiao, and R. Fiedler. Godiva: Lightweight data management for scientific visualization applications. In ICDE ’04