You will write a 3-page, double-spaced paper relating the topics in two papers from the literature. The two papers will be selected from the following list of pairs. If you would like to write on a topic that does not appear below, let us know and we can help you pick papers.

Your paper should compare and contrast the problems being addressed and the solutions offered. It must be more than just a summary of the two papers. Three pages is not a lot of space, so we ask that you focus on exactly three points of contrast or comparison, no more and no fewer.

There will be three milestones leading up to the final paper. The purpose of the milestones is to encourage (read: force) you to start working on it early. The milestones and the final paper are:

  • (2 points) The choice of topic from the list below, due Thursday, July 27 (Week 4).
  • (4 points) A half-page summary of each paper, one page total, due Thursday, August 17 (Week 7).
  • (4 points) A list of three points of comparison or contrast, 1 or 2 complete sentences EACH, due Tuesday, September 5 (Week 10). 
  • (15 points) Final Paper, 3 pages, due Thursday, September 14 (Week 11). 

The final draft is due at the end of the course. 

Here are the topics listed so far: 

Arrays in Databases 

A database array algebra for spatiotemporal data and beyond. P. Baumann. In Next Generation Information Technologies and Systems, pages 76–93, 1999.

RAM: Array Processing over a Relational DBMS (http://www.cwi.nl/ftp/CWIreports/INS/INS-R0301.pdf). A. R. van Ballegooij, A. P. de Vries, M. L. Kersten. CWI Tech. Report, 2003.


Object Database Case Studies

Lessons learned from managing a petabyte. J. Becla and D. L. Wang. CIDR ’05: 2nd Biennial Conference on Innovative Data Systems Research, 2005

Migrating a multiterabyte archive from object to relational. A. Thakar, P. Kunszt, A. Szalay, and J. Gray. Computing in Science and Engineering, 5(5), 2003.

Data-centric Workflow Management

GridDB: A Data-centric Overlay for Scientific Grids. D. Liu and M. Franklin, VLDB 2004.

Scientific Workflow Management by Database Management. A. Ailamaki, Y. Ioannidis, and M. Livny, SSDBM 1998.

Biological data case studies

Database Challenges in the Integration of Biomedical Data Sets. R. Nagarajan, M. Ahmed, A. Phatak, VLDB 2004.

The Integrated Microbial Genomes (IMG) System: A Case Study in Biological Data Management. V. Markowitz, F. Korzeniewski, K. Palaniappan, E. Szeto, N. Ivanova, N. Kyrpides, VLDB 2005.

High-Energy Physics Case Studies

Lessons learned from managing a petabyte. J. Becla and D. L. Wang. CIDR ’05: 2nd Biennial Conference on Innovative Data Systems Research, 2005

Scientific Data Repositories: Designing for a Moving Target. Etzard Stolte, Christoph von Praun, Gustavo Alonso, Thomas Gross, SIGMOD 2003.

Replication and Transfer of Large Datasets

DataMover: Robust Terabyte-Scale Multi-file Replication over Wide-Area Networks. A. Sim, J. Gu, A. Shoshani, V. Natarajan, SSDBM 2004.

A Framework for Reliable and Efficient Data Placement in Distributed Computing Systems. T. Kosar and M. Livny, Journal of Parallel and Distributed Computing, 2005.

Scientific Workflow Tools

Taverna: a tool for the composition and enactment of bioinformatics workflows. T. Oinn et al., Bioinformatics 20(17), 2004.

Resource Management of Triana P2P services. I. Taylor, M. Shields, I. Wang, Grid Resource Management, 2003.

Executing Workflows on the Grid

The GrADS Project: Software Support for High-Level Grid Application Development. F. Berman et al., International Journal of High Performance Computing Applications, 2001.

Pegasus: Mapping Scientific Workflows onto the Grid. E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, S. Patil, M. Su, K. Vahi, M. Livny. Across Grids Conference 2004.

Storage Systems for Science

IBM Storage Tank - A Heterogeneous Scalable SAN File System. J. Menon, D. A. Pease, R. Rees, L. Duyanovich, B. Hilsberg, IBM Systems Journal 42(2), 2003.

A High-Performance Cluster Storage Server, Keith Bell, Andrew Chien and Mario Lauria, The 11th International Symposium on High Performance Distributed Computing (HPDC-11) Edinburgh, Scotland, July 24-26, 2002.

Lineage for Visualization Applications

Supporting Fine-Grained Data Lineage in a Database Visualization Environment. A. Woodruff, M. Stonebraker, ICDE 1997.

Managing Rapidly-Evolving Scientific Workflows (by Juliana Freire, Claudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger and Huy T. Vo) Invited paper, in the proceedings of the International Provenance and Annotation Workshop (IPAW), 2006

Uncertainty in Databases

An Introduction to ULDBs and the Trio System, Benjelloun, Omar; Das Sarma, Anish; Hayworth, Chris; Widom, Jennifer. IEEE Data Engineering Bulletin, March 2006

ORION: Concepts, Usage, and Installation. Reynold Cheng, Sarvjeet Singh and Sunil Prabhakar. Orion Project Homepage.

Pairs in progress

High-dimensional indexes

Moving Objects

?Modelling Biological Data 

Keet, C.M. Biological Data and Conceptual Modelling Methods. Journal of Conceptual Modeling, Issue 29, October 2003

?Generic Metadata Models

epubs.cclrc.ac.uk/bitstream/485/csmdm.version-2.pdf

?Efficient IO for Scientific Applications

X. Ma, M. Winslett, J. Norris, X. Jiao, and R. Fiedler. Godiva: Lightweight data management for scientific visualization applications. In ICDE ’04