Course Reading List

This is a list of papers related to data streams. Some (NOT all) of these papers will be used in the course; the others are for use in comparison papers and for reference.

Stream Systems

[AC+03] Abadi, D., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. Aurora: A New Model and Architecture for Data Stream Management. The VLDB Journal, 12(2):120-139, August 2003.

[AB+05] Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani, R., Srivastava, U., and Widom, J. STREAM: The Stanford Data Stream Management System. Book chapter, to appear.

[CA+07] Cetintemel, U., Abadi, D., Ahmad, Y., Balakrishnan, H., Balazinska, M., Cherniack, M., Hwang, J., Lindner, W., Madden, S., Maskey, A., Rasin, A., Ryvkina, E., Stonebraker, M., Tatbul, N., Xing, Y., and Zdonik, S. The Aurora and Borealis Stream Processing Engines. Book chapter in Data Stream Management: Processing High-Speed Data Streams, edited by M. Garofalakis, J. Gehrke, and R. Rastogi, Springer, 2007.

[CJ+03] Cranor, C., Johnson, T., Spatscheck, O., and Shkapenyuk, V. Gigascope: A Stream Database for Network Applications. In Proceedings of the 2003 ACM SIGMOD Conference on Management of Data, San Diego, CA, June 2003.

[CH+03] Chandrasekaran, S., et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. CIDR 2003.

[GJ+09] Golab, L., Johnson, T., Seidel, J.S., and Shkapenyuk, V. Stream Warehousing with DataDepot.

[GS+09] Grabs, T., Schindlauer, R., Krishnan, R., and Goldstein, J. Introducing Microsoft StreamInsight. Microsoft White Paper, September 2009, revised May 2010.

[Kr07] Krämer, J. Continuous Queries over Data Streams: Semantics and Implementation. PhD Thesis, University of Marburg, 2007.

Stream Query Processing

[BB+02] Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. Models and Issues in Data Stream Systems. In Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS 2002), Madison, WI, June 2002.

[BS+02] Babu, S., Srivastava, U., and Widom, J. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. Stanford University Technical Report, November 2002.

[CF04] Chandrasekaran, S., and Franklin, M. Remembrance of Streams Past: Overload-Sensitive Management of Archived Data Streams. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.

[DR04] Ding, L., and Rundensteiner, E. Evaluating Window Joins over Punctuated Streams. CIKM 2004.

[DM+04] Ding, L., Mehta, N., Rundensteiner, E., and Heineman, G.T. Joining Punctuated Streams. EDBT 2004.

[FT+09] Fernández-Moctezuma, R., Tufte, K., and Li, J. Inter-Operator Feedback in Data Stream Management Systems via Punctuation. CIDR 2009.

[GGO04] Golab, L., Garg, S., and Özsu, M.T. On Indexing Sliding Windows over On-line Data Streams. EDBT 2004.

[JMR05] Johnson, T., Muthukrishnan, S., and Rozenbaum, I. Sampling Algorithms in a Stream Operator. SIGMOD 2005.

[KBG04] Kifer, D., Ben-David, S., and Gehrke, J. Detecting Change in Data Streams. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.

[LM+05a] Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. In Proceedings of the 2005 ACM SIGMOD Conference on Management of Data, Toronto, Canada, June 2005.

[LT+08] Li, J., Tufte, K., Shkapenyuk, V., Papadimos, V., Johnson, T., and Maier, D. Out-of-Order Processing: A New Architecture for High-Performance Stream Systems. In Proceedings of the VLDB Endowment, 1(1), August 2008.

[SJ+05] Shkapenyuk, V., Johnson, T., Spatscheck, O., and Muthukrishnan, S. A Heartbeat Mechanism and its Application in Gigascope. VLDB 2005.

[SW+04] Srivastava, U., and Widom, J. Flexible Time Management in Data Stream Systems. PODS 2004.

[TM03] Tucker, P., Maier, D., Sheard, T., and Fegaras, L. Exploiting Punctuation Semantics in Continuous Data Streams. IEEE Transactions on Knowledge and Data Engineering, 15(3):555-568, May 2003.

Query Languages

[ABW03] Arasu, A., Babu, S., and Widom, J. The CQL Continuous Query Language: Semantic Foundations and Query Execution. Technical Report, October 2003.

[CF+00] Cortes, C., Fisher, K., Pregibon, D., Rogers, A., and Smith, F. Hancock: A Language for Extracting Signatures from Data Streams.

Scheduling

[BB+03] Babcock, B., Babu, S., Datar, M., and Motwani, R. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. SIGMOD 2003.

[CC+03] Carney, D., Cetintemel, U., Rasin, A., Zdonik, S., Cherniack, M., and Stonebraker, M. Operator Scheduling in a Data Stream Manager. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, September 2003.

[HM+03] Hammad, M., Franklin, M., Aref, W., and Elmagarmid, A. Scheduling for Shared Window Joins over Data Streams. VLDB 2003.

[MP+09] Moakar, L.A., Pham, T.N., Neophytou, P., Chrysanthis, P.K., Labrinidis, A., and Sharaf, M.A. Class-based Continuous Query Scheduling for Data Streams. DMSN 2009.

Optimization

[CDN02] Chen, J., DeWitt, D., and Naughton, J. Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries. ICDE 2002.

[KNV03] Kang, J., Naughton, J., and Viglas, S. Evaluating Window Joins over Unbounded Streams. ICDE 2003.

[LM+05b] Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams. SIGMOD Record, March 2005.

[VN02] Viglas, S. and Naughton, J. Rate-Based Query Optimization for Streaming Information Sources. In Proceedings of the 2002 ACM SIGMOD Conference on Management of Data, Madison, WI, June 2002.

Special Query Operators

[CGM09] Chandramouli, B., Goldstein, J. and Maier, D. On-the-fly Progress Detection in Iterative Stream Queries. Proceedings of the VLDB Endowment, 2(1), August 2009.

[LE+02] Luo, G., Ellmann, C., Haas, P., and Naughton, J. A Scalable Hash Ripple Join Algorithm. SIGMOD 2002.

[MLA04] Mokbel, M.F., Lu, M., and Aref, W.G. Hash-Merge Join: A Non-Blocking Join Algorithm for Producing Fast and Early Join Results. ICDE 2004.

[UF99] Urhan, T., and Franklin, M. XJoin: Getting Fast Answers From Slow and Bursty Networks. UMD Technical Report, CS-TR-3994, UMIACS-TR-99-13.

[WA93] Wilschut, A.N., and Apers, P.M.G. Dataflow Query Execution in a Parallel Main-Memory Environment (Symmetric Hash Join). Distributed and Parallel Databases, 1(1):103-128, 1993.

Synopses (summarization, approximation, load shedding)

[CG05] Cormode, G., and Garofalakis, M. Sketching Streams Through the Net: Distributed Approximate Query Tracking. VLDB 2005.

[CMR05] Cormode, G., Muthukrishnan, S., and Rozenbaum, I. Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling. VLDB 2005.

[DGR03] Das, A., Gehrke, J., and Riedewald, M. Approximate Join Processing over Data Streams. SIGMOD 2003.

[DG+02] Dobra, A., Garofalakis, M., Gehrke, J., and Rastogi, R. Processing Complex Aggregate Queries over Data Streams. SIGMOD 2002.

[GK01] Gilbert, A., Kotidis, Y., Muthukrishnan, S., and Strauss, M. Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries. VLDB 2001.

[TC+03] Tatbul, N., Cetintemel, U., Zdonik, S., Cherniack, M., and Stonebraker, M. Load Shedding in a Data Stream Manager. VLDB 2003.

Distributed Systems - Fault Tolerance

[BO03] Babcock, B., and Olston, C. Distributed Top-K Monitoring. SIGMOD 2003.

[BB+05] Balazinska, M., Balakrishnan, H., Madden, S., and Stonebraker, M. Fault-Tolerance in the Borealis Distributed Stream Processing System. SIGMOD 2005.

[BF+09] Brito, A., Fetzer, C., and Felber, P. Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems. ICDCS 2009 (Slides).

[HB+05] Hwang, J., Balazinska, M., et al. High-Availability Algorithms for Distributed Stream Processing. ICDE 2005.

[JC+07] Jacques-Silva, G., Challenger, J., Degenaro, L., Giles, J., and Wagle, R. Towards Autonomic Fault Recovery in System-S. ICAC 2007.

Scaling Data Stream Systems

[AG+11] Andrade, H., Gedik, B., Wu, K.-L., and Yu, P.S. Processing High Data Rate Streams in System S. Journal of Parallel and Distributed Computing, 71(2):145-156, February 2011.

[GA+09] Gedik, B., Andrade, H., and Wu, K.-L. A Code Generation Approach to Optimizing High-Performance Distributed Data Stream Processing. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09), pages 847-856, 2009.

[GS+12] Guirguis, S., Sharaf, M., Chrysanthis, P., and Labrinidis, A. Three-level Processing of Multiple Aggregate Continuous Queries. ICDE 2012.

[GS+11] Guirguis, S., Sharaf, M., Chrysanthis, P., and Labrinidis, A. Optimized Processing of Multiple Aggregate Continuous Queries. CIKM 2011.

[JM+08] Johnson, T., Muthukrishnan, S.M., Shkapenyuk, V., and Spatscheck, O. Query-Aware Partitioning for Monitoring Massive Network Data Streams. SIGMOD 2008.

[KH+09] Khandekar, R., Hildrum, K., Parekh, S., Rajan, D., Wolf, J., Wu, K.-L., Andrade, H., and Gedik, B. COLA: Optimizing Stream Processing Applications via Graph Partitioning. In Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware (Middleware '09), 2009.

[MC+10] Moakar, L.A., Chrysanthis, P.K., Chung, C., Guirguis, S., Labrinidis, A., Neophytou, P., and Pruhs, K. Admission Control Mechanisms for Continuous Queries in the Cloud. ICDE 2010.

[SA+09] Schneider, S., Andrade, H., Gedik, B., Biem, A., and Wu, K.-L. Elastic Scaling of Data Parallel Operators in Stream Processing. In Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS '09), pages 1-12, 2009.

[TM11] Teubner, J., and Müller, R. How Soccer Players Would do Stream Joins. SIGMOD 2011.

[ZW+10] Zou, Q., Wang, H., Soulé, R., Hirzel, M., Andrade, H., Gedik, B., and Wu, K.-L. From a Stream of Relational Queries to Distributed Stream Processing. Proceedings of the VLDB Endowment, 3(1-2):1394-1405, September 2010.

Benchmarking

[AC+04] Arasu, A., Cherniack, M., Galvez, E., Maier, D., Maskey, A., Ryvkina, E., Stonebraker, M., and Tibbetts, R. Linear Road: A Stream Data Management Benchmark. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.

Application Domains

[LS03] Lerner, A., and Shasha, D. The Virtues and Challenges of Ad Hoc + Streams Querying in Finance. Data Engineering Bulletin, March 2003.

[MF02] Madden, S., and Franklin, M. Fjording the Stream: An Architecture for Queries over Streaming Sensor Data. ICDE 2002.

[YG03] Yao, Y., and Gehrke, J. Query Processing for Sensor Networks. CIDR 2003.

Related Approaches

[CD+00] Chen, J., DeWitt, D., Tian, F., and Wang, Y. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. SIGMOD 2000.

[DF03] Diao, Y., and Franklin, M. Query Processing for High-Volume Message Brokering. VLDB 2003.

[DRF04] Diao, Y., Rizvi, S., and Franklin, M. Towards an Internet-Scale XML Dissemination Service. VLDB 2004.

[RH02] Raman, V., and Hellerstein, J.M. Partial Results for Online Query Processing. SIGMOD 2002.

Additional Resources

Data Stream Research Projects