skip navigation

This page looks better in modern browsers. Please upgrade.

Brown Home Brown Home Brown Home Brown CS
Research Project:

Borealis: Second Generation Stream Processing Engine

Over the last several years, a great deal of progress has been made in the area of stream processing engines (SPE). Several groups have developed working prototypes (e.g., Aurora, STREAM, TelegraphCQ) and many papers have been published on detailed aspects of the technology such as stream-oriented languages, resource-constrained one-pass query processing, load shedding, and distributed processing. While this work is an important first step, fundamental mismatches remain between the requirements of many streaming applications and the capabilities of first-generation systems.

In the Borealis project, we build on our previous projects in the area of stream processing: Aurora, and Medusa. We identify and address the following shortcomings of current stream processing techniques:

  • Dynamic revision of query results: In many real-world streams, corrections or updates to previously processed data are available only after the fact. For instance, many popular data streams, such as the Reuters stock market feed, often include so-called revision records, which allow the feed originator to correct errors in previously reported data. Furthermore, stream sources (such as sensors), as well as their connectivity, can be highly volatile and unpredictable. As a result, data may arrive late and miss its processing window, or may be ignored temporarily due to an overload situation. In all these cases, applications are forced to live with imperfect results, unless the system has means to revise its processing and results to take into account newly available data or updates.

  • Dynamic query modification: In many stream processing applications, it is desirable to change certain attributes of the query at run time. For example, in the financial services domain, traders typically wish to be alerted of interesting events, where the definition of ``interesting'' (i.e., the corresponding filter predicate) varies based on current context and results. In network monitoring, the system may want to obtain more precise results on a specific subnetwork, if there are signs of a potential Denial-of-Service attack. Finally, in a military stream application that Mitre explained to us, they wish to switch to a ``cheaper'' query when the system is overloaded. For the first two applications, it is sufficient to simply alter the operator parameters (e.g., window size, filter predicate), whereas the last one calls for altering the operators that compose the running query.

    Another motivating application comes again from the financial services community. Universally, people working on trading engines wish to test out new trading strategies as well as debug their applications on historical data before they go live. As such, they wish to perform ``time travel'' on input streams. Although this last example can be supported in most current SPE prototypes (i.e., by attaching the engine to previously stored data), a more user-friendly and efficient solution would obviously be desirable.

  • Flexible and highly-scalable optimization: Currently, commercial stream processing applications are popular in industrial process control (e.g., monitoring oil refineries and cereal plants), financial services (e.g., feed processing, trading engine support and compliance), and network monitoring (e.g., intrusion detection, fraud detection). Here we see a server heavy optimization problem --- the key challenge is to process high-volume data streams on a collection of resource-rich ``beefy'' servers. Over the horizon, we see a very large number of applications of wireless sensor technology (e.g., RFID in retail applications, cell phone services). Here, we see a sensor heavy optimization problem --- the key challenges revolve around extracting and processing sensor data from a network of resource-constrained ``tiny'' devices. Further over the horizon, we expect sensor networks to become faster and increase in processing power. In this case the optimization problem becomes more balanced, becoming sensor heavy, server heavy. To date systems have exclusively focused on either a server-heavy environment or sensor-heavy environment. Off into the future, there will be a need for a more flexible optimization structure that can deal with a very large number of devices and perform cross-network sensor-heavy server-heavy resource management and optimization.

Project status: Active


Project Home Page: http://www.cs.brown.edu/research/borealis/

Research Areas

Database Systems

People

Yanif Nabil Ahmad
Ugur Cetintemel
Jeong-Hyon Hwang
Olga Papaemmanouil
Alexander Rasin
E. Nesime Tatbul
Ying Xing
Stan Zdonik
 

Publications

Abadi, D., Ahmad, Y., Balazinska, M., Cetintemel, U., Cherniack, M., Hwang, J.-H., Lindner, W., Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., and Zdonik, S. The Design of the Borealis Stream Processing Engine. In Proceedings of the 2nd Conference on Classless Inter-Domain Routing (CIDR) (Jan 2005), pp. 277-289. [ pdf ]

Ahmad, Y., Berg, B., Cetintemel, U., Humphrey, M., Hwang, J., Jhingran, A., Maskey, A., Papaemmanouil, O., Rasin, A., Tatbul, N., Xing, W., Xing, Y., and Zdonik, S. Distributed Operation in the Borealis Stream Processing Engine (demo). In Proceedings of the ACM Special Interest Group on Management of Data Conference (SIGMOD) (Baltimore, MD, Jun 2005), pp. 882-884. [ pdf ]

Ahmad, Y., Cetintemel, U., Jannotti, J., and Zgolinski, A. Locality-Aware Networked Join Evaluation. In Proceedings of the International Workshop on Networking Meets Databases (NetDB) (Apr 2005). [ pdf ]

Ahmad, Y., Cetintemel, U., Jannotti, J., Zgolinski, A., and Zdonik, S. Network Awareness in Internet-Scale Stream Processing. IEEE Bulletin of the Technical Committee on Data Engineering 28, 1 (2005), 63-69. [ pdf ]

Hwang, J.-H., Balanzinska, M., Rasin, A., Cetintemel, U., Stonebraker, M., and Zdonik, S. High-Availability Algorithms for Distributed Stream Processing. In Proceedings of the 1st IEEE Workshop on Networking Meets Databases (NETDB) (Mar 2005), pp. 779-790. [ pdf ]

Stonebraker, M., and Cetintemel, U. One Size Fits All: An Idea Whose Time has Come and Gone. In Proceedings of the 21th IEEE International Conference on Data Engineering (ICDE'05) (2005). [ pdf ]

Stonebraker, M., Cetintemel, U., and Zdonik, S. The 8 Requirements of Real-Time Stream Processing. ACM SIGMOD Record 34, 4 (Dec 2005), 42-47. [ pdf ]

Ahmad, Y., and Cetintemel, U. Network-Aware Query Processing for Distributed Stream-Based Applications. In Proceedings of the 30th International Conference on Very Large Databases (VLDB) (2004), pp. 456-467. [ pdf ]

Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Cetintemel, U., Xing, Y., and Zdonik, S. Scalable Distributed Stream Processing. In Proceedings of the 1st Biennial Conference on Innovative Data Systems Research, Classless Inter-Domain Routing (CIDR '03) (2003). [ pdf ]

Tatbul, N., Cetintemel, U., Zdonik, S., Cherniack, M., and Stonebraker, M. Load Shedding in a Data Stream Manager. In Proceedings of the Conference on Very Large Databases (VLDB) (Berlin, Germany, Sep 2003), pp. 309-320. [ pdf ]


Page Owner: Webmaster Last Modified: Mon Oct 23 14:57:09 2006