Stateful Entities: Object-oriented Cloud Applications as Distributed Dataflows

11/18/2021
by   Wouter Zorgdrager, et al.
0

Programming stateful cloud applications remains a very painful experience. Instead of focusing on the business logic, programmers spend most of their time dealing with distributed systems considerations, with the most important being consistency, load balancing, failure management, recovery, and scalability. At the same time, we witness an unprecedented adoption of modern dataflow systems such as Apache Flink, Google Dataflow, and Timely Dataflow. These systems are now performant and fault-tolerant, and they offer excellent state management primitives. With this line of work, we aim at investigating the opportunities and limits of compiling general-purpose programs into stateful dataflows. Given a set of easy-to-follow code conventions, programmers can author stateful entities, a programming abstraction embedded in Python. We present a compiler pipeline named StateFlow, to analyze the abstract syntax tree of a Python application and rewrite it into an intermediate representation based on stateful dataflow graphs. StateFlow compiles that intermediate representation to a target execution system: Apache Flink and Beam, AWS Lambda, Flink's Statefun, and Cloudburst. Through an experimental evaluation, we demonstrate that the code generated by StateFlow incurs minimal overhead. While developing and deploying our prototype, we came to observe important limitations of current dataflow systems in executing cloud applications at scale.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset