Better Write Amplification for Streaming Data Processing

06/05/2023
by Andrei Chulkov et al.

Many modern applications must process data in a streaming fashion. At large scale, this requires a parallel system that can cope with straggling workers and various kinds of failures. YT is the main platform behind Yandex's distributed systems, hosting its distributed file system, lock service, key-value storage, and internal MapReduce platform. We implement a new YT component for running streaming MapReduce operations. It builds on several core YT solutions to achieve fault tolerance and exactly-once semantics while remaining efficient and keeping the write amplification factor low.
