LinkedIn: Creating a Low Latency Change Data Capture System with Databus
Siddharth Anand, a senior member of LinkedIn’s Distributed Data Systems team, writes:
Having observed two high-traffic web companies solve similar problems, I cannot help but notice a set of wheel-reinventions. Some of these problems are difficult and it is truly unfortunate for each company to solve its problems separately. At the same time, each company has had to solve these problems due to an absence of a reliable open-source alternative. This clearly has implications for an industry dominated by fast-moving start-ups that cannot build 50-person infrastructure development teams or dedicate months away from building features.
Siddharth goes on to address a particular re-invention of the wheel: change data capture systems.
And he has a solution to this wheel re-invention problem: Databus. (It is not a fit for every situation, but the post is worth a careful read, as are the other resources.)
From the post:
Databus is an innovative solution in this space.
It offers the following features:
- Pub-sub semantics
- In-commit-order delivery guarantees
- Commits at the source are grouped by transaction
- ACID semantics are preserved through the entire pipeline
- Supports partitioning of streams
  - Ordering guarantees are then per partition
- Like other messaging systems, offers very low latency consumption for recently-published messages
- Unlike other messaging systems, offers arbitrarily-long look-back with no impact to the source
- High Availability and Reliability
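
To make a few of the listed properties concrete, here is a minimal in-memory sketch in Python. It is not the Databus API: the class `ToyChangeLog`, its methods, and the integer commit sequence number (`scn`) are all hypothetical names invented for illustration. The sketch assumes each transaction lands in a single partition, and it shows commits grouped by transaction, delivery in commit order within a partition, and look-back from any earlier point by reading the log rather than the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    scn: int    # commit sequence number of the owning transaction (assumed scheme)
    key: str
    value: str


class ToyChangeLog:
    """Hypothetical in-memory change log; an illustration, not Databus itself."""

    def __init__(self):
        self._next_scn = 1
        # partition id -> list of transactions, each a list of Change objects,
        # appended in commit order
        self._log = defaultdict(list)

    def commit(self, partition, writes):
        """Record one source transaction (a dict of key -> value) as a unit."""
        if not writes:
            raise ValueError("a transaction needs at least one write")
        scn = self._next_scn
        self._next_scn += 1
        txn = [Change(scn, key, value) for key, value in writes.items()]
        self._log[partition].append(txn)
        return scn

    def read(self, partition, since_scn=0):
        """Yield whole transactions with scn > since_scn, in commit order."""
        for txn in self._log[partition]:
            if txn[0].scn > since_scn:
                yield txn


log = ToyChangeLog()
log.commit(0, {"a": "1", "b": "2"})   # two writes, one transaction
log.commit(1, {"c": "3"})
log.commit(0, {"a": "4"})

# A consumer that is caught up to scn 1 only sees what came after it.
for txn in log.read(0, since_scn=1):
    print([(c.scn, c.key, c.value) for c in txn])

# A new consumer can look back to the start without touching the source.
for txn in log.read(0):
    print([(c.scn, c.key, c.value) for c in txn])
```

Each consumer keeps its own `since_scn` checkpoint, which is what lets many subscribers read the same stream at different positions. A real system would also have to persist the log, handle transactions that span partitions, and bound memory, none of which this sketch attempts.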