Distributed Consistency and Baseball

24 Jan 2022

This post is my notes while going through the sources. I do not own any of the materials.

Big Ideas

Give examples of real systems and their consistency models. What are the real-world performance implications of each consistency model? What do they mean for real systems?

Strong and Weak Consistency

The need for consistency arises from replication, which is now almost a first principle in distributed systems: pretty much all large-scale distributed systems use replication [source?]. Replication provides both higher fault tolerance and availability, but introduces a new problem: consistency. Ideally, we would still like our replicated system to behave like a single machine (strongly consistent).

A weakly consistent system does not behave like a single machine, but often offers better performance. Note that weak consistency is a category of consistency models and not a specific model.

Baseball

Let us first define six consistency models using a key-value store:

  1. Strong: a get(k) request is guaranteed to return the latest value of k.
  2. Eventual: get(k) may return any previously written value of k, with no bound on time or order. Weakest model of the six.
  3. Consistent prefix: the reader observes writes in the correct order, but with no guarantee of how recent the data is, i.e. the reader sees a version of the data that existed at some point in the past.
  4. Bounded staleness: get(k) will return a value of k that was current at some point during the last t minutes, i.e. data “freshness” is time-bounded.
  5. Monotonic reads: while the first get(k) can return any value, subsequent get(k)s by the same client are guaranteed to return the same or a more up-to-date value. That is, a client will never see a version of k older than one it has already read.
  6. Read my writes: if a client writes v to k, subsequent reads of k by the same client will return v or a more recent value of k. A client that is not the writer gets no guarantee, which is equivalent to eventual.
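To make the six definitions concrete for a single key, here is a minimal sketch that lists which versions a get(k) may legally return under each model. The function name allowed_versions, the model strings, and the parameters now, bound and floor are my own illustrative choices, not from the sources. It assumes a single writer, so the writes to k form one ordered history. For a single key, consistent prefix allows the same results as eventual; the difference only shows up when reading several keys together.

```python
def allowed_versions(write_times, model, *, now=None, bound=None, floor=0):
    """Return indices of the writes to k that get(k) may return.

    write_times: timestamps of the writes to k, in the order they happened.
    now, bound:  current time and staleness bound t (same unit as write_times).
    floor:       index of the newest version this client has already read
                 (monotonic reads) or written (read my writes).
    All names here are hypothetical and only for illustration.
    """
    n = len(write_times)
    if n == 0:
        return []
    latest = n - 1
    if model == "strong":
        return [latest]
    if model in ("eventual", "consistent_prefix"):
        # Any previously written value is allowed for a single key.
        return list(range(n))
    if model == "bounded_staleness":
        cutoff = now - bound
        # Version i stays current until write i+1 arrives, so it is allowed
        # if it was still current at some point after the cutoff.
        return [i for i in range(n)
                if i == latest or write_times[i + 1] > cutoff]
    if model in ("monotonic_reads", "read_my_writes"):
        # Never go back before what this client has already seen or written.
        return list(range(floor, n))
    raise ValueError(f"unknown model: {model}")


# Four writes at t = 0, 10, 20, 30; the read happens at t = 35.
times = [0, 10, 20, 30]
strong = allowed_versions(times, "strong")
eventual = allowed_versions(times, "eventual")
bounded = allowed_versions(times, "bounded_staleness", now=35, bound=10)
monotonic = allowed_versions(times, "monotonic_reads", floor=1)
```

With a bound of 10, the read at t = 35 may return version 2 (current until t = 30) or version 3, but not the two older versions.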

Visitor and Home are keys. The top table shows the writes (increments) to each key in each inning. The bottom table shows all possible results of get(visitor, home). Assume only one client is writing. Note that within an inning, the visitors bat first.
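The set of possible scores under a given model can also be computed rather than enumerated by hand. Below is a rough sketch for three of the models. The runs per inning used at the bottom are a made-up two-inning game chosen only for illustration, not the line score from the sources, and the helper names write_sequence and possible_scores are hypothetical.

```python
from itertools import product


def write_sequence(visitor_runs, home_runs):
    """Return the (key, new_value) writes in game order.

    Each run is one increment; within an inning the visitors bat first.
    """
    totals = {"visitor": 0, "home": 0}
    writes = []
    for inning in range(max(len(visitor_runs), len(home_runs))):
        for key, runs in (("visitor", visitor_runs), ("home", home_runs)):
            if inning < len(runs):
                for _ in range(runs[inning]):
                    totals[key] += 1
                    writes.append((key, totals[key]))
    return writes


def possible_scores(writes, model):
    """Return the set of (visitor, home) pairs a reader may observe."""
    # states[i] is the true score after the first i writes.
    states = [(0, 0)]
    for key, value in writes:
        visitor, home = states[-1]
        states.append((value, home) if key == "visitor" else (visitor, value))
    if model == "strong":
        return {states[-1]}
    if model == "consistent_prefix":
        # Any score that actually existed at some point in the game.
        return set(states)
    if model == "eventual":
        # Each key is read independently, so any past visitor value can be
        # paired with any past home value.
        visitor_values = {v for v, _ in states}
        home_values = {h for _, h in states}
        return set(product(visitor_values, home_values))
    raise ValueError(f"unknown model: {model}")


# Hypothetical game: home scores in the 1st inning, visitors in the 2nd.
writes = write_sequence(visitor_runs=[0, 1], home_runs=[1, 0])
strong = possible_scores(writes, "strong")
prefix = possible_scores(writes, "consistent_prefix")
eventual = possible_scores(writes, "eventual")
```

In this made-up game the score 1-0 never existed, so consistent prefix rules it out, while eventual still allows it because the two keys are read independently.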

Read Requirements for other Participants

Here, each participant is a node with certain assigned tasks. We wish to find which consistency models each participant requires in order to perform its job correctly.

Note that each participant may perform its reads and writes against different server nodes.
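Because a participant can hit a different server on every request, per-client guarantees such as monotonic reads and read my writes need some bookkeeping on the client side. One common approach, sketched here under my own assumptions, is for the client to remember the newest version it has read or written and to skip replicas that are behind it. The Replica and SessionClient classes, and the integer version counter, are hypothetical and only meant to illustrate the idea.

```python
class Replica:
    """A server node holding one key, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are older than what this replica already has.
        if version > self.version:
            self.version, self.value = version, value


class SessionClient:
    """Client that enforces monotonic reads and read my writes."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0  # newest version this client has seen or written

    def write(self, replica, value):
        version = replica.version + 1
        replica.apply(version, value)
        self.min_version = max(self.min_version, version)

    def read(self):
        # Try replicas in order and accept the first one that is not
        # behind this client's session.
        for replica in self.replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")


fresh, stale = Replica(), Replica()
client = SessionClient([stale, fresh])  # the stale replica is tried first
client.write(fresh, "2-5")
score = client.read()  # skips the stale replica, reads its own write
```

A client without this bookkeeping would have read None from the stale replica, which is what plain eventual consistency permits.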

Questions

Sources

[1] https://mwhittaker.github.io/consistency_in_distributed_systems/1_baseball.html

[2] https://parveensingh.com/cosmosdb-consistency-levels/#consistent-prefix

[3] https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/ConsistencyAndBaseballReport.pdf