# Proving Linearizability of Fault-Tolerant Register Protocols

Dependency Graph Approach

Gregory Chockler University of Surrey

Alexey Gotsman (IMDEA), Sadegh Keshavarzi (U of Surrey), Alejandro Naser-Pastoriza (IMDEA)

#### Fault-Tolerant Register Protocols



- Asynchronous message-passing system
- Fail-prone servers (replicas)
- Fail-prone clients interact with replicas to implement a R/W register abstraction

#### Important Abstraction

- Registers model key storage functionality
- Sharing memory robustly in message-passing systems, Attiya, Bar-Noy, Dolev, JACM'95

[Figure: word cloud of register-protocol variants (self-stabilising, crash-recovery, reconfigurable, rollback-tolerant, erasure-coded, fast-path, faulty links), annotated "1000+"]

[ABD95] [1995-] [DISC'25]

#### This Talk

- New methodology for proving linearizability of multi-writer/multi-reader (MWMR) register implementations
  - Register implementation is linearizable IFF
    - $\forall$  histories  $\sigma$  of read/write invocations and responses,
    - $\exists$  permutation of  $\sigma$  (linearization) that complies with:
      - (1) Real-time order of non-overlapping invocations in  $\sigma$ , and
      - (2) Every read returns the value written by the last preceding write
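The definition above can be checked directly on very small histories. The following is a minimal brute-force sketch, assuming complete histories whose operations carry explicit invocation and response times; the `Op` record and the function name are hypothetical and used only for illustration.

```python
from itertools import permutations
from typing import List, NamedTuple


class Op(NamedTuple):
    kind: str   # "w" for a write, "r" for a read
    value: int  # value written, or value returned by the read
    inv: int    # invocation time
    res: int    # response time


def is_linearizable(history: List[Op], initial: int = 0) -> bool:
    """Try every permutation of a complete history (exponential, tiny inputs only)."""
    n = len(history)
    for perm in permutations(range(n)):
        pos = {op: k for k, op in enumerate(perm)}
        # (1) real-time order: a precedes b whenever a responds before b is invoked
        if any(
            history[a].res < history[b].inv and pos[a] > pos[b]
            for a in range(n)
            for b in range(n)
        ):
            continue
        # (2) every read returns the value of the last preceding write
        current, ok = initial, True
        for idx in perm:
            op = history[idx]
            if op.kind == "w":
                current = op.value
            elif op.value != current:
                ok = False
                break
        if ok:
            return True
    return False
```

The search is factorial in the number of operations, which is one reason a structural criterion such as the dependency graph is preferable for proofs.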

#### **Motivation**

- We have recently been working on MW register implementations extending ABD for new failure models
  - [Naser-Pastoriza, C, Gotsman, OPODIS'23, PODC'25], [Keshavarzi, C, Gotsman, DISC'25]
- We were looking for proof techniques to establish their linearizability
- Unexpectedly, this turned out to be a stumbling block...

#### It ain't easy...

- Textbook techniques do not work
  - Linearization point arguments
  - Forward simulations towards an atomic object
- "...there are no strongly-linearizable fault-tolerant message-passing implementations of multi-writer registers, max-registers, snapshots or counters" [Attiya, Enea, Welch, DISC'21]

# It ain't easy...

- Find a partial order on the invocations and prove it satisfies certain properties
  - [Lemma 13.16, Lynch'96], [Lynch & Shvartsman, DISC'02]
- Still not easy to use
  - Some properties cannot be proven before partial order is found
  - Assumes all invocations are complete

# It ain't easy...

- Capture partial order of [Lemma 13.16, Lynch'96] as an abstract automaton (PO Machine)
  - [C, Lynch, Mitra, Tauber, DISC'05]
- Prove MWMR ABD by forward simulation towards PO Machine
- Simulation relation turned out to be difficult to customise for more advanced register implementations

# Our Approach

Happens-before, coherence, reads-from, from-read

- Flip the partial order approach on its head!
  - (1) Express operation dependencies in terms of four standard binary relations from the weak-memory literature
  - (2) Prove that the union of these relations (the dependency graph) is acyclic
- A simple and elegant linearizability proof of the multi-writer/multi-reader (MWMR) ABD protocol: a "proof pearl"

# Our Approach

#### Inspired by prior work on

- Early work on weak memory
  - [Shasha & Snir, ACM TOPLAS 1988]
- Transaction isolation specifications
  - [Adya's PhD thesis, 1999]
- Aspect-oriented linearizability proofs in shared memory
  - [Henzinger et al., CONCUR'13], [Dodds et al., POPL'15], [Domínguez & Nanevski, CONCUR'23]
- Transactional memory
  - [Khyzha, Attiya, Gotsman, Rinetzky, PPoPP'18]

Union of four relations for a given execution  $\sigma$  (Assume first event in  $\sigma$  is write( $v_0$ ))

Real-Time (rt)

 $\operatorname{rt}(a,b) \Longleftrightarrow a \text{ completes before } b \text{ is invoked in } \sigma$ 

Write-Write (ww)

ww is a total order on the writes in $\sigma$

Write-Read (wr) ("reads-from")

$\operatorname{wr}(w, r) \iff \text{read } r \text{ returns the value written by write } w$

$\operatorname{wr}(w, r) \wedge \operatorname{wr}(w', r) \implies w = w'$

Read-Write (rw) ("from-read")

$\operatorname{rw}(r, w) \iff \text{read } r \text{ reads from a write preceding } w \text{ in ww}$
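To make the four relations concrete, here is a minimal sketch that computes them as edge sets. It assumes operations are named by strings, that each has an (invocation, response) interval, and that a candidate ww order and reads-from map are supplied by the caller; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def dependency_relations(
    intervals: Dict[str, Tuple[int, int]],  # op -> (invocation time, response time)
    write_order: List[str],                 # candidate ww, earliest write first
    reads_from: Dict[str, str],             # read -> the write whose value it returns
) -> Dict[str, Set[Edge]]:
    # rt: a completes before b is invoked
    rt = {
        (a, b)
        for a in intervals
        for b in intervals
        if intervals[a][1] < intervals[b][0]
    }
    rank = {w: k for k, w in enumerate(write_order)}
    # ww: the given total order on writes, as a set of pairs
    ww = {(w1, w2) for w1 in write_order for w2 in write_order if rank[w1] < rank[w2]}
    # wr: reads-from
    wr = {(w, r) for r, w in reads_from.items()}
    # rw: r reads from a write that precedes w in ww
    rw = {
        (r, w2)
        for r, w in reads_from.items()
        for w2 in write_order
        if rank[w] < rank[w2]
    }
    return {"rt": rt, "ww": ww, "wr": wr, "rw": rw}
```

For example, with a write `w1` overlapping two sequential reads where `r1` returns the new value and the later `r2` returns the old one, the result contains the cycle `w1 → r1` (wr), `r1 → r2` (rt), `r2 → w1` (rw).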









Cannot fix the cycle — not linearizable!



No cycles — linearizable!











# **Linearizability Theorem**

- An execution is linearizable IFF ∃ rt, ww, wr, rw such that the graph (rt ∪ ww ∪ wr ∪ rw) is acyclic
- An execution is regular IFF ∃ rt⁻, ww, wr, rw such that the graph (rt⁻ ∪ ww ∪ wr ∪ rw) is acyclic
  - rt⁻ is the restriction of rt that excludes read-read pairs
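One direction of the theorem is constructive: an acyclic graph can be topologically sorted, and any such order is a candidate linearization. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the function name and the string operation names are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, List, Optional, Tuple


def linearization(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Optional[List[str]]:
    """Return a total order extending the dependency graph, or None if it has a cycle."""
    sorter = TopologicalSorter()
    for node in nodes:
        sorter.add(node)
    for a, b in edges:
        sorter.add(b, a)  # a is a predecessor of b, so a must come first
    try:
        return list(sorter.static_order())
    except CycleError:
        return None
```

Passing the union rt ∪ ww ∪ wr ∪ rw as `edges` yields either a linearization or `None`; for the regular case, the same call would be made with the read-read pairs of rt removed.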



Acyclic dependency graph



Acyclic dependency graph induces a partial order



...that can be extended into a total order



...which is a linearization

#### **MWMR ABD**

- $n \ge 2f + 1$ replicas, up to $f$ of which can crash
- Read/write quorum system
  - $\mathcal{R} = \{Q \subseteq P : |Q| = f+1\}$, $\mathcal{W} = \{Q \subseteq P : |Q| = n-f\}$
  - $\forall Q_r \in \mathcal{R} . \forall Q_w \in \mathcal{W} . Q_r \cap Q_w \neq \emptyset$
  - Some $Q_r \in \mathcal{R}$, $Q_w \in \mathcal{W}$ are available in every execution
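Both quorum properties follow from counting: a read quorum and a write quorum together contain $(f+1) + (n-f) = n+1$ replicas, so they must share one, and $n - f \ge f + 1$ survivors are enough to form either kind of quorum. The sketch below checks this exhaustively for small $n$; the function name and the integer replica identifiers are assumptions for illustration.

```python
from itertools import combinations
from typing import Tuple


def check_quorum_system(n: int, f: int) -> Tuple[bool, bool]:
    """Return (intersection holds, availability holds) for the given n and f."""
    replicas = frozenset(range(n))
    read_qs = [frozenset(q) for q in combinations(replicas, f + 1)]
    write_qs = [frozenset(q) for q in combinations(replicas, n - f)]

    # Every read quorum shares at least one replica with every write quorum.
    intersecting = all(qr & qw for qr in read_qs for qw in write_qs)

    # After any f crashes, some read quorum and some write quorum are still alive.
    available = all(
        any(q <= alive for q in read_qs) and any(q <= alive for q in write_qs)
        for crashed in combinations(replicas, f)
        for alive in [replicas - frozenset(crashed)]
    )
    return intersecting, available
```

With `n = 4` and `f = 2` the quorums still intersect, but availability fails, which shows where the assumption $n \ge 2f + 1$ is used.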

# MWMR ABD: Client $q_i$

```
Write(v):
    S ← read_quorum(TS)
    c ← max{c' | (c', _) ∈ S}
    ts ← (c + 1, i)
    S ← write_quorum(TSVal(ts, v))

Read:
    S ← read_quorum(TSVal)
    let (ts, v) ∈ S be such that ts = max{ts' | (ts', _) ∈ S}
    S ← write_quorum(TSVal(ts, v))
    return v
```

```
write_quorum(req):
    send WRITE(req) to P
    wait until received {WRITE_ACK(j) | p_j ∈ Q} for some write quorum Q ∈ W
```

```
read_quorum(req):
    send READ(req) to P
    wait until received {READ_ACK(r_j, j) | p_j ∈ Q} for some read quorum Q ∈ R
    return {r_j | p_j ∈ Q}
```

# MWMR ABD: Replica $p_i$

```
On WRITE(TSVal(ts', v')) from q_j:
    if ts' > ts then (ts, val) ← (ts', v')
    send WRITE_ACK(i) to q_j
```

```
On READ(req) from q_j:
    r ← case req of
            TS:    ts
            TSVal: (ts, val)
    send READ_ACK(r, i) to q_j
```
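The replica handlers can be sketched as a small class. This is an illustration under assumptions rather than the authors' code: timestamps are modelled as `(counter, client id)` tuples compared lexicographically, the initial timestamp is taken to be `(0, 0)`, and message passing is replaced by direct method calls.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union

Timestamp = Tuple[int, int]  # (counter, client id), compared lexicographically


@dataclass
class Replica:
    ts: Timestamp = (0, 0)  # assumed initial timestamp
    val: Any = None

    def on_write(self, ts: Timestamp, v: Any) -> str:
        # Update rule: adopt the incoming pair only if its timestamp is strictly newer.
        if ts > self.ts:
            self.ts, self.val = ts, v
        return "WRITE_ACK"

    def on_read(self, req: str) -> Union[Timestamp, Tuple[Timestamp, Any]]:
        # TS requests return only the timestamp; TSVal requests return the pair.
        return self.ts if req == "TS" else (self.ts, self.val)
```

Because the update rule is a strict comparison, a replica's timestamp never decreases, which the proof later relies on.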

```
Write(v):
    S ← read_quorum(TS)
    c ← max{c' | (c', _) ∈ S}
    ts ← (c + 1, i)                    // ts is the timestamp of the write invocation
    S ← write_quorum(TSVal(ts, v))
```

```
Read:
    S ← read_quorum(TSVal)
    let (ts, v) ∈ S be such that
        ts = max{ts' | (ts', _) ∈ S}   // ts is the timestamp of the read invocation
    S ← write_quorum(TSVal(ts, v))
    return v
```
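The client side can be exercised with a toy in-memory simulation. This is a sketch under simplifying assumptions: operations run one at a time, each quorum call reaches a randomly chosen quorum only (modelling messages to the other replicas as delayed), and `read_quorum` always returns `(ts, val)` pairs. The `Cluster` class and all names are hypothetical.

```python
import random
from typing import Any, List, Tuple

Timestamp = Tuple[int, int]  # (counter, client id)


class Cluster:
    """n replicas, each holding a (ts, val) pair; quorums are picked at random."""

    def __init__(self, n: int, f: int, v0: Any = None, seed: int = 0) -> None:
        assert n >= 2 * f + 1
        self.n, self.f = n, f
        self.state: List[Tuple[Timestamp, Any]] = [((0, 0), v0) for _ in range(n)]
        self.rng = random.Random(seed)

    def read_quorum(self) -> List[Tuple[Timestamp, Any]]:
        quorum = self.rng.sample(range(self.n), self.f + 1)
        return [self.state[j] for j in quorum]

    def write_quorum(self, ts: Timestamp, v: Any) -> None:
        quorum = self.rng.sample(range(self.n), self.n - self.f)
        for j in quorum:
            if ts > self.state[j][0]:  # replica update rule
                self.state[j] = (ts, v)


def write(cluster: Cluster, client_id: int, v: Any) -> Timestamp:
    S = cluster.read_quorum()
    c = max(ts[0] for ts, _ in S)       # highest counter seen in the quorum
    ts = (c + 1, client_id)
    cluster.write_quorum(ts, v)
    return ts


def read(cluster: Cluster) -> Tuple[Timestamp, Any]:
    S = cluster.read_quorum()
    ts, v = max(S, key=lambda pair: pair[0])
    cluster.write_quorum(ts, v)         # write-back before returning
    return ts, v
```

Whatever quorums the random choices select, a read that follows a completed write returns that write's value or a newer one, because the read quorum intersects the write quorum.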

TS: (read/write invocations)  $\rightarrow \mathbb{N}^{\geq 0}$ 

- $(W, W') \in ww \iff TS(W) < TS(W')$ 
  - Timestamp uniqueness $\Longrightarrow$ ww is a total order
- $(W,R) \in \text{wr} \iff TS(R) = TS(W)$
  - Every read R returns the value with timestamp TS(R), written by the write W such that TS(W) = TS(R)

- $(R, W) \in \text{rw} \iff TS(R) < TS(W)$ 
  - $(R, W) \in \text{rw} \iff \exists W'. (W', R) \in \text{wr} \land (W', W) \in \text{ww}$  $\iff TS(R) = TS(W') < TS(W)$
- By quorum intersection, the replica update rule, and the read write-back:
  - $(x, W) \in \mathsf{rt} \implies TS(x) < TS(W)$
  - $(x,R) \in \mathsf{rt} \implies TS(x) \leq TS(R)$
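The timestamp-based definitions translate directly into set comprehensions. The sketch below derives ww, wr and rw from a timestamp map and checks the property the proof relies on: no edge decreases the timestamp, and every edge into a write strictly increases it. Function and operation names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]
Timestamp = Tuple[int, int]


def relations_from_timestamps(
    ts: Dict[str, Timestamp], writes: Set[str], reads: Set[str]
) -> Tuple[Set[Edge], Set[Edge], Set[Edge]]:
    ww = {(w, w2) for w in writes for w2 in writes if ts[w] < ts[w2]}
    wr = {(w, r) for w in writes for r in reads if ts[w] == ts[r]}
    rw = {(r, w) for r in reads for w in writes if ts[r] < ts[w]}
    return ww, wr, rw


def timestamps_monotone(
    ts: Dict[str, Timestamp], writes: Set[str], edges: Iterable[Edge]
) -> bool:
    """TS(a) < TS(b) for edges into a write, TS(a) <= TS(b) for edges into a read."""
    return all(
        ts[a] < ts[b] if b in writes else ts[a] <= ts[b]
        for a, b in edges
    )
```

The ww, wr and rw edges pass this check by construction; the substance of the ABD argument is that rt edges pass it as well.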

- $\sigma$ : an execution of MWMR ABD
- G: a dependency graph for  $\sigma$  constructed as above
- Suppose that G has a cycle C
- Then, *C* only includes reads

- Then, all edges in C are  $\mathsf{rt}$  edges



R has started after R has finished



- $\sigma$ : an execution of MWMR ABD
- G: a dependency graph for  $\sigma$  constructed as above
- Suppose that G has a cycle C
- Then, C only includes read vertices and rt edges
- Then some read starts after it finishes: a contradiction
- Hence *G* is acyclic
- Hence, by the Linearizability Theorem, $\sigma$ is linearizable
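The steps of this argument can be spelled out with the timestamp properties stated earlier. The following is a reconstruction, with $x_0, \dots, x_k$ introduced here as hypothetical names for the vertices of $C$.

$$
\begin{aligned}
& C = x_0 \to x_1 \to \cdots \to x_k = x_0, \qquad (x_j, x_{j+1}) \in \mathsf{rt} \cup \mathsf{ww} \cup \mathsf{wr} \cup \mathsf{rw} \\
& TS(x_j) \le TS(x_{j+1}) \text{ for every edge, and } TS(x_j) < TS(x_{j+1}) \text{ whenever } x_{j+1} \text{ is a write} \\
& \text{if some } x_{j+1} \text{ were a write: } TS(x_0) \le \cdots \le TS(x_j) < TS(x_{j+1}) \le \cdots \le TS(x_k) = TS(x_0)
\end{aligned}
$$

The last line is impossible, so $C$ contains only reads. Each ww, wr and rw edge has a write as an endpoint, so every edge of $C$ is an rt edge. Since rt is transitive, following $C$ once around gives $(x_0, x_0) \in \mathsf{rt}$, that is, a read that completes before it is invoked.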

#### Conclusions

- Proof pearl of linearizability for message-passing implementations of MWMR registers
- Easy generalisation to incomplete operations via a visibility predicate
  - [Khyzha et al., PPoPP'18], [Keshavarzi, C, Gotsman, DISC'25]
- Linearizability theorem was proven for infinite executions
- Challenge for automated verification
  - Need to reason about whole traces, rather than inductively