TLA+ validation of Chord - Jean-Paul Bodeveix, Julien Brunel, David Chemouil, Mamoun Filali

Oct 15, 2020 14:00 · 1968 words · 10 minute read skipped n networks predecessor transition

Hello, everybody. My name is Mamoun Filali. I present you the work entitled A TLA+ validation of the Chord protocol. This is a joint work with my colleagues, Jean-Paul Bodeveix, Julien Brunel, and David Chemouil. First, a little bit of history about this protocol. Twenty years ago, an official version of this protocol was published. This protocol addresses peer-to-peer lookup services for Internet applications.

00:38 - Four years ago, Pamela Zave published an alloy model of this protocol. Pamela Zave validated her model to model checking and also elaborated its main correctness properties. Two years ago with my colleagues, we published a paper about the mechanization in event B of this protocol. However, since in event B, temporal properties are not explicit, we have looked for a formalism where some properties were explicit with respect to model as well with respect to specifications. So we consider it TLA. In order to give an intuition of the chord maintenance protocol, suppose we have the network on the left, that is, we have a set of nodes, and each node has a pointer to another node.

01:41 - So the role of the chord maintenance protocol is to move the connections on the left to the connections on the right such that they define a network connected according to a ring topology. We should also mention that the network also the protocol handles when a node joined or fails. In this talk, we will focus on the verification of a liveness property of the maintenance protocol. More precisely, we will focus on the liveness of stabilization. We will establish this verification on a TLA+ model. This validation will be done in the TLA logic. We will also formalize the basic notions and properties that are required to reason on such networks, and also for the proof development. This work has been mechanized and it has been mechanized with Isabelle-TLA. However, we will present the work in a TLA style instead of in Isabelle. First, we present the model of the state of each node.

03:21 - So we consider that each node has a first field, that is member which is Boolean representing the fact that the node is alive or not. Then we have the successor lists. This is the view of the node of the other nodes starting from its supposed successor. Then we have the field prdc, which is the supposed predecessor of the node. Then we have the inbox set. This is the inbox for the delivered messages. Then we have a program counter. This program counter is supposed to take its values in an integration predicates with three values.

04:08 - Last for stabilization, we have no more join or fail Boolean for expressing that we have reached the starting of stabilization. We will come back to this later. The global state is represented as an array of local states. Now, about the maintenance protocol. The maintenance protocol is represented as a set of transitions or TLA actions. Among these actions, we have protocol actions, but we have also operating assumptions. That is the assumptions that we will make, for instance, for the fail transition.

05:01 - We have also a virtual stabilization for expressing that is here, mainly for the correctness of the stabilization. As usual, the specification is specified as the conjunct of the disjunct of the transitions and of the liveness properties. The liveliness property is specified as the liveliness of each of the transitions, and we suppose that the protocol transitions on each node are weakly fair. Concerning now the protocol properties, the main property is stabilization, and the stabilization is stated as follows: when no more joins or fails occur, all the live nodes, that is the members, are eventually linked through a unique ring. Each node successor list is correct with respect to the member nodes.

06:14 - Pamela Zave has also stated the main property, that is it’s inductive invariant. This inductive invariant is stated as follows: the successor list of member nodes of a node is not empty, and the set of successor list of principle nodes is not empty. I’ll come back to this notion of principle nodes later. Now, in order to specify formally the properties of the protocol, we have to specify basic notions, for instance, basic notions concerning the rings. First, we specify the between predicate, the between sets.

06:59 - So the between sets defines the notes that are between two nodes along the ring of nodes. Then we specify also the sucNode function of a set M of a node n, and creatively the sucNode of a set M of a node n is the nearest node to N compatible with the between relation. A crucial notion for the correctness of the code protocol is the notion of principle. Here, we will first give a lightweight version of this definition, of this notion, and we will see later how it is linked to the original definition of Pamela Zave. Given the set of nodes M, a function f over M, the principles of f are the nodes of M that are not between any pair m, f of m.

This notion 08:26 - is not exactly the one that Pamela Zave introduced. Pamela Zave introduced its principles over the successor lists. However, here these principles are indeed over the first element of this lists. In fact, our notion is stronger than that of Pamela Zave. We state now one important theorem about this notion. Given a function f over the set of nodes M, M is the set of principles of f, if and only if f is the sucNode function of M. Other theorem with respect to prevNode, given a function f over the set of nodes M, p a principle of F, the prevNode of P over M is also a principle of f, If and only if the only node in M with image p is the prevNode of p over M. The last result that we will need is the back propagation of a predicate. I will not give the details of this back propagation here, I’ll just give the intuition of this back propagation. The basic idea is that given a predicate, we want to know how many times we should propagate these predicates along during and we propagate it through the prevNode function. This result is intuitively trivial.

10:28 - However, we had to state it explicitly, and the result is that if we propagate this predicate along the ring a number of time which is the cardinality of the number of nodes of the ring, then we reach the full propagation. We have the full propagation of the predicates. To summarize, let us give this figure. On the left, we have the prevNode and sucNode function. We note also that the between function is the between set is strict, that is between 10 and 16. We have the set from 11 to 15 which does not contains the boundary nodes. On the right, we have a function f modeled by the red edges. We have also noted the principal nodes with the P and the non-principal nodes with the P canceled. For instance, and the 16 is principal because it is not skipped by any edge, while 57 is not principal because it is skipped by the edge 54-60. So now, we can state formally, what do we verify? So we verify that when no more fails or joins occur, eventually, a distributed and replicated version of the sucNode function is built, that is, on each node n, the first element of the successor list defines sucNode over the set of members and the tail of the list defines replicated first successors. Also, a distributed version of the previous function is built on each node n.

The variable or the field 12:53 - prdc defines the prevNode of n over the set of members. Concerning the stabilization proof, first, we have had to validate the system invariants proposed by Pamela Zave, that is, the successor of list member nodes of a node is not empty, and the set of successor lists principal nodes is not empty. Concerning the stabilization, we have used phases. So once the virtual action no more joins or fails is filed, we first establish that the first elements of successor lists are members. Then the first elements of successor lists define a function over the members so we can apply the previous results that we have seen.

13:58 - Then we show that the prevNode is delivered to one principal. Then the field prdc updates to prevNode, and since the prdc updates the prevNode, later, prevNode becomes the principal. This establishes the basic step of the propagation that is starting from a principal P, we show that it leads to its previous node also becomes a principal. Then thanks to the full propagation result, all members become principal, and since all members become principal, we have one of the main results of stabilization, that is, the function defined by the first element of the successor list is in fact the sucNode version. Just to illustrate this through the following figures, so here we consider any principal.

15:27 - For instance, 60, 60 has a previous prevNode, 57. It has also as antecedent 54 and 49. 57 is not principal. However, 57 is eventually delivered to 60. Once 57 is received by 60, 60 updates its prdc field and then it is updated. Since it is updated, later, all the antecedents of 60 will not be any more antecedents of 60 and they will take as images, for instance, 57. PrevNode becomes principal. We have established the basic result for the propagation.

16:50 - Thanks to the full propagation result, we can establish that all the nodes will become principal, and then the red links and the sucNode function will be the same, thanks to the previous results. Last about Isabelle-TLA, all the model and the proofs have been done with Isabelle-TLA. We note that for this mechanization, the state predicates had to be made explicit for better proof automation. As I already noted, for transaction structuring, the use of guarded commands has made easier the handling of Enabled. Also we have, thanks to Isabelle-TLA, it was possible for us to express Ad hoc versions of Meta theorems and consequently augment the readability and also the automation of our proofs.

18:04 - Just to give you a flavor of Meta theorem. Here we have a rule, that is, we have the premises and the conclusion, and here we conclude on the leads to property. Thanks to this result, sorry. In the premises of this rule, we have only what we can consider as first order predicates and consequently we have more easily automated proofs for such premises since they are not any more sample. Of course, this result is an instance. We note also that this is an Ad hoc Meta theorem because this result is in fact an instantiation of the TLA logic WF rule on our problem. It relies on the fairness of the from predecessor transition of our model.

19:23 - To conclude, in this work, we have used the Principals theory that we have mechanized in Isabelle-HOL. We have also used Isabelle-TLA for temporal properties and for Meta theorems. We have presented the study of the maintenance of the Chord protocol through a TLA+ model. We note that the invariant proposed by Pamela Zave is indeed invariant for stabilization verification. Also, that stabilization liveness relies on the weak fairness of node transition.

20:05 - We consider these results interesting since in our previous study, we were convinced that strong fairness was necessary for stabilization. Thank you. .