Coordination and time

Table of contents

  1. Introduction
  2. Clock synchronization
    1. NTP
    2. The Berkeley algorithm
  3. Ordering
    1. Serializability
  4. Logical clocks
    1. Lamport’s logical clocks
    2. Vector clocks
  5. Mutual exclusion
  6. Election algorithms
    1. The bully algorithm
  7. References

Introduction

Coordination between processes can be required for several reasons:

  1. To ensure two sets of data are the same (data synchronization).
  2. To ensure one process waits for another to finish its operation (process synchronization).
  3. To manage the interactions and dependencies between activities in a distributed system (coordination).

[1, P. 297]

Clock synchronization

Clock synchronization in a distributed system is difficult since processes running on different machines will have different hardware clocks [1, P. 298].

Hardware clocks suffer from clock drift. An average quartz-based hardware clock has a clock drift rate of around 31.5 seconds per year [1, P. 303].

Clock synchronization is needed to overcome the problem of clock drift in order to provide a holistic view of a system, for example when reading logs from multiple processes.

Two important timekeeping concepts are precision and accuracy. Precision refers to the deviation between two clocks. Accuracy refers to the precision in reference to an external value, like UTC [1, P. 303].

Internal clock synchronization is the process of keeping clocks precise, whereas external clock synchronization is the process of making clocks accurate [1, P. 303].

UTC is the primary time standard used to regulate time. There are shortwave radio stations around the world that broadcast a pulse at the start of each UTC second, as well as satellites that offer a UTC service [1, P. 302].

A UTC receiver is hardware that can receive UTC from providers such as satellites. Other computers can sync their clocks to a computer equipped with a receiver in order to follow UTC [1, P. 302].

When a computer’s clock is later than the actual time, the clock must be readjusted. Setting the clock to the earlier correct time can lead to problems for programs that rely on an ever-increasing time value. To solve this, the time change can be introduced gradually. One way to do this is to reduce the number of milliseconds that the computer time is incremented by on each tick until the local time has caught up to the time the computer is synchronizing with.
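This gradual adjustment (often called slewing) can be sketched as a loop in which the fast local clock advances by fewer milliseconds per tick than real time until the reference time catches up. This is a minimal sketch; the function name and tick sizes are illustrative assumptions, not a real OS interface:

```python
def slew_ticks(local_ms, target_ms, normal_tick_ms=10, slew_tick_ms=9):
    """Catch a fast clock up to the reference time without ever
    stepping it backwards: the local clock still moves forward on
    every tick, just more slowly than real time."""
    ticks = 0
    while local_ms > target_ms:
        local_ms += slew_tick_ms      # local clock advances slowly...
        target_ms += normal_tick_ms   # ...while real time advances normally
        ticks += 1
    return local_ms, ticks
```

With the defaults above, the gap shrinks by 1 ms per tick, so a clock that is 105 ms ahead converges after 105 ticks.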

There are many algorithms to synchronize several computers to one machine with a UTC receiver or for keeping a set of machines as synchronized as possible.

NTP

NTP (Network Time Protocol) is a protocol for synchronizing clocks over a network [1, P. 304].

In NTP, the clock offset and roundtrip delay between two machines is estimated by sending messages between them. The offset can then be used to adjust the local time [1, P. 305].
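The standard NTP estimate uses four timestamps: the client's send time, the server's receive and send times, and the client's receive time. Assuming the network delay is roughly symmetric, the offset and roundtrip delay fall out of simple arithmetic (a sketch of the calculation, not a full NTP implementation):

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Estimate clock offset and roundtrip delay from the four NTP
    timestamps: t1 = client send, t2 = server receive,
    t3 = server send, t4 = client receive.
    Assumes network delay is roughly symmetric in both directions."""
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far ahead the server appears
    delay = (t4 - t1) - (t3 - t2)          # total time spent on the network
    return offset, delay
```

For example, if the client sends at t1 = 0, the server stamps t2 = 110 and t3 = 120, and the reply arrives at t4 = 20 (client time), the estimated offset is 105 and the roundtrip delay is 10.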

NTP divides servers into strata. A server with a reference clock is known as a stratum-1 server. When server A contacts server B, A will only adjust its time if its own stratum level is higher than B's. Once A has synchronized, it will change its stratum level to one more than B's [1, P. 306].

The Berkeley algorithm

The Berkeley algorithm is an internal clock synchronization algorithm. It works by having a time daemon poll other machines periodically to get their respective times. The daemon then computes an average time using the answers and tells the other machines to advance/slow their clocks to the new time [1, P. 306].

The Berkeley algorithm works well for systems that don't have a UTC receiver [1, P. 306].
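The daemon's averaging step can be sketched as follows. Note that the daemon sends each machine a relative adjustment rather than an absolute time, so machines can slew instead of jumping (a minimal sketch; the function name is illustrative):

```python
def berkeley_adjustments(daemon_time, reported_times):
    """Berkeley-style sync: the daemon averages all clocks (including
    its own) and returns a per-machine adjustment, daemon first."""
    clocks = [daemon_time] + list(reported_times)
    avg = sum(clocks) / len(clocks)
    return [avg - c for c in clocks]
```

For example, with a daemon at minute 180 and two machines at 205 and 170, the average is 185, so the adjustments are +5, -20, and +15 respectively.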

Ordering

Ordering is the process of assigning an order to events in a distributed system.

Histories are sequences of operations. Ordered histories are used to ensure operations are executed deterministically on multiple distributed nodes.

In distributed systems, a total order is an order where each event happens one after another. A total order can be created by ordering according to the time that an event happened and then breaking ties in the case of two events occurring at the same time (e.g., by comparing the originator’s process ID) [2].
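The tie-breaking scheme above amounts to sorting on a (timestamp, process ID) key. A minimal sketch, where the tuple layout (time, pid, name) is an assumption for illustration:

```python
def total_order(events):
    """Sort events into a total order: primarily by timestamp,
    breaking ties between simultaneous events by process ID."""
    return sorted(events, key=lambda e: (e[0], e[1]))
```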

A partial order is one where some events are ordered relative to each other but other events are not.

Figure: Total and partial ordering

Serializability

A transaction history is defined as serializable if the outcome of executing the transactions in the history is the same as if the transactions were executed sequentially one after the other [3, P. 631].

Logical clocks

Logical clocks are abstract clocks that have no relation to physical time. They can be used in distributed systems to create an ordering of events [2, P. 559].

Lamport’s logical clocks

Leslie Lamport introduced a logical clock system in the paper Time, Clocks, and the Ordering of Events in a Distributed System [2].

The aim of the paper was to create an ordering of events that occur in a system that doesn’t have a precise physical time available to all processes [2, P. 559].

In the paper, Lamport introduces the happened-before relation (denoted: ->). a -> b means event a occurred before event b.

Using Lamport’s assumptions, there are only three ways to determine that event a occurred before event b (a -> b):

  1. If a and b are events running in the same process and a occurs before b.
  2. If a is the event of a message being sent by a process and b is the event of the same message being received by a different process.
  3. If a -> b and b -> c then a -> c (happened-before is a transitive relation).

The happened-before relation can be used to create a partial ordering of events. A total ordering can also be created by breaking ties arbitrarily.

Lamport’s logical clocks work by sending a time value (an event counter) along with each message that a process sends. If the receiving process has a time value that is less than the time value sent with the message, then the receiving process’ time value is incorrect (because a -> b when a is a message being sent and b is the same message being received). In this case, the receiver updates its time to be one more than the sender’s time [1, P. 312].

In Lamport’s description, each process Pi maintains a local time value Ci. The time value is updated according to the following rules:

  1. Pi increments Ci before executing an event.
  2. When Pi sends a message m to process Pj, it sets m’s timestamp ts(m) to Ci after executing step 1.
  3. When Pj receives a message m, it adjusts Cj to ts(m) if Cj < ts(m), then executes step 1 before delivering the message.

[1, P. 312]
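The three rules above can be sketched as a small class (a minimal sketch; the class and method names are illustrative):

```python
class LamportClock:
    """A Lamport logical clock for one process."""

    def __init__(self, pid):
        self.pid = pid   # used only for breaking ties in a total order
        self.time = 0

    def tick(self):
        """Rule 1: increment before executing a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Rule 2: a sent message carries the sender's timestamp."""
        return self.tick()

    def receive(self, msg_time):
        """Rule 3: on receipt, adopt the larger time, then tick
        for the delivery event."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

To get a total order, events can then be compared by the pair (time, pid), using the process ID to break ties.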

In the case that two events have the same event counter value, the tie is broken arbitrarily by assigning a numeric value to a process’ ID and using the ID value to compare the conflicting events [2, P. 561].

Let C(a) return the time value of event a. Lamport clocks provide a total ordering of events, where event a is ordered before event b if C(a) < C(b). However, Lamport clocks can’t guarantee that a occurred before b just because C(a) < C(b); in other words, Lamport clocks can’t guarantee event causality.

One solution to determine causality is to use vector clocks.

Vector clocks

Vector clocks extend logical clocks to capture causal ordering [4, P. 57].

With vector clocks, timestamps are represented as a vector VC where each index i holds the event counter of a process Pi.

Let VCi be the vector clock of process Pi, and ts(m) be the vector timestamp of message m:

  1. VCi[i] is the number of events that have occurred so far at Pi.
  2. VCi[j] is Pi’s knowledge of the number of events that have happened at Pj.

[1, P. 318]

Vector clocks are sent with messages according to the following rules:

  1. Before executing an event, Pi increments VCi[i].
  2. When Pi sends a message m to process Pj, it sets m’s vector timestamp ts(m) to VCi after executing step 1.
  3. When Pj receives a message m, it adjusts VCj[k] to max(VCj[k], ts(m)[k]) for each k (equivalent to merging causal histories), then increments VCj[j] as in step 1 before delivering the message.

[1, P. 318]

This data allows a process to determine whether an event must have occurred before another event by comparing the values of each process’ timestamp at the time an event occurred [4, P. 59]. You can also use vector clocks to determine if two events are concurrent, which, depending on the event, could cause conflicts [1, P. 319].
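The update rules and the comparisons can be sketched as follows. This is a minimal sketch for a fixed number of processes; the class and helper names (happened_before, concurrent) are illustrative:

```python
class VectorClock:
    """Vector clock for one of n processes; this process has index pid."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n

    def tick(self):
        """Rule 1: increment own entry before an event."""
        self.vc[self.pid] += 1

    def send(self):
        """Rule 2: the message carries a copy of the vector."""
        self.tick()
        return list(self.vc)

    def receive(self, ts):
        """Rule 3: merge (elementwise max), then tick for the
        receive event itself."""
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.tick()


def happened_before(u, v):
    """u -> v iff u <= v elementwise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def concurrent(u, v):
    """Neither event causally precedes the other."""
    return not happened_before(u, v) and not happened_before(v, u)
```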

Mutual exclusion

Mutual exclusion is needed to ensure that a resource cannot be accessed by multiple processes at the same time, since concurrent access could lead to corrupted data [1, P. 321].

There are two main categories of distributed mutual exclusion algorithms:

  1. Token-based solutions.
  2. Permission-based solutions.

In token-based solutions, a token is passed between processes. Only processes that hold a token can access the shared resource. The downside to token-based solutions is that if the token is lost a new token must be generated (which can be a complicated procedure) [1, P. 322].
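A simple token-based scheme is a token ring: the token circulates around the processes, and only the current holder may enter its critical section. A minimal simulation sketch (the function name and the step-per-hop model are illustrative assumptions):

```python
def token_ring_schedule(n, wants, steps):
    """Simulate a token circulating a ring of n processes for a
    number of steps; a process uses the resource only while it
    holds the token, guaranteeing mutual exclusion."""
    entered = []
    holder = 0
    for _ in range(steps):
        if holder in wants:
            entered.append(holder)  # holder enters its critical section
        holder = (holder + 1) % n   # then forwards the token
    return entered
```

Because only one process holds the token at a time, at most one process is ever in its critical section.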

In permission-based solutions, a process that wishes to access a resource must gain permission from other processes first [1, P. 322].

Election algorithms

Election algorithms are used to pick a node to perform a special role (e.g., a coordinator) [1, P. 329].

Generally, election algorithms work by deciding which reachable node has the highest ID [1, P. 329].

The goal of an election algorithm is to ensure that an election process ends with all processes agreeing on an elected node [1, P. 330].

The bully algorithm

The bully algorithm uses the following steps to elect a new coordinator:

  1. P sends an ELECTION message to all processes with higher identifiers.
  2. If no one responds, P wins the election and becomes coordinator.
  3. If one of the higher-valued nodes responds, it takes over from P.

A node can receive an ELECTION message from a lower-valued node at any moment. If it does, the receiver sends OK to the sender to indicate that the node is alive and taking over, and the receiver then holds an election [1, P. 330].

Eventually all processes give up but one, which is the newly elected node. It announces its victory by sending a message to all processes [1, P. 330].

If a process that was down comes back up, it holds an election [1, P. 330].
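The outcome of the steps above can be sketched as a single function: since every live higher-valued process bullies the initiator and then holds its own election, the highest-valued live process always ends up as coordinator (a sketch of the outcome, not the message exchange; the function name is illustrative):

```python
def bully_election(initiator, alive):
    """Return the winner of a bully election started by `initiator`,
    where `alive` is the set of currently reachable process IDs."""
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator   # no higher process responded: initiator wins
    # each responder takes over and elects upwards; the highest survives
    return max(higher)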

References

  1. [1] A. Tanenbaum and M. van Steen, Distributed Systems, 3.01 ed. Pearson Education, Inc., 2017.
  2. [2] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” Commun. ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978.
  3. [3] C. H. Papadimitriou, “The serializability of concurrent database updates,” Journal of the ACM (JACM), vol. 26, no. 4, pp. 631–653, 1979.
  4. [4] C. J. Fidge, “Timestamps in message-passing systems that preserve the partial ordering,” Proceedings of the 11th Australian Computer Science Conference, vol. 10, no. 1, pp. 56–66, 1988.