2.2 SYSTEM MODELS⌗

Two generals problem: A model of networks. We assume that nodes are honest, but messages can get lost.
Byzantine generals problem: A model of node behaviour. We assume that messages get delivered, but nodes can be dishonest / manipulated.

In real systems, both nodes and network might be faulty.

We want to capture assumptions in a system model consisting of:

Network behaviour (e.g message loss)
Node behaviour (e.g Crashes)
Timing behaviour (e.g latency)

class ServiceExcption(Exception):
    @staticmethod
    def hello(string):
        print(string)

Network Behaviour⌗

Assume bidi point-to-point communication between two nodes. We can have:

Reliable (perfect) links: A message is received if and only if it is sent.
Fair-loss links: Messages may be lost, duplicated or reordered. If you keep retrying, a message eventually gets through.
Arbitrary (Active adversary) links: A malicious adversary may interfere with messages (eavesdrop, modify, drop, spoof, replay).

Network Partition is when a link between nodes is interrupted for a finite amount of time.

Interesting thing with this model of network behaviour is that the links can be upgraded.

Fair-loss link -> Reliable link: Using Retry + Deduplication logic.
Arbitrary link -> Fair-loss link: Using TLS protocol. This can only be upgraded though, if the active adversary does not entirely stop the transmission of packets.

Node Behaviour⌗

Each node executes a specified algorithm, assuming one of the following:

Crash-stop (fail-stop): A node is faulty if it crashes (at any moment). After a crash, it stops executing forever.
Crash-recovery (fail-recovery): A node may crash at any moment, losing its in-memory state. It mar resume executing some other time.
Byzantine (fail-arbitrary): A node is faulty if it deviates from its algorithm. This could be crashing or malicious behaviour.

A node that is not faulty is called correct. An assumption here is that a node does not know if another node is faulty, that can be handled by Fault Detection mechanisms.

Synchrony (Timing) Behaviour⌗

Assume one of the following for network and nodes:

Synchronous: Message latency no greater than a known upper bound. Nodes execute algorithm at a known speed.
Partially Synchronous: The system is asynchronous for some finite (but unknown) periods of time, synchronous otherwise.
Asynchronous: Messages can be delayed arbitrarily. Nodes can pause execution arbitrarily. No timing guarantees at all.

Here are some violations of synchrony in practice:

Violations of predictable latency:

Message loss requiring retry due to a network partition.
Congestion/contention causing queueing.
Network/route reconfiguration.

Violations of predictable code execution speed:

OS scheduling issues e.g priority inversion.
Stop-the-world GC pauses.
Page faults, swap, thrashing.

RTOS provide scheduling guarantees, but most distributed systems do not use RTOS.

Distributed Systems: System Models

2.2 SYSTEM MODELS⌗

Network Behaviour⌗

Node Behaviour⌗

Synchrony (Timing) Behaviour⌗