- Architectural styles
- Middleware organization
- System architectures
The system architecture of a distributed system is how the distributed system’s software and hardware are organized.
You can make a distinction between the logical organization of distributed systems and the physical organization [1, P. 55].
The logical organization of a distributed system is known as its software architecture [1, P. 56].
There are different architectural styles that are formulated in terms of how different components are connected to each other.
A component is a “modular unit with well-defined required and provided interfaces that is replaceable within its environment”. Components should be replaceable without requiring the system to be powered down [1, P. 56].
A connector is a mechanism that “mediates communication, coordination, or cooperation among components” [1, P. 56].
The most common distributed systems architectures are:
- Layered architectures
- Object-based and service-oriented architectures
- Resource-based architectures
- Event-based architectures
Note: in the real-world, many systems don’t follow one style exactly.
Layered architectures are where components are organized in layers. Network communication is an example of a layered architecture [1, P. 57].
In layered architectures, a component can make a downcall to a component at a lower-level (e.g. ). generally expects a response from the lower-level component [1, P. 57].
Layered communication protocols are defined protocols for interacting between layers. A communication protocol “describes the rules that parties will follow in order to exchange information” [1, P. 58].
Object-based architectures are where components (objects) are connected loosely through a (possibly remote) procedure call mechanism [1, P. 62].
In object-based architectures, each object has a defined interface that conceals implementation details [1, P. 62].
Because the interface and the object can be separated, it’s possible to interface at one machine while the object exists on a different machine. This organization is commonly referred to as a distributed object [1, P. 62].
A proxy is an implementation of the object’s interface. When a client binds to a distributed object, a proxy is loaded into the client’s address space. A proxy marshals method invocations into messages, and unmarshals reply messages [1, P. 62].
In a distributed object system, an object exists on a server machine where it has the same interface as it does on the client machine. Incoming requests are passed to a server stub, known as a skeleton [1, Pp. 62-3].
For most distributed object systems, state is not distributed. Instead, state exists on individual machines. Only the object interfaces are made available on other machines (known as remote objects). In a general distributed object system, state might be distributed across physical machines, but the distribution would be hidden to a client [1, P. 63].
Objects provide good encapsulation, where a service is a self-contained interface. This makes it possible to organize systems using a service-oriented architecture (SOA) [1, P. 63].
In service-oriented architectures a system is constructed from multiple services that interact with each other. The services might not all be administered by the same organization [1, P. 63].
Resource-based architectures are architectures where systems are viewed as a collection of resources that are individually managed by components [1, P. 64].
An example is REST. RESTful architectures have four key characteristics:
- Resource are identified using a single naming scheme.
- All services offer the same interface with four methods.
- Messages sent to or from a service are fully self-described.
- Execution is stateless.
A RESTful API only offers four methods:
Publish-subscribe architectures have a strong separation of processing and coordination [1, P. 67].
One way of conceptualizing a system is as a collection of autonomously executing processes. In this model, coordination is the communication between processes [1, P. 67].
Cabri et al. created a taxonomy of coordination models that can be applied to distributed systems. Tanenbaum et al. modified this taxonomy to create a distinction between them across temporal and referential dimensions:
|Temporally coupled||Temporally decoupled|
|Referentially decoupled||Event-based||Shared data space|
Referentially coupled means that a communicating process must address a target process directly. Temporally coupled means that a process must receive a message at the same time a message is sent [1, P. 67].
If a process is temporally and referentially coupled, communication is direct. An example is talking over cellphones [1, P. 67].
Mailbox coordination is where communication is referentially coupled but temporally decoupled, like in email [1, P. 67].
Event-based communication is temporally coupled, but not referentially coupled. An event will be published by a process without knowing what other processes are subscribed to the event [1, P. 67].
A shared data space is where a process communicates entirely in tuples, which are structured data records (like rows in a database). A process can put a tuple into the shared data space, and another process can later retrieve the tuple [1, P. 68].
In publish-subscribe systems, communication takes place by describing the event that took place. This makes naming of events an important consideration [1, P. 70].
Assume that an event is described by a series of attributes. An event is published when it’s available for other processes to read. A subscription must be passed to the middleware, using a name to identify the event that the subscription listens for [1, P. 70].
There are two common forms of publish-subscribe systems:
- Topic-based publish-subscribe systems
- Content-based publish-subscribe systems
In a topic-based publish-subscribe system, a subscriber subscribes to a named logical channel (a topic). The events published to the channel typically contain <attribute, value> pairs.
In a content-based publish-subscribe system, a subscription can also consist of <attribute, value> pairs. An event is only sent to a subscriber if the event’s attributes match constraints provided by the subscriber [1, P. 70].
If data is immediately forwarded to subscribers, the system is temporally coupled. If data must be explicitly read by subscribers, the system is temporally-decoupled. In this case, the middleware will need to store the event data [1, Pp. 7-1].
There are two important design patterns in middleware organization:
A wrapper is a component that offers an interface to a client application. It solves the problem of incompatible interfaces [1, P. 72].
The wrapper pattern is common in object-oriented programming [1, P. 72].
An object adapter is a wrapper that provides an interface for accessing remote objects [1, P. 72] .
An interceptor is a pattern where the normal execution flow is interrupted in order to execute other code.
For example, consider an object-based distributed system. An object can call a method that belongs to an object , where exists on a different computer than . Remote invocation involves:
- Object has same local interface as the interface offered by object .
- The call by is transformed to a generic object invocation by the middleware running on machine .
- The object invocation is transformed into a message that’s sent over the network to .
A request-level interceptor can interrupt an invocation as the method is called on the remote interface [1, P. 74].
A message-level interceptor can interrupt an invocation just before the message is sent [1, P. 74].
This section documents some common system architectures.
In client-server architectures, processes are divided into two (sometimes overlapping) groups: servers and clients [1, P. 76].
A server is a process that implements a specific service.
A client is a process that uses a server’s service by sending it a request and waiting for a response [1, P. 76].
A client can communicate with a server using a connectionless protocol, like UDP. The downside of using a connectionless protocol is that handling transmission errors is difficult [1, P. 77].
One solution is to use a connection-based protocol, like TCP.
The distinction between client and server is often not clear in the client-server model [1, P. 77].
The client-server model is a two-tiered architecture [1, P. 78].
Many distributed applications are split into three layers:
- User interface layer
- Processing layer
- Data layer
These layers can be distributed across different machines. An example would be a website where the browser operates at the user interface layer, a server is the processing layer, and a database operates at the data layer [1, Pp. 78,80].
The server running at the processing layer will act as a client to the database at the data layer [1, P. 80].
Peer-to-peer systems are system architectures that support distribution across clients and servers. A client or server might be physically split into logically equivalent parts, where each part is operating on its own share of the complete data set [1, P. 81].
In a peer-to-peer system, all processes that constitute the system are equal. Interaction between processes is symmetric: a process will act as a client and as a server at the same time [1, P. 81].
Peer-to-peer architectures are concerned with organizing processes in an overlay network. An overlay network is a network that is formed of the processes and the links that represent possible communication channels (usually TCP). An overlay network can either be structured or unstructured [1, P. 81].
In a structured peer-to-peer system, the overlay follows a predetermined topology (e.g. a ring or a grid). The topology is used to look up data.
Generally, structured peer-to-peer systems use a semantic-free index to access data. Each data item maintained by the system is uniquely associated with a key, and the key is used as an index:
The peer-to-peer system is then responsible for storing key-value pairs. In this case, a node is assigned an identifier from all hash values, and each node is responsible for storing data associated with a subset of keys. This architecture is known as a DHT (distributed hash table) [1, P. 82]. A DHT makes it easy for any node to find the correct node for a key:
In an unstructured peer-to-peer system, each node maintains a dynamic list of neighbors. Commonly, when a node joins an unstructured peer-to-peer system it contacts a well-known node in order to obtain a list of peers [1, P. 84].
Unstructured peer-to-peer systems must search for a requested value. Some examples of search methods are:
- Random walks
In flooding, an issuing node, , passes a request for an item to each of its neighbors. A request is ignored if the receiving node, , has seen it before. If the request hasn’t been seen before, will search locally for the item. If the item isn’t available locally, will issue a request to each of its neighbors and then returns the response to if it receives one [1, P. 85].
Random walks involves an issuing node, , asking one of its neighbors at random. If the neighbor cannot satisfy the request, asks another neighbor at random, and so on until it receives an answer [1, P. 85].
Locating data items can become difficult as the network grows, since there is no deterministic way to route a lookup request to a specific node. Some peer-to-peer systems use special index nodes to overcome this problem [1, P. 87].
An alternative is to abandon symmetric communication and use a broker node to coordinate communication between a client and a server.
Nodes that maintain an index or act as a broker are known as super peers. Super peers are often organized in a peer-to-peer relationship.
-  A. Tanenbaum and M. van Steen, Distributed Systems, 3.01 ed. Pearson Education, Inc., 2017.