Overview of Apache Pekko libraries and modules
Before delving into some best practices for writing actors, it will be helpful to preview the most commonly used Pekko libraries. This will help you start thinking about the functionality you want to use in your system. All core Pekko functionality is available as Open Source Software (OSS).
The following capabilities are included with Pekko OSS and are introduced later on this page:
- Actor library
- Remoting
- Cluster
- Cluster Sharding
- Cluster Singleton
- Persistence
- Projections
- Distributed Data
- Streams
- Apache Pekko Connectors
- HTTP
- gRPC
- Other Apache Pekko modules
This page does not list all available modules, but overviews the main functionality and gives you an idea of the level of sophistication you can reach when you start building systems on top of Pekko.
Actor library
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-actor-typed" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-actor-typed_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-actor-typed_${versions.ScalaBinary}" }
The core Pekko library is pekko-actor-typed
, but actors are used across Pekko libraries, providing a consistent, integrated model that relieves you from individually solving the challenges that arise in concurrent or distributed system design. From a birds-eye view, actors are a programming paradigm that takes encapsulation, one of the pillars of OOP, to its extreme. Unlike objects, actors encapsulate not only their state but their execution. Communication with actors is not via method calls but by passing messages. While this difference may seem minor, it is actually what allows us to break clean from the limitations of OOP when it comes to concurrency and remote communication. Don’t worry if this description feels too high level to fully grasp yet, in the next chapter we will explain actors in detail. For now, the important point is that this is a model that handles concurrency and distribution at the fundamental level instead of ad hoc patched attempts to bring these features to OOP.
Challenges that actors solve include the following:
- How to build and design high-performance, concurrent applications.
- How to handle errors in a multi-threaded environment.
- How to protect my project from the pitfalls of concurrency.
Remoting
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-remote" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-remote_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-remote_${versions.ScalaBinary}" }
Remoting enables actors that live on different computers to seamlessly exchange messages. While distributed as a JAR artifact, Remoting resembles a module more than it does a library. You enable it mostly with configuration and it has only a few APIs. Thanks to the actor model, a remote and local message send looks exactly the same. The patterns that you use on local systems translate directly to remote systems. You will rarely need to use Remoting directly, but it provides the foundation on which the Cluster subsystem is built.
Challenges Remoting solves include the following:
- How to address actor systems living on remote hosts.
- How to address individual actors on remote actor systems.
- How to turn messages to bytes on the wire.
- How to manage low-level, network connections (and reconnections) between hosts, detect crashed actor systems and hosts, all transparently.
- How to multiplex communications from an unrelated set of actors on the same network connection, all transparently.
Cluster
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-cluster-typed" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-cluster-typed_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-cluster-typed_${versions.ScalaBinary}" }
If you have a set of actor systems that cooperate to solve some business problem, then you likely want to manage these set of systems in a disciplined way. While Remoting solves the problem of addressing and communicating with components of remote systems, Clustering gives you the ability to organize these into a “meta-system” tied together by a membership protocol. In most cases, you want to use the Cluster module instead of using Remoting directly. Clustering provides an additional set of services on top of Remoting that most real world applications need.
Challenges the Cluster module solves include the following:
- How to maintain a set of actor systems (a cluster) that can communicate with each other and consider each other as part of the cluster.
- How to introduce a new system safely to the set of already existing members.
- How to reliably detect systems that are temporarily unreachable.
- How to remove failed hosts/systems (or scale down the system) so that all remaining members agree on the remaining subset of the cluster.
- How to distribute computations among the current set of members.
- How to designate members of the cluster to a certain role, in other words, to provide certain services and not others.
Cluster Sharding
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-cluster-sharding-typed" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-cluster-sharding-typed_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-cluster-sharding-typed_${versions.ScalaBinary}" }
Sharding helps to solve the problem of distributing a set of actors among members of a Pekko cluster. Sharding is a pattern that mostly used together with Persistence to balance a large set of persistent entities (backed by actors) to members of a cluster and also migrate them to other nodes when members crash or leave.
Challenges that Sharding solves include the following:
- How to model and scale out a large set of stateful entities on a set of systems.
- How to ensure that entities in the cluster are distributed properly so that load is properly balanced across the machines.
- How to ensure migrating entities from a crashed system without losing the state.
- How to ensure that an entity does not exist on multiple systems at the same time and hence keeps consistent.
Cluster Singleton
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-cluster-singleton" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-cluster-singleton_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-cluster-singleton_${versions.ScalaBinary}" }
A common (in fact, a bit too common) use case in distributed systems is to have a single entity responsible for a given task which is shared among other members of the cluster and migrated if the host system fails. While this undeniably introduces a common bottleneck for the whole cluster that limits scaling, there are scenarios where the use of this pattern is unavoidable. Cluster singleton allows a cluster to select an actor system which will host a particular actor while other systems can always access said service independently from where it is.
The Singleton module can be used to solve these challenges:
- How to ensure that only one instance of a service is running in the whole cluster.
- How to ensure that the service is up even if the system hosting it currently crashes or shuts down during the process of scaling down.
- How to reach this instance from any member of the cluster assuming that it can migrate to other systems over time.
Persistence
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-persistence-typed" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-persistence-typed_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-persistence-typed_${versions.ScalaBinary}" }
Just like objects in OOP, actors keep their state in volatile memory. Once the system is shut down, gracefully or because of a crash, all data that was in memory is lost. Persistence provides patterns to enable actors to persist events that lead to their current state. Upon startup, events can be replayed to restore the state of the entity hosted by the actor. The event stream can be queried and fed into additional processing pipelines (an external Big Data cluster for example) or alternate views (like reports).
Persistence tackles the following challenges:
- How to restore the state of an entity/actor when system restarts or crashes.
- How to implement a CQRS system.
- How to ensure reliable delivery of messages in face of network errors and system crashes.
- How to introspect domain events that have led an entity to its current state.
- How to leverage Event Sourcing in your application to support long-running processes while the project continues to evolve.
Projections
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-projection-core" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-projection-core_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-projection-core_${versions.ScalaBinary}" }
Projections provides a simple API for consuming a stream of events for projection into a variety of downstream options. The core dependency provides only the API and other provider dependencies are required for different source and sink implementations.
Challenges Projections solve include the following:
- Constructing alternate or aggregate views over an event stream.
- Propagating an event stream onto another downstream medium such as a Kafka topic.
- A simple way of building read-side projections in the context of Event Sourcing and CQRS system
Distributed Data
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-cluster-typed" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-cluster-typed_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-cluster-typed_${versions.ScalaBinary}" }
In situations where eventual consistency is acceptable, it is possible to share data between nodes in a Pekko Cluster and accept both reads and writes even in the face of cluster partitions. This can be achieved using Conflict Free Replicated Data Types (CRDTs), where writes on different nodes can happen concurrently and are merged in a predictable way afterward. The Distributed Data module provides infrastructure to share data and a number of useful data types.
Distributed Data is intended to solve the following challenges:
- How to accept writes even in the face of cluster partitions.
- How to share data while at the same time ensuring low-latency local read and write access.
Streams
- sbt
val PekkoVersion = "1.0.1" libraryDependencies += "org.apache.pekko" %% "pekko-stream" % PekkoVersion
- Maven
<properties> <scala.binary.version>2.13</scala.binary.version> </properties> <dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-bom_${scala.binary.version}</artifactId> <version>1.0.1</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement> <dependencies> <dependency> <groupId>org.apache.pekko</groupId> <artifactId>pekko-stream_${scala.binary.version}</artifactId> </dependency> </dependencies>
- Gradle
def versions = [ ScalaBinary: "2.13" ] dependencies { implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1") implementation "org.apache.pekko:pekko-stream_${versions.ScalaBinary}" }
Actors are a fundamental model for concurrency, but there are common patterns where their use requires the user to implement the same pattern over and over. Very common is the scenario where a chain, or graph, of actors, need to process a potentially large, or infinite, stream of sequential events and properly coordinate resource usage so that faster processing stages do not overwhelm slower ones in the chain or graph. Streams provide a higher-level abstraction on top of actors that simplifies writing such processing networks, handling all the fine details in the background and providing a safe, typed, composable programming model. Streams is also an implementation of the Reactive Streams standard which enables integration with all third party implementations of that standard.
Streams solve the following challenges:
- How to handle streams of events or large datasets with high performance, exploiting concurrency and keeping resource usage tight.
- How to assemble reusable pieces of event/data processing into flexible pipelines.
- How to connect asynchronous services in a flexible way to each other with high performance.
- How to provide or consume Reactive Streams compliant interfaces to interface with a third party library.
Pekko Connectors
Pekko Connectors is a separate module from Pekko.
Pekko Connectors is collection of modules built upon the Streams API to provide Reactive Stream connector implementations for a variety of technologies common in the cloud and infrastructure landscape.
See the Pekko Connectors overview page for more details on the API and the implementation modules available.
Pekko Connectors help solve the following challenges:
- Connecting various infrastructure or persistence components to Stream based flows.
- Connecting to legacy systems in a manner that adheres to a Reactive Streams API.
HTTP
Pekko HTTP is a separate module from Pekko.
The de facto standard for providing APIs remotely, internal or external, is HTTP. Pekko provides a library to construct or consume such HTTP services by giving a set of tools to create HTTP services (and serve them) and a client that can be used to consume other services. These tools are particularly suited to streaming in and out a large set of data or real-time events by leveraging the underlying model of Pekko Streams.
Some of the challenges that HTTP tackles:
- How to expose services of a system or cluster to the external world via an HTTP API in a performant way.
- How to stream large datasets in and out of a system using HTTP.
- How to stream live events in and out of a system using HTTP.
gRPC
Pekko gRPC is a separate module from Pekko.
This library provides an implementation of gRPC that integrates nicely with the HTTP and Streams modules. It is capable of generating both client and server-side artifacts from protobuf service definitions, which can then be exposed using Pekko HTTP, and handled using Streams.
Some of the challenges that Pekko gRPC tackles:
- Exposing services with all the benefits of gRPC & protobuf:
- Schema-first contract
- Schema evolution support
- Efficient binary protocol
- First-class streaming support
- Wide interoperability
- Use of HTTP/2 connection multiplexing
Example of module use
Pekko modules integrate together seamlessly. For example, think of a large set of stateful business objects, such as documents or shopping carts, that website users access. If you model these as sharded entities, using Sharding and Persistence, they will be balanced across a cluster that you can scale out on-demand. They will be available during spikes that come from advertising campaigns or before holidays will be handled, even if some systems crash. You can also take the real-time stream of domain events with Persistence Query and use Streams to pipe them into a streaming Fast Data engine. Then, take the output of that engine as a Stream, manipulate it using Pekko Streams operators and expose it as web socket connections served by a load balanced set of HTTP servers hosted by your cluster to power your real-time business analytics tool.
We hope this preview caught your interest! The next topic introduces the example application we will build in the tutorial portion of this guide.