Cluster Singleton

You are viewing the documentation for the new actor APIs, to view the Pekko Classic documentation, see Classic Cluster Singleton.

Module info

To use Cluster Singleton, you must add the following dependency in your project:

sbt
val PekkoVersion = "1.1.2"
libraryDependencies += "org.apache.pekko" %% "pekko-cluster-typed" % PekkoVersion
Maven
<properties>
  <scala.binary.version>2.13</scala.binary.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.pekko</groupId>
      <artifactId>pekko-bom_${scala.binary.version}</artifactId>
      <version>1.1.2</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-cluster-typed_${scala.binary.version}</artifactId>
  </dependency>
</dependencies>
Gradle
def versions = [
  ScalaBinary: "2.13"
]
dependencies {
  implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.1.2")

  implementation "org.apache.pekko:pekko-cluster-typed_${versions.ScalaBinary}"
}
Project Info: Pekko Cluster (typed)
Artifact
org.apache.pekko
pekko-cluster-typed
1.1.2
JDK versions
OpenJDK 8
OpenJDK 11
OpenJDK 17
OpenJDK 21
Scala versions2.13.14, 2.12.20, 3.3.4
JPMS module namepekko.cluster.typed
License
Home pagehttps://pekko.apache.org/
API documentation
Forums
Release notesRelease Notes
IssuesGithub issues
Sourceshttps://github.com/apache/pekko

Introduction

For some use cases it is convenient and sometimes also mandatory to ensure that you have exactly one actor of a certain type running somewhere in the cluster.

Some examples:

  • single point of responsibility for certain cluster-wide consistent decisions, or coordination of actions across the cluster system
  • single entry point to an external system
  • single master, many workers
  • centralized naming service, or routing logic

Using a singleton should not be the first design choice. It has several drawbacks, such as single-point of bottleneck. Single-point of failure is also a relevant concern, but for some cases this feature takes care of that by making sure that another singleton instance will eventually be started.

Warning

Make sure to not use a Cluster downing strategy that may split the cluster into several separate clusters in case of network problems or system overload (long GC pauses), since that will result in in multiple Singletons being started, one in each separate cluster! See Downing.

Singleton manager

The cluster singleton pattern manages one singleton actor instance among all cluster nodes or a group of nodes tagged with a specific role. The singleton manager is an actor that is supposed to be started with ClusterSingleton.initClusterSingleton.init as early as possible on all nodes, or all nodes with specified role, in the cluster.

The actual singleton actor is

  • Started on the oldest node by creating a child actor from supplied BehaviorBehavior. It makes sure that at most one singleton instance is running at any point in time.
  • Always running on the oldest member with specified role.

The oldest member is determined by cluster.Member#isOlderThancluster.Member#isOlderThan This can change when removing that member from the cluster. Be aware that there is a short time period when there is no active singleton during the hand-over process.

When the oldest node is Leaving the cluster there is an exchange from the oldest and the new oldest before a new singleton is started up.

The cluster failure detector will notice when oldest node becomes unreachable due to things like JVM crash, hard shut down, or network failure. After Downing and removing that node the a new oldest node will take over and a new singleton actor is created. For these failure scenarios there will not be a graceful hand-over, but more than one active singletons is prevented by all reasonable means. Some corner cases are eventually resolved by configurable timeouts. Additional safety can be added by using a Lease.

Singleton proxy

To communicate with a given named singleton in the cluster you can access it though a proxy ActorRefActorRef. When calling ClusterSingleton.initClusterSingleton.init for a given singletonName on a node an ActorRef is returned. It is to this ActorRef that you can send messages to the singleton instance, independent of which node the singleton instance is active. ClusterSingleton.init can be called multiple times, if there already is a singleton manager running on this node, no additional manager is started, and if there is one running an ActorRef to the proxy is returned.

The proxy will route all messages to the current instance of the singleton, and keep track of the oldest node in the cluster and discover the singleton’s ActorRef. There might be periods of time during which the singleton is unavailable, e.g., when a node leaves the cluster. In these cases, the proxy will buffer the messages sent to the singleton and then deliver them when the singleton is finally available. If the buffer is full the proxy will drop old messages when new messages are sent via the proxy. The size of the buffer is configurable and it can be disabled by using a buffer size of 0.

It’s worth noting that messages can always be lost because of the distributed nature of these actors. As always, additional logic should be implemented in the singleton (acknowledgement) and in the client (retry) actors to ensure at-least-once message delivery.

The singleton instance will not run on members with status WeaklyUp.

Potential problems to be aware of

This pattern may seem to be very tempting to use at first, but it has several drawbacks, some of them are listed below:

  • The cluster singleton may quickly become a performance bottleneck.
  • You can not rely on the cluster singleton to be non-stop available — e.g. when the node on which the singleton has been running dies, it will take a few seconds for this to be noticed and the singleton be migrated to another node.
  • If many singletons are used be aware of that all will run on the oldest node (or oldest with configured role). Cluster Sharding combined with keeping the “singleton” entities alive can be a better alternative.
Warning

Make sure to not use a Cluster downing strategy that may split the cluster into several separate clusters in case of network problems or system overload (long GC pauses), since that will result in in multiple Singletons being started, one in each separate cluster! See Downing.

Example

Any BehaviorBehavior can be run as a singleton. E.g. a basic counter:

Scala
sourceobject Counter {
  sealed trait Command
  case object Increment extends Command
  final case class GetValue(replyTo: ActorRef[Int]) extends Command
  case object GoodByeCounter extends Command

  def apply(): Behavior[Command] = {
    def updated(value: Int): Behavior[Command] = {
      Behaviors.receiveMessage[Command] {
        case Increment =>
          updated(value + 1)
        case GetValue(replyTo) =>
          replyTo ! value
          Behaviors.same
        case GoodByeCounter =>
          // Possible async action then stop
          Behaviors.stopped
      }
    }

    updated(0)
  }
}
Java
sourcepublic class Counter extends AbstractBehavior<Counter.Command> {

  public interface Command {}

  public enum Increment implements Command {
    INSTANCE
  }

  public static class GetValue implements Command {
    private final ActorRef<Integer> replyTo;

    public GetValue(ActorRef<Integer> replyTo) {
      this.replyTo = replyTo;
    }
  }

  public enum GoodByeCounter implements Command {
    INSTANCE
  }

  public static Behavior<Command> create() {
    return Behaviors.setup(Counter::new);
  }

  private int value = 0;

  private Counter(ActorContext<Command> context) {
    super(context);
  }

  @Override
  public Receive<Command> createReceive() {
    return newReceiveBuilder()
        .onMessage(Increment.class, msg -> onIncrement())
        .onMessage(GetValue.class, this::onGetValue)
        .onMessage(GoodByeCounter.class, msg -> onGoodByCounter())
        .build();
  }

  private Behavior<Command> onIncrement() {
    value++;
    return this;
  }

  private Behavior<Command> onGetValue(GetValue msg) {
    msg.replyTo.tell(value);
    return this;
  }

  private Behavior<Command> onGoodByCounter() {
    // Possible async action then stop
    return this;
  }
}

Then on every node in the cluster, or every node with a given role, use the ClusterSingletonClusterSingleton extension to spawn the singleton. An instance will per data centre of the cluster:

Scala
sourceimport org.apache.pekko
import pekko.cluster.typed.ClusterSingleton
import pekko.cluster.typed.SingletonActor

val singletonManager = ClusterSingleton(system)
// Start if needed and provide a proxy to a named singleton
val proxy: ActorRef[Counter.Command] = singletonManager.init(
  SingletonActor(Behaviors.supervise(Counter()).onFailure[Exception](SupervisorStrategy.restart), "GlobalCounter"))

proxy ! Counter.Increment
Java
sourceimport org.apache.pekko.cluster.typed.ClusterSingleton;
import org.apache.pekko.cluster.typed.ClusterSingletonSettings;
import org.apache.pekko.cluster.typed.SingletonActor;

ClusterSingleton singleton = ClusterSingleton.get(system);
// Start if needed and provide a proxy to a named singleton
ActorRef<Counter.Command> proxy =
    singleton.init(SingletonActor.of(Counter.create(), "GlobalCounter"));

proxy.tell(Counter.Increment.INSTANCE);

Supervision

The default supervision strategy when an exception is thrown is for an actor to be stopped. The above example overrides this to restart to ensure it is always running. Another option would be to restart with a backoff:

Scala
sourceval proxyBackOff: ActorRef[Counter.Command] = singletonManager.init(
  SingletonActor(
    Behaviors
      .supervise(Counter())
      .onFailure[Exception](SupervisorStrategy.restartWithBackoff(1.second, 10.seconds, 0.2)),
    "GlobalCounter"))
Java
sourceClusterSingleton singleton = ClusterSingleton.get(system);
ActorRef<Counter.Command> proxy =
    singleton.init(
        SingletonActor.of(
            Behaviors.supervise(Counter.create())
                .onFailure(
                    SupervisorStrategy.restartWithBackoff(
                        Duration.ofSeconds(1), Duration.ofSeconds(10), 0.2)),
            "GlobalCounter"));

Be aware that this means there will be times when the singleton won’t be running as restart is delayed. See Fault Tolerance for a full list of supervision options.

Application specific stop message

An application specific stopMessage can be used to close the resources before actually stopping the singleton actor. This stopMessage is sent to the singleton actor to tell it to finish its work, close resources, and stop. The hand-over to the new oldest node is completed when the singleton actor is terminated. If the shutdown logic does not include any asynchronous actions it can be executed in the PostStopPostStop signal handler.

Scala
sourceval singletonActor = SingletonActor(Counter(), "GlobalCounter").withStopMessage(Counter.GoodByeCounter)
singletonManager.init(singletonActor)
Java
sourceSingletonActor<Counter.Command> counterSingleton =
    SingletonActor.of(Counter.create(), "GlobalCounter")
        .withStopMessage(Counter.GoodByeCounter.INSTANCE);
ActorRef<Counter.Command> proxy = singleton.init(counterSingleton);

Lease

A lease can be used as an additional safety measure to ensure that two singletons don’t run at the same time. Reasons for how this can happen:

  • Network partitions without an appropriate downing provider
  • Mistakes in the deployment process leading to two separate Pekko Clusters
  • Timing issues between removing members from the Cluster on one side of a network partition and shutting them down on the other side

A lease can be a final backup that means that the singleton actor won’t be created unless the lease can be acquired.

To use a lease for singleton set pekko.cluster.singleton.use-lease to the configuration location of the lease to use. A lease with with the name <actor system name>-singleton-<singleton actor path> is used and the owner is set to the Cluster(system).selfAddress.hostPortCluster.get(system).selfAddress().hostPort().

If the cluster singleton manager can’t acquire the lease it will keep retrying while it is the oldest node in the cluster. If the lease is lost then the singleton actor will be terminated then the lease will be re-tried.

Accessing singleton of another data centre

TODO #27705

Configuration

The following configuration properties are read by the ClusterSingletonManagerSettingsClusterSingletonManagerSettings when created with a ActorSystemActorSystem parameter. It is also possible to amend the ClusterSingletonManagerSettings or create it from another config section with the same layout as below. ClusterSingletonManagerSettings is a parameter to the ClusterSingletonManager.propsClusterSingletonManager.props factory method, i.e. each singleton can be configured with different settings if needed.

sourcepekko.cluster.singleton {
  # The actor name of the child singleton actor.
  singleton-name = "singleton"
  
  # Singleton among the nodes tagged with specified role.
  # If the role is not specified it's a singleton among all nodes in the cluster.
  role = ""
  
  # When a node is becoming oldest it sends hand-over request to previous oldest, 
  # that might be leaving the cluster. This is retried with this interval until 
  # the previous oldest confirms that the hand over has started or the previous 
  # oldest member is removed from the cluster (+ pekko.cluster.down-removal-margin).
  hand-over-retry-interval = 1s
  
  # The number of retries are derived from hand-over-retry-interval and
  # pekko.cluster.down-removal-margin (or ClusterSingletonManagerSettings.removalMargin),
  # but it will never be less than this property.
  # After the hand over retries and it's still not able to exchange the hand over messages
  # with the previous oldest it will restart itself by throwing ClusterSingletonManagerIsStuck,
  # to start from a clean state. After that it will still not start the singleton instance
  # until the previous oldest node has been removed from the cluster.
  # On the other side, on the previous oldest node, the same number of retries - 3 are used
  # and after that the singleton instance is stopped.
  # For large clusters it might be necessary to increase this to avoid too early timeouts while
  # gossip dissemination of the Leaving to Exiting phase occurs. For normal leaving scenarios
  # it will not be a quicker hand over by reducing this value, but in extreme failure scenarios
  # the recovery might be faster.
  min-number-of-hand-over-retries = 15

  # Config path of the lease to be taken before creating the singleton actor
  # if the lease is lost then the actor is restarted and it will need to re-acquire the lease
  # the default is no lease
  use-lease = ""

  # The interval between retries for acquiring the lease
  lease-retry-interval = 5s
}

The following configuration properties are read by the ClusterSingletonSettingsClusterSingletonSettings when created with a ActorSystemActorSystem parameter. ClusterSingletonSettings is an optional parameter in ClusterSingleton.initClusterSingleton.init. It is also possible to amend the ClusterSingletonProxySettingsClusterSingletonProxySettings or create it from another config section with the same layout as below.

sourcepekko.cluster.singleton-proxy {
  # The actor name of the singleton actor that is started by the ClusterSingletonManager
  singleton-name = ${pekko.cluster.singleton.singleton-name}
  
  # The role of the cluster nodes where the singleton can be deployed.
  # Corresponding to the role used by the `ClusterSingletonManager`. If the role is not
  # specified it's a singleton among all nodes in the cluster, and the `ClusterSingletonManager`
  # must then also be configured in same way.
  role = ""
  
  # Interval at which the proxy will try to resolve the singleton instance.
  singleton-identification-interval = 1s
  
  # If the location of the singleton is unknown the proxy will buffer this
  # number of messages and deliver them when the singleton is identified. 
  # When the buffer is full old messages will be dropped when new messages are
  # sent via the proxy.
  # Use 0 to disable buffering, i.e. messages will be dropped immediately if
  # the location of the singleton is unknown.
  # Maximum allowed buffer size is 10000.
  buffer-size = 1000 
}