Serialization

Dependency

To use Serialization, you must add the following dependency in your project:

sbt
val PekkoVersion = "1.0.1"
libraryDependencies += "org.apache.pekko" %% "pekko-actor" % PekkoVersion
Maven
<properties>
  <scala.binary.version>2.13</scala.binary.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.pekko</groupId>
      <artifactId>pekko-bom_${scala.binary.version}</artifactId>
      <version>1.0.1</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-actor_${scala.binary.version}</artifactId>
  </dependency>
</dependencies>
Gradle
def versions = [
  ScalaBinary: "2.13"
]
dependencies {
  implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.1")

  implementation "org.apache.pekko:pekko-actor_${versions.ScalaBinary}"
}

Introduction

The messages that Pekko actors send to each other are JVM objects (e.g. instances of Scala case classes). Message passing between actors that live on the same JVM is straightforward. It is done via reference passing. However, messages that have to escape the JVM to reach an actor running on a different host have to undergo some form of serialization (i.e. the objects have to be converted to and from byte arrays).

The serialization mechanism in Pekko allows you to write custom serializers and to define which serializer to use for what.

Serialization with Jackson is a good choice in many cases and our recommendation if you don’t have other preference.

Google Protocol Buffers is good if you want more control over the schema evolution of your messages, but it requires more work to develop and maintain the mapping between serialized representation and domain representation.

Pekko itself uses Protocol Buffers to serialize internal messages (for example cluster gossip messages).

Usage

Configuration

For Pekko to know which Serializer to use for what, you need to edit your configuration: in the pekko.actor.serializers-section, you bind names to implementations of the serialization.Serializerserialization.Serializer you wish to use, like this:

sourcepekko {
  actor {
    serializers {
      jackson-json = "org.apache.pekko.serialization.jackson.JacksonJsonSerializer"
      jackson-cbor = "org.apache.pekko.serialization.jackson.JacksonCborSerializer"
      proto = "org.apache.pekko.remote.serialization.ProtobufSerializer"
      myown = "docs.serialization.MyOwnSerializer"
    }
  }
}

After you’ve bound names to different implementations of Serializer you need to wire which classes should be serialized using which Serializer, this is done in the pekko.actor.serialization-bindings-section:

sourcepekko {
  actor {
    serializers {
      jackson-json = "org.apache.pekko.serialization.jackson.JacksonJsonSerializer"
      jackson-cbor = "org.apache.pekko.serialization.jackson.JacksonCborSerializer"
      proto = "org.apache.pekko.remote.serialization.ProtobufSerializer"
      myown = "docs.serialization.MyOwnSerializer"
    }

    serialization-bindings {
      "docs.serialization.JsonSerializable" = jackson-json
      "docs.serialization.CborSerializable" = jackson-cbor
      "com.google.protobuf.Message" = proto
      "docs.serialization.MyOwnSerializable" = myown
    }
  }
}

You only need to specify the name of a traitan interface or abstract base class of the messages. In case of ambiguity, i.e. the message implements several of the configured classes, the most specific configured class will be used, i.e. the one of which all other candidates are superclasses. If this condition cannot be met, because e.g. two marker traitsinterfaces that have been configured for serialization both apply and neither is a subtype of the other, a warning will be issued.

Note

If you are using Scala for your message protocol and your messages are contained inside of a Scala object, then in order to reference those messages, you will need to use the fully qualified Java class name. For a message named Message contained inside the Scala object named Wrapper you would need to reference it as Wrapper$Message instead of Wrapper.Message.

Pekko provides serializers for several primitive types and protobuf com.google.protobuf.GeneratedMessage (protobuf2) and com.google.protobuf.GeneratedMessageV3 (protobuf3) by default (the latter only if depending on the pekko-remote module), so normally you don’t need to add configuration for that if you send raw protobuf messages as actor messages.

Programmatic

If you want to programmatically serialize/deserialize using Pekko Serialization, here are some examples:

Scala
sourceimport org.apache.pekko
import pekko.actor._
import pekko.actor.typed.scaladsl.Behaviors
import pekko.cluster.Cluster
import pekko.serialization._
Java
sourceimport org.apache.pekko.actor.*;
import org.apache.pekko.serialization.*;

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
Scala
sourceval system = ActorSystem("example")

// Get the Serialization Extension
val serialization = SerializationExtension(system)

// Have something to serialize
val original = "woohoo"

// Turn it into bytes, and retrieve the serializerId and manifest, which are needed for deserialization
val bytes = serialization.serialize(original).get
val serializerId = serialization.findSerializerFor(original).identifier
val manifest = Serializers.manifestFor(serialization.findSerializerFor(original), original)

// Turn it back into an object
val back = serialization.deserialize(bytes, serializerId, manifest).get
Java
sourceActorSystem system = ActorSystem.create("example");

// Get the Serialization Extension
Serialization serialization = SerializationExtension.get(system);

// Have something to serialize
String original = "woohoo";

// Turn it into bytes, and retrieve the serializerId and manifest, which are needed for
// deserialization
byte[] bytes = serialization.serialize(original).get();
int serializerId = serialization.findSerializerFor(original).identifier();
String manifest = Serializers.manifestFor(serialization.findSerializerFor(original), original);

// Turn it back into an object
String back = (String) serialization.deserialize(bytes, serializerId, manifest).get();

The manifest is a type hint so that the same serializer can be used for different classes.

Note that when deserializing from bytes the manifest and the identifier of the serializer are needed. It is important to use the serializer identifier in this way to support rolling updates, where the serialization-bindings for a class may have changed from one serializer to another. Therefore the three parts consisting of the bytes, the serializer id, and the manifest should always be transferred or stored together so that they can be deserialized with different serialization-bindings configuration.

The SerializationExtensionSerializationExtension is a Classic ExtensionExtension, but it can be used with an actor.typed.ActorSystemactor.typed.ActorSystem like this:

Scala
sourceimport org.apache.pekko.actor.typed.ActorSystem

val system = ActorSystem(Behaviors.empty, "example")

// Get the Serialization Extension
val serialization = SerializationExtension(system)
Java
sourceorg.apache.pekko.actor.typed.ActorSystem<Void> system =
    org.apache.pekko.actor.typed.ActorSystem.create(Behaviors.empty(), "example");

// Get the Serialization Extension
Serialization serialization = SerializationExtension.get(system);

Customization

The first code snippet on this page contains a configuration file that references a custom serializer docs.serialization.MyOwnSerializer. How would we go about creating such a custom serializer?

Creating new Serializers

A custom Serializer has to inherit from serialization.Serializerserialization.Serializerserialization.JSerializerserialization.JSerializer and can be defined like the following:

Scala
sourceimport org.apache.pekko
import pekko.actor._
import pekko.actor.typed.scaladsl.Behaviors
import pekko.cluster.Cluster
import pekko.serialization._
Java
sourceimport org.apache.pekko.actor.*;
import org.apache.pekko.serialization.*;

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
Scala
sourceclass MyOwnSerializer extends Serializer {

  // If you need logging here, introduce a constructor that takes an ExtendedActorSystem.
  // class MyOwnSerializer(actorSystem: ExtendedActorSystem) extends Serializer
  // Get a logger using:
  // private val logger = Logging(actorSystem, this)

  // This is whether "fromBinary" requires a "clazz" or not
  def includeManifest: Boolean = true

  // Pick a unique identifier for your Serializer,
  // you've got a couple of billions to choose from,
  // 0 - 40 is reserved by Pekko itself
  def identifier = 1234567

  // "toBinary" serializes the given object to an Array of Bytes
  def toBinary(obj: AnyRef): Array[Byte] = {
    // Put the code that serializes the object here
    // #...
    Array[Byte]()
    // #...
  }

  // "fromBinary" deserializes the given array,
  // using the type hint (if any, see "includeManifest" above)
  def fromBinary(bytes: Array[Byte], clazz: Option[Class[_]]): AnyRef = {
    // Put your code that deserializes here
    // #...
    null
    // #...
  }
}
Java
sourcestatic class MyOwnSerializer extends JSerializer {

  // If you need logging here, introduce a constructor that takes an ExtendedActorSystem.
  // public MyOwnSerializer(ExtendedActorSystem actorSystem)
  // Get a logger using:
  // private final LoggingAdapter logger = Logging.getLogger(actorSystem, this);

  // This is whether "fromBinary" requires a "clazz" or not
  @Override
  public boolean includeManifest() {
    return false;
  }

  // Pick a unique identifier for your Serializer,
  // you've got a couple of billions to choose from,
  // 0 - 40 is reserved by Pekko itself
  @Override
  public int identifier() {
    return 1234567;
  }

  // "toBinary" serializes the given object to an Array of Bytes
  @Override
  public byte[] toBinary(Object obj) {
    // Put the code that serializes the object here
    // #...
    return new byte[0];
    // #...
  }

  // "fromBinary" deserializes the given array,
  // using the type hint (if any, see "includeManifest" above)
  @Override
  public Object fromBinaryJava(byte[] bytes, Class<?> clazz) {
    // Put your code that deserializes here
    // #...
    return null;
    // #...
  }
}

The identifier must be unique. The identifier is used when selecting which serializer to use for deserialization. If you have accidentally configured several serializers with the same identifier that will be detected and prevent the ActorSystemActorSystem from being started. It can be a hardcoded value because it must remain the same value to support rolling updates.

If you prefer to define the identifier in cofiguration that is supported by the BaseSerializerBaseSerializer trait, which implements the def identifier by reading it from configuration based on the serializer’s class name:

Scala
sourcepekko {
  actor {
    serialization-identifiers {
      "docs.serialization.MyOwnSerializer" = 1234567
    }
  }
}

The manifest is a type hint so that the same serializer can be used for different classes. The manifest parameter in fromBinaryfromBinaryJava is the class of the object that was serialized. In fromBinaryfromBinaryJava you can match on the class and deserialize the bytes to different objects.

Then you only need to fill in the blanks, bind it to a name in your configuration and list which classes should be deserialized with it.

The serializers are initialized eagerly by the SerializationExtensionSerializationExtension when the ActorSystemActorSystem is started and therefore a serializer itself must not access the SerializationExtension from its constructor. Instead, it should access the SerializationExtension lazily.

Serializer with String Manifest

The Serializer illustrated above supports a class based manifest (type hint). For serialization of data that need to evolve over time the SerializerWithStringManifestSerializerWithStringManifest is recommended instead of Serializer because the manifest (type hint) is a String instead of a Class. That means that the class can be moved/removed and the serializer can still deserialize old data by matching on the String. This is especially useful for Persistence.

The manifest string can also encode a version number that can be used in fromBinaryfromBinaryJava to deserialize in different ways to migrate old data to new domain objects.

If the data was originally serialized with Serializer and in a later version of the system you change to SerializerWithStringManifest then the manifest string will be the full class name if you used includeManifest=true, otherwise it will be the empty string.

This is how a SerializerWithStringManifest looks like:

Scala
sourceclass MyOwnSerializer2 extends SerializerWithStringManifest {

  val CustomerManifest = "customer"
  val UserManifest = "user"
  val UTF_8 = StandardCharsets.UTF_8.name()

  // Pick a unique identifier for your Serializer,
  // you've got a couple of billions to choose from,
  // 0 - 40 is reserved by Pekko itself
  def identifier = 1234567

  // The manifest (type hint) that will be provided in the fromBinary method
  // Use `""` if manifest is not needed.
  def manifest(obj: AnyRef): String =
    obj match {
      case _: Customer => CustomerManifest
      case _: User     => UserManifest
    }

  // "toBinary" serializes the given object to an Array of Bytes
  def toBinary(obj: AnyRef): Array[Byte] = {
    // Put the real code that serializes the object here
    obj match {
      case Customer(name) => name.getBytes(UTF_8)
      case User(name)     => name.getBytes(UTF_8)
    }
  }

  // "fromBinary" deserializes the given array,
  // using the type hint
  def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    // Put the real code that deserializes here
    manifest match {
      case CustomerManifest =>
        Customer(new String(bytes, UTF_8))
      case UserManifest =>
        User(new String(bytes, UTF_8))
    }
  }
}
Java
sourcestatic class MyOwnSerializer2 extends SerializerWithStringManifest {

  private static final String CUSTOMER_MANIFEST = "customer";
  private static final String USER_MANIFEST = "user";
  private static final Charset UTF_8 = StandardCharsets.UTF_8;

  // Pick a unique identifier for your Serializer,
  // you've got a couple of billions to choose from,
  // 0 - 40 is reserved by Pekko itself
  @Override
  public int identifier() {
    return 1234567;
  }

  @Override
  public String manifest(Object obj) {
    if (obj instanceof Customer) return CUSTOMER_MANIFEST;
    else if (obj instanceof User) return USER_MANIFEST;
    else throw new IllegalArgumentException("Unknown type: " + obj);
  }

  // "toBinary" serializes the given object to an Array of Bytes
  @Override
  public byte[] toBinary(Object obj) {
    // Put the real code that serializes the object here
    if (obj instanceof Customer) return ((Customer) obj).name.getBytes(UTF_8);
    else if (obj instanceof User) return ((User) obj).name.getBytes(UTF_8);
    else throw new IllegalArgumentException("Unknown type: " + obj);
  }

  // "fromBinary" deserializes the given array,
  // using the type hint
  @Override
  public Object fromBinary(byte[] bytes, String manifest) {
    // Put the real code that deserializes here
    if (manifest.equals(CUSTOMER_MANIFEST)) return new Customer(new String(bytes, UTF_8));
    else if (manifest.equals(USER_MANIFEST)) return new User(new String(bytes, UTF_8));
    else throw new IllegalArgumentException("Unknown manifest: " + manifest);
  }
}

You must also bind it to a name in your configuration and then list which classes should be serialized by it.

It’s recommended to throw IllegalArgumentException or NotSerializableException in fromBinary if the manifest is unknown. This makes it possible to introduce new message types and send them to nodes that don’t know about them. This is typically needed when performing rolling updates, i.e. running a cluster with mixed versions for a while. Those exceptions are treated as a transient problem in the classic remoting layer. The problem will be logged and the message dropped. Other exceptions will tear down the TCP connection because it can be an indication of corrupt bytes from the underlying transport. Artery TCP handles all deserialization exceptions as transient problems.

Serializing ActorRefs

Actor references are typically included in the messages. All ActorRefs are serializable when using Serialization with Jackson, but in case you are writing your own serializer, you might want to know how to serialize and deserialize them properly.

To serialize actor references to/from string representation you would use the ActorRefResolverActorRefResolver.

For example here’s how a serializer could look for Ping and Pong messages:

Scala
sourceclass PingSerializer(system: ExtendedActorSystem) extends SerializerWithStringManifest {
  private val actorRefResolver = ActorRefResolver(system.toTyped)

  private val PingManifest = "a"
  private val PongManifest = "b"

  override def identifier = 41

  override def manifest(msg: AnyRef) = msg match {
    case _: PingService.Ping => PingManifest
    case PingService.Pong    => PongManifest
    case _ =>
      throw new IllegalArgumentException(s"Can't serialize object of type ${msg.getClass} in [${getClass.getName}]")
  }

  override def toBinary(msg: AnyRef) = msg match {
    case PingService.Ping(who) =>
      actorRefResolver.toSerializationFormat(who).getBytes(StandardCharsets.UTF_8)
    case PingService.Pong =>
      Array.emptyByteArray
    case _ =>
      throw new IllegalArgumentException(s"Can't serialize object of type ${msg.getClass} in [${getClass.getName}]")
  }

  override def fromBinary(bytes: Array[Byte], manifest: String) = {
    manifest match {
      case PingManifest =>
        val str = new String(bytes, StandardCharsets.UTF_8)
        val ref = actorRefResolver.resolveActorRef[PingService.Pong.type](str)
        PingService.Ping(ref)
      case PongManifest =>
        PingService.Pong
      case _ =>
        throw new IllegalArgumentException(s"Unknown manifest [$manifest]")
    }
  }
}
Java
sourcepublic class PingSerializer extends SerializerWithStringManifest {

  final ExtendedActorSystem system;
  final ActorRefResolver actorRefResolver;

  static final String PING_MANIFEST = "a";
  static final String PONG_MANIFEST = "b";

  PingSerializer(ExtendedActorSystem system) {
    this.system = system;
    actorRefResolver = ActorRefResolver.get(Adapter.toTyped(system));
  }

  @Override
  public int identifier() {
    return 97876;
  }

  @Override
  public String manifest(Object obj) {
    if (obj instanceof Ping) return PING_MANIFEST;
    else if (obj instanceof Pong) return PONG_MANIFEST;
    else
      throw new IllegalArgumentException(
          "Can't serialize object of type "
              + obj.getClass()
              + " in ["
              + getClass().getName()
              + "]");
  }

  @Override
  public byte[] toBinary(Object obj) {
    if (obj instanceof Ping)
      return actorRefResolver
          .toSerializationFormat(((Ping) obj).replyTo)
          .getBytes(StandardCharsets.UTF_8);
    else if (obj instanceof Pong) return new byte[0];
    else
      throw new IllegalArgumentException(
          "Can't serialize object of type "
              + obj.getClass()
              + " in ["
              + getClass().getName()
              + "]");
  }

  @Override
  public Object fromBinary(byte[] bytes, String manifest) {
    if (PING_MANIFEST.equals(manifest)) {
      String str = new String(bytes, StandardCharsets.UTF_8);
      ActorRef<Pong> ref = actorRefResolver.resolveActorRef(str);
      return new Ping(ref);
    } else if (PONG_MANIFEST.equals(manifest)) {
      return new Pong();
    } else {
      throw new IllegalArgumentException("Unable to handle manifest: " + manifest);
    }
  }
}

Serialization of Classic ActorRefActorRef is described in Classic Serialization. Classic and Typed actor references have the same serialization format so they can be interchanged.

Deep serialization of Actors

The recommended approach to do deep serialization of internal actor state is to use Pekko Persistence.

Serialization of Pekko’s messages

Pekko is using a Protobuf 3 for serialization of messages defined by Pekko. This dependency is shaded in the pekko-protobuf-v3 artifact so that applications can use another version of Protobuf.

Applications should use standard Protobuf dependency and not pekko-protobuf-v3.

Java serialization

Java serialization is known to be slow and prone to attacks of various kinds - it never was designed for high throughput messaging after all. One may think that network bandwidth and latency limit the performance of remote messaging, but serialization is a more typical bottleneck.

Note

Pekko serialization with Java serialization is disabled by default and Pekko itself doesn’t use Java serialization for any of its internal messages. It is highly discouraged to enable Java serialization in production.

The log messages emitted by the disabled Java serializer in production SHOULD be treated as potential attacks which the serializer prevented, as they MAY indicate an external operator attempting to send malicious messages intending to use java serialization as attack vector. The attempts are logged with the SECURITY marker.

However, for early prototyping it is very convenient to use. For that reason and for compatibility with older systems that rely on Java serialization it can be enabled with the following configuration:

pekko.actor.allow-java-serialization = on

Pekko will still log warning when Java serialization is used and to silent that you may add:

pekko.actor.warn-about-java-serializer-usage = off

Java serialization compatibility

It is not safe to mix major Scala versions when using the Java serialization as Scala does not guarantee compatibility and this could lead to very surprising errors.

Rolling updates

A serialized remote message (or persistent event) consists of serializer-id, the manifest, and the binary payload. When deserializing it is only looking at the serializer-id to pick which SerializerJSerializer to use for fromBinaryfromBinaryJava. The message class (the bindings) is not used for deserialization. The manifest is only used within the SerializerJSerializer to decide how to deserialize the payload, so one SerializerJSerializer can handle many classes.

That means that it is possible to change serialization for a message by performing two rolling update steps to switch to the new serializer.

  1. Add the SerializerJSerializer class and define it in pekko.actor.serializers config section, but not in pekko.actor.serialization-bindings. Perform a rolling update for this change. This means that the serializer class exists on all nodes and is registered, but it is still not used for serializing any messages. That is important because during the rolling update the old nodes still don’t know about the new serializer and would not be able to deserialize messages with that format.

  2. The second change is to register that the serializer is to be used for certain classes by defining those in the pekko.actor.serialization-bindings config section. Perform a rolling update for this change. This means that new nodes will use the new serializer when sending messages and old nodes will be able to deserialize the new format. Old nodes will continue to use the old serializer when sending messages and new nodes will be able to deserialize the old format.

As an optional third step the old serializer can be completely removed if it was not used for persistent events. It must still be possible to deserialize the events that were stored with the old serializer.

Verification

Normally, messages sent between local actors (i.e. same JVM) do not undergo serialization. For testing, sometimes, it may be desirable to force serialization on all messages (both remote and local). If you want to do this in order to verify that your messages are serializable you can enable the following config option:

sourcepekko {
  actor {
    serialize-messages = on
  }
}

Certain messages can be excluded from verification by extending the marker traitinterface actor.NoSerializationVerificationNeededactor.NoSerializationVerificationNeeded or define a class name prefix in configuration pekko.actor.no-serialization-verification-needed-class-prefix.

If you want to verify that your PropsProps are serializable you can enable the following config option:

sourcepekko {
  actor {
    serialize-creators = on
  }
}
Warning

We recommend having these config options turned on only when you’re running tests. Turning these options on in production is pointless, as it would negatively impact the performance of local message passing without giving any gain.