I/O

Dependency

To use I/O, you must add the following dependency in your project:

sbt
val PekkoVersion = "1.0.3"
libraryDependencies += "org.apache.pekko" %% "pekko-actor" % PekkoVersion
Maven
<properties>
  <scala.binary.version>2.13</scala.binary.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.pekko</groupId>
      <artifactId>pekko-bom_${scala.binary.version}</artifactId>
      <version>1.0.3</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-actor_${scala.binary.version}</artifactId>
  </dependency>
</dependencies>
Gradle
def versions = [
  ScalaBinary: "2.13"
]
dependencies {
  implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.3")

  implementation "org.apache.pekko:pekko-actor_${versions.ScalaBinary}"
}

Introduction

The pekko.io package design combines experiences from the spray-io module with improvements that were jointly developed for more general consumption as an actor-based service.

The guiding design goal for this I/O implementation was to reach extreme scalability, make no compromises in providing an API correctly matching the underlying transport mechanism and to be fully event-driven, non-blocking and asynchronous. The API is meant to be a solid foundation for the implementation of network protocols and building higher abstractions; it is not meant to be a full-service high-level NIO wrapper for end users.

Terminology, Concepts

The I/O API is completely actor based, meaning that all operations are implemented with message passing instead of direct method calls. Every I/O driver (TCP, UDP) has a special actor, called a manager that serves as an entry point for the API. I/O is broken into several drivers. The manager for a particular driver is accessible through the IO entry pointby querying an ActorSystem. For example the following code looks up the TCP manager and returns its ActorRefActorRef:

Scala
sourceimport org.apache.pekko.io.{ IO, Tcp }
import context.system // implicitly used by IO(Tcp)

val manager = IO(Tcp)
Java
sourcefinal ActorRef tcpManager = Tcp.get(getContext().getSystem()).manager();

The manager receives I/O command messages and instantiates worker actors in response. The worker actors present themselves to the API user in the reply to the command that was sent. For example after a ConnectConnect command sent to the TCP manager the manager creates an actor representing the TCP connection. All operations related to the given TCP connections can be invoked by sending messages to the connection actor which announces itself by sending a ConnectedConnected message.

DeathWatch and Resource Management

I/O worker actors receive commands and also send out events. They usually need a user-side counterpart actor listening for these events (such events could be inbound connections, incoming bytes or acknowledgements for writes). These worker actors watch their listener counterparts. If the listener stops then the worker will automatically release any resources that it holds. This design makes the API more robust against resource leaks.

Thanks to the completely actor based approach of the I/O API the opposite direction works as well: a user actor responsible for handling a connection can watch the connection actor to be notified if it unexpectedly terminates.

Write models (Ack, Nack)

I/O devices have a maximum throughput which limits the frequency and size of writes. When an application tries to push more data than a device can handle, the driver has to buffer bytes until the device is able to write them. With buffering it is possible to handle short bursts of intensive writes — but no buffer is infinite. “Flow control” is needed to avoid overwhelming device buffers.

Pekko supports two types of flow control:

  • Ack-based, where the driver notifies the writer when writes have succeeded.
  • Nack-based, where the driver notifies the writer when writes have failed.

Each of these models is available in both the TCP and the UDP implementations of Pekko I/O.

Individual writes can be acknowledged by providing an ack object in the write message (WriteWrite in the case of TCP and SendSend for UDP). When the write is complete the worker will send the ack object to the writing actor. This can be used to implement ack-based flow control; sending new data only when old data has been acknowledged.

If a write (or any other command) fails, the driver notifies the actor that sent the command with a special message (CommandFailed in the case of UDP and TCP). This message will also notify the writer of a failed write, serving as a nack for that write. Please note, that in a nack-based flow-control setting the writer has to be prepared for the fact that the failed write might not be the most recent write it sent. For example, the failure notification for a write W1 might arrive after additional write commands W2 and W3 have been sent. If the writer wants to resend any nacked messages it may need to keep a buffer of pending messages.

Warning

An acknowledged write does not mean acknowledged delivery or storage; receiving an ack for a write signals that the I/O driver has successfully processed the write. The Ack/Nack protocol described here is a means of flow control not error handling. In other words, data may still be lost, even if every write is acknowledged.

ByteString

To maintain isolation, actors should communicate with immutable objects only. ByteStringByteString is an immutable container for bytes. It is used by Pekko’s I/O system as an efficient, immutable alternative the traditional byte containers used for I/O on the JVM, such as Array[Byte]byte[] and ByteBuffer.

ByteString is a rope-like data structure that is immutable and provides fast concatenation and slicing operations (perfect for I/O). When two ByteStrings are concatenated together they are both stored within the resulting ByteString instead of copying both to a new Arrayarray. Operations such as dropdrop and taketake return ByteStrings that still reference the original Arrayarray, but just change the offset and length that is visible. Great care has also been taken to make sure that the internal Arrayarray cannot be modified. Whenever a potentially unsafe Arrayarray is used to create a new ByteString a defensive copy is created. If you require a ByteString that only blocks as much memory as necessary for its content, use the compactcompact method to get a CompactByteStringCompactByteString instance. If the ByteString represented only a slice of the original array, this will result in copying all bytes in that slice.

ByteString inherits all methods from IndexedSeq, and it also has some new ones. For more information, look up the util.ByteStringutil.ByteString class and its companion object in the ScalaDoc.

ByteString also comes with its own optimized builder and iterator classes ByteStringBuilderByteStringBuilder and ByteIteratorByteIterator which provide extra features in addition to those of normal builders and iterators.

Compatibility with java.io

A ByteStringBuilderByteStringBuilder can be wrapped in a java.io.OutputStream via the asOutputStreamasOutputStream method. Likewise, ByteIteratorByteIterator can be wrapped in a java.io.InputStream via asInputStreamasInputStream. Using these, pekko.io applications can integrate legacy code based on java.io streams.

Architecture in-depth

For further details on the design and internal architecture see I/O Layer Design.