I/O
Dependency
To use I/O, you must add the following dependency to your project:
- sbt
  val PekkoVersion = "1.1.2"
  libraryDependencies += "org.apache.pekko" %% "pekko-actor" % PekkoVersion
- Maven
  <properties>
    <scala.binary.version>2.13</scala.binary.version>
  </properties>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.pekko</groupId>
        <artifactId>pekko-bom_${scala.binary.version}</artifactId>
        <version>1.1.2</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.pekko</groupId>
      <artifactId>pekko-actor_${scala.binary.version}</artifactId>
    </dependency>
  </dependencies>
- Gradle
  def versions = [
    ScalaBinary: "2.13"
  ]
  dependencies {
    implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.1.2")

    implementation "org.apache.pekko:pekko-actor_${versions.ScalaBinary}"
  }
Introduction
The pekko.io package design combines experiences from the spray-io module with improvements that were jointly developed for more general consumption as an actor-based service.
The guiding design goals for this I/O implementation were to reach extreme scalability, to make no compromises in providing an API that correctly matches the underlying transport mechanism, and to be fully event-driven, non-blocking and asynchronous. The API is meant to be a solid foundation for implementing network protocols and building higher abstractions; it is not meant to be a full-service, high-level NIO wrapper for end users.
Terminology, Concepts
The I/O API is completely actor-based, meaning that all operations are implemented with message passing instead of direct method calls. I/O is broken into several drivers. Every I/O driver (TCP, UDP) has a special actor, called a manager, that serves as an entry point for the API. The manager for a particular driver is accessible through the IO entry point by querying an ActorSystem. For example, the following code looks up the TCP manager and returns its ActorRef:
- Scala
  import org.apache.pekko.io.{ IO, Tcp }

  import context.system // implicitly used by IO(Tcp)
  val manager = IO(Tcp)
- Java
  final ActorRef tcpManager = Tcp.get(getContext().getSystem()).manager();
The manager receives I/O command messages and instantiates worker actors in response. The worker actors present themselves to the API user in the reply to the command that was sent. For example, after a Connect command is sent to the TCP manager, the manager creates an actor representing the TCP connection. All operations related to that TCP connection can be invoked by sending messages to the connection actor, which announces itself by sending a Connected message.
DeathWatch and Resource Management
I/O worker actors receive commands and also send out events. They usually need a user-side counterpart actor listening for these events (such events could be inbound connections, incoming bytes or acknowledgements for writes). These worker actors watch their listener counterparts. If the listener stops then the worker will automatically release any resources that it holds. This design makes the API more robust against resource leaks.
Thanks to the completely actor-based approach of the I/O API, the opposite direction works as well: a user actor responsible for handling a connection can watch the connection actor to be notified if it terminates unexpectedly.
Write models (Ack, Nack)
I/O devices have a maximum throughput which limits the frequency and size of writes. When an application tries to push more data than a device can handle, the driver has to buffer bytes until the device is able to write them. With buffering it is possible to handle short bursts of intensive writes — but no buffer is infinite. “Flow control” is needed to avoid overwhelming device buffers.
Pekko supports two types of flow control:
- Ack-based, where the driver notifies the writer when writes have succeeded.
- Nack-based, where the driver notifies the writer when writes have failed.
Each of these models is available in both the TCP and the UDP implementations of Pekko I/O.
Individual writes can be acknowledged by providing an ack object in the write message (Write in the case of TCP and Send for UDP). When the write is complete, the worker sends the ack object to the writing actor. This can be used to implement ack-based flow control: new data is sent only when old data has been acknowledged.
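The ack-based model described above can be sketched without any Pekko dependency. The following is a toy model only: `AckWriterSketch`, its `ACK` object and its method names are hypothetical and are not part of the Pekko API. It shows a writer that hands one chunk at a time to the driver and holds the rest back until the ack object comes back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of ack-based flow control; it does not use the Pekko API. */
final class AckWriterSketch {
    /** Hypothetical stand-in for the ack object attached to a write message. */
    static final Object ACK = new Object();

    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> sentToDriver = new ArrayList<>();
    private boolean awaitingAck = false;

    /** Queues a chunk and hands it to the driver only if no write is in flight. */
    void write(String chunk) {
        pending.addLast(chunk);
        sendNextIfIdle();
    }

    /** Called for every message the writer receives; only the ack object is handled. */
    void onMessage(Object message) {
        if (message == ACK) {
            awaitingAck = false;
            sendNextIfIdle();
        }
    }

    private void sendNextIfIdle() {
        if (!awaitingAck && !pending.isEmpty()) {
            // Stands in for sending a write command that carries the ack object.
            sentToDriver.add(pending.removeFirst());
            awaitingAck = true;
        }
    }

    /** Chunks handed to the driver so far, in order. */
    List<String> sentToDriver() {
        return sentToDriver;
    }

    /** Chunks still held back because an earlier write is unacknowledged. */
    int pendingCount() {
        return pending.size();
    }
}
```

With this shape, at most one write is outstanding at any time, so the writer's own queue (rather than the driver's buffer) absorbs bursts. A real implementation might allow several unacknowledged writes in flight to improve throughput.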
If a write (or any other command) fails, the driver notifies the actor that sent the command with a special message (CommandFailed in the case of UDP and TCP). This message also notifies the writer of a failed write, serving as a nack for that write. Note that in a nack-based flow-control setting, the writer has to be prepared for the fact that the failed write might not be the most recent write it sent. For example, the failure notification for a write W1 might arrive after additional write commands W2 and W3 have been sent. If the writer wants to resend any nacked messages, it may need to keep a buffer of pending messages.
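The buffering requirement can be illustrated with a small model. This is a sketch under assumptions, not Pekko code: `NackWriterSketch`, the integer sequence numbers and the `onAck`/`onNack` callbacks are hypothetical names chosen for illustration. The policy of resending the failed write together with everything sent after it is one possible choice that preserves ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a writer under nack-based flow control; not the Pekko API. */
final class NackWriterSketch {
    // Writes that were sent but not yet confirmed, keyed by sequence number.
    private final TreeMap<Integer, String> inFlight = new TreeMap<>();
    private int nextSeq = 1;

    /** Sends immediately without waiting, but remembers the write. Returns its sequence number. */
    int write(String chunk) {
        int seq = nextSeq++;
        inFlight.put(seq, chunk);
        return seq;
    }

    /** The driver confirmed this write, so it no longer needs to be buffered. */
    void onAck(int seq) {
        inFlight.remove(seq);
    }

    /**
     * The driver reported that write seq failed. Later writes may already have
     * been sent, so this returns the failed write and all buffered writes after
     * it, in sending order, as candidates for resending.
     */
    List<String> onNack(int seq) {
        return new ArrayList<>(inFlight.tailMap(seq, true).values());
    }

    /** Number of writes currently buffered. */
    int bufferedCount() {
        return inFlight.size();
    }
}
```

In the scenario from the text, the writer sends W1, W2 and W3 and only then learns that W1 failed. Calling `onNack` with the sequence number of W1 yields all three writes, because none of them has been confirmed yet.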
An acknowledged write does not mean acknowledged delivery or storage; receiving an ack for a write only signals that the I/O driver has successfully processed the write. The Ack/Nack protocol described here is a means of flow control, not error handling. In other words, data may still be lost even if every write is acknowledged.
ByteString
To maintain isolation, actors should communicate with immutable objects only. ByteString is an immutable container for bytes. It is used by Pekko’s I/O system as an efficient, immutable alternative to the traditional byte containers used for I/O on the JVM, such as Array[Byte] (byte[] in Java) and ByteBuffer.
ByteString is a rope-like data structure that is immutable and provides fast concatenation and slicing operations (perfect for I/O). When two ByteStrings are concatenated, both are stored within the resulting ByteString instead of being copied to a new array. Operations such as drop and take return ByteStrings that still reference the original array and only change the offset and length that is visible. Great care has also been taken to make sure that the internal array cannot be modified. Whenever a potentially unsafe array is used to create a new ByteString, a defensive copy is made. If you require a ByteString that only occupies as much memory as necessary for its content, use the compact method to get a CompactByteString instance. If the ByteString represented only a slice of the original array, this will result in copying all bytes in that slice.
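To make the rope idea concrete, here is a deliberately simplified model in plain Java. It is not Pekko's ByteString and does not reflect its actual internals: `RopeBytesSketch`, `Slice` and `retainedBytes` are hypothetical names. The sketch only demonstrates the behaviours described above: concatenation without copying, slicing by offset and length, a defensive copy on creation, and a compacting copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy rope of byte slices; a simplified model, not Pekko's ByteString. */
final class RopeBytesSketch {
    /** A view onto part of a shared array; slices differ only in offset and length. */
    private static final class Slice {
        final byte[] array;
        final int offset;
        final int length;

        Slice(byte[] array, int offset, int length) {
            this.array = array;
            this.offset = offset;
            this.length = length;
        }
    }

    private final List<Slice> slices;

    private RopeBytesSketch(List<Slice> slices) {
        this.slices = slices;
    }

    /** Defensive copy: the caller still holds the input array and could mutate it. */
    static RopeBytesSketch copyOf(byte[] bytes) {
        List<Slice> single = new ArrayList<>();
        single.add(new Slice(bytes.clone(), 0, bytes.length));
        return new RopeBytesSketch(single);
    }

    /** Concatenation keeps the slices of both operands; no bytes are copied. */
    RopeBytesSketch concat(RopeBytesSketch that) {
        List<Slice> all = new ArrayList<>(slices);
        all.addAll(that.slices);
        return new RopeBytesSketch(all);
    }

    /** Drops the first n bytes by adjusting offsets; the arrays stay shared. */
    RopeBytesSketch drop(int n) {
        List<Slice> out = new ArrayList<>();
        int toSkip = Math.max(0, n);
        for (Slice s : slices) {
            if (toSkip >= s.length) {
                toSkip -= s.length;
                continue;
            }
            out.add(new Slice(s.array, s.offset + toSkip, s.length - toSkip));
            toSkip = 0;
        }
        return new RopeBytesSketch(out);
    }

    /** Keeps the first n bytes by adjusting lengths; the arrays stay shared. */
    RopeBytesSketch take(int n) {
        List<Slice> out = new ArrayList<>();
        int remaining = Math.max(0, n);
        for (Slice s : slices) {
            if (remaining == 0) {
                break;
            }
            int len = Math.min(remaining, s.length);
            out.add(new Slice(s.array, s.offset, len));
            remaining -= len;
        }
        return new RopeBytesSketch(out);
    }

    /** Copies the visible bytes into a single, exactly sized array. */
    RopeBytesSketch compact() {
        byte[] exact = toArray();
        List<Slice> single = new ArrayList<>();
        single.add(new Slice(exact, 0, exact.length));
        return new RopeBytesSketch(single);
    }

    /** Returns a fresh array containing only the visible bytes. */
    byte[] toArray() {
        int total = 0;
        for (Slice s : slices) {
            total += s.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Slice s : slices) {
            System.arraycopy(s.array, s.offset, out, pos, s.length);
            pos += s.length;
        }
        return out;
    }

    String utf8String() {
        return new String(toArray(), StandardCharsets.UTF_8);
    }

    /** Sum of the full lengths of the arrays referenced, counted once per slice. */
    int retainedBytes() {
        int total = 0;
        for (Slice s : slices) {
            total += s.array.length;
        }
        return total;
    }
}
```

In this model, taking a 3-byte slice of a 5-byte array still keeps all 5 bytes reachable, while calling `compact()` on that slice leaves only a 3-byte array referenced. This is the trade-off the paragraph above describes: slicing is cheap, and compacting pays for a copy to release unused memory.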
ByteString inherits all methods from IndexedSeq, and it also has some new ones. For more information, look up the util.ByteString class and its companion object in the ScalaDoc.
ByteString also comes with its own optimized builder and iterator classes, ByteStringBuilder and ByteIterator, which provide extra features in addition to those of normal builders and iterators.
Compatibility with java.io
A ByteStringBuilder can be wrapped in a java.io.OutputStream via the asOutputStream method. Likewise, a ByteIterator can be wrapped in a java.io.InputStream via asInputStream. Using these, pekko.io applications can integrate legacy code based on java.io streams.
Architecture in-depth
For further details on the design and internal architecture see I/O Layer Design.