Google Cloud Pub/Sub gRPC

Note

Google Cloud Pub/Sub provides many-to-many, asynchronous messaging that decouples senders and receivers.

Further information at the official Google Cloud documentation website.

This connector communicates with Pub/Sub via the gRPC protocol. The integration between Apache Pekko Stream and gRPC is handled by Apache Pekko gRPC 1.0. For a connector that uses HTTP for the communication, take a look at the alternative Apache Pekko Connectors Google Cloud Pub/Sub connector.

Project Info: Apache Pekko Connectors Google Cloud Pub/Sub (gRPC)

Artifact: org.apache.pekko:pekko-connectors-google-cloud-pub-sub-grpc:1.0.2
JDK versions: OpenJDK 8, OpenJDK 11, OpenJDK 17
Scala versions: 2.13.13, 2.12.19, 3.3.3
JPMS module name: pekko.stream.connectors.google.cloud.pubsub.grpc
Release notes: GitHub releases
Issues: GitHub issues
Sources: https://github.com/apache/pekko-connectors

Artifacts

Apache Pekko gRPC uses Apache Pekko Discovery internally. Make sure to add Apache Pekko Discovery with the same Apache Pekko version that the application uses.

sbt
val PekkoVersion = "1.0.2"
libraryDependencies ++= Seq(
  "org.apache.pekko" %% "pekko-connectors-google-cloud-pub-sub-grpc" % "1.0.2",
  "org.apache.pekko" %% "pekko-stream" % PekkoVersion,
  "org.apache.pekko" %% "pekko-discovery" % PekkoVersion
)
Maven
<properties>
  <pekko.version>1.0.2</pekko.version>
  <scala.binary.version>2.13</scala.binary.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-connectors-google-cloud-pub-sub-grpc_${scala.binary.version}</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-stream_${scala.binary.version}</artifactId>
    <version>${pekko.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-discovery_${scala.binary.version}</artifactId>
    <version>${pekko.version}</version>
  </dependency>
</dependencies>
Gradle
def versions = [
  PekkoVersion: "1.0.2",
  ScalaBinary: "2.13"
]
dependencies {
  implementation "org.apache.pekko:pekko-connectors-google-cloud-pub-sub-grpc_${versions.ScalaBinary}:1.0.2"
  implementation "org.apache.pekko:pekko-stream_${versions.ScalaBinary}:${versions.PekkoVersion}"
  implementation "org.apache.pekko:pekko-discovery_${versions.ScalaBinary}:${versions.PekkoVersion}"
}


Binary compatibility

Warning

This connector contains code generated from Protobuf files which is bound to Apache Pekko gRPC 1.0. This makes it NOT binary-compatible with later versions of Apache Pekko gRPC. You cannot use a different version of Apache Pekko gRPC within the same JVM instance.

Build setup

The Apache Pekko Connectors Google Cloud Pub/Sub gRPC library contains the classes generated from Google’s protobuf specification.

ALPN on JDK 8

HTTP/2 requires ALPN negotiation, which comes with the JDK starting with version 8u251.

For older versions of the JDK you will need to load the jetty-alpn-agent yourself, but we recommend upgrading.
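
If you do need the agent on an older JDK 8 build, it is loaded like any other Java agent when starting the JVM; the jar path below is a placeholder:

java -javaagent:/path/to/jetty-alpn-agent.jar -jar my-app.jar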

Configuration

The Pub/Sub gRPC connector shares its basic configuration with all the Google connectors in Apache Pekko Connectors. Additional Pub/Sub-specific configuration settings can be found in its own reference.conf.

The defaults can be changed (for example when testing against the emulator) by tweaking the reference configuration:

reference.conf
# SPDX-License-Identifier: Apache-2.0

pekko.connectors.google.credentials.default-scopes = ${?pekko.connectors.google.credentials.default-scopes} ["https://www.googleapis.com/auth/pubsub"]

pekko.connectors.google.cloud.pubsub.grpc {
  host = "pubsub.googleapis.com"
  port = 443
  # Set to "false" to disable TLS
  use-tls = true

  # Set to "none" to use the system default CA
  rootCa = "none"

  # Deprecated, use config path pekko.connectors.google.credentials.provider
  callCredentials = deprecated
}
Test Configuration
# SPDX-License-Identifier: Apache-2.0

pekko {
  loggers = ["org.apache.pekko.event.slf4j.Slf4jLogger"]
  logging-filter = "org.apache.pekko.event.slf4j.Slf4jLoggingFilter"
  loglevel = "DEBUG"
}

pekko.connectors.google.cloud.pubsub.grpc {
  # To run the IntegrationSpec against Google Cloud:
  # * go to the console at https://console.cloud.google.com
  # * Create a compute engine service account as documented at https://cloud.google.com/docs/authentication/production#creating_a_service_account
  # * Point GOOGLE_APPLICATION_CREDENTIALS to the downloaded JSON key and start sbt
  # * Create a project, and update IntegrationSpec to use that project ID rather than "pekko-connectors"
  # * Under 'Pub/Sub', 'Topics' create a topic 'simpleTopic' with a Google-managed key
  # * Under 'Pub/Sub', 'Subscriptions' create a subscription 'simpleSubscription' for this topic
  # * For 'republish', also create 'testTopic' and 'testSubscription'
  # * Comment out these test settings:

  host = "localhost"
  port = 8538
  use-tls = false # no TLS
  rootCa = "none"
  callCredentials = "none" # no authentication
}

For more configuration details see the underlying Apache Pekko gRPC configuration.

A manually initialized GrpcPublisher or GrpcSubscriber can be used by providing it as an attribute to the stream:

Scala
val settings = PubSubSettings(system)
val publisher = GrpcPublisher(settings)

val publishFlow: Flow[PublishRequest, PublishResponse, NotUsed] =
  GooglePubSub
    .publish(parallelism = 1)
    .withAttributes(PubSubAttributes.publisher(publisher))
Java
final PubSubSettings settings = PubSubSettings.create(system);
final GrpcPublisher publisher = GrpcPublisher.create(settings, system);

final Flow<PublishRequest, PublishResponse, NotUsed> publishFlow =
    GooglePubSub.publish(1).withAttributes(PubSubAttributes.publisher(publisher));

Publishing

We first construct a message and then a request using Google’s builders. We declare a singleton source which will go via our publishing flow. All messages sent to the flow are published to Pub/Sub.

Scala
import org.apache.pekko
import pekko.NotUsed
import pekko.stream.connectors.googlecloud.pubsub.grpc.scaladsl.GooglePubSub
import pekko.stream.scaladsl._

import com.google.protobuf.ByteString
import com.google.pubsub.v1.pubsub._

import scala.concurrent.Future

val projectId = "pekko-connectors"
val topic = "simpleTopic"

val publishMessage: PubsubMessage =
  PubsubMessage()
    .withData(ByteString.copyFromUtf8("Hello world!"))

val publishRequest: PublishRequest =
  PublishRequest()
    .withTopic(s"projects/$projectId/topics/$topic")
    .addMessages(publishMessage)

val source: Source[PublishRequest, NotUsed] =
  Source.single(publishRequest)

val publishFlow: Flow[PublishRequest, PublishResponse, NotUsed] =
  GooglePubSub.publish(parallelism = 1)

val publishedMessageIds: Future[Seq[PublishResponse]] = source.via(publishFlow).runWith(Sink.seq)
Java
import org.apache.pekko.NotUsed;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.PubSubSettings;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.javadsl.GooglePubSub;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.javadsl.GrpcPublisher;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.javadsl.PubSubAttributes;
import org.apache.pekko.stream.javadsl.*;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.*;

import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletionStage;

final String projectId = "pekko-connectors";
final String topic = "simpleTopic";

final PubsubMessage publishMessage =
    PubsubMessage.newBuilder().setData(ByteString.copyFromUtf8("Hello world!")).build();

final PublishRequest publishRequest =
    PublishRequest.newBuilder()
        .setTopic("projects/" + projectId + "/topics/" + topic)
        .addMessages(publishMessage)
        .build();

final Source<PublishRequest, NotUsed> source = Source.single(publishRequest);

final Flow<PublishRequest, PublishResponse, NotUsed> publishFlow = GooglePubSub.publish(1);

final CompletionStage<List<PublishResponse>> publishedMessageIds =
    source.via(publishFlow).runWith(Sink.seq(), system);

Similarly, we can publish a batch of messages for greater efficiency.

Scala
val projectId = "pekko-connectors"
val topic = "simpleTopic"

val publishMessage: PubsubMessage =
  PubsubMessage()
    .withData(ByteString.copyFromUtf8("Hello world!"))

val messageSource: Source[PubsubMessage, NotUsed] = Source(List(publishMessage, publishMessage))
val published = messageSource
  .groupedWithin(1000, 1.minute)
  .map { msgs =>
    PublishRequest()
      .withTopic(s"projects/$projectId/topics/$topic")
      .addAllMessages(msgs)
  }
  .via(GooglePubSub.publish(parallelism = 1))
  .runWith(Sink.seq)
Java
final String projectId = "pekko-connectors";
final String topic = "simpleTopic";

final PubsubMessage publishMessage =
    PubsubMessage.newBuilder().setData(ByteString.copyFromUtf8("Hello world!")).build();

final Source<PubsubMessage, NotUsed> messageSource = Source.single(publishMessage);
final CompletionStage<List<PublishResponse>> published =
    messageSource
        .groupedWithin(1000, Duration.ofMinutes(1))
        .map(
            messages ->
                PublishRequest.newBuilder()
                    .setTopic("projects/" + projectId + "/topics/" + topic)
                    .addAllMessages(messages)
                    .build())
        .via(GooglePubSub.publish(1))
        .runWith(Sink.seq(), system);

Subscribing

To receive messages from a subscription, there are two options: StreamingPullRequests or synchronous PullRequests. To decide which one to use, see "StreamingPull: Dealing with large backlogs of small messages" and "Synchronous Pull" in Google Cloud Pub/Sub’s documentation.

StreamingPullRequest

To receive messages from a subscription, first create a StreamingPullRequest with the fully-qualified resource string (FQRS) of the subscription and a deadline for acknowledgements in seconds. Google requires that only the first StreamingPullRequest has the subscription and the deadline set; this connector takes care of that by clearing the subscription FQRS and the deadline on subsequent StreamingPullRequest messages.

Scala
val projectId = "pekko-connectors"
val subscription = "simpleSubscription"

val request = StreamingPullRequest()
  .withSubscription(s"projects/$projectId/subscriptions/$subscription")
  .withStreamAckDeadlineSeconds(10)

val subscriptionSource: Source[ReceivedMessage, Future[Cancellable]] =
  GooglePubSub.subscribe(request, pollInterval = 1.second)
Java
final String projectId = "pekko-connectors";
final String subscription = "simpleSubscription";

final StreamingPullRequest request =
    StreamingPullRequest.newBuilder()
        .setSubscription("projects/" + projectId + "/subscriptions/" + subscription)
        .setStreamAckDeadlineSeconds(10)
        .build();

final Duration pollInterval = Duration.ofSeconds(1);
final Source<ReceivedMessage, CompletableFuture<Cancellable>> subscriptionSource =
    GooglePubSub.subscribe(request, pollInterval);

Here pollInterval is the time between subsequent StreamingPullRequests when there are no messages in the subscription.
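
The source materializes a Future[Cancellable] (a CompletableFuture<Cancellable> in Java) which can be used to stop polling. A minimal Scala sketch, assuming an implicit ActorSystem named system is in scope and reusing subscriptionSource from above:

Scala
import org.apache.pekko.stream.scaladsl.{ Keep, Sink }

// run the source, keeping both the Cancellable handle and the completion future
val (cancellable, done) =
  subscriptionSource
    .toMat(Sink.foreach(received => println(received.ackId)))(Keep.both)
    .run()

// later: cancel to stop issuing new StreamingPullRequests
import system.dispatcher
cancellable.foreach(_.cancel())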

PullRequest

With PullRequest, each request receives a batch of messages, up to the maximum specified by the maxMessages parameter.

Scala
val projectId = "pekko-connectors"
val subscription = "simpleSubscription"

val request = PullRequest()
  .withSubscription(s"projects/$projectId/subscriptions/$subscription")
  .withMaxMessages(10)

val subscriptionSource: Source[ReceivedMessage, Future[Cancellable]] =
  GooglePubSub.subscribePolling(request, pollInterval = 1.second)
Java
final String projectId = "pekko-connectors";
final String subscription = "simpleSubscription";

final PullRequest request =
    PullRequest.newBuilder()
        .setSubscription("projects/" + projectId + "/subscriptions/" + subscription)
        .setMaxMessages(10)
        .build();

final Duration pollInterval = Duration.ofSeconds(1);
final Source<ReceivedMessage, CompletableFuture<Cancellable>> subscriptionSource =
    GooglePubSub.subscribePolling(request, pollInterval);

Here pollInterval is the time between PullRequest messages.

To minimise latency between requests you can add a buffer to the source. The right buffer size depends on how many messages you typically receive per request: if you usually receive the maximum number of messages, it’s a good idea to set the buffer size equal to the maxMessages parameter. Note that messages in a buffer spend part of their lease time waiting there, which reduces the time left to process them before the acknowledgement deadline is reached; how much headroom you have depends on your acknowledgement deadline and processing time.
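
A minimal sketch of such a buffer in Scala, reusing subscriptionSource and the maxMessages value (10) from the PullRequest example above:

Scala
import org.apache.pekko.stream.OverflowStrategy

// hold up to one full batch locally while the next PullRequest is in flight;
// backpressure instead of dropping once the buffer is full
val bufferedSource =
  subscriptionSource.buffer(10, OverflowStrategy.backpressure)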

Acknowledge

Messages received from the subscription need to be acknowledged or they will be sent again. To do that, create an AcknowledgeRequest containing the ackIds of the messages to be acknowledged and send it to a sink created by GooglePubSub.acknowledge.

Scala
val ackSink: Sink[AcknowledgeRequest, Future[Done]] =
  GooglePubSub.acknowledge(parallelism = 1)

subscriptionSource
  .map { message =>
    // do something fun
    message.ackId
  }
  .groupedWithin(10, 1.second)
  .map(ids =>
    AcknowledgeRequest()
      .withSubscription(
        s"projects/$projectId/subscriptions/$subscription")
      .withAckIds(ids))
  .to(ackSink)
Java
final Sink<AcknowledgeRequest, CompletionStage<Done>> ackSink = GooglePubSub.acknowledge(1);

subscriptionSource
    .map(
        receivedMessage -> {
          // do some computation
          return receivedMessage.getAckId();
        })
    .groupedWithin(10, Duration.ofSeconds(1))
    .map(
        acks ->
            AcknowledgeRequest.newBuilder()
                .setSubscription("projects/" + projectId + "/subscriptions/" + subscription)
                .addAllAckIds(acks)
                .build())
    .to(ackSink);
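
Note that to only connects the stages; the stream still has to be materialized with run(). A Scala sketch, assuming an implicit ActorSystem is in scope; running the graph yields the source’s Future[Cancellable], which can later stop the subscription:

Scala
import org.apache.pekko.actor.Cancellable

import scala.concurrent.Future
import scala.concurrent.duration._

// connect the subscription source to the acknowledge sink and run it
val running: Future[Cancellable] =
  subscriptionSource
    .map(_.ackId)
    .groupedWithin(10, 1.second)
    .map(ids =>
      AcknowledgeRequest()
        .withSubscription(s"projects/$projectId/subscriptions/$subscription")
        .withAckIds(ids))
    .to(ackSink)
    .run()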

Running the test code

Note

Integration test code requires a Google Cloud Pub/Sub emulator running in the background. You can start it quickly using Docker:

docker-compose up -d gcloud-pubsub-client

This will also run the Pub/Sub admin client that will create topics and subscriptions used by the integration tests.

Tests can be started from sbt by running:

sbt
> google-cloud-pub-sub-grpc/test

There is also an ExampleApp that can be used to test publishing to topics and receiving messages from subscriptions.

To run the example app you will need to configure a project and Pub/Sub in Google Cloud and provide your own credentials.

sbt
env GOOGLE_APPLICATION_CREDENTIALS=/path/to/application/credentials.json sbt

// receive messages from a subscription
> google-cloud-pub-sub-grpc/Test/run subscribe <project-id> <subscription-name>

// publish a single message to a topic
> google-cloud-pub-sub-grpc/Test/run publish-single <project-id> <topic-name>

// continually publish a message stream to a topic
> google-cloud-pub-sub-grpc/Test/run publish-stream <project-id> <topic-name>