Google Cloud Pub/Sub gRPC
Google Cloud Pub/Sub provides many-to-many, asynchronous messaging that decouples senders and receivers.
Further information is available in the official Google Cloud documentation.
This connector communicates with Pub/Sub via the gRPC protocol. The integration between Apache Pekko Stream and gRPC is handled by Apache Pekko gRPC 1.0. For a connector that uses HTTP for the communication, take a look at the alternative Apache Pekko Connectors Google Cloud Pub/Sub connector.
Project Info: Apache Pekko Connectors Google Cloud PubSub (gRPC)

| | |
|---|---|
| Artifact | org.apache.pekko : pekko-connectors-google-cloud-pub-sub-grpc : 1.0.2 |
| JDK versions | OpenJDK 8, OpenJDK 11, OpenJDK 17 |
| Scala versions | 2.13.14, 2.12.20, 3.3.3 |
| JPMS module name | pekko.stream.connectors.google.cloud.pubsub.grpc |
| Release notes | GitHub releases |
| Issues | GitHub issues |
| Sources | https://github.com/apache/pekko-connectors |
Artifacts
Apache Pekko gRPC uses Apache Pekko Discovery internally. Make sure to add Apache Pekko Discovery with the same Apache Pekko version that the application uses.
- sbt

```scala
val PekkoVersion = "1.0.3"
libraryDependencies ++= Seq(
  "org.apache.pekko" %% "pekko-connectors-google-cloud-pub-sub-grpc" % "1.0.2",
  "org.apache.pekko" %% "pekko-stream" % PekkoVersion,
  "org.apache.pekko" %% "pekko-discovery" % PekkoVersion
)
```
- Maven

```xml
<properties>
  <pekko.version>1.0.3</pekko.version>
  <scala.binary.version>2.13</scala.binary.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-connectors-google-cloud-pub-sub-grpc_${scala.binary.version}</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-stream_${scala.binary.version}</artifactId>
    <version>${pekko.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-discovery_${scala.binary.version}</artifactId>
    <version>${pekko.version}</version>
  </dependency>
</dependencies>
```
- Gradle

```gradle
def versions = [
  PekkoVersion: "1.0.3",
  ScalaBinary: "2.13"
]
dependencies {
  implementation "org.apache.pekko:pekko-connectors-google-cloud-pub-sub-grpc_${versions.ScalaBinary}:1.0.2"
  implementation "org.apache.pekko:pekko-stream_${versions.ScalaBinary}:${versions.PekkoVersion}"
  implementation "org.apache.pekko:pekko-discovery_${versions.ScalaBinary}:${versions.PekkoVersion}"
}
```
The table below shows direct dependencies of this module and the second tab shows all libraries it depends on transitively.
Binary compatibility
This connector contains code generated from Protobuf files which is bound to Apache Pekko gRPC 1.0. This makes it NOT binary-compatible with later versions of Apache Pekko gRPC: you cannot use a different version of Apache Pekko gRPC within the same JVM instance.
Build setup
The Apache Pekko Connectors Google Cloud Pub/Sub gRPC library contains the classes generated from Google’s protobuf specification.
HTTP/2 requires ALPN negotiation, which comes with the JDK starting with version 8u251. For older versions of the JDK you will need to load the jetty-alpn-agent yourself, but we recommend upgrading.
Configuration
The Pub/Sub gRPC connector shares its basic configuration with all the Google connectors in Apache Pekko Connectors. Additional Pub/Sub-specific configuration settings can be found in its own reference.conf.
The defaults can be changed (for example when testing against the emulator) by tweaking the reference configuration:
- reference.conf

```hocon
# SPDX-License-Identifier: Apache-2.0

pekko.connectors.google.credentials.default-scopes = ${?pekko.connectors.google.credentials.default-scopes} ["https://www.googleapis.com/auth/pubsub"]

pekko.connectors.google.cloud.pubsub.grpc {
  host = "pubsub.googleapis.com"
  port = 443

  # Set to "false" to disable TLS
  use-tls = true

  # Set to "none" to use the system default CA
  rootCa = "none"

  # Deprecated, use config path pekko.connectors.google.credentials.provider
  callCredentials = deprecated
}
```
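These defaults can be overridden in the application's own configuration. As a sketch, an `application.conf` pointing the connector at a local emulator might look like this (the host and port here are assumptions matching the emulator setup used by the tests, not required values):

```hocon
pekko.connectors.google.cloud.pubsub.grpc {
  host = "localhost"
  port = 8538
  use-tls = false
}
```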
- Test Configuration

```hocon
# SPDX-License-Identifier: Apache-2.0

pekko {
  loggers = ["org.apache.pekko.event.slf4j.Slf4jLogger"]
  logging-filter = "org.apache.pekko.event.slf4j.Slf4jLoggingFilter"
  loglevel = "DEBUG"
}

pekko.connectors.google.cloud.pubsub.grpc {
  # To run the IntegrationSpec against Google Cloud:
  # * go to the console at https://console.cloud.google.com
  # * Create a compute engine service account as documented at https://cloud.google.com/docs/authentication/production#creating_a_service_account
  # * Point GOOGLE_APPLICATION_CREDENTIALS to the downloaded JSON key and start sbt
  # * Create a project, and update IntegrationSpec to use that project ID rather than "pekko-connectors"
  # * Under 'Pub/Sub', 'Topics' create a topic 'simpleTopic' with a Google-managed key
  # * Under 'Pub/Sub', 'Subscriptions' create a subscription 'simpleSubscription' for this topic
  # * For 'republish', also create 'testTopic' and 'testSubscription'
  # * Comment out these test settings:
  host = "localhost"
  port = 8538
  use-tls = false # no TLS
  rootCa = "none"
  callCredentials = "none" # no authentication
}
```
For more configuration details consider the underlying configuration for Apache Pekko gRPC.
A manually initialized GrpcPublisher or GrpcSubscriber can be used by providing it as an attribute to the stream:
- Scala

```scala
val settings = PubSubSettings(system)
val publisher = GrpcPublisher(settings)

val publishFlow: Flow[PublishRequest, PublishResponse, NotUsed] =
  GooglePubSub
    .publish(parallelism = 1)
    .withAttributes(PubSubAttributes.publisher(publisher))
```
- Java

```java
final PubSubSettings settings = PubSubSettings.create(system);
final GrpcPublisher publisher = GrpcPublisher.create(settings, system);

final Flow<PublishRequest, PublishResponse, NotUsed> publishFlow =
    GooglePubSub.publish(1).withAttributes(PubSubAttributes.publisher(publisher));
```
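A manually initialized subscriber can be attached in the same way. A minimal Scala sketch, assuming that `GrpcSubscriber` and `PubSubAttributes.subscriber` mirror the publisher API shown above, and that `request` is a `StreamingPullRequest` constructed as in the Subscribing section:

```scala
val settings = PubSubSettings(system)
// assumption: GrpcSubscriber is created analogously to GrpcPublisher
val subscriber = GrpcSubscriber(settings)

// `request` is a StreamingPullRequest as built in the Subscribing section
val subscriptionSource =
  GooglePubSub
    .subscribe(request, pollInterval = 1.second)
    .withAttributes(PubSubAttributes.subscriber(subscriber))
```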
Publishing
We first construct a message and then a request using Google’s builders. We declare a singleton source which will go via our publishing flow. All messages sent to the flow are published to Pub/Sub.
- Scala

```scala
import org.apache.pekko
import pekko.NotUsed
import pekko.stream.connectors.googlecloud.pubsub.grpc.scaladsl.GooglePubSub
import pekko.stream.scaladsl._

import com.google.protobuf.ByteString
import com.google.pubsub.v1.pubsub._

import scala.concurrent.Future

val projectId = "pekko-connectors"
val topic = "simpleTopic"

val publishMessage: PubsubMessage =
  PubsubMessage()
    .withData(ByteString.copyFromUtf8("Hello world!"))

val publishRequest: PublishRequest =
  PublishRequest()
    .withTopic(s"projects/$projectId/topics/$topic")
    .addMessages(publishMessage)

val source: Source[PublishRequest, NotUsed] =
  Source.single(publishRequest)

val publishFlow: Flow[PublishRequest, PublishResponse, NotUsed] =
  GooglePubSub.publish(parallelism = 1)

val publishedMessageIds: Future[Seq[PublishResponse]] =
  source.via(publishFlow).runWith(Sink.seq)
```
- Java

```java
import org.apache.pekko.NotUsed;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.PubSubSettings;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.javadsl.GooglePubSub;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.javadsl.GrpcPublisher;
import org.apache.pekko.stream.connectors.googlecloud.pubsub.grpc.javadsl.PubSubAttributes;
import org.apache.pekko.stream.javadsl.*;

import com.google.protobuf.ByteString;
import com.google.pubsub.v1.*;

import java.util.List;
import java.util.concurrent.CompletionStage;

final String projectId = "pekko-connectors";
final String topic = "simpleTopic";

final PubsubMessage publishMessage =
    PubsubMessage.newBuilder().setData(ByteString.copyFromUtf8("Hello world!")).build();
final PublishRequest publishRequest =
    PublishRequest.newBuilder()
        .setTopic("projects/" + projectId + "/topics/" + topic)
        .addMessages(publishMessage)
        .build();

final Source<PublishRequest, NotUsed> source = Source.single(publishRequest);

final Flow<PublishRequest, PublishResponse, NotUsed> publishFlow = GooglePubSub.publish(1);

final CompletionStage<List<PublishResponse>> publishedMessageIds =
    source.via(publishFlow).runWith(Sink.seq(), system);
```
Similarly, we can publish a batch of messages for greater efficiency.
- Scala

```scala
import scala.concurrent.duration._

val projectId = "pekko-connectors"
val topic = "simpleTopic"

val publishMessage: PubsubMessage =
  PubsubMessage()
    .withData(ByteString.copyFromUtf8("Hello world!"))

val messageSource: Source[PubsubMessage, NotUsed] =
  Source(List(publishMessage, publishMessage))

val published = messageSource
  .groupedWithin(1000, 1.minute)
  .map { msgs =>
    PublishRequest()
      .withTopic(s"projects/$projectId/topics/$topic")
      .addAllMessages(msgs)
  }
  .via(GooglePubSub.publish(parallelism = 1))
  .runWith(Sink.seq)
```
- Java

```java
final String projectId = "pekko-connectors";
final String topic = "simpleTopic";

final PubsubMessage publishMessage =
    PubsubMessage.newBuilder().setData(ByteString.copyFromUtf8("Hello world!")).build();

final Source<PubsubMessage, NotUsed> messageSource = Source.single(publishMessage);
final CompletionStage<List<PublishResponse>> published =
    messageSource
        .groupedWithin(1000, Duration.ofMinutes(1))
        .map(
            messages ->
                PublishRequest.newBuilder()
                    .setTopic("projects/" + projectId + "/topics/" + topic)
                    .addAllMessages(messages)
                    .build())
        .via(GooglePubSub.publish(1))
        .runWith(Sink.seq(), system);
```
Subscribing
To receive messages from a subscription, there are two options: StreamingPullRequests or synchronous PullRequests. To decide whether you should use StreamingPullRequest or PullRequest, see "StreamingPull: Dealing with large backlogs of small messages" and "Synchronous Pull" in Google Cloud Pub/Sub’s documentation.
StreamingPullRequest
To receive messages from a subscription, we first create a StreamingPullRequest with the fully-qualified resource name (FQRS) of a subscription and a deadline for acknowledgements in seconds. Google requires that only the first StreamingPullRequest has the subscription and the deadline set. This connector takes care of that and clears the subscription FQRS and the deadline for subsequent StreamingPullRequest messages.
- Scala

```scala
val projectId = "pekko-connectors"
val subscription = "simpleSubscription"

val request = StreamingPullRequest()
  .withSubscription(s"projects/$projectId/subscriptions/$subscription")
  .withStreamAckDeadlineSeconds(10)

val subscriptionSource: Source[ReceivedMessage, Future[Cancellable]] =
  GooglePubSub.subscribe(request, pollInterval = 1.second)
```
- Java

```java
final String projectId = "pekko-connectors";
final String subscription = "simpleSubscription";

final StreamingPullRequest request =
    StreamingPullRequest.newBuilder()
        .setSubscription("projects/" + projectId + "/subscriptions/" + subscription)
        .setStreamAckDeadlineSeconds(10)
        .build();

final Duration pollInterval = Duration.ofSeconds(1);
final Source<ReceivedMessage, CompletableFuture<Cancellable>> subscriptionSource =
    GooglePubSub.subscribe(request, pollInterval);
```
Here pollInterval is the time between StreamingPullRequests when there are no messages in the subscription.
PullRequest
With PullRequest, each request receives a batch of messages, up to the maximum specified by maxMessages.
- Scala

```scala
val projectId = "pekko-connectors"
val subscription = "simpleSubscription"

val request = PullRequest()
  .withSubscription(s"projects/$projectId/subscriptions/$subscription")
  .withMaxMessages(10)

val subscriptionSource: Source[ReceivedMessage, Future[Cancellable]] =
  GooglePubSub.subscribePolling(request, pollInterval = 1.second)
```
- Java

```java
final String projectId = "pekko-connectors";
final String subscription = "simpleSubscription";

final PullRequest request =
    PullRequest.newBuilder()
        .setSubscription("projects/" + projectId + "/subscriptions/" + subscription)
        .setMaxMessages(10)
        .build();

final Duration pollInterval = Duration.ofSeconds(1);
final Source<ReceivedMessage, CompletableFuture<Cancellable>> subscriptionSource =
    GooglePubSub.subscribePolling(request, pollInterval);
```
Here pollInterval is the time between successive PullRequests.
In order to minimise latency between requests you can set a buffer on the source. The buffer size depends on how many messages you typically receive per request: if you usually receive the maximum number of messages, it is a good idea to set the buffer size equal to the maxMessages parameter. Please note that with a buffer in place, messages spend part of their lease time in the buffer, which reduces the time left to process them before the acknowledgement deadline is reached. How significant this is depends on your acknowledgement deadline and processing time.
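As a minimal sketch of the buffering suggestion above (assuming the subscriptionSource from the PullRequest example, where maxMessages is 10):

```scala
import org.apache.pekko.stream.OverflowStrategy

// Buffer up to one full batch (maxMessages = 10 in the example above) so the
// next PullRequest can be issued while the current batch is being processed.
// Note: buffered messages consume part of their acknowledgement deadline.
val bufferedSource =
  subscriptionSource.buffer(10, OverflowStrategy.backpressure)
```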
Acknowledge
Messages received from the subscription need to be acknowledged, or they will be sent again. To do that, create an AcknowledgeRequest containing the ackIds of the messages to be acknowledged and send it to a sink created by GooglePubSub.acknowledge.
- Scala

```scala
val ackSink: Sink[AcknowledgeRequest, Future[Done]] =
  GooglePubSub.acknowledge(parallelism = 1)

subscriptionSource
  .map { message =>
    // do something fun
    message.ackId
  }
  .groupedWithin(10, 1.second)
  .map(ids =>
    AcknowledgeRequest()
      .withSubscription(s"projects/$projectId/subscriptions/$subscription")
      .withAckIds(ids))
  .to(ackSink)
```
- Java

```java
final Sink<AcknowledgeRequest, CompletionStage<Done>> ackSink = GooglePubSub.acknowledge(1);

subscriptionSource
    .map(
        receivedMessage -> {
          // do some computation
          return receivedMessage.getAckId();
        })
    .groupedWithin(10, Duration.ofSeconds(1))
    .map(acks -> AcknowledgeRequest.newBuilder().addAllAckIds(acks).build())
    .to(ackSink);
```
Running the test code
Integration test code requires a Google Cloud Pub/Sub emulator running in the background. You can start it quickly using Docker:
docker compose up -d gcloud-pubsub-client
This will also run the Pub/Sub admin client that will create topics and subscriptions used by the integration tests.
Tests can be started from sbt by running:
- sbt

```
> google-cloud-pub-sub-grpc/test
```
There is also an ExampleApp that can be used to test publishing to topics and receiving messages from subscriptions.
To run the example app you will need to configure a project and Pub/Sub in Google Cloud and provide your own credentials.
- sbt

```
env GOOGLE_APPLICATION_CREDENTIALS=/path/to/application/credentials.json sbt

// receive messages from a subscription
> google-cloud-pub-sub-grpc/Test/run subscribe <project-id> <subscription-name>

// publish a single message to a topic
> google-cloud-pub-sub-grpc/Test/run publish-single <project-id> <topic-name>

// continually publish a message stream to a topic
> google-cloud-pub-sub-grpc/Test/run publish-stream <project-id> <topic-name>
```