FTP
The FTP connector provides Apache Pekko Stream sources to connect to FTP, FTPs and SFTP servers. Currently, two kinds of sources are provided:
- one for browsing or traversing the server recursively, and
- another for retrieving files as a stream of bytes.
Project Info: Apache Pekko Connectors FTP | |
---|---|
Artifact | org.apache.pekko pekko-connectors-ftp 1.0.2 |
JDK versions | OpenJDK 8, OpenJDK 11, OpenJDK 17 |
Scala versions | 2.13.14, 2.12.20, 3.3.3 |
JPMS module name | pekko.stream.connectors.ftp |
License | |
API documentation | |
Forums | |
Release notes | GitHub releases |
Issues | GitHub issues |
Sources | https://github.com/apache/pekko-connectors |
Artifacts
- sbt
```scala
val PekkoVersion = "1.0.3"
libraryDependencies ++= Seq(
  "org.apache.pekko" %% "pekko-connectors-ftp" % "1.0.2",
  "org.apache.pekko" %% "pekko-stream" % PekkoVersion
)
```
- Maven
```xml
<properties>
  <pekko.version>1.0.3</pekko.version>
  <scala.binary.version>2.13</scala.binary.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-connectors-ftp_${scala.binary.version}</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pekko</groupId>
    <artifactId>pekko-stream_${scala.binary.version}</artifactId>
    <version>${pekko.version}</version>
  </dependency>
</dependencies>
```
- Gradle
```groovy
def versions = [
  PekkoVersion: "1.0.3",
  ScalaBinary: "2.13"
]
dependencies {
  implementation "org.apache.pekko:pekko-connectors-ftp_${versions.ScalaBinary}:1.0.2"
  implementation "org.apache.pekko:pekko-stream_${versions.ScalaBinary}:${versions.PekkoVersion}"
}
```
The table below shows direct dependencies of this module and the second tab shows all libraries it depends on transitively.
Configuring the connection settings
In order to establish a connection with the remote server, you need to provide a specialized version of a `RemoteFileSettings` instance. It’s specialized as it depends on the kind of server you’re connecting to: FTP, FTPs or SFTP.
- Scala
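```scala
import java.io.PrintWriter
import java.net.InetAddress

import org.apache.commons.net.PrintCommandListener
import org.apache.commons.net.ftp.FTPClient
import org.apache.pekko.stream.connectors.ftp.FtpSettings

// HOSTNAME, PORT and CREDENTIALS are placeholders for your own values
val ftpSettings = FtpSettings
  .create(InetAddress.getByName(HOSTNAME))
  .withPort(PORT)
  .withCredentials(CREDENTIALS)
  .withBinary(true)
  .withPassiveMode(true)
  // only useful for debugging
  .withConfigureConnection((ftpClient: FTPClient) => {
    ftpClient.addProtocolCommandListener(new PrintCommandListener(new PrintWriter(System.out), true))
  })
```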
- Java
```java
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.javadsl.Source;
import org.apache.commons.net.PrintCommandListener;
import org.apache.commons.net.ftp.FTPClient;
import java.io.PrintWriter;
import java.net.InetAddress;

// HOSTNAME, PORT and CREDENTIALS are placeholders for your own values
FtpSettings ftpSettings =
    FtpSettings.create(InetAddress.getByName(HOSTNAME))
        .withPort(PORT)
        .withCredentials(CREDENTIALS)
        .withBinary(true)
        .withPassiveMode(true)
        // only useful for debugging
        .withConfigureConnectionConsumer(
            (FTPClient ftpClient) -> {
              ftpClient.addProtocolCommandListener(
                  new PrintCommandListener(new PrintWriter(System.out), true));
            });
```
The configuration above will create an anonymous connection with a remote FTP server in passive mode. For FTPs and SFTP servers, you will need to provide the specialized versions of these settings: `FtpsSettings` or `SftpSettings`, respectively.
The example demonstrates the optional `configureConnection` option, which is available on FTP and FTPs clients. Use it to configure any custom parameters the server may require, such as explicit or implicit data transfer encryption.
For a non-anonymous connection, provide an instance of `NonAnonFtpCredentials` instead.
To connect via a proxy, provide an instance of `java.net.Proxy` by using the `withProxy` method.
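As a rough illustration of the proxy option, the sketch below builds a `java.net.Proxy` using only the JDK. The SOCKS proxy type, the address `127.0.0.1` and the port `1080` are placeholder assumptions, not values taken from this guide; which proxy type applies depends on your environment.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

final class ProxyExample {
    // Builds a SOCKS proxy descriptor; host and port are caller-supplied placeholders.
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(host, port));
    }

    public static void main(String[] args) {
        // Assumed local proxy endpoint, for illustration only.
        Proxy proxy = socksProxy("127.0.0.1", 1080);
        System.out.println(proxy.type() + " proxy at " + proxy.address());
    }
}
```

The resulting instance would then be handed to the settings through `withProxy`, for example `ftpSettings.withProxy(proxy)`.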
To connect using a private key, provide an instance of `SftpIdentity` to `SftpSettings`.
To use a custom SSH client for SFTP, provide an instance of `SSHClient`.
- Scala
```scala
import org.apache.pekko.stream.connectors.ftp.scaladsl.{ Sftp, SftpApi }
import net.schmizz.sshj.{ DefaultConfig, SSHClient }

val sshClient: SSHClient = new SSHClient(new DefaultConfig)
val configuredClient: SftpApi = Sftp(sshClient)
```
- Java
```java
import org.apache.pekko.stream.connectors.ftp.javadsl.Sftp;
import org.apache.pekko.stream.connectors.ftp.javadsl.SftpApi;
import net.schmizz.sshj.DefaultConfig;
import net.schmizz.sshj.SSHClient;

public class ConfigureCustomSSHClient {

  public ConfigureCustomSSHClient() {
    SSHClient sshClient = new SSHClient(new DefaultConfig());
    SftpApi sftp = Sftp.create(sshClient);
  }
}
```
Improving SFTP throughput
For SFTP connections, you can allow the client to send more than one unconfirmed read request by using `withMaxUnconfirmedReads` on `SftpSettings`. The command-line tool `sftp` uses a value of 64 by default. This can significantly improve throughput by reducing the impact of latency.
- Scala
```scala
import org.apache.pekko
import pekko.stream.IOResult
import pekko.stream.connectors.ftp.SftpSettings
import pekko.stream.connectors.ftp.scaladsl.Sftp
import pekko.stream.scaladsl.Source
import pekko.util.ByteString

import scala.concurrent.Future

def retrieveFromPath(path: String, settings: SftpSettings): Source[ByteString, Future[IOResult]] =
  Sftp.fromPath(path, settings.withMaxUnconfirmedReads(64))
```
- Java
```java
import org.apache.pekko.stream.IOResult;
import org.apache.pekko.stream.connectors.ftp.SftpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Sftp;
import org.apache.pekko.stream.javadsl.Source;
import org.apache.pekko.util.ByteString;
import java.util.concurrent.CompletionStage;

public class SftpRetrievingExample {

  public Source<ByteString, CompletionStage<IOResult>> retrieveFromPath(
      String path, SftpSettings settings) throws Exception {
    return Sftp.fromPath(path, settings.withMaxUnconfirmedReads(64));
  }
}
```
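To see why more outstanding reads can help, a back-of-the-envelope model may be useful: if each read request carries a fixed number of bytes and the client may have N requests in flight, at most N requests' worth of data can arrive per round trip. The sketch below computes that upper bound using only the JDK. The class name, the 32 KiB request size and the 50 ms round-trip time are assumptions chosen for illustration; real throughput also depends on bandwidth, the server and the SSH implementation.

```java
final class SftpThroughputEstimate {
    // Latency-bound upper limit in bytes per second:
    // (requests in flight * bytes per request) delivered once per round trip.
    static double upperBoundBytesPerSecond(int maxUnconfirmedReads, int requestBytes, long roundTripMillis) {
        return (double) maxUnconfirmedReads * requestBytes * 1000.0 / roundTripMillis;
    }

    public static void main(String[] args) {
        int requestBytes = 32 * 1024; // assumed request size
        long rttMillis = 50;          // assumed round-trip time
        System.out.println("1 read in flight:   " + upperBoundBytesPerSecond(1, requestBytes, rttMillis) + " B/s");
        System.out.println("64 reads in flight: " + upperBoundBytesPerSecond(64, requestBytes, rttMillis) + " B/s");
    }
}
```

Under these assumptions the bound grows linearly with the value passed to `withMaxUnconfirmedReads`, from roughly 0.66 MB/s with one request in flight to roughly 42 MB/s with 64.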
Traversing a remote FTP folder recursively
In order to traverse a remote folder recursively, you need to use the `ls` method in the FTP API:
- Scala
```scala
import org.apache.pekko
import pekko.NotUsed
import pekko.stream.connectors.ftp.{ FtpFile, FtpSettings }
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.Source

def listFiles(basePath: String, settings: FtpSettings): Source[FtpFile, NotUsed] =
  Ftp.ls(basePath, settings)
```
- Java
```java
import org.apache.pekko.actor.ActorSystem;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;

public class FtpTraversingExample {

  public void listFiles(String basePath, FtpSettings settings, ActorSystem system)
      throws Exception {
    Ftp.ls(basePath, settings)
        .runForeach(ftpFile -> System.out.println(ftpFile.toString()), system);
  }
}
```
This source emits `FtpFile` elements and has no significant materialized value.
For FTPs and SFTP servers, use the FTPs and SFTP APIs, respectively.
Retrieving files
In order to retrieve a remote file as a stream of bytes, you need to use the `fromPath` method in the FTP API:
- Scala
```scala
import org.apache.pekko
import pekko.stream.IOResult
import pekko.stream.connectors.ftp.FtpSettings
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.Source
import pekko.util.ByteString

import scala.concurrent.Future

def retrieveFromPath(path: String, settings: FtpSettings): Source[ByteString, Future[IOResult]] =
  Ftp.fromPath(path, settings)
```
- Java
```java
import org.apache.pekko.stream.IOResult;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.javadsl.Source;
import org.apache.pekko.util.ByteString;
import java.util.concurrent.CompletionStage;

public class FtpRetrievingExample {

  public Source<ByteString, CompletionStage<IOResult>> retrieveFromPath(
      String path, FtpSettings settings) throws Exception {
    return Ftp.fromPath(path, settings);
  }
}
```
This source emits `ByteString` elements and materializes to a `Future` (Scala API) or a `CompletionStage` (Java API) of `IOResult`, which completes when the stream finishes.
For FTPs and SFTP servers, use the FTPs and SFTP APIs, respectively.
Writing files
In order to write a stream of bytes to a remote file, you need to use the `toPath` method in the FTP API:
- Scala
```scala
import org.apache.pekko
import pekko.stream.IOResult
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.{ Compression, Source }
import pekko.util.ByteString

import scala.concurrent.Future

// Requires an implicit ActorSystem (or Materializer) in scope to run the stream
val result: Future[IOResult] = Source
  .single(ByteString("this is the file contents"))
  .runWith(Ftp.toPath("file.txt", ftpSettings))

// Create a gzipped target file
val gzippedResult: Future[IOResult] = Source
  .single(ByteString("this is the file contents" * 50))
  .via(Compression.gzip)
  .runWith(Ftp.toPath("file.txt.gz", ftpSettings))
```
- Java
```java
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.IOResult;
import org.apache.pekko.stream.javadsl.Compression;
import org.apache.pekko.stream.javadsl.Source;
import org.apache.pekko.util.ByteString;
import java.util.concurrent.CompletionStage;

// ftpSettings and materializer are assumed to be defined elsewhere
CompletionStage<IOResult> result =
    Source.single(ByteString.fromString("this is the file contents"))
        .runWith(Ftp.toPath("file.txt", ftpSettings), materializer);

// Create a gzipped target file
CompletionStage<IOResult> gzippedResult =
    Source.single(ByteString.fromString("this is the file contents"))
        .via(Compression.gzip())
        .runWith(Ftp.toPath("file.txt.gz", ftpSettings), materializer);
```
This sink consumes `ByteString` elements and materializes to a `Future` (Scala API) or a `CompletionStage` (Java API) of `IOResult`, which completes when the stream finishes.
For FTPs and SFTP servers, use the FTPs and SFTP APIs, respectively.
Removing files
In order to remove a remote file, you need to use the `remove` method in the FTP API:
- Scala
```scala
import org.apache.pekko
import pekko.stream.IOResult
import pekko.stream.connectors.ftp.{ FtpFile, FtpSettings }
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.Sink

import scala.concurrent.Future

def remove(settings: FtpSettings): Sink[FtpFile, Future[IOResult]] =
  Ftp.remove(settings)
```
- Java
```java
import org.apache.pekko.stream.IOResult;
import org.apache.pekko.stream.connectors.ftp.FtpFile;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.javadsl.Sink;
import java.util.concurrent.CompletionStage;

public class FtpRemovingExample {

  public Sink<FtpFile, CompletionStage<IOResult>> remove(FtpSettings settings) throws Exception {
    return Ftp.remove(settings);
  }
}
```
This sink consumes `FtpFile` elements and materializes to a `Future` (Scala API) or a `CompletionStage` (Java API) of `IOResult`, which completes when the stream finishes.
Moving files
In order to move a remote file, you need to use the `move` method in the FTP API. The `move` method takes a function that calculates the path to which the file should be moved, based on the consumed `FtpFile`.
- Scala
```scala
import org.apache.pekko
import pekko.stream.IOResult
import pekko.stream.connectors.ftp.{ FtpFile, FtpSettings }
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.Sink

import scala.concurrent.Future

def move(destinationPath: FtpFile => String, settings: FtpSettings): Sink[FtpFile, Future[IOResult]] =
  Ftp.move(destinationPath, settings)
```
- Java
```java
import org.apache.pekko.stream.IOResult;
import org.apache.pekko.stream.connectors.ftp.FtpFile;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.javadsl.Sink;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;

public class FtpMovingExample {

  public Sink<FtpFile, CompletionStage<IOResult>> move(
      Function<FtpFile, String> destinationPath, FtpSettings settings) throws Exception {
    return Ftp.move(destinationPath, settings);
  }
}
```
This sink consumes `FtpFile` elements and materializes to a `Future` (Scala API) or a `CompletionStage` (Java API) of `IOResult`, which completes when the stream finishes.
A typical use case for this is listing files from an FTP location, doing some processing, and moving the files when done. An example of this use case can be found in the example section further down this page.
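The destination function passed to `move` is ordinary code, so it can be developed and tested without a server. The sketch below shows one hypothetical way to compute a target path: it keeps the file name and swaps the directory. The class name `MoveDestinations`, the method `intoDirectory` and the `/processed` directory are illustrative assumptions and not part of the connector.

```java
final class MoveDestinations {
    // Returns targetDir + "/" + the last path segment of sourcePath.
    static String intoDirectory(String sourcePath, String targetDir) {
        // lastIndexOf returns -1 when there is no slash, so the whole string is kept.
        String fileName = sourcePath.substring(sourcePath.lastIndexOf('/') + 1);
        String dir = targetDir.endsWith("/") ? targetDir : targetDir + "/";
        return dir + fileName;
    }

    public static void main(String[] args) {
        // Prints "/processed/report.csv"
        System.out.println(intoDirectory("/inbox/report.csv", "/processed"));
    }
}
```

With the connector on the classpath, such a helper could be adapted to the Java API as `ftpFile -> MoveDestinations.intoDirectory(ftpFile.path(), "/processed")`.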
Creating directory
To create a directory, you have to specify a parent directory (also known as the base path) and the directory’s name.
Apache Pekko Connectors provides a materialized API, `mkdirAsync` (based on `Future`/`CompletionStage`), and an unmaterialized API, `mkdir` (using Sources), so that you can choose when the action is executed.
- Scala
```scala
import org.apache.pekko
import pekko.Done
import pekko.NotUsed
import pekko.stream.connectors.ftp.FtpSettings
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.Source

def mkdir(basePath: String, directoryName: String, settings: FtpSettings): Source[Done, NotUsed] =
  Ftp.mkdir(basePath, directoryName, settings)
```
- Java
```java
import org.apache.pekko.Done;
import org.apache.pekko.NotUsed;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.javadsl.Source;

public class FtpMkdirExample {

  public Source<Done, NotUsed> mkdir(
      String parentPath, String directoryName, FtpSettings settings) {
    return Ftp.mkdir(parentPath, directoryName, settings);
  }
}
```
Please note that to include subdirectories in the result of `ls`, `emitTraversedDirectories` has to be set to `true`.
Example: downloading files from an FTP location and moving the original files
- Scala
```scala
import java.nio.file.Files

import org.apache.pekko
import pekko.NotUsed
import pekko.stream.connectors.ftp.{ FtpFile, FtpSettings }
import pekko.stream.connectors.ftp.scaladsl.Ftp
import pekko.stream.scaladsl.{ FileIO, RunnableGraph }

def processAndMove(sourcePath: String,
    destinationPath: FtpFile => String,
    settings: FtpSettings): RunnableGraph[NotUsed] =
  Ftp
    .ls(sourcePath, settings)
    // pair every downloaded chunk with the file it belongs to
    .flatMapConcat(ftpFile => Ftp.fromPath(ftpFile.path, settings).map((_, ftpFile)))
    .alsoTo(FileIO.toPath(Files.createTempFile("downloaded", "tmp")).contramap(_._1))
    .to(Ftp.move(destinationPath, settings).contramap(_._2))
```
- Java
```java
import org.apache.pekko.NotUsed;
import org.apache.pekko.japi.Pair;
import org.apache.pekko.stream.connectors.ftp.FtpFile;
import org.apache.pekko.stream.connectors.ftp.FtpSettings;
import org.apache.pekko.stream.connectors.ftp.javadsl.Ftp;
import org.apache.pekko.stream.javadsl.FileIO;
import org.apache.pekko.stream.javadsl.RunnableGraph;
import java.nio.file.Files;
import java.util.function.Function;

public class FtpProcessAndMoveExample {

  public RunnableGraph<NotUsed> processAndMove(
      String sourcePath, Function<FtpFile, String> destinationPath, FtpSettings settings)
      throws Exception {
    return Ftp.ls(sourcePath, settings)
        .flatMapConcat(
            ftpFile ->
                Ftp.fromPath(ftpFile.path(), settings).map(data -> new Pair<>(data, ftpFile)))
        .alsoTo(FileIO.toPath(Files.createTempFile("downloaded", "tmp")).contramap(Pair::first))
        .to(Ftp.move(destinationPath, settings).contramap(Pair::second));
  }
}
```
Running the example code
The code in this guide is part of runnable tests of this project. You are welcome to browse the code, edit and run it in sbt.
```
docker compose up -d ftp sftp
sbt
> ftp/test
```
When using the SFTP API, take into account that the JVM relies on `/dev/random` for random number generation by default. This might block the process on some operating systems, as `/dev/random` waits for a certain amount of entropy to be generated on the host machine before returning a result. In that case, consider passing the parameter `-Djava.security.egd=file:/dev/./urandom` to the JVM. Further information can be found here.
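When diagnosing a stalled SFTP connection, it can help to confirm which entropy source setting the running JVM was started with. The following sketch only reads the `java.security.egd` system property. The class and method names are hypothetical, and the check is a simple substring match rather than an authoritative test of how the JVM gathers entropy.

```java
final class EntropySourceCheck {
    static final String PROPERTY = "java.security.egd";

    // True when the given property value points at a urandom device.
    static boolean pointsAtUrandom(String egdValue) {
        return egdValue != null && egdValue.contains("urandom");
    }

    public static void main(String[] args) {
        String egd = System.getProperty(PROPERTY); // null when the flag was not passed
        System.out.println(PROPERTY + " = " + egd);
        System.out.println("urandom configured: " + pointsAtUrandom(egd));
    }
}
```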