Working with Kafka Connect

In this example we will use Kafka Connect to stream changes made to a text file into a Kafka topic and then out to a second text file.

Prerequisites

The Apache Kafka distribution comes bundled with a number of tools. This example uses the connect-standalone.sh tool, so you will need to download and install a Kafka release from here. This example has been tested with Kafka 1.1.0.

Kafka Connect Configuration

Before we can use Kafka Connect, the Connect worker itself needs some basic configuration. Make a file connect.properties with the following content:
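
As a minimal sketch, the file could be created from the shell as shown here. The property names are standard Kafka Connect standalone worker settings, but the values are assumptions: <broker-ip> is a placeholder, and the JSON converters, offsets file path and flush interval are illustrative defaults rather than values taken from this guide.

```shell
# Write the base worker configuration to connect.properties (overwrites any existing file).
# <broker-ip> is a placeholder; the remaining values are assumed defaults.
cat > connect.properties <<'EOF'
bootstrap.servers=<broker-ip>:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
EOF
```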

Make sure to replace the bootstrap.servers value with the IP address of at least one node in your cluster.

Note: To connect to your Kafka cluster over the private network, use port 9093 instead of 9092.

In order to use Kafka Connect with Instaclustr Kafka you also need to provide authentication credentials. Add the following to your connect.properties file, ensuring the password is correct:
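
A sketch of what these settings might look like, assuming the cluster uses SASL/SCRAM with the SCRAM-SHA-256 mechanism (check your cluster's connection details, as this is an assumption). <username> and <password> are placeholders. The worker's embedded producers and consumers take their own copies of the settings through the producer. and consumer. prefixes.

```shell
# Append SASL credentials to connect.properties.
# Assumptions: SCRAM-SHA-256 mechanism; <username> and <password> are placeholders.
cat >> connect.properties <<'EOF'
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<username>" password="<password>";
producer.sasl.mechanism=SCRAM-SHA-256
producer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<username>" password="<password>";
consumer.sasl.mechanism=SCRAM-SHA-256
consumer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<username>" password="<password>";
EOF
```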

If your cluster does not have client ⇆ broker encryption enabled, add the following to your connect.properties file:
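
For the unencrypted case, a minimal sketch would select the SASL_PLAINTEXT security protocol for the worker and for its embedded producers and consumers. This assumes SASL authentication is in use, as described above.

```shell
# Append the security protocol for a cluster WITHOUT client-broker encryption.
# Assumption: authentication is SASL, so the protocol is SASL_PLAINTEXT.
cat >> connect.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
producer.security.protocol=SASL_PLAINTEXT
consumer.security.protocol=SASL_PLAINTEXT
EOF
```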

If your cluster has client ⇆ broker encryption enabled you will also need to provide encryption information. For more information on using certificates with Kafka and where to find them see here. Add the following to your connect.properties file, ensuring the truststore location is correct:
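
For the encrypted case, a sketch along these lines switches the protocol to SASL_SSL and points the worker at a truststore. The truststore path and password are placeholders, and a JKS truststore is an assumption.

```shell
# Append TLS settings for a cluster WITH client-broker encryption.
# /path/to/truststore.jks and <truststore-password> are placeholders.
cat >> connect.properties <<'EOF'
security.protocol=SASL_SSL
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=<truststore-password>
producer.security.protocol=SASL_SSL
producer.ssl.truststore.location=/path/to/truststore.jks
producer.ssl.truststore.password=<truststore-password>
consumer.security.protocol=SASL_SSL
consumer.ssl.truststore.location=/path/to/truststore.jks
consumer.ssl.truststore.password=<truststore-password>
EOF
```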

Create Kafka Topic

File sources and file sinks each refer to a specific Kafka topic, so before we can run Kafka Connect we need to create the topic that will store the messages it produces. Use the guide here to create a new topic called connect-test with a replication factor of 3.

File Source Configuration

Now that Kafka Connect is configured, we need to configure the source for our data. In this case we will be using the Connect File Source that is provided with Apache Kafka. The Connect File Source will pipe all changes in a file into a specific Kafka topic. Make a file connect-file-source.properties with the following content:
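
A minimal sketch of the source configuration, using the FileStreamSource connector bundled with Apache Kafka. The connector name local-file-source is an arbitrary label chosen for illustration, and text.txt is assumed as the input file to match the test command used later in this guide.

```shell
# Write the file source connector configuration.
# Assumptions: connector name is arbitrary; text.txt is the input file.
cat > connect-file-source.properties <<'EOF'
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=text.txt
topic=connect-test
EOF
```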

Make sure to replace the file option with the path to your desired input file.

File Sink Configuration

Now that our data source is configured, we need to configure the sink for our data. In this case we will be using the Connect File Sink that is provided with Apache Kafka. The Connect File Sink will pipe all changes in a Kafka topic to a file. Make a file connect-file-sink.properties with the following content:
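
A matching sketch for the sink, using the bundled FileStreamSink connector. The connector name and the output file name text.sink.txt are placeholders chosen for illustration. Note that the sink uses the plural topics option, whereas the source uses topic.

```shell
# Write the file sink connector configuration.
# Assumptions: connector name is arbitrary; text.sink.txt is a placeholder output file.
cat > connect-file-sink.properties <<'EOF'
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=text.sink.txt
topics=connect-test
EOF
```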

Make sure to replace the file option with the path to the desired output file.

Start Kafka Connect

Now that all the Kafka Connect components are configured, we can start Kafka Connect from the command line like so:
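
The launch step can be sketched as a small shell function. connect-standalone.sh takes the worker properties file first, followed by one properties file per connector. KAFKA_HOME and its /opt/kafka default are assumptions about where the Kafka release was unpacked, and start_connect is a hypothetical helper name.

```shell
# Assumption: KAFKA_HOME points at the unpacked Kafka release (/opt/kafka is a placeholder).
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"

# Hypothetical helper: starts a standalone Connect worker in the foreground.
start_connect() {
  # Worker configuration first, then the source and sink connector configurations.
  "$KAFKA_HOME/bin/connect-standalone.sh" \
    connect.properties \
    connect-file-source.properties \
    connect-file-sink.properties
}
```

Run start_connect from the directory that contains the three properties files. The worker stays in the foreground until it is interrupted.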

Test Kafka Connect

Once Kafka Connect has started, it’s time to test the configuration.

First, follow the guide here to set up a Kafka console consumer, changing the topic name to the connect-test topic.

Next, add some text to your input file:

echo "testing" >> text.txt

After a short delay, the same text should appear in the output file, and the console consumer should print the corresponding message.
