Kafka Mirroring with Kafka Connect
Mirroring is the process of copying Kafka data from one cluster to another. Common reasons to mirror include:
- Data backup
- Creating a failover cluster
- Moving data between geographic regions for local production and consumption
- Creating highly available active/active cluster topologies
Instaclustr uses MirrorMaker 2 on top of Kafka Connect to provide mirroring.
Prerequisites
You can use the Instaclustr dashboard to start the connectors needed to mirror data between two Kafka clusters. Before you begin, you will need:
- A source Kafka cluster (i.e., where your data comes from)
- A destination Kafka cluster (i.e., where your data is going to)
- A Kafka Connect cluster that targets your destination Kafka cluster and runs the mirroring connectors. Because the Kafka Connect cluster must connect to the destination Kafka cluster, we strongly recommend provisioning it in the same region as the destination cluster. The source cluster can be in a different region.
If you have not set these up yet, do so first. The following pages can help:
- https://www.instaclustr.com/support/documentation/kafka/getting-started-with-kafka/creating-a-kafka-cluster/
- https://www.instaclustr.com/support/documentation/kafka-connect/getting-started-with-kafka-connect/creating-a-kafka-connect-cluster/
Creating the connectors
In the console, navigate to the mirroring page for your Kafka Connect cluster and click Create New Mirror.
Enter the required information. You will need to specify:
- The source Kafka cluster. If this is not an Instaclustr-managed cluster, you will need to specify connection properties similar to those you would use in a consumer configuration.
- Whether to rename the mirrored topics. With renaming on, a topic named ‘abc’ in the source cluster becomes ‘alias.abc’ in the destination cluster, where ‘alias’ is the source cluster alias. Renaming is the default and is useful for active/active setups or for moving data between geographic regions: the prefix helps prevent mirroring loops and shows where the data came from. Turning renaming off is useful for straight data backups or for failover clusters that consumers can use without needing to be topology-aware.
- A source cluster alias. This is a short name used to identify the source cluster in various places. It defaults to the cluster’s name.
- Whether to use private IPs. Use private IPs only if traffic can be routed between your source and destination clusters over private IPs (e.g., via VPC peering). Connecting via private IPs may reduce traffic costs.
- The topic or topics to mirror. This can be a regular expression that matches multiple topics.
- The maximum number of Kafka Connect tasks to use. We recommend setting this to at least the number of workers in your Kafka Connect cluster, so that no worker is left without a task.
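The topic expression and the renaming option together decide which topics are copied and what they are called in the destination cluster. The Python sketch below models that under stated assumptions: the pattern must match the whole topic name (as with Java's `Pattern.matches`, which MirrorMaker 2's topic filter is assumed to use), renamed topics use MirrorMaker 2's default `.` separator, and MirrorMaker 2's built-in exclusions for internal topics are not modelled. The function `destination_topic_names` and the alias `us-east` are hypothetical and are not part of the Instaclustr console or any API.

```python
import re
from typing import Dict, List

# Separator that MirrorMaker 2's default replication policy places between the
# source alias and the original topic name (assumed to be left at its default).
SEPARATOR = "."


def destination_topic_names(
    source_topics: List[str],
    topic_pattern: str,
    source_alias: str,
    rename: bool = True,
) -> Dict[str, str]:
    """Map each selected source topic to its destination topic name.

    Hypothetical helper: assumes the pattern must match the entire topic name
    and ignores MirrorMaker 2's default exclusions for internal topics.
    """
    pattern = re.compile(topic_pattern)
    mapping: Dict[str, str] = {}
    for topic in source_topics:
        if not pattern.fullmatch(topic):
            continue  # topic is not selected for mirroring
        # Renaming on: prefix the source alias so readers can see where the
        # data came from. Renaming off: keep the original topic name.
        mapping[topic] = f"{source_alias}{SEPARATOR}{topic}" if rename else topic
    return mapping


if __name__ == "__main__":
    topics = ["abc", "orders", "orders-eu", "payments"]
    print(destination_topic_names(topics, "orders.*|abc", "us-east"))
    # -> {'abc': 'us-east.abc', 'orders': 'us-east.orders', 'orders-eu': 'us-east.orders-eu'}
    print(destination_topic_names(topics, "orders.*|abc", "us-east", rename=False))
    # -> {'abc': 'abc', 'orders': 'orders', 'orders-eu': 'orders-eu'}
```

Using a full match rather than a substring search means a pattern such as `abc` selects only the topic `abc`, not `xabc`; if your pattern is meant to match a family of topics, include the wildcard explicitly (for example `orders.*`).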
Press the Create Mirror button. Creating the mirror may take a minute or two, particularly if this is your first mirror. Your source cluster will also be set up automatically as a ‘connected’ cluster; refer to our support article on Connecting to Instaclustr managed Clusters for more details on this feature.
From the main mirroring page you can see a summary of your active mirroring data flows. For each individual mirroring data flow, you can use the action buttons on the right to get more details, or to delete the mirroring data flow.
This page also shows the status of each of your Kafka mirrors. A managed Kafka mirror can have one of three statuses, which are defined in terms of the following topic states:
- Active topic: a topic that has messages being produced to it at a given time.
- Inactive topic: a topic that has no messages being produced to it at a given time.
- Stale topic: a topic that had messages produced before the mirror was created, but has had no new messages produced since the mirror was created.
The main mirroring page may show any of the following three statuses:
- IN_SYNC:
  - All mirrored topics are active, and their average replication latency is lower than the target latency.
- STALE_TOPIC, shown when any of the following applies:
  - The mirrored topics include one or more stale topics.
  - The mirrored topics include one or more inactive topics whose last messages had an average replication latency higher than the target latency.
- OUT_OF_SYNC, shown when any of the following applies:
  - The mirror connector or its tasks are not running.
  - The mirrored topics include one or more active topics whose average replication latency is higher than the target latency.
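The three statuses can be read as a small decision procedure over the topic states defined above. The Python sketch below encodes the documented rules for illustration only; it is not Instaclustr's implementation. The `MirroredTopic` and `TopicActivity` types, the millisecond units, and the order in which the rules are checked (OUT_OF_SYNC first, as the most severe) are assumptions. The documented rules also do not cover every combination, for example an inactive topic whose last messages were mirrored within the target latency, so the sketch returns `None` in those cases rather than guessing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TopicActivity(Enum):
    ACTIVE = "active"      # messages being produced at the time of the check
    INACTIVE = "inactive"  # no messages being produced at the time of the check
    STALE = "stale"        # messages only from before the mirror was created


@dataclass
class MirroredTopic:
    name: str
    activity: TopicActivity
    # Average replication latency in milliseconds (hypothetical unit). For an
    # inactive topic this is the latency of its last mirrored messages.
    avg_latency_ms: float


def mirror_status(
    topics: List[MirroredTopic],
    connector_and_tasks_running: bool,
    target_latency_ms: float,
) -> Optional[str]:
    # OUT_OF_SYNC: connector/tasks down, or an active topic above the target.
    if not connector_and_tasks_running or any(
        t.activity is TopicActivity.ACTIVE and t.avg_latency_ms > target_latency_ms
        for t in topics
    ):
        return "OUT_OF_SYNC"
    # STALE_TOPIC: a stale topic, or an inactive topic whose last messages
    # were mirrored with latency above the target.
    if any(t.activity is TopicActivity.STALE for t in topics) or any(
        t.activity is TopicActivity.INACTIVE and t.avg_latency_ms > target_latency_ms
        for t in topics
    ):
        return "STALE_TOPIC"
    # IN_SYNC: every topic active and strictly below the target latency.
    if all(
        t.activity is TopicActivity.ACTIVE and t.avg_latency_ms < target_latency_ms
        for t in topics
    ):
        return "IN_SYNC"
    # Combination not described by the documented rules.
    return None


if __name__ == "__main__":
    topics = [
        MirroredTopic("orders", TopicActivity.ACTIVE, 120.0),
        MirroredTopic("payments", TopicActivity.ACTIVE, 80.0),
    ]
    print(mirror_status(topics, True, 500.0))  # IN_SYNC
    print(mirror_status(topics, True, 100.0))  # OUT_OF_SYNC ("orders" exceeds the target)
    stale = topics + [MirroredTopic("audit", TopicActivity.STALE, 0.0)]
    print(mirror_status(stale, True, 500.0))  # STALE_TOPIC
```

Because the target latency is a plain parameter here, the example also shows how the same mirror can move between IN_SYNC and OUT_OF_SYNC when only that threshold changes.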
The details page shows additional information about an individual mirroring data flow, including:
- The latest latency measurements for copying data from the source cluster to the destination cluster
- The status of each MirrorMaker 2 connector and its tasks
- The configuration used to create each MirrorMaker 2 connector
You can also update the target latency for the data flow. This value controls when the support team is alerted to high-latency issues. We recommend changing it only in consultation with our support team.