Kafka Mirroring with Kafka Connect
Mirroring is the process of copying your Kafka data from one cluster to another. Common reasons for mirroring include:
- Data backup
- Creating a failover cluster
- Moving data between geographical regions for local production/consumption
- Creating Active/Active highly available cluster topologies
Instaclustr uses Mirror Maker 2 on top of Kafka Connect to provide a mirroring capability.
You can use the Instaclustr dashboard to start the connectors required to mirror data between two Kafka clusters. As prerequisites, you will need:
- A source Kafka cluster (i.e. where your data is coming from)
- A destination Kafka cluster (i.e. where your data is going to)
- A Kafka Connect cluster targeting your destination Kafka cluster (to run the connectors). Note that since the Kafka Connect cluster must be connected to the destination Kafka cluster, it is strongly recommended that the Kafka Connect cluster be provisioned in the same region as the destination Kafka cluster. The source cluster need not be in the same region.
If you have not yet set these up, do so first.
Creating the connectors
Navigate to the mirroring page for your Kafka Connect cluster on the console.
Press the ‘Create new mirror’ button at the bottom of the page.
Enter the required information. You will need to specify:
- The source Kafka cluster. If this is not an Instaclustr-managed cluster, you will need to specify connection properties similar to those you would use in a consumer configuration.
- Whether or not to rename the mirrored topics. With renaming on, a topic named ‘abc’ in the source cluster is named ‘alias.abc’ in the destination cluster (where ‘alias’ is the alias for the source cluster). Renaming is the default and is useful for creating active/active setups or moving data between geographic areas, because it helps prevent mirroring loops and identifies where data has come from. Turning renaming off is useful for straight data backups or for creating failover clusters that consumers can use without needing to be topology aware.
- A source cluster alias. This is a short name used to describe the source cluster in various places. It defaults to the cluster name itself, but can be changed.
- Whether or not to use private IPs. Use private IPs only if traffic between your source and destination clusters can be routed using private IPs (e.g. via VPC peering). Where this is possible, it may save on traffic costs.
- The topic to mirror. This can be a regular expression that matches multiple topics.
- The maximum number of Kafka Connect tasks to use. We recommend setting this to at least the number of workers in your Connect cluster.
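As a rough illustration of the topic selection and renaming options above, the following Python sketch shows which topics a regular expression would pick up and what they would be called in the destination cluster. The function names and sample topics are hypothetical, and the sketch assumes the pattern must match the whole topic name. Python's regular expression syntax also differs in places from the Java syntax Kafka uses, so treat it as an approximation only.

```python
import re


def mirrored_topic_name(topic, source_alias, rename=True):
    """Return the topic name expected in the destination cluster."""
    # With renaming on, the source alias is prepended with a '.' separator.
    return f"{source_alias}.{topic}" if rename else topic


def select_topics(pattern, source_topics):
    """Return the source topics whose whole name matches the pattern."""
    compiled = re.compile(pattern)
    return [t for t in source_topics if compiled.fullmatch(t)]


# Hypothetical topic names and alias, for illustration only.
source_topics = ["orders", "orders-eu", "payments", "audit.orders"]
selected = select_topics(r"orders.*", source_topics)

for topic in selected:
    print(topic, "->", mirrored_topic_name(topic, "source-a"))
```

Checking a pattern against a list of your real topic names in this way before creating the mirror can help avoid mirroring more topics than intended.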
Press the ‘Create mirror’ button. Creating the mirror may take a minute or two, particularly if this is your first mirror. Your source cluster will be automatically set up as a ‘connected’ cluster. See https://www.instaclustr.com/support/documentation/kafka-connect/accessing-and-using-kafka-connect/connecting-kafka-connect-to-other-clusters/ for more details on this feature.
From the main mirroring page you can see a summary of your active mirroring data flows.
For each individual mirroring data flow, you can use the action buttons on the right to get more details, or to delete the mirroring data flow.
The details page will show you additional information about a particular flow, including:
- The latest latency measurements for copying data from the source cluster to the target cluster.
- The status of each of the Mirror Maker 2 connectors and tasks
- The configuration used to create each of the Mirror Maker 2 connectors
You can also update the target latency for the data flow. This value controls when the support team is alerted to high latency. We recommend that you change it only in consultation with our support team.
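To help with reading the connector configuration shown on the details page, here is a minimal Python sketch of the kind of properties an upstream Apache Kafka Mirror Maker 2 source connector uses. The property names come from Apache Kafka itself. The function name and values are hypothetical, connection and security properties are omitted, and the configuration the console actually generates may differ.

```python
import json


def build_mirror_source_config(source_alias, target_alias, topics, tasks_max, rename=True):
    """Build an illustrative Mirror Maker 2 source connector configuration."""
    config = {
        "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
        "source.cluster.alias": source_alias,
        "target.cluster.alias": target_alias,
        "topics": topics,            # a topic name or regular expression
        "tasks.max": str(tasks_max),
    }
    if not rename:
        # Assumption: recent Apache Kafka releases ship this policy, which
        # keeps the original topic names instead of prefixing the alias.
        config["replication.policy.class"] = (
            "org.apache.kafka.connect.mirror.IdentityReplicationPolicy"
        )
    return config


# Hypothetical aliases and topic pattern, for illustration only.
example = build_mirror_source_config("source-a", "target-b", "orders.*", 3, rename=False)
print(json.dumps(example, indent=2))
```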