AWS PrivateLink is an Amazon Web Services (AWS) feature that offers an alternative to VPC peering: a simple way to establish secure, private connectivity between VPCs without exposing data to the internet. It provides direct, one-way access to a VPC without requiring you to manage the network yourself.
Clients must be within an AWS VPC.
A Route 53 Hosted Zone is required to allow name resolution of the PrivateLink cluster’s advertised hostname. (You can read more about Amazon Route 53 here.)
To migrate a non-PrivateLink cluster to PrivateLink, the cluster must be a Private Network cluster.
Cross-Region AWS PrivateLink Endpoint connections are not supported.
PrivateLink is not supported in the following regions:
How Clients Connect to a Kafka Cluster
A client (producer or consumer) sends a command to a broker in two steps. First, the client makes an initial connection, and the broker returns the list of available brokers with their respective advertised listeners. Then, depending on the request (for instance, writing to a certain topic), the client connects to the broker that is the partition leader, using the returned advertised listener. In short, advertised listeners are how clients talk to brokers. An advertised listener is made up of an external address and a port number.
For instance, suppose a Kafka cluster has 3 brokers whose advertised listeners are B0:9092, B1:9092, and B2:9092 (where Bx stands for an external address). A client sends a command to produce messages to the test topic with bootstrap server B0:9092. The returned metadata contains all the advertised listeners. If B1:9092 is the advertised listener of the leader of the test topic partition, the client then produces the message to B1:9092, and the request succeeds.
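The two-step lookup in the example above can be sketched in a few lines of Python. This is only an in-memory illustration, not the Kafka protocol or any client library: the Broker and ToyCluster classes and the leader_address function are hypothetical names, and the partition leader assignment is hard-coded to match the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    broker_id: int
    advertised_listener: str  # "host:port" that clients are told to use


class ToyCluster:
    """In-memory stand-in for a Kafka cluster; not a real Kafka API."""

    def __init__(self, brokers, partition_leaders):
        self.by_listener = {b.advertised_listener: b for b in brokers}
        self.by_id = {b.broker_id: b for b in brokers}
        # Maps (topic, partition) -> broker_id of the partition leader.
        self.partition_leaders = partition_leaders

    def fetch_metadata(self, bootstrap_server):
        """Step 1: any reachable broker returns all listeners and leaders."""
        if bootstrap_server not in self.by_listener:
            raise ConnectionError(f"cannot reach {bootstrap_server}")
        listeners = {
            broker_id: broker.advertised_listener
            for broker_id, broker in self.by_id.items()
        }
        return listeners, dict(self.partition_leaders)


def leader_address(cluster, bootstrap_server, topic, partition=0):
    """Step 2: pick the advertised listener of the partition leader."""
    listeners, leaders = cluster.fetch_metadata(bootstrap_server)
    return listeners[leaders[(topic, partition)]]


cluster = ToyCluster(
    brokers=[Broker(0, "B0:9092"), Broker(1, "B1:9092"), Broker(2, "B2:9092")],
    partition_leaders={("test", 0): 1},  # assumed: broker 1 leads "test"
)
print(leader_address(cluster, "B0:9092", "test"))  # B1:9092
```

Whichever broker is used as the bootstrap server, the client ends up producing to the leader's advertised listener.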
How Clients Can Connect Using Our AWS PrivateLink Connection
For our PrivateLink architecture, the Apache Kafka® cluster must be a Private Network Cluster and each broker in the cluster must have a unique advertised listener. We use the same hostname as the external address for all brokers but assign a unique port number to each broker. For instance, in Figure 1, the hostname is kafka.test.com and the port on broker 1 is 6001. Hence the advertised listener for PrivateLink on broker 1 is kafka.test.com:6001.
We need these distinct, unique advertised listener ports so that the network load balancer (NLB) can direct traffic to the right broker according to the port. The NLB has its own listeners, and each listener has a target group. As shown in Figure 1, in our architecture we have a default listener that listens on 9091, and additional listeners listening on each broker’s unique advertised listener ports.
The default listener forwards requests to a target group in which all brokers are registered. Without this default listener and its target group, the client's initial request could be forwarded to a broker that is down. The default listener therefore helps to ensure that the client can connect to one of the available brokers. Once connected to the cluster, the client can retrieve the cluster topology and reach the specific broker it wants to talk to.
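The listener layout described above can be modelled as a small routing table. The following is a rough sketch, not the actual NLB configuration: the function names are hypothetical, the rule that broker N uses port 6000 + N is an assumption generalised from the 6001 and 6003 examples, and the set of healthy brokers stands in for the load balancer's own health checks.

```python
import random

BOOTSTRAP_PORT = 9091  # default listener port from the text
BASE_PORT = 6000       # assumption: broker N advertises port 6000 + N


def build_listeners(broker_ids):
    """Map each NLB listener port to the broker IDs in its target group."""
    listeners = {BOOTSTRAP_PORT: list(broker_ids)}  # default group: all brokers
    for broker_id in broker_ids:
        listeners[BASE_PORT + broker_id] = [broker_id]  # one broker per group
    return listeners


def route(listeners, port, healthy, rng=random):
    """Return the broker ID that a connection on `port` is forwarded to."""
    targets = [b for b in listeners[port] if b in healthy]
    if not targets:
        raise ConnectionError(f"no healthy target behind port {port}")
    return rng.choice(targets)


listeners = build_listeners([1, 2, 3])
healthy = {1, 3}  # pretend broker 2 is down

print(route(listeners, 6001, healthy))            # 1
print(route(listeners, BOOTSTRAP_PORT, healthy))  # 1 or 3, never 2
```

A per-broker port has exactly one possible target, so a connection to port 6002 fails while broker 2 is down, whereas the default listener can still hand the client to any healthy broker.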
For this to work, you must create an endpoint and a Route 53 alias record before connecting to the cluster. The steps are outlined here. The endpoint is connected to the NLB, and the Route 53 record resolves the hostname kafka.test.com to the created endpoint, so that requests (initially on port 9091) are routed to the NLB.
For more information, please refer to the How It Works section below.
Figure 1: Instaclustr for Apache Kafka—AWS PrivateLink Architecture
How It Works
The Kafka client will resolve the hostname kafka.test.com via Route 53 to the endpoint created within the client’s VPC.
The client initiates a connection to the endpoint over port 9091. Note that port 9091 is the port for the bootstrap servers and is always the same for any Instaclustr for Apache Kafka cluster with the PrivateLink feature enabled.
The NLB listener on port 9091 receives the connection.
The corresponding target group, B-ALL, forwards the request to one of the brokers, for instance, Broker 3. Broker 3 sends back its advertised hostname (kafka.test.com) and port (6003) to the client, among other relevant information.
The client resolves the returned hostname kafka.test.com, which points to the same endpoint, and initiates a new connection, this time on port 6003.
The client connects to the endpoint over port 6003.
The NLB listener on port 6003 then receives the connection and forwards the request to the only registered target in B3, which is broker 3.
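The steps above can be traced end to end with a toy model. Everything here is illustrative: the endpoint name vpce-example, the target group names B1 and B2 (only B-ALL and B3 appear in the text), and the choice of Broker 3 for the bootstrap connection are assumptions made to mirror the walkthrough.

```python
HOSTNAME = "kafka.test.com"
BOOTSTRAP_PORT = 9091

# Hypothetical values for illustration only.
DNS = {HOSTNAME: "vpce-example"}  # Route 53 alias -> endpoint in client VPC
NLB_LISTENERS = {9091: "B-ALL", 6001: "B1", 6002: "B2", 6003: "B3"}
TARGET_GROUPS = {"B-ALL": [1, 2, 3], "B1": [1], "B2": [2], "B3": [3]}
ADVERTISED = {1: (HOSTNAME, 6001), 2: (HOSTNAME, 6002), 3: (HOSTNAME, 6003)}


def connect(host, port, pick=lambda targets: targets[-1]):
    """Resolve the host, then follow endpoint -> NLB listener -> target group."""
    endpoint = DNS[host]                 # hostname resolves to the endpoint
    group = NLB_LISTENERS[port]          # the port selects the NLB listener
    broker = pick(TARGET_GROUPS[group])  # default pick: last target in group
    return endpoint, group, broker


def trace():
    """Return one (host, port, endpoint, target group, broker) row per hop."""
    rows = []
    # Bootstrap connection on port 9091, answered here by Broker 3.
    endpoint, group, broker = connect(HOSTNAME, BOOTSTRAP_PORT)
    rows.append((HOSTNAME, BOOTSTRAP_PORT, endpoint, group, broker))
    # Reconnect using the hostname and port that the broker advertised.
    host, port = ADVERTISED[broker]
    endpoint, group, broker = connect(host, port)
    rows.append((host, port, endpoint, group, broker))
    return rows


for row in trace():
    print(row)
# ('kafka.test.com', 9091, 'vpce-example', 'B-ALL', 3)
# ('kafka.test.com', 6003, 'vpce-example', 'B3', 3)
```

Both connections pass through the same endpoint; only the port changes, and that port alone determines which target group, and therefore which broker, receives the traffic.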