Why this series exists

Every year, teams spend countless hours clicking through cloud consoles, manually provisioning infrastructure, and hoping they remember the exact configuration when it’s time to rebuild. Then something breaks at 2 a.m. and nobody remembers which checkbox was checked six months ago.

There’s a better way.

This series walks you through building a complete streaming analytics pipeline: Apache Kafka, Apache Kafka Connect, and ClickHouse—entirely through Terraform. No clicking. No guessing. Just code that you can version, review, and reproduce.

By the end of this three-part series, you’ll have infrastructure that deploys in minutes and can be recreated identically in any environment.

What we’re building

The full pipeline looks like this:

  • Part 1 (this article): Deploy a managed Kafka cluster
  • Part 2: Add ClickHouse for analytics and Kafka Connect to bridge them
  • Part 3: Integrate with AWS VPC for private networking

Today, we start simple: a production-ready Kafka cluster deployed with a single Terraform apply.

Who uses this pattern?

This isn’t theoretical. Companies processing millions of events daily rely on Kafka pipelines backed by infrastructure-as-code:

  • LinkedIn created Kafka to handle 1.4 trillion messages per day across their platform
  • Uber uses Kafka for real-time pricing, dispatch, and trip tracking
  • Netflix processes billions of events through Kafka for personalization and monitoring

What these companies have in common: they don’t provision infrastructure by hand. Everything is automated, versioned, and repeatable.

Prerequisites

We’re about to write a Terraform configuration that talks to the Instaclustr API, provisions a Kafka cluster on AWS, and opens a firewall rule so that you can connect to it. To do that, you’ll need four things:

  1. Terraform installed (version 1.0+)
  2. An Instaclustr account—sign up for free
  3. An API key from Instaclustr Console -> Account -> API Keys
  4. Your public IP address—run curl ifconfig.me from a terminal, or visit ipchicken.com in your browser

How to build a streaming analytics pipeline with Terraform and Instaclustr screenshot

Account settings—gear icon dropdown reveals the account settings page

API Keys management page—create your Provisioning API key

The complete Terraform configuration

Create a new directory and add two files:

Terraform reads every .tf file in a directory as a single configuration, so keep this project in its own folder. main.tf defines what infrastructure to create; terraform.tfvars holds your credentials and personal settings, kept separate so you never accidentally commit secrets to version control. Those are the two files we’ll create now:

main.tf
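The settings discussed later in this article imply a main.tf along these lines. Treat it as a sketch: the cluster attributes called out in the text (kafka_version, sla_tier, auto_create_topics, network, number_of_nodes, node_size, the KAFKA firewall type) come from this article, while the provider block, authentication format, region, variable names, and the exact firewall resource name are assumptions to verify against the Instaclustr provider documentation.

```hcl
terraform {
  required_providers {
    instaclustr = {
      source = "instaclustr/instaclustr" # assumed registry source
    }
  }
}

variable "instaclustr_username" { type = string }
variable "instaclustr_api_key" {
  type      = string
  sensitive = true # never printed in plan/apply output
}
variable "my_ip_cidr" { type = string } # your public IP with /32 appended

provider "instaclustr" {
  # Assumed auth format; check the provider docs for your version
  terraform_key = "Instaclustr-Terraform ${var.instaclustr_username}:${var.instaclustr_api_key}"
}

resource "instaclustr_kafka_cluster_v3" "streaming" {
  name               = "streaming-kafka-cluster"
  kafka_version      = "4.1.1"            # KRaft mode, no ZooKeeper
  sla_tier           = "NON_PRODUCTION"   # use "PRODUCTION" for real workloads
  auto_create_topics = true               # convenient for development

  default_number_of_partitions = 3
  default_replication_factor   = 3

  data_centre {
    name            = "primary"
    cloud_provider  = "AWS_VPC"           # assumed value
    region          = "US_EAST_1"         # assumed; pick your region
    network         = "10.0.0.0/16"       # the cluster's own isolated VPC
    number_of_nodes = 3                   # three brokers for fault tolerance
    node_size       = "KFK-DEV-t4g.small-5"
  }
}

# Nothing can connect until this rule exists; referencing the cluster id
# makes Terraform create the cluster first (implicit dependency).
resource "instaclustr_cluster_network_firewall_rule_v2" "allow_me" {
  cluster_id = instaclustr_kafka_cluster_v3.streaming.id
  network    = var.my_ip_cidr
  type       = "KAFKA" # opens broker port 9092
}

output "cluster_id" {
  value = instaclustr_kafka_cluster_v3.streaming.id
}
```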

terraform.tfvars
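As an illustration, terraform.tfvars might look like the following. The variable names are placeholders (match them to whatever variables your main.tf declares), and 203.0.113.7 is a documentation-reserved example address:

```hcl
# terraform.tfvars -- keep this file out of version control (.gitignore it)
instaclustr_username = "your-console-username"
instaclustr_api_key  = "your-provisioning-api-key"
my_ip_cidr           = "203.0.113.7/32" # your public IP with /32 appended
```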

NOTE: Terraform automatically loads variables from this file, so you don’t need to pass them on the command line every time. Keep this file out of version control because it contains your API key.

Replace the values with your actual API key and IP address from the Prerequisites section above (remember to append /32 to your IP).

Understanding the configuration

Let’s break down what each section does—because copying code without understanding it is how technical debt is born.

The cluster resource

The instaclustr_kafka_cluster_v3 resource is the core of our deployment. A few key decisions here:

  • kafka_version = "4.1.1": We’re using Kafka 4.1.1, which includes KRaft mode (no more ZooKeeper dependency; see https://kafka.apache.org/40/getting-started/upgrade/). This simplifies operations significantly.
  • sla_tier = "NON_PRODUCTION": For learning and development. Production workloads should use "PRODUCTION" for higher availability guarantees.
  • auto_create_topics = true: Topics are created automatically when a producer first writes to them. Convenient for development; you may want to disable this in production for tighter control.

Create New Cluster—this is what you would see if you were doing this in Instaclustr’s Management Console. However, with Terraform you skip this entirely because the code handles it for you.

The data centre block

This is where Terraform’s declarative approach shines. You’re saying what you want, not how to build it:

  • network = "10.0.0.0/16": The CIDR block for the cluster’s VPC. Each Instaclustr cluster gets its own isolated network.
  • number_of_nodes = 3: Three brokers for fault tolerance. Kafka’s replication ensures data survives node failures.
  • node_size = "KFK-DEV-t4g.small-5": A development-sized instance. For production, you’d scale up based on throughput requirements.

Data centre options—shows provider, data centre, custom name, cluster network

Kafka node selection—KFK-DEV-t4g.small-5 with 5 GiB disk, 2 GiB RAM, and 2 cores

Why firewall rules matter

Instaclustr clusters are secure by default – nothing can connect until you explicitly allow it. The cluster_id reference uses Terraform’s dependency system: the firewall rule won’t be created until the cluster exists.

Notice the type = "KAFKA" – this opens the Kafka broker port (9092). Other types like "KAFKA_CONNECT" or "CLICKHOUSE" open different ports for different services.

Deploy the cluster

With your files in place, run:
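The standard three-step Terraform workflow, run from the directory containing both files:

```shell
terraform init    # downloads the Instaclustr provider plugin
terraform plan    # previews the execution plan without changing anything
terraform apply   # creates the resources; type "yes" at the prompt
```

Running terraform plan separately is optional here, since apply shows the same plan before asking for confirmation.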

Terraform apply—execution plan showing all config values: Kafka 4.1.1, 3 partitions, replication factor 3, and streaming-kafka-cluster

The plan output shows exactly what Terraform will create. Review it, then type yes to proceed.

Each green + in the plan output means Terraform will create that attribute or resource. At this step you can review what’s about to be provisioned before anything happens. Values marked (known after apply) are generated by the Instaclustr API once the cluster exists (like the cluster ID and status).

The (sensitive value) next to default_user_password means Terraform is hiding it from the output to protect credentials. Notice the bottom line: Plan: 2 to add, 0 to change, 0 to destroy. That confirms exactly what will happen: two new resources (the Kafka cluster and the firewall rule), with nothing modified and nothing deleted.

Deployment takes 10-15 minutes. When it completes, you’ll see outputs like:

Creation complete — Apply is complete with cluster ID and RUNNING status

Explore your cluster

Once provisioning finishes, the cluster appears in your Instaclustr console dashboard with built-in monitoring—no setup required.

Monitoring dashboard—real-time CPU usage and disk utilization graphs across all 3 nodes.

Test your cluster

With the cluster running, let’s verify it works. If you have the Kafka CLI tools installed:
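For example, something like the following round-trip, where <bootstrap-address:9092> is a placeholder for the public address in your Terraform output, and client.properties is an assumed file holding the SASL credentials shown on the cluster’s Connection Info page in the console (the topic name is illustrative too):

```shell
# Produce one message to a test topic (auto-created, since
# auto_create_topics is enabled on the cluster)
echo "hello from terraform" | kafka-console-producer.sh \
  --bootstrap-server <bootstrap-address:9092> \
  --producer.config client.properties \
  --topic test-topic

# Read it back
kafka-console-consumer.sh \
  --bootstrap-server <bootstrap-address:9092> \
  --consumer.config client.properties \
  --topic test-topic --from-beginning --max-messages 1
```

If the consumer prints your message, the cluster, the firewall rule, and your credentials are all working.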

Replace the bootstrap server address with the value from your Terraform output.

Clean up

When you’re done experimenting:
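A single command tears it all down:

```shell
terraform destroy   # shows the deletion plan; type "yes" to confirm
```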

This removes everything Terraform created. No orphaned resources, no surprise bills.

Delete Cluster confirmation—console equivalent of terraform destroy

A note on safety and peace of mind: Terraform doesn’t require you to type the cluster name to confirm deletion the way the Console does; it only asks you to type “yes”. If that makes you nervous for production clusters, you can add a lifecycle {} block with prevent_destroy = true to your resource. Terraform will refuse to destroy it until you explicitly remove that setting from the code, making deletion a deliberate, reviewable change rather than an accidental one. It would look like this:
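A sketch of the lifecycle block; the resource type and name shown are illustrative and should match whatever your own configuration uses:

```hcl
resource "instaclustr_kafka_cluster_v3" "streaming" {
  # ...existing cluster configuration...

  lifecycle {
    prevent_destroy = true # terraform destroy now errors until this is removed
  }
}
```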

What’s next

You now have a working Kafka cluster deployed entirely through code. In Part 2, we’ll expand this into a full pipeline by adding:

  • ClickHouse—a columnar database for real-time analytics
  • Kafka Connect—to stream data from Kafka into ClickHouse
  • Firewall rules—connecting all three systems securely

The goal: a complete streaming analytics stack where data flows from Kafka to ClickHouse automatically.

Key takeaways

  1. Infrastructure-as-code isn’t optional—it’s how serious teams manage cloud resources consistently.
  2. Terraform’s declarative model means you describe the end state, not the steps to get there.
  3. Instaclustr’s provider handles the complexity of managed services behind simple resource definitions.
  4. Firewall rules are explicit—security by default, access by intention.

See you in Part 2