Managed Apache Kafka on AWS

Interested in AWS Kafka? Rather than going with Amazon MSK, you can save with true open source Apache Kafka fully managed and hosted on AWS with Instaclustr.

Apache Kafka: The Platform for Building Real-Time Streaming Data Pipelines and Applications

Our fully managed Apache Kafka service can be hosted on AWS, giving you a reliable and highly available alternative to Amazon MSK. Instaclustr Managed Apache Kafka on AWS enables you to build fast and scalable distributed systems for real-time streaming.

While some turn to Amazon Managed Streaming for Apache Kafka (Amazon MSK), you can host your Apache Kafka instance on AWS without committing to Amazon as your managed service provider, and so avoid long-term vendor lock-in. Our service lets you deploy an open source Apache Kafka cluster on AWS, fully managed by Instaclustr, with pre-configured and optimized settings for highly available streaming data clusters. These settings are based on best practices developed through extensive testing against different real-world use cases.

Managing Your Kafka Cluster on AWS (or Your Cloud of Choice)

Using Instaclustr’s fully managed Apache Kafka service with your chosen cloud provider is easy. You have two options: run managed Apache Kafka within Instaclustr’s AWS accounts, or run it in your own cloud provider account. You get the benefits of using AWS without the costs and lock-in of Amazon MSK.

Best Practices for Running Managed Kafka on AWS

The first step in deploying open source Apache Kafka on AWS is choosing the right Amazon EC2 instance type for the Apache Kafka nodes (brokers). This choice determines the performance and throughput of your cluster, as well as the cost of running it on AWS, and it often involves a trade-off between cost and performance.

Amazon EC2 offers a huge number of instance types with varied combinations of CPU, memory, and disk hardware to suit different purposes and applications. Instaclustr has made this step easier for you by narrowing down this choice to a handful of instance types that offer the best return on investment for a Kafka deployment. Our research and choice of instance types are based on Kafka’s architecture and internals, AWS features, a cost-vs-value analysis, and, most importantly, real-world use cases of Apache Kafka.

Learn how to store Kafka data to Amazon S3 via the Instaclustr console

Learn how to create a Kafka Cluster on the Instaclustr console

Kafka and Cassandra in Action: Explore How We Built a Massively Scalable Anomaly Detection Application

The 10-part blog series showcases a detailed anomaly detection application we deployed on Amazon EKS and integrated with a massive-scale Apache Kafka and Cassandra data pipeline on AWS, all through the Instaclustr Managed Platform. The series highlights best practices, performance tuning, and monitoring and tracing capabilities. Above all, it demonstrates how a massively scalable Kafka-Cassandra data pipeline can be architected to process billions of daily transactions and detect anomalies in them.

Explore the 10-part series by category.

In this blog we introduce the main motivation behind the project, and cover functionality and initial test results beginning with Cassandra.

Learn More

Learn how to provision Cassandra and Kafka clusters automatically with Instaclustr’s provisioning API.

Learn More

In this post, we generate high-volume load for Kafka, the log aggregation system that operates via a publish-subscribe mechanism.

Learn More

Metrics were added to compute and report CPU utilization, memory, rate-of-event production, and producer latency.

Learn More

We explore how to better understand an open source system using Prometheus for distributed metrics monitoring.

Learn More

In this post, we look at another way of increasing visibility into a system using OpenTracing for distributed tracing.

Learn More

We explore deploying the Anomalia Machina application on Kubernetes with the help of Amazon EKS.

Learn More

We deploy the instrumented application in a cloud production environment.

Learn More

We test out the application to see how anomaly detection can scale on small Kafka and Cassandra Instaclustr production clusters.

Learn More

Our final blog of the Anomalia Machina series focuses on scaling the application out from 3 to 48 Cassandra nodes. The scale results were impressive: 574 CPU cores (across Cassandra, Kafka, and Kubernetes clusters), 2.3 million writes/s into Kafka at peak, and a sustainable 220,000 anomaly checks per second. In total, the application handled a massive 19 billion anomaly checks per day.

Learn More
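
The daily total quoted for the final Anomalia Machina test follows directly from the sustained per-second rate. As a quick sanity check, here is a minimal Python sketch of that arithmetic. The function and variable names are illustrative only and are not taken from the blog series.

```python
# Convert a sustained per-second rate into a per-day total.
# Names here are illustrative, not from the Anomalia Machina code.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def checks_per_day(checks_per_second: int) -> int:
    """Return the number of checks completed in one day at a steady rate."""
    return checks_per_second * SECONDS_PER_DAY


sustained_rate = 220_000  # anomaly checks per second, as quoted above
daily_total = checks_per_day(sustained_rate)

print(f"{daily_total:,} anomaly checks per day")  # 19,008,000,000
```

At 220,000 checks per second, the result is just over 19 billion checks per day, which matches the rounded figure above.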

Apache Kafka Benchmarks for AWS

We recently completed an extensive benchmarking exercise to help our customers evaluate their choice of instance types for Kafka.