Instaclustr Blog Archive
2026
- ClickHouse
What are Asynchronous Data Inserts in ClickHouse®
Asynchronous data inserts in ClickHouse are a server-side batching mechanism that buffers incoming data in memory before writing it to disk. This approach allows ClickHouse to handle high-throughput ingestion smoothly and predictably under heavy load. Everyone talks about ClickHouse’s speed in large analytical workloads, usually focusing on the blazing-fast queries. But speed doesn’t just come...
Vikas Kumar, June 09, 2026
- AI
- Dev Rel
- OpenSearch
Optimizing AI Search in OpenSearch: A practical guide
Learn how to build production-ready AI search in OpenSearch. This practical guide covers OpenSearch vector search, hybrid search, k-NN search, Lucene vs. Faiss, cluster tuning, and measuring search quality.
Kassian Wren, June 05, 2026
- Dev Rel
- OpenSearch
How to create a hybrid search pipeline in OpenSearch®
Hybrid search in OpenSearch is a retrieval method that combines multiple search techniques, such as keyword matching and semantic vector search, into a single, unified result set. When you search strictly with keyword matching or solely with semantic search, you lose some of the query’s intent. This strategy can also make things difficult when dealing with...
Kassian Wren, June 04, 2026
- Apache Kafka
- ClickHouse
- Dev Rel
- News
Kafka® to Iceberg: Build a queryable data lake with ClickHouse®
That’s ClickHouse below querying an Iceberg table on S3 within 0.31 seconds to read metadata and return the first rows. No Spark job, no data movement, and no separate warehouse layer to manage. By the end of this article, you’ll have the full pipeline running and understand why each component exists—not just how to configure...
Walt Ribeiro, June 03, 2026
- ClickHouse
- Dev Rel
Apache Iceberg explained: A better table format for modern data lakes
Data lakes had a reputation problem. The promise was compelling: dump all your data into cheap object storage—S3, GCS, Azure Blob—and query it whenever you need. The reality was a mess of stale partitions, schema drift, and silent data corruption caused by unsafe concurrent writes. Engineers knew the risks and worked around them rather than...
Walt Ribeiro, June 02, 2026
- Apache Kafka
- Dev Rel
How to handle bad messages safely in Kafka Streams with KIP-1034 Dead Letter Queues
What should you do with messages that can’t be delivered? If you have spent time developing Kafka Streams applications in Java, you’ve probably felt the gap between what Kafka Connect and Kafka Streams offer when something goes wrong on the wire. Connect has a familiar pattern: poison messages land in a dead letter queue (DLQ)...
Paul Brebner, June 01, 2026
- Apache Cassandra
- Dev Rel
- Feature Releases
- News
What’s new in Cassandra® 6? A roundup of features for users and operators
Apache Cassandra 6 is shaping up to be a significant release, as some of its biggest changes affect the core behavior of the database: how metadata is coordinated, how Cassandra is moving toward broader transaction support via the Accord protocol, how repair is scheduled, and how operators inspect and manage the system. Let’s focus on a few...
Mariah McLaughlin, May 14, 2026
- Apache Kafka
- Dev Rel
- News
- Open Source
When (and when not) to use Apache Kafka® Diskless Topics
I recently wrote a Visual Guide to Apache Kafka Diskless Topics, which introduces the main ideas behind Kafka Diskless Topics and links to the relevant Kafka KIPs. At present, the only accepted KIP is the high-level KIP-1150, and the only available implementation is Aiven’s “Inkless” fork. So, what are some potential Kafka Diskless Topic use...
Paul Brebner, May 14, 2026