Modern businesses face increasing pressure to deliver real-time insights from ever-growing, distributed data sources. Today, NetApp is making this easier than ever by announcing seamless integration between fully managed Instaclustr for ClickHouse and Amazon FSx for NetApp ONTAP—designed to empower AWS customers to build a truly hybrid lakehouse architecture.

Many customers have to choose which type of storage to use for their analytics data. Data may be ingested over SMB or NFS, but analytics pipelines often work only with object storage. That forces customers to move data from file to object storage, maintain multiple copies, and secure the data in multiple places, all of which adds complexity and cost.

Why lakehouse architecture?

A data lakehouse merges the strengths of data lakes and data warehouses. Data lakes provide flexibility, scalability, and the ability to store large amounts of unstructured and semi-structured data, for example for artificial intelligence (AI) and machine learning (ML) workloads. Data warehouses store structured data, for example for business intelligence (BI) and reporting. A lakehouse lets organizations store structured, semi-structured, and unstructured data in low-cost repositories while enabling advanced analytics, BI, and ML use cases.

Why FSx for NetApp ONTAP?

FSx for NetApp ONTAP provides a single unified file and object data store, so customers don’t have to move data to object storage to run analytics. Data can be ingested over any file protocol and made available for analytics with ClickHouse. FSx for NetApp ONTAP already supports data tiering, so hot data gets faster performance while older data moves to a lower-cost tier, saving long-term storage costs. Data can be brought in from on-premises systems via SnapMirror and FlexCache, made available to data processing and ETL pipelines (for example, self-managed Spark), and then analyzed with ClickHouse.

Using ONTAP S3, customers avoid Amazon S3 API costs, which makes analytics spending more predictable and lowers overall cost. Customers can also run transactional workloads such as SQL, Oracle, and PostgreSQL on the same FSx for NetApp ONTAP system, which can in turn be connected to ClickHouse for a unified analytics solution.

Why ClickHouse?

ClickHouse is an open source, columnar database purpose-built for ultra-fast analytics on massive datasets. Its architecture delivers real-time query performance through vectorized execution and advanced compression, making it ideal for interactive dashboards, streaming analytics and AI workloads.

In modern data lakehouse environments, ClickHouse integrates seamlessly with open file formats like Apache Parquet and Apache ORC as well as open table formats like Apache Iceberg, allowing organizations to query data directly without costly ETL processes. Combined with its horizontal scalability, fault-tolerant replication, and ability to run on commodity hardware, ClickHouse offers a powerful, cost-effective foundation for next-generation analytics and AI systems—without vendor lock-in.

Why Instaclustr for ClickHouse?

Instaclustr for ClickHouse is a fully managed service designed to deliver high-performance analytics with minimal operational overhead. It offers automated provisioning, monitoring, and scaling, along with advanced features like tiered storage and load balancing for cost-effective scalability. With enterprise-grade reliability, 24×7 support, and strong security compliance, it’s ideal for mission-critical workloads. Instaclustr also maintains a pure open source approach and integrates seamlessly with technologies like Kafka and PostgreSQL, making it a powerful choice for building robust data pipelines.

Benefits of Instaclustr for ClickHouse and Amazon FSx for NetApp ONTAP integration for lakehouse architecture

In a traditional lakehouse, you would need to prepare and move data several times as it is enriched by processing layers. With ClickHouse, however, all your data stays in FSx for ONTAP, where it is cataloged and filtered, ready for any analytics, AI, or ML workload. This saves significant storage, network, and processing costs. To explain a bit further:

  • Unified data lake: FSx for ONTAP acts as a scalable, S3-compatible data lake, supporting both file and object protocols for maximum flexibility.
  • Direct analytics: Instaclustr Managed ClickHouse queries data directly on FSx for ONTAP in any of the dozens of supported file or table formats (for example, Parquet or Iceberg) via S3-compatible endpoints, eliminating the need for redundant data ingestion.
  • Zero‑copy lakehouse analytics: Run ClickHouse directly on your FSx for NetApp ONTAP data. One copy. Instant insights.
  • Faster time-to-insight: Query immediately without ETL copies. Consistent multi‑AZ performance via ONTAP.
  • Lower TCO: With one authoritative dataset benefiting from FSx for NetApp ONTAP’s built‑in compression and deduplication, you can eliminate redundant storage and egress, reducing the total cost of ownership.
  • Simpler operations: Fully managed ClickHouse + FSx for NetApp ONTAP with Snapshots, versioning, and S3 compatibility out of the box.
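
To make the "direct analytics" point more concrete, here is a minimal sketch in Python that builds a query using ClickHouse's s3() table function, which reads files in place from an S3-compatible endpoint. The endpoint URL, bucket name, object path, and credentials below are hypothetical placeholders, not values from this integration, and the helper names (quote_sql, build_s3_query) are our own for illustration. The actual connection details for your FSx for ONTAP file system will differ.

```python
def quote_sql(value: str) -> str:
    """Quote a Python string as a ClickHouse string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_s3_query(endpoint: str, bucket: str, key_glob: str,
                   access_key_id: str, secret_access_key: str,
                   fmt: str = "Parquet", limit: int = 10) -> str:
    """Build a SELECT that reads files in place via ClickHouse's s3() table function."""
    # Path-style URL: <endpoint>/<bucket>/<object key or glob pattern>
    url = f"{endpoint.rstrip('/')}/{bucket}/{key_glob.lstrip('/')}"
    args = ", ".join(
        quote_sql(a) for a in (url, access_key_id, secret_access_key, fmt)
    )
    return f"SELECT * FROM s3({args}) LIMIT {int(limit)}"


# All values below are hypothetical placeholders for illustration only.
query = build_s3_query(
    endpoint="https://s3.fsx-ontap.example.internal",
    bucket="lakehouse",
    key_glob="events/*.parquet",
    access_key_id="EXAMPLE_KEY_ID",
    secret_access_key="EXAMPLE_SECRET",
)
print(query)
```

The resulting statement can be run from any ClickHouse client. In practice you would load credentials from a secrets store rather than hard-coding them, and you would swap the format argument (for example, to ORC) to match the files you are querying.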

“Amazon FSx for NetApp ONTAP is already a well-known solution for its cost efficiency, performance, and data mobility. Now with the added ease of integration between Instaclustr Managed ClickHouse and FSx for NetApp ONTAP, customers can deploy a ready-to-use Lakehouse solution backed by our 24×7 expert support so you can focus on insights, not infrastructure.”

― Ben Slater, VP and General Manager, NetApp Instaclustr

[Figure: NetApp ClickHouse services diagram]

Getting started with Managed ClickHouse and FSx for ONTAP

Connecting your existing FSx for ONTAP cluster to Instaclustr’s managed ClickHouse is a straightforward process. The Instaclustr console guides you through creating your enterprise-grade ClickHouse cluster and configuring connectivity with FSx for ONTAP. For existing customers, this process will feel familiar, similar to setting up other integrations with your managed clusters. If you manage your infrastructure as code, you can also achieve this with a single Terraform script.

Find more details to help you get started on our dedicated support page.

The future of hybrid and multi-cloud analytics

We are constantly working to enhance this integration. In the coming months, we plan to add support for more complex deployments, such as cross-VPC and cross-hyperscaler ClickHouse and ONTAP clusters. This aims to provide even better support for hybrid and multi-cloud environments, giving you more flexibility for your lakehouse analytics strategy.

Ready to experience the power of FSx for ONTAP with Instaclustr for ClickHouse? Sign up for a free trial to get started today, or contact us to discuss your unique lakehouse use case.

SAFE HARBOR STATEMENT: Any unreleased services or features referenced in this blog are not currently available and may not be made generally available on time or at all, as may be determined in NetApp’s sole discretion. Any such referenced services or features do not represent promises to deliver, commitments, or obligations of NetApp and may not be incorporated into any contract. Customers should make their purchase decisions based upon services and features that are currently generally available.