High-Performing Analytics Engine – Apache Spark
Instaclustr Managed Apache Spark provides a reliable, managed platform, collocated with your Apache Cassandra data store, to leverage the power of Apache Spark for stream or batch analytics. Harness a fast, high-performing analytics engine without having to move your data.
SOC 2 Certified
Instaclustr Managed Apache Spark is SOC 2 certified, providing cluster security and availability assurance. Our SOC 2 program builds security and availability considerations into our design and includes continual review, testing and monitoring of the environment.
Instaclustr Managed Apache Spark
The Instaclustr Managed Service is available on AWS, Azure, Google Cloud Platform and IBM SoftLayer, and provides a range of key features so you can focus on the productive work of developing analytics with Spark.
What is Apache Spark?
Apache Spark is a fast and powerful open source processing engine built around speed, ease of use and sophisticated analytics.
With an advanced DAG execution engine that supports acyclic data flow and in-memory computing, Apache Spark can run programs up to 100x faster than Hadoop MapReduce in memory. UC Berkeley’s AMPLab developed Spark in 2009 and open sourced it in 2010. Since then, it has grown into one of the largest open source communities in big data, with 1000+ developers from 200+ companies contributing to Spark since 2009.
Advantages of Managed Apache Spark
Spark detects patterns in your data and provides actionable insight. Healthcare, banking, airlines, retail, scientific research and many other industries use Apache Spark to improve their business performance. Yahoo, Amazon, eBay, Uber and Alibaba are some of the big names using Apache Spark in production.
Collocated Data Engine
Your Apache Spark engine is right where your operational database resides. No need for extracting, transforming and loading into a new environment.
Functional and Easy to Use
Apache Spark can be deployed in standalone cluster mode or in the cloud. It can access data from diverse sources, including Cassandra, and has easy-to-use APIs for operating on large datasets.
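As a rough illustration of reading Cassandra data through Spark's DataFrame API, the sketch below assumes the open source Spark Cassandra Connector is on the cluster's classpath and that an existing SparkSession is already configured to reach your Cassandra nodes. The helper names `cassandra_options` and `load_cassandra_table` are hypothetical, introduced here only for the example.

```python
# Data source name registered by the Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_options(keyspace, table):
    """Build the reader options that identify one Cassandra table."""
    return {"keyspace": keyspace, "table": table}


def load_cassandra_table(spark, keyspace, table):
    """Return a DataFrame backed by a Cassandra table.

    Assumption: `spark` is an existing SparkSession with the connector
    available and spark.cassandra.connection.host already set.
    """
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(**cassandra_options(keyspace, table))
        .load()
    )
```

With a session in hand, a call such as `load_cassandra_table(spark, "shop", "orders").groupBy("customer_id").count()` would aggregate the table in place; the keyspace, table and column names here are placeholders.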
A Unified Engine
Apache Spark lets you seamlessly combine libraries such as Spark SQL, Spark Streaming, MLlib (machine learning) and GraphX (graph) to create complex workflows and manage analytics.
Collocated Data Engine
We provide Apache Cassandra as the underlying data store. Spark fully integrates with the key components of Cassandra and provides the resilience and scale required.
Spark Jobserver and Apache Zeppelin
To provide easy access to your Spark processing engine, Instaclustr’s Spark cluster can include Spark Jobserver (REST API) and Apache Zeppelin (analyst notebook UI).
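To give a feel for the REST interface, here is a minimal sketch that builds a job submission request in the style of the open source Spark Jobserver API (`POST /jobs` with `appName` and `classPath` query parameters). The function name `build_job_request` is hypothetical, and the base URL, application name and class path you pass in are your own; the endpoint and authentication for a managed cluster may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_job_request(base_url, app_name, class_path, job_config, sync=False):
    """Build (but do not send) a POST request that starts a Jobserver job."""
    query = urlencode({
        "appName": app_name,        # name the job binary was uploaded under
        "classPath": class_path,    # fully qualified job class to run
        "sync": "true" if sync else "false",
    })
    return Request(
        url=f"{base_url.rstrip('/')}/jobs?{query}",
        data=job_config.encode("utf-8"),  # job config travels in the POST body
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job. Building the request separately makes it easy to inspect the URL and body first.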
Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. It can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
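The idea behind a DAG engine with in-memory computing can be sketched in a few lines of plain Python. This is a toy model, not Spark code: the `LazyDataset` class is hypothetical, and it keeps results in memory automatically, whereas Spark does so only when you ask it to with `cache()` or `persist()`.

```python
class LazyDataset:
    """Toy stand-in for a Spark dataset: records steps, runs them on demand."""

    def __init__(self, source, steps=()):
        self._source = source        # any iterable of input records
        self._steps = tuple(steps)   # recorded transformations, in order
        self._cached = None          # in-memory result, filled by collect()

    def map(self, fn):
        # Transformation: extend the plan, compute nothing yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: extend the plan, compute nothing yet.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: run the whole plan once, then serve repeats from memory.
        if self._cached is None:
            rows = iter(self._source)
            for kind, fn in self._steps:
                rows = map(fn, rows) if kind == "map" else filter(fn, rows)
            self._cached = list(rows)
        return self._cached
```

Chaining `map` and `filter` only grows the recorded plan. Work happens when `collect` is called, and a second `collect` on the same object reuses the result already held in memory.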
Managed for Reliability
Instaclustr’s focus is supporting application reliability at scale. Our Spark architecture and support offering enable you to use the power of Spark from your application, with the confidence that you can meet your availability and processing requirements.