Apache Spark on Cassandra

Instaclustr provides a fully hosted and managed Apache Spark™ solution on Cassandra, so you can embrace the analytical power of Spark without having to move your data.

High Performing Analytics Engine – Apache Spark

Instaclustr Managed Apache Spark provides a reliable, managed platform, collocated with your Apache Cassandra data store, so you can leverage the power of Apache Spark™ for stream or batch analytics. Harness a fast, high-performing analytics engine without having to move your data.

SOC 2 Certified

Instaclustr Managed Apache Spark is SOC 2 certified, providing assurance of cluster security and availability. Our SOC 2 program builds security and availability considerations into our design and includes continual review, testing and monitoring of the environment.

Instaclustr Managed Apache Spark

The Instaclustr Managed Service is available on AWS, Azure, GCP, and IBM SoftLayer and provides a range of key features to ensure you can focus on the productive work of developing analytics with Spark.


Apache Spark Managed Service

Monitoring

Our management console provides integrated Spark management and monitoring.

24x7 Expert Support

We provide 24/7 expert technical support to our Managed Apache Spark customers.

Apache Cassandra

We are experts in providing open source technologies as managed services, and we offer Managed Apache Cassandra as the underlying data store. Spark integrates fully with Cassandra and provides the resilience and scale your application needs.

Spark Jobserver and Apache Zeppelin

To provide easy access to your Spark processing engine, Instaclustr’s Spark clusters can include Spark Jobserver (a REST API for submitting and managing jobs) and Apache Zeppelin (a notebook UI for analysts).

Managed for Reliability

Our managed environment is focused on reliability at scale. Our Spark architecture and support offering enable you to use the power of Spark from your application, with confidence that your availability and processing requirements will be met.

What is Apache Spark™?

Apache Spark is a fast, powerful open source processing engine built around speed, ease of use and sophisticated analytics.

With an advanced DAG execution engine that supports acyclic data flow and in-memory computing, Apache Spark can run some workloads up to 100x faster than disk-based engines such as Hadoop MapReduce. UC Berkeley’s AMPLab developed Spark in 2009 and open sourced it in 2010; since then it has grown into one of the largest open source communities in big data, with more than 1,000 developers from 200+ companies contributing to the project.

Advantages of Managed Apache Spark

Spark detects patterns and provides actionable insights from your data. Healthcare, banking, airlines, retail, scientific research and many other industries use Apache Spark to improve their business performance. Yahoo, Amazon, eBay, Uber and Alibaba are some of the big names running Apache Spark in production.

Collocated Data Engine

Your Apache Spark engine runs right where your operational database resides, so there is no need to extract, transform and load your data into a separate environment.
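For illustration, the sketch below shows how a Spark job can read a Cassandra table in place using the Spark Cassandra connector; the contact point, keyspace and table names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming the Spark Cassandra connector is on the classpath
// and a Cassandra node is reachable at the configured contact point.
val spark = SparkSession.builder()
  .appName("collocated-analytics")
  .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder contact point
  .getOrCreate()

// Load the (hypothetical) shop.orders table directly from Cassandra.
val orders = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// Aggregate against the collocated data -- no export or ETL step required.
orders.groupBy("customer_id").count().show()
```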

Functional and Easy to Use

Apache Spark can be deployed in standalone cluster mode or in the cloud. It can access data from diverse sources, including Cassandra, and offers easy-to-use APIs for operating on large datasets.

A Unified Engine

Apache Spark lets you seamlessly combine libraries such as Spark SQL, Spark Streaming, MLlib (machine learning) and GraphX (graph processing) to create complex workflows and manage analytics.
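As a rough sketch of that unification, the example below mixes a Spark SQL query with MLlib clustering in a single application; the dataset, column names and the choice of k-means are purely illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeans

val spark = SparkSession.builder().appName("unified-pipeline").getOrCreate()
import spark.implicits._

// Illustrative in-memory dataset.
val metrics = Seq((1L, 3.0, 120.0), (2L, 7.5, 30.0), (3L, 2.1, 200.0))
  .toDF("user_id", "sessions_per_day", "avg_latency_ms")

// Spark SQL step: declarative filtering over the DataFrame.
metrics.createOrReplaceTempView("metrics")
val active = spark.sql("SELECT * FROM metrics WHERE sessions_per_day > 2")

// MLlib step: cluster the same data without leaving the application.
val features = new VectorAssembler()
  .setInputCols(Array("sessions_per_day", "avg_latency_ms"))
  .setOutputCol("features")
  .transform(active)

val model = new KMeans().setK(2).setSeed(42L).fit(features)
model.transform(features).select("user_id", "prediction").show()
```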

Apache Spark Ecosystem

As a lightning-fast in-memory cluster computing engine, Apache Spark requires a fast, distributed back-end data store to deliver advanced analytics capabilities. Apache Cassandra is a modern, reliable and scalable choice for that data store.

Programming Languages

Apache Spark supports popular data analysis languages such as Python and R, as well as the enterprise-friendly Java and Scala, allowing everyone from application developers to data scientists to harness its scalability and speed.

Libraries

The Spark core is complemented by a set of powerful, higher-level libraries that can be used seamlessly in the same application:

Spark SQL – A module for working with structured data. Spark SQL provides a standard interface for reading from and writing to other data stores, and integrates tightly with the rest of the Spark ecosystem (for example, combining SQL query processing with machine learning). A server mode provides industry-standard JDBC and ODBC connectivity for business intelligence tools.
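A minimal Spark SQL sketch, assuming a JSON file at a placeholder path: load structured data, query it with SQL, and write the result back out.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-sql-example").getOrCreate()

// Read structured data (any supported source works; the path is a placeholder).
val events = spark.read.json("/data/events.json")
events.createOrReplaceTempView("events")

// Query the data with standard SQL.
val daily = spark.sql(
  "SELECT event_date, COUNT(*) AS events FROM events GROUP BY event_date")

// Write the result to another data store, here Parquet files.
daily.write.mode("overwrite").parquet("/data/daily_counts")
```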

Spark Streaming – An early addition to Apache Spark, Spark Streaming makes it easy to build scalable, fault-tolerant streaming applications. It enables powerful interactive and analytic applications across both streaming data (new data arriving in real time) and historical data, and it readily integrates with a wide variety of popular data sources. Spark Streaming extends the Apache Spark batch processing model to streams by breaking the stream into a continuous series of micro-batches.
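The sketch below illustrates the micro-batch model with the classic DStream word count; the 5-second batch interval and the socket source on localhost:9999 are stand-ins for a real feed.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Each 5-second micro-batch is processed with ordinary Spark operations.
val conf = new SparkConf().setAppName("streaming-word-count")
val ssc = new StreamingContext(conf, Seconds(5))

// Placeholder source: lines of text arriving on a local socket.
val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()
```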

MLlib – Apache Spark’s scalable machine learning library, usable in Java, Scala and Python as part of Spark applications. It includes a framework for creating machine learning pipelines, allowing easy implementation of feature extraction, selection and transformation on any structured dataset.
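As a hedged sketch of such a pipeline, the example below chains tokenisation, feature hashing and logistic regression; the two-row training set and column names are made up for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mllib-pipeline").getOrCreate()
import spark.implicits._

// Tiny illustrative training set.
val training = Seq(
  ("spark is fast", 1.0),
  ("slow batch job", 0.0)
).toDF("text", "label")

// Feature extraction and a classifier chained into one reusable pipeline.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)

val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
model.transform(training).select("text", "prediction").show()
```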

GraphX – Apache Spark’s API for graphs and graph-parallel computation. It comes with a growing collection of distributed graph algorithms and builders to simplify graph analytics tasks.
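The sketch below builds a small property graph and runs GraphX’s built-in PageRank; the vertices and edges are invented for illustration.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("graphx-example").getOrCreate()
val sc = spark.sparkContext

// Illustrative vertices (users) and edges (follow relationships).
val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
val follows = sc.parallelize(Seq(
  Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows"), Edge(3L, 1L, "follows")))

val graph = Graph(users, follows)

// Run PageRank until the scores converge within the given tolerance.
val ranks = graph.pageRank(0.0001).vertices
ranks.join(users).collect().foreach {
  case (_, (rank, name)) => println(f"$name: $rank%.3f")
}
```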

Apache Spark Core

Spark Core (general execution engine) – The general processing engine for the Spark platform, providing in-memory computing capabilities for fast execution of a wide variety of applications. Spark Core is the foundation for parallel and distributed processing of large datasets: it provides distributed task dispatching, scheduling and basic I/O, and it handles node failures by recomputing missing pieces of data.
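A small sketch of the core RDD API: the data is split into partitions, the scheduler runs one task per partition across the executors, and lost partitions are recomputed from lineage if a node fails. The numbers and partition count are arbitrary.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("core-rdd-example").getOrCreate()
val sc = spark.sparkContext

// 8 partitions -> 8 tasks dispatched across the cluster by the scheduler.
val numbers = sc.parallelize(1 to 1000000, numSlices = 8)

// map and reduce run in parallel on the executors; failed partitions are
// recomputed from the RDD lineage rather than restored from a checkpoint.
val sumOfSquares = numbers.map(n => n.toLong * n).reduce(_ + _)
println(s"Sum of squares: $sumOfSquares")
```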

Data Sources

Spark supports many data sources, including Apache Cassandra (via the Spark Cassandra connector).
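Complementing the read example above, the sketch below writes a DataFrame back to Cassandra through the connector; the keyspace, table and columns are hypothetical, and the target table must already exist.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("write-back-example").getOrCreate()
import spark.implicits._

// Illustrative result set to persist.
val dailyCounts = Seq(("2024-01-01", 1200L), ("2024-01-02", 1350L))
  .toDF("event_date", "events")

// Append the rows to the (hypothetical) analytics.daily_counts table.
dailyCounts.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "analytics", "table" -> "daily_counts"))
  .mode("append")
  .save()
```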

Ready to experience the power of Apache Spark?
