Saturday, 30 April 2016

What is Apache Spark?

Spark:

Apache Spark is a general-purpose cluster computing system that processes big data at very high speed compared with traditional MapReduce. Spark's design is unique in that it can keep large amounts of data in memory, which lets Spark programs run up to 100 times faster than their MapReduce counterparts.

Compared to the MapReduce model, Spark supports more types of computation, including interactive queries and stream processing. The biggest difference between MapReduce and Spark is the use of memory (RAM).
In MapReduce, memory is used primarily for the actual computation, and intermediate results are written to disk, which makes the process very slow. In Spark, memory is used for both purposes: to compute and to store objects.
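
To make the memory point concrete, here is a minimal sketch using Spark's RDD API in Scala (the input path and log format are hypothetical): caching a dataset keeps it in memory, so every action after the first reuses it instead of re-reading from disk.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CacheDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Load a text file (placeholder path) and mark it to be kept in memory.
    val lines = sc.textFile("/data/app.log").cache()

    // Both actions below reuse the cached data; a MapReduce pipeline
    // would re-read the input from disk for each pass.
    val errors = lines.filter(_.contains("ERROR")).count()
    val warnings = lines.filter(_.contains("WARN")).count()

    println(s"errors = $errors, warnings = $warnings")
    sc.stop()
  }
}
```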

Spark is like a shopping mall that has every store under one roof: it is designed to cover all the types of workloads that previously required separate distributed systems, including batch applications, iterative algorithms, interactive queries, and streaming. Spark makes it easy and inexpensive to combine different processing types, which is often necessary in production data-analysis pipelines.
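
As an illustration of combining workloads in one engine, here is a hedged sketch that mixes a batch load with an interactive SQL query (the file path, schema, and column names are assumptions made for the example):

```scala
import org.apache.spark.sql.SparkSession

object UnifiedDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UnifiedDemo")
      .master("local[*]")
      .getOrCreate()

    // Batch step: load a CSV of sales records (hypothetical path and columns).
    val sales = spark.read.option("header", "true").csv("/data/sales.csv")
    sales.createOrReplaceTempView("sales")

    // Interactive-query step: the same data, queried with plain SQL,
    // without moving it to a separate system.
    spark.sql("SELECT product, COUNT(*) AS orders FROM sales GROUP BY product")
      .show()

    spark.stop()
  }
}
```

The same session could also drive MLlib for iterative algorithms or Spark Streaming for live data, which is exactly the "one mall, many stores" idea.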

Spark is written in Scala, and it is designed to be highly accessible through APIs in Python, Java, Scala, and SQL.
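
For example, the bundled Scala shell (spark-shell) lets you explore data interactively. This word-count sketch assumes any local text file and uses the sc variable the shell provides:

```scala
// Inside spark-shell, where `sc` (the SparkContext) is predefined.
val counts = sc.textFile("README.md")   // any local text file
  .flatMap(_.split("\\s+"))             // split lines into words
  .map(word => (word, 1))               // pair each word with a count of 1
  .reduceByKey(_ + _)                   // sum the counts per word

counts.take(5).foreach(println)
```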


There are some applications that are not suitable for Spark because of its distributed architecture. Spark's overhead is negligible when handling large amounts of data, but if you have a small amount of data that fits on a single machine, it will be more beneficial to go with another framework. Spark was also not made for OLTP applications, which involve fast and numerous small transactions. It is better suited for OLAP workloads, which involve batch jobs and data mining.
