Spark:
Apache Spark is a general-purpose cluster computing system that processes big
data much faster than traditional MapReduce. Spark's design lets it keep large amounts of data in memory, so Spark
programs can run up to 100 times faster than their MapReduce counterparts.
Compared
to the MapReduce model, Spark supports more types of computation, including
interactive queries and stream processing. The biggest difference
between MapReduce and Spark is how they use memory (RAM).
In MapReduce,
memory is used primarily for the actual computation, and intermediate results are written to disk between stages, which makes the process slow.
In Spark, memory is used for both purposes: to compute and also to store
objects.
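To make that distinction concrete, here is a minimal plain-Python sketch (an analogy, not Spark code; `load_from_disk` and its sleep are illustrative stand-ins for stage I/O) contrasting a MapReduce-style loop that re-reads its input on every pass with a Spark-style loop that caches the dataset in memory once:

```python
import time

def load_from_disk():
    # Stand-in for reading the input from disk, as each MapReduce stage must
    time.sleep(0.01)  # simulated I/O latency
    return list(range(1000))

def iterate_without_cache(passes):
    # MapReduce-style: every pass goes back to the source
    total = 0
    for _ in range(passes):
        data = load_from_disk()          # disk hit on every iteration
        total += sum(x * x for x in data)
    return total

def iterate_with_cache(passes):
    # Spark-style: materialize the dataset in memory once, then reuse it
    cached = load_from_disk()            # one disk hit, then RAM only
    total = 0
    for _ in range(passes):
        total += sum(x * x for x in cached)
    return total
```

Both functions compute the same answer; only the number of simulated disk reads differs, which is exactly where iterative algorithms gain from Spark's in-memory model.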
Spark is
like a shopping mall that has every department under one roof: it is
designed to cover the range of workloads that previously required separate distributed
systems, including batch applications, iterative algorithms, interactive
queries, and streaming. Spark makes it
easy and inexpensive to combine these different processing types, which is often
necessary in production data-analysis pipelines.
Spark is
written in Scala, and it is designed to be highly accessible, offering APIs in Python,
Java, Scala, and SQL.
There are some applications that are not
suitable for Spark due to its distributed architecture. Spark's overhead is
negligible when handling large amounts of data, but if your data is small
enough to fit on a single machine, another framework may serve you
better. Spark was also not built for OLTP applications, which
involve many fast, small transactions. It is better suited to OLAP workloads, such as
batch jobs and data mining.