Sunday, 1 May 2016

Spark Architeture










Spark Driver : 

It is the application which trigger the main Method in which the Instance of sparkContext is created. In simple , its a process that creates and  own an instance of Sparkcontext.
It is the Engine of train which holds the responsibility of jobs and task execution. It splits a Spark application into tasks and schedules them to run on executors. A driver is where the task scheduler lives and spawns tasks across workers. A driver coordinates workers and overall execution of tasks.

Master : 

A master is a running Spark instance that connects to a cluster manager for resources. The master  runs executors on cluster nodes. It’s a mediator between driver and cluster but not SC.

Spark Worker : 

Spark Workers or slaves are running Spark instances where executors live to execute tasks. They are the compute nodes in Spark. A worker receives serialized tasks that it runs in a thread pool.

Executors

Executors are distributed agents that execute tasks. They typically run for the entire lifetime of a Spark application. Executors send active task metrics to a driver and inform executor backends about task status updates (task results including).

Executors provide in-memory storage for RDDs that are cached in Spark applications. When executors are started they register themselves with the driver and communicate directly to execute tasks.

Executor offers are described by executor id and the host on which an executor runs.

Executors use a thread pool for sending metrics and launching tasks (using Task Runner) .Each executor can run multiple tasks over its lifetime, both in parallel and sequentially.It is recommended to have as many executors as data nodes and as many cores as you can get from the cluster.

Task Runner :

TaskRunner is a thread of execution that manages a single individual task.
It can be run or killed that boils down to running or killing the task the TaskRunner object manages.
A TaskRunner object is created when an executor is requested to launch a task. It is created with an ExecutorBackend (to send the task’s status updates to), task and attempt ids, task name, and serialized version of the task (as ByteBuffer).

No comments:

Post a Comment