Spark Driver:
The driver is the process that runs the application's main method, in which the SparkContext instance is created. Put simply, it is the process that creates and owns the SparkContext instance.
It is the engine of the train: it is responsible for executing jobs and tasks. It splits a Spark application into tasks and schedules them to run on executors. The driver is where the task scheduler lives, and it spawns tasks across the workers. The driver coordinates the workers and the overall execution of tasks.
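To make the driver's two jobs (splitting work into tasks and scheduling them onto executors) concrete, here is a deliberately simplified Python model. It assumes one task per partition and plain round-robin placement; `Task`, `split_into_tasks` and `schedule` are hypothetical names, and Spark's real schedulers (the DAGScheduler and TaskScheduler) also deal with stages, data locality and retries.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Task:
    """Hypothetical toy task: one unit of work for one partition of a job."""
    job_id: int
    partition: int


def split_into_tasks(job_id: int, num_partitions: int) -> list[Task]:
    # The toy driver turns a job into one task per partition.
    return [Task(job_id, p) for p in range(num_partitions)]


def schedule(tasks: list[Task], executor_ids: list[str]) -> dict[str, list[Task]]:
    # Round-robin assignment: a stand-in for Spark's real, locality-aware scheduling.
    assignment: dict[str, list[Task]] = {e: [] for e in executor_ids}
    for task, executor in zip(tasks, cycle(executor_ids)):
        assignment[executor].append(task)
    return assignment


if __name__ == "__main__":
    plan = schedule(split_into_tasks(job_id=0, num_partitions=5), ["exec-1", "exec-2"])
    for executor, assigned in plan.items():
        print(executor, [t.partition for t in assigned])
```

With five partitions and two executors, `exec-1` receives partitions 0, 2 and 4 and `exec-2` receives partitions 1 and 3.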
Master:
A master is a running Spark instance that connects to a cluster manager for resources. The master runs executors on cluster nodes. It mediates between the driver and the cluster, but it is not the SparkContext (SC).
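A rough sketch of the master's resource-granting role: given the free cores on each worker node, decide where the requested executors should run. The function `place_executors`, the rotating placement and the cores-only accounting are all assumptions made for illustration; a real cluster manager also tracks memory and other constraints.

```python
from __future__ import annotations


def place_executors(free_cores: dict[str, int], num_executors: int,
                    cores_per_executor: int) -> list[tuple[str, str]]:
    """Hypothetical toy model of a master granting executor slots on workers.

    Returns (executor_id, worker) pairs, rotating over the workers that still
    have enough free cores. Raises RuntimeError if the request cannot be met.
    """
    free = dict(free_cores)  # copy so the caller's view is not mutated
    placements: list[tuple[str, str]] = []
    turn = 0
    while len(placements) < num_executors:
        candidates = [w for w, cores in free.items() if cores >= cores_per_executor]
        if not candidates:
            raise RuntimeError("not enough free cores for the requested executors")
        worker = candidates[turn % len(candidates)]
        free[worker] -= cores_per_executor
        placements.append((f"exec-{len(placements)}", worker))
        turn += 1
    return placements


if __name__ == "__main__":
    print(place_executors({"node-a": 4, "node-b": 2}, num_executors=3, cores_per_executor=2))
```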
Spark Worker:
Spark workers (or slaves) are running Spark instances where executors live to execute tasks. They are the compute nodes in Spark. A worker receives serialized tasks, which it runs in a thread pool.
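The worker behaviour described above (receive serialized tasks, run them in a thread pool) can be approximated with the Python standard library. This is a hypothetical sketch: `ToyWorker` and `serialize_task` are illustrative names, and `pickle` stands in for Spark's own task serialization.

```python
from __future__ import annotations

import pickle
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def serialize_task(func: Callable[[list], Any], partition: list) -> bytes:
    # Hypothetical "driver side": ship the function and its partition as bytes.
    # pickle stores functions by reference, so func must be importable
    # (a builtin or a module-level function).
    return pickle.dumps((func, partition))


class ToyWorker:
    """Hypothetical sketch: deserializes tasks and runs them on a thread pool."""

    def __init__(self, threads: int = 2):
        self._pool = ThreadPoolExecutor(max_workers=threads)

    def run(self, serialized_task: bytes) -> Future:
        func, partition = pickle.loads(serialized_task)  # only unpickle trusted bytes
        return self._pool.submit(func, partition)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    worker = ToyWorker(threads=2)
    jobs = [(sum, [1, 2, 3]), (max, [4, 9, 2]), (sorted, [3, 1, 2])]
    futures = [worker.run(serialize_task(f, p)) for f, p in jobs]
    print([f.result() for f in futures])
    worker.shutdown()
```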
Executors:
Executors are distributed agents that execute tasks. They typically run for the entire lifetime of a Spark application. Executors send metrics for active tasks to the driver and inform executor backends about task status updates (including task results).
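A minimal sketch of the status-update flow, assuming an in-process queue plays the role of the channel back to the driver. `ToyExecutorBackend`, `StatusUpdate` and `run_task` are hypothetical names; the state names echo Spark's task states, and, as in the text, the task result travels with the final update.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StatusUpdate:
    task_id: int
    state: str          # "RUNNING", "FINISHED" or "FAILED"
    result: Any = None  # task result (or error text) sent with the final update


class ToyExecutorBackend:
    """Hypothetical stand-in for an executor backend that forwards updates to the driver."""

    def __init__(self, driver_inbox: queue.Queue):
        self.driver_inbox = driver_inbox

    def status_update(self, update: StatusUpdate) -> None:
        self.driver_inbox.put(update)


def run_task(task_id: int, body: Callable[[], Any], backend: ToyExecutorBackend) -> None:
    backend.status_update(StatusUpdate(task_id, "RUNNING"))
    try:
        result = body()
    except Exception as exc:
        backend.status_update(StatusUpdate(task_id, "FAILED", repr(exc)))
    else:
        backend.status_update(StatusUpdate(task_id, "FINISHED", result))
```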
Executors provide in-memory storage for RDDs that are cached in Spark applications. When executors start, they register themselves with the driver and communicate with it directly to execute tasks.
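The two ideas in this paragraph, registering with the driver on start-up and keeping cached partitions in memory, can be sketched as follows. `ToyDriver`, `ToyExecutor` and `get_or_compute` are hypothetical names; in Spark the in-memory storage is handled by the BlockManager, which this sketch does not attempt to reproduce.

```python
from __future__ import annotations

from typing import Callable


class ToyDriver:
    """Hypothetical driver-side registry of executors that have registered."""

    def __init__(self):
        self.executors: dict[str, str] = {}  # executor id -> host

    def register_executor(self, executor_id: str, host: str) -> None:
        self.executors[executor_id] = host


class ToyExecutor:
    def __init__(self, executor_id: str, host: str, driver: ToyDriver):
        self.executor_id = executor_id
        self.host = host
        self._cache: dict[tuple[int, int], list] = {}  # (rdd id, partition) -> rows
        driver.register_executor(executor_id, host)    # register on start-up

    def get_or_compute(self, rdd_id: int, partition: int,
                       compute: Callable[[], list]) -> list:
        key = (rdd_id, partition)
        if key not in self._cache:   # first access: compute and keep in memory
            self._cache[key] = compute()
        return self._cache[key]      # later accesses are served from memory
```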
Executor offers are described by the executor ID and the host on which the executor runs.
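One reason an offer carries the host is that a scheduler can prefer an executor on the node that already holds a task's data. The sketch below assumes a simple preference rule is enough; `ExecutorOffer` and `pick_offer` are hypothetical names, not Spark's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorOffer:
    executor_id: str
    host: str


def pick_offer(offers: list[ExecutorOffer],
               preferred_hosts: set[str]) -> ExecutorOffer | None:
    # Prefer an offer on a host that holds the task's data; else take any offer.
    for offer in offers:
        if offer.host in preferred_hosts:
            return offer
    return offers[0] if offers else None
```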
Executors use a thread pool for sending metrics and launching tasks (using TaskRunner). Each executor can run multiple tasks over its lifetime, both in parallel and sequentially. It is recommended to have as many executors as data nodes and as many cores as you can get from the cluster.
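The "parallel and sequential" behaviour follows from simple arithmetic: the cluster offers executors × cores task slots, and when a stage has more tasks than slots, the tasks run in successive waves. This sketch assumes each task uses one core (Spark's default `spark.task.cpus = 1`) and ignores skew and scheduling delays; the function names are hypothetical.

```python
import math


def task_slots(num_executors: int, cores_per_executor: int) -> int:
    # One running task per core, assuming spark.task.cpus = 1 (the default).
    return num_executors * cores_per_executor


def waves(num_tasks: int, num_executors: int, cores_per_executor: int) -> int:
    """Rounds of tasks that run back to back when tasks outnumber slots (idealized)."""
    return math.ceil(num_tasks / task_slots(num_executors, cores_per_executor))


if __name__ == "__main__":
    print(task_slots(4, 4), waves(100, 4, 4))
```

For example, 4 executors with 4 cores each give 16 slots, so 100 tasks need 7 waves, the last of which is only partly full. This is also why more cores per executor shortens a stage: more tasks run in each wave.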
Task Runner:
TaskRunner is a thread of execution that manages a single task.
It can be run or killed, which boils down to running or killing the task that the TaskRunner object manages.
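A hypothetical sketch of a single-task runner that can be run or killed. It is modelled as a Python thread, and killing is cooperative: `kill()` sets a flag that the task checks between units of work. `ToyTaskRunner` and this kill mechanism are assumptions for illustration, not a description of how Spark interrupts tasks.

```python
from __future__ import annotations

import threading
import time


class ToyTaskRunner(threading.Thread):
    """Hypothetical sketch: one thread managing one task, killable cooperatively."""

    def __init__(self, task_id: int, steps: int):
        super().__init__(name=f"task-{task_id}", daemon=True)
        self.task_id = task_id
        self.steps = steps
        self.completed_steps = 0
        self.state = "LAUNCHING"
        self._killed = threading.Event()

    def kill(self) -> None:
        # Killing the runner means killing the task it manages.
        self._killed.set()

    def run(self) -> None:
        self.state = "RUNNING"
        for _ in range(self.steps):
            if self._killed.is_set():  # checked between units of work
                self.state = "KILLED"
                return
            time.sleep(0.01)           # stand-in for real work on a partition
            self.completed_steps += 1
        self.state = "FINISHED"
```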
A TaskRunner object is created when an executor is requested to launch a task. It is created with an ExecutorBackend (to send the task’s status updates to), the task and attempt IDs, the task name, and the serialized version of the task (as a ByteBuffer).
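A sketch of what an executor does when asked to launch a task, assuming a plain callback stands in for the ExecutorBackend and Python `bytes` stand in for the ByteBuffer. `ToyExecutor`, `launch_task` and `ToyTaskRunner` are hypothetical names; the fields mirror the list in the text.

```python
from __future__ import annotations

import pickle
from dataclasses import dataclass
from typing import Any, Callable

StatusCallback = Callable[[int, str, Any], None]  # (task id, state, result)


@dataclass
class ToyTaskRunner:
    backend: StatusCallback  # where the task's status updates are sent
    task_id: int
    attempt_number: int
    task_name: str
    serialized_task: bytes   # stands in for Spark's ByteBuffer

    def run(self) -> None:
        self.backend(self.task_id, "RUNNING", None)
        func, arg = pickle.loads(self.serialized_task)  # decoded only when run
        self.backend(self.task_id, "FINISHED", func(arg))


class ToyExecutor:
    def __init__(self, backend: StatusCallback):
        self.backend = backend
        self.running: dict[int, ToyTaskRunner] = {}

    def launch_task(self, task_id: int, attempt_number: int, task_name: str,
                    serialized_task: bytes) -> ToyTaskRunner:
        # A new runner per launch request, tracked by task id.
        runner = ToyTaskRunner(self.backend, task_id, attempt_number,
                               task_name, serialized_task)
        self.running[task_id] = runner
        return runner
```

Deserialization is deferred until the runner actually runs, so creating a runner stays cheap.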