Architecture of YARN in Hadoop
YARN ARCHITECTURE
=> YARN (Yet Another Resource Negotiator) is the resource management layer of Hadoop.
Apache Hadoop YARN Architecture consists of the following main components :
Resource Manager
Node Manager
Application Master
Container
✍️ Resource Manager :
1. It is the ultimate authority on resource allocation in the cluster.
2. On receiving a processing request, it passes parts of the request to the corresponding Node Managers, where the actual processing takes place.
3. It is the arbitrator of the cluster resources and decides the allocation of the available resources for competing applications.
4. It optimizes cluster utilization, keeping resources in use as much as possible, subject to constraints such as capacity guarantees, fairness, and SLAs.
5. It has two major components:
👋 a) Scheduler: The Scheduler component is responsible for allocating resources to applications based on their requirements. It manages the resource requests from various applications and decides how to allocate resources among them.
The scheduler uses policies and algorithms to determine the best allocation strategy, taking into consideration factors like resource availability, application priority, fairness, and user-defined constraints.
👋 b) Application Manager :
* The Application Manager is responsible for accepting job submissions and negotiating the first container, in which the application's Application Master runs.
* It works closely with the Scheduler to ensure that the required resources are allocated for the application's execution.
* The Application Manager also restarts the Application Master container on failure. Tracking the progress of the application's individual tasks is left to the per-application Application Master.
👊 Together, the Scheduler and Application Manager form the Resource Manager, which provides a unified view and control of the cluster resources.
👊 The Resource Manager communicates with the Node Managers (running on each node of the cluster) to manage and monitor the available resources and the execution of containers.
👊👊 In summary, the Resource Manager in YARN acts as a centralized resource manager, coordinating the allocation and scheduling of resources for applications running on a Hadoop cluster.
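To make the Scheduler's job more concrete, here is a minimal sketch of one possible policy: a max-min fair split of cluster memory among competing applications. This is a toy model and not YARN's actual scheduler code; the names `AppRequest` and `fair_allocate` are made up for illustration, and real schedulers also consider queues, priorities, CPU cores, and data locality.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AppRequest:
    """Hypothetical resource request from one application (memory in MB)."""
    app_id: str
    requested_mb: int


def fair_allocate(cluster_mb: int, requests: List[AppRequest]) -> Dict[str, int]:
    """Split cluster memory among applications using max-min fairness.

    Applications are served from the smallest demand to the largest. Each one
    gets either its full demand or an equal share of what is still free,
    whichever is smaller, so small jobs are satisfied fully and large jobs
    share the remainder evenly.
    """
    allocation: Dict[str, int] = {}
    remaining = cluster_mb
    ordered = sorted(requests, key=lambda r: r.requested_mb)
    for i, req in enumerate(ordered):
        # Equal share of the free memory among applications not yet served.
        share = remaining // (len(ordered) - i)
        grant = min(req.requested_mb, share)
        allocation[req.app_id] = grant
        remaining -= grant
    return allocation


if __name__ == "__main__":
    demo = [AppRequest("A", 2048), AppRequest("B", 4096), AppRequest("C", 6144)]
    print(fair_allocate(8192, demo))
```

With 8192 MB and demands of 2048, 4096, and 6144 MB, application A receives its full request, while B and C split the remaining 6144 MB equally.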
✍️ Node Manager :
1. It takes care of individual nodes in a Hadoop cluster and manages user jobs and workflow on the given node.
2. The NodeManager is the per-machine framework agent that is responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.
3. It registers with the Resource Manager and sends heartbeats with the health status of the node.
4. Its primary goal is to manage application containers assigned to it by the resource manager.
5. It keeps the Resource Manager up to date on the state of its node and containers.
6. It also kills the container as directed by the Resource Manager.
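The register, heartbeat, and kill steps above can be sketched as a small simulation. This is only an illustrative model of the interaction, not the real YARN protocol; the classes `ToyResourceManager` and `ToyNodeManager` and all of their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyResourceManager:
    """Tracks node health and the containers it wants killed (toy model)."""
    nodes: Dict[str, bool] = field(default_factory=dict)          # node id -> healthy?
    kill_list: Dict[str, Set[str]] = field(default_factory=dict)  # node id -> container ids

    def register(self, node_id: str) -> None:
        """A node announces itself before sending heartbeats."""
        self.nodes[node_id] = True
        self.kill_list.setdefault(node_id, set())

    def heartbeat(self, node_id: str, healthy: bool, running: List[str]) -> List[str]:
        """Record the node's health and reply with the containers to kill."""
        self.nodes[node_id] = healthy
        pending = self.kill_list.setdefault(node_id, set())
        to_kill = [cid for cid in running if cid in pending]
        pending.difference_update(to_kill)
        return to_kill


@dataclass
class ToyNodeManager:
    """Manages the containers on one node (toy model)."""
    node_id: str
    containers: Dict[str, int] = field(default_factory=dict)  # container id -> memory MB

    def send_heartbeat(self, rm: ToyResourceManager, healthy: bool = True) -> None:
        """Report status, then kill whatever the Resource Manager asks for."""
        for cid in rm.heartbeat(self.node_id, healthy, list(self.containers)):
            del self.containers[cid]


if __name__ == "__main__":
    rm = ToyResourceManager()
    nm = ToyNodeManager("node-1", {"c1": 1024, "c2": 2048})
    rm.register(nm.node_id)
    rm.kill_list["node-1"].add("c2")  # the RM decides c2 must go
    nm.send_heartbeat(rm)
    print(nm.containers)
```

In this sketch the kill instruction travels back in the heartbeat reply, so the Node Manager never needs a separate channel to learn what the Resource Manager wants.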
✍️ Application Master :
1. An application is a single job submitted to the framework. Each application has its own Application Master, which is a framework-specific entity.
2. The ApplicationsManager (a component of the Resource Manager, not to be confused with the Application Master) is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
3. The Application Master is the process that coordinates an application's execution in the cluster and also manages faults.
4. Its task is to negotiate resources from the Resource Manager and work with the Node Manager to execute and monitor the component tasks.
5. It is responsible for negotiating appropriate resource containers from the Resource Manager, tracking their status and monitoring progress.
6. Once started, it periodically sends heartbeats to the Resource Manager to affirm its health and to update the record of its resource demands.
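The negotiate, track, and heartbeat cycle described above can be sketched as follows. This is a simplified model under stated assumptions, not the real YARN client API: `ToyResourceManager`, `ToyApplicationMaster`, and the `container_N` id format are all invented for the example, and each task is assumed to need exactly one container.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class ToyResourceManager:
    """Hands out container ids while free capacity remains (toy model)."""

    def __init__(self, free_containers: int) -> None:
        self.free = free_containers
        self.next_id = 0
        self.demands: Dict[str, int] = {}  # app id -> demand still unmet

    def allocate(self, app_id: str, wanted: int) -> List[str]:
        """Heartbeat-style call: record the demand and grant what is free."""
        granted = min(wanted, self.free)
        self.free -= granted
        self.demands[app_id] = wanted - granted
        ids = [f"container_{self.next_id + i}" for i in range(granted)]
        self.next_id += granted
        return ids


@dataclass
class ToyApplicationMaster:
    """Coordinates one application's tasks (toy model)."""
    app_id: str
    tasks_pending: int
    running: Dict[str, str] = field(default_factory=dict)  # container id -> status

    def heartbeat(self, rm: ToyResourceManager) -> None:
        """Report current demand and start a task in each granted container."""
        for cid in rm.allocate(self.app_id, self.tasks_pending):
            self.running[cid] = "RUNNING"
            self.tasks_pending -= 1


if __name__ == "__main__":
    rm = ToyResourceManager(free_containers=2)
    am = ToyApplicationMaster("app_1", tasks_pending=3)
    am.heartbeat(rm)  # only 2 of the 3 wanted containers are free
    print(am.running, am.tasks_pending, rm.demands)
```

Each heartbeat both confirms that the Application Master is alive and refreshes the Resource Manager's record of how many containers the application still needs.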
✍️ Container :
1. It is a collection of physical resources such as RAM, CPU cores, and disks on a single node.
2. YARN containers are managed through a Container Launch Context (CLC), a record that describes how the container is launched and so governs its life cycle.
3. It grants rights to an application to use a specific amount of resources (memory, CPU etc.) on a specific host.
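One way to picture a container is as an immutable grant of resources on one host, plus a separate record saying what to run in it. The sketch below is an illustrative model only; `ToyContainer`, `ToyLaunchContext`, `within_grant`, and their fields are hypothetical and far simpler than the real YARN records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ToyContainer:
    """A grant to use a fixed amount of resources on a specific host."""
    container_id: str
    host: str
    memory_mb: int
    vcores: int


@dataclass
class ToyLaunchContext:
    """Illustrative stand-in for a launch context: what to run and how."""
    commands: List[str]
    environment: Dict[str, str] = field(default_factory=dict)


def within_grant(container: ToyContainer, used_mb: int, used_vcores: int) -> bool:
    """Return True if the reported usage stays inside the granted resources."""
    return used_mb <= container.memory_mb and used_vcores <= container.vcores


if __name__ == "__main__":
    c = ToyContainer("container_1", "node-1", memory_mb=2048, vcores=2)
    ctx = ToyLaunchContext(commands=["python task.py"], environment={"MODE": "demo"})
    print(c, ctx, within_grant(c, used_mb=1500, used_vcores=1))
```

Making the grant frozen reflects the idea that an application cannot enlarge its own container; it has to ask the Resource Manager for more.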