Tuning and setting memory and CPU on a YARN cluster

Memory and CPU configuration of YARN

According to the calculation method in this article, set the values of the following properties and tune them to the best state. Note that in many cases a single task corresponds to a single container, so the parallelism of tasks is also the parallelism of containers; a task can be understood as the work started inside a container.

Note the following properties.

Configuration properties and their meanings:

yarn.nodemanager.resource.memory-mb: the total amount of physical memory that YARN can use on the NodeManager node. The default is 8192 (MB). Note that if your node has less than 8 GB of memory, you need to reduce this value; YARN does not automatically detect the node's total physical memory.

yarn.scheduler.minimum-allocation-mb: the lower bound of container memory, i.e. the minimum amount of physical memory a single task can request. The default is 1024 (MB). If a task requests less physical memory than this value, the request is raised to this value.

yarn.scheduler.maximum-allocation-mb: the upper bound of container memory, i.e. the maximum amount of physical memory a single task can request. The default is 8192 (MB).

yarn.app.mapreduce.am.resource.mb: the amount of memory occupied by the MR ApplicationMaster.

yarn.app.mapreduce.am.command-opts: the JVM options (e.g. -Xmx heap size) passed to the MR ApplicationMaster.

mapreduce.map.memory.mb: the memory requested for a map task when its container is started.

mapreduce.reduce.memory.mb: the memory requested for a reduce task when its container is started.

mapreduce.map.java.opts: the JVM options (e.g. -Xmx heap size) for map task containers.

mapreduce.reduce.java.opts: the JVM options (e.g. -Xmx heap size) for reduce task containers.

1. yarn.scheduler.minimum-allocation-mb / yarn.scheduler.maximum-allocation-mb should be the minimum and maximum memory a container can request from the ResourceManager; the -mb suffix means the unit is MB. Refer to the explanation of these configuration items on the official website.

2. mapreduce.map.memory.mb / mapreduce.reduce.memory.mb should be the memory requested from the resource scheduler when a task starts. These two parameters exist in release 2.7.2 but are not found in release 2.5.2. Refer to the explanation of these configuration items on the official website.

I don't quite understand the difference between 1 and 2 here. 1 is the memory the container requests from the ResourceManager, and 2 is the memory requested from the scheduler. Is there a difference? My rough understanding is this: a container is essentially a resource box. When the container is created, it requests memory from the RM within a flexible range bounded by the minimum and maximum; that is what 1 describes. When there is not that much memory available, the container can be somewhat smaller. 2 is the memory that the task wants the container to have when the task is started; of course, the memory the task asks of the container cannot be larger than the container itself.

According to the reply below, it should mean what I explained above. Link: scheduling and isolation of memory and CPU resources in Hadoop YARN

Hello, why is it that after I set the yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb parameters, the system only allocates container memory according to the former? When the memory used exceeds the minimum value but is not greater than the maximum value, will it report an OOM error?

[reply]

Dong re:
November 15th, 2013 at 1:11 am

These two parameters do not mean what you think. They are used by the administrator to set the minimum and maximum memory that each user task is allowed to request. The amount actually requested by each task is set by each application: for a MapReduce program, the resources a map task may request are specified by mapreduce.map.memory.mb, and the resources of a reduce task by mapreduce.reduce.memory.mb; neither of these two parameters may exceed yarn.scheduler.maximum-allocation-mb.
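To make the relationship concrete, here is a minimal sketch in Python of how a per-task memory request is checked against the scheduler's range, as described in the reply above. The function name and the exact rejection behaviour are mine for illustration; real YARN schedulers also round requests up in increments, which is not shown here.

def normalize_request(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    # Requests below yarn.scheduler.minimum-allocation-mb are raised to it.
    if requested_mb < min_alloc_mb:
        return min_alloc_mb
    # Requests above yarn.scheduler.maximum-allocation-mb are not allowed.
    if requested_mb > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    return requested_mb

# A map task asking for 512 MB (mapreduce.map.memory.mb=512) is raised to 1024 MB.
print(normalize_request(512))   # 1024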

Excerpt:

Hadoop YARN supports scheduling of both memory and CPU resources. This article introduces how to configure the use of memory and CPU.

As a resource scheduler, YARN should take into account the computing resources of each machine in the cluster and then allocate containers according to the resources requested by applications. The container is the basic unit of resource allocation in YARN; it holds a certain amount of memory and CPU resources.

In a YARN cluster, it is very important to balance memory, CPU, and disk resources. Experience shows that cluster resources are best utilized when every two containers share one disk and one CPU core.

Memory configuration

For memory-related configuration, please refer to Hortonworks' document "Determine HDP Memory Configuration Settings" to configure your cluster.

The memory available to YARN and MapReduce should exclude the memory reserved for the operating system and other Hadoop programs such as HBase. Total reserved memory = reserved system memory + reserved HBase memory.

You can refer to the reserved-memory table in that document to determine how much memory should be reserved.

The following formula can be used to calculate the maximum number of containers each machine can have:

containers = min(2 * CORES, 1.8 * DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

Note:

CORES is the number of CPU cores on the machine

DISKS is the number of disks mounted on the machine

Total available RAM is the memory on the machine available to YARN (total memory minus the reserved memory)

MIN_CONTAINER_SIZE is the minimum container size, which needs to be set according to your situation; refer to the following table:

(Table: total available RAM per machine → recommended minimum container size)

The average memory size of each container is calculated as follows:

RAM-per-container = max(MIN_CONTAINER_SIZE, (Total available RAM) / containers)
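As an illustration, the two formulas above can be put into a small script; this is a sketch in Python, and the function and variable names are mine rather than part of any Hadoop tool. It is pre-filled with the numbers from the example further below.

def yarn_container_plan(cores, disks, total_ram_gb, reserved_gb, min_container_gb):
    # Memory left for YARN after the system (and HBase) reservation.
    available_ram = total_ram_gb - reserved_gb
    # containers = min(2*CORES, 1.8*DISKS, (Total available RAM)/MIN_CONTAINER_SIZE)
    containers = int(round(min(2 * cores, 1.8 * disks, available_ram / min_container_gb)))
    # RAM-per-container = max(MIN_CONTAINER_SIZE, (Total available RAM)/containers)
    ram_per_container = max(min_container_gb, available_ram / containers)
    return containers, ram_per_container

# Example below: 128 GB RAM, 32 cores, 7 disks, 24 GB reserved, 2 GB minimum container.
print(yarn_container_plan(cores=32, disks=7, total_ram_gb=128,
                          reserved_gb=24, min_container_gb=2))   # (13, 8.0)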

Using the results of the above calculation, YARN and MapReduce can be configured as in the following example.

For example: for a machine with 128 GB of memory, a 32-core CPU, and 7 mounted disks, the system reserves 24 GB of memory according to the description above. If HBase is not installed, the remaining memory available is 104 GB. The number of containers is calculated as follows:

containers = min(2 * 32, 1.8 * 7, (128 - 24) / 2) = min(64, 12.6, 52) = 13

The RAM-per-container value is calculated as follows:

RAM-per-container = max(2, (128 - 24) / 13) = max(2, 8) = 8

In this case each container gets 8 GB of memory, which seems a little too much, so based on the tasks the cluster runs I prefer to adjust it down to 2 GB. The cluster's parameters are then configured with the following XML:

  <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>106496</value>
  </property>
  <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>2048</value>
  </property>
  <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>106496</value>
  </property>
  <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>4096</value>
  </property>
  <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx3276m</value>
  </property>
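For reference, these numbers follow from the calculation above: 13 containers of 8 GB give the NodeManager total, while the adjusted 2 GB container size drives the minimum allocation and the ApplicationMaster settings. The quick check below assumes the common Hortonworks rules of thumb (ApplicationMaster memory = 2 x a container, JVM heap = 0.8 x the container memory); those factors are consistent with the values above but are an assumption on my part.

containers = 13
ram_per_container_mb = 8192   # the 8 GB computed above
min_container_mb = 2048       # the adjustment down to 2 GB

print(containers * ram_per_container_mb)  # 106496 -> yarn.nodemanager.resource.memory-mb
                                          #           and yarn.scheduler.maximum-allocation-mb
print(min_container_mb)                   # 2048   -> yarn.scheduler.minimum-allocation-mb
print(2 * min_container_mb)               # 4096   -> yarn.app.mapreduce.am.resource.mb
print(int(0.8 * 2 * min_container_mb))    # 3276   -> -Xmx3276m in am.command-opts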

In addition, there are the following parameters:

yarn.nodemanager.vmem-pmem-ratio: for each 1 MB of physical memory used by a task, the maximum amount of virtual memory it may use. The default is 2.1.

yarn.nodemanager.pmem-check-enabled: whether to start a thread to check the amount of physical memory used by each task; if a task exceeds its allocation, it is killed directly. The default is true.

yarn.nodemanager.vmem-check-enabled: whether to start a thread to check the amount of virtual memory used by each task; if a task exceeds its allocation, it is killed directly. The default is true.

The first parameter means that when a map task is allocated 2 GB of physical memory in total, the heap memory allocated inside the task's container is at most 1.6 GB, and the upper limit of virtual memory it may use is 2 * 2.1 = 4.2 GB. In addition, by this calculation, the number of map tasks YARN can start on each node is 104 / 2 = 52.
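These three numbers can be reproduced with a few lines of Python (a sketch; the 0.8 heap ratio is the same rule of thumb used for the -Xmx settings above, not a YARN parameter):

container_mb = 2048                 # mapreduce.map.memory.mb
vmem_pmem_ratio = 2.1               # yarn.nodemanager.vmem-pmem-ratio
node_available_mb = 104 * 1024      # 104 GB usable per node

heap_mb = int(0.8 * container_mb)                  # 1638, i.e. roughly 1.6 GB of JVM heap
vmem_limit_mb = container_mb * vmem_pmem_ratio     # 2 GB * 2.1 = about 4.2 GB
maps_per_node = node_available_mb // container_mb  # 52 concurrent map containers

print(heap_mb, vmem_limit_mb, maps_per_node)       # 1638 4300.8 52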

CPU configuration

Currently, CPU in YARN is measured in virtual CPUs (vcores). The concept of the virtual CPU was introduced by YARN itself, originally to account for the fact that CPU performance may differ between nodes and that the computing power of each CPU differs. For example, one physical CPU may have twice the computing power of another; you can compensate for this difference by configuring more virtual CPUs for the first physical CPU. When submitting jobs, users can specify the number of virtual CPUs required by each task.

In YARN, the CPU-related configuration parameters are as follows:

yarn.nodemanager.resource.cpu-vcores: the number of virtual CPUs that YARN can use on the node. The default is 8. It is recommended to set this value to the number of physical CPU cores; if your node has fewer than 8 CPU cores, you need to reduce this value, because YARN does not automatically detect the node's number of physical CPUs.

yarn.scheduler.minimum-allocation-vcores: the minimum number of virtual CPUs a single task can request. The default is 1. If a task requests fewer CPUs than this, the request is raised to this value.

yarn.scheduler.maximum-allocation-vcores: the maximum number of virtual CPUs a single task can request. The default is 32.

For a cluster with a large number of CPU cores, the default configuration above is obviously inappropriate. In my test cluster, each of the four nodes has 32 CPU cores; leaving one core for the operating system, the cluster can be configured as follows:

  <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>31</value>
  </property>
  <property>
      <name>yarn.scheduler.maximum-allocation-vcores</name>
      <value>124</value>
  </property>
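The two values follow directly from the cluster's size, assuming 32 physical cores per node as described above; setting the per-request maximum to the cluster-wide total of 124 vcores is this cluster's choice rather than a general recommendation.

nodes = 4
cores_per_node = 32
reserved_for_os = 1

vcores_per_node = cores_per_node - reserved_for_os  # 31  -> yarn.nodemanager.resource.cpu-vcores
cluster_vcores = vcores_per_node * nodes            # 124 -> yarn.scheduler.maximum-allocation-vcores
print(vcores_per_node, cluster_vcores)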
