Tag Archives: hadoop

Hadoop Start Error: ssh: Could not resolve hostname xxx: Name or service not known

1. Problem

Hadoop fails to start with: ssh: Could not resolve hostname xxx: Name or service not known

2. Error message:

[root@master hadoop-2.2.0]# sbin/start-dfs.sh 
17/08/06 13:08:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/cloud/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
master]
sed: -e expression #1, char 6: unknown option to `s'
library: ssh: Could not resolve hostname library: Name or service not known
loaded: ssh: Could not resolve hostname loaded: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
Server: ssh: Could not resolve hostname Server: Name or service not known
64-Bit: ssh: Could not resolve hostname 64-Bit: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
disabled: ssh: Could not resolve hostname disabled: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
to: ssh: Could not resolve hostname to: No address associated with hostname
Java: ssh: Could not resolve hostname Java: No address associated with hostname
which: ssh: Could not resolve hostname which: Name or service not known
will: ssh: Could not resolve hostname will: Name or service not known
might: ssh: Could not resolve hostname might: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
The: ssh: Could not resolve hostname The: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
recommended: ssh: Could not resolve hostname recommended: Name or service not known
guard: ssh: Could not resolve hostname guard: Name or service not known
try: ssh: Could not resolve hostname try: Name or service not known
fix: ssh: Could not resolve hostname fix: Name or service not known
-c: Unknown cipher type 'cd'
fix: ssh: Could not resolve hostname fix: Name or service not known
the: ssh: Could not resolve hostname the: Name or service not known
the: ssh: Could not resolve hostname the: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
it: ssh: Could not resolve hostname it: No address associated with hostname
with: ssh: Could not resolve hostname with: Name or service not known
or: ssh: Could not resolve hostname or: Name or service not known
The authenticity of host 'master (192.168.1.110)' can't be established.
RSA key fingerprint is db:8e:ce:15:18:ad:9d:16:0e:88:98:0d:da:06:e4:18.
Are you sure you want to continue connecting (yes/no)?that: ssh: Could not resolve hostname that: Name or service not known
highly: ssh: Could not resolve hostname highly: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
now.: ssh: Could not resolve hostname now.: No address associated with hostname
You: ssh: Could not resolve hostname You: No address associated with hostname
you: ssh: Could not resolve hostname you: No address associated with hostname
link: ssh: Could not resolve hostname link: No address associated with hostname
HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Temporary failure in name resolution
warning:: ssh: Could not resolve hostname warning:: Temporary failure in name resolution
It's: ssh: Could not resolve hostname It's: Temporary failure in name resolution
<libfile>',: ssh: Could not resolve hostname <libfile>',: Temporary failure in name resolution
'execstack: ssh: Could not resolve hostname 'execstack: Temporary failure in name resolution
'-z: ssh: Could not resolve hostname '-z: Temporary failure in name resolution
noexecstack'.: ssh: Could not resolve hostname noexecstack'.: Temporary failure in name resolution

3. Solution:

Add the following to /etc/profile:
	export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
	export HADOOP_OPTS=-Djava.library.path=$HADOOP_HOME/lib
	
Note: it is export HADOOP_OPTS=-Djava.library.path=$HADOOP_HOME/lib, not export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib" (no quotes around the value).
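To apply the change, reload the profile and restart the daemons; a minimal sketch, assuming the hadoop-2.2.0 layout from the session above:

source /etc/profile
sbin/stop-dfs.sh
sbin/start-dfs.sh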

Configuring a multi-queue capacity scheduler in Yarn

First, configure the etc/hadoop/capacity-scheduler.xml file:

<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

   <!-- The maximum number of jobs the capacity scheduler can hold-->
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <!-- How much of the queue's total resources can be occupied by the MRAppMaster processes started in the current queue.
        This parameter allows you to limit the number of submitted jobs in the queue.
  -->
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run 
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <!-- The strategy used to calculate resource allocation for a job
  -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare 
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare 
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <!-- Sub-queues under the root queue; new queues a and b are added -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,a,b</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <!-- Percentage of capacity occupied by the default queue in the root queue.
        The capacities of all sub-queues must add up to 100.
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
    <description>Default queue target capacity.</description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.a.capacity</name>
    <value>30</value>
    <description>Default queue target capacity.</description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.b.capacity</name>
    <value>30</value>
    <description>Default queue target capacity.</description>
  </property>

  <!-- Limit on the percentage of this queue's resources a single user can use
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.b.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <!-- The maximum percentage of the root queue's capacity that the default queue can occupy
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.b.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>

  <!-- The state of the default queue in the root queue
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.a.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.b.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <!-- Restrict which users may submit to the default queue, i.e. access rights -->
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.b.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <!-- Set queue administrators -->
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.b.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers. 
      Typically this should be set to number of nodes in the cluster, By default is setting 
      approximately number of nodes in one rack which is 40.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

</configuration>

After the configuration is in place, run the refresh command:

yarn rmadmin -refreshQueues

Then open the cluster's Yarn web UI, where you can see that there are now three queues.
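The same information is available from the command line; as a quick check (mapred queue -list ships with Hadoop 2.x):

mapred queue -list    # should now show default, a and b with their capacities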

The next step is to make a job run in one of the other queues.

Which queue a job runs in is decided by configuration (the default queue comes from mapred-default.xml).

So you need to change this configuration:

1. If you submit from IDEA (or any Java driver), you can use

conf.set("mapred.job.queue.name", "a");

This specifies that the job runs in queue a (a fuller driver sketch follows after these two cases).

2. If you run the jar on Linux, you can use

hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount -D mapreduce.job.queuename=a /mapjoin/output3

As shown in the figure, the job switches to the a queue
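For case 1 above, here is a minimal driver sketch showing where the conf.set call goes (the QueueDemo class name and argument handling are illustrative, not from the original post; mapreduce.job.queuename is the Hadoop 2.x property name, and the older alias mapred.job.queue.name still works):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class QueueDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route this job to queue "a" instead of "default"; must be set
        // before the Job instance is created from this Configuration.
        conf.set("mapreduce.job.queuename", "a");

        Job job = Job.getInstance(conf, "queue-demo");
        job.setJarByClass(QueueDemo.class);
        // Mapper/Reducer setup omitted; the defaults give an identity pass-through.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}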

Cause analysis of a Hadoop Yarn ResourceManager crash caused by the ZooKeeper znode data limit (2)

Five months after the previous article (click to read it), the problem in the title occurred again. With the improvements to our big data monitoring system, I was able to investigate it further. The whole investigation process and the solution follow:

1、 Problem description

The first ResourceManager service exception alarm arrived at 8:12 a.m. on August 8. Up to 8:00 a.m. on August 11, the exception recurred frequently between 8:00 and 8:12 a.m. every day, and occasionally appeared around 8:00 p.m. and between 1 and 3 p.m. The following are SpaceX's statistics on the number of abnormal ResourceManager states:

2、 Abnormal causes

1. Abnormal information

The following excerpt is the log from 20:00 to 20:12 on August 8; the exception information in other periods is identical:

2019-08-08 20:12:18,681 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no. 544
2019-08-08 20:12:18,886 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.204.245.44/10.204.245.44:5181. Will not attempt to authenticate using SASL (unknown error)
2019-08-08 20:12:18,887 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.204.245.44/10.204.245.44:5181, initiating session
2019-08-08 20:12:18,887 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.204.245.44/10.204.245.44:5181, sessionid = 0x26c00dfd48e9068, negotiated timeout = 60000
2019-08-08 20:12:20,850 WARN org.apache.zookeeper.ClientCnxn: Session 0x26c00dfd48e9068 for server 10.204.245.44/10.204.245.44:5181, unexpected error, closing socket connection and attempting reconnect
java.lang.OutOfMemoryError: Java heap space
2019-08-08 20:12:20,951 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:989)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:986)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1128)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1161)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:986)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:1000)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:1017)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:713)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:243)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:226)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:812)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:872)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:867)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
	at java.lang.Thread.run(Thread.java:745)

2. Cause of the exception

The root cause is that the ZK server limits the data of a single znode to at most 1 MB. Once the data submitted by the client exceeds 1 MB, the ZK server throws the following exception:

Exception causing close of session 0x2690d678e98ae8b due to java.io.IOException: Len error 1788046

After this exception is thrown, Yarn keeps retrying the ZK operation at a short interval and for a large number of attempts, which eventually exhausts Yarn's memory and leaves it unable to provide normal service.

3. Yarn exception code

The following is the method in org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore where the exception is thrown:

   /**
     * Update Information
     *
     * @param appAttemptId
     * @param attemptStateDataPB
     * @throws Exception
     */
    @Override
    public synchronized void updateApplicationAttemptStateInternal(
            ApplicationAttemptId appAttemptId,
            ApplicationAttemptStateData attemptStateDataPB)
            throws Exception {
        String appIdStr = appAttemptId.getApplicationId().toString();
        String appAttemptIdStr = appAttemptId.toString();
        String appDirPath = getNodePath(rmAppRoot, appIdStr);
        String nodeUpdatePath = getNodePath(appDirPath, appAttemptIdStr);
        if (LOG.isDebugEnabled()) {
            LOG.debug("Storing final state info for attempt: " + appAttemptIdStr
                    + " at: " + nodeUpdatePath);
        }
        byte[] attemptStateData = attemptStateDataPB.getProto().toByteArray();

        if (existsWithRetries(nodeUpdatePath, true) != null) {
            setDataWithRetries(nodeUpdatePath, attemptStateData, -1);
        } else {
            createWithRetries(nodeUpdatePath, attemptStateData, zkAcl,
                    CreateMode.PERSISTENT);
            LOG.debug(appAttemptId + " znode didn't exist. Created a new znode to"
                    + " update the application attempt state.");
        }
    }

This code updates or creates the task-attempt state information in ZK. During scheduling, a task may be retried many times, mainly depending on network, hardware, and resource conditions. If saving the task retry information to ZK fails, the org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.ZKAction.runWithRetries method is called to try again. By default the number of retries is 1000, and the retry interval depends on whether Yarn high availability is enabled, i.e. whether yarn.resourcemanager.ha.enabled in yarn-site.xml is true. The official explanation of the retry interval is as follows:

Retry interval in milliseconds when connecting to ZooKeeper. When HA is enabled, the value here is NOT used. It is generated automatically from yarn.resourcemanager.zk-timeout-ms and yarn.resourcemanager.zk-num-retries.

Depending on whether Yarn high availability is enabled, the retry interval works as follows:

(1) Yarn high availability not enabled:

Controlled by yarn.resourcemanager.zk-retry-interval-ms. In the BI production environment this parameter keeps its default value of 1000, in milliseconds.

(2) Yarn high availability enabled:

Controlled by yarn.resourcemanager.zk-timeout-ms (the ZK session timeout) and yarn.resourcemanager.zk-num-retries (the number of retries after a failed operation). The formula is:

retry interval (yarn.resourcemanager.zk-retry-interval-ms) = yarn.resourcemanager.zk-timeout-ms (ZK session timeout) / yarn.resourcemanager.zk-num-retries (number of retries)

The retry interval is determined in org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.initInternal. The source is:

// Calculate the time interval to retry the connection ZK, expressed in milliseconds
if (HAUtil.isHAEnabled(conf)) { // In the case of high availability: retry interval = session timeout / number of retries to ZK
    zkRetryInterval = zkSessionTimeout/numRetries;
} else {
    zkRetryInterval =
            conf.getLong(YarnConfiguration.RM_ZK_RETRY_INTERVAL_MS,
                    YarnConfiguration.DEFAULT_RM_ZK_RETRY_INTERVAL_MS);
}

BI production environment configuration:

yarn.resourcemanager.zk-timeout-ms: 60000, in milliseconds

yarn.resourcemanager.zk-num-retries: default value of 1000, in retries

Therefore, the retry interval in the BI production environment is 60000/1000 = 60 ms. If saving the task state keeps failing, the operation is retried 1000 times at 60 ms intervals. That is terrible, and it eventually causes a Yarn heap memory overflow (10 GB = 4 GB young generation + 6 GB old generation). The following is the JVM monitoring data captured by SpaceX while these two parameter values drove high-frequency retries:

(1) Heap memory usage:

(2) GC times:

(3) full GC time:

3、 Solutions

1. Reduce the number of completed tasks that Yarn saves in ZK, to fix Yarn registering too many useless watchers in ZK because too much completed-task information is stored there (the default is 10000). The main adjustments are to the yarn.resourcemanager.state-store.max-completed-applications and yarn.resourcemanager.max-completed-applications parameters, as follows:

<!-- Maximum number of completed tasks saved in ZK -->
<property>
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>2000</value>
</property>

<!-- The maximum number of completed tasks saved in RM memory; adjust this parameter mainly to keep the task information and counts in RM memory and in ZK consistent -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <value>2000</value>
</property>

The structure of the task state information saved in ZK (RM_APP_ROOT) is as follows:

    ROOT_DIR_PATH
      |--- VERSION_INFO
      |--- EPOCH_NODE
      |--- RM_ZK_FENCING_LOCK
      |--- RM_APP_ROOT
      |     |----- (#ApplicationId1)
      |     |        |----- (#ApplicationAttemptIds)
      |     |
      |     |----- (#ApplicationId2)
      |     |       |----- (#ApplicationAttemptIds)
      |     ....
      |
      |--- RM_DT_SECRET_MANAGER_ROOT
      |----- RM_DT_SEQUENTIAL_NUMBER_ZNODE_NAME
      |----- RM_DELEGATION_TOKENS_ROOT_ZNODE_NAME
      |       |----- Token_1
      |       |----- Token_2
      |       ....
      |
      |----- RM_DT_MASTER_KEYS_ROOT_ZNODE_NAME
      |      |----- Key_1
      |      |----- Key_2
      ....
      |--- AMRMTOKEN_SECRET_MANAGER_ROOT
      |----- currentMasterKey
      |----- nextMasterKey

The data structure determines the algorithm. As the structure above shows, one task ID (ApplicationId) corresponds to multiple task-attempt IDs (ApplicationAttemptId), and ZKRMStateStore registers watchers on these nodes, so too many nodes inflate the watcher count and consume too much ZK heap memory. The BI production environment runs about 7000 Yarn tasks per day, so the two parameters above were reduced to 2000. The adjustment does not affect the state information of running tasks, for the following reasons:

(1) Judging from the operations on the member variables completedAppsInStateStore and completedApps in the org.apache.hadoop.yarn.server.resourcemanager.RMAppManager class, the two settings above control how much completed-task information is saved. The relevant code is as follows:

protected int completedAppsInStateStore = 0; // count of completed tasks kept in the state store; incremented when a task finishes
private LinkedList<ApplicationId> completedApps = new LinkedList<ApplicationId>(); // IDs of completed tasks kept in memory; entries are removed once over the limit

 /**
   * Save completed task information
   * @param applicationId
   */
  protected synchronized void finishApplication(ApplicationId applicationId) {
    if (applicationId == null) {
      LOG.error("RMAppManager received completed appId of null, skipping");
    } else {
      // Inform the DelegationTokenRenewer
      if (UserGroupInformation.isSecurityEnabled()) {
        rmContext.getDelegationTokenRenewer().applicationFinished(applicationId);
      }
      
      completedApps.add(applicationId);
      completedAppsInStateStore++;
      writeAuditLog(applicationId);
    }
  }

  /*
   * check to see if hit the limit for max # completed apps kept
   *
   * Check if the number of completed applications stored in memory and ZK exceeds the maximum limit, and perform a remove completed task information operation if the limit is exceeded
   */
  protected synchronized void checkAppNumCompletedLimit() {
    // check apps kept in state store.
    while (completedAppsInStateStore > this.maxCompletedAppsInStateStore) {
      ApplicationId removeId =
          completedApps.get(completedApps.size() - completedAppsInStateStore);
      RMApp removeApp = rmContext.getRMApps().get(removeId);
      LOG.info("Max number of completed apps kept in state store met:"
          + " maxCompletedAppsInStateStore = " + maxCompletedAppsInStateStore
          + ", removing app " + removeApp.getApplicationId()
          + " from state store.");
      rmContext.getStateStore().removeApplication(removeApp);
      completedAppsInStateStore--;
    }

    // check apps kept in memory.
    while (completedApps.size() > this.maxCompletedAppsInMemory) {
      ApplicationId removeId = completedApps.remove();
      LOG.info("Application should be expired, max number of completed apps"
          + " kept in memory met: maxCompletedAppsInMemory = "
          + this.maxCompletedAppsInMemory + ", removing app " + removeId
          + " from memory: ");
      rmContext.getRMApps().remove(removeId);
      this.applicationACLsManager.removeApplication(removeId);
    }
  }

(2) Before the change, Yarn used the default of 10000 for the maximum number of completed tasks saved in ZK, and in zkdoc the node /bi-rmstore-20190811-1/ZKRMStateRoot/RMAppRoot had 10000+ children. After the reduction, the same node had 2015 children, while the real-time data on the Yarn monitoring page showed 15 running tasks at that moment; in other words, Yarn stores the state of both running and completed tasks under this node. The zkdoc monitoring data is as follows:
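To check the child count yourself, the stock zkCli.sh is enough; a sketch (the server address is the one from the logs above, and the path is this cluster's state-store root):

# from the ZooKeeper bin directory
./zkCli.sh -server 10.204.245.44:5181
ls /bi-rmstore-20190811-1/ZKRMStateRoot/RMAppRoot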

From this we can summarize Yarn's mechanism for saving and removing task state:

When a new task arrives, Yarn saves its state using the storeApplicationStateInternal method of ZKRMStateStore.

When the yarn.resourcemanager.state-store.max-completed-applications limit is exceeded, Yarn deletes the state of completed tasks using the removeApplication method of RMStateStore.

RMStateStore is the parent class of ZKRMStateStore. Both methods above carry the synchronized keyword, and the two operations are independent and do not interfere with each other, so they do not affect tasks running in Yarn.

2. Fix the retry interval being too short, which starves the Yarn heap memory and causes frequent GC:

<!-- Default 1000; set to 100 here to control the frequency of retry connections to ZK. In the high-availability case, retry interval (yarn.resourcemanager.zk-retry-interval-ms) = yarn.resourcemanager.zk-timeout-ms (ZK session timeout) / yarn.resourcemanager.zk-num-retries (number of retries) -->
<property>
  <name>yarn.resourcemanager.zk-num-retries</name>
  <value>100</value>
</property>

After the adjustment, the interval at which the BI production environment retries its ZK connection is 60000/100 = 600 ms. The JVM data monitored by SpaceX is as follows:

(1) Heap memory usage:

(2) GC times:

(3) full GC time:

The monitoring data shows that, thanks to the longer retry interval, JVM heap usage, GC counts, and GC time all improve when the problem occurs.

3. Fix the task-attempt state data exceeding 1 MB:

Modifying the related Yarn logic would affect Yarn's task recovery mechanism, so we can only change the ZK server-side and client-side configuration to solve this problem.

(1) On the ZK server, increase the jute.maxbuffer parameter to 3 MB (one way to set it is sketched at the end of this step).

(2) Modify yarn-env.sh: add -Djute.maxbuffer=3145728 to the YARN_OPTS and YARN_RESOURCEMANAGER_OPTS settings. This parameter raises the maximum amount of data the ZK client may submit to the ZK server to 3 MB. The modified configuration is as follows:

YARN_OPTS="$YARN_OPTS -Dyarn.policy.file=$YARN_POLICYFILE -Djute.maxbuffer=3145728"

YARN_RESOURCEMANAGER_OPTS="-server -Xms10240m -Xmx10240m -Xmn4048m -Xss512k -verbose:gc -Xloggc:$YARN_LOG_DIR/gc_resourcemanager.log-`date +'%Y%m%d%H%M'` -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:SurvivorRatio=8 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSCompactAtFullCollection 
-XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$YARN_LOG_DIR -Djute.maxbuffer=3145728 $YARN_RESOURCEMANAGER_OPTS"

After modification, restart ResourceManager service and ZK service to make the configuration effective.
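For step (1): jute.maxbuffer is a JVM system property, not a zoo.cfg key. One common way to set it on each ZK server, as a sketch assuming a stock ZooKeeper install where conf/java.env is sourced by zkEnv.sh:

# conf/java.env on every ZK server, then restart the ensemble
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=3145728"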

4、 Summary

1. Hadoop's logging is thorough, and the whole log forms a complete event stream. When you run into a problem, read the Hadoop logs carefully; that is where the clues are.

2. The ZK cluster used by Yarn is currently shared with HBase and other services. As the cluster scale and data volume grow, this puts increasing pressure on ZK. It is therefore recommended to build a dedicated ZK cluster for Yarn rather than putting high load on a shared ZK cluster.

3. Raising ZK's maximum znode data size to 3 MB has a performance impact on ZK, for example on cluster synchronization and request processing, so the monitoring of the ZK base service must be improved to guarantee high availability.

5、 References

Troubleshooting of frequent change of ownership in yarn ResourceManager active

Resource manager ha application state storage and recovery

Yarn official issues:

(1) About issue : limit application resource reservation on nodes for non node/rack specific requests

(2) ZKRMStateStore update data exceeding 1 MB issue: ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

The Difference Between Hadoop job-kill and Yarn application-kill

hadoop job -kill calls job.killJob() inside CLI.java. There are several cases: if it finds that the job status is RUNNING, it sends the kill request directly to the AppMaster.
YARNRunner.java

@Override
  public void killJob(JobID arg0) throws IOException, InterruptedException {
    /* check if the status is not running, if not send kill to RM */
    JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
    ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();

    // get status from RM and return
    if (status == null) {
      killUnFinishedApplication(appId);
      return;
    }

    if (status.getState() != JobStatus.State.RUNNING) {
      killApplication(appId);
      return;
    }

    try {
      /* send a kill to the AM */
      clientCache.getClient(arg0).killJob(arg0);
      long currentTimeMillis = System.currentTimeMillis();
      long timeKillIssued = currentTimeMillis;
      while ((currentTimeMillis < timeKillIssued + 10000L)
          && !isJobInTerminalState(status)) {
        try {
          Thread.sleep(1000L);
        } catch (InterruptedException ie) {
          /** interrupted, just break */
          break;
        }
        currentTimeMillis = System.currentTimeMillis();
        status = clientCache.getClient(arg0).getJobStatus(arg0);
        if (status == null) {
          killUnFinishedApplication(appId);
          return;
        }
      }
    } catch(IOException io) {
      LOG.debug("Error when checking for application status", io);
    }
    if (status != null && !isJobInTerminalState(status)) {
      killApplication(appId);
    }
  }

MRClientService.java

@SuppressWarnings("unchecked")
    @Override
    public KillJobResponse killJob(KillJobRequest request) 
      throws IOException {
      JobId jobId = request.getJobId();
      UserGroupInformation callerUGI = UserGroupInformation.getCurrentUser();
      String message = "Kill job " + jobId + " received from " + callerUGI
          + " at " + Server.getRemoteAddress();
      LOG.info(message);
      verifyAndGetJob(jobId, JobACL.MODIFY_JOB);
      appContext.getEventHandler().handle(
          new JobDiagnosticsUpdateEvent(jobId, message));
      appContext.getEventHandler().handle(
          new JobEvent(jobId, JobEventType.JOB_KILL));
      KillJobResponse response = 
        recordFactory.newRecordInstance(KillJobResponse.class);
      return response;
    }

yarn application -kill goes through ApplicationCLI.java, which sends a kill request to the RM.

/**
   * Kills the application with the application id as appId
   * 
   * @param applicationId
   * @throws YarnException
   * @throws IOException
   */
  private void killApplication(String applicationId) throws YarnException,
      IOException {
    ApplicationId appId = ConverterUtils.toApplicationId(applicationId);
    ApplicationReport  appReport = null;
    try {
      appReport = client.getApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      sysout.println("Application with id '" + applicationId +
          "' doesn't exist in RM.");
      throw e;
    }

    if (appReport.getYarnApplicationState() == YarnApplicationState.FINISHED
        || appReport.getYarnApplicationState() == YarnApplicationState.KILLED
        || appReport.getYarnApplicationState() == YarnApplicationState.FAILED) {
      sysout.println("Application " + applicationId + " has already finished ");
    } else {
      sysout.println("Killing application " + applicationId);
      client.killApplication(appId);
    }
  }

YarnClientImpl.java

@Override
  public void killApplication(ApplicationId applicationId)
      throws YarnException, IOException {
    KillApplicationRequest request =
        Records.newRecord(KillApplicationRequest.class);
    request.setApplicationId(applicationId);

    try {
      int pollCount = 0;
      long startTime = System.currentTimeMillis();

      while (true) {
        KillApplicationResponse response =
            rmClient.forceKillApplication(request);
        if (response.getIsKillCompleted()) {
          LOG.info("Killed application " + applicationId);
          break;
        }

        long elapsedMillis = System.currentTimeMillis() - startTime;
        if (enforceAsyncAPITimeout() &&
            elapsedMillis >= this.asyncApiPollTimeoutMillis) {
          throw new YarnException("Timed out while waiting for application " +
            applicationId + " to be killed.");
        }

        if (++pollCount % 10 == 0) {
          LOG.info("Waiting for application " + applicationId + " to be killed.");
        }
        Thread.sleep(asyncApiPollIntervalMillis);
      }
    } catch (InterruptedException e) {
      LOG.error("Interrupted while waiting for application " + applicationId
          + " to be killed.");
    }
  }
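For reference, the two code paths above are reached by these commands (the job and application IDs are placeholders):

# YARNRunner.killJob: tries the AM first, falls back to killing via the RM
hadoop job -kill job_1565251200000_0001

# ApplicationCLI.killApplication: sends the kill request straight to the RM
yarn application -kill application_1565251200000_0001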

Comparison of service registration models: Consult vs zookeeper vs etcd vs Eureka

Zookeeper is based on Zab, a simplified version of Paxos; etcd is based on the Raft algorithm; and Consul is also based on Raft. As latecomers, etcd and Consul did not resign themselves to zookeeper's existence, but adopted the more direct Raft algorithm.

The number one goal of the Raft algorithm is to be easy to understand, as the title of its paper makes clear. And while Raft improves understandability, it is no worse than Paxos in performance, reliability, or availability.

Raft more understandable than Paxos and also provides a better foundation for building practical systems

To achieve the goal of being easy to understand, Raft makes a great effort in two main areas:

problem decomposition

state simplification

Problem decomposition splits the complex problem of "consistency among the nodes of a replica set" into several sub-problems that can be explained, understood, and solved independently. In Raft, the sub-problems are leader election, log replication, safety, and membership changes. State simplification is best understood as placing restrictions on the algorithm, reducing the number of states to consider and making the algorithm clearer and less uncertain (for example, guaranteeing that a newly elected leader contains all committed log entries).

Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines. A leader can fail or become disconnected from the other servers, in which case a new leader is elected.

The quotation above neatly summarizes how the Raft protocol works: Raft first elects a leader, and the leader is fully responsible for managing the replicated log. The leader accepts all client update requests, replicates them to the follower nodes, and applies them when it is "safe". If the leader fails, the followers re-elect a new leader.

This involves two of Raft's new sub-problems:

leader election

log replication

Below is a comparison of the features of commonly used service discovery products. First, the conclusions:

Health check of service

Eureka requires health-check support to be configured explicitly when it is used; zookeeper and etcd mark a service unhealthy when the connection to the service process is lost, while Consul's checks are more fine-grained, e.g. whether memory usage has reached 90% or whether the file system is running out of space.

Multi data center support

Consul handles synchronization across data centers through its WAN gossip protocol; the other products require additional development work.

KV storage service

Apart from Eureka, all the other products can expose a K-V storage service, which is an important reason why they pursue strong consistency (more on that later). Providing a storage service also makes it easier to evolve into a dynamic configuration service.

The choice of cap theory in product design

Eureka is the typical AP system, which makes it better suited to service discovery in distributed scenarios: there, availability has priority and inconsistency is not fatal. Next, Consul, as a CA-leaning design, can also provide high availability while keeping its K-V store consistent. Zookeeper and etcd are CP systems, which sacrifice availability and have little advantage in service discovery scenarios.

Multilingual capability and access protocol for external services

Zookeeper's cross-language support is weak; the other products provide access over HTTP/1.1. Eureka generally supports multilingual clients through a sidecar. Etcd also provides gRPC support. In addition to the standard REST API, Consul also offers DNS support.

Watch support (clients observe changes in service providers)

Zookeeper supports server-side push of changes, and Eureka 2.0 (under development) plans to support it as well. Eureka 1, Consul, and etcd all detect changes through long polling.

Self-cluster monitoring

Apart from zookeeper, the other products expose metrics by default. Operators can collect and alert on these metrics to achieve monitoring.

Security

Consul and zookeeper support ACLs, and Consul and etcd support HTTPS as a secure channel.

Spring Cloud integration

All of them currently have corresponding Spring Boot starters that provide integration support.

Overall, Consul's feature set and Spring Cloud's support for integrating it are relatively complete, and its operational complexity is relatively low (not discussed in detail here). Eureka's design fits the service discovery scenario better, but it needs continuous improvement.

Etcd and zookeeper provide very similar capabilities, occupy almost the same position in the software ecosystem, and can replace each other:

Both are general-purpose consistent meta-information stores

Both provide a watch mechanism for change notification and distribution

Both are used by distributed systems as shared information stores

In addition to the differences in implementation details, language, consistency and protocol, the biggest difference lies in the surrounding ecosystem.

Zookeeper is an Apache project written in Java that provides an RPC interface. It was originally incubated within the Hadoop project and is widely used in distributed systems (Hadoop, Solr, Kafka, Mesos, etc.).

Etcd is a relatively new open-source product from CoreOS. With its easy-to-use REST interface and active community, etcd has captured a group of users and is used in some newer clusters (such as Kubernetes).

Although etcd v3 switched to a binary RPC interface for performance, its usability is still better than zookeeper's.

Consul's goal is more specific. Etcd and zookeeper provide distributed consistent storage, and concrete business scenarios such as service discovery and configuration changes must be implemented by users themselves.

Consul targets service discovery and configuration change directly, and comes with K-V storage.

In a software ecosystem, the more abstract a component is, the wider its range of application, but it inevitably falls short of the requirements of some specific business scenarios.

