[Solved] RabbitMQ fails to start under supervisor: causes and fixes

Preface

Today I rebooted the server and found that the RabbitMQ process managed by supervisor failed to start. The log kept showing the same error, so I am recording the solution here.

Error: erlexec:HOME must be set

Many blog posts on the Internet suggest the same fix: add the following lines to the script that starts the process:

export HOME=/usr/local/erlang
export PATH=$PATH:$HOME/bin  

The system's default HOME is /root, which may be why the Erlang runtime does not receive a usable HOME value. The modification above works for processes managed through chkconfig or service, which have their own startup scripts. For a process managed by supervisor, however, the start command is defined in supervisor.conf, so there is no per-process startup script in which to change HOME directly.

Method: add the above statements to the supervisor startup script

vi Ssupervisor.conf

#!/bin/sh
# chkconfig: 2345 70 90

export HOME=/usr/local/erlang
export PATH=$PATH:$HOME/bin
/usr/bin/supervisord -c /etc/supervisor/supervisord.conf

This changes HOME only for supervisord and the processes it starts; the HOME of the rest of the system is not affected.
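
To see why the export stays local, here is a minimal sketch (the variable names original_home and child_home are only for illustration). A variable exported inside a child shell is inherited by that shell's own children, while the parent shell keeps its value:

```shell
#!/bin/sh
# Remember the HOME of the current (parent) shell.
original_home="${HOME-}"

# The outer child shell exports a new HOME, as the startup script does.
# The inner shell stands in for supervisord and prints what it inherited.
child_home=$(sh -c 'export HOME=/usr/local/erlang; sh -c "echo \$HOME"')

echo "child process sees HOME=$child_home"
echo "parent shell still has HOME=${HOME-}"
```

The same reasoning applies to the startup script: supervisord and everything it launches inherit the exported value, and nothing outside that process tree does.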

Tracing back to the source

Why does this error occur

This error is raised not by RabbitMQ but by the Erlang runtime. To view a running erl process:

ps aux | grep beam

# Output:
root      1779  0.4  0.5 3863876 86060 ?      Sl   19:21   0:06 /usr/local/erlang/bin/x86_64-unknown-linux-gnu/beam.smp -W w -A 64 -P 1048576 -t 5000000 -stbt db -zdbbl 32000 -K true -B i -- -root /usr/local/erlang -progname erl -- -home /root 

You can see that the -home parameter is passed at startup. Starting an erl instance calls erlexec, which is written in C.

# The path to erlexec.c is /usr/local/erlang/erts/etc/common/erlexec.c

# Excerpt from the code
static char * home;
static char ** Eargsp = NULL;
static int EargsCnt = 0;
static char **argsp = NULL;

static void get_home( void )
{
    home = get_env("HOME");
    if (home == NULL)
        error("HOME must be set");
}

As you can see, get_home reads the HOME environment variable through get_env. If that fails, it reports "HOME must be set".

What we do not yet understand is why get_env returns NULL instead of a value, given that HOME has a default of /root. Further research is needed.
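
One possible explanation, offered as an assumption rather than a confirmed diagnosis of this server: HOME is an ordinary environment variable that is normally set during login, so a process started at boot from an init script may not inherit it at all, even though root's home directory is /root. The sketch below imitates that situation with env -i, which runs a command with an empty environment; home_state is just an illustrative name.

```shell
#!/bin/sh
# env -i runs a command with an empty environment. This only approximates
# what a boot-time init script passes on (assumption, not verified here).
if env -i env | grep -q '^HOME='; then
    home_state="set"
else
    home_state="unset"
fi

echo "HOME in an empty environment: $home_state"
```

If the boot-time environment really lacks HOME, get_env("HOME") would return NULL in exactly the way the excerpt handles.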

Rabbitmq restart failed

After the RabbitMQ process is killed manually, supervisor either fails to restart RabbitMQ or does not try to restart it at all.

Starting and stopping RabbitMQ through supervisor works fine, but RabbitMQ does not come back after being killed manually.

Reason:

After RabbitMQ is started with rabbitmq-server start, there are two processes: one is the Erlang node service program, the other is the RabbitMQ application. The RabbitMQ application runs on the Erlang node.

If you forcibly kill the RabbitMQ application process, supervisor tries to start it again. In doing so it tries to start both the Erlang node service program and the RabbitMQ application, finds that an Erlang node service program already exists, and the start fails.

If you forcibly kill the Erlang node service program, both it and the RabbitMQ application stop. With autorestart=unexpected, supervisor will not restart the process. With autorestart=true, supervisor restarts both the Erlang node service program and the RabbitMQ application.
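
The first case can be checked by hand before asking supervisor to start RabbitMQ again. This is a minimal sketch that reuses the ps aux | grep beam check from earlier; has_beam is a hypothetical helper name, and matching on beam.smp assumes the SMP emulator seen in the ps output above.

```shell
#!/bin/sh
# Hypothetical helper: reads a process listing on stdin and succeeds
# if it contains an Erlang VM (beam.smp). The [b] keeps a grep command
# line that appears in the listing from matching itself.
has_beam() {
    grep -q '[b]eam\.smp'
}

if ps aux | has_beam; then
    echo "An Erlang node is still running; a new start is likely to fail"
else
    echo "No Erlang node found"
fi
```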

Conclusion:

Supervisor is not a good fit for managing the RabbitMQ process, because a restart fails when the RabbitMQ application has crashed while the Erlang node service program is still running normally.

If only the Erlang node is running, without a RabbitMQ application instance, you cannot log in to the RabbitMQ management console.
