Preface
Today I rebooted the server and found that the rabbitmq process managed by supervisor failed to start. The log showed the same error every time, so I am recording the solution here
Error: erlexec:HOME must be set
Many blog posts on the Internet suggest roughly the same fix: add the following to the script that starts the process:
export HOME=/usr/local/erlang
export PATH=$PATH:$HOME/bin
The system's default HOME is /root, yet the Erlang runtime may still fail to find a HOME variable in its environment. The change above works for processes managed through chkconfig and service, because they have a startup script to edit. A process managed by supervisor is different: its start command lives in supervisor.conf, so it has no startup script of its own in which to set HOME. (supervisor also accepts an environment= setting in a [program:x] section; that alternative is not used here.)
Method: add the statements above to the script that starts supervisor itself
vi Ssupervisor.conf
#!/bin/sh
# chkconfig: 2345 70 90
export HOME=/usr/local/erlang
export PATH=$PATH:$HOME/bin
/usr/bin/supervisord -c /etc/supervisor/supervisord.conf
This changes HOME only for supervisord and the processes it starts; the system-wide HOME is not affected
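To illustrate why the change stays local: an environment variable set for a command applies to that command and its children, not to the shell that launched it. A minimal sketch, where run_with_home is a hypothetical helper name and /usr/bin/env merely stands in for supervisord:

```shell
#!/bin/sh
# run_with_home is a hypothetical helper, not part of supervisor or Erlang.
# It runs a command with HOME overridden for that command only.
run_with_home() {
    home_dir=$1
    shift
    env HOME="$home_dir" "$@"
}

# The child process sees the overridden HOME ...
run_with_home /usr/local/erlang /usr/bin/env | grep '^HOME='
# ... while the calling shell keeps its own value.
echo "caller HOME is still: ${HOME:-<unset>}"
```

The wrapper script above does the same thing on a larger scale: its export lines affect only the script's own process and the supervisord it launches.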
Tracing back to the source
Why does this error occur
The error does not come from rabbitmq but from the Erlang runtime. To look at a running Erlang process:
ps aux | grep beam
# Result:
root 1779 0.4 0.5 3863876 86060 ? Sl 19:21 0:06 /usr/local/erlang/bin/x86_64-unknown-linux-gnu/beam.smp -W w -A 64 -P 1048576 -t 5000000 -stbt db -zdbbl 32000 -K true -B i -- -root /usr/local/erlang -progname erl -- -home /root
You can see that a -home parameter is passed at startup. Starting an erl instance goes through erlexec, which is written in C
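To check this without reading the whole ps line, the -home value can be pulled out of the command line. A small sketch; beam_home_arg is a hypothetical helper, and the sample line is shortened from the output above:

```shell
#!/bin/sh
# beam_home_arg is a hypothetical helper: print the value that follows
# "-home" in a beam command line, or nothing if the flag is absent.
beam_home_arg() {
    printf '%s\n' "$1" | sed -n 's/.* -home \([^ ]*\).*/\1/p'
}

sample='/usr/local/erlang/bin/x86_64-unknown-linux-gnu/beam.smp -- -root /usr/local/erlang -progname erl -- -home /root'
beam_home_arg "$sample"   # prints /root
```

On a live system the argument could come from ps -o args= -p followed by the pid of the beam process.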
# erlexec.c is located at /usr/local/erlang/erts/etc/common/erlexec.c
# Excerpt:
static char *home;
static char **Eargsp = NULL;
static int EargsCnt = 0;
static char **argsp = NULL;

static void get_home(void)
{
    home = get_env("HOME");
    if (home == NULL)
        error("HOME must be set");
}
As you can see, get_home uses get_env to read the HOME environment variable; if that fails, it reports "HOME must be set"
What I do not yet understand is why get_env returns NULL when HOME should default to /root. Further research is needed
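A possible explanation, offered as an assumption rather than a confirmed diagnosis: HOME is normally placed in the environment by the login process (login, sshd, su) and is not filled in automatically from /etc/passwd, so a daemon started at boot may run without it. The sketch below uses env -i to simulate such a stripped environment; home_state is a hypothetical helper:

```shell
#!/bin/sh
# home_state is a hypothetical helper: start a child with the given env(1)
# arguments and report whether that child finds HOME in its environment.
home_state() {
    if env "$@" /usr/bin/env | grep -q '^HOME='; then
        echo set
    else
        echo unset
    fi
}

home_state -i                          # empty environment: prints unset
home_state -i HOME=/usr/local/erlang   # HOME passed explicitly: prints set
```

If the first case matches what supervisord sees at boot, get_env("HOME") returning NULL is the expected result, whatever the root user's home directory is.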
Rabbitmq restart failed
After the rabbitmq process is killed by hand, supervisor either fails to restart rabbitmq or does not try to restart it at all
Starting and stopping rabbitmq through supervisor works fine; the problem only appears when the process is killed manually
Reason:
After rabbitmq is started with rabbitmq-server start, there are two processes: one is the Erlang node, the other is the rabbitmq application, which runs on that node
If you forcibly kill the rabbitmq application process, supervisor tries to start it again. That means starting both the Erlang node and the rabbitmq application, but an Erlang node is already running, so the start fails
If you forcibly kill the Erlang node, both the node and the rabbitmq application stop. With autorestart = unexpected, supervisor does not restart the process. With autorestart = true, supervisor restarts both the Erlang node and the rabbitmq application
Conclusion:
supervisor is not a good fit for managing the rabbitmq process, because a restart fails whenever the rabbitmq application has crashed while the Erlang node is still running
If only the node is running and there is no rabbitmq application instance, you cannot log in to the rabbitmq management console
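When the node is up but the application is not, one option is to start the application on the existing node instead of restarting everything. A rough sketch, assuming a RabbitMQ version that ships rabbitmq-diagnostics with the ping and check_running commands (older installs may not have them); recover_rabbit_app and the DIAG and CTL variables are hypothetical names:

```shell
#!/bin/sh
# Command names can be overridden, e.g. to point at a full path.
DIAG=${DIAG:-rabbitmq-diagnostics}
CTL=${CTL:-rabbitmqctl}

# recover_rabbit_app is a hypothetical helper.
# It reports the node state and starts the application only if the
# Erlang node answers but the rabbitmq application is not running.
recover_rabbit_app() {
    if ! "$DIAG" -q ping >/dev/null 2>&1; then
        echo "node down"
        return 1
    fi
    if "$DIAG" -q check_running >/dev/null 2>&1; then
        echo "app running"
        return 0
    fi
    if "$CTL" start_app >/dev/null 2>&1; then
        echo "app started"
    else
        echo "start_app failed"
        return 1
    fi
}
```

Calling recover_rabbit_app on the server prints one of the four states. It does not replace a proper service manager; it only covers the case described above, where the node survives and the application does not.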