11.2.0.4: root.sh reports an error on the second node under AIX

Running root.sh on the second node fails with the following error:

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /U01/app/crs/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Start of resource "ora.asm" failed
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'sbfhxxj-db2'
CRS-2676: Start of 'ora.drivers.acfs' on 'sbfhxxj-db2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'sbfhxxj-db2'
CRS-5017: The resource action "ora.asm start" encountered the following error: 
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/U01/app/crs/log/sbfhxxj-db2/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'sbfhxxj-db2' failed
CRS-2679: Attempting to clean 'ora.asm' on 'sbfhxxj-db2'
CRS-2681: Clean of 'ora.asm' on 'sbfhxxj-db2' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'sbfhxxj-db2'
CRS-2677: Stop of 'ora.drivers.acfs' on 'sbfhxxj-db2' succeeded
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Grid Infrastructure stack
Failed to start ASM at /U01/app/crs/crs/install/crsconfig_lib.pm line 1339.
/U01/app/crs/perl/bin/perl -I/U01/app/crs/perl/lib -I/U01/app/crs/crs/install /U01/app/crs/crs/install/rootcrs.pl execution failed

The output above shows that ora.asm failed to start, which caused the script to fail. The ASM alert log on node 2 shows:

PMON (ospid: 4063948): terminating the instance due to error 481
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

This matches the Oracle support note "ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481" (Doc ID 1383737.1). Following that note, check the HAIP resource on both nodes:

[+ASM1]@sbfhxxj-db1[/home/grid]$crsctl stat res ora.cluster_interconnect.haip -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=OFFLINE
[+ASM2]@sbfhxxj-db2[/home/grid]$crsctl stat res ora.cluster_interconnect.haip -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=ONLINE on sbfhxxj-db2
[+ASM1]@sbfhxxj-db1[/home/grid]$crsctl start res ora.cluster_interconnect.haip -init
CRS-2501: Resource 'ora.cluster_interconnect.haip' is disabled
CRS-4000: Command Start failed, or completed with errors.

The agent log shows the following errors:

2018-10-17 14:47:44.551: [    AGFW][2057]{0:0:70} Agent received the message: AGENT_HB[Engine] ID 12293:749
2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} failed to create arp
2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} (null) category: -2, operation: ioctl, loc: bpfopen:22,o, OS error: 22, other: ARP device /dev/bpf0, interface en4, BIOCSBLEN request with size 4096
2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} (:CLSN00130:) category: -2, operation: ioct  , loc: bpfopen:22,o, OS error: 28288, other: ARP device /de
2018-10-17 14:47:44.739: [ USRTHRD][5674]{0:0:169} [NetHAWork] thread hit exception Agent failed to initialize which is required for HAIP processing
2018-10-17 14:47:44.739: [ USRTHRD][5674]{0:0:169} [NetHAWork] thread stopping
2018-10-17 14:47:44.739: [ USRTHRD][5674]{0:0:169} Thread:[NetHAWork]isRunning is reset to false here
2018-10-17 14:47:46.742: [ USRTHRD][5931]{0:0:169} failed to create arp

The message "OS error: 22, other: ARP device /dev/bpf0" matches the Oracle support note "AIX: HAIP fails to start with "OS error: 22" due to non-related devices using same major device number" (Doc ID 1447517.1). Check the major device numbers under /dev:

[root@sbfhxxj-db1 /dev]# ls -lrt|grep bpf
cr--------    1 root     system       43,  9 Jan 18 2016  bpf9
cr--------    1 root     system       43,  8 Jan 18 2016  bpf8
cr--------    1 root     system       43,  7 Jan 18 2016  bpf7
cr--------    1 root     system       43,  6 Jan 18 2016  bpf6
cr--------    1 root     system       43,  5 Jan 18 2016  bpf5
cr--------    1 root     system       43,  4 Jan 18 2016  bpf4
cr--------    1 root     system       43,  3 Jan 18 2016  bpf3
cr--------    1 root     system       43,  2 Jan 18 2016  bpf2
cr--------    1 root     system       43, 19 Jan 18 2016  bpf19
cr--------    1 root     system       43, 18 Jan 18 2016  bpf18
cr--------    1 root     system       43, 17 Jan 18 2016  bpf17
cr--------    1 root     system       43, 16 Jan 18 2016  bpf16
cr--------    1 root     system       43, 15 Jan 18 2016  bpf15
cr--------    1 root     system       43, 14 Jan 18 2016  bpf14
cr--------    1 root     system       43, 13 Jan 18 2016  bpf13
cr--------    1 root     system       43, 12 Jan 18 2016  bpf12
cr--------    1 root     system       43, 11 Jan 18 2016  bpf11
cr--------    1 root     system       43, 10 Jan 18 2016  bpf10
cr--------    1 root     system       43,  1 Jan 18 2016  bpf1
cr--------    1 root     system       43,  0 Jan 18 2016  bpf0
[root@sbfhxxj-db1 /dev]# ls -lrt|grep dlm
crw-------    1 root     system       42,  0 Dec 01 2015  rdlmcldrv
crw-------    1 root     system       43,  0 Dec 01 2015  dlmadrv
crw-------    1 root     system       44,  0 Sep 13 10:52 rdlmfdrvio

In principle, unrelated devices under /dev should not share a major device number, but here the bpf devices and dlmadrv both use major number 43. DLM is the multipath software. We first tried deleting all bpf devices with rm bpf* and regenerating them with tcpdump -D, but the major number stayed the same. The only fix was to uninstall and reinstall the multipath software. After that, root.sh ran normally on the second node, and the second node joined the cluster.
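
The manual comparison of the two ls listings above can be automated. The following is a minimal sketch, not part of the original fix: the function name bpf_major_conflicts is made up for illustration, and it assumes the ls -l layout shown above, where the fifth field is the major number followed by a comma. It prints every non-bpf device that shares a major number with a bpf device.

```shell
# Hypothetical helper: reads `ls -l /dev` output on stdin and prints any
# non-bpf device that shares a major device number with a bpf device.
bpf_major_conflicts() {
  awk '
    # Keep only character/block device lines, whose 5th field is "MAJOR,".
    $1 ~ /^[cb]/ && $5 ~ /^[0-9]+,$/ {
      major = $5
      sub(/,$/, "", major)          # strip the trailing comma
      name = $NF                    # device name is the last field
      if (name ~ /^bpf[0-9]+$/) {
        bpf[major] = 1              # remember majors used by bpf devices
      } else {
        others[major] = others[major] " " name
      }
    }
    END {
      for (m in bpf)
        if (m in others)
          print "major " m " shared by bpf and:" others[m]
    }
  '
}

# Usage on the affected node:
#   ls -l /dev | bpf_major_conflicts
```

Run against the listings above, this should flag dlmadrv as sharing major number 43 with the bpf devices. An empty result means no other device uses the bpf major number.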
