11gR2: Changing the Hostname Causes CRS-0184 and CRS-4000 Errors


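We all say that before deploying an Oracle database server you should plan the basics such as the hostname and IP address, and that once they are settled, and especially once Oracle has been installed, you should not casually change them. Everyone knows this in principle, and yet in real work you will sooner or later run into exactly this situation.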

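Not long ago I hit one such failure myself: a 10.2.0.5.0 single-instance physical standby with ASM, running on Windows Server 2008 R2 x64, broke after its hostname was changed. Now I have run into a similar scenario: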

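I. Operating system: CentOS release 5.6 (Final), x86_64

II. Oracle version: Oracle 11gR2 11.2.0.1.0, 64-bit, single-instance database with ASM

III. Symptom: after the hostname was changed and the host rebooted, the database was unavailable. More precisely, the Grid Infrastructure CRS processes could not start at all, as shown below: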

1 Check the service status:

[shell]
wxwl_iop-> crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

wxwl_iop-> [/shell]

2 The ohasd daemon itself is not running; only the init.ohasd wrapper script shows up:

[shell]
wxwl_iop-> ps -ef|grep has
root 4869 1 0 Nov14 ? 00:00:02 /bin/sh /etc/init.d/init.ohasd run
grid 27544 26907 0 09:58 pts/2 00:00:00 grep has
wxwl_iop->[/shell]
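The distinction in that ps output is easy to miss: the init.ohasd wrapper is alive, but the daemon it is supposed to spawn is not. The following is a minimal sketch of the same check, assuming the daemon shows up in the process list as ohasd.bin (the usual name on 11.2 Linux installs, but verify on your own system). The function name ohasd_state is invented for this illustration.

```shell
#!/bin/sh
# ohasd_state: read `ps -ef` style output on stdin and classify it.
# Prints exactly one of: running | wrapper-only | absent
# Assumption: the real daemon appears as "ohasd.bin" in the command column.
ohasd_state() {
    awk '
        /ohasd\.bin/      { bin = 1 }      # the actual daemon
        /init\.ohasd run/ { wrapper = 1 }  # the init wrapper script
        END {
            if (bin)          print "running"
            else if (wrapper) print "wrapper-only"
            else              print "absent"
        }'
}
```

Typical use would be `ps -ef | ohasd_state`. On the host in this article it would report wrapper-only, which matches the symptom.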

3 Checking whether Oracle Restart is configured, and trying to start it by hand, both fail:

[shell]
wxwl_iop-> crsctl config has
CRS-4014: Oracle High Availability Services autostart was not defined.
Failure at scls_scr_create with code 1
Internal Error Information:
Category: -1
Operation: has_ha_privs
Location: scrcreate5
Other: need ha priv
System Dependent Information: 0

CRS-4000: Command Config failed, or completed with errors.
wxwl_iop-> crsctl start has
Failure at scls_scr_create with code 1
Internal Error Information:
Category: -1
Operation: has_ha_privs
Location: scrcreate5
Other: need ha priv
System Dependent Information: 0

CRS-4000: Command Start failed, or completed with errors.
wxwl_iop-> [/shell]
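If you want to spot this failure from a script rather than by eye, the output above has two usable markers: the CRS-nnnn codes and the "Failure at scls_scr_create" line. The sketch below extracts both. Treating the scls_scr_create line as a sign that Oracle Restart needs reconfiguring is a heuristic based on this one incident, not a documented rule, and both function names are made up.

```shell
#!/bin/sh
# crs_error_codes: print each distinct CRS-nnnn code found on stdin,
# in order of first appearance.
crs_error_codes() {
    grep -o 'CRS-[0-9][0-9]*' | awk '!seen[$0]++'
}

# needs_has_reconfig: exit 0 if stdin contains the scls_scr_create failure
# shown in this article. Heuristic only; it proves nothing about the cause.
needs_has_reconfig() {
    grep -q 'Failure at scls_scr_create'
}
```

For example, `crsctl config has 2>&1 | crs_error_codes` would list the codes, and `crsctl start has 2>&1 | needs_has_reconfig && echo "check /etc/oracle/scls_scr"` would flag the situation described here.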

4 Under /etc/oracle/scls_scr there is only the Oracle Restart configuration for the old hostname (localhost); there is nothing for the new hostname:

[shell]
[root@wxwl_iop scls_scr]# ll
total 4
drwxr-xr-x 4 root oinstall 4096 Nov 7 16:42 localhost
[root@wxwl_iop scls_scr]# pwd
/etc/oracle/scls_scr
[/shell]
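The comparison made here, directory name versus current hostname, can be scripted. This is a sketch under one assumption drawn only from the listings in this article: that the subdirectory under /etc/oracle/scls_scr is named after the short hostname. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the names are invented for this sketch.

# scr_has_host BASE_DIR HOST
# Exit 0 if BASE_DIR contains a directory named HOST, non-zero otherwise.
scr_has_host() {
    [ -d "$1/$2" ]
}

# scr_hosts BASE_DIR
# Print the name of every host directory found under BASE_DIR.
scr_hosts() {
    for d in "$1"/*/; do
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}
```

On the affected server, `scr_has_host /etc/oracle/scls_scr "$(hostname -s)"` would have failed, while `scr_hosts /etc/oracle/scls_scr` would have printed localhost.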

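IV. Cause: the database server's hostname was changed by hand and the host was rebooted. Oracle Restart then failed to start, and as a result the ASM instance, the listener, the disk groups, and the database instance could not start either.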

V. Resolution steps:

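1 As root, force-remove the CRS-related configuration. (For an Oracle Restart home, Oracle's documentation gives roothas.pl -deconfig -force for this step. rootcrs.pl is the cluster variant, which probably explains the PRKO-2012 and CRS-4013 complaints in the output below, although the deconfiguration still reports success.)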

[shell]
[root@wxwl_iop scls_scr]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2013-11-15 10:00:51: Parsing the host name
2013-11-15 10:00:51: Checking for super user privileges
2013-11-15 10:00:51: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Usage: srvctl <command> <object> [<options>]
commands: enable|disable|start|stop|status|add|remove|modify|getenv|setenv|unsetenv|config
objects: database|service|asm|diskgroup|listener|home|ons|eons
For detailed help on each command and object and its options use:
srvctl <command> -h or
srvctl <command> <object> -h
PRKO-2012 : nodeapps object is not supported in Oracle Restart
ADVM/ACFS is not supported on centos-release-5-6.el5.centos.1

ACFS-9201: Not Supported
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Stop failed, or completed with errors.
You must kill crs processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
Use of uninitialized value in chdir at /usr/lib/perl5/5.8.8/File/Find.pm line 751.
Use of chdir('') or chdir(undef) as chdir() is deprecated at /usr/lib/perl5/5.8.8/File/Find.pm line 751.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@wxwl_iop scls_scr]# [/shell]

2 As root, reconfigure Oracle Restart:

[shell]
[root@wxwl_iop scls_scr]# /u01/app/11.2.0/grid/crs/install/roothas.pl
2013-11-15 10:02:36: Checking for super user privileges
2013-11-15 10:02:36: User has super user privileges
2013-11-15 10:02:36: Parsing the host name
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
CRS-4664: Node wxwl_iop successfully pinned.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
ADVM/ACFS is not supported on centos-release-5-6.el5.centos.1

wxwl_iop 2013/11/15 10:02:49 /u01/app/11.2.0/grid/cdata/wxwl_iop/backup_20131115_100249.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
[root@wxwl_iop scls_scr]#[/shell]

After Oracle Restart has been reconfigured, the entry for the old hostname under /etc/oracle/scls_scr is removed automatically and a new entry for the new hostname appears.

[shell]
[root@wxwl_iop scls_scr]# ll
total 4
drwxr-xr-x 4 root oinstall 4096 Nov 15 10:02 wxwl_iop
[root@wxwl_iop scls_scr]#
[/shell]

At this point the grid user can run crs_stat -t and see the service status:

[shell]
wxwl_iop-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE
ora.diskmon    ora....on.type OFFLINE   OFFLINE
wxwl_iop-> [/shell]

3 Use srvctl to manually add the listener, ASM, and the database instance to Oracle Restart

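As the grid user, add the listener and ASM to Oracle Restart. One caution before copying the ASM command: in the 11.2 syntax the -d option of srvctl add asm is, as far as I can tell, the ASM disk discovery string rather than a list of disk groups, so check srvctl add asm -h on your version first.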

[shell]
wxwl_iop-> srvctl add listener
wxwl_iop-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type OFFLINE   OFFLINE
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE
ora.diskmon    ora....on.type OFFLINE   OFFLINE
wxwl_iop-> srvctl add asm -l listener -d data,fra
wxwl_iop-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type OFFLINE   OFFLINE
ora.asm        ora.asm.type   OFFLINE   OFFLINE
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE
ora.diskmon    ora....on.type OFFLINE   OFFLINE
wxwl_iop-> [/shell]
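Every resource in these listings is registered but still OFFLINE. A small filter can pull out just the offline names, which helps once the list grows. The sketch assumes the four or five whitespace-separated columns that crs_stat -t prints above; the function name is invented. Note also that crs_stat is deprecated in 11.2 in favor of crsctl status resource -t, whose layout differs, so this filter applies only to the older command.

```shell
#!/bin/sh
# offline_resources: read `crs_stat -t` output on stdin and print the Name
# of every resource whose State column (4th field) is OFFLINE.
# The header row and the dashed separator row are skipped.
offline_resources() {
    awk '$1 != "Name" && $1 !~ /^-+$/ && $4 == "OFFLINE" { print $1 }'
}
```

Usage would be `crs_stat -t | offline_resources`. The names come out abbreviated exactly as crs_stat prints them (for example ora....ER.lsnr).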

The database must be added to Oracle Restart as the oracle user.
Adding it as the grid user fails:

[shell]
wxwl_iop-> srvctl add database -d iopdb -o /u01/app/oracle/product/11.2.0/db_1
PRCD-1025 : Failed to create database iopdb
PRKH-1014 : Current user grid is not the same as oracle owner oracle of oracle home /u01/app/oracle/product/11.2.0/db_1.
wxwl_iop->
[/shell]

It has to be added as the oracle user:

[shell]
wxwl_iop-> srvctl add database -d normaldb -o /u01/app/oracle/product/11.2.0/db_1 -n normaldb -a 'data,fra'
wxwl_iop->
[/shell]

Finally, in an Oracle Restart environment, start the listener and the ASM instance with srvctl as the grid user, and start the database with srvctl as the oracle user.
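The start sequence just described can be written down as a checklist. The sketch below only prints the commands, tagged with the OS user that should run them; it executes nothing. The srvctl subcommands are the standard 11.2 ones; the function name start_plan and the "user: command" output format are invented for this example.

```shell
#!/bin/sh
# start_plan DB_UNIQUE_NAME
# Print the srvctl start commands in the order used in this article,
# each prefixed with the OS user that owns the corresponding home.
# Nothing is executed; this is a checklist generator only.
start_plan() {
    db=$1
    echo "grid: srvctl start listener"
    echo "grid: srvctl start asm"
    echo "oracle: srvctl start database -d $db"
}
```

Running `start_plan normaldb` prints three lines to work through by hand, logging in as grid or oracle as indicated.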

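Takeaway: on a server that already has an Oracle database configured, really do not change the hostname or IP address lightly, especially in a RAC environment or, from 11gR2 onward, on a standalone server configured with Oracle Restart. Do thorough planning before installing and configuring the database. And if the change truly cannot be avoided, involve the DBA so that the configuration can be rebuilt together.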

Appendix: why this article looks under /etc/oracle/scls_scr for hostname-specific Oracle Restart configuration will be covered in the next post O(∩_∩)O~
