Don't Change Your Hostname Lightly

    This is not a clickbait title; it is a real case from my day-to-day work.

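    Scenario: a Windows Server 2008 R2 x64 system hosting an Oracle 10.2.0.5.0 physical standby database, with everything running normally. At the customer's request, the server's computer name had to be changed.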

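    Steps: after working out a plan, I started the change [stopping the standby database and the ASM instance, editing the hosts file, tnsnames.ora, and so on]. Once the customer's IT staff had renamed the machine and rebooted it, disaster struck: the server did not come up properly. Clients could still ping it, but Remote Desktop connections failed.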
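
    One way to make the "edit hosts, tnsnames.ora, and so on" step less error-prone is to search the relevant files for the old host name before and after the rename. The sketch below is only an illustration: the function name, the placeholder host name OLDHOST, the NETWORK\ADMIN location, and the inclusion of listener.ora are my assumptions, not part of the original procedure. Only the Oracle home path is taken from the transcript in this post.

```python
from pathlib import Path


def find_hostname_references(paths, old_hostname):
    """Return (path, line_number, line) for every line that mentions old_hostname."""
    needle = old_hostname.lower()
    hits = []
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # skip files that do not exist on this machine
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # Host names are case-insensitive, so compare in lower case.
            if needle in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical values: adjust the host name and the file list to your environment.
    candidates = [
        r"C:\Windows\System32\drivers\etc\hosts",
        r"C:\oracle\product\10.2.0\db_1\NETWORK\ADMIN\tnsnames.ora",
        r"C:\oracle\product\10.2.0\db_1\NETWORK\ADMIN\listener.ora",
    ]
    for path, number, line in find_hostname_references(candidates, "OLDHOST"):
        print(f"{path}:{number}: {line}")
```

    A scan like this only covers plain-text configuration files. It would not have caught a service whose configuration lives elsewhere, which is exactly what went wrong in this case.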

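    What now? After some analysis, the most likely culprit appeared to be the OracleCSService service, whose startup type is Automatic. In other words, Windows starts it at boot along with the operating system. The service's configuration is hard-coded and is presumably "bound" to the machine name, so after the rename OracleCSService could no longer start properly, which in turn kept the operating system from finishing its startup.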
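
    Checking which services start automatically is something that can be done before the rename. The sketch below assumes the standard Windows command `sc qc <service>`, whose output contains a START_TYPE line such as "START_TYPE : 2 AUTO_START". The helper names are hypothetical, and the parsing is a minimal illustration rather than a complete parser.

```python
import re
import subprocess


def parse_start_type(sc_qc_output):
    """Extract the start type name (e.g. 'AUTO_START') from `sc qc` output."""
    match = re.search(r"START_TYPE\s*:\s*\d+\s+(\w+)", sc_qc_output)
    return match.group(1) if match else None


def query_start_type(service_name):
    """Run `sc qc <service>` (Windows only) and return the service's start type."""
    result = subprocess.run(
        ["sc", "qc", service_name],
        capture_output=True,
        text=True,
        check=False,  # a missing service should yield None, not an exception
    )
    return parse_start_type(result.stdout)
```

    On the server, calling query_start_type("OracleCSService") before the rename would show whether the service is set to start automatically. If it is, setting it to manual or disabled beforehand (for example with `sc config`) is one possible precaution, though I did not test that in this case.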

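    With that theory in hand, the plan was to reboot the server into Safe Mode, disable the service, and reboot again. However, the machine could no longer enter Safe Mode, even though it had done so once earlier. The cause is unknown; the customer's IT hardware staff were the ones operating the machine.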

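    So while we kept looking for a way into Safe Mode, we also started sizing up the last resort: reinstalling Windows and rebuilding Data Guard. Then, even more baffling, the darn server booted normally all by itself, without anyone having done anything. I logged in and promptly rebuilt the OracleCSService service: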

    Remove the service:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat
usage:  crssetup 
                  config   - configure and startup the cluster on nodes
                  add      - add specified nodes to the cluster
                  del      - delete the specified nodes from the cluster
                  deconfig - wipe out all cluster configuration information
                  ldel     - local css delete from oracle home
                  lres     - local css home reset to new oracle home
                  ladd     - local css add to oracle home
                  shutdown - shutdown the selected nodes
                  upgrade  - upgrade the specified nodes
                  help     - print out this information

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat deconfig
GetConfiguredClusterNodes:  failed to initialize subsystem, rc(21)
failed to determine remaining nodes in the cluster
failed during critical configuration information
  please supply <-force> option to continue

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat deconfig -force
GetConfiguredClusterNodes:  failed to initialize subsystem, rc(21)
failed to determine remaining nodes in the cluster
failed during critical configuration information
  <-force> option specified, continuing
Step 1:  shutting down node apps
failed executing check for CRS resources
  [ 2 ] The system cannot find the file specified.
failed executing check for CRS resources

failure determining CRS resources state, continuing due to FORCE option
  DEBRESTDDB            Removing node apps
PRKC-1056 : Failed to get the hostname for node DEBRESTDDB
PRKH-1010 : Unable to communicate with CRS services.
  [Communications Error(Native: prsr_initCLSS:[3])]
  DEBRESTDDB            Removing ONS configuration
failed to remove ONS configuration
  [ 2 ] The system cannot find the file specified.
  DEBRESTDDB            failed to execute removal of ONS configuration
failuring during delete of node apps, continuing
Step 2:  shutting down local CRS stack
  DEBRESTDDB            failed to located service OracleEVMService, err(1060)
failed to stop CRS stack on all nodes to be removed, continuing
Step 3:  removing CRS stack from requested nodes
Step 4:  stopping extra CRS services
Step 5:  cleanup up registry keys
Step 6:  perform cleanup of the OCR repository C:\oracle\product\10.2.0\db_1\cdata\localhost\local.ocr
successful deconfiguration of the cluster

C:\Users\Administrator>

 

    Recreate the service:

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat add
Step 1:  creating new OCR repository
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'administrator', privgrp ''..
Operation successful.
Step 2:  creating new CSS service
successfully created local CSS service
successfully added CSS to home

C:\Users\Administrator>
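
    The deconfig run above prints many "failed" lines and still ends successfully, so when scripting these two steps it seems safer to check for the final success line than to react to individual errors. The sketch below is a hedged illustration only: the function names are hypothetical, the success markers are copied from the transcripts above, and the localconfig.bat path is the one used in this post. I ran the commands by hand, not through this script.

```python
import subprocess

# Path as used in the transcript above; adjust to your Oracle home.
LOCALCONFIG = r"C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat"

# Final success lines as they appear in the transcripts above.
SUCCESS_MARKERS = {
    "deconfig": "successful deconfiguration of the cluster",
    "add": "successfully created local CSS service",
}


def step_succeeded(action, output):
    """Return True if the output of the given localconfig action shows its success line."""
    return SUCCESS_MARKERS[action] in output


def run_localconfig(*args):
    """Run localconfig.bat with the given arguments and return its combined output."""
    result = subprocess.run(
        [LOCALCONFIG, *args], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def recreate_css_service():
    """Remove the local CSS configuration, then add it back (Windows only)."""
    output = run_localconfig("deconfig", "-force")
    if not step_succeeded("deconfig", output):
        raise RuntimeError("deconfig did not report success:\n" + output)
    output = run_localconfig("add")
    if not step_succeeded("add", output):
        raise RuntimeError("add did not report success:\n" + output)
```

    Stop the database and the ASM instance before running anything like this, just as in the manual procedure.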

 

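    Finally, start the ASM instance, start the physical standby, re-enable synchronization with the primary, and let the standby catch up.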

    Points worth remembering:

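    ① Do not change the computer name unless you really have to. Before you do, make absolutely sure the checklist is complete; do not, as in this case, leave out recreating the OracleCSService service;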

    ② Think twice before carrying out any operation on a production system;

    ③ While writing this short note, I found that Metalink already has detailed instructions for exactly this case: How to change the Hostname when Oracle 10G and ASM are used [ID 422729.1]
