11gR2修改主机名导致CRS-0184以及CRS-4000的错误

crs_stat_error

我们通常都说,在部署Oracle数据库服务器前,要规划好主机名、IP地址等基础信息,一经确定,尤其是部署好Oracle数据库之后,就不要轻易修改主机名或者是IP地址等。而且我们也都知道这个理儿,但可是,可但是,在实际工作当中,你难免不会遇到这种情况。

除了自己前不久曾遭遇到的一则,在Windows Server 2008 R2 X64上跑的一套10.2.0.5.0配置有ASM单实例物理备库因修改主机名的故障。而今又遇到一则类似的故障场景:

一 操作系统版本:CentOS release 5.6 (Final) X86_64平台

二 Oracle版本:Oracle 11gR2 11.2.0.1.0 64位配置ASM的单实例数据库

三 故障现象:修改主机名并重启主机后,导致数据库不可用,准确的说是grid软件的CRS进程都不能启动。如下:

1 检查服务状态:

[shell]
wxwl_iop-> crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

wxwl_iop-> [/shell]

2 发现ohasd守护进程都不存在:

[shell]
wxwl_iop-> ps -ef|grep has
root 4869 1 0 Nov14 ? 00:00:02 /bin/sh /etc/init.d/init.ohasd run
grid 27544 26907 0 09:58 pts/2 00:00:00 grep has
wxwl_iop->[/shell]

3 检查Oracle restart是否配置以及尝试手工启动Oracle Restart均报错:

[shell]
wxwl_iop-> crsctl config has
CRS-4014: Oracle High Availability Services autostart was not defined.
Failure at scls_scr_create with code 1
Internal Error Information:
Category: -1
Operation: has_ha_privs
Location: scrcreate5
Other: need ha priv
System Dependent Information: 0

CRS-4000: Command Config failed, or completed with errors.
wxwl_iop-> crsctl start has
Failure at scls_scr_create with code 1
Internal Error Information:
Category: -1
Operation: has_ha_privs
Location: scrcreate5
Other: need ha priv
System Dependent Information: 0

CRS-4000: Command Start failed, or completed with errors.
wxwl_iop-> [/shell]

4 在/etc/oracle/scls_scr路径下,看到只有之前的旧主机名下的Oracle Restart的配置信息,并没有关于新主机名的Oracle Restart的配置信息

[shell]
[root@wxwl_iop scls_scr]# ll
total 4
drwxr-xr-x 4 root oinstall 4096 Nov 7 16:42 localhost
[root@wxwl_iop scls_scr]# pwd
/etc/oracle/scls_scr
[/shell]

四 故障原因:人为修改了数据库服务器主机名,并重启主机,导致Oracle Restart启动报错,进而ASM实例,监听、磁盘组以及数据库实例均无法正常启动。

五 解决步骤:

1 root用户强制删除CRS相关的配置信息:

[shell]
[root@wxwl_iop scls_scr]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2013-11-15 10:00:51: Parsing the host name
2013-11-15 10:00:51: Checking for super user privileges
2013-11-15 10:00:51: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Usage: srvctl <command> <object> [<options>]
commands: enable|disable|start|stop|status|add|remove|modify|getenv|setenv|unsetenv|config
objects: database|service|asm|diskgroup|listener|home|ons|eons
For detailed help on each command and object and its options use:
srvctl <command> -h or
srvctl <command> <object> -h
PRKO-2012 : nodeapps object is not supported in Oracle Restart
ADVM/ACFS is not supported on centos-release-5-6.el5.centos.1

ACFS-9201: Not Supported
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Stop failed, or completed with errors.
You must kill crs processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
Use of uninitialized value in chdir at /usr/lib/perl5/5.8.8/File/Find.pm line 751.
Use of chdir(”) or chdir(undef) as chdir() is deprecated at /usr/lib/perl5/5.8.8/File/Find.pm line 751.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@wxwl_iop scls_scr]# [/shell]

2 root用户重新配置Oracle Restart:

[shell]
[root@wxwl_iop scls_scr]# /u01/app/11.2.0/grid/crs/install/roothas.pl
2013-11-15 10:02:36: Checking for super user privileges
2013-11-15 10:02:36: User has super user privileges
2013-11-15 10:02:36: Parsing the host name
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user ‘grid’, privgrp ‘oinstall’..
Operation successful.
CRS-4664: Node wxwl_iop successfully pinned.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
ADVM/ACFS is not supported on centos-release-5-6.el5.centos.1

wxwl_iop 2013/11/15 10:02:49 /u01/app/11.2.0/grid/cdata/wxwl_iop/backup_20131115_100249.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
[root@wxwl_iop scls_scr]#[/shell]

重新配置Oracle Restart之后,在/etc/oracle/scls_scr路径下,看到之前的旧主机名下的Oracle Restart的配置信息自动消除,而出现新的关于新主机名的Oracle Restart的配置信息。

[shell]
[root@wxwl_iop scls_scr]# ll
total 4
drwxr-xr-x 4 root oinstall 4096 Nov 15 10:02 wxwl_iop
[root@wxwl_iop scls_scr]#
[/shell]

且此时,grid用户可顺利通过执行 crs_stat -t查看服务状态:

[shell]
wxwl_iop-> crs_stat -t
Name Type Target State Host
————————————————————
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora….on.type OFFLINE OFFLINE
wxwl_iop-> [/shell]

3 通过srvctl命令手工添加Listener、ASM、oracle instance到Oracle Restart管理

grid用户添加Listener、ASM服务到Oracle Restart

[shell]
wxwl_iop-> srvctl add listener
wxwl_iop-> crs_stat -t
Name Type Target State Host
————————————————————
ora….ER.lsnr ora….er.type OFFLINE OFFLINE
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora….on.type OFFLINE OFFLINE
wxwl_iop-> srvctl add asm -l listener -d data,fra
wxwl_iop-> crs_stat -t
Name Type Target State Host
————————————————————
ora….ER.lsnr ora….er.type OFFLINE OFFLINE
ora.asm ora.asm.type OFFLINE OFFLINE
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora….on.type OFFLINE OFFLINE
wxwl_iop-> [/shell]

oracle用户添加数据库到Oracle Restart
grid 用户添加出错:

[shell]
wxwl_iop-> srvctl add database -d iopdb -o /u01/app/oracle/product/11.2.0/db_1
PRCD-1025 : Failed to create database iopdb
PRKH-1014 : Current user grid is not the same as oracle owner oracle of oracle home /u01/app/oracle/product/11.2.0/db_1.
wxwl_iop->
[/shell]

需要以oracle用户添加:

[shell]
wxwl_iop-> srvctl add database -d normaldb -o /u01/app/oracle/product/11.2.0/db_1 -n normaldb -a ‘data,fra’
wxwl_iop->
[/shell]

最后,对于配置了Oracle Restart环境下的数据库,分别以grid用户通过srvctl命令来启动listener、ASM实例,以oracle用户通过srvctl命令来启动数据库。

启示:对于已经配置好Oracle数据库的服务器,真的不要轻易修改主机名或者IP地址,尤其是RAC环境或者是11gR2开始的配置有Oracle Restart的standalone Server环境。当然,在安装和配置数据库之前,务必要做好充足的前期规划。当然,如果非要修改的话,也要配合DBA一起来重新配置。

附:关于文章提及为什么要在/etc/oracle/scls_scr路径下,查看是否存在关于主机名相关的Oracle Restart配置信息,将在下一篇中分享O(∩_∩)O~

Grid软件安装过程中orainstRoot.sh和root.sh脚本到底做了什么?

在Linux平台上,从Oracle 11gR2开始,不论是单实例的库,还是RAC库,如果需要用到ASM时,通常都需要单独安装Grid Infrastructure软件。而在Linux平台上以图形界面【OUI Oracle Universal Installer】安装Oracle 11gR2 Grid软件过程中,会提示以root用户执行orainstRoot.sh和root.sh脚本【如下图一:单实例安装,图二:RAC安装】。

Linux平台安装单实例时,grid软件需要执行的脚本

图一:Linux平台安装单实例时,grid软件需要执行的脚本

Linux平台安装RAC时,grid软件需要执行的脚本

图二:Linux平台安装RAC时,grid软件需要执行的脚本

那么这两个脚本分别做了什么,有什么作用呢?本文简单记录:

一 /u01/app/oraInventory/orainstRoot.sh脚本的作用

1 添加grid用户及oinstall组对Oracle Central Inventory 目录的读写权限;

Oracle Central Inventory 目录默认情况下,位于$ORACLE_BASE的上一级目录下,如下图三:

Oracle Central Inventory

图三:Oracle Central Inventory的路径

2 删除其它用户组对该目录的读、写、执行权限;

[shell]11gOCM-> pwd
/u01/app
11gOCM> ls
total 20
drwxrwxr-x. 3 grid oinstall 4096 Oct 28 10:38 11.2.0
drwxr-xr-x 2 grid oinstall 4096 Oct 30 10:20 acfs
drwxrwxr-x. 8 grid oinstall 4096 Oct 28 10:38 grid
drwxrwxr-x. 8 oracle oinstall 4096 Nov 5 10:57 oracle
drwxrwx—. 6 grid oinstall 4096 Oct 10 11:09 oraInventory
11gOCM->[/shell]

3 创建/etc/oraInst.loc文件。

11gOCM->
cat /etc/oraInst.loc
inventory_loc=/u01/app/oraInventory
inst_group=oinstall
11gOCM->

二 /u01/app/11.2.0/grid/root.sh脚本的作用

1 cp grid用户的$ORACLE_HOME/bin下的oraenv、dbhome、coraenv可执行程序到/usr/local/bin路径下;

11gOCM-> pwd
/u01/app/11.2.0/grid/bin
11gOCM-> ll oraenv dbhome coraenv 
-rwxr-xr-x. 1 grid oinstall 5778 Jan  1  2000 coraenv
-rwxr-xr-x. 1 grid oinstall 2415 Jan  1  2000 dbhome
-rwxr-xr-x. 1 grid oinstall 6183 Jan  1  2000 oraenv
11gOCM-> ll /usr/local/bin/
total 232
-rwxr-xr-x. 1 grid root   5778 Oct 10 10:54 coraenv
-rwxr-xr-x. 1 grid root   2415 Oct 10 10:54 dbhome
-rwxr-xr-x. 1 grid root   6183 Oct 10 10:54 oraenv
-rwxr-xr-x  1 root root 214001 Oct 10 16:39 rlwrap
11gOCM->

2 创建/etc/oratab文件;

 
11gOCM-> cat /etc/oratab 
#Backup file is  /u01/app/11.2.0/grid/srvm/admin/oratab.bak.11gocm line added by Agent
#

# This file is used by ORACLE utilities.  It is created by root.sh
# and updated by either Database Configuration Assistant while creating
# a database or ASM Configuration Assistant while creating ASM instance.

# A colon, ':', is used as the field terminator.  A new line terminates
# the entry.  Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively.  The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
+ASM:/u01/app/11.2.0/grid:N
orcl:/u01/app/oracle/product/11.2.0/db_1:N              # line added by Agent
11gOCM->

3 创建grid用户的OCR keys文件;

4 启动ohasd守护进程,并且将ohasd守护进程的启动文件写入到/etc/inittab文件中,以便于ohasd守护进程随着操作系统的启动而启动。如果是Oracle Enterprise Linux 6及以上版本的操作系统的话,则将该配置信息写入到/etc/init/oracle-ohasd.conf的独立配置文件中。

11gOCM-> cat /etc/init/oracle-ohasd.conf 
# Copyright (c) 2001, 2011, Oracle and/or its affiliates. All rights reserved. 
#
# Oracle OHASD startup

start on runlevel [35]
stop  on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1

这是Oracle Enterprise Linux 6及以上版本的操作系统上的变化,在以前版本的操作系统下,由/etc/inittab文件来控制的条目均转移到/etc/init下单个文件来控制。

Oracle Grid Infrastructure包含哪些组件及其新特性?

Oracle 从11gR2开始,推出Grid Infrastructure[我个人习惯称之为Grid软件],那么Grid Infrastructure究竟有哪些优势,又包含哪些组件呢

首先,Grid Infrastructure包含下述组件

ASM:Automatic Storage Management;

ACFS:ASM Cluster File System;

ACFS snapshot;

Oracle Clusterware;

Oracle Restart。

其次,Grid Infrastructure 又有什么新特性呢?

探其优势,简单总结,从其包含的组件可知,如果在RAC环境下,无需单独安装clusterware软件,只需安装grid即可。如果在单实例环境下,数据库的存储如果想用ASM的话,那么就必须得先安装grid软件,并创建ASM磁盘组之后,方可安装oracle软件,在创建数据库的时候,存储才可以选择放在ASM磁盘组上。

另:对于非RAC架构下的Standalone 数据库,其Oracle Restart可以接管包括数据库服务、数据库实例、ASM实例、ASM磁盘组、数据库监听等组件的故障处理。举例来说,假设数据库在运行过程中,如果监听进程意外终止、或者数据库意外crash掉,那么Oracle Restart将接管并重启这些服务。

最后,11gR2的ASM同之前的ASM相比,又有什么增强呢?

如下图:

grid_asm

从上,看到在11gR2之前[包含11gR1及10g],ASM位于操作系统层面之上,ASM为数据库访问存储设备提供驱动,然后为上层的数据库提供存储,最后数据库供最上层的应用系统访问。

而从11gR2开始,看到ASM依然位于OS层面之上,和ASM同层的多出来个ADVM[ASM Dynamic Volume Manager]。往上,位于同层的除了database之外,还有ACFS[ASM Cluster File System]和ASM为第三方文件系统提供支持。

补充,简单描述ASM disk,ASM diskgroup,ADVM,ACFS的关系?

一块儿物理硬盘或物理硬盘的一个分区或存储设备上的LUN或一个LVM均可以被创建成一块ASM disk,一块或多块儿ASM磁盘在逻辑上可以被用来创建成一个ASM磁盘组。

有了ASM diskgroup之后,可以通过图形界面工具[ASMCA、OEM]或者命令行工具[ASMCMD、SQL*PLUS]在磁盘组上创建出一个或多个ASM Dynamic Volume 。ADVM上的卷设备文件对于操作系统而言被作为块设备来识别,其位于/dev/asm/<volume name>-nnn。

使用mkfs命令就可以在ADVM上创建ACFS类型的文件系统了[命令如:mkfs -t acfs -n advm-volume-name /dev/asm/<volume name>-nnn],最后类似于在操作系统上挂载其它设备一样,可以将/dev/asm/<volume name>-nnn直接挂在到操作系统上使用了。

索引扫描路径之3:Index Skip Scan

Index_Skip_Scan

在前面的两篇中,分别描述了Index Unique ScanIndex Range Scan,今天来描述Index Skip Scan。

 所谓的Index Skip Scan就是指,查询的WHERE条件中“SKIP”【跳过,没带上】了复合索引的前导列,只是指定了复合索引的其它列,这样CBO选择Index Skip Scan的路径来访问数据的方式。

如果复合索引的前导列唯一值越少(选择性低),非前导列索引唯一值越多(选择性高),则Index Skip Scan的性能越优。

事实上,在处理的过程中,Oracle将该复合索引上分为若干个logical subindexes(逻辑子索引),逻辑子索引的具体个数取决于前导列的唯一值值个数,即前导列有几个重复值就分为几个逻辑子索引。

Index Skip Scan适用的场景:

1 Index一定是一个复合索引(索引类型可以是UNIQUE的唯一索引,也可以是NONUIQUE的非唯一索引);
2 且复合索引的前导列重复值很多(即唯一值少),非前导列的重复值很少(唯一值多);
3 WHERE限制条件中,并没有带上该符合索引的前导列,仅仅是使用非前导列来过滤数据;

测试与验证:

1 创建测试表:

SQL> conn hr/hr
Connected.
SQL> create table skip_t as select object_id,object_name,object_type from all_objects;

Table created.

SQL> select count(*) from skip_t;

  COUNT(*)
----------
     55683

SQL> select count(distinct object_type) from skip_t;

COUNT(DISTINCTOBJECT_TYPE)
--------------------------
                        25

SQL> select count(distinct object_id) from skip_t;

COUNT(DISTINCTOBJECT_ID)
------------------------
                   55683

SQL> select count(distinct object_name) from skip_t;

COUNT(DISTINCTOBJECT_NAME)
--------------------------
                     30902

SQL>

从上,看到skip_t表数据情况:有55683条记录,且OBJECT_ID字段均唯一,无重复值。OBJECT_TYPE唯一值只有25个。
2 接下来,在OBJECT_TYPE和OBJECT_ID字段上创建复合索引,并收集统计信息:

SQL> create index idx_skip_t on skip_t(object_type,object_id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'skip_t',cascade=>true);

PL/SQL procedure successfully completed.

SQL>

3 “Skip”掉复合索引前导列的情况,即让CBO按照预期的选择Index Skip Scan的路径来访问数据:

SQL> set autotrace trace
SQL> select object_type,object_id from skip_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2119292092

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |     1 |    15 |    26   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IDX_SKIP_T |     1 |    15 |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"=30000)
       filter("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        611  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select object_type,object_name from skip_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1237951313

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |    40 |    27   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| SKIP_T     |     1 |    40 |    27   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IDX_SKIP_T |     1 |       |    26   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=30000)
       filter("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        640  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

4 使用索引前导列的情况:

SQL> select object_type,object_id from skip_t where object_type='TABLE';

136 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3840590361

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |  2227 | 33405 |     9   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_SKIP_T |  2227 | 33405 |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_TYPE"='TABLE')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         12  consistent gets
          0  physical reads
          0  redo size
       3592  bytes sent via SQL*Net to client
        623  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        136  rows processed

SQL> select object_type,object_name from skip_t where object_type='TABLE';

136 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1483414757

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  2227 | 80172 |    43   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| SKIP_T     |  2227 | 80172 |    43   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_SKIP_T |  2227 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_TYPE"='TABLE')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         40  consistent gets
          0  physical reads
          0  redo size
       6671  bytes sent via SQL*Net to client
        623  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        136  rows processed

SQL>

遭遇ORA-17410及ORA-03137:TTC协议内部错误:[12333]

早上,项目组同事发过来一则应用系统连接数据库错误的请求技术支持,具体信息可见下述报错截图:

ora-17410

ora-17410

这则错误信息对于我还真是大姑娘上轿–头一回遇到。下面简单记录下处理错误的思路和方法:

环境说明:一套Linux X86_64位环境的Oracle 11gR2【11.2.0.1.0】的单实例数据库。应用系统通过调用Tomcat的JDBC驱动来访问数据,如下:

OS:

[oracle@oracle11g ~]$ uname -rm
2.6.18-194.el5 x86_64
[oracle@oracle11g ~]$

数据库:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL>

同事说,上周五的时候,通过登录应用系统来访问数据库还一切正常,今天早上登录系统就报上述错误,更加觉得蹊跷。

1 首先,让同事登录应用系统,重现错误。然后查看alert日志文件,看到下述错误信息的线索:

Mon Sep 16 09:43:24 2013
Errors in file /u01/app/oracle/diag/rdbms/glndb/GLNDB/trace/GLNDB_ora_11692.trc  (incident=78959):
ORA-03137: TTC 协议内部错误: [12333] [6] [88] [77] [] [] [] []
Incident details in: /u01/app/oracle/diag/rdbms/glndb/GLNDB/incident/incdir_78959/GLNDB_ora_11692_i78959.trc
Mon Sep 16 09:43:27 2013
Trace dumping is performing id=[cdmp_20130916094327]
Mon Sep 16 09:43:28 2013
Sweep [inc][78959]: completed
Sweep [inc2][78959]: completed

2 根据alert日志提示,继续查看/u01/app/oracle/diag/rdbms/glndb/GLNDB/trace/GLNDB_ora_11692.trc文件,看到:

*** 2013-09-16 09:43:24.912
*** SESSION ID:(410.54431) 2013-09-16 09:43:24.912
*** CLIENT ID:() 2013-09-16 09:43:24.912
*** SERVICE NAME:(SYS$USERS) 2013-09-16 09:43:24.912
*** MODULE NAME:(JDBC Thin Client) 2013-09-16 09:43:24.912
*** ACTION NAME:() 2013-09-16 09:43:24.912

--- PROTOCOL VIOLATION DETECTED ---
...
...
...
  NamespaceDump:  
            Child Cursor:  Heap0=0x9fae5148 Heap6=0xb8bc6b48 Heap0 
Load Time=09-16-2013 09:43:24 Heap6 Load Time=09-16-2013 09:43:24 
  NamespaceDump:  
    Parent Cursor:  sql_id=4x3cjtfs37n4v parent=0xb21328e8 maxchild=5 plk=y ppn=n 
      Current Cursor Sharing Diagnostics Nodes:  
        Child Node: 3  ID=40 reason=Bind mismatch(33) size=2x4 
          init ranges in first pass: 1 
          selectivity: 0 
        Child Node: 2  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 3055622408 
          temp handle: 2705674440 
          schema: 161 
          synonym object number: 0 
        Child Node: 1  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 1 
      Aged Out Cursor Sharing Diagnostic Nodes:  
        Child Node: 3  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 2705674440 
          temp handle: 3055622408 
          schema: 160 
          synonym object number: 0 
        Child Node: 2  ID=40 reason=Bind mismatch(25) size=0x0 
          extended cursor sharing:   
        Child Node: 1  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 1 
        Child Node: 0  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 3055622408 
          temp handle: 2705674440 
          schema: 161 
          synonym object number: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 3  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 3055622408 
          temp handle: 2705674440 
          schema: 161 
          synonym object number: 0 
        Child Node: 2  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 2 
        Child Node: 1  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 2705674440 
          temp handle: 3055622408 
          schema: 160 
          synonym object number: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 1  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 2705674440 
          temp handle: 3055622408 
          schema: 160 
          synonym object number: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0    kkscs=0xb2132df0 nxt=0xb21331f8 flg=11 cld=0 
hd=0xa1a00330 par=0xb21328e8
   Mutex 0xb2132df0(0, 0) idn 3000000000
   ct=19 hsh=0 unp=(nil) unn=0 hvl=b2133798 nhv=0 ses=(nil)
   hep=0xb2132e80 flg=80 ld=0 ob=(nil) ptr=(nil) fex=(nil)
   kkscs=0xb21331f8 nxt=0xb9a94588 flg=1a cld=1 hd=0xaf246848 par=0xb21328e8
   Mutex 0xb21331f8(0, 0) idn 0
   ct=12 hsh=0 unp=(nil) unn=0 hvl=b21336e0 nhv=0 ses=(nil)
   hep=0xb2133288 flg=80 ld=1 ob=0xb2c09860 ptr=0xab22a250 fex=0xab2295f0
   kkscs=0xb9a94588 nxt=0xb9a94930 flg=18 cld=2 hd=0xaf3c53e8 par=0xb21328e8
   Mutex 0xb9a94588(0, 0) idn 0
   ct=13 hsh=0 unp=(nil) unn=0 hvl=b21336a0 nhv=0 ses=(nil)
   hep=0xb9a94618 flg=80 ld=1 ob=0xb97655a0 ptr=0xa4e14f58 fex=0xa4e142f8
   kkscs=0xb9a94930 nxt=0xa840b1b0 flg=14 cld=3 hd=0xa1515da8 par=0xb21328e8
   Mutex 0xb9a94930(0, 0) idn 0
   ct=14 hsh=0 unp=(nil) unn=0 hvl=b2133638 nhv=1 ses=0xc0467f40
   hsv[0]=0
   hep=0xb9a949c0 flg=80 ld=1 ob=0xa81e1108 ptr=0xb16c2ea0 fex=0xb16c2240
   kkscs=0xa840b1b0 nxt=(nil) flg=18 cld=4 hd=0xaf3038c8 par=0xb21328e8
   Mutex 0xa840b1b0(0, 0) idn 0
   ct=19 hsh=0 unp=(nil) unn=0 hvl=b2133660 nhv=0 ses=(nil)
   hep=0xa840b240 flg=80 ld=1 ob=0x9fae5060 ptr=0xb8bc6b48 fex=0xb8bc5ee8
cursor instantiation=0x2ac85fded0d8 used=1379295804 exec_id=16777397 exec=1
 child#4(0xaf3038c8) pcs=0xa840b1b0
  clk=0xb6c31f68 ci=0x9fae5148 pn=0xaf984ba0 ctx=0xb8bc6b48
 kgsccflg=0 llk[0x2ac85fded0e0,0x2ac85fded0e0] idx=0
 xscflg=c0110676 fl2=5d000008 fl3=42222008 fl4=180
 sharing failure(s)=62000
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=01 mxl=32(24) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000010 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=2ac85fdecae0  bln=32  avl=00  flg=05
 Frames pfr 0x2ac85fdecea8 siz=27104 efr 0x2ac85fdecd98 siz=27088
 kxscphp=0x2ac85fdc03d8 siz=984 inu=456 nps=408
 kxscbhp=0x2ac85fdc01f8 siz=984 inu=176 nps=56
 kxscwhp=0x2ac85fdc0798 siz=16352 inu=15344 nps=5160
Starting SQL statement dump
SQL Information
user_id=161 user_name=WX_TMS_TEST module=JDBC Thin Client action=
sql_id=4x3cjtfs37n4v plan_hash_value=-785040234 problem_type=4
----- Current SQL Statement for this session (sql_id=4x3cjtfs37n4v) -----
select distinct viewbutton5_.SYS_VIEW_BUTTON_ID as SYS1_155_, 
viewbutton5_.RECORD_VERSION as RECORD2_155_, viewbutton5_.BTN_NAME as BTN3_155_, viewbutton5_.BTN_TITLE_KEY as BTN4_155_, viewbutton5_.BTN_MSG_KEY as BTN5_155_,
viewbutton5_.BUTTON_URL as BUTTON6_155_, viewbutton5_.SYS_MENU_ITEM_ID as SYS7_155_ 
from SYS_ROLE_MENU_BUTTON rolemenubu0_ left outer join SYS_ROLE_MENU_ITEM rolemenuit1_ 
on rolemenubu0_.SYS_ROLE_MENU_ITEM_ID=rolemenuit1_.SYS_ROLE_MENU_ITEM_ID left outer 
join SYS_ROLE role2_ on rolemenuit1_.SYS_ROLE_ID=role2_.SYS_ROLE_ID left outer join 
SYS_ROLE_USER roleusers3_ on role2_.SYS_ROLE_ID=roleusers3_.SYS_ROLE_ID left 
outer join SYS_USER user4_ on roleusers3_.USER_ID=user4_.USER_ID inner join 
SYS_VIEW_BUTTON viewbutton5_ on 
rolemenubu0_.SYS_VIEW_BUTTON_ID=viewbutton5_.SYS_VIEW_BUTTON_ID where user4_.USER_CODE=:1
sql_text_length=837
sql=select distinct viewbutton5_.SYS_VIEW_BUTTON_ID as SYS1_155_, viewbutton5_.RECORD_VERSION as RECORD2_155_, viewbutton5_.BTN_NAME as BTN3_155_, viewbutton5_.BTN_TITLE_KEY as BTN4_155_, viewbutton5_.BTN_MSG_KEY as BTN5_155_, viewbutton5_.BUTTON_URL as BUTTON
sql=6_155_, viewbutton5_.SYS_MENU_ITEM_ID as SYS7_155_ from SYS_ROLE_MENU_BUTTON rolemenubu0_ 
left outer join SYS_ROLE_MENU_ITEM rolemenuit1_ on rolemenubu0_.SYS_ROLE_MENU_ITEM_ID=rolemenuit1_.SYS_ROLE_MENU_ITEM_ID left outer 
join SYS_ROLE role2_ on rolemenuit
sql=1_.SYS_ROLE_ID=role2_.SYS_ROLE_ID left outer join SYS_ROLE_USER roleusers3_ on role2_.SYS_ROLE_ID=roleusers3_.SYS_ROLE_ID left outer join SYS_USER user4_ on roleusers3_.USER_ID=user4_.USER_ID inner join SYS_VIEW_BUTTON viewbutton5_ on 
rolemenubu0_.SYS_VIEW
sql=_BUTTON_ID=viewbutton5_.SYS_VIEW_BUTTON_ID where user4_.USER_CODE=:1
...
...

3 看到这里,没有太好的解决思路。同事给出提示信息说,同一套库,另外一套应用环境下,通过WebLogic来访问数据库时,则不会报错。难道是因为JDBC的不同驱动导致的?寻求MetaLink,Troubleshooting ORA-3137 [12333] Errors Encountered When Using Oracle JDBC Driver (Doc ID 1361107.1)

通过Metalink上的提示,得到造成该错误的可能原因:

① 部分版本的JDBC驱动会引起该错误:

Bug 9445675  NO MORE DATA TO READ FROM SOCKET WHEN USING END-TO-END METRICS

This bug does affect the JDBC driver.  This bug may be the cause when all of the following conditions are met:

  1. You are using the 10.1.x.x or the 11.2.0.1 JDBC driver; the bug does not affect 10.2.x.x, or 11.1.x.x versions of the driver, nor versions 11.2.0.2 or above
  2. You are using end-to-end metrics in your Java code
  3. The server side ORA-3137 [12333] error is accompanied by the client side Java exception “No more data to read from socket”

This bug is fixed in the 11.2.0.2 version of the JDBC driver and above.  It is discussed in the following notes:

Note 9445675.8 Bug 9445675 – “No more data” / ORA-3137 using end to end metrics with JDBC Thin

Note 1081275.1 “java.sql.SQLRecoverableException: No more data to read from socket” is Thrown When End-to-end Metrics is Used

② 数据库自身的BUG也会导致该错误:

Unpublished Bug:9703463 – ORA-3137 [12333] or ORA-600 [kpobav-1] When Using Bind Peeking

This bug affects versions 11.1.0.6, 11.1.0.7, and 11.2.0.1 of the RDBMS.  It is fixed in version 11.2.0.2 of the database.

It can also occur intermittently; similarly to unpublished Bug:8625762, this is a bind peeking bug.

==================================================

Unpublished Bug 9373370  DATA BASE RETURNS WRONG CURSORID WHEN THERE IS AN ORA-01013

This bug affects the 10.2, 11.1, 11.2.0.1, and 11.2.0.2 databases.  It is discussed in the following note:

Note 9373370.8 Bug 9373370 – The wrong cursor may be executed by JDBC thin following a query timeout / ORA-3137 [12333]

While the bug primarily manifests in ORA-1006 or ORA-1008 errors, the problem may also result in ORA-600 [12333] or ORA-3137 [12333] errors appearing on the server side.

判断本案例的错误应该是 Bug:9703463导致的,从trace文件中看到该SQL语句的确用到了绑定变量,且数据库版本是11.2.0.1.0。

解决思路:

①  可以通过更换不同版本的JDBC驱动来避免该错误,也说明为什么同事在另外一套环境下,通过Weblogic的JDBC来访问数据库时,则不会遇到该错误;

② 给数据库打patch,初步认为可以通过打Patch:9703463 来解决;

③ 通过修改数据库参数来规避该错误:

修改之后,同事用之前的Tomcat那个版本的驱动来重新访问数据库时,则不再报错;

④ 直接升级数据库版本至11.2.0.3.0,通过导出导入的方式将刚项目组下的schema数据复制一份到一套11.2.0.3.0的库上,重新使用Tomcat那个版本的驱动来重新访问数据库时,亦不再报错。

索引扫描路径之2:Index Range Scan

index_range_scan

在上一篇文章,索引扫描路径之1:Index Unique Scan 中,详细描述了Oracle的CBO优化器会在何种情形下选择通过Index Unique Scan的方式来访问数据。

本篇描述CBO优化器会在什么场景下选择Index Range Scan的路径来访问数据?

Index Range Scan通常见于通过索引去查找具有高选择性的数据。默认情况下,Oracle是按照被索引字段升序排序来存放索引记录的,如果被索引字段有重复值,则按照相应的ROWID做升序排序来存放。

较为常见的Index Range ScanWHERE条件如下:

Col1=:b1

Col1>:b1

Col1>=:b1

Col1<:b1

Col1<=:b1

或者是通过AND连接的满足上述可构成前导列的条件。再或者WHERE条件中有使用BETWEEN…AND条件。

同样,给出验证与说明:

系统版本、数据库版本和平台,以及测试表同与之前的环境。即选择Linux X86_64平台上的一套11.2.0.1.0库,使用HR这个schema下的departments表。

1 唯一索引,CBO选择Index Range Scan的场景

SQL> set autotrace trace
SQL> select * from departments where department_id>260;

Execution Plan
----------------------------------------------------------
Plan hash value: 3346631158

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    20 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     1 |    20 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_ID_PK  |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID">260)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        743  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

从上,即使是从唯一索引中获取一条数据[满足department_id>260条件的记录,确实只有1条],优化器也选择了走INDEX RANGE SCAN,因为上述SQL并不满足Index Unique Scan的条件。恰恰满足了INDEX RANGE SCAN 的条件。

2 接下来,一个典型的使用INDEX RANGE SCAN 的场景:

SQL> select * from departments where location_id=1700;

21 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    21 |   420 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    21 |   420 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    21 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID"=1700)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          1  physical reads
          0  redo size
       1468  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         21  rows processed

SQL>

或者:

SQL> select * from departments where location_id<=1700;

23 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    24 |   480 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    24 |   480 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    24 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("LOCATION_ID"<=1700)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1502  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         23  rows processed

SQL>

再或者:

SQL> select * from departments where location_id>=1700;

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    25 |   500 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    25 |   500 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    25 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID">=1700)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1561  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         25  rows processed

SQL>

可以看到,不管是location_id>=1700还是location_id=1700或者是location_id<=1700条件,CBO都选择了location_id字段上的索引DEPT_LOCATION_IX通过INDEX RANGE SCAN 的方式去获取数据。

3 接下来,BETWEEN…AND的场景:

SQL> select department_id,location_id,rowid from departments where location_id between 1500 and 2600;

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    26 |   494 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    26 |   494 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    26 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

     2 - access("LOCATION_ID">=1500 AND "LOCATION_ID"<=2600)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1556  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         25  rows processed

SQL>

那么,如果是通过department_id字段来做BETWEEN…AND条件测试呢?即测试在一个唯一性索引上使用BETWEEN…AND条件的情形。

SQL> select department_id,location_id,rowid from departments where department_id between 120 and 130;

Execution Plan
----------------------------------------------------------
Plan hash value: 3346631158

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     3 |    57 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     3 |    57 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_ID_PK  |     3 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID">=120 AND "DEPARTMENT_ID"<=130)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        748  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

SQL>

测试结果发现,即使是唯一索引,CBO依然会选择INDEX RANGE SCAN。
反之,如果条件换做是department_id between 120 and 120的话,很明显,聪明的CBO这次肯定会选择Index Unique Scan。

SQL> select department_id,location_id,rowid from departments where department_id between 120 and 120;

Execution Plan
----------------------------------------------------------
Plan hash value: 4024094692

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    19 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     1 |    19 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | DEPT_ID_PK  |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID"=120)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        673  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

4 最后,看一个关于INDEX RANGE SCAN DESCENDING的场景:

CBO选择INDEX RANGE SCAN DESCENDING的方式来获取数据,适用于 ORDER BY column_name DESC的场景。在默认情况下,索引是按照索引关键字升序来存放数据的。

按照常理来讲,如果有ORDER BY column_name DESC的条件,那么

① 需要先从索引中读取数据;

② 再按照条件中column_name字段做降序排序。

而选择INDEX RANGE SCAN DESCENDING,则是直接在索引上按照索引关键字降序查找数据,这样正是为了避免先按照索引来查找数据,然后再做一次降序排序的操作。

SQL> set autotrace on
SQL> select department_id,location_id,rowid from departments where location_id between 1500 and 2600 order by location_id desc;

DEPARTMENT_ID LOCATION_ID ROWID
------------- ----------- ------------------
           80        2500 AAAz+PABRAAAACbAAH
           40        2400 AAAz+PABRAAAACbAAD
           20        1800 AAAz+PABRAAAACbAAB
          270        1700 AAAz+PABRAAAACbAAa
          260        1700 AAAz+PABRAAAACbAAZ
          250        1700 AAAz+PABRAAAACbAAY
          240        1700 AAAz+PABRAAAACbAAX
          230        1700 AAAz+PABRAAAACbAAW
          220        1700 AAAz+PABRAAAACbAAV
          210        1700 AAAz+PABRAAAACbAAU
          200        1700 AAAz+PABRAAAACbAAT

DEPARTMENT_ID LOCATION_ID ROWID
------------- ----------- ------------------
          190        1700 AAAz+PABRAAAACbAAS
          180        1700 AAAz+PABRAAAACbAAR
          170        1700 AAAz+PABRAAAACbAAQ
          160        1700 AAAz+PABRAAAACbAAP
          150        1700 AAAz+PABRAAAACbAAO
          140        1700 AAAz+PABRAAAACbAAN
          130        1700 AAAz+PABRAAAACbAAM
          120        1700 AAAz+PABRAAAACbAAL
          110        1700 AAAz+PABRAAAACbAAK
          100        1700 AAAz+PABRAAAACbAAJ
           90        1700 AAAz+PABRAAAACbAAI

DEPARTMENT_ID LOCATION_ID ROWID
------------- ----------- ------------------
           30        1700 AAAz+PABRAAAACbAAC
           10        1700 AAAz+PABRAAAACbAAA
           50        1500 AAAz+PABRAAAACbAAE

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2689299823

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    26 |   494 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | DEPARTMENTS      |    26 |   494 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN DESCENDING| DEPT_LOCATION_IX |    26 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID">=1500 AND "LOCATION_ID"<=2600)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1575  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         25  rows processed

SQL>

从上可以看到,对于LOCATION_ID=1700的记录,则正是按照ROWID做的降序排序操作。

索引扫描路径之1:Index Unique Scan

index_unique_scan

近期在读一本名为《Expert Indexing in Oracle Database 11g》的书,个人觉得写的很不错,收获颇多。在这里就把一些关于CBO 通过INDEX 扫描数据的相关知识稍作整理,算是读书笔记。

其中,在读到第7章,Tuning Index Usage时,提到常见的通过索引访问数据的方式有下述几种:

Index Unique Scan;

Index Range Scan;

Index Skip Scan;

Index Full Scan;

Index Fast Full Scan;

        这里,首先说Index Unique Scan。

CBO【Cost Based Optimizer】通常在什么场景下会选择通过Index Unique Scan的方式来访问数据呢?

1 必须是通过唯一索引【UNIQUE】来访问数据;

2 通过唯一索引来访问数据时,每次返回的记录数必须是1条;

3 WHERE从句中必须要用等值【=】条件来过滤数据;

        我个人目前的理解是,必须同时满足上述3个条件,CBO才会选择通过Index Unique Scan的路径来访问数据。

下面给出验证:

1 操作系统版本和数据库版本:
操作系统:

[oracle@oracle11g ~]$ uname -rm
2.6.18-194.el5 x86_64
[oracle@oracle11g ~]$

数据库:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL>

2 测试表、表结构、表数据如下:

SQL> conn hr/hr
Connected.
SQL> desc departments;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DEPARTMENT_ID                             NOT NULL NUMBER(4)
 DEPARTMENT_NAME                           NOT NULL VARCHAR2(30)
 MANAGER_ID                                         NUMBER(6)
 LOCATION_ID                                        NUMBER(4)

SQL> select * from departments;

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
           10 Administration                        200        1700
           20 Marketing                             201        1800
           30 Purchasing                            114        1700
           40 Human Resources                       203        2400
           50 Shipping                              121        1500
           60 IT                                    103        1400
           70 Public Relations                      204        2700
           80 Sales                                 145        2500
           90 Executive                             100        1700
          100 Finance                               108        1700
          110 Accounting                            205        1700

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
          120 Treasury                                         1700
          130 Corporate Tax                                    1700
          140 Control And Credit                               1700
          150 Shareholder Services                             1700
          160 Benefits                                         1700
          170 Manufacturing                                    1700
          180 Construction                                     1700
          190 Contracting                                      1700
          200 Operations                                       1700
          210 IT Support                                       1700
          220 NOC                                              1700

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
          230 IT Helpdesk                                      1700
          240 Government Sales                                 1700
          250 Retail Sales                                     1700
          260 Recruiting                                       1700
          270 Payroll                                          1700

27 rows selected.

SQL>

3 该表上的索引信息如下:

SQL> select index_name,index_type,table_name,uniqueness from user_indexes where table_name='DEPARTMENTS';

INDEX_NAME                     INDEX_TYPE                  TABLE_NAME                     UNIQUENES
------------------------------ --------------------------- ------------------------------ ---------
DEPT_ID_PK                     NORMAL                      DEPARTMENTS                    UNIQUE
DEPT_LOCATION_IX               NORMAL                      DEPARTMENTS                    NONUNIQUE

SQL> select * from user_ind_columns where table_name='DEPARTMENTS';

INDEX_NAME                     TABLE_NAME                     COLUMN_NAME     COLUMN_POSITION COLUMN_LENGTH CHAR_LENGTH DESC
------------------------------ ------------------------------ --------------- --------------- ------------- ----------- ----
DEPT_ID_PK                     DEPARTMENTS                    DEPARTMENT_ID                 1            22           0 ASC
DEPT_LOCATION_IX               DEPARTMENTS                    LOCATION_ID                   1            22           0 ASC

SQL>

从上看到departments表有2个索引,其中在DEPARTMENT_ID字段上的索引DEPT_ID_PK 为主键索引,是个UNIQUE类型的索引。而LOCATION_ID 字段上的DEPT_LOCATION_IX则为非唯一【NONUNIQUE】索引。

4 通过该表上的主键字段department_id来定位数据:

SQL> set autotrace trace
SQL> select department_id from departments where department_id=270;

Execution Plan
----------------------------------------------------------
Plan hash value: 1856623209

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     4 |     0   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| DEPT_ID_PK |     1 |     4 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPARTMENT_ID"=270)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        532  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

从上述执行计划里看到,CBO选择了Index Unique Scan,因为department_id字段上有一个DEPT_ID_PK 的UNIQUE类型的索引,且上述SQL满足Index Unique Scan的条件。

而如果WHERE条件换成department_id>=270时,则CBO选择了Index Range Scan的访问路径,因为该WHERE条件是>=而非等值匹配。CBO认为返回的记录数可能不止1条,即使实际情况下满足条件的记录数确实是1条,所以选择了Index Range Scan。

SQL> select department_id from departments where department_id>=270;

Execution Plan
----------------------------------------------------------
Plan hash value: 4271874676

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |     1 |     4 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| DEPT_ID_PK |     1 |     4 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPARTMENT_ID">=270)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        532  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

5 通过该表上的location_id字段来定位数据的场景:

SQL> select location_id from departments where location_id=2700;

Execution Plan
----------------------------------------------------------
Plan hash value: 2790119751

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |     1 |     3 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| DEPT_LOCATION_IX |     1 |     3 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("LOCATION_ID"=2700)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        529  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select location_id from departments where location_id=1700;

21 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2790119751

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |    21 |    63 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| DEPT_LOCATION_IX |    21 |    63 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("LOCATION_ID"=1700)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        901  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         21  rows processed

SQL>

从上述执行计划看到,通过LOCATION_ID 字段上的DEPT_LOCATION_IX来定位数据时,不管是通过location_id=2700还是location_id=1400或者location_id=1700,结果都是走INDEX RANGE SCAN。因为DEPT_LOCATION_IX不是唯一索引,即使location_id=2700或者location_id=1400条件能确保返回的记录数只有1条,CBO也不会选择Index Unique Scan。

6  通过唯一索引、非唯一索引来定位数据的场景:

非唯一索引的情况:

SQL> create table all_t as select object_id,object_name from all_objects;

Table created.

SQL> desc all_t;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OBJECT_ID                                 NOT NULL NUMBER
 OBJECT_NAME                               NOT NULL VARCHAR2(30)

SQL> select table_name,index_name,index_type,uniqueness from user_indexes where table_name='ALL_T';

no rows selected

SQL> create index idx_all_t on all_t(object_id);

Index created.

SQL> select table_name,index_name,index_type,uniqueness from user_indexes where table_name='ALL_T';

TABLE_NAME                     INDEX_NAME                     INDEX_TYPE                  UNIQUENES
------------------------------ ------------------------------ --------------------------- ---------
ALL_T                          IDX_ALL_T                      NORMAL                      NONUNIQUE

SQL> exec dbms_stats.gather_table_stats(user,'all_t',cascade=>true);

PL/SQL procedure successfully completed.

SQL> set autotrace trace
SQL> select * from all_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2459308467

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |    31 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALL_T     |     1 |    31 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_ALL_T |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        627  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

唯一索引的情况:

SQL> drop index idx_all_t ;

Index dropped.

SQL> create unique index idx_all_t on all_t(object_id);

Index created.

SQL> select table_name,index_name,index_type,uniqueness from user_indexes where table_name='ALL_T';

TABLE_NAME                     INDEX_NAME                     INDEX_TYPE                  UNIQUENES
------------------------------ ------------------------------ --------------------------- ---------
ALL_T                          IDX_ALL_T                      NORMAL                      UNIQUE

SQL> set autotrace trace
SQL> exec dbms_stats.gather_table_stats(user,'all_t',cascade=>true);

PL/SQL procedure successfully completed.

SQL> select * from all_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3932529197

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |    31 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALL_T     |     1 |    31 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | IDX_ALL_T |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        491  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

从上可以看到,虽然测试表all_t上的字段object_id字段虽然的确可以确保唯一,但是在第一种情况下,索引IDX_ALL_T为非唯一索引的时候,CBO毅然选择了INDEX RANGE SCAN,而第二种情况下,却是Index Unique Scan。

也就是说,即使可以确保表中某个字段是唯一的,如果该字段上的索引不是唯一的,那么CBO选择的将是通过INDEX RANGE SCAN的路径来访问数据。想要表达的是,真实场景下这种情况的确会存在,因为有些系统是通过应用程序来保证表中的字段唯一,而并没有在表上对应的字段上通过创建唯一约束或者是唯一索引来保证数据的唯一性。那么,这样的话,将会导致CBO在选择执行计划的情况下,有可能会出现偏差。所以,对于这种情况下,即使应用程序可以保证数据的唯一性,最好还是在表上创建一个唯一索引比较稳妥。

最后,留一个小疑问给所有的看到这篇文章的朋友,对于第6种情况,是不是第二种场景下的Index Unique Scan的效率要高于第一种场景下的INDEX RANGE SCAN?因为,从执行计划来看的话,一致性读的次数分别为3和4。

详细记录一则RAC数据库恢复到异机单实例的案例

背景描述:来源于一套客户的生产系统的恢复案例,Linux平台的10.2.0.5.0的双节点RAC数据库,由于项目组的应用程序BUG导致,在7月30号下午4点30分左右执行了误操作,将生产系统中一张重要表的所有记录全部update一遍。

处理流程,接到电话之后,让项目组同事紧急使用闪回表尝试找回数据,由于发现错误时已是7月30号晚上,再加上undo_retention设置的偏低,导致报错ORA-01555的错误,未果。不得已让项目组同事参照以前的一则使用LogMiner来恢复数据,依然不幸,找不回数据。

不得已考虑使用之前的备份+归档来恢复到异机单实例环境,执行不完全恢复尝试恢复数据,本文用以记录和还原当时的详细步骤。

1 找到在误操作执行之前的最近一次的有效备份,以及从备份以来到误操作之前的所有归档日志。因为库小,该RAC数据库目前配置在每天晚上23:00开始执行全备份,且包括归档日志的备份。于是,找到29号晚上的全备的备份集。

发现,该库备份的结果是写到+FLASH磁盘组的,10g情况下貌似还不可以直接将数据从ASM磁盘组直接copy到文件系统上来存放。于是,通过RMAN工具将29号23点的全备的备份集重新备份一份到文件系统上来。

找29号晚上的全备结果:

list backup tag=TAG20130729T230017;

根据当时在RMAN中返回的备份集信息,重新备份29号晚上的备份集到文件系统:

 backup backupset 1822 format '/home/oracle/1822.bak';
 backup backupset 1823 format '/home/oracle/1823.bak';
 backup backupset 1824 format '/home/oracle/1824.bak';
 backup backupset 1825 format '/home/oracle/1825.bak';
 backup backupset 1826 format '/home/oracle/1826.bak';
 backup backupset 1827 format '/home/oracle/1827.bak';

找到从备份以来到误操作之前的所有归档日志,并备份这些归档日志到文件系统:

select name,FIRST_TIME from v$archived_log where first_time  between to_date('2013/07/29 22:00:00','yyyy/mm/dd hh24:mi:ss') and to_date('2013/07/30 16:20:00','yyyy/mm/dd hh24:mi:ss');

并且,将新的备份集copy或者FTP至新的单实例环境的机器上。这里存放在单实例环境下的/home/oracle/zl路径下。

2 获取原RAC数据库的参数文件。
利用RAC数据库的SPFILE生成新的PFILE命令:

SQL>CREATE PFILE='/home/oracle/pfile.txt' FROM SPFILE;

插曲1:原参数文件内容如下:

zldb2.__db_cache_size=398458880
zldb1.__db_cache_size=406847488
zldb1.__java_pool_size=4194304
zldb2.__java_pool_size=8388608
zldb1.__large_pool_size=4194304
zldb2.__large_pool_size=4194304
zldb2.__shared_pool_size=629145600
zldb1.__shared_pool_size=624951296
zldb1.__streams_pool_size=4194304
zldb2.__streams_pool_size=4194304
*.audit_file_dest='/u01/app/oracle/admin/zldb/adump'
*.background_dump_dest='/u01/app/oracle/admin/zldb/bdump'
*.cluster_database_instances=2
*.cluster_database=true
*.compatible='10.2.0.5.0'
*.control_files='+DATA/zldb/controlfile/current.256.793214967','+FLASH/zldb/controlfile/current.256.793214967'
*.core_dump_dest='/u01/app/oracle/admin/zldb/cdump'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_name='zldb'
*.db_recovery_file_dest='+FLASH'
*.db_recovery_file_dest_size=107374182400
*.dispatchers='(PROTOCOL=TCP) (SERVICE=zldbXDB)'
zldb1.instance_number=1
zldb2.instance_number=2
*.job_queue_processes=10
Node1.local_listener='(ADDRESS =(PROTOCOL = TCP)(HOST = 192.168.33.31)(PORT = 22))'
Node2.local_listener='(ADDRESS =(PROTOCOL = TCP)(HOST = 192.168.33.32)(PORT = 22))'
*.open_cursors=300
*.pga_aggregate_target=3357540352
*.processes=150
*.remote_listener='LISTENERS_ZLDB'
*.remote_login_passwordfile='exclusive'
*.sga_target=1053818880
zldb2.thread=2
zldb1.thread=1
*.undo_management='AUTO'
*.undo_retention=86400
zldb1.undo_tablespace='UNDOTBS1'
zldb2.undo_tablespace='UNDOTBS2'
*.user_dump_dest='/u01/app/oracle/admin/zldb/udump'

插曲2:从上可以看到,该库的listeners参数肯定是被修改过,因为从alert日志文件内容看到下述信息:

ALTER SYSTEM SET local_listener='(ADDRESS =(PROTOCOL = TCP)(HOST = 192.168.33.31)(PORT = 22))' SCOPE=BOTH SID='Node1';
Fri Nov 23 11:36:19 CST 2012
ALTER SYSTEM SET local_listener='(ADDRESS =(PROTOCOL = TCP)(HOST = 192.168.33.32)(PORT = 22))' SCOPE=BOTH SID='Node2';

3 修改RAC数据库的参数文件,修改为一个可以为单实例数据库使用的初始化参数文件。修改之后的内容如下:

*.__db_cache_size=465567744
*.__java_pool_size=4194304
*.__large_pool_size=4194304
*.__shared_pool_size=541065216
*.__streams_pool_size=8388608
*.audit_file_dest='/u01/app/admin/zldb/adump'
*.background_dump_dest='/u01/app/admin/zldb/bdump'
#*.cluster_database_instances=1
*.cluster_database=false
*.compatible='10.2.0.5.0'
*.control_files='/u01/app/oradata/zldb/control01.ctl'
*.core_dump_dest='/u01/app/admin/zldb/cdump'
*.db_block_size=8192
*.db_create_file_dest='/u01/app/oradata/zldb'
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_name='zldb'
*.db_recovery_file_dest='/u01/app/oradata/zldb'
*.db_recovery_file_dest_size=107374182
*.dispatchers='(PROTOCOL=TCP) (SERVICE=zldbXDB)'
*.instance_number=1
*.job_queue_processes=10
*.open_cursors=300
*.pga_aggregate_target=3357540352
*.processes=150
*.remote_login_passwordfile='exclusive'
*.sga_target=1053818880
*.thread=1
*.undo_management='AUTO'
*.undo_retention=86400
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/u01/app/admin/zldb/udump'

4 创建对应的目录。在一个新的已经安装好10.2.0.5.0软件的环境下,创建下述目录:

/u01/app/oradata/zldb
/u01/app/admin/zldb/adump
/u01/app/admin/zldb/bdump
/u01/app/admin/zldb/cdump
/u01/app/admin/zldb/udump

5 启动单实例库到NOMOUNT状态。

[oracle@localhost ~]$ export ORACLE_SID=zldb
[oracle@localhost ~]$ env|grep ORA
ORACLE_SID=zldb
ORACLE_BASE=/u01/app
ORACLE_TERM=xterm
ORACLE_HOME=/u01/app/oracle/product/10.2.5/db_1
[oracle@localhost zl]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Wed Aug 21 17:42:57 2013

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup nomout pfile='/home/oracle/pfile.txt';
SP2-0714: invalid combination of STARTUP options
SQL> startup nomount pfile='/home/oracle/pfile.txt';
ORACLE instance started.

Total System Global Area 1056964608 bytes
Fixed Size                  2101608 bytes
Variable Size             260050584 bytes
Database Buffers          788529152 bytes
Redo Buffers                6283264 bytes
SQL>

6 在新环境下使用RMAN工具,还原控制文件。

[oracle@localhost ~]$ export ORACLE_SID=zldb
[oracle@localhost ~]$ env|grep ORA
ORACLE_SID=zldb
ORACLE_BASE=/u01/app
ORACLE_TERM=xterm
ORACLE_HOME=/u01/app/oracle/product/10.2.5/db_1
[oracle@localhost zl]$ rman target /

Recovery Manager: Release 10.2.0.5.0 - Production on Wed Aug 21 17:53:09 2013

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: zldb (not mounted)

RMAN> restore controlfile from '/home/oracle/zl/1823.bak';

Starting restore at 2013/08/21 17:55:49
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
output filename=/u01/app/oradata/zldb/control01.ctl
Finished restore at 2013/08/21 17:55:55

RMAN>

7 MOUNT 数据库(可直接在RMAN中执行。也可以通过SQL*PluS完成,这里直接在RMAN下完成。),然后注册备份信息到控制文件中

RMAN> alter database mount;

database mounted
released channel: ORA_DISK_1

RMAN> catalog start with '/home/oracle/zl';

Starting implicit crosscheck backup at 2013/08/21 17:56:53
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
Crosschecked 186 objects
Finished implicit crosscheck backup at 2013/08/21 17:56:56

Starting implicit crosscheck copy at 2013/08/21 17:56:56
using channel ORA_DISK_1
Finished implicit crosscheck copy at 2013/08/21 17:56:56

searching for all files in the recovery area
cataloging files...
no files cataloged

searching for all files that match the pattern /home/oracle/zl

List of Files Unknown to the Database
=====================================
File Name: /home/oracle/zl/1824.bak
File Name: /home/oracle/zl/arc1.arc
File Name: /home/oracle/zl/arc2.arc
File Name: /home/oracle/zl/1826.bak
File Name: /home/oracle/zl/1827.bak
File Name: /home/oracle/zl/1822.bak
File Name: /home/oracle/zl/1825.bak
File Name: /home/oracle/zl/1823.bak

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /home/oracle/zl/1824.bak
File Name: /home/oracle/zl/arc1.arc
File Name: /home/oracle/zl/arc2.arc
File Name: /home/oracle/zl/1826.bak
File Name: /home/oracle/zl/1827.bak
File Name: /home/oracle/zl/1822.bak
File Name: /home/oracle/zl/1825.bak
File Name: /home/oracle/zl/1823.bak

RMAN>

8 然后restore database。由于之前的数据文件是存放在ASM磁盘组上的,而这里新的单实例环境是使用文件系统来存放数据库的,所以,在RMAN工具中需要使用set newname来重命名数据文件,并将该信息重新写入到控制文件。

RMAN> run
2> {
3> allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
4> 5> 6> allocate channel c4 type disk;
7> set newname for datafile 1 to '/u01/app/oradata/zldb/1.dbf';
8> set newname for datafile 2 to '/u01/app/oradata/zldb/2.dbf';
9> set newname for datafile 3 to '/u01/app/oradata/zldb/3.dbf';
10> set newname for datafile 4 to '/u01/app/oradata/zldb/4.dbf';
11> set newname for datafile 5 to '/u01/app/oradata/zldb/5.dbf';
12> set newname for datafile 6 to '/u01/app/oradata/zldb/6.dbf';
13> set newname for datafile 7 to '/u01/app/oradata/zldb/7.dbf';
14> set newname for datafile 8 to '/u01/app/oradata/zldb/8.dbf';
15> set newname for datafile 9 to '/u01/app/oradata/zldb/9.dbf';
16> set newname for datafile 10 to '/u01/app/oradata/zldb/10.dbf';
17> set newname for datafile 11 to '/u01/app/oradata/zldb/11.dbf';
18> set newname for datafile 12 to '/u01/app/oradata/zldb/12.dbf';
19> set newname for datafile 13 to '/u01/app/oradata/zldb/13.dbf';
20> set newname for datafile 14 to '/u01/app/oradata/zldb/14.dbf';
21> set newname for datafile 15 to '/u01/app/oradata/zldb/15.dbf';
22> set newname for datafile 16 to '/u01/app/oradata/zldb/16.dbf';
23> set newname for datafile 17 to '/u01/app/oradata/zldb/17.dbf';
24> set newname for datafile 18 to '/u01/app/oradata/zldb/18.dbf';
25> set newname for datafile 19 to '/u01/app/oradata/zldb/19.dbf';
26> set newname for datafile 20 to '/u01/app/oradata/zldb/20.dbf';
27> set newname for datafile 21 to '/u01/app/oradata/zldb/21.dbf';
28> set newname for datafile 22 to '/u01/app/oradata/zldb/22.dbf';
29> set newname for datafile 23 to '/u01/app/oradata/zldb/23.dbf';
30> set newname for datafile 24 to '/u01/app/oradata/zldb/24.dbf';
31> set newname for datafile 25 to '/u01/app/oradata/zldb/25.dbf';
32> set newname for datafile 26 to '/u01/app/oradata/zldb/26.dbf';
33> restore database;
34> switch datafile all;
35> }

released channel: ORA_DISK_1
allocated channel: c1
channel c1: sid=156 devtype=DISK

allocated channel: c2
channel c2: sid=154 devtype=DISK

allocated channel: c3
channel c3: sid=153 devtype=DISK

allocated channel: c4
channel c4: sid=152 devtype=DISK

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 2013/08/21 17:57:23

channel c1: starting datafile backupset restore
channel c1: specifying datafile(s) to restore from backup set
restoring datafile 00006 to /u01/app/oradata/zldb/6.dbf
restoring datafile 00023 to /u01/app/oradata/zldb/23.dbf
restoring datafile 00024 to /u01/app/oradata/zldb/24.dbf
restoring datafile 00025 to /u01/app/oradata/zldb/25.dbf
restoring datafile 00026 to /u01/app/oradata/zldb/26.dbf
channel c1: reading from backup piece /home/oracle/zl/1822.bak
channel c2: starting datafile backupset restore
channel c2: specifying datafile(s) to restore from backup set
restoring datafile 00005 to /u01/app/oradata/zldb/5.dbf
restoring datafile 00016 to /u01/app/oradata/zldb/16.dbf
restoring datafile 00017 to /u01/app/oradata/zldb/17.dbf
restoring datafile 00019 to /u01/app/oradata/zldb/19.dbf
restoring datafile 00020 to /u01/app/oradata/zldb/20.dbf
restoring datafile 00021 to /u01/app/oradata/zldb/21.dbf
restoring datafile 00022 to /u01/app/oradata/zldb/22.dbf
channel c2: reading from backup piece /home/oracle/zl/1826.bak
channel c3: starting datafile backupset restore
channel c3: specifying datafile(s) to restore from backup set
restoring datafile 00003 to /u01/app/oradata/zldb/3.dbf
restoring datafile 00004 to /u01/app/oradata/zldb/4.dbf
restoring datafile 00007 to /u01/app/oradata/zldb/7.dbf
restoring datafile 00009 to /u01/app/oradata/zldb/9.dbf
restoring datafile 00011 to /u01/app/oradata/zldb/11.dbf
restoring datafile 00014 to /u01/app/oradata/zldb/14.dbf
restoring datafile 00018 to /u01/app/oradata/zldb/18.dbf
channel c3: reading from backup piece /home/oracle/zl/1827.bak
channel c4: starting datafile backupset restore
channel c4: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /u01/app/oradata/zldb/1.dbf
restoring datafile 00002 to /u01/app/oradata/zldb/2.dbf
restoring datafile 00008 to /u01/app/oradata/zldb/8.dbf
restoring datafile 00010 to /u01/app/oradata/zldb/10.dbf
restoring datafile 00012 to /u01/app/oradata/zldb/12.dbf
restoring datafile 00013 to /u01/app/oradata/zldb/13.dbf
restoring datafile 00015 to /u01/app/oradata/zldb/15.dbf
channel c4: reading from backup piece /home/oracle/zl/1825.bak
channel c1: restored backup piece 1
piece handle=/home/oracle/zl/1822.bak tag=TAG20130729T230017
channel c1: restore complete, elapsed time: 00:05:41

channel c3: restored backup piece 1
piece handle=/home/oracle/zl/1827.bak tag=TAG20130729T230017
channel c3: restore complete, elapsed time: 00:11:29
channel c4: restored backup piece 1
piece handle=/home/oracle/zl/1825.bak tag=TAG20130729T230017
channel c4: restore complete, elapsed time: 00:11:30
channel c2: restored backup piece 1
piece handle=/home/oracle/zl/1826.bak tag=TAG20130729T230017
channel c2: restore complete, elapsed time: 00:11:37
Finished restore at 2013/08/21 18:09:01

datafile 1 switched to datafile copy
input datafile copy recid=27 stamp=824062143 filename=/u01/app/oradata/zldb/1.dbf
datafile 2 switched to datafile copy
input datafile copy recid=28 stamp=824062143 filename=/u01/app/oradata/zldb/2.dbf
datafile 3 switched to datafile copy
input datafile copy recid=29 stamp=824062143 filename=/u01/app/oradata/zldb/3.dbf
datafile 4 switched to datafile copy
input datafile copy recid=30 stamp=824062143 filename=/u01/app/oradata/zldb/4.dbf
datafile 5 switched to datafile copy
input datafile copy recid=31 stamp=824062143 filename=/u01/app/oradata/zldb/5.dbf
datafile 6 switched to datafile copy
input datafile copy recid=32 stamp=824062143 filename=/u01/app/oradata/zldb/6.dbf
datafile 7 switched to datafile copy
input datafile copy recid=33 stamp=824062143 filename=/u01/app/oradata/zldb/7.dbf
datafile 8 switched to datafile copy
input datafile copy recid=34 stamp=824062143 filename=/u01/app/oradata/zldb/8.dbf
datafile 9 switched to datafile copy
input datafile copy recid=35 stamp=824062144 filename=/u01/app/oradata/zldb/9.dbf
datafile 10 switched to datafile copy
input datafile copy recid=36 stamp=824062144 filename=/u01/app/oradata/zldb/10.dbf
datafile 11 switched to datafile copy
input datafile copy recid=37 stamp=824062144 filename=/u01/app/oradata/zldb/11.dbf
datafile 12 switched to datafile copy
input datafile copy recid=38 stamp=824062144 filename=/u01/app/oradata/zldb/12.dbf
datafile 13 switched to datafile copy
input datafile copy recid=39 stamp=824062144 filename=/u01/app/oradata/zldb/13.dbf
datafile 14 switched to datafile copy
input datafile copy recid=40 stamp=824062144 filename=/u01/app/oradata/zldb/14.dbf
datafile 15 switched to datafile copy
input datafile copy recid=41 stamp=824062144 filename=/u01/app/oradata/zldb/15.dbf
datafile 16 switched to datafile copy
input datafile copy recid=42 stamp=824062144 filename=/u01/app/oradata/zldb/16.dbf
datafile 17 switched to datafile copy
input datafile copy recid=43 stamp=824062144 filename=/u01/app/oradata/zldb/17.dbf
datafile 18 switched to datafile copy
input datafile copy recid=44 stamp=824062144 filename=/u01/app/oradata/zldb/18.dbf
datafile 19 switched to datafile copy
input datafile copy recid=45 stamp=824062144 filename=/u01/app/oradata/zldb/19.dbf
datafile 20 switched to datafile copy
input datafile copy recid=46 stamp=824062144 filename=/u01/app/oradata/zldb/20.dbf
datafile 21 switched to datafile copy
input datafile copy recid=47 stamp=824062144 filename=/u01/app/oradata/zldb/21.dbf
datafile 22 switched to datafile copy
input datafile copy recid=48 stamp=824062144 filename=/u01/app/oradata/zldb/22.dbf
datafile 23 switched to datafile copy
input datafile copy recid=49 stamp=824062144 filename=/u01/app/oradata/zldb/23.dbf
datafile 24 switched to datafile copy
input datafile copy recid=50 stamp=824062144 filename=/u01/app/oradata/zldb/24.dbf
datafile 25 switched to datafile copy
input datafile copy recid=51 stamp=824062144 filename=/u01/app/oradata/zldb/25.dbf
datafile 26 switched to datafile copy
input datafile copy recid=52 stamp=824062144 filename=/u01/app/oradata/zldb/26.dbf
released channel: c1
released channel: c2
released channel: c3
released channel: c4

RMAN>

9 重命名联机日志文件。
重命名之前:

SQL> col status for a10
SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- -------------------
         1          1       1373   52428800          2 YES       INACTIVE       187740095 2013/07/29 16:58:43
         2          1       1374   52428800          2 NO        CURRENT        188086667 2013/07/29 22:00:53
         3          2       1989   52428800          2 NO        CURRENT        188149113 2013/07/29 23:00:04
         4          2       1988   52428800          2 YES       INACTIVE       188096691 2013/07/29 22:01:50

SQL> col member for a50
SQL> select * from v$logfile;

    GROUP# STATUS     TYPE                  MEMBER                                             IS_RECOVE
---------- ---------- --------------------- -------------------------------------------------- ---------
         1            ONLINE                +DATA/zldb/onlinelog/group_1.257.793214967         NO
         1            ONLINE                +FLASH/zldb/onlinelog/group_1.257.793214969        YES
         2            ONLINE                +DATA/zldb/onlinelog/group_2.258.793214969         NO
         2            ONLINE                +FLASH/zldb/onlinelog/group_2.258.793214969        YES
         3            ONLINE                +DATA/zldb/onlinelog/group_3.265.793215729         NO
         3            ONLINE                +FLASH/zldb/onlinelog/group_3.259.793215729        YES
         4            ONLINE                +DATA/zldb/onlinelog/group_4.266.793215731         NO
         4            ONLINE                +FLASH/zldb/onlinelog/group_4.260.793215731        YES

8 rows selected.

SQL>

开始重命名:

SQL> alter database rename file '+DATA/zldb/onlinelog/group_1.257.793214967' to '/u01/app/oradata/zldb/redo01a.log';

Database altered.

SQL> alter database rename file '+FLASH/zldb/onlinelog/group_1.257.793214969' to '/u01/app/oradata/zldb/redo01b.log';

Database altered.

SQL> alter database rename file '+DATA/zldb/onlinelog/group_2.258.793214969' to '/u01/app/oradata/zldb/redo02a.log';

Database altered.

SQL> alter database rename file '+FLASH/zldb/onlinelog/group_2.258.793214969' to '/u01/app/oradata/zldb/redo02b.log';

Database altered.

SQL> alter database rename file '+DATA/zldb/onlinelog/group_3.265.793215729' to '/u01/app/oradata/zldb/redo03a.log';

Database altered.

SQL> alter database rename file '+FLASH/zldb/onlinelog/group_3.259.793215729' to '/u01/app/oradata/zldb/redo03b.log';

Database altered.

SQL> alter database rename file '+DATA/zldb/onlinelog/group_4.266.793215731' to '/u01/app/oradata/zldb/redo04a.log';

Database altered.

SQL> alter database rename file '+FLASH/zldb/onlinelog/group_4.260.793215731' to '/u01/app/oradata/zldb/redo04b.log';

Database altered.

SQL>

重命名之后:

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- -------------------
         1          1       1373   52428800          2 YES       INACTIVE       187740095 2013/07/29 16:58:43
         2          1       1374   52428800          2 NO        CURRENT        188086667 2013/07/29 22:00:53
         3          2       1989   52428800          2 NO        CURRENT        188149113 2013/07/29 23:00:04
         4          2       1988   52428800          2 YES       INACTIVE       188096691 2013/07/29 22:01:50

SQL> select * from v$logfile;

    GROUP# STATUS     TYPE                  MEMBER                                             IS_RECOVE
---------- ---------- --------------------- -------------------------------------------------- ---------
         1            ONLINE                /u01/app/oradata/zldb/redo01a.log                  NO
         1            ONLINE                /u01/app/oradata/zldb/redo01b.log                  NO
         2            ONLINE                /u01/app/oradata/zldb/redo02a.log                  NO
         2            ONLINE                /u01/app/oradata/zldb/redo02b.log                  NO
         3            ONLINE                /u01/app/oradata/zldb/redo03a.log                  NO
         3            ONLINE                /u01/app/oradata/zldb/redo03b.log                  NO
         4            ONLINE                /u01/app/oradata/zldb/redo04a.log                  NO
         4            ONLINE                /u01/app/oradata/zldb/redo04b.log                  NO

8 rows selected.

SQL>

10 RECOVER database。
这里,从第7步骤执行后的结果,我们可以从RMAN中看到,当前环境下,控制文件里记录的新的归档日志文件备份信息如下,其中已经包含RAC双线程下从7月29号22:00到7月30号16:20分之间的所有归档日志的备份信息:

RMAN> list backup of archivelog all;

List of Backup Sets
===================

BS Key  Size       Device Type Elapsed Time Completion Time    
------- ---------- ----------- ------------ -------------------
1824    3.50K      DISK        00:00:00     2013/07/31 16:34:06
        BP Key: 1824   Status: AVAILABLE  Compressed: NO  Tag: TAG20130731T163406
        Piece Name: /home/oracle/zl/arc1.arc

  List of Archived Logs in backup set 1824
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    1374    188086667  2013/07/29 22:00:53 188173593  2013/07/29 23:25:01
  1    1375    188173593  2013/07/29 23:25:01 188569236  2013/07/30 07:00:24
  1    1376    188569236  2013/07/30 07:00:24 188847801  2013/07/30 11:48:32
  1    1377    188847801  2013/07/30 11:48:32 189110687  2013/07/30 16:21:18

BS Key  Size       Device Type Elapsed Time Completion Time    
------- ---------- ----------- ------------ -------------------
1825    515.04M    DISK        00:00:00     2013/07/31 16:32:30
        BP Key: 1825   Status: AVAILABLE  Compressed: NO  Tag: TAG20130731T163230
        Piece Name: /home/oracle/zl/arc2.arc

  List of Archived Logs in backup set 1825
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    1987    188086615  2013/07/29 22:00:52 188096691  2013/07/29 22:01:50
  2    1988    188096691  2013/07/29 22:01:50 188149113  2013/07/29 23:00:04
  2    1989    188149113  2013/07/29 23:00:04 188173580  2013/07/29 23:25:01
  2    1990    188173580  2013/07/29 23:25:01 188279567  2013/07/30 01:21:06
  2    1991    188279567  2013/07/30 01:21:06 188423782  2013/07/30 04:09:46
  2    1992    188423782  2013/07/30 04:09:46 188569192  2013/07/30 07:00:24
  2    1993    188569192  2013/07/30 07:00:24 188672390  2013/07/30 08:56:37
  2    1994    188672390  2013/07/30 08:56:37 188756632  2013/07/30 10:16:49
  2    1995    188756632  2013/07/30 10:16:49 188847773  2013/07/30 11:48:31
  2    1996    188847773  2013/07/30 11:48:31 188937795  2013/07/30 13:26:24
  2    1997    188937795  2013/07/30 13:26:24 189020185  2013/07/30 14:49:20
  2    1998    189020185  2013/07/30 14:49:20 189110668  2013/07/30 16:21:17

RMAN>

开始RECOVER:

RMAN> recover database;

Starting recover at 2013/08/26 16:54:48
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=154 devtype=DISK

starting media recovery

channel ORA_DISK_1: starting archive log restore to default destination
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1374
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1375
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1376
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1377
channel ORA_DISK_1: reading from backup piece /home/oracle/zl/arc1.arc
Oracle Error: 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup 
ORA-01110: data file 1: '/u01/app/oradata/zldb/1.dbf'

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/26/2013 16:54:52
ORA-19870: error reading backup piece /home/oracle/zl/arc1.arc
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 512 bytes disk space from 107374182 limit

RMAN>

插曲3:RECOVER报错,开始处理报错:

[oracle@localhost zldb]$ oerr ora 19870
19870, 00000, "error reading backup piece %s"
// *Cause: This error should be followed by other errors indicating
//         the cause of the problem.
// *Action: See other errors actions.
//          
// Do not use message 19871; it is used by RMAN client for testing previous
// resync time when using backup/standby control file.
[oracle@localhost zldb]$ oerr ora 19809
19809, 00000, "limit exceeded for recovery files"
//*Cause: The limit for recovery files specified by the
//        DB_RECOVERY_FILE_DEST_SIZE was exceeded.
//*Action:The error is accompanied by 19804. See message 19804 for further
//        details.
[oracle@localhost zldb]$ oerr ora 19804
19804, 00000, "cannot reclaim %s bytes disk space from %s limit"
// *Cause: Oracle cannot reclaim disk space of specified bytes from the
//         DB_RECOVERY_FILE_DEST_SIZE limit.
// *Action: There are five possible solutions:
//          1) Take frequent backup of recovery area using RMAN.
//          2) Consider changing RMAN retention policy.
//          3) Consider changing RMAN archivelog deletion policy.
//          4) Add disk space and increase DB_RECOVERY_FILE_DEST_SIZE.
//          5) Delete files from recovery area using RMAN.
[oracle@localhost zldb]$

从报错信息来看,应该是DB_RECOVERY_FILE_DEST_SIZE参数限制了数据恢复区的大小,导致RECOVER失败

SQL> show parameter db_recover

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_recovery_file_dest                string                            /u01/app/oradata/zldb
db_recovery_file_dest_size           big integer                       107374182
SQL> alter system set db_recovery_file_dest_size=40G;

System altered.

SQL> show parameter db_recover

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
db_recovery_file_dest                string                            /u01/app/oradata/zldb
db_recovery_file_dest_size           big integer                       40G
SQL>

重新执行RECOVER:

RMAN> recover database;

Starting recover at 2013/08/26 16:59:28
using channel ORA_DISK_1

starting media recovery

channel ORA_DISK_1: starting archive log restore to default destination
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1374
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1375
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1376
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=1377
channel ORA_DISK_1: reading from backup piece /home/oracle/zl/arc1.arc
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/oracle/zl/arc1.arc tag=TAG20130731T163406
channel ORA_DISK_1: restore complete, elapsed time: 00:00:02
channel ORA_DISK_1: starting archive log restore to default destination
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1989
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1990
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1991
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1992
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1993
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1994
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1995
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1996
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1997
channel ORA_DISK_1: restoring archive log
archive log thread=2 sequence=1998
channel ORA_DISK_1: reading from backup piece /home/oracle/zl/arc2.arc
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/oracle/zl/arc2.arc tag=TAG20130731T163230
channel ORA_DISK_1: restore complete, elapsed time: 00:00:26
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1989_91p67mxo_.arc thread=2 sequence=1989
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1374_91p67kqh_.arc thread=1 sequence=1374
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1989_91p67mxo_.arc recid=3334 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1990_91p67my5_.arc thread=2 sequence=1990
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1374_91p67kqh_.arc recid=3330 stamp=824489969
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1375_91p67kqz_.arc thread=1 sequence=1375
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1990_91p67my5_.arc recid=3335 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1991_91p67myc_.arc thread=2 sequence=1991
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1991_91p67myc_.arc recid=3336 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1992_91p67myq_.arc thread=2 sequence=1992
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1992_91p67myq_.arc recid=3337 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1993_91p67yp3_.arc thread=2 sequence=1993
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1375_91p67kqz_.arc recid=3331 stamp=824489969
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1376_91p67kr9_.arc thread=1 sequence=1376
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1993_91p67yp3_.arc recid=3342 stamp=824489987
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1994_91p67ysz_.arc thread=2 sequence=1994
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1994_91p67ysz_.arc recid=3343 stamp=824489987
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1995_91p67n06_.arc thread=2 sequence=1995
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1995_91p67n06_.arc recid=3338 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1996_91p67n0f_.arc thread=2 sequence=1996
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1376_91p67kr9_.arc recid=3332 stamp=824489969
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_1_1377_91p67krh_.arc thread=1 sequence=1377
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1996_91p67n0f_.arc recid=3339 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1997_91p67n0o_.arc thread=2 sequence=1997
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1997_91p67n0o_.arc recid=3340 stamp=824489982
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1998_91p67ylw_.arc thread=2 sequence=1998
channel default: deleting archive log(s)
archive log filename=/u01/app/oradata/zldb/ZLDB/archivelog/2013_08_26/o1_mf_2_1998_91p67ylw_.arc recid=3341 stamp=824489987
unable to find archive log
archive log thread=2 sequence=1999
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/26/2013 17:01:32
RMAN-06054: media recovery requesting unknown log: thread 2 seq 1999 lowscn 189110668

RMAN>

这次,从RECOVER报错信息看到RMAN识别不了 thread 2 seq 1999的归档日志。的确,这也是我们预期的,因为thread 2 seq 1999的归档日志从上述的归档日志备份信息是不存在的,所以,RMAN不识别,再者也不是我们需要的归档日志。恢复到此阶段,直接OK。
11 重新以resetlogs方式来启动新的单实例数据库,并确认恢复之后的结果。

SQL> alter database open resetlogs;

Database altered.

SQL>
SQL> select count(*) from CUL_GMAR.OPM_BL_HEADER;

  COUNT(*)
----------
      7791

SQL>

经和项目组确认,已经是最大限度的找回了误操作的数据。

12 删除RAC的多余UNDO表空间和线程。
处理原RAC下的第2个线程:

SQL> select THREAD#, STATUS, ENABLED from v$thread;

   THREAD# STATUS     ENABLED
---------- ---------- ------------------------
         1 OPEN       PUBLIC
         2 CLOSED     PUBLIC

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- -------------------
         1          1         19   52428800          2 YES       INACTIVE       189209640 2013/08/26 22:00:54
         2          1         20   52428800          2 NO        CURRENT        189295930 2013/08/27 02:00:13
         3          2          7   52428800          2 NO        CURRENT        189180639 2013/08/26 17:51:27
         4          2          6   52428800          2 YES       INACTIVE       189169179 2013/08/26 17:50:16

SQL>

删除第2个线程,及删除线程2的联机重做日志文件:

SQL> alter database disable thread 2;

Database altered.
SQL> alter database drop logfile group 3;
alter database drop logfile group 3
*
ERROR at line 1:
ORA-00350: log 3 of instance zldb2 (thread 2) needs to be archived
ORA-00312: online log 3 thread 2: '/u01/app/oradata/zldb/redo03a.log'
ORA-00312: online log 3 thread 2: '/u01/app/oradata/zldb/redo03b.log'

SQL> alter database clear unarchived logfile group 3;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter database drop logfile group 4;

Database altered.

SQL> select THREAD#, STATUS, ENABLED from v$thread;

   THREAD# STATUS     ENABLED
---------- ---------- ------------------------
         1 OPEN       PUBLIC

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- -------------------
         1          1         19   52428800          2 YES       INACTIVE       189209640 2013/08/26 22:00:54
         2          1         20   52428800          2 NO        CURRENT        189295930 2013/08/27 02:00:13

SQL>

删除多余的UNDO表空间:

SQL> select tablespace_name,contents from dba_tablespaces where contents='UNDO';

TABLESPACE_NAME                                                                            CONTENTS
------------------------------------------------------------------------------------------ ---------------------------
UNDOTBS1                                                                                   UNDO
UNDOTBS2                                                                                   UNDO

SQL> show parameter undo

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
undo_management                      string                            AUTO
undo_retention                       integer                           86400
undo_tablespace                      string                            UNDOTBS1
SQL> drop tablespace undotbs2 including contents and datafiles;

Tablespace dropped.

SQL> select tablespace_name,contents from dba_tablespaces where contents='UNDO';

TABLESPACE_NAME                                                                            CONTENTS
------------------------------------------------------------------------------------------ ---------------------------
UNDOTBS1                                                                                   UNDO

SQL> show parameter undo

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
undo_management                      string                            AUTO
undo_retention                       integer                           86400
undo_tablespace                      string                            UNDOTBS1
SQL>

临时表空间的处理,由于在第11步骤中,RESETLOGS方式启库的时候,从告警日志文件中看到下述信息,数据库自动重新创建临时文件,并且从数据库中可以验证到,故不需重新创建临时表空间:

Mon Aug 26 17:16:42 CST 2013
File 201 not verified due to error ORA-01157
Mon Aug 26 17:16:42 CST 2013
Dictionary check complete
Mon Aug 26 17:16:42 CST 2013
SMON: enabling tx recovery
Mon Aug 26 17:16:43 CST 2013
Re-creating tempfile +DATA/zldb/tempfile/temp.262.793214975 as /u01/app/oradata/zldb/ZLDB/datafile/o1_mf_temp_91p77vly_.tmp
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
SQL> select name from v$tempfile;

NAME
------------------------------------------------------------
/u01/app/oradata/zldb/ZLDB/datafile/o1_mf_temp_91p77vly_.tmp

SQL>

小结:

1 数据库上的各种操作要谨慎,尤其尤其是生产库。如果不是项目组的那个误操作,也不会促成这篇文字;当然,插曲1中提到的那个listener参数被误修改就是另外一种误修改;

2 数据库的备份很重要,如果该库没做备份的话,估计恢复数据就是难上加难的事儿了;

声讨淘宝上及其他各类非法销售本人Oracle视频作品的卖家和网站

被侵权了,被盗版了,虽然这已经不是一天两天的事儿了,并且还在继续…

同时,深知这是一个相当不重视知识产权,不够尊重个人劳动价值的国度,反正就是各种不靠谱,不多说。

并且,也明白,码出的这些文字几乎是瞬间就能被淹没在网络中,消失的无声无息。但,还是忍不住声讨那些在淘宝上在未经本人任何许可的情况下销售本人的Oracle数据库教学视频的卖家。在此,我很无奈的被问候你们家的某些人!

声明:

1 截止目前我所有的oracle视频均为原创,未经本人许可,不得以任何形式在互联网上兜售、获得商业利益的行为;

2 截止目前我所有的oracle视频唯一指定的销售平台只有播布客;

3 其他一切渠道,包括淘宝、美和在线等等等各类三教九流之辈均为盗版。是的,你们侵权了,侵了我的知识产权。

附:给出部分非法淘宝的黑心卖家列表及其他非法销售本人Oracle视频的卖家链接地址,这篇文章我会不定期的更新。你们这帮混蛋行为的链接地址在这里呢:

坏蛋 01:http://item.taobao.com/item.htm?spm=a230r.1.14.1.iaFvu9&id=19972027533&_u=v1fov9jd9105

坏蛋 02:http://item.taobao.com/item.htm?spm=a230r.1.14.10.iaFvu9&id=19225714645&_u=v1fov9jd3d58

坏蛋 03:http://item.taobao.com/item.htm?spm=a230r.1.14.18.iaFvu9&id=17768833172&_u=v1fov9jdb5fa

坏蛋 04:http://item.taobao.com/item.htm?spm=a230r.1.14.28.PQJZdM&id=25385320741&_u=11fov9jd0cf7

坏蛋 05:http://item.taobao.com/item.htm?spm=a230r.1.14.18.PQJZdM&id=17768833172&_u=11fov9jd2efd

坏蛋 太多了,点击这里可以戳出一坨一坨的坏蛋,这也有一坨坏蛋呢!

你们这帮混蛋,下面给出你们的不要脸的脸面贴图:

淘宝坏蛋1淘宝坏蛋2

这篇短文字写于2013年7月31日23点43分,于给上海一客户恢复数据的空当时间内。

最后,大家看到这篇文字的时候,如果有更好的办法可以制止这帮坏蛋的行为或者建议,请回复这篇文章,多谢!当然,如果你们当中有需要这些Oracle视频资料的话,更是可以直接联系我。你们的支持,是我前进的动力。

不要轻易修改你的主机名

    这篇文章不是标题党,是在实际工作中真切的案例。

    场景:这是一套Windows Server 2008 R2 X64的系统,跑了一套10.2.0.5.0的oracle物理备库,运行一切正常。在客户的要求下,需要调整该服务器的机器名。

    步骤:整理好调整的思路后,开始执行操作[包括停备库,ASM实例,修改hosts文件、tnsnames.ora文件等],在客户IT人员修改完机器名并重启服务器之后,发现悲剧的一幕,机器无法正常启动,不过客户端倒是可以ping通服务器,但是无法通过远程桌面连接。

    怎么办呢?经过分析和定位,感觉极有可能出问题的地方就是OracleCSService这个服务,而且该服务的启动类型是自动启动。也就是说该服务项会加载到windows系统的启动项里,随着操作系统的启动而启动,而该服务又是hard-coded,应该是同机器名进行“捆绑”的,由于修改了机器名,导致OracleCSService服务项不能正常启动,进而导致操作系统无法正常启动。

    找到解决问题的思路之后,可以尝试重启服务器,进入安全模式,禁用该服务,然后重启机器,结果该机器已经无法再次进入安全模式,之前进去过,原因未知,客户IT硬件人员操作。

    于是,一边尝试可以进入安全模式的方法,一边估计下下策的重装Windows系统,重建Dataguard的方案。结果,更为不可思议的是,服务器特么自己能够正常启动了,大家什么都没操作。接下来,就登录上去,果断重建了OracleCSService服务:

    删除该服务:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat
usage:  crssetup 
                  config   - configure and startup the cluster on nodes
                  add      - add specified nodes to the cluster
                  del      - delete the specified nodes from the cluster
                  deconfig - wipe out all cluster configuration information
                  ldel     - local css delete from oracle home
                  lres     - local css home reset to new oracle home
                  ladd     - local css add to oracle home
                  shutdown - shutdown the selected nodes
                  upgrade  - upgrade the specified nodes
                  help     - print out this information

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat deconfig
GetConfiguredClusterNodes:  failed to initialize subsystem, rc(21)
failed to determine remaining nodes in the cluster
failed during critical configuration information
  please supply <-force> option to continue

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat deconfig -force
GetConfiguredClusterNodes:  failed to initialize subsystem, rc(21)
failed to determine remaining nodes in the cluster
failed during critical configuration information
  <-force> option specified, continuing
Step 1:  shutting down node apps
failed executing check for CRS resources
  [ 2 ] The system cannot find the file specified.
failed executing check for CRS resources

failure determining CRS resources state, continuing due to FORCE option
  DEBRESTDDB            Removing node apps
PRKC-1056 : Failed to get the hostname for node DEBRESTDDB
PRKH-1010 : Unable to communicate with CRS services.
  [Communications Error(Native: prsr_initCLSS:[3])]
  DEBRESTDDB            Removing ONS configuration
failed to remove ONS configuration
  [ 2 ] The system cannot find the file specified.
  DEBRESTDDB            failed to execute removal of ONS configuration
failuring during delete of node apps, continuing
Step 2:  shutting down local CRS stack
  DEBRESTDDB            failed to located service OracleEVMService, err(1060)
failed to stop CRS stack on all nodes to be removed, continuing
Step 3:  removing CRS stack from requested nodes
Step 4:  stopping extra CRS services
Step 5:  cleanup up registry keys
Step 6:  perform cleanup of the OCR repository C:\oracle\product\10.2.0\db_1\cdata\localhost\local.ocr
successful deconfiguration of the cluster

C:\Users\Administrator>

 

    重建该服务:

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat add
Step 1:  creating new OCR repository
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'administrator', privgrp ''..
Operation successful.
Step 2:  creating new CSS service
successfully created local CSS service
successfully added CSS to home

C:\Users\Administrator>

 

    最后,启动ASM实例,启动物理备库,打开同主库的同步,完成同步。

    值得记住的地方:

    ① 不要轻易修改机器名,除非必要。修改之前,一定一定要理清楚checklist,不可像本例中遗漏了OracleCSService服务项的重建;

    ② 对于生产环境的各种操作,真的要三思而后行;

    ③ 写这篇记录小文的时候,发现Metalink上有该案例的详细操作说明哇:How to change the Hostname when Oracle 10G and ASM are used [ID 422729.1]