Oversized listener log slows down database connections

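I received an email from a project team in our company: connections to the database in one of their development environments were unstable, sometimes fast and sometimes slow, although once a session was established the work itself was not slow. They asked me to take a look.

Environment: Windows Server 2008 R2 (64-bit) with a single-instance Oracle 11gR2 (64-bit) database.

Symptom: on the database server itself, running tnsping against a local Oracle Net service name was indeed sometimes fast and sometimes slow. When fast, the latency was 0 or 10 ms; when slow, it exceeded 10,000 ms. That is clearly unacceptable.

Initial hypotheses:

1 My first reaction was that the Oracle Net configuration might be at fault;
2 The database server or the listener process might be too busy.

Troubleshooting:

Investigation showed that, this being a development environment, the Windows server had for reasons nobody could explain ended up with two Oracle installations: a 10g software home [with no database created] and the 11gR2 single-instance database discussed here. Each home had its own listener, the 10g one on the default port 1521 and the 11g one on port 1522, and both listeners were named listener. More puzzling still, tnsping against a service name under the 10g home behaved normally; only tnsping against a service name under the 11g home showed the erratic latency.

So I stopped the 10g listener that occupied port 1521 and reconfigured the 11g listener to listen on port 1521. The problem persisted: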

C:\Users\Administrator>tnsping orcl

TNS Ping Utility for 64-bit Windows: Version 11.2.0.1.0 - Production on 24-6月 -2014 16:49:20

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

已使用的参数文件:
D:\app\Administrator\product\11.2.0\dbhome_1\network\admin\sqlnet.ora


已使用 TNSNAMES 适配器来解析别名
尝试连接 (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.1.87)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = orcl)))
OK (0 毫秒)

C:\Users\Administrator>tnsping orcl

TNS Ping Utility for 64-bit Windows: Version 11.2.0.1.0 - Production on 24-6月 -2014 16:49:22

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

已使用的参数文件:
D:\app\Administrator\product\11.2.0\dbhome_1\network\admin\sqlnet.ora


已使用 TNSNAMES 适配器来解析别名
尝试连接 (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.1.87)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = orcl)))
OK (7360 毫秒)

C:\Users\Administrator>

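So the network configuration is unlikely to be the cause.

Load cannot be the explanation either, since this is a development database and far from busy.

So where was the problem? Since tnsping against a local service name was erratic on the server itself, a SQL*Plus connection through the same service name would be slow as well. I therefore opened a local connection through the service name and then looked at the listener log. Sure enough, there was a clue: the listener log file was very large: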

[Figure: listener_log_4GB]

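The culprit was this oversized log file [on Windows, the problem appears once the Oracle listener log file exceeds 4 GB]. The fix is simple: stop the listener, delete the listener log file, and start the listener again. Problem solved.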
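A check like the one that exposed the problem can be scripted. The following is a minimal sketch, not part of the original troubleshooting: the function name `log_exceeds_limit` is hypothetical, and the 4 GB threshold is simply the figure reported above. It only compares a file's size with a limit.

```python
import os
import tempfile

# 4 GB: the size above which the article observed slow connections on Windows.
FOUR_GB = 4 * 1024 ** 3


def log_exceeds_limit(path, limit_bytes=FOUR_GB):
    """Return True if the file at `path` is at least `limit_bytes` in size."""
    return os.path.getsize(path) >= limit_bytes


# Demo on a throwaway file with a tiny limit, so no real 4 GB file is needed.
with tempfile.NamedTemporaryFile(delete=False) as demo:
    demo.write(b"x" * 2048)
demo_path = demo.name
print(log_exceeds_limit(demo_path, limit_bytes=1024))  # 2048-byte file vs 1024-byte limit
print(log_exceeds_limit(demo_path))                    # same file vs the 4 GB default
os.remove(demo_path)
```

On a real server the function would be pointed at the listener log of the affected Oracle home; that path depends on the installation and is deliberately not guessed here.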

11gR2 OCP Lab 1: Building the database server on OEL 5.5 X86_64

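I Lab objective

This lab is the groundwork for the whole tutorial series: deploying the Oracle database server. Here we use virtualization to create a Linux server; in later chapters we will install the Oracle 11gR2 database software on it and create a database.

Note

1 The operating system chosen is Oracle Enterprise Linux 5.5 X86_64 from Oracle. This is not mandatory; another Linux distribution such as Red Hat also works. A 64-bit system is recommended, and the Windows platform is not.

2 The virtualization platform is a VMware ESXi 5.1 host. If that is not available, VMware Server can be installed as the virtualization software instead.

II Obtaining the software

III Lab steps

Here we create one virtual machine on the VMware ESXi 5.1 host. Its basic configuration is listed in the table below [these are suggested values only, not requirements; adjust them to the hardware actually available].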

Item          Value                    Remark
OS            OEL 5.5 X86_64
Kernel        2.6.18-194.el5 X86_64
Hostname      OCP11gR2
Domain        oracleonlinux.cn
IP            172.16.0.224/20          Netmask 255.255.240.0
Gateway       172.16.15.254
Hard disk 1   30G
Hard disk 2   8G                       Reserved for ASM disks later
Hard disk 3   8G                       Reserved for ASM disks later
CPU           4c
Memory        2G
Swap          4G

                    Table 1: Basic configuration of the virtual machine
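The network values in Table 1 can be cross-checked before they are typed into the installer. This is a small sketch using Python's standard `ipaddress` module; it only verifies that the /20 prefix corresponds to the stated netmask and that the gateway lies in the same subnet as the host address.

```python
import ipaddress

# Values taken from Table 1.
iface = ipaddress.ip_interface("172.16.0.224/20")
gateway = ipaddress.ip_address("172.16.15.254")

print(iface.netmask)             # 255.255.240.0
print(iface.network)             # 172.16.0.0/20
print(gateway in iface.network)  # True
```

If the last line printed False, the gateway would be unreachable from the host address, which usually points to a typo in the prefix length or in the address.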

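1 Configure the virtual machine for the OEL 5.5 X86_64 Linux operating system

Log in to the virtualization host at 172.16.0.181 as root:

Enter the host management interface:

    Create a new virtual machine, choose Custom, then Next:

    Name the virtual machine 11gR2–OCP, Next

Choose the datastore location, here the first storage device, named snap-62ed4023-BAT, Next

For the virtual machine version choose version 8, Next

For the operating system type choose Linux, and for the version choose Red Hat Enterprise Linux 5 (64-bit), Next

For the number of CPU cores choose 4, Next

For memory choose 2G, Next

Choose one network adapter, Next

For the SCSI controller type choose LSI Logic Parallel, Next

Choose to create a new disk, Next

For the disk size choose 30G, Next

For the virtual device node keep the default SCSI (0:0), Next

Tick the option to edit the virtual machine settings before creation, Continue

Edit the CD/DVD drive, choose ISO file, and locate the OEL 5.5 X86_64 ISO file prepared earlier:

Tick the option to connect the CD/DVD drive at power on, then click Finish.

At this point the new virtual machine is configured.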

 

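2 Start the 11gR2–OCP virtual machine and install the Linux operating system

Power on the 11gR2–OCP virtual machine and enter the graphical installer:

Press Enter to start the installation:

Choose Skip to skip the media check:

Next, begin installing Oracle Enterprise Linux

For the language choose English, for the keyboard choose U.S. English, and for disk partitioning choose the last option, Create custom layout, Next

Partition the disk: give the swap partition 4000M and all remaining space to the root partition /, Next

    Network configuration: following Table 1, configure the IP address, netmask, gateway and hostname for the eth0 interface, Next

For the time zone choose Asia/Shanghai, Next; set the root password, Next

Choose to customize the software packages, Next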

Note:
    Desktop Environments:
               GNOME Desktop Environment
    Applications: 
               Editors
    Development:
               Development Libraries
               Development Tools
               GNOME Software Development
               Legacy Software Development
               X software Development
    Base System :
               Administration Tools
               Base
               Legacy Software Support
               System Tools
               X Window System

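The other packages can be skipped for now: this speeds up the operating system installation, and they are not needed for setting up an Oracle 11gR2 database environment. If a later installation step reports a missing package, it can be installed manually from the installation media. Next

The installer formats the file systems and installs the packages, until the final screen:

When prompted, reboot the system.

After the reboot, click Forward, choose to disable the Firewall, Next

Disable SELinux, Next

Continue to the last step and, as prompted, reboot once more so that the settings take effect.

After the reboot, enter the root user name and password to log in to the graphical desktop:

With that, a database server running OEL 5.5 X86_64 has been created using virtualization.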

 

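IV Lab result

   After completing this lab, you should be able to create a Linux virtual machine using virtualization, which lays the groundwork for the labs that follow.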

Oracle 11g OCP Practices and Solutions

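I Overview and notes

Overview

This lab manual, "Oracle 11gR2 OCP Practices and Solutions", covers all the hands-on content of Oracle's official 11gR2 OCP training courses [Oracle Database 11g Administration Workshop I and Oracle Database 11g Administration Workshop II]. It is meant to accompany the official training material, to help students prepare for the Oracle 11gR2 OCP exams and to master Oracle DBA knowledge. I hope it helps you pass the 11g OCP exams and start a well-paid Oracle DBA career soon!

Intended readers

Because this lab series covers the content of Oracle's official 11gR2 OCP training, it is well suited to anyone preparing for the Oracle 11g OCP exams. It is not limited to exam preparation, though: working DBAs can use it as a quick reference, and it will improve your hands-on skills so that you are no 'paper' DBA!

Basic experience with Unix/Linux and a basic understanding of relational databases are ideal when working through the material. If you hesitate because you have none of that, or are even a complete computer novice who has never written a program or heard of a gateway, I do not think any of this should stop you from learning Oracle. The manual is a foolproof, step-by-step progression from the basics to more advanced topics; all you need to do is follow it, complete each lab carefully and on your own, and enjoy learning Oracle!

Acknowledgements

First and foremost, I thank my readers and students for their unwavering trust and support! If you happen to be reading these words (whether or not we know each other, and wherever you are from), then yes, you are exactly the reader and friend I should thank most! Beyond thanks, I hope my modest knowledge helps you pass the OCP exams or proves useful in your daily work. If so, I will be both content and at ease!

Second, I thank my wife. While I spent day after day carefully preparing and writing this lab series, she took on all the chores inside and outside the home, so that I could devote myself fully to producing the manual and enjoy the whole process!

Finally, at the risk of sounding shameless, a quiet thank-you to myself, for staying focused for so long on something that looks small but may actually benefit many students and readers, and for staying optimistic and productive throughout!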

 

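Copyright notice

All text, images, video, audio and code in this lab series are my original work. I hereby solemnly declare that I hold the rights to own, copy and use this content. Without my authorization it may not be distributed, reposted, copied or used in any form; if you need to use it, credit the source and obtain my permission first, after which you may use it freely.

    Commercial use of any kind is strictly prohibited, for example reposting the material and then selling it on Taobao or other websites. For details, see my earlier post "Denouncing the Taobao sellers and other websites that illegally sell my Oracle video works"!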

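Two letters, to Taobao sellers and to Taobao buyers

A thank-you letter to the Taobao shop owners who "resell" my videos on my behalf!

Dear all:

Did you know?

All the text, images, video, audio, code and scripts in this tutorial series are my original work~

But let me whisper this to you: the rights to own, copy and use this content belong to me, and to all of you dears as well~

Because it is thanks to your vigorous promotion and sales that the little bit of Oracle database knowledge I have picked up has reached, smoothly, efficiently and precisely, so many more people who need to learn Oracle~

At the same time, on behalf of everyone who has learned or will learn Oracle from this video series, allow me to say to all of you who keep watch at your computers day and night: "Thank you!"

Dears, you are welcome to go on promoting and supporting this video series by every means you can think of, with no upper limit and no bottom line. That way, first, you may earn a little money for soy milk, fried dough sticks and porridge, and second, more importantly, you can keep passing my knowledge on to more and more colleagues on the IT front line who urgently need to study and recharge, whether for knowledge or for a job, for a raise or for a job change!!!

All right, I won't go on, dears. Bye…..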

 

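A letter to all Taobao buyers!

Even dearer dears:

You probably don't know this yet, do you?

Although the "dears" above have wounded my tender heart more than once, I still live carefree, don't I? I can't be bothered to argue with those "stinky" dears, hmph…

I also know that many of you bought the videos from those "stinky" dears and then, through the video content, found my website [http://www.oracleonlinux.cn/], my QQ [155166225], my Rock Orcle QQ group [280889316] and my email [3dian14@gmail.com], and then sent me all sorts of questions about the videos over QQ and all sorts of script requests by email. Did I ever turn anyone down? For a while I had dozens of your emails to answer every day, not to mention the questions on QQ. Good heavens, even I could barely keep up…

    Of course, the people I should thank most are you, the even dearer dears. Your constant support and encouragement are what motivate me to create more and better videos.

    All right, I won't go on with you either. Do remember that this shop will soon launch several sets of Oracle database videos, including the Oracle 11gR2 OCP certification series, the Oracle 10g RAC to 11g RAC upgrade series, the Oracle performance tuning series and more. These videos are my flagship productions, and this shop is the only channel on the whole web; no other shop has them~

    Click to view the original post, or follow this link to the Taobao shop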

11gR2: changing the hostname causes CRS-0184 and CRS-4000 errors

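[Figure: crs_stat_error]

We always say that before deploying an Oracle database server you should plan the hostname, IP address and other basics, and that once they are settled, especially after Oracle has been deployed, they should not be changed lightly. We all know the rule, and yet in real work you will sooner or later run into exactly this situation.

Not long ago I hit one such failure: a 10.2.0.5.0 single-instance physical standby with ASM, running on Windows Server 2008 R2 X64, broke after its hostname was changed. Now I have run into a similar case:

I Operating system: CentOS release 5.6 (Final), X86_64

II Oracle version: Oracle 11gR2 11.2.0.1.0, 64-bit, single-instance database with ASM

III Symptom: after the hostname was changed and the host rebooted, the database became unavailable; more precisely, the CRS processes of the Grid software would not even start, as shown below:

1 Check the service status: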

[shell]
wxwl_iop-> crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

wxwl_iop-> [/shell]

2 The ohasd daemon is not running (only the init.ohasd wrapper script shows up):

[shell]
wxwl_iop-> ps -ef|grep has
root 4869 1 0 Nov14 ? 00:00:02 /bin/sh /etc/init.d/init.ohasd run
grid 27544 26907 0 09:58 pts/2 00:00:00 grep has
wxwl_iop->[/shell]

3 Checking whether Oracle Restart is configured, and trying to start it manually, both fail:

[shell]
wxwl_iop-> crsctl config has
CRS-4014: Oracle High Availability Services autostart was not defined.
Failure at scls_scr_create with code 1
Internal Error Information:
Category: -1
Operation: has_ha_privs
Location: scrcreate5
Other: need ha priv
System Dependent Information: 0

CRS-4000: Command Config failed, or completed with errors.
wxwl_iop-> crsctl start has
Failure at scls_scr_create with code 1
Internal Error Information:
Category: -1
Operation: has_ha_privs
Location: scrcreate5
Other: need ha priv
System Dependent Information: 0

CRS-4000: Command Start failed, or completed with errors.
wxwl_iop-> [/shell]

4 Under /etc/oracle/scls_scr there is only the Oracle Restart configuration for the old hostname, and nothing for the new hostname:

[shell]
[root@wxwl_iop scls_scr]# ll
total 4
drwxr-xr-x 4 root oinstall 4096 Nov 7 16:42 localhost
[root@wxwl_iop scls_scr]# pwd
/etc/oracle/scls_scr
[/shell]

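IV Cause: the hostname of the database server was changed manually and the host rebooted, so Oracle Restart failed to start, and consequently the ASM instance, the listener, the disk groups and the database instance could not start either.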
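The check from step 4 can be expressed as a short script. This is a sketch under stated assumptions: the helper names are hypothetical, and it assumes, as the listing in step 4 suggests, that Oracle Restart keeps one directory per hostname under /etc/oracle/scls_scr. The demo runs against a temporary directory that mimics that listing rather than a real server.

```python
import os
import tempfile


def restart_config_hosts(scls_scr_dir):
    """Return the sorted names of the per-host directories under scls_scr_dir."""
    return sorted(
        name for name in os.listdir(scls_scr_dir)
        if os.path.isdir(os.path.join(scls_scr_dir, name))
    )


def hostname_is_configured(scls_scr_dir, hostname):
    """True if a directory named after `hostname` exists under scls_scr_dir."""
    return hostname in restart_config_hosts(scls_scr_dir)


# Demo: only the old hostname ("localhost") has a directory, as in step 4.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "localhost"))
    print(restart_config_hosts(demo_dir))                # ['localhost']
    print(hostname_is_configured(demo_dir, "wxwl_iop"))  # False
```

On the affected server the same functions would be called with "/etc/oracle/scls_scr" and the current hostname; a False result matches the situation described above, where the configuration still belongs to the old name.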

V Resolution steps:

1 As root, forcibly remove the CRS configuration:

[shell]
[root@wxwl_iop scls_scr]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2013-11-15 10:00:51: Parsing the host name
2013-11-15 10:00:51: Checking for super user privileges
2013-11-15 10:00:51: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Usage: srvctl <command> <object> [<options>]
commands: enable|disable|start|stop|status|add|remove|modify|getenv|setenv|unsetenv|config
objects: database|service|asm|diskgroup|listener|home|ons|eons
For detailed help on each command and object and its options use:
srvctl <command> -h or
srvctl <command> <object> -h
PRKO-2012 : nodeapps object is not supported in Oracle Restart
ADVM/ACFS is not supported on centos-release-5-6.el5.centos.1

ACFS-9201: Not Supported
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Stop failed, or completed with errors.
You must kill crs processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
Use of uninitialized value in chdir at /usr/lib/perl5/5.8.8/File/Find.pm line 751.
Use of chdir('') or chdir(undef) as chdir() is deprecated at /usr/lib/perl5/5.8.8/File/Find.pm line 751.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@wxwl_iop scls_scr]# [/shell]

2 As root, reconfigure Oracle Restart:

[shell]
[root@wxwl_iop scls_scr]# /u01/app/11.2.0/grid/crs/install/roothas.pl
2013-11-15 10:02:36: Checking for super user privileges
2013-11-15 10:02:36: User has super user privileges
2013-11-15 10:02:36: Parsing the host name
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
CRS-4664: Node wxwl_iop successfully pinned.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
ADVM/ACFS is not supported on centos-release-5-6.el5.centos.1

wxwl_iop 2013/11/15 10:02:49 /u01/app/11.2.0/grid/cdata/wxwl_iop/backup_20131115_100249.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
[root@wxwl_iop scls_scr]#[/shell]

After Oracle Restart is reconfigured, the configuration for the old hostname under /etc/oracle/scls_scr is removed automatically and a new configuration for the new hostname appears.

[shell]
[root@wxwl_iop scls_scr]# ll
total 4
drwxr-xr-x 4 root oinstall 4096 Nov 15 10:02 wxwl_iop
[root@wxwl_iop scls_scr]#
[/shell]

At this point the grid user can run crs_stat -t to view the service status without errors:

[shell]
wxwl_iop-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora....on.type OFFLINE OFFLINE
wxwl_iop-> [/shell]

3 Use srvctl to add the listener, ASM and the Oracle instance to Oracle Restart manually

As the grid user, add the listener and ASM to Oracle Restart

[shell]
wxwl_iop-> srvctl add listener
wxwl_iop-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type OFFLINE OFFLINE
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora....on.type OFFLINE OFFLINE
wxwl_iop-> srvctl add asm -l listener -d data,fra
wxwl_iop-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type OFFLINE OFFLINE
ora.asm ora.asm.type OFFLINE OFFLINE
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora....on.type OFFLINE OFFLINE
wxwl_iop-> [/shell]

As the oracle user, add the database to Oracle Restart
Adding it as the grid user fails:

[shell]
wxwl_iop-> srvctl add database -d iopdb -o /u01/app/oracle/product/11.2.0/db_1
PRCD-1025 : Failed to create database iopdb
PRKH-1014 : Current user grid is not the same as oracle owner oracle of oracle home /u01/app/oracle/product/11.2.0/db_1.
wxwl_iop->
[/shell]

It has to be added as the oracle user:

[shell]
wxwl_iop-> srvctl add database -d normaldb -o /u01/app/oracle/product/11.2.0/db_1 -n normaldb -a 'data,fra'
wxwl_iop->
[/shell]

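Finally, in a database environment managed by Oracle Restart, start the listener and the ASM instance with srvctl as the grid user, and start the database with srvctl as the oracle user.

Lesson: on a server where an Oracle database is already configured, do not change the hostname or IP address lightly, especially in a RAC environment or, from 11gR2 on, on a standalone server with Oracle Restart. Thorough planning before installing and configuring the database is essential. If a change really cannot be avoided, involve the DBA so that the configuration can be redone together.

Postscript: why this post checks /etc/oracle/scls_scr for hostname-specific Oracle Restart configuration will be covered in the next post O(∩_∩)O~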

What do the orainstRoot.sh and root.sh scripts actually do during Grid software installation?

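On Linux, starting with Oracle 11gR2, whether the database is single-instance or RAC, using ASM normally requires installing the Grid Infrastructure software separately. When the Oracle 11gR2 Grid software is installed on Linux through the graphical installer [OUI, Oracle Universal Installer], it prompts you to run the orainstRoot.sh and root.sh scripts as root [see Figure 1 for a single-instance installation and Figure 2 for a RAC installation].

Figure 1: Scripts to run for the grid software when installing a single instance on Linux

Figure 2: Scripts to run for the grid software when installing RAC on Linux

So what does each of these scripts do? This post is a brief record:

I What /u01/app/oraInventory/orainstRoot.sh does

1 Grants the grid user and the oinstall group read and write permission on the Oracle Central Inventory directory;

By default the Oracle Central Inventory directory sits one level above $ORACLE_BASE, as shown in Figure 3:

Figure 3: Path of the Oracle Central Inventory

2 Removes read, write and execute permission on that directory for all other users;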

[shell]11gOCM-> pwd
/u01/app
11gOCM-> ll
total 20
drwxrwxr-x. 3 grid oinstall 4096 Oct 28 10:38 11.2.0
drwxr-xr-x 2 grid oinstall 4096 Oct 30 10:20 acfs
drwxrwxr-x. 8 grid oinstall 4096 Oct 28 10:38 grid
drwxrwxr-x. 8 oracle oinstall 4096 Nov 5 10:57 oracle
drwxrwx---. 6 grid oinstall 4096 Oct 10 11:09 oraInventory
11gOCM->[/shell]

3 Creates the /etc/oraInst.loc file.

11gOCM-> cat /etc/oraInst.loc
inventory_loc=/u01/app/oraInventory
inst_group=oinstall
11gOCM->
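The file is a plain list of key=value lines, so it is easy to read from a script. The following is an illustrative sketch only; `parse_orainst_loc` is a hypothetical helper, and the sample text is the content shown above rather than a file read from disk.

```python
def parse_orainst_loc(text):
    """Parse key=value lines (the layout of /etc/oraInst.loc) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """inventory_loc=/u01/app/oraInventory
inst_group=oinstall
"""
print(parse_orainst_loc(sample))
# {'inventory_loc': '/u01/app/oraInventory', 'inst_group': 'oinstall'}
```

The two keys it returns correspond to the inventory directory and the installation group that orainstRoot.sh set up in steps 1 and 2.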

II What /u01/app/11.2.0/grid/root.sh does

1 Copies the oraenv, dbhome and coraenv executables from the grid user's $ORACLE_HOME/bin to /usr/local/bin;

11gOCM-> pwd
/u01/app/11.2.0/grid/bin
11gOCM-> ll oraenv dbhome coraenv 
-rwxr-xr-x. 1 grid oinstall 5778 Jan  1  2000 coraenv
-rwxr-xr-x. 1 grid oinstall 2415 Jan  1  2000 dbhome
-rwxr-xr-x. 1 grid oinstall 6183 Jan  1  2000 oraenv
11gOCM-> ll /usr/local/bin/
total 232
-rwxr-xr-x. 1 grid root   5778 Oct 10 10:54 coraenv
-rwxr-xr-x. 1 grid root   2415 Oct 10 10:54 dbhome
-rwxr-xr-x. 1 grid root   6183 Oct 10 10:54 oraenv
-rwxr-xr-x  1 root root 214001 Oct 10 16:39 rlwrap
11gOCM->

2 Creates the /etc/oratab file;

 
11gOCM-> cat /etc/oratab 
#Backup file is  /u01/app/11.2.0/grid/srvm/admin/oratab.bak.11gocm line added by Agent
#

# This file is used by ORACLE utilities.  It is created by root.sh
# and updated by either Database Configuration Assistant while creating
# a database or ASM Configuration Assistant while creating ASM instance.

# A colon, ':', is used as the field terminator.  A new line terminates
# the entry.  Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively.  The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
+ASM:/u01/app/11.2.0/grid:N
orcl:/u01/app/oracle/product/11.2.0/db_1:N              # line added by Agent
11gOCM->
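The entry format described in the file's own comments ($ORACLE_SID:$ORACLE_HOME:<N|Y>) can be parsed in a few lines. This is a minimal sketch with a hypothetical helper name, `parse_oratab`; it assumes Linux-style paths without colons and treats everything after a '#' as a comment, which covers the trailing "line added by Agent" remark.

```python
def parse_oratab(text):
    """Return (sid, oracle_home, autostart) tuples from oratab-style text."""
    entries = []
    for raw in text.splitlines():
        # Drop full-line and trailing comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a SID:HOME:FLAG entry
        sid, home, flag = fields[0], fields[1], fields[2]
        entries.append((sid, home, flag.upper() == "Y"))
    return entries


sample = """# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME:<N|Y>:
+ASM:/u01/app/11.2.0/grid:N
orcl:/u01/app/oracle/product/11.2.0/db_1:N              # line added by Agent
"""
for entry in parse_oratab(sample):
    print(entry)
```

For the sample above, both entries come back with the autostart flag False, matching the N in the third field.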

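3 Creates the OCR keys for the grid user;

4 Starts the ohasd daemon and adds its startup entry to /etc/inittab so that ohasd starts together with the operating system. On Oracle Enterprise Linux 6 and later, this configuration is instead written to a separate file, /etc/init/oracle-ohasd.conf.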

11gOCM-> cat /etc/init/oracle-ohasd.conf 
# Copyright (c) 2001, 2011, Oracle and/or its affiliates. All rights reserved. 
#
# Oracle OHASD startup

start on runlevel [35]
stop  on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1

This is a change in Oracle Enterprise Linux 6 and later: entries that were controlled through /etc/inittab on earlier releases have moved to individual files under /etc/init.

Which components does Oracle Grid Infrastructure include, and what are its new features?

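Starting with 11gR2, Oracle introduced Grid Infrastructure [which I habitually call the Grid software]. So what advantages does Grid Infrastructure have, and which components does it include?

First, Grid Infrastructure includes the following components: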

ASM: Automatic Storage Management;

ACFS: ASM Cluster File System;

ACFS snapshot;

Oracle Clusterware;

Oracle Restart.

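Next, what new features does Grid Infrastructure bring?

To sum up its advantages briefly, as the component list shows: in a RAC environment there is no need to install separate clusterware software; installing grid is enough. In a single-instance environment, if the database is to use ASM storage, the grid software must be installed first and the ASM disk groups created; only then can the oracle software be installed and the database placed on an ASM disk group when it is created.

Also: for a standalone (non-RAC) database, Oracle Restart can take over failure handling for components including database services, the database instance, the ASM instance, ASM disk groups and the listener. For example, if the listener process terminates unexpectedly or the database crashes while running, Oracle Restart steps in and restarts them.

Finally, how has ASM in 11gR2 been enhanced compared with earlier versions?

See the figure below:

[Figure: grid_asm]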

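The figure shows that before 11gR2 [that is, in 11gR1 and 10g], ASM sits on top of the operating system layer: it acts as the driver through which the database accesses the storage devices and provides storage to the database above it, which in turn serves the applications at the top.

From 11gR2 on, ASM still sits on top of the OS layer, but alongside it there is now ADVM [ASM Dynamic Volume Manager]. One layer up, next to the database, there are ACFS [ASM Cluster File System] and support for third-party file systems.

Addendum: a brief description of how ASM disks, ASM disk groups, ADVM and ACFS relate.

A physical disk, a partition of a physical disk, a LUN on a storage device or an LVM volume can each be made into an ASM disk, and one or more ASM disks can be combined logically into an ASM disk group.

Once an ASM disk group exists, one or more ASM dynamic volumes can be created on it with the graphical tools [ASMCA, OEM] or the command-line tools [ASMCMD, SQL*Plus]. The operating system sees each ADVM volume device file as a block device, located at /dev/asm/<volume name>-nnn.

The mkfs command can then create an ACFS file system on the ADVM volume [for example: mkfs -t acfs -n advm-volume-name /dev/asm/<volume name>-nnn]. Finally, /dev/asm/<volume name>-nnn can be mounted on the operating system just like any other device.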

Index access paths, part 3: Index Skip Scan

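[Figure: Index_Skip_Scan]

The previous two posts covered Index Unique Scan and Index Range Scan; this one covers Index Skip Scan.

 An Index Skip Scan is the access path the CBO chooses when the WHERE clause of a query "skips" (does not reference) the leading column of a composite index and specifies only its other columns.

The fewer distinct values the leading column has (low selectivity) and the more distinct values the non-leading column has (high selectivity), the better an Index Skip Scan performs.

Internally, Oracle treats the composite index as a number of logical subindexes. The number of logical subindexes equals the number of distinct values in the leading column: one subindex per distinct leading value.

When Index Skip Scan applies:

1 The index must be a composite index (it may be a UNIQUE or a NONUNIQUE index);
2 The leading column has many repeated values (few distinct values), while the non-leading column has few repeated values (many distinct values);
3 The WHERE clause does not reference the leading column of the composite index and filters on a non-leading column only;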
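The idea of logical subindexes can be illustrated with a toy model. The sketch below is not Oracle's implementation: it models a composite index as a sorted Python list of (leading, trailing) tuples, and the function name `skip_scan` is hypothetical. It probes each group of equal leading values once with a binary search, which is why the cost grows with the number of distinct leading values.

```python
from bisect import bisect_left, bisect_right


def skip_scan(index, trailing_value):
    """Search a sorted list of (leading, trailing) keys without a leading value.

    Returns (matches, probes), where probes is the number of logical
    subindexes (distinct leading values) that were searched.
    """
    leads = [key[0] for key in index]
    matches, probes, pos = [], 0, 0
    while pos < len(index):
        leading = leads[pos]
        end = bisect_right(leads, leading, pos)  # end of this logical subindex
        probes += 1
        hit = bisect_left(index, (leading, trailing_value), pos, end)
        while hit < end and index[hit][1] == trailing_value:
            matches.append(index[hit])
            hit += 1
        pos = end  # skip to the next distinct leading value
    return matches, probes


index = sorted([("TABLE", 1), ("TABLE", 5), ("INDEX", 2), ("INDEX", 5), ("VIEW", 3)])
print(skip_scan(index, 5))
# ([('INDEX', 5), ('TABLE', 5)], 3)
```

With three distinct leading values the toy scan makes three probes regardless of how many rows each group holds, which mirrors why few distinct leading values favour this access path.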

Test and verification:

1 Create the test table:

SQL> conn hr/hr
Connected.
SQL> create table skip_t as select object_id,object_name,object_type from all_objects;

Table created.

SQL> select count(*) from skip_t;

  COUNT(*)
----------
     55683

SQL> select count(distinct object_type) from skip_t;

COUNT(DISTINCTOBJECT_TYPE)
--------------------------
                        25

SQL> select count(distinct object_id) from skip_t;

COUNT(DISTINCTOBJECT_ID)
------------------------
                   55683

SQL> select count(distinct object_name) from skip_t;

COUNT(DISTINCTOBJECT_NAME)
--------------------------
                     30902

SQL>

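The output above shows the data in skip_t: 55683 rows, OBJECT_ID unique in every row with no repeated values, and only 25 distinct OBJECT_TYPE values.
2 Next, create a composite index on OBJECT_TYPE and OBJECT_ID and gather statistics: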

SQL> create index idx_skip_t on skip_t(object_type,object_id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'skip_t',cascade=>true);

PL/SQL procedure successfully completed.

SQL>

3 "Skipping" the leading column of the composite index, so that the CBO chooses the Index Skip Scan access path as expected:

SQL> set autotrace trace
SQL> select object_type,object_id from skip_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2119292092

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |     1 |    15 |    26   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IDX_SKIP_T |     1 |    15 |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"=30000)
       filter("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        611  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select object_type,object_name from skip_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1237951313

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |    40 |    27   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| SKIP_T     |     1 |    40 |    27   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IDX_SKIP_T |     1 |       |    26   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=30000)
       filter("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        640  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

4 Queries that use the leading column of the index:

SQL> select object_type,object_id from skip_t where object_type='TABLE';

136 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3840590361

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |  2227 | 33405 |     9   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_SKIP_T |  2227 | 33405 |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_TYPE"='TABLE')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         12  consistent gets
          0  physical reads
          0  redo size
       3592  bytes sent via SQL*Net to client
        623  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        136  rows processed

SQL> select object_type,object_name from skip_t where object_type='TABLE';

136 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1483414757

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  2227 | 80172 |    43   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| SKIP_T     |  2227 | 80172 |    43   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_SKIP_T |  2227 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_TYPE"='TABLE')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         40  consistent gets
          0  physical reads
          0  redo size
       6671  bytes sent via SQL*Net to client
        623  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        136  rows processed

SQL>

Encountering ORA-17410 and ORA-03137: TTC protocol internal error: [12333]

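This morning a colleague on a project team sent a support request about an application failing to connect to the database; the details were in the error screenshot below:

[Figure: ora-17410]

This was the first time I had ever seen this error. Here is a brief record of how I approached it:

Environment: a single-instance Oracle 11gR2 [11.2.0.1.0] database on Linux X86_64. The application accesses the data through the JDBC driver used by Tomcat. Details:

OS: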

[oracle@oracle11g ~]$ uname -rm
2.6.18-194.el5 x86_64
[oracle@oracle11g ~]$

Database:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL>

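My colleague said that logging in to the application and accessing the database had worked fine last Friday, and that the error only started this morning, which made it all the more puzzling.

1 First, I asked my colleague to log in to the application and reproduce the error, then checked the alert log and found the following clue: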

Mon Sep 16 09:43:24 2013
Errors in file /u01/app/oracle/diag/rdbms/glndb/GLNDB/trace/GLNDB_ora_11692.trc  (incident=78959):
ORA-03137: TTC 协议内部错误: [12333] [6] [88] [77] [] [] [] []
Incident details in: /u01/app/oracle/diag/rdbms/glndb/GLNDB/incident/incdir_78959/GLNDB_ora_11692_i78959.trc
Mon Sep 16 09:43:27 2013
Trace dumping is performing id=[cdmp_20130916094327]
Mon Sep 16 09:43:28 2013
Sweep [inc][78959]: completed
Sweep [inc2][78959]: completed
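When an alert log contains many such entries, the trace file, incident number and bracketed error arguments can be pulled out with regular expressions. This is a sketch written against the two line shapes shown above only; the helper names are hypothetical, and other alert log formats may need different patterns.

```python
import re


def parse_incident_line(line):
    """Extract (trace_file, incident_id) from an 'Errors in file ...' line."""
    m = re.match(r"Errors in file (\S+)\s+\(incident=(\d+)\):", line.strip())
    return (m.group(1), int(m.group(2))) if m else None


def parse_ora_line(line):
    """Extract (error_code, bracketed_arguments) from an 'ORA-nnnnn: ...' line."""
    m = re.match(r"(ORA-\d+):", line.strip())
    if not m:
        return None
    return m.group(1), re.findall(r"\[([^\]]*)\]", line)


print(parse_incident_line(
    "Errors in file /u01/app/oracle/diag/rdbms/glndb/GLNDB/trace/GLNDB_ora_11692.trc  (incident=78959):"
))
print(parse_ora_line("ORA-03137: TTC 协议内部错误: [12333] [6] [88] [77] [] [] [] []"))
```

The first argument returned for the ORA-03137 line, 12333, is the value to search for in the support notes.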

2 Following the alert log, I looked at /u01/app/oracle/diag/rdbms/glndb/GLNDB/trace/GLNDB_ora_11692.trc and saw:

*** 2013-09-16 09:43:24.912
*** SESSION ID:(410.54431) 2013-09-16 09:43:24.912
*** CLIENT ID:() 2013-09-16 09:43:24.912
*** SERVICE NAME:(SYS$USERS) 2013-09-16 09:43:24.912
*** MODULE NAME:(JDBC Thin Client) 2013-09-16 09:43:24.912
*** ACTION NAME:() 2013-09-16 09:43:24.912

--- PROTOCOL VIOLATION DETECTED ---
...
...
...
  NamespaceDump:  
            Child Cursor:  Heap0=0x9fae5148 Heap6=0xb8bc6b48 Heap0 
Load Time=09-16-2013 09:43:24 Heap6 Load Time=09-16-2013 09:43:24 
  NamespaceDump:  
    Parent Cursor:  sql_id=4x3cjtfs37n4v parent=0xb21328e8 maxchild=5 plk=y ppn=n 
      Current Cursor Sharing Diagnostics Nodes:  
        Child Node: 3  ID=40 reason=Bind mismatch(33) size=2x4 
          init ranges in first pass: 1 
          selectivity: 0 
        Child Node: 2  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 3055622408 
          temp handle: 2705674440 
          schema: 161 
          synonym object number: 0 
        Child Node: 1  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 1 
      Aged Out Cursor Sharing Diagnostic Nodes:  
        Child Node: 3  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 2705674440 
          temp handle: 3055622408 
          schema: 160 
          synonym object number: 0 
        Child Node: 2  ID=40 reason=Bind mismatch(25) size=0x0 
          extended cursor sharing:   
        Child Node: 1  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 1 
        Child Node: 0  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 3055622408 
          temp handle: 2705674440 
          schema: 161 
          synonym object number: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 3  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 3055622408 
          temp handle: 2705674440 
          schema: 161 
          synonym object number: 0 
        Child Node: 2  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 2 
        Child Node: 1  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 2705674440 
          temp handle: 3055622408 
          schema: 160 
          synonym object number: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 1  ID=37 reason=Authorization Check failed(4) size=5x4 
          translation table position: 0 
          original handle: 2705674440 
          temp handle: 3055622408 
          schema: 160 
          synonym object number: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0 
        Child Node: 0  ID=3 reason=Optimizer mismatch(13) size=3x4 
          kxscflg: 32 
          kxscfl4: 4194560 
          dnum_kksfcxe: 0    kkscs=0xb2132df0 nxt=0xb21331f8 flg=11 cld=0 
hd=0xa1a00330 par=0xb21328e8
   Mutex 0xb2132df0(0, 0) idn 3000000000
   ct=19 hsh=0 unp=(nil) unn=0 hvl=b2133798 nhv=0 ses=(nil)
   hep=0xb2132e80 flg=80 ld=0 ob=(nil) ptr=(nil) fex=(nil)
   kkscs=0xb21331f8 nxt=0xb9a94588 flg=1a cld=1 hd=0xaf246848 par=0xb21328e8
   Mutex 0xb21331f8(0, 0) idn 0
   ct=12 hsh=0 unp=(nil) unn=0 hvl=b21336e0 nhv=0 ses=(nil)
   hep=0xb2133288 flg=80 ld=1 ob=0xb2c09860 ptr=0xab22a250 fex=0xab2295f0
   kkscs=0xb9a94588 nxt=0xb9a94930 flg=18 cld=2 hd=0xaf3c53e8 par=0xb21328e8
   Mutex 0xb9a94588(0, 0) idn 0
   ct=13 hsh=0 unp=(nil) unn=0 hvl=b21336a0 nhv=0 ses=(nil)
   hep=0xb9a94618 flg=80 ld=1 ob=0xb97655a0 ptr=0xa4e14f58 fex=0xa4e142f8
   kkscs=0xb9a94930 nxt=0xa840b1b0 flg=14 cld=3 hd=0xa1515da8 par=0xb21328e8
   Mutex 0xb9a94930(0, 0) idn 0
   ct=14 hsh=0 unp=(nil) unn=0 hvl=b2133638 nhv=1 ses=0xc0467f40
   hsv[0]=0
   hep=0xb9a949c0 flg=80 ld=1 ob=0xa81e1108 ptr=0xb16c2ea0 fex=0xb16c2240
   kkscs=0xa840b1b0 nxt=(nil) flg=18 cld=4 hd=0xaf3038c8 par=0xb21328e8
   Mutex 0xa840b1b0(0, 0) idn 0
   ct=19 hsh=0 unp=(nil) unn=0 hvl=b2133660 nhv=0 ses=(nil)
   hep=0xa840b240 flg=80 ld=1 ob=0x9fae5060 ptr=0xb8bc6b48 fex=0xb8bc5ee8
cursor instantiation=0x2ac85fded0d8 used=1379295804 exec_id=16777397 exec=1
 child#4(0xaf3038c8) pcs=0xa840b1b0
  clk=0xb6c31f68 ci=0x9fae5148 pn=0xaf984ba0 ctx=0xb8bc6b48
 kgsccflg=0 llk[0x2ac85fded0e0,0x2ac85fded0e0] idx=0
 xscflg=c0110676 fl2=5d000008 fl3=42222008 fl4=180
 sharing failure(s)=62000
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=01 mxl=32(24) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000010 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=2ac85fdecae0  bln=32  avl=00  flg=05
 Frames pfr 0x2ac85fdecea8 siz=27104 efr 0x2ac85fdecd98 siz=27088
 kxscphp=0x2ac85fdc03d8 siz=984 inu=456 nps=408
 kxscbhp=0x2ac85fdc01f8 siz=984 inu=176 nps=56
 kxscwhp=0x2ac85fdc0798 siz=16352 inu=15344 nps=5160
Starting SQL statement dump
SQL Information
user_id=161 user_name=WX_TMS_TEST module=JDBC Thin Client action=
sql_id=4x3cjtfs37n4v plan_hash_value=-785040234 problem_type=4
----- Current SQL Statement for this session (sql_id=4x3cjtfs37n4v) -----
select distinct viewbutton5_.SYS_VIEW_BUTTON_ID as SYS1_155_, 
viewbutton5_.RECORD_VERSION as RECORD2_155_, viewbutton5_.BTN_NAME as BTN3_155_, viewbutton5_.BTN_TITLE_KEY as BTN4_155_, viewbutton5_.BTN_MSG_KEY as BTN5_155_,
viewbutton5_.BUTTON_URL as BUTTON6_155_, viewbutton5_.SYS_MENU_ITEM_ID as SYS7_155_ 
from SYS_ROLE_MENU_BUTTON rolemenubu0_ left outer join SYS_ROLE_MENU_ITEM rolemenuit1_ 
on rolemenubu0_.SYS_ROLE_MENU_ITEM_ID=rolemenuit1_.SYS_ROLE_MENU_ITEM_ID left outer 
join SYS_ROLE role2_ on rolemenuit1_.SYS_ROLE_ID=role2_.SYS_ROLE_ID left outer join 
SYS_ROLE_USER roleusers3_ on role2_.SYS_ROLE_ID=roleusers3_.SYS_ROLE_ID left 
outer join SYS_USER user4_ on roleusers3_.USER_ID=user4_.USER_ID inner join 
SYS_VIEW_BUTTON viewbutton5_ on 
rolemenubu0_.SYS_VIEW_BUTTON_ID=viewbutton5_.SYS_VIEW_BUTTON_ID where user4_.USER_CODE=:1
sql_text_length=837
sql=select distinct viewbutton5_.SYS_VIEW_BUTTON_ID as SYS1_155_, viewbutton5_.RECORD_VERSION as RECORD2_155_, viewbutton5_.BTN_NAME as BTN3_155_, viewbutton5_.BTN_TITLE_KEY as BTN4_155_, viewbutton5_.BTN_MSG_KEY as BTN5_155_, viewbutton5_.BUTTON_URL as BUTTON
sql=6_155_, viewbutton5_.SYS_MENU_ITEM_ID as SYS7_155_ from SYS_ROLE_MENU_BUTTON rolemenubu0_ 
left outer join SYS_ROLE_MENU_ITEM rolemenuit1_ on rolemenubu0_.SYS_ROLE_MENU_ITEM_ID=rolemenuit1_.SYS_ROLE_MENU_ITEM_ID left outer 
join SYS_ROLE role2_ on rolemenuit
sql=1_.SYS_ROLE_ID=role2_.SYS_ROLE_ID left outer join SYS_ROLE_USER roleusers3_ on role2_.SYS_ROLE_ID=roleusers3_.SYS_ROLE_ID left outer join SYS_USER user4_ on roleusers3_.USER_ID=user4_.USER_ID inner join SYS_VIEW_BUTTON viewbutton5_ on 
rolemenubu0_.SYS_VIEW
sql=_BUTTON_ID=viewbutton5_.SYS_VIEW_BUTTON_ID where user4_.USER_CODE=:1
...
...

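3 At this point I had no good idea how to proceed. A colleague then pointed out that the same database, accessed from a different application environment through WebLogic, does not raise the error. Could a different JDBC driver be the cause? I turned to MetaLink and found the note Troubleshooting ORA-3137 [12333] Errors Encountered When Using Oracle JDBC Driver (Doc ID 1361107.1).

According to that note, the possible causes of this error are:

① Certain versions of the JDBC driver can trigger the error: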

Bug 9445675  NO MORE DATA TO READ FROM SOCKET WHEN USING END-TO-END METRICS

This bug does affect the JDBC driver.  This bug may be the cause when all of the following conditions are met:

  1. You are using the 10.1.x.x or the 11.2.0.1 JDBC driver; the bug does not affect 10.2.x.x, or 11.1.x.x versions of the driver, nor versions 11.2.0.2 or above
  2. You are using end-to-end metrics in your Java code
  3. The server side ORA-3137 [12333] error is accompanied by the client side Java exception “No more data to read from socket”

This bug is fixed in the 11.2.0.2 version of the JDBC driver and above.  It is discussed in the following notes:

Note 9445675.8 Bug 9445675 – “No more data” / ORA-3137 using end to end metrics with JDBC Thin

Note 1081275.1 “java.sql.SQLRecoverableException: No more data to read from socket” is Thrown When End-to-end Metrics is Used

② Bugs in the database itself can also cause the error:

Unpublished Bug:9703463 – ORA-3137 [12333] or ORA-600 [kpobav-1] When Using Bind Peeking

This bug affects versions 11.1.0.6, 11.1.0.7, and 11.2.0.1 of the RDBMS.  It is fixed in version 11.2.0.2 of the database.

It can also occur intermittently; similarly to unpublished Bug:8625762, this is a bind peeking bug.

==================================================

Unpublished Bug 9373370  DATA BASE RETURNS WRONG CURSORID WHEN THERE IS AN ORA-01013

This bug affects the 10.2, 11.1, 11.2.0.1, and 11.2.0.2 databases.  It is discussed in the following note:

Note 9373370.8 Bug 9373370 – The wrong cursor may be executed by JDBC thin following a query timeout / ORA-3137 [12333]

While the bug primarily manifests in ORA-1006 or ORA-1008 errors, the problem may also result in ORA-600 [12333] or ORA-3137 [12333] errors appearing on the server side.

The error in this case is most likely caused by Bug:9703463: the trace file shows that the SQL statement does use a bind variable, and the database version is 11.2.0.1.0.
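The version matching done above can be sketched as a small helper. This is only an illustration of the reasoning: the affected-version rules are transcribed from the note text quoted above, the function names are hypothetical, and the JDBC driver version in the usage line is a made-up value, since the actual driver versions are not stated here. A version match only narrows the candidates; the other conditions in the note (end-to-end metrics, bind peeking, query timeouts) still have to be checked by hand.

```python
def parse_version(version):
    """Turn a dotted version string such as '11.2.0.1.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def candidate_bugs(db_version, jdbc_version):
    """List the bugs quoted above whose affected versions match (hypothetical helper)."""
    db = parse_version(db_version)
    jdbc = parse_version(jdbc_version)
    bugs = []
    # Bug 9445675: JDBC driver 10.1.x.x or 11.2.0.1
    if jdbc[:2] == (10, 1) or jdbc[:4] == (11, 2, 0, 1):
        bugs.append("9445675")
    # Bug 9703463: RDBMS 11.1.0.6, 11.1.0.7, and 11.2.0.1
    if db[:4] in {(11, 1, 0, 6), (11, 1, 0, 7), (11, 2, 0, 1)}:
        bugs.append("9703463")
    # Bug 9373370: RDBMS 10.2, 11.1, 11.2.0.1, and 11.2.0.2
    if db[:2] in {(10, 2), (11, 1)} or db[:4] in {(11, 2, 0, 1), (11, 2, 0, 2)}:
        bugs.append("9373370")
    return bugs


# The database version is the one from this case; the driver version is assumed.
print(candidate_bugs("11.2.0.1.0", "11.2.0.3.0"))
```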

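Possible solutions:

① Switch to a different version of the JDBC driver to avoid the error. This also explains why my colleague did not hit the error in the other environment, where the database is accessed through WebLogic's JDBC driver;

② Patch the database; my initial assessment is that Patch:9703463 should resolve it;

③ Work around the error by changing a database parameter:

After the change, my colleague connected again with the same driver version used by Tomcat, and the error no longer occurred;

④ Upgrade the database directly to 11.2.0.3.0. We copied the project team's schema to an 11.2.0.3.0 database via export/import, connected again with the same driver version used by Tomcat, and the error no longer occurred either.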

Index Scan Access Paths, Part 2: Index Range Scan

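In the previous article, Index Scan Access Paths, Part 1: Index Unique Scan, I described in detail the circumstances under which Oracle's CBO chooses an Index Unique Scan to access data.

This article describes the scenarios in which the CBO chooses the Index Range Scan path.

An Index Range Scan is typically used when highly selective data is looked up through an index. By default, Oracle stores index entries in ascending order of the indexed column; entries with duplicate key values are stored in ascending order of their ROWID.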
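The storage order described above can be pictured with a toy model. This is a sketch of the ordering rule only, not of Oracle's B-tree implementation: the index is a plain Python list, the row data is invented, and the ROWIDs are shortened hypothetical strings compared as ordinary strings.

```python
# Hypothetical table rows as (rowid, location_id) pairs, in insertion order.
table_rows = [
    ("AAH", 2500),
    ("AAA", 1700),
    ("AAE", 1500),
    ("AAC", 1700),
    ("AAB", 1800),
]


def build_index(rows):
    """Return index entries (key, rowid) sorted by key, then by rowid."""
    # Tuples compare element by element, so duplicates of the same key
    # end up ordered by their rowid.
    return sorted((key, rowid) for rowid, key in rows)


index = build_index(table_rows)
for key, rowid in index:
    print(key, rowid)
```

The two entries for key 1700 sit next to each other, ordered by ROWID, which is why a range of key values can be read as one contiguous stretch of the index.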

Common WHERE conditions that lead to an Index Range Scan:

Col1=:b1

Col1>:b1

Col1>=:b1

Col1<:b1

Col1<=:b1

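The same applies to conditions of the above form combined with AND, as long as they cover the leading column(s) of the index, and to WHERE clauses that use BETWEEN…AND.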
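Each of the conditions listed above corresponds to a contiguous slice of a sorted key list. The sketch below illustrates this with Python's bisect module; it is a simplified model under the assumption that the index is a sorted list of keys, and the function name, its parameters, and the sample key values are all made up for illustration.

```python
from bisect import bisect_left, bisect_right


def range_scan(keys, low=None, high=None, low_inclusive=True, high_inclusive=True):
    """Return the contiguous slice of sorted keys that satisfies the bounds."""
    start = 0
    if low is not None:
        # >= starts at the first key equal to low, > starts just after the last one.
        start = bisect_left(keys, low) if low_inclusive else bisect_right(keys, low)
    end = len(keys)
    if high is not None:
        # <= ends after the last key equal to high, < ends before the first one.
        end = bisect_right(keys, high) if high_inclusive else bisect_left(keys, high)
    return keys[start:end]


keys = [1400, 1500, 1700, 1700, 1700, 1800, 2400, 2500, 2700]

print(range_scan(keys, low=1700, high=1700))         # Col1 = 1700
print(range_scan(keys, low=1700, low_inclusive=False))  # Col1 > 1700
print(range_scan(keys, high=1700))                   # Col1 <= 1700
print(range_scan(keys, low=1500, high=2600))         # Col1 BETWEEN 1500 AND 2600
```

In every case the result is found by locating a start position and an end position and reading what lies between them, which is the essence of a range scan.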

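As before, here is the verification:

The operating system, database version, platform, and test table are the same as in the previous article: an 11.2.0.1.0 database on Linux X86_64, using the departments table in the HR schema.

1 A unique index where the CBO chooses Index Range Scan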

SQL> set autotrace trace
SQL> select * from departments where department_id>260;

Execution Plan
----------------------------------------------------------
Plan hash value: 3346631158

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    20 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     1 |    20 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_ID_PK  |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID">260)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        743  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

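As shown above, even though only one row is fetched through the unique index [exactly one row satisfies department_id>260], the optimizer still chooses INDEX RANGE SCAN. The statement does not meet the conditions for an Index Unique Scan, which needs an equality predicate on every column of the unique index, but it does meet the conditions for an INDEX RANGE SCAN.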

2 Next, a typical INDEX RANGE SCAN scenario:

SQL> select * from departments where location_id=1700;

21 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    21 |   420 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    21 |   420 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    21 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID"=1700)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          1  physical reads
          0  redo size
       1468  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         21  rows processed

SQL>

Or:

SQL> select * from departments where location_id<=1700;

23 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    24 |   480 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    24 |   480 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    24 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID"<=1700)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1502  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         23  rows processed

SQL>

Or again:

SQL> select * from departments where location_id>=1700;

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    25 |   500 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    25 |   500 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    25 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID">=1700)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1561  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         25  rows processed

SQL>

As these runs show, whether the condition is location_id>=1700, location_id=1700, or location_id<=1700, the CBO uses the DEPT_LOCATION_IX index on the location_id column and fetches the data via INDEX RANGE SCAN.

3 Next, the BETWEEN…AND scenario:

SQL> select department_id,location_id,rowid from departments where location_id between 1500 and 2600;

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 735461860

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    26 |   494 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS      |    26 |   494 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_LOCATION_IX |    26 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID">=1500 AND "LOCATION_ID"<=2600)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1556  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         25  rows processed

SQL>

What if the BETWEEN…AND test is run on the department_id column instead, that is, BETWEEN…AND on a unique index?

SQL> select department_id,location_id,rowid from departments where department_id between 120 and 130;

Execution Plan
----------------------------------------------------------
Plan hash value: 3346631158

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     3 |    57 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     3 |    57 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | DEPT_ID_PK  |     3 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID">=120 AND "DEPARTMENT_ID"<=130)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        748  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

SQL>

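The test shows that even on a unique index the CBO still chooses INDEX RANGE SCAN.
Conversely, if the condition is changed to department_id between 120 and 120, the CBO is smart enough to choose Index Unique Scan this time, because the predicate reduces to department_id=120: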

SQL> select department_id,location_id,rowid from departments where department_id between 120 and 120;

Execution Plan
----------------------------------------------------------
Plan hash value: 4024094692

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    19 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     1 |    19 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | DEPT_ID_PK  |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID"=120)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        673  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
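The plans observed so far on the unique index can be summarized as a toy decision rule. This is only a restatement of the test results above, not the CBO's actual costing logic; the function name and parameters are hypothetical, and None stands for an open-ended bound.

```python
def observed_access_path(low, high, unique_index):
    """Mimic the plans seen in the tests above (toy rule, not the real CBO)."""
    # A closed range whose two bounds are equal is just an equality predicate,
    # and an equality predicate on a unique index can match at most one entry.
    if unique_index and low is not None and low == high:
        return "INDEX UNIQUE SCAN"
    return "INDEX RANGE SCAN"


print(observed_access_path(120, 130, unique_index=True))     # between 120 and 130
print(observed_access_path(120, 120, unique_index=True))     # between 120 and 120
print(observed_access_path(260, None, unique_index=True))    # department_id > 260
print(observed_access_path(1700, 1700, unique_index=False))  # location_id = 1700
```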

4 Finally, an INDEX RANGE SCAN DESCENDING scenario:

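The CBO chooses INDEX RANGE SCAN DESCENDING for queries with ORDER BY column_name DESC. By default, an index stores its entries in ascending order of the index key.

Naively, a query with ORDER BY column_name DESC would have to:

① first read the data through the index;

② then sort it in descending order on column_name.

With INDEX RANGE SCAN DESCENDING, Oracle instead walks the index in descending key order, which avoids looking up the data through the index first and then performing a separate descending sort.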
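The idea can be sketched with a sorted list standing in for the index. This is a simplified model, not Oracle's implementation: the entries are invented (key, rowid) pairs with shortened hypothetical ROWIDs, and the function name is made up. The point is that reading the matching stretch of the index backwards already yields descending order, so no sort step is needed.

```python
from bisect import bisect_left, bisect_right

# Index entries (key, rowid), stored in ascending order of key, then rowid.
index = [
    (1500, "AAE"),
    (1700, "AAA"),
    (1700, "AAC"),
    (1700, "AAI"),
    (1800, "AAB"),
    (2400, "AAD"),
    (2500, "AAH"),
    (2700, "AAG"),
]


def range_scan_descending(entries, low, high):
    """Return entries with low <= key <= high, read from the high end backwards."""
    keys = [key for key, _ in entries]
    start = bisect_left(keys, low)   # first position with key >= low
    end = bisect_right(keys, high)   # one past the last position with key <= high
    # Walk the matching stretch from its last entry down to its first.
    return [entries[i] for i in range(end - 1, start - 1, -1)]


for key, rowid in range_scan_descending(index, 1500, 2600):
    print(key, rowid)
```

Because the whole stretch is read in reverse, entries that share a key come back in descending ROWID order as well.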

SQL> set autotrace on
SQL> select department_id,location_id,rowid from departments where location_id between 1500 and 2600 order by location_id desc;

DEPARTMENT_ID LOCATION_ID ROWID
------------- ----------- ------------------
           80        2500 AAAz+PABRAAAACbAAH
           40        2400 AAAz+PABRAAAACbAAD
           20        1800 AAAz+PABRAAAACbAAB
          270        1700 AAAz+PABRAAAACbAAa
          260        1700 AAAz+PABRAAAACbAAZ
          250        1700 AAAz+PABRAAAACbAAY
          240        1700 AAAz+PABRAAAACbAAX
          230        1700 AAAz+PABRAAAACbAAW
          220        1700 AAAz+PABRAAAACbAAV
          210        1700 AAAz+PABRAAAACbAAU
          200        1700 AAAz+PABRAAAACbAAT

DEPARTMENT_ID LOCATION_ID ROWID
------------- ----------- ------------------
          190        1700 AAAz+PABRAAAACbAAS
          180        1700 AAAz+PABRAAAACbAAR
          170        1700 AAAz+PABRAAAACbAAQ
          160        1700 AAAz+PABRAAAACbAAP
          150        1700 AAAz+PABRAAAACbAAO
          140        1700 AAAz+PABRAAAACbAAN
          130        1700 AAAz+PABRAAAACbAAM
          120        1700 AAAz+PABRAAAACbAAL
          110        1700 AAAz+PABRAAAACbAAK
          100        1700 AAAz+PABRAAAACbAAJ
           90        1700 AAAz+PABRAAAACbAAI

DEPARTMENT_ID LOCATION_ID ROWID
------------- ----------- ------------------
           30        1700 AAAz+PABRAAAACbAAC
           10        1700 AAAz+PABRAAAACbAAA
           50        1500 AAAz+PABRAAAACbAAE

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2689299823

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    26 |   494 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | DEPARTMENTS      |    26 |   494 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN DESCENDING| DEPT_LOCATION_IX |    26 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOCATION_ID">=1500 AND "LOCATION_ID"<=2600)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1575  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         25  rows processed

SQL>

As the output shows, the rows with LOCATION_ID=1700 come back in descending ROWID order.