Encountering ORA-600 [kmgs_parameter_update_timeout_1] [1565], Continued

Right after posting the previous article, I came across some helpful information in an article by Lao Yang (http://blog.itpub.net/post/468/450451?SelectActiveLayout=a). And in this case, the alert log shows the following error message just before the ORA-600 was thrown:

Wed Jul 25 09:56:46  2012
Thread 1 advanced to log sequence 580 (LGWR switch)
  Current log# 1 seq# 580 mem# 0: +DATA/zhfr8db/onlinelog/group_1.271.783424599
  Current log# 1 seq# 580 mem# 1: +FLASH/zhfr8db/onlinelog/group_1.256.783424601
Wed Jul 25 10:49:39  2012
Unexpected communication failure with ASM instance:
 error 21561 (ORA-21561: OID generation failed
)


The later portion of the alert log, from when we tried to shut down the database server, contains similar errors:

Wed Jul 25 10:55:03  2012
Trace dumping is performing id=[cdmp_20120725105503]
Wed Jul 25 10:56:06  2012
Restarting dead background process MMON
MMON started with pid=33, OS id=6756
Wed Jul 25 12:46:20  2012
Unexpected communication failure with ASM instance:
 error 21561 (ORA-21561: OID generation failed
)
NOTE: ASMB process state dumped to trace file c:\oracle\product\10.2.0\admin\zhfr8db\udump\zhfr8db1_ora_5552.trc
Wed Jul 25 12:47:03  2012
Unexpected communication failure with ASM instance:
 error 21561 (ORA-21561: OID generation failed
)
NOTE: ASMB process state dumped to trace file c:\oracle\product\10.2.0\admin\zhfr8db\udump\zhfr8db1_ora_2008.trc
Wed Jul 25 12:50:01  2012
Unexpected communication failure with ASM instance:
 error 21561 (ORA-21561: OID generation failed
)

The trace file c:\oracle\product\10.2.0\admin\zhfr8db\udump\zhfr8db1_ora_5552.trc referenced above then shows:

*** 2012-07-25 12:46:20.268
*** CLIENT ID:() 2012-07-25 12:46:20.268
      ----------------------------------------
      SO: 000000047111DEF0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
      (process) Oracle pid=31, calls cur/top: 000000047E15DC20/000000047E15DC20, flag: (6) SYSTEM
                int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 33
              last post received-location: ksrpublish
              last process to post me: 7e11e6f8 1 6
              last post sent: 849 0 4
              last post sent-location: kslpsr
              last process posted by me: 7312aef8 1 6
        (latch info) wait_event=0 bits=0
        Process Group: DEFAULT, pseudo proc: 00000004731384B8
        O/S info: user: SYSTEM, term: DATACENTER01, ospid: 4828 
        OSD pid info: Windows thread id: 4828, image: ORACLE.EXE (ASMB)
        Short stack dump: 
ksdxfstk+42<-ksdxcb+1630<-ssthreadsrgruncallback+589<-OracleOradebugThreadStart+975<-0000000077D6B71A
<-0000000077EF047A<-0000000077DA79F3<-0000000008653328<-000000000865190C<-0000000005F564A9
<-0000000005F0CF64<-0000000005EE5D88<-0000000005EE57F9<-0000000005EA5ECB<-ttcdrv+14881
<-0000000005EAAA6D<-xupirtrc+1335<-xupirtr+216<-upirtr+23<-kpurcs+45
<-OCIKDispatch+32<-kfnOpExecute+146<-kfnbRun+1062<-ksbrdp+988<-opirip+700
<-opidrv+856<-sou2o+52<-opimai_real+268<-opimai+96<-BackgroundThreadStart+637<-0000000077D6B71A
        ----------------------------------------
        SO: 000000047114E1A0, type: 4, owner: 000000047111DEF0, flag: INIT/-/-/0x00
        (session) sid: 189 trans: 0000000000000000, creator: 000000047111DEF0, flag: (51) USR/- BSY/-/-/-/-/-
                  DID: 0001-001F-00000003, short-term DID: 0000-0000-00000000
                  txn branch: 0000000000000000
                  oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
        service name: SYS$BACKGROUND
        waiting for 'ASM background timer' wait_time=0, seconds since wait started=420165
                    =0, =0, =0
                    blocking sess=0x0000000000000000 seq=31
        Dumping Session Wait History
         for 'ASM background timer' count=1 wait_time=4.999949 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=4.999893 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=5.000022 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=4.999948 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=4.999924 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=5.000012 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=4.999948 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=4.999858 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=4.999991 sec
                    =0, =0, =0
         for 'ASM background timer' count=1 wait_time=5.000000 sec
                    =0, =0, =0
        Sampled Session History of session 189 serial 1
        ---------------------------------------------------
        The sampled session history is constructed by sampling
        the target session every 1 second. The sampling process
        captures at each sample if the session is in a non-idle wait,
        an idle wait, or not in a wait. If the session is in a
        non-idle wait then one interval is shown for all the samples
        the session was in the same non-idle wait. If the
        session is in an idle wait or not in a wait for
        consecutive samples then one interval is shown for all
        the consecutive samples. Though we display these consecutive
        samples  in a single interval the session may NOT be continuously
        idle or not in a wait (the sampling process does not know).
 
        The history is displayed in reverse chronological order.
 
        sample interval: 1 sec, max history 120 sec
        ---------------------------------------------------
          [121 samples,                                    12:44:20 - 12:46:20]
            idle wait at each sample
        temporary object counter: 0
          KTU Session Commit Cache Dump for IDLs: 
          KTU Session Commit Cache Dump for Non-IDLs: 
          ----------------------------------------
          UOL used : 0 locks(used=0, free=0)
          KGX Atomic Operation Log 000000047AECC840
           Mutex 0000000000000000(0, 0) idn 0 oper NONE
           Cursor Pin uid 189 efd 3 whr 11 slp 0
          KGX Atomic Operation Log 000000047AECC888
           Mutex 0000000000000000(0, 0) idn 0 oper NONE
           Library Cache uid 189 efd 0 whr 0 slp 0
          KGX Atomic Operation Log 000000047AECC8D0
           Mutex 0000000000000000(0, 0) idn 0 oper NONE
           Library Cache uid 189 efd 0 whr 0 slp 0
          ----------------------------------------
          SO: 000000045A233D80, type: 41, owner: 000000047114E1A0, flag: INIT/-/-/0x00
          (dummy) nxc=0, nlb=0   
        ----------------------------------------
        SO: 0000000472172A40, type: 11, owner: 000000047111DEF0, flag: INIT/-/-/0x00
        (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 000000047111DEF0,
                           event: 31, last message event: 31,
                           last message waited event: 31,                            next message: 0000000476225BC8(0), messages read: 0
                           channel: (00000004711640E0) system events broadcast channel
                                    scope: 2, event: 30690, last mesage event: 7413,
                                    publishers/subscribers: 1/45,
                                    messages published: 3
                                    oldest msg (?): 0000000476225BB8 id: 1 pub: 000000047E11F768
                                    heuristic msg queue length: 3
        ----------------------------------------
        SO: 0000000472270FA0, type: 19, owner: 000000047111DEF0, flag: INIT/-/-/0x00
         GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0 inc=0
          outq=0 sndq=0 opid=0 prmb=0x0 
          mbg[i]=(0 0) mbg[b]=(0 0) mbg[r]=(0 0)
          fmq[i]=(0 0) fmq[b]=(0 0) fmq[r]=(0 0)
          mop[s]=0 mop[q]=0 pendq=0 zmbq=0
          nonksxp_recvs=0
        ------------process 0x0000000472270FA0--------------------
        proc version      : 0
        Local node        : 0
        pid               : 4828
        lkp_node          : 0
        svr_mode          : 0
        proc state        : KJP_FROZEN
        Last drm hb acked : 0
        Total accesses    : 3
        Imm.  accesses    : 0
        Locks on ASTQ     : 0
        Locks Pending AST : 0
        Granted locks     : 0
        AST_Q: 
        PENDING_Q: 
        GRANTED_Q: 
        ----------------------------------------
        SO: 000000047E15DC20, type: 3, owner: 000000047111DEF0, flag: INIT/-/-/0x00
        (call) sess: cur 7114e1a0, rec 0, usr 7114e1a0; depth: 0
          ----------------------------------------
          SO: 000000045ED23770, type: 84, owner: 000000047E15DC20, flag: INIT/-/-/0x00
          (kfgso) flags: 00000000 clt: 3 err: 0 hint: 0
          (kfgpn) rpi: 1 itrn:0000000000000000 gst:0000000000000000 usrp:0000000000000000
          busy: 0 rep: 0 grp: 5d60b840 check: 0/0 glink: 5d60b888 5d60b888
            kfgrp:  number: 0/0 type: 0 compat: 0.0.0.0.0 dbcompat:0.0.0.0.0
            timestamp: 0 state: 0 flags: 2 gpnlist: 5ed237f0 5ed237f0
            KFGPN at 5ed23770 in dependent chain
        ----------------------------------------
        SO: 000000045AEEDD48, type: 16, owner: 000000047111DEF0, flag: INIT/-/-/0x00
        (osp req holder)
PSO child state object changes :
Dump of memory from 0x0000000474167DC0 to 0x0000000474167FC8
474167DC0 00000005 00000000 5AEEDD48 00000004  [........H..Z....]
474167DD0 00000010 000313F4 7E15DC20 00000004  [........ ..~....]
474167DE0 00000003 000313F4 72270FA0 00000004  [..........'r....]
474167DF0 00000013 000312CB 72172A40 00000004  [........@*.r....]
474167E00 0000000B 000313F4 7114E1A0 00000004  [...........q....]
474167E10 00000004 000312CB 00000000 00000000  [................]
474167E20 00000000 00000000 00000000 00000000  [................]
        Repeat 25 times
474167FC0 00000000 00000000                    [........]        
*** 2012-07-25 12:46:37.393
*** CLIENT ID:() 2012-07-25 12:46:37.393
WARNING:Could not lower the asynch I/O limit to 256 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 320 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 256 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 256 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 288 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 224 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 256 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 192 for SQL direct I/O. It is set to -1 
WARNING:Could not lower the asynch I/O limit to 160 for SQL direct I/O. It is set to -1 
*** 2012-07-25 12:47:20.314
*** CLIENT ID:() 2012-07-25 12:47:20.314
WARNING:Could not lower the asynch I/O limit to 256 for SQL direct I/O. It is set to -1

So could the problem be caused by these warnings:

WARNING:Could not lower the asynch I/O limit to 256 for SQL direct I/O. It is set to -1 

(repeated above with limits of 320, 288, 224, 192, and 160 as well) and, if so, what do these WARNINGs actually tell us?

Encountering ORA-600 [kmgs_parameter_update_timeout_1] [1565]

This morning, on a two-node 10.2.0.5.0 64-bit RAC database running on 64-bit Windows 2003, I ran into an ORA-600: ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [1565], [], [], [], [], [], []. The detailed error information captured from node 1's alert log is as follows:

Wed Jul 25 10:49:39  2012
Unexpected communication failure with ASM instance:
 error 21561 (ORA-21561: OID generation failed
)
NOTE: ASMB process state dumped to trace file c:\oracle\product\10.2.0\admin\zhfr8db\bdump\zhfr8db1_mmon_4624.trc
System State dumped to trace file c:\oracle\product\10.2.0\admin\zhfr8db\bdump\zhfr8db1_mmon_4624.trc
Wed Jul 25 10:55:02  2012
Errors in file c:\oracle\product\10.2.0\admin\zhfr8db\bdump\zhfr8db1_mmon_4624.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [1565], [], [], [], [], [], []
ORA-01565: error in identifying file '+DATA/zhfr8db/spfilezhfr8db.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/zhfr8db/spfilezhfr8db.ora
ORA-21561: OID generation failed

Wed Jul 25 10:55:03  2012
Trace dumping is performing id=[cdmp_20120725105503]
Wed Jul 25 10:56:06  2012
Restarting dead background process MMON
MMON started with pid=33, OS id=6756

The symptoms at the time: clients could not reach the database through the application, and tnsping service_name from the client side was likewise erratic, sometimes getting through in 10 ms and sometimes hanging.

Logging on to one of the node servers (node 1), lsnrctl status hung as well, while on the other node (node 2) lsnrctl status worked fine. Moreover, connecting to the database through SQL*Plus worked on both nodes, and crs_stat -t returned normal results on both.

Next, I generated an AWR report on each node; analyzing them turned up nothing abnormal in the database.

Since the alert log showed SPFILE-related errors, I tried create pfile from spfile in SQL*Plus and backup spfile with RMAN; both failed with errors. In the rush, I did not write down the exact error numbers.

Out of options, I went to node 1 and tried srvctl stop database -d db_unique_name to shut the database down; it hung with no response whatsoever. Trying a manual shutdown immediate of the instance through SQL*Plus on node 1 got nowhere either.

At the time I found articles online by Lao Yang and Xifenfei, but neither seemed to match my situation.

Then, for lack of a better option, I simply rebooted node 1's Windows server; after all, the application was already unusable, and with RAC there would still be at least one instance left. After the reboot, the database was back to normal.

Finally, I found this article on MetaLink: ORA-600 [kmgs_parameter_update_timeout_1], [1565] While Accessing Spfile Managed By ASM [ID 553915.1]. The document says the error affects databases from 10.2 onward, and the cause is:

This is due to unpublished bug 5399699 where ORA-600 [kmgs_parameter_update_timeout_1] or similar errors can occur in MMON when ASM is being used.

In 10g MMON manages all memory re-size activity by modifying related parameters. If MMON is not running DBW0 will handle this task. The parameter update activity is triggered by a timeout. Basically this error indicates that the MMON process is not able to write to the SPFILE to store some settings required for dynamic SGA parameter adjustments.

In other words, this error is an unpublished Oracle bug, bug number 5399699. In 10g, the MMON (Memory Monitor) process, new in that release, manages memory dynamically: starting with 10g the database can tune the SGA automatically, and when a resize (a dynamic grow or shrink) is needed, MMON carries it out and writes the change to the SPFILE.
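
As a rough illustration of that mechanism, here is a minimal sketch (the 2g value is made up, not taken from this system): with Automatic Shared Memory Management enabled, a nonzero sga_target lets MMON resize SGA components, and persisting such a setting is exactly the kind of SPFILE write the note describes:

SQL> show parameter sga_target
SQL> -- scope=both updates both memory and the spfile; with an ASM-managed
SQL> -- spfile this is the write path that was failing here
SQL> alter system set sga_target=2g scope=both sid='*';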

Putting this together with the alert log shown at the beginning: MMON could not write that information to the SPFILE, so the MMON background process subsequently hung; at Wed Jul 25 10:56:06 2012 it was restarted, and thereafter the database stayed in a "half-dead" state.

The solution that the MetaLink note offers:

Solution

1.  Upgrade to the 10.2.0.4.4 PSU Patch:9352164 or higher where this bug is fixed.

OR

2. Check if Patch:5399699 is available for your RDBMS release and platform.

OR

3.  Use one of the following workarounds:

  • Relocate the spfile either to some other diskgroup on which the archive logs are not being managed.
  • Move the spfile to the file system

Clearly, option 1 does not match the database version in this environment; as for option 2, no patch is available for the Windows 2003 x64 platform that I could find; and as for option 3, this is a RAC database, so moving the SPFILE onto a file system is not really suitable unless each instance is given its own PFILE.
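
For completeness, a hedged sketch of workaround 3 (the paths below are illustrative, not taken from this system, and create pfile from spfile naturally works only while the SPFILE is still readable): dump the ASM-managed SPFILE to a local PFILE on each node, then either start each instance from its local PFILE or rebuild a local SPFILE from it:

SQL> -- run on each node, adjusting the instance name in the file names
SQL> create pfile='C:\oracle\product\10.2.0\db_1\database\initzhfr8db1.ora' from spfile;
SQL> create spfile='C:\oracle\product\10.2.0\db_1\database\spfilezhfr8db1.ora' from pfile='C:\oracle\product\10.2.0\db_1\database\initzhfr8db1.ora';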

In the end, this problem still has no better solution and remains a headache for an Oracle novice like me. If you have run into a similar case, please do not hesitate to share!

 

A Record of Dynamically Extending Oracle Storage on IBM P750 Servers

This article describes in detail an online storage-expansion operation for an Oracle database on IBM P750 servers.

Background: two IBM P750 servers in an HA pair under HACMP run a single-instance Oracle 11gR2 database. Besides each server's two internal 300 GB disks, the shared storage is an IBM DS5100, providing 2.1 TB of usable space after RAID 10. The machine currently has two volume groups, rootvg and datavg: rootvg holds the AIX operating system on physical volumes from the two internal disks, while datavg serves the Oracle database, with its physical volumes coming from the array. Within datavg, the logical volume mounted at /oradata stores the database's data files, online redo logs, control files, and so on; /oraflash mainly holds archived logs and RMAN backups.

1 Before making any change, check the current file-system usage:

$ df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           1.00      0.78   23%    10542     6% /
/dev/hd2          10.00      7.48   26%    52002     3% /usr
/dev/hd9var        5.00      4.45   12%     8742     1% /var
/dev/hd3          10.00      7.22   28%      729     1% /tmp
/dev/hd1           0.50      0.49    2%      135     1% /home
/dev/hd11admin      0.50      0.50    1%        5     1% /admin
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       0.50      0.23   55%    10267    16% /opt
/dev/livedump      0.50      0.50    1%        4     1% /var/adm/ras/livedump
/dev/oracle      100.00     80.71   20%   357020     2% /u01
/dev/oradata     500.00    135.74   73%       39     1% /oradata
/dev/oraflash    500.00     58.62   89%     1186     1% /oraflash
$

As shown above, the /dev/oraflash file system mounted at /oraflash is 500 GB with 58 GB free, and the /dev/oradata file system mounted at /oradata is 500 GB with 135 GB free. With business volume growing rapidly, more storage is needed, so the plan is to extend the /oradata and /oraflash file systems by 300 GB each.

2 Before extending, check the volume-group information:

$ lsvg -o
datavg
rootvg
$ 

As shown, the volume groups currently varied on are rootvg and datavg.

3 Check the physical volumes in datavg:

$ lsvg -p datavg
datavg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            1599        0           00..00..00..00..00
hdisk3            active            1599        0           00..00..00..00..00
hdisk4            active            1199        397         00..00..00..157..240
hdisk5            active            1599        1598        320..319..319..320..320
hdisk6            active            1599        1599        320..320..319..320..320
hdisk7            active            1599        0           00..00..00..00..00
hdisk8            active            1599        0           00..00..00..00..00
hdisk9            active            1599        797         157..00..00..320..320
hdisk10           active            1599        1599        320..320..319..320..320
hdisk11           active            1599        1599        320..320..319..320..320
$ 

4 Check the logical volumes in datavg:

$ lsvg -l datavg
datavg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
oradata             jfs2       4000    4000    3    open/syncd    /oradata
loglv01             jfs2log    1       1       1    open/syncd    N/A
oraflash            jfs2       4000    4000    3    open/syncd    /oraflash
$

As shown above, both the oradata and oraflash logical volumes live in the datavg volume group.

5 Next, look at datavg in detail:

$ lsvg datavg
VOLUME GROUP:       datavg                   VG IDENTIFIER:  00f64a5100004c000000012d5e49cf72
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      15590 (1995520 megabytes)
MAX LVs:            256                      FREE PPs:       7589 (971392 megabytes)
LVs:                3                        USED PPs:       8001 (1024128 megabytes)
OPEN LVs:           3                        QUORUM:         6 (Enabled)
TOTAL PVs:          10                       VG DESCRIPTORS: 10
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         10                       AUTO ON:        no
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
PV RESTRICTION:     none
$

As shown, datavg's PP SIZE is 128 MB and TOTAL PPs is 15590, meaning the volume group totals 128 MB × 15590 ≈ 1948 GB; 8001 PPs (about 1000 GB) are in use and 7589 PPs (about 949 GB) are free. So the volume group still has space available.
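
Spelled out, the conversions behind those figures (numbers taken straight from the lsvg output above):

128 MB/PP × 15590 PPs = 1,995,520 MB ≈ 1948 GB  (TOTAL PPs)
128 MB/PP ×  8001 PPs = 1,024,128 MB ≈ 1000 GB  (USED PPs)
128 MB/PP ×  7589 PPs =   971,392 MB ≈  949 GB  (FREE PPs)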

6 Check the oraflash logical volume:

$ lslv oraflash
LOGICAL VOLUME:     oraflash               VOLUME GROUP:   datavg
LV IDENTIFIER:      00f64a5100004c000000012d5e49cf72.3 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs2                   WRITE VERIFY:   off
MAX LPs:            4000                   PP SIZE:        128 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                4000                   PPs:            4000
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    1024
MOUNT POINT:        /oraflash              LABEL:          /oraflash
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?:     NO
DEVICESUBTYPE : DS_LVZ
COPY 1 MIRROR POOL: None
COPY 2 MIRROR POOL: None
COPY 3 MIRROR POOL: None
$

Here the oraflash logical volume's MAX LPs, LPs, and PPs are all 4000, which means that if we extend the file system directly, its size after extension cannot exceed 4000 LPs; beyond that, the oraflash logical volume itself has to be extended first. The oraflash file system's type is jfs2.

7 While we are at it, check which physical volumes oraflash sits on:

$ lslv -l oraflash
oraflash:/oraflash
PV                COPIES        IN BAND       DISTRIBUTION
hdisk7            1599:000:000  20%           320:320:319:320:320
hdisk8            1599:000:000  20%           320:320:319:320:320
hdisk9            802:000:000   39%           163:320:319:000:000
$

8 Try extending the oraflash file system directly with smitty, adding 100 GB, i.e., growing it from the current 500 GB to 600 GB.

Since oraflash is a jfs2 file system, run smitty chjfs2 as root to enter the smitty screens. Select /oraflash and press Enter; on the next screen, set Unit Size to G and enter 600 under Number of units, i.e., a target size of 600 GB for the file system, then press Enter.

The result is an error: the request would push the oraflash logical volume past its maximum of 4000 LPs, so the extension fails. Clearly the logical volume's attributes have to be changed first.

9 So, first raise the oraflash logical volume's maximum LPs. The plan is to add 2400 LPs, going from the current 4000 to 6400; 2400 × 128 MB = 300 GB, a figure that needs to be worked out beforehand.

As root, run smitty chlv and choose the first item, Change a Logical Volume; select oraflash and press Enter. On the next screen, change MAXIMUM NUMBER of LOGICAL PARTITIONS to 6400 and press Enter.

The operation completes with status OK; press Esc+0 to exit.

10 Extend the /oraflash file system again. As root, run smitty chjfs2, select the /oraflash file system, again set Unit Size to G, and enter 800 under Number of units, i.e., a target size of 800 GB; since we have already raised the oraflash logical volume's maximum LPs by 2400 (2400 LPs × 128 MB/LP = 300 GB), the target size is 800 GB. Press Enter after making the change, and this time the extension succeeds.

Following the same recipe, use smit chlv to raise the oradata logical volume's MAX LPs to 6400, then run smit chjfs2 against the /oradata file system and extend it to 800 GB. (The non-interactive equivalents of these steps are sketched below.)
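
For reference, a hedged sketch of the same two steps done from the command line (smitty drives equivalent commands underneath; verify the options against your AIX release before running them):

# raise each logical volume's maximum number of logical partitions to 6400,
# then grow each jfs2 file system to 800 GB
chlv -x 6400 oraflash
chlv -x 6400 oradata
chfs -a size=800G /oraflash
chfs -a size=800G /oradata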

Finally, after the changes, the file-system usage looks like this:

# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           1.00      0.78   23%    10542     6% /
/dev/hd2          10.00      7.48   26%    52002     3% /usr
/dev/hd9var        5.00      4.45   12%     8742     1% /var
/dev/hd3          10.00      7.22   28%      729     1% /tmp
/dev/hd1           0.50      0.49    2%      135     1% /home
/dev/hd11admin      0.50      0.50    1%        5     1% /admin
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       0.50      0.23   55%    10267    16% /opt
/dev/livedump      0.50      0.50    1%        4     1% /var/adm/ras/livedump
/dev/oracle      100.00     80.71   20%   357030     2% /u01
/dev/oradata     800.00    435.70   46%       39     1% /oradata
/dev/oraflash    800.00    358.54   56%     1187     1% /oraflash
#

As shown, /oradata and /oraflash have grown from 500 GB to 800 GB. Meanwhile, datavg's FREE PPs have shrunk from 7589 to 2789, a drop of 7589 − 2789 = 4800 PPs, exactly the 2400 PPs added to each of oradata and oraflash:

# lsvg datavg
VOLUME GROUP:       datavg                   VG IDENTIFIER:  00f64a5100004c000000012d5e49cf72
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      15590 (1995520 megabytes)
MAX LVs:            256                      FREE PPs:       2789 (356992 megabytes)
LVs:                3                        USED PPs:       12801 (1638528 megabytes)
OPEN LVs:           3                        QUORUM:         6 (Enabled)
TOTAL PVs:          10                       VG DESCRIPTORS: 10
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         10                       AUTO ON:        no
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
PV RESTRICTION:     none
#

Moreover, the PPs of both oraflash and oradata have grown from 4000 to 6400:

# lsvg -l datavg
datavg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
oradata             jfs2       6400    6400    5    open/syncd    /oradata
loglv01             jfs2log    1       1       1    open/syncd    N/A
oraflash            jfs2       6400    6400    5    open/syncd    /oraflash
#

And the physical-volume layout used by oraflash and oradata has changed accordingly:

# lslv -l oradata
oradata:/oradata
PV                COPIES        IN BAND       DISTRIBUTION
hdisk2            1599:000:000  20%           320:320:319:320:320
hdisk3            1599:000:000  20%           320:320:319:320:320
hdisk4            1199:000:000  20%           240:240:239:240:240
hdisk11           1599:000:000  20%           320:320:319:320:320
hdisk5            404:000:000   78%           000:319:085:000:000
# lslv -l oraflash
oraflash:/oraflash
PV                COPIES        IN BAND       DISTRIBUTION
hdisk7            1599:000:000  20%           320:320:319:320:320
hdisk8            1599:000:000  20%           320:320:319:320:320
hdisk9            1599:000:000  20%           320:320:319:320:320
hdisk6            1599:000:000  20%           320:320:319:320:320
hdisk10           004:000:000   100%          000:004:000:000:000
#

With that, storage was added to Oracle dynamically on the IBM P750 servers, with the Oracle database running normally the whole time.

Solving a Performance Problem Caused by Implicit VARCHAR2/NVARCHAR2 Data Type Conversion

A while ago, a project team at my company asked for database tuning help: a fee-calculation stored procedure ran extremely slowly and needed optimization.

After getting the stored procedure, I skimmed the code, then used PL/SQL Developer to look at the execution plans of its SQL. The plan for the simple statement below looked off: it drove a full table scan of the main table CA_SE_MANIFEST, which holds a large volume of data, while TMP_COM_ID2 is merely a temporary table. Yet BL_NO_ID, the main table's primary key column with a unique index, was not being accessed through that index, so the statement could hardly perform well.

The SQL statement:

DELETE FROM TMP_COM_ID2 WHERE EXISTS(SELECT 1 FROM CA_SE_MANIFEST WHERE BL_NO_ID = C_ID AND DOCUMENT_TYPE IN ('1','2'))

The execution plan (screenshot not reproduced here) showed CA_SE_MANIFEST being read by full table scan.

Analysis showed the structure of TMP_COM_ID2 to be as follows: a single column, C_ID, whose data type is NVARCHAR2:

SQL> desc tmp_com_id2
Name Type          Nullable Default Comments 
---- ------------- -------- ------- -------- 
C_ID NVARCHAR2(20)                           
 
SQL> 

while the BL_NO_ID column of the main table CA_SE_MANIFEST is VARCHAR2:

SQL> desc ca_se_manifest
Name                      Type         Nullable Default Comments                                                                                                                                                                                                           
------------------ ------------------- -------- ------- -------------------------------- 
BL_NO_ID           VARCHAR2(20 BYTE)                    出口提单序号                                                                                                                                                                                                       
BL_NO              VARCHAR2(40 BYTE)                    同一船名、航次下、分公司的提单号不重复                                                                                                                                             
CARRIER_BL_NO      VARCHAR2(20 BYTE)   Y                船公司提单号 
...
...

This essentially pinpointed the cause: the VARCHAR2/NVARCHAR2 mismatch triggers an implicit data type conversion, which in turn defeats the index on CA_SE_MANIFEST's primary key column BL_NO_ID and ultimately produces the full table scan.
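
A minimal repro sketch of the mechanism (the table names here are illustrative, not the project's): because the national character set takes precedence, Oracle converts the VARCHAR2 side of the comparison, and the plan's predicate section then typically shows that column wrapped in SYS_OP_C2C(), which rules out a plain index on it:

CREATE TABLE t_main (id VARCHAR2(20) PRIMARY KEY);
CREATE TABLE t_tmp  (c_id NVARCHAR2(20));

EXPLAIN PLAN FOR
  DELETE FROM t_tmp
   WHERE EXISTS (SELECT 1 FROM t_main WHERE id = c_id);
SELECT * FROM TABLE(dbms_xplan.display);
-- with the type mismatch, the predicate typically reads SYS_OP_C2C("ID")="C_ID"
-- and t_main is read by full scan rather than through its primary key index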

The conclusion was to change the C_ID column of the temporary table TMP_COM_ID2 to VARCHAR2, since the definition of the main table CA_SE_MANIFEST obviously cannot be changed lightly. After the temporary table's column type was changed, the statement's plan no longer showed a full scan of CA_SE_MANIFEST (again, the plan screenshot is not reproduced here); a sketch of the change follows.
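
A hedged sketch of that change (the exact DDL was not recorded at the time; note that Oracle requires the column to be empty for this particular datatype change, which is acceptable for a scratch table like this one):

TRUNCATE TABLE tmp_com_id2;
ALTER TABLE tmp_com_id2 MODIFY (c_id VARCHAR2(20));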

With that, this small problem was solved.

Footnotes: ① VARCHAR2 can be regarded as a variant of VARCHAR; it is a variable-length character-string data type.

② NVARCHAR2 is a variable-length character string that holds data in the national (Unicode) character set.