Oracle 11g Concepts 笔记1：联合索引的创建及注意事项

10月 31,2011 有5条评论 Concepts, Oracle

众所周知，每一位Oracle数据库技术牛人，每一名Oracle DBA在谈到自己的成长历程或者分享学习经验时，都会强调阅读Oracle的官方文档的重要性和必要性。而Oracle提供的官方文档数量之多，范围之广让很多初学者都望而却步，尤其是对于英语基础不好的网友更是头痛不已。其实，就我个人而言，也是经历过诸多痛苦的。事实上，我想但凡每一位认认真真地阅读过官方文档的过来人都或多或少的会有同感。既然Oracle提供给我们的学习文档如此之多，那么我们又该选择哪些文档入手呢？

我的建议是，优先阅读也是最应该阅读 Oracle Database Concepts这本书，因为这本书涵盖了Oracle数据库的全部基本概念，多读此书，收获颇多。记得当初08年的时候，自学Oracle，备考Oracle 10g OCA时，自己坚持着把10g 版本的 Oracle Concepts 10g Release 2前2/3部分反复阅读了几次，得以对Oracle数据库算是有个基本了解。

本系列文章将以笔记的形式记录自己在阅读和学习Oracle 11gR2 Database Concepts 【文档编号为E16508-05】的心得和体会。一方面算是对自己的知识总结和记录，另一方面也希望这一系列的文章能够对广大Oracle数据库技术爱好者带来帮助。当然，如果有理解的不对或者是纰漏之处，还恳请广大网友及时回复给予指正。好了，废话不再赘述。

Oracle 11gR2 Database Concepts第3章Page 57讲到Indexes and Index-Organized Tables，其中Page 59讲到Composite Indexes。

首先，我们创建测试表，并在该表上创建联合索引：

SQL> conn / as sysdba
Connected.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production 

SQL> conn hr/hr

Connected.

SQL> create table t(id1 number,id2 number,name varchar2(10));

Table created.

SQL> create index idx_t_id1_id2 on t(id1,id2);

Index created.

SQL> insert into t values(1,1,'oracle');

1 row created.

SQL> insert into t values(2,2,'on');

1 row created.

SQL> insert into t values(3,2,'linux');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t;

ID1        ID2 NAME

---------- ---------- ----------

1          1 oracle

2          2 on

3          2 linux

然后，查看执行计划1：

SQL> set autot trace exp

SQL> select * from t where id1=1;

Execution Plan

----------------------------------------------------------

Plan hash value: 1660670018

---------------------------------------------------------------------------------------------

| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |               |     1 |    33 |     1   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T             |     1 |    33 |     1   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IDX_T_ID1_ID2 |     1 |       |     1   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID1"=1)

Note

-----

- dynamic sampling used for this statement (level=2)

SQL>

发现该语句走索引。

查看执行计划2：

SQL> select * from t where id2=1;

Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |     1 |    33 |     3   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |     1 |    33 |     3   (0)| 00:00:01 |

--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("ID2"=1)

Note

-----

- dynamic sampling used for this statement (level=2)

SQL>

SQL> select * from t where id2=2;

Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |     2 |    66 |     3   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |     2 |    66 |     3   (0)| 00:00:01 |

--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("ID2"=2)

Note

-----

- dynamic sampling used for this statement (level=2)

SQL>

发现该语句走的全表扫描，而没走索引，id2字段上创建了联合索引，为什么不走索引呢？

查看执行计划3：

SQL> select * from t where id1=3 and id2=2;

Execution Plan

----------------------------------------------------------

Plan hash value: 1660670018

---------------------------------------------------------------------------------------------

| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |               |     1 |    33 |     1   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T             |     1 |    33 |     1   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IDX_T_ID1_ID2 |     1 |       |     1   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID1"=3 AND "ID2"=2)

Note

-----

- dynamic sampling used for this statement (level=2)

SQL> select * from t where id2=2 and id1=3;

Execution Plan

----------------------------------------------------------

Plan hash value: 1660670018

---------------------------------------------------------------------------------------------

| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |               |     1 |    33 |     1   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T             |     1 |    33 |     1   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IDX_T_ID1_ID2 |     1 |       |     1   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID1"=3 AND "ID2"=2)

Note

-----

- dynamic sampling used for this statement (level=2)

SQL>

Where从句中包含联合索引的所有字段时，优化器都会选择走索引，不论是将id1放前还是id2放前。

查看执行计划4：

SQL> select * from t where id1=2 and name='on';

Execution Plan

----------------------------------------------------------

Plan hash value: 1660670018

---------------------------------------------------------------------------------------------

| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |               |     1 |    33 |     1   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS BY INDEX ROWID| T             |     1 |    33 |     1   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IDX_T_ID1_ID2 |     1 |       |     1   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("NAME"='on')

2 - access("ID1"=2)

Note

-----

- dynamic sampling used for this statement (level=2)

SQL> select * from t where id2=2 and name='on';

Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |     1 |    33 |     3   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |     1 |    33 |     3   (0)| 00:00:01 |

--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("ID2"=2 AND "NAME"='on')

Note

-----

- dynamic sampling used for this statement (level=2)

SQL>

看到，第一条SQL语句（id1=2 and name=’on’）走了索引，而第二条SQL语句（id2=2 and name=’on’）走的却是全表扫描，似乎更奇怪？

最后，我们可以得出以下结论：

1 对于联合索引，在Where限制条件中出现所有被索引的列时，优化器会选择走索引（上述执行计划3）；

2 对于联合索引，在Where限制条件中出现联合索引中前导的列，即创建联合索引时前导的列时，优化器会选择走索引（执行计划1,2,4中，凡是有id1作为限制条件时都会走索引，而将id2作为限制条件时均不走索引，而走全表扫描）；

3 对于创建联合索引时，应考虑被索引字段的优先顺序，应将经常作为限制条件的字段放在首位；重复值少，即高基数(high-cardinaltiy)的列往前靠，而重复值多的字段往后靠；

4 对于同一张表，可以创建多个不同的联合索引，前提是，被索引的字段的个数、顺序不能一样,否则报错，ORA-01408。如：

SQL> create index idx1_t_id1_id2 on t(id1,id2);

create index idx1_t_id1_id2 on t(id1,id2)

*

ERROR at line 1:

ORA-01408: such column list already indexed

SQL> create index idx1_t_id1_id2 on t(id2,id1);

Index created.

SQL>

5 联合索引及前导列的解释：

A composite index, also called a concatenated index,is an index on multiple columns in a table.
A leading portion of an index is a set of one or more columns that were specified first and consecutively in the list of columns in the CREATE INDEX statement that created the index.

Consider this CREATE INDEX statement:

CREATE INDEX comp_ind ON tab1(x, y, z);

These combinations of columns are leading portions of the index: x, xy, and xyz.

These combinations of columns are not leading portions of the index: yz, y, and z.

A SQL statement can use an access path involving a composite index if the statement contains constructs that use a leading portion of the index.

zhouyf| 2011年11月2日

个人对实验有不同的理解。
1、表的数据量太小，无法反应真实的情况
2、对表没有进行分析，可能会是RBO的优化器
3、过索扫描还有一个skip index的功能，可能可以解决没有前导例没有出现在谓语中的情况。

Asher| 2011年11月3日

To zhouyf
首先，感谢你阅读此文并发表回复；
其次，经过再次测试后，得出的结论跟我之前的结论是一样的：
1 建立测试表，模拟近72000条记录：
SQL> create table t(id1,id2,type ) as select OBJECT_ID,DATA_OBJECT_ID, OBJECT_NAME from dba_objects;
Table created.
SQL> select count(*) from t;
COUNT(*)
———-
72904
SQL> create index idx_t_id1_id2 on t(id1,id2);
Index created.
SQL> set line 160
SQL> set autotrace trace exp;
SQL> select * from t where id1=5;
Execution Plan
———————————————————-
Plan hash value: 1660670018
———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 92 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 92 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T_ID1_ID2 | 1 | | 2 (0)| 00:00:01 |
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access(“ID1″=5)
Note
—–
– dynamic sampling used for this statement (level=2)
SQL>
2 上述id1作谓语时走索引，而下述id2作谓语时不走索引：
SQL> select * from t where id2=5;
Execution Plan
———————————————————-
Plan hash value: 1601196873
————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————–
| 0 | SELECT STATEMENT | | 4 | 368 | 100 (1)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| T | 4 | 368 | 100 (1)| 00:00:02 |
————————————————————————–
Predicate Information (identified by operation id):
—————————————————
1 – filter(“ID2″=5)
Note
—–
– dynamic sampling used for this statement (level=2)
3 再看下述，id1,id2均出现在谓语中，都走了索引：
SQL> select * from t where id1=5 and id2=6;
Execution Plan
———————————————————-
Plan hash value: 1660670018
———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 92 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 92 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T_ID1_ID2 | 1 | | 1 (0)| 00:00:01 |
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access(“ID1″=5 AND “ID2″=6)
Note
—–
– dynamic sampling used for this statement (level=2)
SQL> select * from t where id2=6 and id1=5;
Execution Plan
———————————————————-
Plan hash value: 1660670018
———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 92 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 92 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T_ID1_ID2 | 1 | | 1 (0)| 00:00:01 |
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access(“ID1″=5 AND “ID2″=6)
Note
—–
– dynamic sampling used for this statement (level=2)
SQL>
4 id1作谓语时走索引，id2作谓语时不走索引：
SQL> select * from t where id1=5 and type=’TABLE’;
Execution Plan
———————————————————-
Plan hash value: 1660670018
———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 8 | 736 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T | 8 | 736 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T_ID1_ID2 | 4 | | 2 (0)| 00:00:01 |
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
1 – filter(“TYPE”=’TABLE’)
2 – access(“ID1″=5)
Note
—–
– dynamic sampling used for this statement (level=2)
SQL> select * from t where id2=5 and type=’TABLE’;
Execution Plan
———————————————————-
Plan hash value: 1601196873
————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————–
| 0 | SELECT STATEMENT | | 8 | 736 | 100 (1)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| T | 8 | 736 | 100 (1)| 00:00:02 |
————————————————————————–
Predicate Information (identified by operation id):
—————————————————
1 – filter(“ID2″=5 AND “TYPE”=’TABLE’)
Note
—–
– dynamic sampling used for this statement (level=2)
SQL>
然后，当前数据库的优化器是CBO模式，在CBO模式下，即使表未作分析，CBO优化器也会动态采样数据，以帮助CBO做出正确的执行计划，上述的就是做了2级的动态采样。
还有，你的第3个理解可以说的仔细点儿吗？
最后，多谢你的不同观点，欢迎继续探讨。

zhouyf| 2011年11月4日

对于第3点：
从9i开始索引的扫描方式有一种是 index skip scan。
经过我的测试　
1.如果前导列是高度重复的话，那么就可能会使用index skip scan.
2.如果前导列具有大量唯一性，可能就无法自动使用index skip scan，虽然index skip scan的逻辑读会更少.

一起学习

Asher| 2011年11月7日

To zhouyf:
周兄所言index skip scan确实如此，后续将会补充，多谢指出！

wxjzqymtl| 2011年12月2日

我的理解是创建合适的复合索引最重要的两点就是前导列的选择和索引列的顺序。前导列应该选择sql语句中经常出现在where条件中的列；而索引的顺序选择正如博主所说的选择性高重复值少的列应该尽量靠前。
对于index skip scan的却是存在前导列的选择性很低的时候有可能被使用，不过出现这种情况不正是因为我们的索引列顺序选择不当吗
在cbo的情况下，最终走索引还是走全表扫描还是取决于二者的开销，如果全表扫描的开销小于走复合索引的话那么执行计划还是会选择全表扫描的（在统计信息收集正确的前提下）