索引扫描路径之1：Index Unique Scan

近期在读一本名为《Expert Indexing in Oracle Database 11g》的书，个人觉得写的很不错，收获颇多。在这里就把一些关于CBO 通过INDEX 扫描数据的相关知识稍作整理，算是读书笔记。

其中，在读到第7章，Tuning Index Usage时，提到常见的通过索引访问数据的方式有下述几种：

Index Unique Scan;

Index Range Scan;

Index Skip Scan;

Index Full Scan;

Index Fast Full Scan;

这里，首先说Index Unique Scan。

CBO【Cost Based Optimizer】通常在什么场景下会选择通过Index Unique Scan的方式来访问数据呢？

1 必须是通过唯一索引【UNIQUE】来访问数据；

2 通过唯一索引来访问数据时，每次返回的记录数必须是1条；

3 WHERE从句中必须要用等值【=】条件来过滤数据；

我个人目前的理解是，必须同时满足上述3个条件，CBO才会选择通过Index Unique Scan的路径来访问数据。

下面给出验证：

1 操作系统版本和数据库版本：
操作系统：

[oracle@oracle11g ~]$ uname -rm
2.6.18-194.el5 x86_64
[oracle@oracle11g ~]$

数据库：

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL>

2 测试表、表结构、表数据如下：

SQL> conn hr/hr
Connected.
SQL> desc departments;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DEPARTMENT_ID                             NOT NULL NUMBER(4)
 DEPARTMENT_NAME                           NOT NULL VARCHAR2(30)
 MANAGER_ID                                         NUMBER(6)
 LOCATION_ID                                        NUMBER(4)

SQL> select * from departments;

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
           10 Administration                        200        1700
           20 Marketing                             201        1800
           30 Purchasing                            114        1700
           40 Human Resources                       203        2400
           50 Shipping                              121        1500
           60 IT                                    103        1400
           70 Public Relations                      204        2700
           80 Sales                                 145        2500
           90 Executive                             100        1700
          100 Finance                               108        1700
          110 Accounting                            205        1700

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
          120 Treasury                                         1700
          130 Corporate Tax                                    1700
          140 Control And Credit                               1700
          150 Shareholder Services                             1700
          160 Benefits                                         1700
          170 Manufacturing                                    1700
          180 Construction                                     1700
          190 Contracting                                      1700
          200 Operations                                       1700
          210 IT Support                                       1700
          220 NOC                                              1700

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
          230 IT Helpdesk                                      1700
          240 Government Sales                                 1700
          250 Retail Sales                                     1700
          260 Recruiting                                       1700
          270 Payroll                                          1700

27 rows selected.

SQL>

3 该表上的索引信息如下：

SQL> select index_name,index_type,table_name,uniqueness from user_indexes where table_name='DEPARTMENTS';

INDEX_NAME                     INDEX_TYPE                  TABLE_NAME                     UNIQUENES
------------------------------ --------------------------- ------------------------------ ---------
DEPT_ID_PK                     NORMAL                      DEPARTMENTS                    UNIQUE
DEPT_LOCATION_IX               NORMAL                      DEPARTMENTS                    NONUNIQUE

SQL> select * from user_ind_columns where table_name='DEPARTMENTS';

INDEX_NAME                     TABLE_NAME                     COLUMN_NAME     COLUMN_POSITION COLUMN_LENGTH CHAR_LENGTH DESC
------------------------------ ------------------------------ --------------- --------------- ------------- ----------- ----
DEPT_ID_PK                     DEPARTMENTS                    DEPARTMENT_ID                 1            22           0 ASC
DEPT_LOCATION_IX               DEPARTMENTS                    LOCATION_ID                   1            22           0 ASC

SQL>

从上看到departments表有2个索引，其中在DEPARTMENT_ID字段上的索引DEPT_ID_PK 为主键索引，是个UNIQUE类型的索引。而LOCATION_ID 字段上的DEPT_LOCATION_IX则为非唯一【NONUNIQUE】索引。

4 通过该表上的主键字段department_id来定位数据：

SQL> set autotrace trace
SQL> select department_id from departments where department_id=270;

Execution Plan
----------------------------------------------------------
Plan hash value: 1856623209

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     4 |     0   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| DEPT_ID_PK |     1 |     4 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPARTMENT_ID"=270)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        532  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

从上述执行计划里看到，CBO选择了Index Unique Scan，因为department_id字段上有一个DEPT_ID_PK 的UNIQUE类型的索引，且上述SQL满足Index Unique Scan的条件。

而如果WHERE条件换成department_id>=270时，则CBO选择了Index Range Scan的访问路径，因为该WHERE条件是>=而非等值匹配。CBO认为返回的记录数可能不止1条，即使实际情况下满足条件的记录数确实是1条，所以选择了Index Range Scan。

SQL> select department_id from departments where department_id>=270;

Execution Plan
----------------------------------------------------------
Plan hash value: 4271874676

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |     1 |     4 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| DEPT_ID_PK |     1 |     4 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DEPARTMENT_ID">=270)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        532  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

5 通过该表上的location_id字段来定位数据的场景：

SQL> select location_id from departments where location_id=2700;

Execution Plan
----------------------------------------------------------
Plan hash value: 2790119751

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |     1 |     3 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| DEPT_LOCATION_IX |     1 |     3 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("LOCATION_ID"=2700)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        529  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select location_id from departments where location_id=1700;

21 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2790119751

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |    21 |    63 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| DEPT_LOCATION_IX |    21 |    63 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("LOCATION_ID"=1700)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        901  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         21  rows processed

SQL>

从上述执行计划看到，通过LOCATION_ID 字段上的DEPT_LOCATION_IX来定位数据时，不管是通过location_id=2700还是location_id=1400或者location_id=1700，结果都是走INDEX RANGE SCAN。因为DEPT_LOCATION_IX不是唯一索引，即使location_id=2700或者location_id=1400条件能确保返回的记录数只有1条，CBO也不会选择Index Unique Scan。

6 通过唯一索引、非唯一索引来定位数据的场景：

非唯一索引的情况：

SQL> create table all_t as select object_id,object_name from all_objects;

Table created.

SQL> desc all_t;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OBJECT_ID                                 NOT NULL NUMBER
 OBJECT_NAME                               NOT NULL VARCHAR2(30)

SQL> select table_name,index_name,index_type,uniqueness from user_indexes where table_name='ALL_T';

no rows selected

SQL> create index idx_all_t on all_t(object_id);

Index created.

SQL> select table_name,index_name,index_type,uniqueness from user_indexes where table_name='ALL_T';

TABLE_NAME                     INDEX_NAME                     INDEX_TYPE                  UNIQUENES
------------------------------ ------------------------------ --------------------------- ---------
ALL_T                          IDX_ALL_T                      NORMAL                      NONUNIQUE

SQL> exec dbms_stats.gather_table_stats(user,'all_t',cascade=>true);

PL/SQL procedure successfully completed.

SQL> set autotrace trace
SQL> select * from all_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2459308467

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |    31 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALL_T     |     1 |    31 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_ALL_T |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        627  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

唯一索引的情况：

SQL> drop index idx_all_t ;

Index dropped.

SQL> create unique index idx_all_t on all_t(object_id);

Index created.

SQL> select table_name,index_name,index_type,uniqueness from user_indexes where table_name='ALL_T';

TABLE_NAME                     INDEX_NAME                     INDEX_TYPE                  UNIQUENES
------------------------------ ------------------------------ --------------------------- ---------
ALL_T                          IDX_ALL_T                      NORMAL                      UNIQUE

SQL> set autotrace trace
SQL> exec dbms_stats.gather_table_stats(user,'all_t',cascade=>true);

PL/SQL procedure successfully completed.

SQL> select * from all_t where object_id=30000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3932529197

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |    31 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALL_T     |     1 |    31 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | IDX_ALL_T |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=30000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        491  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

从上可以看到，虽然测试表all_t上的字段object_id字段虽然的确可以确保唯一，但是在第一种情况下，索引IDX_ALL_T为非唯一索引的时候，CBO毅然选择了INDEX RANGE SCAN，而第二种情况下，却是Index Unique Scan。

也就是说，即使可以确保表中某个字段是唯一的，如果该字段上的索引不是唯一的，那么CBO选择的将是通过INDEX RANGE SCAN的路径来访问数据。想要表达的是，真实场景下这种情况的确会存在，因为有些系统是通过应用程序来保证表中的字段唯一，而并没有在表上对应的字段上通过创建唯一约束或者是唯一索引来保证数据的唯一性。那么，这样的话，将会导致CBO在选择执行计划的情况下，有可能会出现偏差。所以，对于这种情况下，即使应用程序可以保证数据的唯一性，最好还是在表上创建一个唯一索引比较稳妥。

最后，留一个小疑问给所有的看到这篇文章的朋友，对于第6种情况，是不是第二种场景下的Index Unique Scan的效率要高于第一种场景下的INDEX RANGE SCAN？因为，从执行计划来看的话，一致性读的次数分别为3和4。