零背景说明

这是生产环境下，一则PostgreSQL数据库SQL优化的案例小结：单条SQL执行时间，从优化前2秒到优化后，降低到2毫秒，性能提升1000倍。数据库版本为PostgreSQL 9.3。原始SQL语句如下：

SELECT t.id,
       t.code,
       t.name,
       t.level,
       t.IS_CLOSED isClosed,
       t.PARENT_ID parentId,
       t.dir,
       t.remark
FROM t_report_type t
WHERE 1=1
  AND t.code IN
    (SELECT report_type
     FROM t_report_temp
     WHERE corp_id IS NULL
       AND partner_id IS NULL)
ORDER BY t.code

优化前执行计划如下：

EXPLAIN ANALYZE
SELECT t.id,
       t.code,
       t.name,
       t.level,
       t.IS_CLOSED isClosed,
       t.PARENT_ID parentId,
       t.dir,
       t.remark
FROM t_report_type t
WHERE 1=1
  AND t.code IN
    (SELECT report_type
     FROM t_report_temp
     WHERE corp_id IS NULL
       AND partner_id IS NULL)
ORDER BY t.code

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                       QUERY PLAN                                                                       │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Sort  (cost=3179.42..3179.93 rows=204 width=165) (actual time=2179.224..2179.250 rows=383 loops=1)                                                     │
│   Sort Key: t.code                                                                                                                                     │
│   Sort Method: quicksort  Memory: 126kB                                                                                                                │
│   ->  Nested Loop Semi Join  (cost=0.55..3171.59 rows=204 width=165) (actual time=26.092..2178.526 rows=383 loops=1)                                   │
│         ->  Seq Scan on t_report_type t  (cost=0.00..16.59 rows=459 width=165) (actual time=0.002..0.187 rows=459 loops=1)                             │
│         ->  Index Only Scan using t_report_temp_01 on t_report_temp  (cost=0.55..1873.80 rows=428 width=7) (actual time=4.744..4.744 rows=1 loops=459) │
│               Index Cond: ((corp_id IS NULL) AND (partner_id IS NULL) AND (report_type = (t.code)::text))                                              │
│               Heap Fetches: 53                                                                                                                         │
│ Total runtime: 2179.301 ms                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

一分析问题

结合执行来分析SQL语句，查看表结构，数据量分布：

zyd=> \d t_report_type
                 Table "zyd.t_report_type"
┌───────────┬────────────────────────┬─────────────────────┐
│  Column   │          Type          │      Modifiers      │
├───────────┼────────────────────────┼─────────────────────┤
│ id        │ character(32)          │ not null            │
│ code      │ character varying(32)  │ not null            │
│ name      │ character varying(256) │ not null            │
│ parent_id │ character varying(32)  │                     │
│ dir       │ character varying(256) │                     │
│ level     │ smallint               │                     │
│ is_closed │ character(1)           │ default '0'::bpchar │
│ remark    │ character varying(200) │                     │
└───────────┴────────────────────────┴─────────────────────┘
Indexes:
    "t_report_type_pkey" PRIMARY KEY, btree (id)
    "t_report_type_01" btree (code, parent_id, dir)

zyd=> select count(*) from t_report_type;
┌───────┐
│ count │
├───────┤
│   459 │
└───────┘
(1 row)

zyd=> select count(*) from t_report_temp;
┌────────┐
│ count  │
├────────┤
│ 366132 │
└────────┘
(1 row)

zyd=> \d t_report_temp
                 Table "zyd.t_report_temp"
┌─────────────┬────────────────────────────────┬───────────┐
│   Column    │              Type              │ Modifiers │
├─────────────┼────────────────────────────────┼───────────┤
│ id          │ character(32)                  │ not null  │
│ report_code │ character varying(40)          │ not null  │
│ report_name │ character varying(100)         │ not null  │
│ report_type │ character varying(32)          │ not null  │
│ corp_id     │ character(32)                  │           │
│ partner_id  │ character(32)                  │           │
│ industry_id │ character(32)                  │           │
│ is_default  │ character(1)                   │           │
│ remark      │ character varying(200)         │           │
│ source_type │ character varying(255)         │           │
│ is_show     │ character(1)                   │ default 0 │
│ create_ts   │ timestamp(6) without time zone │           │
│ update_ts   │ timestamp(6) without time zone │           │
│ is_simple   │ character(1)                   │           │
└─────────────┴────────────────────────────────┴───────────┘
Indexes:
    "t_report_temp_pkey" PRIMARY KEY, btree (id)
    "t_report_temp_01" btree (corp_id, industry_id, partner_id, report_code, report_type, report_name)
    "t_report_temp_02" btree (industry_id, partner_id, report_type)
    "t_report_temp_03" btree (industry_id, report_type)
    "t_report_temp_04" btree (report_name, corp_id, report_code)
    "t_report_temp_index_1" btree (corp_id, report_code)

zyd=> select count(*) from t_report_temp where corp_id is null and partner_id is null;
┌───────┐
│ count │
├───────┤
│  1126 │
└───────┘
(1 row)

zyd=>

看到，这是一个简单的2张表做join连接的SQL查询语句，优化器选择了嵌套循环nested loop的关联方式，其中一张表t_report_type为小表，数据量为459，优化器选择全表扫描的方式获取数据，另外一张表t_report_temp数据量相对多一些，366132条数据，但是满足条件的数据却只有1126条，同时选择了index only scan的方式，单次执行耗时4.744毫秒。

这是从执行计划看到的基本信息，能优化的突破口基本也就是对于表的访问能不能更快？虽然这里选择的是index only scan的方式。

三解决问题

先尝试重新收集表的统计信息，再查看其执行计划是否有好转？

zyd=> analyze verbose t_report_temp;
INFO:  analyzing "zyd.t_report_temp"
INFO:  "t_report_temp": scanned 9494 of 9494 pages, containing 366132 live rows and 1345 dead rows; 30000 rows in sample, 366132 estimated total rows
ANALYZE
zyd=> analyze verbose t_report_type;
INFO:  analyzing "zyd.t_report_type"
INFO:  "t_report_type": scanned 12 of 12 pages, containing 459 live rows and 0 dead rows; 459 rows in sample, 459 estimated total rows
ANALYZE
zyd=>
...查看执行计划
...
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                       QUERY PLAN                                                                       │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Sort  (cost=3179.42..3179.93 rows=204 width=165) (actual time=1973.808..1973.836 rows=383 loops=1)                                                     │
│   Sort Key: t.code                                                                                                                                     │
│   Sort Method: quicksort  Memory: 126kB                                                                                                                │
│   ->  Nested Loop Semi Join  (cost=0.55..3171.59 rows=204 width=165) (actual time=29.583..1973.127 rows=383 loops=1)                                   │
│         ->  Seq Scan on t_report_type t  (cost=0.00..16.59 rows=459 width=165) (actual time=0.002..0.175 rows=459 loops=1)                             │
│         ->  Index Only Scan using t_report_temp_01 on t_report_temp  (cost=0.55..1873.80 rows=428 width=7) (actual time=4.297..4.297 rows=1 loops=459) │
│               Index Cond: ((corp_id IS NULL) AND (partner_id IS NULL) AND (report_type = (t.code)::text))                                              │
│               Heap Fetches: 53                                                                                                                         │
│ Total runtime: 1973.885 ms                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

zyd=>

发现，并没有好转。观察到其执行计划中，对于相对大的那个表t_report_temp，优化器选择了通过t_report_temp_01这个index来做的index only scan，每次耗时4毫秒左右，但是loops=459，累积起来就1800多毫秒了。

同时，该indext_report_temp_01是一个联合index，基于(corp_id, industry_id, partner_id, report_code, report_type, report_name) 6个字段创建的联合index，观察表和该index的大小：

zyd=> \dt+ t_report_temp
                        List of relations
┌────────┬───────────────┬───────┬───────┬───────┬──────────────┐
│ Schema │     Name      │ Type  │ Owner │ Size  │ Description  │
├────────┼───────────────┼───────┼───────┼───────┼──────────────┤
│ zyd    │ t_report_temp │ table │ zyd   │ 74 MB │ 报表模板主表 │
└────────┴───────────────┴───────┴───────┴───────┴──────────────┘
(1 row)

zyd=> \di+ t_report_temp_01
                                 List of relations
┌────────┬──────────────────┬───────┬───────┬───────────────┬───────┬─────────────┐
│ Schema │       Name       │ Type  │ Owner │     Table     │ Size  │ Description │
├────────┼──────────────────┼───────┼───────┼───────────────┼───────┼─────────────┤
│ zyd    │ t_report_temp_01 │ index │ zyd   │ t_report_temp │ 80 MB │             │
└────────┴──────────────────┴───────┴───────┴───────────────┴───────┴─────────────┘
(1 row)

zyd=>

发现，这个index 80MB，比表本身74MB还大。是不是推断出，此种情况下，优化器选择通过该index来获取数据不够好呢，或者说还不如直接扫描这个表呢？

开始验证：

zyd=> set enable_indexscan =off;
SET
zyd=> set enable_indexonlyscan =off;
SET
zyd=> set enable_bitmapscan =off;
SET
zyd=> explain analyze..
..
..
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                             QUERY PLAN                                                              │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Merge Join  (cost=13420.55..13423.58 rows=204 width=165) (actual time=69.095..69.417 rows=383 loops=1)                              │
│   Merge Cond: ((t.code)::text = (t_report_temp.report_type)::text)                                                                  │
│   ->  Sort  (cost=36.88..38.03 rows=459 width=165) (actual time=0.513..0.569 rows=459 loops=1)                                      │
│         Sort Key: t.code                                                                                                            │
│         Sort Method: quicksort  Memory: 145kB                                                                                       │
│         ->  Seq Scan on t_report_type t  (cost=0.00..16.59 rows=459 width=165) (actual time=0.002..0.121 rows=459 loops=1)          │
│   ->  Sort  (cost=13383.67..13383.79 rows=49 width=7) (actual time=68.576..68.600 rows=383 loops=1)                                 │
│         Sort Key: t_report_temp.report_type                                                                                         │
│         Sort Method: quicksort  Memory: 42kB                                                                                        │
│         ->  HashAggregate  (cost=13381.80..13382.29 rows=49 width=7) (actual time=68.253..68.309 rows=383 loops=1)                  │
│               ->  Seq Scan on t_report_temp  (cost=0.00..13163.63 rows=87269 width=7) (actual time=0.005..67.787 rows=1126 loops=1) │
│                     Filter: ((corp_id IS NULL) AND (partner_id IS NULL))                                                            │
│                     Rows Removed by Filter: 365114                                                                                  │
│ Total runtime: 69.480 ms                                                                                                            │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(14 rows)

zyd=>

此时，可以看到，当我们禁用index only scan，index scan，bitmap index scan之后，使优化器对于表t_report_temp选择全表扫描之后，整个SQL的执行效率反而更好。
于是，尝试在(report_type,corp_id,partner_id)创建联合index，观察一下SQL效率：

zyd=> create index CONCURRENTLY idx_3 on t_report_temp(report_type,corp_id,partner_id);
CREATE INDEX
zyd=> explain analyze ....
..
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                QUERY PLAN                                                                 │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Sort  (cost=267.54..268.05 rows=204 width=165) (actual time=2.525..2.554 rows=383 loops=1)                                                │
│   Sort Key: t.code                                                                                                                        │
│   Sort Method: quicksort  Memory: 126kB                                                                                                   │
│   ->  Nested Loop Semi Join  (cost=0.42..259.72 rows=204 width=165) (actual time=0.041..2.083 rows=383 loops=1)                           │
│         ->  Seq Scan on t_report_type t  (cost=0.00..16.59 rows=459 width=165) (actual time=0.002..0.053 rows=459 loops=1)                │
│         ->  Index Only Scan using idx_3 on t_report_temp  (cost=0.42..29.58 rows=428 width=7) (actual time=0.004..0.004 rows=1 loops=459) │
│               Index Cond: ((report_type = (t.code)::text) AND (corp_id IS NULL) AND (partner_id IS NULL))                                 │
│               Heap Fetches: 27                                                                                                            │
│ Total runtime: 2.600 ms                                                                                                                   │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

zyd=>

发现，此时SQL的执行耗时从之前的2秒降低到2.6毫秒。

四小结

通过深入分析SQL执行计划、结合表结构、数据量及数据分布，是做好SQL优化的基本出发点。需要更为深入学习和分析SQL执行计划，能看懂SQL执行计划，对SQL执行计划有感觉，多看多读执行计划，多思考。

PostgreSQL官方文档中，关于执行计划描述，有这么一句话：

Plan-reading is an art that requires some experience to master, but this section attempts to cover the basics.

PostgreSQL数据库SQL优化案例：从2秒到2毫秒

PostgreSQL数据库SQL优化案例：从2秒到2毫秒

零背景说明

一分析问题

三解决问题

四小结

发表评论
取消回复

发表评论

PostgreSQL数据库SQL优化案例：从2秒到2毫秒

零 背景说明

一 分析问题

三 解决问题

四 小结

发表评论 取消回复

发表评论

零背景说明

一分析问题

三解决问题

四小结

发表评论
取消回复