PostgreSQL SQL Performance Optimization: Two Case Studies

0 Background and Notes

Database performance shapes the performance of the whole business system, and well over 90% of database performance problems come down to poorly performing SQL.

For a new system about to go live, the development team sent over two SQL statements that needed optimization.

I Optimization Analysis of SQL #1
1 The original SQL, formatted:
SELECT COUNT(b.code) AS totalCount, b.status AS status
FROM t_ai_prd_bill b
WHERE b.status != '900' AND b.is_deleted = FALSE
GROUP BY b.status
2 Execution plan of the original SQL
ai=> explain analyze SELECT COUNT(b.code) AS totalCount, b.status AS status
ai-> FROM t_ai_prd_bill b
ai-> WHERE b.status != '900' AND b.is_deleted = FALSE
ai-> GROUP BY b.status;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3870.50..3870.51 rows=1 width=17) (actual time=12.660..12.661 rows=5 loops=1)
Group Key: status
-> Seq Scan on t_ai_prd_bill b (cost=0.00..3839.75 rows=6150 width=17) (actual time=0.013..11.139 rows=6153 loops=1)
Filter: ((NOT is_deleted) AND (status <> '900'::bpchar))
Rows Removed by Filter: 32227
Planning time: 0.165 ms
Execution time: 12.719 ms
(7 rows)
ai=>
3 Structure, row count, and data distribution of t_ai_prd_bill
ai=> \d t_ai_prd_bill
Table "ai.t_ai_prd_bill"
Column | Type | Modifiers
--------------------+--------------------------------+---------------------
id | character varying(32) | not null
code | character varying(12) |
packet_code | character varying(10) |
status | character(3) |
is_deleted | boolean | default false
classify_type | character varying(1) |
last_oper_time | timestamp(6) without time zone |
template_no | character varying(32) |
oss_key | character varying(255) |
receive_time | timestamp(6) without time zone |
is_question | boolean | default false
ocr_type | character(1) |
cust_id | character varying(32) |
exception_category | character(1) | default '0'::bpchar
exception_detail | character varying(400) |
is_suspended | boolean |
is_urgent | boolean |
raw_deadline | timestamp(6) without time zone |
priority | smallint |
deadline | timestamp(6) without time zone |
company_id | character varying(32) |
company_name | character varying(255) |
cust_category | character varying(32) |
source | character varying(32) |
mode_code | character varying(32) |
deadline_category | character varying(3) |
location_id | character varying(32) |
cust_name | character varying(128) |
template_category | character varying(32) |
platform_id | character varying(32) |
country_name | character varying(64) |
is_sended | boolean | default false
Indexes:
"t_ai_prd_bill_pkey" PRIMARY KEY, btree (id)
"idx_bill_code" btree (code)
"idx_bill_create_time_status_deleted" btree (receive_time, status, is_deleted)
"idx_bill_cust_id" btree (cust_id)
"idx_bill_packet_code" btree (packet_code)
"idx_bill_status" btree (status)
ai=> select status,count(*) from t_ai_prd_bill group by status;
status | count
--------+-------
400 | 2511
401 | 183
500 | 3174
600 | 1
701 | 284
900 | 32227
(6 rows)
ai=>
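As a supplementary check (not part of the original write-up), the planner's own view of this distribution can be read from pg_stats:

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE schemaname = 'ai' AND tablename = 't_ai_prd_bill' AND attname = 'status';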
4 Rewriting an equivalent SQL

Analyze the execution plan in light of the t_ai_prd_bill table structure, indexes, and data volume shown above.

The status column of t_ai_prd_bill records the business state of a bill: status = 900 means the bill has finished processing, and status = 901 means bill sorting failed. As the business runs normally, rows with status = 900 keep accumulating and come to make up the overwhelming majority of the table, while rows with status = 901 should be very rare. In other words, what this SQL really wants is every bill in any state other than "finished", i.e. status != 900.

Given that, if the business side is willing to recode failed bill sorting from status = 901 to, say, status = 899, then every unfinished state sorts below 900 and the SQL can be rewritten equivalently as:
SELECT COUNT(b.code) AS totalCount, b.status AS status
FROM t_ai_prd_bill b
WHERE b.status < '900' AND b.is_deleted = FALSE
GROUP BY b.status
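If the business does adopt the new code, existing rows carrying the old one would also need a one-time migration; a minimal sketch, assuming 899 is the value chosen and no other table stores the old code:

UPDATE t_ai_prd_bill
   SET status = '899'   -- hypothetical new code for "bill sorting failed"
 WHERE status = '901';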
5 Execution plan of the rewritten SQL
ai=> explain analyze SELECT COUNT(b.code) AS totalCount, b.status AS status
ai-> FROM t_ai_prd_bill b
ai-> WHERE b.status < '900' AND b.is_deleted = FALSE
ai-> GROUP BY b.status;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3782.19..3782.20 rows=1 width=17) (actual time=5.423..5.424 rows=5 loops=1)
Group Key: status
-> Bitmap Heap Scan on t_ai_prd_bill b (cost=247.95..3751.44 rows=6150 width=17) (actual time=0.996..3.811 rows=6153 loops=1)
Recheck Cond: (status < '900'::bpchar)
Filter: (NOT is_deleted)
Heap Blocks: exact=1213
-> Bitmap Index Scan on idx_bill_status (cost=0.00..246.41 rows=6150 width=0) (actual time=0.825..0.825 rows=6153 loops=1)
Index Cond: (status < '900'::bpchar)
Planning time: 0.156 ms
Execution time: 5.480 ms
(10 rows)
ai=>
6 Summary

Comparing the execution plans before and after the rewrite, the execution time drops from roughly 12 ms to roughly 5 ms, and the plan switches from a sequential scan to a bitmap scan on idx_bill_status. As the business keeps running and t_ai_prd_bill keeps growing, dominated by status = 900 rows, the benefit of the rewrite grows with it.

The business defines the bill statuses as '100, 200, 300, 400, 401, 500, 600, 402, 700, 701, 800, 900, 901', each code standing for a different intermediate business state. The requirement here is to count all bills that have not finished processing, so the developer naturally filtered with status != '900', overlooking that the planner cannot use a B-tree index for a not-equal (!=) predicate, which leaves a full-table scan.
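An alternative worth noting, which does not require changing the business status codes, is a partial index matching the original predicate. This was not done in this case; it is only a sketch, and the index name below is made up:

CREATE INDEX idx_bill_status_unfinished
    ON t_ai_prd_bill (status)
 WHERE status <> '900' AND is_deleted = FALSE;

With such an index the original status != '900' AND is_deleted = FALSE query can be answered from a much smaller index; the trade-off is one more index to maintain on a frequently written table.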
II Optimization Analysis of SQL #2
1 The original SQL
select a.status, sum(case when b.status is null then 0 else 1 end),
sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
, sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
, sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
from (select regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901', ',') as status) a
left join t_ai_prd_bill b on (b.status = a.status)
WHERE b.is_deleted = FALSE
group by a.status
order by a.status;
2 Execution plan of the original SQL
ai=> explain analyze select a.status, sum(case when b.status is null then 0 else 1 end),
ai-> sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
ai-> , sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
ai-> , sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
ai-> from (select regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901', ',') as status) a
ai-> left join t_ai_prd_bill b on (b.status = a.status)
ai-> WHERE b.is_deleted = FALSE
ai-> group by a.status
ai-> order by a.status;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=6328.61..15390.61 rows=200 width=42) (actual time=70.104..75.647 rows=5 loops=1)
Group Key: (regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901'::text, ','::text))
-> Merge Join (cost=6328.61..10560.61 rows=241400 width=42) (actual time=43.422..57.495 rows=35869 loops=1)
Merge Cond: ((regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901'::text, ','::text)) = ((b.status)::text))
-> Sort (cost=64.84..67.34 rows=1000 width=32) (actual time=0.117..0.119 rows=12 loops=1)
Sort Key: (regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901'::text, ','::text))
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..5.01 rows=1000 width=0) (actual time=0.094..0.102 rows=12 loops=1)
-> Sort (cost=6263.78..6384.48 rows=48280 width=10) (actual time=43.293..46.852 rows=48280 loops=1)
Sort Key: ((b.status)::text)
Sort Method: quicksort Memory: 3629kB
-> Seq Scan on t_ai_prd_bill b (cost=0.00..2507.80 rows=48280 width=10) (actual time=0.012..30.322 rows=48280 loops=1)
Filter: (NOT is_deleted)
Planning time: 0.367 ms
Execution time: 75.737 ms
(15 rows)
ai=>
3 Analyzing the SQL and its execution plan

What the SQL ultimately wants is an aggregation over t_ai_prd_bill, grouped and ordered by status. It takes an unnecessary detour: regexp_split_to_table turns the comma-separated status list into a single-column derived table a, which is then LEFT JOINed back to t_ai_prd_bill. Because the WHERE clause filters on b.is_deleted, the LEFT JOIN degenerates into an inner join anyway, so the derived table contributes nothing except extra join and sort work. The plan shows the cost of that detour as a fairly large in-memory sort: Sort Method: quicksort Memory: 3629kB.
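For reference, the derived table a is simply the status list split into one row per code; a shortened illustration (not taken from the original session):

select regexp_split_to_table('100,200,300', ',') as status;
-- returns three rows: 100, 200, 300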
4 An equivalent rewrite
select b.status,sum(case when b.status is null then 0 else 1 end),
sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
, sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
, sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
from t_ai_prd_bill b
WHERE b.is_deleted = FALSE
group by b.status
order by b.status;
5 Execution plan of the rewritten SQL
ai=> explain analyze select b.status,sum(case when b.status is null then 0 else 1 end),
ai-> sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
ai-> , sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
ai-> , sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
ai-> from t_ai_prd_bill b
ai-> WHERE b.is_deleted = FALSE
ai-> group by b.status
ai-> order by b.status;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3473.54..3473.55 rows=6 width=10) (actual time=49.986..49.987 rows=6 loops=1)
Sort Key: status
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=3473.40..3473.46 rows=6 width=10) (actual time=49.932..49.934 rows=6 loops=1)
Group Key: status
-> Seq Scan on t_ai_prd_bill b (cost=0.00..2507.80 rows=48280 width=10) (actual time=0.008..11.934 rows=48280 loops=1)
Filter: (NOT is_deleted)
Planning time: 0.109 ms
Execution time: 50.060 ms
(9 rows)
ai=>
6 Summary

Comparing the plans before and after the rewrite: dropping the derived table eliminates the merge join and the 3629 kB in-memory sort of the roughly 48,000 qualifying rows, and the execution time falls from about 75 ms to about 50 ms.
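One caveat: the rewrite only returns statuses that actually occur in the table. If the point of the regexp_split_to_table list was to also report a zero row for every defined status, the value list can be kept and the is_deleted filter moved into the join condition instead. A sketch of that variant, not taken from the original request:

select a.status,
       count(b.status) as totalCount,
       sum(case when b.cust_category = 'DZ001'  then 1 else 0 end) as "zyd",
       sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj",
       sum(case when b.cust_category = 'MY001'  then 1 else 0 end) as "my"
from (select regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901', ',') as status) a
left join t_ai_prd_bill b
       on b.status = a.status
      and b.is_deleted = FALSE   -- filter inside the join keeps empty statuses in the result
group by a.status
order by a.status;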