1. Problem Symptoms
In the morning, we received a request from the project team: since around 3:30 PM the previous day, one module of the business system had been failing to insert data. They asked us to check the database and the server for anything abnormal, and to provide support.
After logging in to the database console, a preliminary investigation revealed the following:
2. Problem Analysis
Let's pull out one of the SQL statements for analysis:
INSERT INTO t_ai_prd_history_effective_record (
    ID, create_time, cust_category, cust_id, cust_name, company_id, company_name,
    template_no, template_name, template_field_identifier, template_field_name, submit_value
)
SELECT random_string(32), now(), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10
WHERE NOT EXISTS (
    SELECT 1
    FROM t_ai_prd_history_effective_record r
    WHERE COALESCE(r.cust_category, '') = COALESCE($11, '')
      AND COALESCE(r.cust_id, '') = COALESCE($12, '')
      AND COALESCE(r.company_id, '') = COALESCE($13, '')
      AND COALESCE(r.template_no, '') = COALESCE($14, '')
      AND COALESCE(r.template_field_identifier, '') = COALESCE($15, '')
      AND COALESCE(r.submit_value, '') = COALESCE($16, '')
)
The corresponding parameters:
params: $1 = 'MY001', $2 = 'b8168c7513014b0c9769f3d61574833d', $3 = 'WP MANAGEMENT SERVICES', $4 = '1BABS7HSS5UH01FE140A000085535171', $5 = 'KLINIK PERGIGIAN M DENTAL', $6 = 'MYBL0303', $7 = 'Expenses', $8 = 'InvoiceDate', $9 = 'InvoiceDate(发票日期)', $10 = '20170614', $11 = 'MY001', $12 = 'b8168c7513014b0c9769f3d61574833d', $13 = '1BABS7HSS5UH01FE140A000085535171', $14 = 'MYBL0303', $15 = 'InvoiceDate', $16 = '20170614'
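One aside before digging in: random_string(32) in the INSERT is not a PostgreSQL built-in, so it is presumably a user-defined function that generates a 32-character random ID. Its real implementation is not shown in this post; a minimal sketch of what such a function might look like:

-- Hypothetical stand-in for the system's random_string(): concatenates `len`
-- characters drawn at random from [a-z0-9].
create or replace function random_string(len int) returns text as $$
    select string_agg(substr('abcdefghijklmnopqrstuvwxyz0123456789', (random() * 35)::int + 1, 1), '')
    from generate_series(1, len);
$$ language sql volatile;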
Clearly, the purpose of this SQL is a conditional insert into table t_ai_prd_history_effective_record: the new row is inserted only if the table does not already contain a record matching (using the literal values from another captured execution of the same statement)

COALESCE (r.cust_category, '') = COALESCE ('DZ001', '')
AND COALESCE (r.cust_id, '') = COALESCE ('b151ad4f86ab4ec5aee8c4cc377e9eb7', '')
AND COALESCE (r.company_id, '') = COALESCE ('04cb580238dc49af8bfb46e00e959a1a', '')
AND COALESCE (r.template_no, '') = COALESCE ('KJDZ0101', '')
AND COALESCE (r.template_field_identifier, '') = COALESCE ('ItemQuantity','')
AND COALESCE (r.submit_value, '') = COALESCE ('10100$__$6080$__$$__$$__$', '');

This is exactly what the WHERE NOT EXISTS clause expresses.
The problem, then, comes into focus: it should lie in the NOT EXISTS subquery.
Next, let's look at the execution plan of this subquery on its own:
ai=> explain analyze SELECT 1
ai-> FROM t_ai_prd_history_effective_record r
ai-> WHERE COALESCE (r.cust_category, '') = COALESCE ('DZ001', '')
ai-> AND COALESCE (r.cust_id, '') = COALESCE ('b151ad4f86ab4ec5aee8c4cc377e9eb7', '')
ai-> AND COALESCE (r.company_id, '') = COALESCE ('04cb580238dc49af8bfb46e00e959a1a', '')
ai-> AND COALESCE (r.template_no, '') = COALESCE ('KJDZ0101', '')
ai-> AND COALESCE (r.template_field_identifier, '') = COALESCE ('ItemQuantity','')
ai-> AND COALESCE (r.submit_value, '') = COALESCE ('10100$__$6080$__$$__$$__$', '');
                                  QUERY PLAN
----------------------------------------------------------------------------------------
 Seq Scan on t_ai_prd_history_effective_record r  (cost=0.00..203817.53 rows=1 width=0) (actual time=1266.116..1267.093 rows=1 loops=1)
   Filter: (((COALESCE(cust_category, ''::character varying))::text = 'DZ001'::text) AND ((COALESCE(cust_id, ''::character varying))::text = 'b151ad4f86ab4ec5aee8c4cc377e9eb7'::text) AND ((COALESCE(company_id, ''::character varying))::text = '04cb580238dc49af8bfb46e00e959a1a'::text) AND ((COALESCE(template_no, ''::character varying))::text = 'KJDZ0101'::text) AND ((COALESCE(template_field_identifier, ''::character varying))::text = 'ItemQuantity'::text) AND (COALESCE(submit_value, ''::text) = '10100$__$6080$__$$__$$__$'::text))
   Rows Removed by Filter: 3193300
 Planning time: 0.129 ms
 Execution time: 1267.133 ms
(5 rows)
Row count of t_ai_prd_history_effective_record:
ai=> select count(*) from t_ai_prd_history_effective_record;
  count
---------
 3193138
(1 row)
Just as expected: every execution of this duplicate check performs a full table scan of t_ai_prd_history_effective_record, a table holding more than 3.19 million rows. Unsurprisingly the check is slow, costing over a second per run, and as the table keeps growing, inserts will keep getting slower.
You could say that every record inserted makes the next insert slower!
3. Solution
Having found the crux of the problem, it was time to optimize. The ultimate goal: how can we do away with that NOT EXISTS subquery?
After discussing it with 林朝荣 (Lin Chaorong) of the BPO department, we arrived at the following options:
- Could we build a unique index on the six columns referenced in the subquery? If so, the INSERT would no longer need the WHERE NOT EXISTS duplicate check: the unique constraint would do the checking, and inserting a duplicate would simply fail with a constraint violation. This appears to solve the problem, but from a business standpoint the data genuinely contains many duplicates, so business users would keep seeing error messages in the system, which is hardly friendly. (A sketch of such an index follows this list.)
- In that case, the problem becomes even simpler. Bluntly drop the WHERE NOT EXISTS duplicate check from the SQL and insert everything, duplicates and all; then run a batch SQL job at night that removes the duplicate records in one pass. Problem solved.
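For reference, the unique index from the first option might look like the sketch below. This is only an illustration, not something we ran in production: the index name is made up, and each column is wrapped in COALESCE(..., '') to mirror the NULL handling in the original query. As a side note, on PostgreSQL 9.5+ such an index could be paired with INSERT ... ON CONFLICT DO NOTHING, which skips duplicates silently instead of raising errors.

-- Hypothetical unique expression index over the six de-duplication columns;
-- COALESCE(..., '') treats NULL and '' as the same value, as the query does.
create unique index ux_history_effective_record_dedup
    on t_ai_prd_history_effective_record (
        (coalesce(cust_category, '')),
        (coalesce(cust_id, '')),
        (coalesce(company_id, '')),
        (coalesce(template_no, '')),
        (coalesce(template_field_identifier, '')),
        (coalesce(submit_value, ''))
    );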
Here is a worked example of removing duplicate records from a PostgreSQL table:
ai=> create table t(id int primary key,name varchar(10),addr varchar(10));
CREATE TABLE
ai=> insert into t values(1,'onlyou.com','xiamen');
INSERT 0 1
ai=> insert into t values(2,'apple.com','usa');
INSERT 0 1
ai=> insert into t values(3,'apple.com','usa');
INSERT 0 1
ai=> insert into t values(4,'google','usa');
INSERT 0 1
ai=> insert into t values(5,'google','usa');
INSERT 0 1
ai=> select * from t;
 id |    name    |  addr
----+------------+--------
  1 | onlyou.com | xiamen
  2 | apple.com  | usa
  3 | apple.com  | usa
  4 | google     | usa
  5 | google     | usa
(5 rows)

ai=> select t1.id from t t1 where t1.ctid<>(select max(t2.ctid) from t t2 where t1.name=t2.name and t1.addr=t2.addr);
 id
----
  2
  4
(2 rows)

ai=> delete from t where id in(select t1.id from t t1 where t1.ctid<>(select max(t2.ctid) from t t2 where t1.name=t2.name and t1.addr=t2.addr));
DELETE 2
ai=> select * from t;
 id |    name    |  addr
----+------------+--------
  1 | onlyou.com | xiamen
  3 | apple.com  | usa
  5 | google     | usa
(3 rows)
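Carried over to the real table, the nightly batch job might look like the following sketch. It is untested here and assumes duplicates are defined by the same six COALESCE-normalized columns as the original query, keeping the physically last row (largest ctid) of each group:

-- Delete every row that is not the max-ctid survivor of its six-column group.
delete from t_ai_prd_history_effective_record r1
where r1.ctid <> (
    select max(r2.ctid)
    from t_ai_prd_history_effective_record r2
    where coalesce(r2.cust_category, '') = coalesce(r1.cust_category, '')
      and coalesce(r2.cust_id, '') = coalesce(r1.cust_id, '')
      and coalesce(r2.company_id, '') = coalesce(r1.company_id, '')
      and coalesce(r2.template_no, '') = coalesce(r1.template_no, '')
      and coalesce(r2.template_field_identifier, '') = coalesce(r1.template_field_identifier, '')
      and coalesce(r2.submit_value, '') = coalesce(r1.submit_value, '')
);

Note that without an index this correlated subquery rescans the table once per row; a (not necessarily unique) version of the expression index sketched above would also keep this batch job fast.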
4. Summary
Once you spot a slow SQL statement and pinpoint where its problem lies, a targeted optimization is often not as hard as it seems; think it through carefully and keep experimenting.
The above offers one line of thinking and one direction for such an optimization.