1. Problem Symptoms
In the morning, we received a request from the project team: since around 3:30 PM the previous day, one module of the business system had been failing to insert data. They asked us to check the database and the server for anything abnormal, and to provide support.
After logging in to the database console, a preliminary investigation revealed the following:
2. Problem Analysis
Let's pull out one of the SQL statements for analysis:
INSERT INTO t_ai_prd_history_effective_record (
    ID, create_time, cust_category, cust_id, cust_name, company_id, company_name,
    template_no, template_name, template_field_identifier, template_field_name, submit_value
)
SELECT random_string(32), now(), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10
WHERE NOT EXISTS (
    SELECT 1
    FROM t_ai_prd_history_effective_record r
    WHERE COALESCE(r.cust_category, '') = COALESCE($11, '')
      AND COALESCE(r.cust_id, '') = COALESCE($12, '')
      AND COALESCE(r.company_id, '') = COALESCE($13, '')
      AND COALESCE(r.template_no, '') = COALESCE($14, '')
      AND COALESCE(r.template_field_identifier, '') = COALESCE($15, '')
      AND COALESCE(r.submit_value, '') = COALESCE($16, '')
)
The corresponding parameters:
params: $1 = 'MY001', $2 = 'b8168c7513014b0c9769f3d61574833d', $3 = 'WP MANAGEMENT SERVICES', $4 = '1BABS7HSS5UH01FE140A000085535171', $5 = 'KLINIK PERGIGIAN M DENTAL', $6 = 'MYBL0303', $7 = 'Expenses', $8 = 'InvoiceDate', $9 = 'InvoiceDate(发票日期)', $10 = '20170614', $11 = 'MY001', $12 = 'b8168c7513014b0c9769f3d61574833d', $13 = '1BABS7HSS5UH01FE140A000085535171', $14 = 'MYBL0303', $15 = 'InvoiceDate', $16 = '20170614'
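One aside before digging in: random_string(32) in the INSERT is not a PostgreSQL built-in, so it is presumably a user-defined function that generates a 32-character random ID. Its real implementation is not shown in this post; a minimal sketch of what such a function might look like:

-- Hypothetical stand-in for the system's random_string(): concatenates `len`
-- characters drawn at random from [a-z0-9].
create or replace function random_string(len int) returns text as $$
    select string_agg(substr('abcdefghijklmnopqrstuvwxyz0123456789', (random() * 35)::int + 1, 1), '')
    from generate_series(1, len);
$$ language sql volatile;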
Clearly, the purpose of this SQL is a conditional insert into table t_ai_prd_history_effective_record: the new row is inserted only if the table does not already contain a record matching (using the literal values from another captured execution of the same statement)

COALESCE (r.cust_category, '') = COALESCE ('DZ001', '')
AND COALESCE (r.cust_id, '') = COALESCE ('b151ad4f86ab4ec5aee8c4cc377e9eb7', '')
AND COALESCE (r.company_id, '') = COALESCE ('04cb580238dc49af8bfb46e00e959a1a', '')
AND COALESCE (r.template_no, '') = COALESCE ('KJDZ0101', '')
AND COALESCE (r.template_field_identifier, '') = COALESCE ('ItemQuantity','')
AND COALESCE (r.submit_value, '') = COALESCE ('10100$__$6080$__$$__$$__$', '');

This is exactly what the WHERE NOT EXISTS clause expresses.
The problem, then, comes into focus: it should lie in the NOT EXISTS subquery.
Next, let's look at the execution plan of this subquery on its own:
ai=> explain analyze SELECT 1
ai-> FROM t_ai_prd_history_effective_record r
ai-> WHERE COALESCE (r.cust_category, '') = COALESCE ('DZ001', '')
ai-> AND COALESCE (r.cust_id, '') = COALESCE ('b151ad4f86ab4ec5aee8c4cc377e9eb7', '')
ai-> AND COALESCE (r.company_id, '') = COALESCE ('04cb580238dc49af8bfb46e00e959a1a', '')
ai-> AND COALESCE (r.template_no, '') = COALESCE ('KJDZ0101', '')
ai-> AND COALESCE (r.template_field_identifier, '') = COALESCE ('ItemQuantity','')
ai-> AND COALESCE (r.submit_value, '') = COALESCE ('10100$__$6080$__$$__$$__$', '');
                                  QUERY PLAN
----------------------------------------------------------------------------------------
 Seq Scan on t_ai_prd_history_effective_record r  (cost=0.00..203817.53 rows=1 width=0) (actual time=1266.116..1267.093 rows=1 loops=1)
   Filter: (((COALESCE(cust_category, ''::character varying))::text = 'DZ001'::text) AND ((COALESCE(cust_id, ''::character varying))::text = 'b151ad4f86ab4ec5aee8c4cc377e9eb7'::text) AND ((COALESCE(company_id, ''::character varying))::text = '04cb580238dc49af8bfb46e00e959a1a'::text) AND ((COALESCE(template_no, ''::character varying))::text = 'KJDZ0101'::text) AND ((COALESCE(template_field_identifier, ''::character varying))::text = 'ItemQuantity'::text) AND (COALESCE(submit_value, ''::text) = '10100$__$6080$__$$__$$__$'::text))
   Rows Removed by Filter: 3193300
 Planning time: 0.129 ms
 Execution time: 1267.133 ms
(5 rows)
Row count of t_ai_prd_history_effective_record:
ai=> select count(*) from t_ai_prd_history_effective_record;
  count
---------
 3193138
(1 row)
Just as expected: every execution of this duplicate check performs a full table scan of t_ai_prd_history_effective_record, a table holding more than 3.19 million rows. Unsurprisingly the check is slow, costing over a second per run, and as the table keeps growing, inserts will keep getting slower.
You could say that every record inserted makes the next insert slower!
3. Solution
Having found the crux of the problem, it was time to optimize. The ultimate goal: how can we do away with that NOT EXISTS subquery?
After discussing it with 林朝荣 (Lin Chaorong) of the BPO department, we arrived at the following options:
- Could we build a unique index on the six columns referenced in the subquery? If so, the INSERT would no longer need the WHERE NOT EXISTS duplicate check: the unique constraint would do the checking, and inserting a duplicate would simply fail with a constraint violation. This appears to solve the problem, but from a business standpoint the data genuinely contains many duplicates, so business users would keep seeing error messages in the system, which is hardly friendly. (A sketch of such an index follows this list.)
- In that case, the problem becomes even simpler. Bluntly drop the WHERE NOT EXISTS duplicate check from the SQL and insert everything, duplicates and all; then run a batch SQL job at night that removes the duplicate records in one pass. Problem solved.
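For reference, the unique index from the first option might look like the sketch below. This is only an illustration, not something we ran in production: the index name is made up, and each column is wrapped in COALESCE(..., '') to mirror the NULL handling in the original query. As a side note, on PostgreSQL 9.5+ such an index could be paired with INSERT ... ON CONFLICT DO NOTHING, which skips duplicates silently instead of raising errors.

-- Hypothetical unique expression index over the six de-duplication columns;
-- COALESCE(..., '') treats NULL and '' as the same value, as the query does.
create unique index ux_history_effective_record_dedup
    on t_ai_prd_history_effective_record (
        (coalesce(cust_category, '')),
        (coalesce(cust_id, '')),
        (coalesce(company_id, '')),
        (coalesce(template_no, '')),
        (coalesce(template_field_identifier, '')),
        (coalesce(submit_value, ''))
    );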
Here is a worked example of removing duplicate records from a PostgreSQL table:
ai=> create table t(id int primary key,name varchar(10),addr varchar(10));
CREATE TABLE
ai=> insert into t values(1,'onlyou.com','xiamen');
INSERT 0 1
ai=> insert into t values(2,'apple.com','usa');
INSERT 0 1
ai=> insert into t values(3,'apple.com','usa');
INSERT 0 1
ai=> insert into t values(4,'google','usa');
INSERT 0 1
ai=> insert into t values(5,'google','usa');
INSERT 0 1
ai=> select * from t;
 id |    name    |  addr
----+------------+--------
  1 | onlyou.com | xiamen
  2 | apple.com  | usa
  3 | apple.com  | usa
  4 | google     | usa
  5 | google     | usa
(5 rows)

ai=> select t1.id from t t1 where t1.ctid<>(select max(t2.ctid) from t t2 where t1.name=t2.name and t1.addr=t2.addr);
 id
----
  2
  4
(2 rows)

ai=> delete from t where id in(select t1.id from t t1 where t1.ctid<>(select max(t2.ctid) from t t2 where t1.name=t2.name and t1.addr=t2.addr));
DELETE 2
ai=> select * from t;
 id |    name    |  addr
----+------------+--------
  1 | onlyou.com | xiamen
  3 | apple.com  | usa
  5 | google     | usa
(3 rows)
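Carried over to the real table, the nightly batch job might look like the following sketch. It is untested here and assumes duplicates are defined by the same six COALESCE-normalized columns as the original query, keeping the physically last row (largest ctid) of each group:

-- Delete every row that is not the max-ctid survivor of its six-column group.
delete from t_ai_prd_history_effective_record r1
where r1.ctid <> (
    select max(r2.ctid)
    from t_ai_prd_history_effective_record r2
    where coalesce(r2.cust_category, '') = coalesce(r1.cust_category, '')
      and coalesce(r2.cust_id, '') = coalesce(r1.cust_id, '')
      and coalesce(r2.company_id, '') = coalesce(r1.company_id, '')
      and coalesce(r2.template_no, '') = coalesce(r1.template_no, '')
      and coalesce(r2.template_field_identifier, '') = coalesce(r1.template_field_identifier, '')
      and coalesce(r2.submit_value, '') = coalesce(r1.submit_value, '')
);

Note that without an index this correlated subquery rescans the table once per row; a (not necessarily unique) version of the expression index sketched above would also keep this batch job fast.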
4. Summary
Once you spot a slow SQL statement and pinpoint where its problem lies, a targeted optimization is often not as hard as it seems; think it through carefully and keep experimenting.
The above offers one line of thinking and one direction for such an optimization.