接项目组同事反馈,一条SQL,在beta环境足够快,生产环境就慢很多。对于这种问题,我通常都需要额外的向他们解释一会儿,比如数据量分布,统计信息,不同的执行计划等…. 总之,这种现象是一个完全正常的现象。
一 首先,那么,就直接从生产环境上看一下SQL的执行计划:
cattle=> explain analyze SELECT cattle-> ci.*, fa.farm_name AS farmName, cattle-> cs.cowshed_no AS cowshedNo cattle-> FROM cattle-> t_cattle_ca_cattleinfo AS ci cattle-> LEFT JOIN t_cattle_crm_cust_info AS cci ON ci.owner_id = cci. ID cattle-> LEFT JOIN t_cattle_bd_farm AS fa ON ci.farm_id = fa. ID cattle-> LEFT JOIN t_cattle_bd_cowshed AS cs ON ci.dorm_id = cs. ID cattle-> LEFT JOIN t_cattle_sys_company t1 ON ci.owner_id = t1. ID cattle-> WHERE cattle-> ci. FLAG = 1 cattle-> AND ci.corp_id = 'OAO_SYSTEMINTIAL_000000000000000' cattle-> ORDER BY cattle-> ci.create_time DESC cattle-> LIMIT 10 OFFSET (1- 1); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=305987.09..305987.12 rows=10 width=766) (actual time=1527.412..1527.415 rows=10 loops=1) -> Sort (cost=305987.09..306338.21 rows=140447 width=766) (actual time=1527.411..1527.414 rows=10 loops=1) Sort Key: ci.create_time Sort Method: top-N heapsort Memory: 35kB -> Hash Left Join (cost=82.84..302952.08 rows=140447 width=766) (actual time=1313.868..1504.013 rows=24673 loops=1) Hash Cond: ((ci.dorm_id)::text = (cs.id)::text) -> Hash Left Join (cost=3.81..301095.66 rows=140447 width=759) (actual time=1313.053..1492.219 rows=24673 loops=1) Hash Cond: ((ci.farm_id)::text = (fa.id)::text) -> Seq Scan on t_cattle_ca_cattleinfo ci (cost=0.00..299160.71 rows=140447 width=738) (actual time=1312.957..1474.099 rows=24673 loops=1) Filter: (((flag)::text = '1'::text) AND ((corp_id)::text = 'OAO_SYSTEMINTIAL_000000000000000'::text)) -> Hash (cost=3.36..3.36 rows=36 width=54) (actual time=0.043..0.043 rows=36 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 4kB -> Seq Scan on t_cattle_bd_farm fa (cost=0.00..3.36 rows=36 width=54) (actual time=0.018..0.027 rows=36 loops=1) -> Hash (cost=63.46..63.46 rows=1246 width=40) (actual time=0.796..0.796 rows=1246 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 89kB -> Seq Scan on t_cattle_bd_cowshed cs (cost=0.00..63.46 rows=1246 width=40) (actual time=0.003..0.438 rows=1246 loops=1) Total runtime: 1527.576 ms (17 rows) cattle=>
从上,可以看到,优化器在此处对于t_cattle_ca_cattleinfo表选择的是全表扫描。且耗时大概1474毫秒左右,而整个SQL的总耗时为1527.576 ms,说明绝大部分的时间开销都花费在对于t_cattle_ca_cattleinfo表的访问上。
二 接下来,分析t_cattle_ca_cattleinfo表的数据分布信息:
cattle=> select count(*) from t_cattle_ca_cattleinfo; count ------- 24673 (1 row) cattle=> select count(*) from t_cattle_ca_cattleinfo where corp_id = 'OAO_SYSTEMINTIAL_000000000000000'; count ------- 24673 (1 row) cattle=> select count(*) from t_cattle_ca_cattleinfo where corp_id = 'OAO_SYSTEMINTIAL_000000000000000' and FLAG = 1; count ------- 24673 (1 row) cattle=>
结合上述,看到t_cattle_ca_cattleinfo表总计24673条记录,而且满足corp_id = ‘OAO_SYSTEMINTIAL_000000000000000’ and FLAG = 1条件的数据,就是全表的数据总量。
但是,结合上述执行计划来看,优化器认为该表有rows=140447,可是手工查询到该表当前只有24673条记录。说明,统计信息出了问题。
三 然后,手工在线收集表统计信息:
cattle=> vacuum FREEZE ANALYZE verbose t_cattle_ca_cattleinfo; INFO: vacuuming "cattle.t_cattle_ca_cattleinfo" INFO: scanned index "t_cattle_ca_cattleinfo_pkey" to remove 4763 row versions DETAIL: CPU 0.08s/0.12u sec elapsed 0.24 sec. INFO: scanned index "cattleCode_index" to remove 4763 row versions DETAIL: CPU 0.02s/0.04u sec elapsed 0.07 sec. INFO: scanned index "corpId_index" to remove 4763 row versions DETAIL: CPU 0.08s/0.13u sec elapsed 0.22 sec. INFO: scanned index "farmId_index" to remove 4763 row versions DETAIL: CPU 0.07s/0.12u sec elapsed 0.20 sec. INFO: scanned index "feedId_index" to remove 4763 row versions DETAIL: CPU 0.09s/0.10u sec elapsed 0.21 sec. INFO: scanned index "manageCode_index" to remove 4763 row versions DETAIL: CPU 0.00s/0.03u sec elapsed 0.04 sec. INFO: scanned index "rfid_index" to remove 4763 row versions DETAIL: CPU 0.13s/0.12u sec elapsed 0.26 sec. INFO: "t_cattle_ca_cattleinfo": removed 4763 row versions in 1052 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "t_cattle_ca_cattleinfo_pkey" now contains 24673 row versions in 31871 pages DETAIL: 4763 index row versions were removed. 14490 index pages have been deleted, 14490 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "cattleCode_index" now contains 24673 row versions in 17799 pages DETAIL: 4763 index row versions were removed. 2795 index pages have been deleted, 2795 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "corpId_index" now contains 24673 row versions in 37233 pages DETAIL: 4763 index row versions were removed. 36745 index pages have been deleted, 36742 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "farmId_index" now contains 24673 row versions in 37134 pages DETAIL: 4763 index row versions were removed. 36654 index pages have been deleted, 36652 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "feedId_index" now contains 24673 row versions in 37155 pages DETAIL: 4763 index row versions were removed. 36595 index pages have been deleted, 36595 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "manageCode_index" now contains 24673 row versions in 17764 pages DETAIL: 4762 index row versions were removed. 2744 index pages have been deleted, 2744 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "rfid_index" now contains 24673 row versions in 34752 pages DETAIL: 4763 index row versions were removed. 19034 index pages have been deleted, 19034 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "t_cattle_ca_cattleinfo": found 753 removable, 24673 nonremovable row versions in 297054 out of 297054 pages DETAIL: 0 dead row versions cannot be removed yet. There were 4399722 unused item pointers. 0 pages are entirely empty. CPU 1.17s/1.69u sec elapsed 2.92 sec. INFO: vacuuming "pg_toast.pg_toast_1962065" INFO: index "pg_toast_1962065_index" now contains 0 row versions in 1 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_1962065": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: analyzing "cattle.t_cattle_ca_cattleinfo" INFO: "t_cattle_ca_cattleinfo": scanned 30000 of 297054 pages, containing 2630 live rows and 0 dead rows; 2630 rows in sample, 24811 estimated total rows VACUUM
四 重新查看该SQL执行计划:
cattle=> explain analyze SELECT cattle-> ci.*, fa.farm_name AS farmName, cattle-> cs.cowshed_no AS cowshedNo cattle-> FROM cattle-> t_cattle_ca_cattleinfo AS ci cattle-> LEFT JOIN t_cattle_crm_cust_info AS cci ON ci.owner_id = cci. ID cattle-> LEFT JOIN t_cattle_bd_farm AS fa ON ci.farm_id = fa. ID cattle-> LEFT JOIN t_cattle_bd_cowshed AS cs ON ci.dorm_id = cs. ID cattle-> LEFT JOIN t_cattle_sys_company t1 ON ci.owner_id = t1. ID cattle-> WHERE cattle-> ci. FLAG = 1 cattle-> AND ci.corp_id = 'OAO_SYSTEMINTIAL_000000000000000' cattle-> ORDER BY cattle-> ci.create_time DESC cattle-> LIMIT 10 OFFSET (1- 1); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=225802.81..225802.84 rows=10 width=766) (actual time=79.316..79.319 rows=10 loops=1) -> Sort (cost=225802.81..225864.84 rows=24811 width=766) (actual time=79.314..79.315 rows=10 loops=1) Sort Key: ci.create_time Sort Method: top-N heapsort Memory: 35kB -> Hash Left Join (cost=149207.67..225266.65 rows=24811 width=766) (actual time=9.327..54.520 rows=24673 loops=1) Hash Cond: ((ci.dorm_id)::text = (cs.id)::text) -> Hash Left Join (cost=149128.63..224878.17 rows=24811 width=759) (actual time=8.693..42.217 rows=24673 loops=1) Hash Cond: ((ci.farm_id)::text = (fa.id)::text) -> Bitmap Heap Scan on t_cattle_ca_cattleinfo ci (cost=149124.82..224533.21 rows=24811 width=738) (actual time=8.649..22.418 rows=24673 loops=1) Recheck Cond: ((corp_id)::text = 'OAO_SYSTEMINTIAL_000000000000000'::text) Filter: ((flag)::text = '1'::text) -> Bitmap Index Scan on "corpId_index" (cost=0.00..149118.62 rows=24811 width=0) (actual time=8.182..8.182 rows=24673 loops=1) Index Cond: ((corp_id)::text = 'OAO_SYSTEMINTIAL_000000000000000'::text) -> Hash (cost=3.36..3.36 rows=36 width=54) (actual time=0.025..0.025 rows=36 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 4kB -> Seq Scan on t_cattle_bd_farm fa (cost=0.00..3.36 rows=36 width=54) (actual time=0.005..0.011 rows=36 loops=1) -> Hash (cost=63.46..63.46 rows=1246 width=40) (actual time=0.617..0.617 rows=1246 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 89kB -> Seq Scan on t_cattle_bd_cowshed cs (cost=0.00..63.46 rows=1246 width=40) (actual time=0.001..0.251 rows=1246 loops=1) Total runtime: 79.503 ms (20 rows) cattle=>
对比,收集统计信息之前,SQL执行耗时1500毫秒,更新统计信息之后,耗时79毫秒。
五 小结:
一则典型的数据库因为表的统计信息不准,导致优化器选择错误的执行计划的一个例子。