PostgreSQL统计信息不准导致错误的执行计划

接项目组同事反馈,一条SQL,在beta环境足够快,生产环境就慢很多。对于这种问题,我通常都需要额外的向他们解释一会儿,比如数据量分布,统计信息,不同的执行计划等…. 总之,这种现象是一个完全正常的现象。

一 首先,那么,就直接从生产环境上看一下SQL的执行计划:

cattle=> explain analyze SELECT
cattle-> ci.*, fa.farm_name AS farmName,
cattle-> cs.cowshed_no AS cowshedNo
cattle-> FROM
cattle-> t_cattle_ca_cattleinfo AS ci
cattle-> LEFT JOIN t_cattle_crm_cust_info AS cci ON ci.owner_id = cci. ID
cattle-> LEFT JOIN t_cattle_bd_farm AS fa ON ci.farm_id = fa. ID
cattle-> LEFT JOIN t_cattle_bd_cowshed AS cs ON ci.dorm_id = cs. ID
cattle-> LEFT JOIN t_cattle_sys_company t1 ON ci.owner_id = t1. ID
cattle-> WHERE
cattle-> ci. FLAG = 1
cattle-> AND ci.corp_id = 'OAO_SYSTEMINTIAL_000000000000000'
cattle-> ORDER BY
cattle-> ci.create_time DESC
cattle-> LIMIT 10 OFFSET (1- 1);
                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=305987.09..305987.12 rows=10 width=766) (actual time=1527.412..1527.415 rows=10 loops=1)
   ->  Sort  (cost=305987.09..306338.21 rows=140447 width=766) (actual time=1527.411..1527.414 rows=10 loops=1)
         Sort Key: ci.create_time
         Sort Method: top-N heapsort  Memory: 35kB
         ->  Hash Left Join  (cost=82.84..302952.08 rows=140447 width=766) (actual time=1313.868..1504.013 rows=24673 loops=1)
               Hash Cond: ((ci.dorm_id)::text = (cs.id)::text)
               ->  Hash Left Join  (cost=3.81..301095.66 rows=140447 width=759) (actual time=1313.053..1492.219 rows=24673 loops=1)
                     Hash Cond: ((ci.farm_id)::text = (fa.id)::text)
                     ->  Seq Scan on t_cattle_ca_cattleinfo ci  (cost=0.00..299160.71 rows=140447 width=738) (actual time=1312.957..1474.099 rows=24673 loops=1)
                           Filter: (((flag)::text = '1'::text) AND ((corp_id)::text = 'OAO_SYSTEMINTIAL_000000000000000'::text))
                     ->  Hash  (cost=3.36..3.36 rows=36 width=54) (actual time=0.043..0.043 rows=36 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 4kB
                           ->  Seq Scan on t_cattle_bd_farm fa  (cost=0.00..3.36 rows=36 width=54) (actual time=0.018..0.027 rows=36 loops=1)
               ->  Hash  (cost=63.46..63.46 rows=1246 width=40) (actual time=0.796..0.796 rows=1246 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 89kB
                     ->  Seq Scan on t_cattle_bd_cowshed cs  (cost=0.00..63.46 rows=1246 width=40) (actual time=0.003..0.438 rows=1246 loops=1)
 Total runtime: 1527.576 ms
(17 rows)

cattle=>

从上,可以看到,优化器在此处对于t_cattle_ca_cattleinfo表选择的是全表扫描。且耗时大概1474毫秒左右,而整个SQL的总耗时为1527.576 ms,说明绝大部分的时间开销都花费在对于t_cattle_ca_cattleinfo表的访问上。

二  接下来,分析t_cattle_ca_cattleinfo表的数据分布信息:

cattle=> select count(*) from t_cattle_ca_cattleinfo;                                                   
 count 
-------
 24673
(1 row)

cattle=> select count(*) from t_cattle_ca_cattleinfo where corp_id = 'OAO_SYSTEMINTIAL_000000000000000';
 count 
-------
 24673
(1 row)

cattle=> select count(*) from t_cattle_ca_cattleinfo where corp_id = 'OAO_SYSTEMINTIAL_000000000000000' and FLAG = 1;
 count 
-------
 24673
(1 row)

cattle=>

结合上述,看到t_cattle_ca_cattleinfo表总计24673条记录,而且满足corp_id = ‘OAO_SYSTEMINTIAL_000000000000000’ and FLAG = 1条件的数据,就是全表的数据总量。

但是,结合上述执行计划来看,优化器认为该表有rows=140447,可是手工查询到该表当前只有24673条记录。说明,统计信息出了问题。

三 然后,手工在线收集表统计信息:

cattle=> vacuum FREEZE ANALYZE verbose t_cattle_ca_cattleinfo;
INFO:  vacuuming "cattle.t_cattle_ca_cattleinfo"
INFO:  scanned index "t_cattle_ca_cattleinfo_pkey" to remove 4763 row versions
DETAIL:  CPU 0.08s/0.12u sec elapsed 0.24 sec.
INFO:  scanned index "cattleCode_index" to remove 4763 row versions
DETAIL:  CPU 0.02s/0.04u sec elapsed 0.07 sec.
INFO:  scanned index "corpId_index" to remove 4763 row versions
DETAIL:  CPU 0.08s/0.13u sec elapsed 0.22 sec.
INFO:  scanned index "farmId_index" to remove 4763 row versions
DETAIL:  CPU 0.07s/0.12u sec elapsed 0.20 sec.
INFO:  scanned index "feedId_index" to remove 4763 row versions
DETAIL:  CPU 0.09s/0.10u sec elapsed 0.21 sec.
INFO:  scanned index "manageCode_index" to remove 4763 row versions
DETAIL:  CPU 0.00s/0.03u sec elapsed 0.04 sec.
INFO:  scanned index "rfid_index" to remove 4763 row versions
DETAIL:  CPU 0.13s/0.12u sec elapsed 0.26 sec.
INFO:  "t_cattle_ca_cattleinfo": removed 4763 row versions in 1052 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "t_cattle_ca_cattleinfo_pkey" now contains 24673 row versions in 31871 pages
DETAIL:  4763 index row versions were removed.
14490 index pages have been deleted, 14490 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "cattleCode_index" now contains 24673 row versions in 17799 pages
DETAIL:  4763 index row versions were removed.
2795 index pages have been deleted, 2795 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "corpId_index" now contains 24673 row versions in 37233 pages
DETAIL:  4763 index row versions were removed.
36745 index pages have been deleted, 36742 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "farmId_index" now contains 24673 row versions in 37134 pages
DETAIL:  4763 index row versions were removed.
36654 index pages have been deleted, 36652 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "feedId_index" now contains 24673 row versions in 37155 pages
DETAIL:  4763 index row versions were removed.
36595 index pages have been deleted, 36595 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "manageCode_index" now contains 24673 row versions in 17764 pages
DETAIL:  4762 index row versions were removed.
2744 index pages have been deleted, 2744 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "rfid_index" now contains 24673 row versions in 34752 pages
DETAIL:  4763 index row versions were removed.
19034 index pages have been deleted, 19034 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "t_cattle_ca_cattleinfo": found 753 removable, 24673 nonremovable row versions in 297054 out of 297054 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 4399722 unused item pointers.
0 pages are entirely empty.
CPU 1.17s/1.69u sec elapsed 2.92 sec.
INFO:  vacuuming "pg_toast.pg_toast_1962065"
INFO:  index "pg_toast_1962065_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_1962065": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "cattle.t_cattle_ca_cattleinfo"
INFO:  "t_cattle_ca_cattleinfo": scanned 30000 of 297054 pages, containing 2630 live rows and 0 dead rows; 2630 rows in sample, 24811 estimated total rows
VACUUM

四 重新查看该SQL执行计划:

cattle=> explain analyze SELECT
cattle-> ci.*, fa.farm_name AS farmName,
cattle-> cs.cowshed_no AS cowshedNo
cattle-> FROM
cattle-> t_cattle_ca_cattleinfo AS ci
cattle-> LEFT JOIN t_cattle_crm_cust_info AS cci ON ci.owner_id = cci. ID
cattle-> LEFT JOIN t_cattle_bd_farm AS fa ON ci.farm_id = fa. ID
cattle-> LEFT JOIN t_cattle_bd_cowshed AS cs ON ci.dorm_id = cs. ID
cattle-> LEFT JOIN t_cattle_sys_company t1 ON ci.owner_id = t1. ID
cattle-> WHERE
cattle-> ci. FLAG = 1
cattle-> AND ci.corp_id = 'OAO_SYSTEMINTIAL_000000000000000'
cattle-> ORDER BY
cattle-> ci.create_time DESC
cattle-> LIMIT 10 OFFSET (1- 1);  
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=225802.81..225802.84 rows=10 width=766) (actual time=79.316..79.319 rows=10 loops=1)
   ->  Sort  (cost=225802.81..225864.84 rows=24811 width=766) (actual time=79.314..79.315 rows=10 loops=1)
         Sort Key: ci.create_time
         Sort Method: top-N heapsort  Memory: 35kB
         ->  Hash Left Join  (cost=149207.67..225266.65 rows=24811 width=766) (actual time=9.327..54.520 rows=24673 loops=1)
               Hash Cond: ((ci.dorm_id)::text = (cs.id)::text)
               ->  Hash Left Join  (cost=149128.63..224878.17 rows=24811 width=759) (actual time=8.693..42.217 rows=24673 loops=1)
                     Hash Cond: ((ci.farm_id)::text = (fa.id)::text)
                     ->  Bitmap Heap Scan on t_cattle_ca_cattleinfo ci  (cost=149124.82..224533.21 rows=24811 width=738) (actual time=8.649..22.418 rows=24673 loops=1)
                           Recheck Cond: ((corp_id)::text = 'OAO_SYSTEMINTIAL_000000000000000'::text)
                           Filter: ((flag)::text = '1'::text)
                           ->  Bitmap Index Scan on "corpId_index"  (cost=0.00..149118.62 rows=24811 width=0) (actual time=8.182..8.182 rows=24673 loops=1)
                                 Index Cond: ((corp_id)::text = 'OAO_SYSTEMINTIAL_000000000000000'::text)
                     ->  Hash  (cost=3.36..3.36 rows=36 width=54) (actual time=0.025..0.025 rows=36 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 4kB
                           ->  Seq Scan on t_cattle_bd_farm fa  (cost=0.00..3.36 rows=36 width=54) (actual time=0.005..0.011 rows=36 loops=1)
               ->  Hash  (cost=63.46..63.46 rows=1246 width=40) (actual time=0.617..0.617 rows=1246 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 89kB
                     ->  Seq Scan on t_cattle_bd_cowshed cs  (cost=0.00..63.46 rows=1246 width=40) (actual time=0.001..0.251 rows=1246 loops=1)
 Total runtime: 79.503 ms
(20 rows)

cattle=>

对比,收集统计信息之前,SQL执行耗时1500毫秒,更新统计信息之后,耗时79毫秒。
五  小结:

一则典型的数据库因为表的统计信息不准,导致优化器选择错误的执行计划的一个例子。

PostgreSQL编译安装uuid函数

1 现象

一台源码编译安装之后的PostgreSQL数据库,在创建使用uuid的时候,报错。

 
postgres=# \c
You are now connected to database "postgres" as user "postgres".
postgres=# CREATE OR REPLACE FUNCTION uuid_generate_v4()
postgres-#   RETURNS uuid AS
postgres-# '$libdir/uuid-ossp', 'uuid_generate_v4'
postgres-#   LANGUAGE c VOLATILE STRICT
postgres-#   COST 1;
ERROR:  could not access file "$libdir/uuid-ossp": No such file or directory
postgres=#

2 原因

源码编译安装的时候,没有选择编译uuid,需要重新编译

[postgres@localhost postgresql-9.6.4]$ pwd
/home/postgres/postgresql-9.6.4
[postgres@localhost postgresql-9.6.4]$ ./configure --prefix=/postgres/9.6.4/ --with-ossp-uuid 
..
.
..

这里有两点需要注意:

  • 需要使用之前解压出来的源码文件,如果之前解压的源码被删除,那么可以用与已经安装的数据库软件相同版本的压缩包重新解压;
  • --prefix选项要与之前安装时的相同目录;

在重新编译的过程中,如果遇到下述错误:

checking for uuid_export in -lossp-uuid... no
checking for uuid_export in -luuid... no
configure: error: library 'ossp-uuid' or 'uuid' is required for OSSP-UUID

则,解决:yum install uuid-devel -y

yum install uuid-devel -y

然后,重新继续编译,编译完成之后,再到源码的contrib路径下,执行 make和make install来安装uuid-ossp.

[postgres@localhost uuid-ossp]$ pwd
/home/postgres/postgresql-9.6.4/contrib/uuid-ossp
[postgres@localhost uuid-ossp]$ make 
make: 对“all”无需做任何事。
[postgres@localhost uuid-ossp]$ make install
/bin/mkdir -p '/postgres/9.6.4/lib'
/bin/mkdir -p '/postgres/9.6.4/share/extension'
/bin/mkdir -p '/postgres/9.6.4/share/extension'
/bin/install -c -m 755  uuid-ossp.so '/postgres/9.6.4/lib/uuid-ossp.so'
/bin/install -c -m 644 ./uuid-ossp.control '/postgres/9.6.4/share/extension/'
/bin/install -c -m 644 ./uuid-ossp--1.1.sql ./uuid-ossp--1.0--1.1.sql ./uuid-ossp--unpackaged--1.0.sql  '/postgres/9.6.4/share/extension/'
[postgres@localhost uuid-ossp]$ 

3 最后,重新启动数据库

后来经过测试,可以不需要重启数据库。

直接到对应的库里,创建该函数即可。

postgres=# \c zyd_test 
You are now connected to database "zyd_test" as user "postgres".
zyd_test=# CREATE OR REPLACE FUNCTION uuid_generate_v4()
  RETURNS uuid AS
'$libdir/uuid-ossp', 'uuid_generate_v4'
  LANGUAGE c VOLATILE STRICT
  COST 1;
CREATE FUNCTION
zyd_test=# alter function uuid_generate_v4() owner to zyd_test ;
ALTER FUNCTION
zyd_test=#