Compiling the uuid-ossp Functions into a Source-Built PostgreSQL

1 Symptom

On a PostgreSQL instance that was compiled and installed from source, creating a uuid function fails with an error:

 
postgres=# \c
You are now connected to database "postgres" as user "postgres".
postgres=# CREATE OR REPLACE FUNCTION uuid_generate_v4()
postgres-#   RETURNS uuid AS
postgres-# '$libdir/uuid-ossp', 'uuid_generate_v4'
postgres-#   LANGUAGE c VOLATILE STRICT
postgres-#   COST 1;
ERROR:  could not access file "$libdir/uuid-ossp": No such file or directory
postgres=#

2 Cause

The source build was configured without uuid support, so configure must be re-run with the uuid option:

[postgres@localhost postgresql-9.6.4]$ pwd
/home/postgres/postgresql-9.6.4
[postgres@localhost postgresql-9.6.4]$ ./configure --prefix=/postgres/9.6.4/ --with-ossp-uuid 
..
.
..

Two points need attention here:

  • Use the source tree that was originally extracted; if it has since been deleted, re-extract a tarball of exactly the same version as the installed binaries;
  • The --prefix option must point to the same directory used for the original installation;

If the following error appears while re-running configure:

checking for uuid_export in -lossp-uuid... no
checking for uuid_export in -luuid... no
configure: error: library 'ossp-uuid' or 'uuid' is required for OSSP-UUID

then the fix is to install the OSSP uuid development package:

yum install uuid-devel -y

Then resume the build. After it finishes, change to the contrib/uuid-ossp directory of the source tree and run make and make install to install uuid-ossp:

[postgres@localhost uuid-ossp]$ pwd
/home/postgres/postgresql-9.6.4/contrib/uuid-ossp
[postgres@localhost uuid-ossp]$ make 
make: Nothing to be done for 'all'.
[postgres@localhost uuid-ossp]$ make install
/bin/mkdir -p '/postgres/9.6.4/lib'
/bin/mkdir -p '/postgres/9.6.4/share/extension'
/bin/mkdir -p '/postgres/9.6.4/share/extension'
/bin/install -c -m 755  uuid-ossp.so '/postgres/9.6.4/lib/uuid-ossp.so'
/bin/install -c -m 644 ./uuid-ossp.control '/postgres/9.6.4/share/extension/'
/bin/install -c -m 644 ./uuid-ossp--1.1.sql ./uuid-ossp--1.0--1.1.sql ./uuid-ossp--unpackaged--1.0.sql  '/postgres/9.6.4/share/extension/'
[postgres@localhost uuid-ossp]$ 
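As a quick sanity check (a sketch using the paths from this article, not part of the original procedure), you can confirm the module landed where the server's $libdir resolves:

```shell
pg_config --pkglibdir             # the directory the server substitutes for $libdir
ls -l /postgres/9.6.4/lib/uuid-ossp.so
```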

3 Finally, restart the database

Subsequent testing showed, however, that restarting the database is not actually necessary.

Just connect to the target database and create the function directly:

postgres=# \c zyd_test 
You are now connected to database "zyd_test" as user "postgres".
zyd_test=# CREATE OR REPLACE FUNCTION uuid_generate_v4()
  RETURNS uuid AS
'$libdir/uuid-ossp', 'uuid_generate_v4'
  LANGUAGE c VOLATILE STRICT
  COST 1;
CREATE FUNCTION
zyd_test=# alter function uuid_generate_v4() owner to zyd_test ;
ALTER FUNCTION
zyd_test=#
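For comparison, on PostgreSQL 9.1 and later the files installed above can also be registered in one step through the extension mechanism, which creates all of the uuid-ossp functions at once; a sketch (run as a superuser in the target database):

```sql
-- Registers every function shipped by the extension, including uuid_generate_v4()
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SELECT uuid_generate_v4();
```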

Two PostgreSQL SQL Performance Optimization Cases

0 Background and Notes

Database performance shapes the performance of the entire business system, and well over 90% of database performance problems are caused by badly performing SQL.

For a new system about to go live, colleagues in development sent over two SQL tuning requests.

I Analysis and Handling of SQL #1

1 The original SQL, formatted:

SELECT COUNT(b.code) AS totalCount, b.status AS status	
FROM t_ai_prd_bill b 
WHERE b.status != '900' AND b.is_deleted = FALSE 
GROUP BY b.status

2 Execution plan of the original SQL

ai=> explain analyze SELECT COUNT(b.code) AS totalCount, b.status AS status 
ai-> FROM t_ai_prd_bill b 
ai-> WHERE b.status != '900' AND b.is_deleted = FALSE 
ai-> GROUP BY b.status;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=3870.50..3870.51 rows=1 width=17) (actual time=12.660..12.661 rows=5 loops=1)
   Group Key: status
   ->  Seq Scan on t_ai_prd_bill b  (cost=0.00..3839.75 rows=6150 width=17) (actual time=0.013..11.139 rows=6153 loops=1)
         Filter: ((NOT is_deleted) AND (status <> '900'::bpchar))
         Rows Removed by Filter: 32227
 Planning time: 0.165 ms
 Execution time: 12.719 ms
(7 rows)

ai=>

3 Structure, row count, and data distribution of t_ai_prd_bill

 
ai=> \d t_ai_prd_bill                                      
                         Table "ai.t_ai_prd_bill"
       Column       |              Type              |      Modifiers      
--------------------+--------------------------------+---------------------
 id                 | character varying(32)          | not null
 code               | character varying(12)          | 
 packet_code        | character varying(10)          | 
 status             | character(3)                   | 
 is_deleted         | boolean                        | default false
 classify_type      | character varying(1)           | 
 last_oper_time     | timestamp(6) without time zone | 
 template_no        | character varying(32)          | 
 oss_key            | character varying(255)         | 
 receive_time       | timestamp(6) without time zone | 
 is_question        | boolean                        | default false
 ocr_type           | character(1)                   | 
 cust_id            | character varying(32)          | 
 exception_category | character(1)                   | default '0'::bpchar
 exception_detail   | character varying(400)         | 
 is_suspended       | boolean                        | 
 is_urgent          | boolean                        | 
 raw_deadline       | timestamp(6) without time zone | 
 priority           | smallint                       | 
 deadline           | timestamp(6) without time zone | 
 company_id         | character varying(32)          | 
 company_name       | character varying(255)         | 
 cust_category      | character varying(32)          | 
 source             | character varying(32)          | 
 mode_code          | character varying(32)          | 
 deadline_category  | character varying(3)           | 
 location_id        | character varying(32)          | 
 cust_name          | character varying(128)         | 
 template_category  | character varying(32)          | 
 platform_id        | character varying(32)          | 
 country_name       | character varying(64)          | 
 is_sended          | boolean                        | default false
Indexes:
    "t_ai_prd_bill_pkey" PRIMARY KEY, btree (id)
    "idx_bill_code" btree (code)
    "idx_bill_create_time_status_deleted" btree (receive_time, status, is_deleted)
    "idx_bill_cust_id" btree (cust_id)
    "idx_bill_packet_code" btree (packet_code)
    "idx_bill_status" btree (status)

ai=> select status,count(*) from t_ai_prd_bill group by status;
 status | count 
--------+-------
 400    |  2511
 401    |   183
 500    |  3174
 600    |     1
 701    |   284
 900    | 32227
(6 rows)

ai=>

4 Rewriting an equivalent SQL

Analyze the execution plan against the t_ai_prd_bill structure, index list, and row counts shown above.

The status column of t_ai_prd_bill carries the business state: status = 900 marks bills whose processing has completed, and status = 901 marks bills that failed processing. As business flows normally, rows with status = 900 keep accumulating and come to make up the vast majority of the table, while status = 901 rows should stay very rare. In other words, this SQL is meant to report all bills in any state other than "processing complete", i.e. status != 900.

Working from the business side, then, if the failed state were renumbered, say from status = 901 to status = 899, the SQL could be rewritten equivalently as:

SELECT COUNT(b.code) AS totalCount, b.status AS status	
FROM t_ai_prd_bill b 
WHERE b.status < '900' AND b.is_deleted = FALSE 
GROUP BY b.status

5 Execution plan of the new SQL

ai=> explain analyze SELECT COUNT(b.code) AS totalCount, b.status AS status 
ai-> FROM t_ai_prd_bill b 
ai-> WHERE b.status < '900' AND b.is_deleted = FALSE 
ai-> GROUP BY b.status;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=3782.19..3782.20 rows=1 width=17) (actual time=5.423..5.424 rows=5 loops=1)
   Group Key: status
   ->  Bitmap Heap Scan on t_ai_prd_bill b  (cost=247.95..3751.44 rows=6150 width=17) (actual time=0.996..3.811 rows=6153 loops=1)
         Recheck Cond: (status < '900'::bpchar)
         Filter: (NOT is_deleted)
         Heap Blocks: exact=1213
         ->  Bitmap Index Scan on idx_bill_status  (cost=0.00..246.41 rows=6150 width=0) (actual time=0.825..0.825 rows=6153 loops=1)
               Index Cond: (status < '900'::bpchar)
 Planning time: 0.156 ms
 Execution time: 5.480 ms
(10 rows)

ai=>

 

6 Summary

Comparing the execution plans before and after the rewrite, execution time drops from about 12 ms to 5 ms. And as the business accumulates more data in t_ai_prd_bill, the benefit of the optimization will only grow.

Here the business defines the bill states as '100, 200, 300, 400, 401, 500, 600, 402, 700, 701, 800, 900, 901', each code representing a distinct intermediate state. The requirement was to count all bills whose processing has not completed, so the developer naturally filtered with status != '900', overlooking that the optimizer cannot use a plain B-tree index for a not-equals (!=) predicate in the WHERE clause.
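If renumbering the business codes is not an option, a partial index is one alternative worth sketching (the index name here is illustrative): it lets the original status != '900' predicate remain index-assisted without touching the data.

```sql
-- Indexes only the "not yet completed" rows; the planner can use it
-- for WHERE status <> '900' AND is_deleted = FALSE
CREATE INDEX idx_bill_status_active
    ON t_ai_prd_bill (status)
    WHERE status <> '900' AND is_deleted = FALSE;
```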

 

II Analysis and Handling of SQL #2

1 The original SQL

select a.status, sum(case when b.status is null then 0 else 1 end),
        sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
        , sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
        , sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
        from (select regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901', ',') as status) a
        left join t_ai_prd_bill b on (b.status = a.status)
        WHERE b.is_deleted = FALSE
        group by a.status
        order by a.status;

2 Execution plan of the original SQL

ai=> explain analyze select a.status, sum(case when b.status is null then 0 else 1 end),
ai->         sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
ai->         , sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
ai->         , sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
ai->         from (select regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901', ',') as status) a
ai->         left join t_ai_prd_bill b on (b.status = a.status)
ai->         WHERE b.is_deleted = FALSE
ai->         group by a.status
ai->         order by a.status;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=6328.61..15390.61 rows=200 width=42) (actual time=70.104..75.647 rows=5 loops=1)
   Group Key: (regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901'::text, ','::text))
   ->  Merge Join  (cost=6328.61..10560.61 rows=241400 width=42) (actual time=43.422..57.495 rows=35869 loops=1)
         Merge Cond: ((regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901'::text, ','::text)) = ((b.status)::text))
         ->  Sort  (cost=64.84..67.34 rows=1000 width=32) (actual time=0.117..0.119 rows=12 loops=1)
               Sort Key: (regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901'::text, ','::text))
               Sort Method: quicksort  Memory: 25kB
               ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.094..0.102 rows=12 loops=1)
         ->  Sort  (cost=6263.78..6384.48 rows=48280 width=10) (actual time=43.293..46.852 rows=48280 loops=1)
               Sort Key: ((b.status)::text)
               Sort Method: quicksort  Memory: 3629kB
               ->  Seq Scan on t_ai_prd_bill b  (cost=0.00..2507.80 rows=48280 width=10) (actual time=0.012..30.322 rows=48280 loops=1)
                     Filter: (NOT is_deleted)
 Planning time: 0.367 ms
 Execution time: 75.737 ms
(15 rows)

ai=>

3 Analyzing the SQL and its execution plan

The original SQL aims to count rows of t_ai_prd_bill grouped and ordered by status. It takes a detour: regexp_split_to_table builds a single-column virtual table a, which is then LEFT JOINed back to t_ai_prd_bill. This is unnecessary; note also that the WHERE b.is_deleted = FALSE filter discards exactly the unmatched rows a LEFT JOIN would otherwise preserve, so the join degenerates into an inner join anyway. The plan also shows a sizeable in-memory sort: Sort Method: quicksort Memory: 3629kB.

4 An equivalent rewrite

select  b.status,sum(case when b.status is null then 0 else 1 end),
        sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
        , sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
        , sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
        from t_ai_prd_bill b 
        WHERE b.is_deleted = FALSE
        group by b.status
        order by b.status;

5 Execution plan of the rewritten SQL

ai=> explain analyze select  b.status,sum(case when b.status is null then 0 else 1 end),
ai->         sum(case when b.cust_category = 'DZ001' then 1 else 0 end) as "zyd"
ai->         , sum(case when b.cust_category = 'ZEJ001' then 1 else 0 end) as "zyj"
ai->         , sum(case when b.cust_category = 'MY001' then 1 else 0 end) as "my"
ai->         from t_ai_prd_bill b 
ai->         WHERE b.is_deleted = FALSE
ai->         group by b.status
ai->         order by b.status;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3473.54..3473.55 rows=6 width=10) (actual time=49.986..49.987 rows=6 loops=1)
   Sort Key: status
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=3473.40..3473.46 rows=6 width=10) (actual time=49.932..49.934 rows=6 loops=1)
         Group Key: status
         ->  Seq Scan on t_ai_prd_bill b  (cost=0.00..2507.80 rows=48280 width=10) (actual time=0.008..11.934 rows=48280 loops=1)
               Filter: (NOT is_deleted)
 Planning time: 0.109 ms
 Execution time: 50.060 ms
(9 rows)

ai=>

6 Summary

Analyzing and rewriting the SQL, and comparing the plans before and after, shows that the unnecessary join and the large in-memory sort are eliminated, letting the database answer the SQL faster (roughly 75 ms down to 50 ms).
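One caveat: the rewrite no longer reports statuses that currently have zero bills, which is presumably what the LEFT JOIN was for. If zero-count statuses must still appear, a sketch that keeps the virtual table but moves the filter into the ON clause (so unmatched statuses survive the join):

```sql
-- count(b.status) counts only matched rows, so empty statuses show 0
select a.status, count(b.status)
from (select regexp_split_to_table('100,200,300,400,401,500,600,402,700,701,800,901', ',') as status) a
left join t_ai_prd_bill b
  on b.status = a.status and b.is_deleted = FALSE
group by a.status
order by a.status;
```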

 

A PostgreSQL Case of Inaccurate Statistics After Deleting Most of a Table's Rows

I Symptom

A developer reported that a log table which once held over a million rows had been pared down to a bit more than 200, yet after the mass delete, queries against it were still very slow.

 postgres=# \c superp2b_test superp2b_test
You are now connected to database "superp2b_test" as user "superp2b_test".
superp2b_test=> select count(*) from t_sys_exception_log;
count
-------
267
(1 row)
 
Time: 101702.768 ms
superp2b_test=> create table t_sys_exception_log_bak as select * from t_sys_exception_log;
SELECT 267
Time: 160531.138 ms
superp2b_test=> select count(*) from t_sys_exception_log_bak ;
count
-------
267
(1 row)
 
Time: 87.180 ms
superp2b_test=>

After copying the table with CTAS to t_sys_exception_log_bak, the same query against the copy is not slow at all.

II Table structure, and statistics for the table and its indexes

superp2b_test=> \d t_sys_exception_log
              Table "public.t_sys_exception_log"
      Column      |              Type              | Modifiers
------------------+--------------------------------+-----------
 id               | character(32)                  | not null
 class_name       | character varying(512)         |
 create_time      | timestamp(6) without time zone |
 remark           | character varying(1024)        |
 type             | character varying(4)           |
 exception_type   | character varying(1024)        |
 exception_detail | text                           |
 method_name      | character varying(255)         |
 source           | character varying(1024)        |
 target           | character varying(1024)        |
 operator_id      | character(32)                  |
 operator_name    | character varying(255)         |
Indexes:
    "t_sys_exception_log_pkey" PRIMARY KEY, btree (id)
    "t_sys_exception_log_class_name_idx" btree (class_name)
    "t_sys_exception_log_create_time_idx" btree (create_time)
    "t_sys_exception_log_exception_type_idx" btree (exception_type)
    "t_sys_exception_log_method_name_idx" btree (method_name)
    "t_sys_exception_log_operator_id_idx" btree (operator_id)
    "t_sys_exception_log_operator_name_idx" btree (operator_name)
         
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char  
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |    1515498
 t_sys_exception_log_class_name_idx     |    14813 |    1515502
 t_sys_exception_log_pkey               |    20655 |    1515502
 t_sys_exception_log_create_time_idx    |     4162 |    1515502
 t_sys_exception_log_exception_type_idx |    12626 |    1515502
 t_sys_exception_log_method_name_idx    |     8555 |    1515502
 t_sys_exception_log_operator_id_idx    |     5302 |    1515502
 t_sys_exception_log_operator_name_idx  |     5267 |    1515502
 t_sys_exception_log_bak                |       38 |        267
(9 rows)
Time: 12.494 ms
superp2b_test=>

From the above: although the table currently holds only 267 rows, the statistics were never refreshed after the mass delete, so the database still believes the table contains 1,515,498 rows occupying 293,660 disk pages. (The 293,660 pages are real; the deleted rows are dead tuples whose space has not yet been reclaimed.)

III Analyzing the execution plan

superp2b_test=> explain analyze select count(*) from t_sys_exception_log;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=312603.72..312603.73 rows=1 width=0) (actual time=25771.014..25771.014 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..308814.98 rows=1515498 width=0) (actual time=14.402..25770.915 rows=267 loops=1)
 Total runtime: 25771.087 ms
(3 rows)
 
Time: 25779.274 ms
superp2b_test=>

Indeed, for select count(*) from t_sys_exception_log the optimizer chooses a full table scan and still estimates rows=1515498, when the table actually contains only 267 rows.

IV Collecting statistics

 superp2b_test=> analyze verbose t_sys_exception_log;
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 30000 of 293660 pages, containing 28 live rows and 154878 dead rows; 28 rows in sample, 1360704 estimated total rows
ANALYZE
Time: 14294.586 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char  
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |    1360704
 t_sys_exception_log_class_name_idx     |    14818 |    1360704
 t_sys_exception_log_pkey               |    20659 |    1360704
 t_sys_exception_log_create_time_idx    |     4163 |    1360704
 t_sys_exception_log_exception_type_idx |    12629 |    1360704
 t_sys_exception_log_method_name_idx    |     8559 |    1360704
 t_sys_exception_log_operator_id_idx    |     5304 |    1360704
 t_sys_exception_log_operator_name_idx  |     5268 |    1360704
 t_sys_exception_log_bak                |       38 |        267
(9 rows)
Time: 18.558 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;                                                     
                                                             QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=310668.80..310668.81 rows=1 width=0) (actual time=24315.485..24315.485 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..307267.04 rows=1360704 width=0) (actual time=19.217..24315.358 rows=267 loops=1)
 Total runtime: 24315.574 ms
(3 rows)
Time: 24346.502 ms
superp2b_test=>

While gathering statistics, ANALYZE scanned 30,000 of the table's 293,660 pages and still concluded the table has 1,360,704 rows. The statistics it collected are therefore still inaccurate, wrong, in fact, and the database wrote this wrong estimate back into the system catalogs (pg_statistic) all the same.
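The extent of the bloat that ANALYZE keeps sampling can also be seen directly in the cumulative statistics view; a sketch (pg_stat_user_tables and these columns are standard):

```sql
-- n_dead_tup counts deleted-but-unreclaimed tuples still occupying pages
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 't_sys_exception_log';
```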

Next, run another round of statistics collection and look again at the plan for select count(*) from t_sys_exception_log:

superp2b_test=> analyze verbose t_sys_exception_log;                                                                          
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 30000 of 293660 pages, containing 26 live rows and 105480 dead rows; 26 rows in sample, 1221722 estimated total rows
ANALYZE
Time: 15645.520 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char  
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |    1221722
 t_sys_exception_log_class_name_idx     |    14818 |    1221722
 t_sys_exception_log_pkey               |    20659 |    1221722
 t_sys_exception_log_create_time_idx    |     4163 |    1221722
 t_sys_exception_log_exception_type_idx |    12629 |    1221722
 t_sys_exception_log_method_name_idx    |     8559 |    1221722
 t_sys_exception_log_operator_id_idx    |     5304 |    1221722
 t_sys_exception_log_operator_name_idx  |     5268 |    1221722
 t_sys_exception_log_bak                |       38 |        267
(9 rows)
Time: 11.437 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;
                                                             QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=308931.52..308931.53 rows=1 width=0) (actual time=21058.435..21058.435 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..305877.22 rows=1221722 width=0) (actual time=0.146..21058.327 rows=268 loops=1)
 Total runtime: 21058.525 ms
(3 rows)
Time: 21161.205 ms
superp2b_test=>

The database still believes t_sys_exception_log occupies 293,660 disk pages and holds 1,221,722 rows. These statistics are wrong as well.

V The database collects statistics automatically

Other work interrupted the manual statistics gathering at this point. Coming back to it later, a "surprise" was waiting.

 superp2b_test=> explain analyze select count(*) from t_sys_exception_log;                                                      
                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21547.18..21547.19 rows=1 width=0) (actual time=27.386..27.386 rows=1 loops=1)
   ->  Index Only Scan using t_sys_exception_log_create_time_idx on t_sys_exception_log  (cost=0.30..20562.24 rows=393976 width=0) (actual time=9.056..27.337 rows=268 loops=1)
         Heap Fetches: 23
 Total runtime: 27.450 ms
(4 rows)

Time: 127.384 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |     393976
 t_sys_exception_log_class_name_idx     |    14818 |        268
 t_sys_exception_log_pkey               |    20659 |        268
 t_sys_exception_log_create_time_idx    |     4163 |        268
 t_sys_exception_log_exception_type_idx |    12629 |        268
 t_sys_exception_log_method_name_idx    |     8559 |        268
 t_sys_exception_log_operator_id_idx    |     5304 |        268
 t_sys_exception_log_operator_name_idx  |     5268 |        268
 t_sys_exception_log_bak                |       38 |        267
(9 rows)

Time: 18.922 ms
superp2b_test=> analyze verbose t_sys_exception_log;                                                                           
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 30000 of 293660 pages, containing 32 live rows and 20 dead rows; 32 rows in sample, 353760 estimated total rows
ANALYZE
Time: 22083.290 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |     353760
 t_sys_exception_log_class_name_idx     |    14818 |     353760
 t_sys_exception_log_pkey               |    20659 |     353760
 t_sys_exception_log_create_time_idx    |     4163 |     353760
 t_sys_exception_log_exception_type_idx |    12629 |     353760
 t_sys_exception_log_method_name_idx    |     8559 |     353760
 t_sys_exception_log_operator_id_idx    |     5304 |     353760
 t_sys_exception_log_operator_name_idx  |     5268 |     353760
 t_sys_exception_log_bak                |       38 |        267
(9 rows)

Time: 13.805 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;                                                      
                                                                                  QUERY PLAN                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=20885.77..20885.78 rows=1 width=0) (actual time=0.193..0.194 rows=1 loops=1)
   ->  Index Only Scan using t_sys_exception_log_create_time_idx on t_sys_exception_log  (cost=0.30..20001.37 rows=353760 width=0) (actual time=0.052..0.159 rows=268 loops=1)
         Heap Fetches: 23
 Total runtime: 0.230 ms
(4 rows)

Time: 1.753 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |     353760
 t_sys_exception_log_class_name_idx     |    14818 |     353760
 t_sys_exception_log_pkey               |    20659 |     353760
 t_sys_exception_log_create_time_idx    |     4163 |     353760
 t_sys_exception_log_exception_type_idx |    12629 |     353760
 t_sys_exception_log_method_name_idx    |     8559 |     353760
 t_sys_exception_log_operator_id_idx    |     5304 |     353760
 t_sys_exception_log_operator_name_idx  |     5268 |     353760
 t_sys_exception_log_bak                |       38 |        267
(9 rows)

Time: 1.458 ms
superp2b_test=> vacuum(verbose,full,freeze,analyze) t_sys_exception_log;
INFO:  vacuuming "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": found 0 removable, 268 nonremovable row versions in 293660 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 3.04s/1.11u sec elapsed 29.73 sec.
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 38 of 38 pages, containing 268 live rows and 0 dead rows; 268 rows in sample, 268 estimated total rows
VACUUM
Time: 32353.681 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |       38 |        268
 t_sys_exception_log_class_name_idx     |        5 |        268
 t_sys_exception_log_bak                |       38 |        267
 t_sys_exception_log_pkey               |        4 |        268
 t_sys_exception_log_create_time_idx    |        2 |        268
 t_sys_exception_log_exception_type_idx |        5 |        268
 t_sys_exception_log_method_name_idx    |        4 |        268
 t_sys_exception_log_operator_id_idx    |        4 |        268
 t_sys_exception_log_operator_name_idx  |        2 |        268
(9 rows)

Time: 6.762 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=41.35..41.36 rows=1 width=0) (actual time=0.103..0.103 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..40.68 rows=268 width=0) (actual time=0.005..0.074 rows=268 loops=1)
 Total runtime: 0.133 ms
(3 rows)

Time: 0.535 ms
superp2b_test=> select count(*) from t_sys_exception_log;                
 count 
-------
   268
(1 row)

Time: 1.007 ms
superp2b_test=>

Notes:
1 The first EXPLAIN above shows the plan has changed, becoming "faster and better": from a sequential scan to an index-only scan, with far better execution time. Yet this plan is still off from what the real data warrants; with only 268 rows, the table should not be using an index at all.
2 Checking the statistics next, t_sys_exception_log itself still shows 393,976 rows, while its indexes show 268 rows, which matches reality.
3 One more manual ANALYZE drifts again: it now reports 353,760 rows, and even the index statistics, previously a correct 268, jump to 353,760.
4 So vacuum(verbose,full,freeze,analyze) t_sys_exception_log is run to reclaim space and refresh the statistics. This time, both the VACUUM verbose output and the pg_class query finally show correct numbers.
5 A final EXPLAIN shows things back to normal: for counting a 268-row table, the chosen sequential scan is reasonable, and the statement itself runs fast again instead of being puzzlingly slow.

VI Summary and open questions

1 In PostgreSQL, when there is frequent DML or bulk DML, refresh the table's statistics promptly;
2 Why were the statistics gathered by manual ANALYZE so far off from reality?
3 Why did repeated runs of ANALYZE still fail to produce accurate statistics?
4 Why, after a while, were the statistics much closer to reality (presumably autovacuum had updated them in the meantime)?
5 If autovacuum did update the statistics, why did the optimizer still choose a poor plan, an index scan for a count over a table of only 268 rows?
6 Why did the final VACUUM, reclaiming space and updating statistics, leave the statistics correct and stable?
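A likely root cause behind most of these questions is table bloat: each ANALYZE sampled 30,000 of 293,660 mostly dead pages and extrapolated the total from the handful of live rows in its sample, so the estimates swung wildly until VACUUM reclaimed the space. For tables with this delete-heavy pattern, more aggressive per-table autovacuum settings are one possible mitigation (the values here are illustrative, not from the article):

```sql
-- Trigger autovacuum/autoanalyze on this table after roughly 2% / 1% churn,
-- instead of the global defaults of 20% / 10%
ALTER TABLE t_sys_exception_log SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```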

Locating and Handling a PostgreSQL Performance Problem

I. First, look at top and iostat from a server running PostgreSQL:

top:
[root@db2 ~ 11:14:42]$ top

top - 11:16:10 up 2 days, 13:01,  5 users,  load average: 51.62, 41.75, 29.06
Tasks: 948 total,   5 running, 943 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  16284812k total, 16159260k used,   125552k free,     5700k buffers
Swap:  8191992k total,   107980k used,  8084012k free,  8010540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
31450 postgres  20   0 4534m 2.3g 2.2g D 46.6 15.0   7:02.50 postgres                                                
30166 postgres  20   0 4528m 2.3g 2.2g D 46.0 14.9   4:05.40 postgres                                                
30185 postgres  20   0 4494m 2.2g 2.1g D 43.5 14.1   4:05.08 postgres                                                
31453 postgres  20   0 4529m 1.9g 1.8g D 42.9 12.3   3:39.13 postgres                                                
30232 postgres  20   0 4526m 2.3g 2.1g D 40.4 14.6   6:07.51 postgres                                                
 2811 postgres  20   0 4521m 874m 742m D 39.8  5.5   1:36.34 postgres                                                
31457 postgres  20   0 4534m 1.8g 1.7g R 39.5 11.6   3:55.23 postgres                                                
30956 postgres  20   0 4521m 1.7g 1.5g D 38.8 10.8   2:14.67 postgres                                                
 2714 postgres  20   0 4519m 1.4g 1.3g D 37.9  9.1   1:19.96 postgres                                                
30182 postgres  20   0 4525m 1.9g 1.8g D 37.9 12.5   3:40.31 postgres                                                
31444 postgres  20   0 4525m 2.2g 2.0g D 37.6 13.9   3:29.11 postgres                                                
31654 postgres  20   0 4526m 2.4g 2.2g D 36.7 15.3   5:04.19 postgres                                                
 2717 postgres  20   0 4519m 847m 718m D 36.1  5.3   1:37.20 postgres                                                
  753 postgres  20   0 4533m 3.7g 3.6g D 35.4 23.9  27:52.65 postgres                                                
31451 postgres  20   0 4433m 1.8g 1.7g D 35.4 11.5   2:36.85 postgres                                                
30701 postgres  20   0 4520m 1.7g 1.6g D 35.1 11.1   2:09.85 postgres                                                
31448 postgres  20   0 4435m 2.2g 2.1g D 33.3 13.9   4:16.70 postgres                                                
29192 postgres  20   0 4526m 2.3g 2.1g D 32.6 14.6   4:19.91 postgres                                                
30693 postgres  20   0 4525m 1.9g 1.8g D 32.0 12.4   2:29.72 postgres                                                
 3448 postgres  20   0 4451m 383m 305m D 31.4  2.4   0:49.98 postgres                                                
 3568 postgres  20   0 4388m 131m 113m D 29.5  0.8   0:10.03 postgres                                                
 3435 postgres  20   0 4376m 364m 355m D 29.2  2.3   0:12.23 postgres                                                
[root@db2 ~ 11:16:22]$ 
iostat:
 
[root@db2 ~ 11:16:25]$ iostat 1 10 /dev/emcpowerf 
Linux 2.6.32-279.el6.x86_64 (db2) 	2016年06月02日 	_x86_64_	(24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.51    0.00    0.84    2.04    0.00   92.61

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf       210.87     10256.34     11724.86 2257542690 2580782824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.07    0.00   80.00    7.12    0.00    2.82

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf     34443.00   3095056.00   1747336.00    3095056    1747336

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.38    0.00   90.25    1.25    0.00    0.13

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf        83.79      7430.09     14483.79      89384     174240

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.00    0.00   70.16   10.69    0.00    1.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2887.25    323498.04     71545.10     329968      72976

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.76    0.00   86.36    3.17    0.00    0.71

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      1996.00    222072.00     34056.00     222072      34056

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.75    0.00   89.37    2.50    0.00    0.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2344.00    229568.00    149152.00     229568     149152

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.68    0.00   82.81    5.88    0.00    1.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2585.00    242224.00    154320.00     242224     154320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.57    0.00   24.36   34.20    0.00   26.87

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2692.08    283271.29    137940.59     286104     139320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.80    0.00   85.10    2.02    0.00    1.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      3367.33    268491.09    124879.21     271176     126128

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.77    0.00   85.36    4.32    0.00    0.54

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2792.00    274976.00    143192.00     274976     143192

[root@db2 ~ 11:23:07]$
And the filesystem layout:
[root@db2 ~ 11:03:54]$ df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext4     58G   14G   42G  25% /
tmpfs        tmpfs    7.8G     0  7.8G   0% /dev/shm
/dev/sda3     ext4    210G   54G  146G  27% /data/log
/dev/emcpowerd1
              ext4     99G   58G   37G  62% /data/pgsql/backup
/dev/emcpowerc1
              ext4     99G   89G  5.5G  95% /data/pgsql/data_old
/dev/emcpowerf
              ext4    197G  146G   42G  78% /data/pgsql/data
/dev/emcpowerg
              ext4    493G   66G  403G  14% /data/pgsql/backup/archive
[root@db2 ~ 11:04:14]$

2 Conclusion: the database server has hit a performance bottleneck, and the problem lies in system I/O. The usual cause is poorly written application SQL, or bad SQL execution plans that consume excessive I/O resources.

3 Analysis

1 top shows the system load is already quite high: load average: 51.62, 41.75, 29.06
2 The CPU line reads: Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st.

Of that, 9.0%us spent on user-space processing is fairly normal. But with the system under heavy load, 59.2%sy of CPU time is going to the kernel. The amount of time spent in the kernel should be as low as possible; this number can peak much higher, especially when there is a lot of I/O happening. In other words, the CPU is spending an unusually large share of its time on kernel work, which is exactly what heavy I/O produces. 59% here is clearly high: the system has I/O waits.

Also, 14.1%id: the idle percentage is quite low. Normally we want it neither too low (the CPU is saturated) nor too high (the CPU sits idle all the time). And 17.2%wa is the share of time the CPU spends waiting for I/O operations to complete; here the CPU is clearly waiting on I/O.
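For quick scripting, the individual fields of that CPU line can be pulled out with standard text tools. A small sketch (the sample line is copied from the top output above):

```shell
# Extract the %wa (I/O wait) field from a saved `top` CPU summary line.
line='Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st'
# grep -o prints only the matching part: the number glued to "%wa".
echo "$line" | grep -o '[0-9.]*%wa'
```

This prints 17.2%wa; as this case shows, a wa value that stays that high is a strong hint of an I/O bottleneck.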

3 In the process state column, a large number of processes are in state D. The common process states are:

R: TASK_RUNNING, runnable. A process in this state is running on a CPU; at any given moment only a small number of processes are in this state.

S: TASK_INTERRUPTIBLE, interruptible sleep. A process in this state is suspended while waiting for some event that has not yet happened (a socket connection, a semaphore); at any given moment most processes are in this state.

D: TASK_UNINTERRUPTIBLE, uninterruptible sleep. The process is also asleep, but it cannot be woken by other events or signals. This is typically seen while a process is operating on a hardware device: for example, a read() system call on a device file eventually runs the corresponding driver code and talks to the device, and TASK_UNINTERRUPTIBLE protects the process from being interrupted mid-interaction, which could otherwise leave the device in an uncontrollable state.

In this case, a large number of processes are in state D, uninterruptible sleep: many database backends are interacting with the OS I/O subsystem and cannot be interrupted. Put plainly, their I/O operations are dragging on without completing, and the processes cannot be broken off. The system is buried in I/O waits.

4 iostat tells the same story: the device holding the database files is under constant, very heavy I/O, with a very high request rate.
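The D-state check can also be scripted. A hypothetical sketch, with a small process snapshot inlined so it is self-contained (on a live box you would feed it `ps -eo state=,pid=,comm=` instead):

```shell
# Count processes stuck in uninterruptible sleep (state D).
# Sample snapshot inlined for illustration; column 1 is the process state.
cat <<'EOF' > /tmp/ps_sample.txt
D 29192 postgres
D 30693 postgres
S 25339 bash
R 3448 postgres
EOF
awk '$1 == "D"' /tmp/ps_sample.txt | wc -l
```

This prints 2 for the sample; on the server in this case the count was far higher, one per stuck backend.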

This is a PostgreSQL performance-troubleshooting case I handled exactly a year ago today (June 2, 2016); writing it up on the anniversary (June 2, 2017) is coincidence, not design. Rather, I have been reading a few Linux documents lately, and earlier, on p. 249 of PostgreSQL 9 Administration Cookbook, I came across this:

Not enough CPU power or disk I/O capacity for the current load

These are usually caused by suboptimal query plans, but sometimes you just have not a powerful enough computer.

Here, top is your friend for quick checks, and from the command line, run the following:

user@host:~$ top

First, watch the CPU idle reading in the top. If this is in low single-digits for most of the time, you probably have problems with CPU power.

If you have a high load average with still lots of CPU idle left, you are probably out of disk bandwidth. In this case, you should also have lots of postgres processes in status D.

 

So I put together this summary, to give myself a clear set of steps and clues the next time I work a performance problem.

4 References:

1 Understanding Linux CPU stats
2 Detailed explanation of top process states R, S, D

Granting/revoking another user's access to all tables owned by the current user in PostgreSQL: a summary

Q1: In PostgreSQL, how do you grant another user permission to query all of the tables owned by the current user?

A1: Two steps. First, grant the other user SELECT on all tables owned by the current user; second, the other user also needs USAGE on the current user's schema.
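As a minimal sketch (using the schema and user names from the example below), the two steps are:

```sql
-- Run as the owning user. Step 1: SELECT on every existing table in the schema.
GRANT SELECT ON ALL TABLES IN SCHEMA test_schema TO testu2;
-- Step 2: without USAGE on the schema itself, the table grant is not reachable.
GRANT USAGE ON SCHEMA test_schema TO testu2;
```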

Example: the current database is testdb, the current user is testu1, the schema is test_schema, and the tables are t1 and t2. How do we let another user, testu2, query t1 and t2?

The database is EDB 9.3. Connected to testdb as testu1, test_schema holds two tables, t1 and t2, with one row each.

testdb=> select version;
                                                       version                                                       
---------------------------------------------------------------------------------------------------------------------
 EnterpriseDB 9.3.11.33 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55), 64-bit
(1 row)

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> \dn
      List of schemas
    Name     |    Owner     
-------------+--------------
 public      | enterprisedb
 test_schema | testu1
(2 rows)

testdb=> \d            
          List of relations
   Schema    | Name | Type  | Owner  
-------------+------+-------+--------
 test_schema | t1   | table | testu1
 test_schema | t2   | table | testu1
(2 rows)

testdb=> select * from t1;
 id 
----
  1
(1 row)

testdb=> select * from t2;
 id 
----
  2
(1 row)

testdb=>

Meanwhile, when testu2 connects to testdb, it cannot access testu1's tables under test_schema; the error says permission denied for schema test_schema.

testdb=> \c
You are now connected to database "testdb" as user "testu2".
testdb=> \dn
      List of schemas
    Name     |    Owner     
-------------+--------------
 public      | enterprisedb
 test_schema | testu1
(2 rows)

testdb=> select * from test_schema.t1;
错误:  对模式 test_schema 权限不够
LINE 1: select * from test_schema.t1;
                      ^
testdb=>

Next, testu1 grants testu2 USAGE on test_schema.

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> grant USAGE on SCHEMA test_schema to testu2;
GRANT
testdb=>

After that grant, can testu2 access testu1's tables under test_schema?

 
postgres=# \c testdb testu2;
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t1;
错误:  对关系 t1 权限不够
testdb=>

The error no longer complains about schema test_schema but about the missing query privilege on the table itself.
Next, grant testu2 SELECT on all tables in test_schema:

 
testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> grant SELECT on all tables in schema test_schema to testu2;
GRANT
testdb=>

Finally, logging in to testdb as testu2, all of testu1's tables under test_schema can be queried:

testdb=> \c
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t1;
 id 
----
  1
(1 row)

testdb=> select * from test_schema.t2;
 id 
----
  2
(1 row)

testdb=>

Q2: When a user has already been granted SELECT on all tables in a given schema of the current user, can that user access new tables the current user later creates in that schema? If not, how do we grant access to the newly created tables?

A2: No: the other user has no SELECT privilege on newly created tables. Access to future tables can be granted with alter default privileges. In detail:

testu1 creates a new table t3 under test_schema in testdb, and testu1 itself can access it without problems. testu2, however, cannot query the new t3 when connected to testdb.

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> create table t3(id int);
CREATE TABLE
testdb=> insert into t3 values(3);
INSERT 0 1
testdb=> select * from t3;
 id 
----
  3
(1 row)

testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;
错误:  对关系 t3 权限不够
testdb=>

Either of the following two methods lets testu2 query the t3 table newly created by testu1:

Method 1

 testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> grant SELECT on all tables in schema test_schema to testu2;
GRANT
testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;                              
 id 
----
  3
(1 row)

testdb=>

Or, Method 2:

 
testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> revoke SELECT on t3 from testu2;
REVOKE
testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;                  
错误:  对关系 t3 权限不够
testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> grant SELECT on t3 to testu2;
GRANT
testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;  
 id 
----
  3
(1 row)

testdb=>

Clearly, neither method is ideal: you cannot rerun a grant for every single newly created table. alter default privileges does this once and for all.

 
testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> alter default privileges in schema test_schema grant SELECT on tables to testu2;
ALTER DEFAULT PRIVILEGES
testdb=> create table t4(id int);
CREATE TABLE
testdb=> insert into t4 values(4);
INSERT 0 1
testdb=> select * from t4;
 id 
----
  4
(1 row)

testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t4;                                                   
 id 
----
  4
(1 row)

testdb=>

In addition, the \ddp command shows the default privileges (my guess is the abbreviation stands for \[d]escribe [d]efault [p]rivileges):

testdb=> \ddp+
            Default access privileges
 Owner  |   Schema    | Type  | Access privileges 
--------+-------------+-------+-------------------
 testu1 | test_schema | table | testu2=r/testu1
(1 row)

testdb=> 

Q3: How do you drop a user that already holds default privileges? That is, how do you handle errors like DETAIL: privileges for default privileges on new relations belonging to role testu1 in schema test_schema?

A3: Three steps: 1) revoke USAGE on the schema; 2) revoke SELECT on all the tables; 3) revoke the default privileges with alter default privileges, then drop the user.

Even the most privileged database administrator cannot drop testu2; it fails, and adding cascade does not help either.

postgres=# \c
You are now connected to database "postgres" as user "enterprisedb".
postgres=# drop user testu2;
错误:  无法删除"testu2"因为有其它对象倚赖它
DETAIL:  在数据库 testdb中的6个对象
postgres=# drop user testu2 cascade;
错误:  无法删除"testu2"因为有其它对象倚赖它
DETAIL:  在数据库 testdb中的6个对象
postgres=#

Revoke, in order, USAGE on the schema, then SELECT on all tables, and finally the default privileges via alter default privileges; afterwards \ddp shows no default privileges left:

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> revoke USAGE on SCHEMA test_schema from testu2;
REVOKE
testdb=> revoke SELECT on all tables in schema test_schema from testu2;
REVOKE
testdb=> alter default privileges in schema test_schema revoke SELECT on tables from testu2;
ALTER DEFAULT PRIVILEGES
testdb=> \ddp
         Default access privileges
 Owner | Schema | Type | Access privileges 
-------+--------+------+-------------------
(0 rows)

testdb=> 

Finally, the administrator can drop testu2 cleanly:

postgres=# drop user testu2;
DROP ROLE
postgres=# 

Summary:
1 To give another user SELECT on the tables in one of the current user's schemas, you must both grant USAGE on the schema and grant SELECT on the tables (either table by table, or on all tables in the schema at once);
2 To cover tables created in the future in a single step, grant default privileges with alter default privileges;
3 To revoke, you must revoke the schema USAGE, the table SELECT privileges, and also the default privileges;
4 Use \ddp to inspect the default privileges.
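Collected in one place, the full grant/revoke lifecycle from this Q&A (all statements run as the owner, testu1):

```sql
-- Grant: existing tables need both the schema USAGE and the table SELECT.
GRANT USAGE ON SCHEMA test_schema TO testu2;
GRANT SELECT ON ALL TABLES IN SCHEMA test_schema TO testu2;
-- Grant: also cover tables created in the future.
ALTER DEFAULT PRIVILEGES IN SCHEMA test_schema GRANT SELECT ON TABLES TO testu2;

-- Revoke: all three must be undone before testu2 can be dropped.
REVOKE USAGE ON SCHEMA test_schema FROM testu2;
REVOKE SELECT ON ALL TABLES IN SCHEMA test_schema FROM testu2;
ALTER DEFAULT PRIVILEGES IN SCHEMA test_schema REVOKE SELECT ON TABLES FROM testu2;
```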

Resolving a PostgreSQL incident caused by too few WAL log files

1 In the morning, a colleague on one of our projects (it was running a promotion ending that day, 2017-04-09, so presumably a flood of users were casting votes before the deadline) reported that the system was extremely slow and pages would not load.

 

After logging in to the database server, I saw a large number of queries waiting:

[root@ecs-yqb4 ~]# ps -ef|grep post
postgres 8939 1 0 2016 ? 00:02:33 /data/postgres/9.4.4/bin/postgres
postgres 8947 8939 0 2016 ? 00:02:23 postgres: checkpointer process
postgres 8948 8939 0 2016 ? 00:08:17 postgres: writer process
postgres 8949 8939 0 2016 ? 00:03:51 postgres: wal writer process
postgres 8950 8939 0 2016 ? 00:03:04 postgres: autovacuum launcher process
postgres 8951 8939 0 2016 ? 00:11:38 postgres: stats collector process
postgres 24561 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54151) SELECT
postgres 24562 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54152) SELECT
postgres 24563 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54153) SELECT
postgres 24564 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54154) SELECT
postgres 24565 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54155) SELECT
postgres 24566 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54156) SELECT
postgres 24567 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54157) SELECT
postgres 24568 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54158) SELECT
postgres 24569 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54159) SELECT
postgres 24570 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54160) SELECT
postgres 24748 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54210) SELECT
postgres 24749 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54211) SELECT
postgres 24750 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54212) SELECT
postgres 24751 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54213) SELECT
postgres 24752 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54214) SELECT
postgres 24753 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54215) SELECT
postgres 24754 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54216) SELECT
postgres 24755 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54217) SELECT
postgres 24756 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54218) SELECT
postgres 24757 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54219) SELECT
postgres 24758 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54220) SELECT
postgres 24759 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54221) SELECT
postgres 24760 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54222) SELECT
postgres 24761 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54223) SELECT
postgres 24762 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54224) SELECT
postgres 24763 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54225) SELECT
postgres 24764 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54226) SELECT
postgres 24765 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54227) SELECT
postgres 24766 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54228) SELECT
postgres 24767 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54229) SELECT
postgres 24768 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54230) SELECT
postgres 24769 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54231) SELECT
postgres 24770 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54232) SELECT
postgres 24771 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54233) idle in transaction
postgres 24773 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54234) SELECT
postgres 24774 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54235) SELECT
postgres 24775 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54236) SELECT
postgres 24776 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54237) SELECT
postgres 24778 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54238) SELECT
postgres 24779 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54239) SELECT
postgres 24780 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54240) SELECT
postgres 24781 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54241) SELECT
postgres 24782 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54242) SELECT
postgres 24783 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54243) SELECT
postgres 24784 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54244) SELECT
postgres 24785 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54245) SELECT
postgres 24786 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54246) SELECT
postgres 24787 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54247) SELECT
postgres 24788 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54248) SELECT
postgres 24789 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54249) SELECT
postgres 24790 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54250) SELECT
postgres 24791 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54251) SELECT
postgres 24793 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54252) SELECT
postgres 24795 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54253) SELECT
postgres 24797 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54254) SELECT
postgres 24798 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54255) SELECT
postgres 24799 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54256) SELECT
postgres 24800 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54257) SELECT
postgres 24801 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54258) SELECT
postgres 24803 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54259) idle
postgres 24804 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54260) SELECT
postgres 24805 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54261) SELECT
postgres 24806 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54262) SELECT
postgres 24808 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54263) SELECT
postgres 24809 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54264) SELECT
postgres 24810 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54265) SELECT
postgres 24811 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54266) SELECT
postgres 24812 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54267) SELECT
postgres 24813 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54268) SELECT
postgres 24814 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54269) SELECT
postgres 24815 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54270) SELECT
postgres 24816 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54271) SELECT
postgres 24817 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54272) SELECT
postgres 24818 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54273) idle
postgres 24819 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54274) SELECT
postgres 24820 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54275) SELECT
postgres 24821 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54276) SELECT
postgres 24822 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54277) SELECT
postgres 24823 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54278) SELECT
postgres 24826 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54279) SELECT
postgres 24827 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54280) SELECT
postgres 24828 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54281) SELECT
postgres 24829 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54282) SELECT
postgres 24830 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54283) SELECT
postgres 24831 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54284) SELECT
postgres 24832 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54285) SELECT
postgres 24833 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54286) SELECT
postgres 24834 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54287) SELECT
postgres 24835 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54288) SELECT
postgres 24836 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54289) SELECT
postgres 24837 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54290) SELECT
postgres 24838 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54291) SELECT
postgres 24840 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54292) SELECT
postgres 24841 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54293) SELECT
postgres 24842 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54294) SELECT
postgres 24843 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54295) SELECT
postgres 24844 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54296) SELECT
postgres 24845 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54297) SELECT
postgres 24846 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54298) SELECT
postgres 24849 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54299) SELECT
root 25338 25304 0 09:36 pts/2 00:00:00 su - postgres
postgres 25339 25338 0 09:36 pts/2 00:00:00 -bash
root 25404 25122 0 09:43 pts/0 00:00:00 su - postgres
postgres 25405 25404 0 09:43 pts/0 00:00:00 -bash
root 25462 25442 0 09:45 pts/3 00:00:00 grep post
[root@ecs-yqb4 ~]#

2 My first thought was that some slow SQL was eating system resources. I checked with this query:

 select t1.datname, t2.query, t2.calls, t2.total_time, t2.total_time/t2.calls from pg_database t1, pg_stat_statements t2 where t1.oid=t2.dbid order by t2.total_time desc limit 10;

It turned out the pg_stat_statements extension was not enabled, so I installed it on the spot,

create extension pg_stat_statements;

and edited the config file /data/postgres/9.4.4/data/postgresql.conf:

shared_preload_libraries = 'pg_stat_statements' # (change requires restart)

Changing this parameter requires a database restart:

 pg_ctl restart -m fast;

Then I checked the resource-hungry SQL again, and did not find the expected problem statements consuming excessive resources.

3 Digging further, I found the real cause: the database was configured with too few WAL log files, so the server was constantly doing checkpoints in the background; the database could not respond in time and was all but hung.

Having found the cause, I edited /data/postgres/9.4.4/data/postgresql.conf again and raised the number of WAL files to 30:
checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
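As a rough sanity check on the disk space this implies: each segment is 16 MB, and per the 9.x docs the number of WAL files can peak at roughly (2 + checkpoint_completion_target) × checkpoint_segments + 1. A sketch of the arithmetic (checkpoint_completion_target = 0.5 is an assumed value here, not taken from this server's config):

```shell
# Estimate peak pg_xlog size for checkpoint_segments=30, 16 MB per segment.
# Peak file count ~= (2 + checkpoint_completion_target) * checkpoint_segments + 1
awk 'BEGIN {
  segs = 30; target = 0.5               # target is an assumed default
  files = int((2 + target) * segs + 1)
  printf "%d files, %d MB\n", files, files * 16
}'
```

So raising the setting to 30 trades roughly a gigabyte of pg_xlog space for far fewer forced checkpoints.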

Reload the configuration for it to take effect, or restart the database.

 
pg_ctl reload;

pg_ctl restart -m fast;

The setting took effect:

 
[postgres@ecs-yqb4 pg_xlog]$ pwd
/data/postgres/9.4.4/data/pg_xlog
[postgres@ecs-yqb4 pg_xlog]$ ll
-rw------- 1 postgres postgres 16777216 4月 9 10:24 000000010000000700000082
-rw------- 1 postgres postgres 16777216 4月 9 10:24 000000010000000700000083
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000084
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000085
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000086
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000087
-rw------- 1 postgres postgres 16777216 4月 9 10:26 000000010000000700000088
-rw------- 1 postgres postgres 16777216 4月 9 10:26 000000010000000700000089
-rw------- 1 postgres postgres 16777216 4月 9 10:26 00000001000000070000008A
-rw------- 1 postgres postgres 16777216 4月 9 10:26 00000001000000070000008B
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008C
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008D
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008E
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008F
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000090
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000091
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000092
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000093
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000094
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000095
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000096
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000097

4 The system quickly returned to normal. Just a brief note here, so I don't fall into the same hole again.