A case of stale statistics in PostgreSQL after deleting most rows from a large table

1 Symptom

A developer reported that a log table that once held over a million rows had been whittled down to just over 200, yet even after the mass delete, queries against it were still extremely slow.

 postgres=# \c superp2b_test superp2b_test
You are now connected to database "superp2b_test" as user "superp2b_test".
superp2b_test=> select count(*) from t_sys_exception_log;
count
-------
267
(1 row)
 
Time: 101702.768 ms
superp2b_test=> create table t_sys_exception_log_bak as select * from t_sys_exception_log;
SELECT 267
Time: 160531.138 ms
superp2b_test=> select count(*) from t_sys_exception_log_bak ;
count
-------
267
(1 row)
 
Time: 87.180 ms
superp2b_test=>

After copying the table with CTAS (CREATE TABLE ... AS SELECT ...), the copy t_sys_exception_log_bak answers the same count in 87 ms, while the original takes over 100 seconds.

2 Table structure and statistics for the table and its indexes

superp2b_test=> \d t_sys_exception_log
              Table "public.t_sys_exception_log"
      Column      |              Type              | Modifiers
------------------+--------------------------------+-----------
 id               | character(32)                  | not null
 class_name       | character varying(512)         |
 create_time      | timestamp(6) without time zone |
 remark           | character varying(1024)        |
 type             | character varying(4)           |
 exception_type   | character varying(1024)        |
 exception_detail | text                           |
 method_name      | character varying(255)         |
 source           | character varying(1024)        |
 target           | character varying(1024)        |
 operator_id      | character(32)                  |
 operator_name    | character varying(255)         |
Indexes:
    "t_sys_exception_log_pkey" PRIMARY KEY, btree (id)
    "t_sys_exception_log_class_name_idx" btree (class_name)
    "t_sys_exception_log_create_time_idx" btree (create_time)
    "t_sys_exception_log_exception_type_idx" btree (exception_type)
    "t_sys_exception_log_method_name_idx" btree (method_name)
    "t_sys_exception_log_operator_id_idx" btree (operator_id)
    "t_sys_exception_log_operator_name_idx" btree (operator_name)
         
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char  
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |    1515498
 t_sys_exception_log_class_name_idx     |    14813 |    1515502
 t_sys_exception_log_pkey               |    20655 |    1515502
 t_sys_exception_log_create_time_idx    |     4162 |    1515502
 t_sys_exception_log_exception_type_idx |    12626 |    1515502
 t_sys_exception_log_method_name_idx    |     8555 |    1515502
 t_sys_exception_log_operator_id_idx    |     5302 |    1515502
 t_sys_exception_log_operator_name_idx  |     5267 |    1515502
 t_sys_exception_log_bak                |       38 |        267
(9 rows)
Time: 12.494 ms
superp2b_test=>

So although the table currently holds only 267 rows, its statistics were never refreshed after the mass delete: the database still believes it contains 1,515,498 rows occupying 293,660 disk pages.

3 The execution plan

superp2b_test=> explain analyze select count(*) from t_sys_exception_log;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=312603.72..312603.73 rows=1 width=0) (actual time=25771.014..25771.014 rows=1 loops=1)
-> Seq Scan on t_sys_exception_log (cost=0.00..308814.98 rows=1515498 width=0) (actual time=14.402..25770.915 rows=267 loops=1)
Total runtime: 25771.087 ms
(3 rows)
 
Time: 25779.274 ms
superp2b_test=>

Sure enough, for `select count(*) from t_sys_exception_log` the optimizer chooses a full table scan and still assumes rows=1515498, even though the table actually holds only 267 rows.
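Those stale pg_class numbers feed straight into the plan's cost. A minimal sketch of the sequential-scan cost formula, assuming the default cost parameters (seq_page_cost=1.0, cpu_tuple_cost=0.01); the function name is just illustrative:

```python
def seq_scan_cost(relpages, reltuples,
                  seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Textbook PostgreSQL seq-scan costing: one page-read charge per
    page plus one CPU charge per estimated row."""
    return relpages * seq_page_cost + reltuples * cpu_tuple_cost

# Stale figures from pg_class above: 293660 pages, 1515498 rows.
print(seq_scan_cost(293660, 1515498))  # 308814.98, as shown in the plan
```

With the post-VACUUM figures (38 pages, 268 rows) the same formula gives 40.68, matching the final cheap plan at the end of this post.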

4 Gathering statistics manually

 superp2b_test=> analyze verbose t_sys_exception_log;
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 30000 of 293660 pages, containing 28 live rows and 154878 dead rows; 28 rows in sample, 1360704 estimated total rows
ANALYZE
Time: 14294.586 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char  
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |    1360704
 t_sys_exception_log_class_name_idx     |    14818 |    1360704
 t_sys_exception_log_pkey               |    20659 |    1360704
 t_sys_exception_log_create_time_idx    |     4163 |    1360704
 t_sys_exception_log_exception_type_idx |    12629 |    1360704
 t_sys_exception_log_method_name_idx    |     8559 |    1360704
 t_sys_exception_log_operator_id_idx    |     5304 |    1360704
 t_sys_exception_log_operator_name_idx  |     5268 |    1360704
 t_sys_exception_log_bak                |       38 |        267
(9 rows)
Time: 18.558 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;                                                     
                                                             QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=310668.80..310668.81 rows=1 width=0) (actual time=24315.485..24315.485 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..307267.04 rows=1360704 width=0) (actual time=19.217..24315.358 rows=267 loops=1)
 Total runtime: 24315.574 ms
(3 rows)
Time: 24346.502 ms
superp2b_test=>

ANALYZE sampled 30,000 of the table's 293,660 pages and concluded the table still holds 1,360,704 rows. The gathered statistics are therefore still inaccurate, indeed plain wrong, and these wrong numbers are written back into the system catalogs (pg_class and pg_statistic).

Next, gather statistics once more and look again at the plan for `select count(*) from t_sys_exception_log`:

superp2b_test=> analyze verbose t_sys_exception_log;                                                                          
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 30000 of 293660 pages, containing 26 live rows and 105480 dead rows; 26 rows in sample, 1221722 estimated total rows
ANALYZE
Time: 15645.520 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char  
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |    1221722
 t_sys_exception_log_class_name_idx     |    14818 |    1221722
 t_sys_exception_log_pkey               |    20659 |    1221722
 t_sys_exception_log_create_time_idx    |     4163 |    1221722
 t_sys_exception_log_exception_type_idx |    12629 |    1221722
 t_sys_exception_log_method_name_idx    |     8559 |    1221722
 t_sys_exception_log_operator_id_idx    |     5304 |    1221722
 t_sys_exception_log_operator_name_idx  |     5268 |    1221722
 t_sys_exception_log_bak                |       38 |        267
(9 rows)
Time: 11.437 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;
                                                             QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=308931.52..308931.53 rows=1 width=0) (actual time=21058.435..21058.435 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..305877.22 rows=1221722 width=0) (actual time=0.146..21058.327 rows=268 loops=1)
 Total runtime: 21058.525 ms
(3 rows)
Time: 21161.205 ms
superp2b_test=>

The database still believes t_sys_exception_log occupies 293,660 pages and contains 1,221,722 rows; these statistics are wrong as well.
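This slow drift (1515498 → 1360704 → 1221722) is not random. ANALYZE does not simply trust its sample; it blends the sampled live-row density with the previous estimate, weighted by the fraction of pages scanned (PostgreSQL's vac_estimate_reltuples logic). Because the table is massively bloated, nearly every sampled page is empty of live rows, so each run only moves the estimate about 10% of the way toward the truth. A sketch reproducing both numbers seen above:

```python
def estimate_reltuples(total_pages, old_reltuples, scanned_pages, live_rows):
    """Blend the sampled row density with the previous estimate, weighted
    by the fraction of the table scanned (mirrors vac_estimate_reltuples)."""
    old_density = old_reltuples / total_pages
    new_density = live_rows / scanned_pages
    scanned_fraction = scanned_pages / total_pages
    density = old_density + (new_density - old_density) * scanned_fraction
    return int(density * total_pages + 0.5)

# First ANALYZE: 30000 of 293660 pages sampled, 28 live rows found.
print(estimate_reltuples(293660, 1515498, 30000, 28))  # 1360704
# Second ANALYZE: 26 live rows found.
print(estimate_reltuples(293660, 1360704, 30000, 26))  # 1221722
```

This is why repeated ANALYZE runs converge so slowly here: as long as the 293,660 mostly-dead pages remain, the estimate only decays geometrically; reclaiming the pages lets it snap to reality.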

5 Statistics collected automatically by the database

Other work interrupted the manual statistics gathering at this point. Coming back to it later, a "surprise" was waiting.

 superp2b_test=> explain analyze select count(*) from t_sys_exception_log;                                                      
                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21547.18..21547.19 rows=1 width=0) (actual time=27.386..27.386 rows=1 loops=1)
   ->  Index Only Scan using t_sys_exception_log_create_time_idx on t_sys_exception_log  (cost=0.30..20562.24 rows=393976 width=0) (actual time=9.056..27.337 rows=268 loops=1)
         Heap Fetches: 23
 Total runtime: 27.450 ms
(4 rows)

Time: 127.384 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |     393976
 t_sys_exception_log_class_name_idx     |    14818 |        268
 t_sys_exception_log_pkey               |    20659 |        268
 t_sys_exception_log_create_time_idx    |     4163 |        268
 t_sys_exception_log_exception_type_idx |    12629 |        268
 t_sys_exception_log_method_name_idx    |     8559 |        268
 t_sys_exception_log_operator_id_idx    |     5304 |        268
 t_sys_exception_log_operator_name_idx  |     5268 |        268
 t_sys_exception_log_bak                |       38 |        267
(9 rows)

Time: 18.922 ms
superp2b_test=> analyze verbose t_sys_exception_log;                                                                           
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 30000 of 293660 pages, containing 32 live rows and 20 dead rows; 32 rows in sample, 353760 estimated total rows
ANALYZE
Time: 22083.290 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |     353760
 t_sys_exception_log_class_name_idx     |    14818 |     353760
 t_sys_exception_log_pkey               |    20659 |     353760
 t_sys_exception_log_create_time_idx    |     4163 |     353760
 t_sys_exception_log_exception_type_idx |    12629 |     353760
 t_sys_exception_log_method_name_idx    |     8559 |     353760
 t_sys_exception_log_operator_id_idx    |     5304 |     353760
 t_sys_exception_log_operator_name_idx  |     5268 |     353760
 t_sys_exception_log_bak                |       38 |        267
(9 rows)

Time: 13.805 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;                                                      
                                                                                  QUERY PLAN                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=20885.77..20885.78 rows=1 width=0) (actual time=0.193..0.194 rows=1 loops=1)
   ->  Index Only Scan using t_sys_exception_log_create_time_idx on t_sys_exception_log  (cost=0.30..20001.37 rows=353760 width=0) (actual time=0.052..0.159 rows=268 loops=1)
         Heap Fetches: 23
 Total runtime: 0.230 ms
(4 rows)

Time: 1.753 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |   293660 |     353760
 t_sys_exception_log_class_name_idx     |    14818 |     353760
 t_sys_exception_log_pkey               |    20659 |     353760
 t_sys_exception_log_create_time_idx    |     4163 |     353760
 t_sys_exception_log_exception_type_idx |    12629 |     353760
 t_sys_exception_log_method_name_idx    |     8559 |     353760
 t_sys_exception_log_operator_id_idx    |     5304 |     353760
 t_sys_exception_log_operator_name_idx  |     5268 |     353760
 t_sys_exception_log_bak                |       38 |        267
(9 rows)

Time: 1.458 ms
superp2b_test=> vacuum(verbose,full,freeze,analyze) t_sys_exception_log;
INFO:  vacuuming "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": found 0 removable, 268 nonremovable row versions in 293660 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 3.04s/1.11u sec elapsed 29.73 sec.
INFO:  analyzing "public.t_sys_exception_log"
INFO:  "t_sys_exception_log": scanned 38 of 38 pages, containing 268 live rows and 0 dead rows; 268 rows in sample, 268 estimated total rows
VACUUM
Time: 32353.681 ms
superp2b_test=> select relname,relpages,to_char(reltuples,'999999999') from pg_class where relname like 't_sys_exception_log%';
                relname                 | relpages |  to_char   
----------------------------------------+----------+------------
 t_sys_exception_log                    |       38 |        268
 t_sys_exception_log_class_name_idx     |        5 |        268
 t_sys_exception_log_bak                |       38 |        267
 t_sys_exception_log_pkey               |        4 |        268
 t_sys_exception_log_create_time_idx    |        2 |        268
 t_sys_exception_log_exception_type_idx |        5 |        268
 t_sys_exception_log_method_name_idx    |        4 |        268
 t_sys_exception_log_operator_id_idx    |        4 |        268
 t_sys_exception_log_operator_name_idx  |        2 |        268
(9 rows)

Time: 6.762 ms
superp2b_test=> explain analyze select count(*) from t_sys_exception_log;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=41.35..41.36 rows=1 width=0) (actual time=0.103..0.103 rows=1 loops=1)
   ->  Seq Scan on t_sys_exception_log  (cost=0.00..40.68 rows=268 width=0) (actual time=0.005..0.074 rows=268 loops=1)
 Total runtime: 0.133 ms
(3 rows)

Time: 0.535 ms
superp2b_test=> select count(*) from t_sys_exception_log;                
 count 
-------
   268
(1 row)

Time: 1.007 ms
superp2b_test=>

Notes:
1. The first plan above has changed, and "for the better": the full table scan became an index-only scan and execution sped up. It is still off from the true optimum, though; with only 268 rows in the table, an index scan should not be the choice.
2. pg_class still shows 393,976 rows for t_sys_exception_log itself, while its indexes now show 268 rows, which matches reality.
3. Running ANALYZE by hand once more makes things drift again: the table estimate becomes 353,760 rows, and even the previously correct index figure of 268 is overwritten with 353,760.
4. So: run vacuum(verbose,full,freeze,analyze) t_sys_exception_log to reclaim the space and refresh the statistics in one go. After that, both the VACUUM VERBOSE output and the pg_class query finally show correct numbers.
5. A final look at the plan shows everything back to normal: a sequential scan to count a 268-row table is reasonable, and the statement itself runs fast again instead of being inexplicably slow.

6 Summary and open questions

1. In PostgreSQL, if a table sees frequent or bulk DML, refresh its statistics promptly.
2. Why were the statistics produced by a manual ANALYZE so far off from reality?
3. Why did repeated ANALYZE runs still not yield accurate statistics?
4. Why, after some time had passed, were the statistics much closer to reality (presumably autovacuum had updated them by then)?
5. If autovacuum did update the statistics, why did the optimizer still pick a questionable plan, an index scan for a count over a 268-row table?
6. Why did the statistics finally become correct and stable only after VACUUM reclaimed the space and updated them?

Installing Redis on CentOS 7

1 Download: fetch the latest stable release tarball from the official Redis website;

[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 
[root@localhost ~]# uname -rm
3.10.0-327.el7.x86_64 x86_64
[root@localhost ~]# wget http://download.redis.io/releases/redis-4.0.2.tar.gz
--2017-11-22 14:32:03--  http://download.redis.io/releases/redis-4.0.2.tar.gz
Resolving download.redis.io (download.redis.io)... 109.74.203.151
Connecting to download.redis.io (download.redis.io)|109.74.203.151|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1713990 (1.6M) [application/x-gzip]
Saving to: 'redis-4.0.2.tar.gz'

100%[===================================================================================================================================>] 1,713,990    890KB/s   in 1.9s   

2017-11-22 14:32:09 (890 KB/s) - 'redis-4.0.2.tar.gz' saved [1713990/1713990]

[root@localhost ~]#

2 Extract: unpack the tarball;

[root@localhost ~]# tar -zxvf redis-4.0.2.tar.gz 
...
...
redis-4.0.2/utils/releasetools/02_upload_tarball.sh
redis-4.0.2/utils/releasetools/03_test_release.sh
redis-4.0.2/utils/releasetools/04_release_hash.sh
redis-4.0.2/utils/releasetools/changelog.tcl
redis-4.0.2/utils/speed-regression.tcl
redis-4.0.2/utils/whatisdoing.sh
[root@localhost ~]#

3 Build: enter the source directory and run make;

 [root@localhost ~]# cd redis-4.0.2
[root@localhost redis-4.0.2]# pwd
/root/redis-4.0.2
[root@localhost redis-4.0.2]# ll
total 280
-rw-rw-r--  1 root root 131381 Sep 21 22:12 00-RELEASENOTES
-rw-rw-r--  1 root root     53 Sep 21 22:12 BUGS
-rw-rw-r--  1 root root   1815 Sep 21 22:12 CONTRIBUTING
-rw-rw-r--  1 root root   1487 Sep 21 22:12 COPYING
drwxrwxr-x  6 root root   4096 Sep 21 22:12 deps
-rw-rw-r--  1 root root     11 Sep 21 22:12 INSTALL
-rw-rw-r--  1 root root    151 Sep 21 22:12 Makefile
-rw-rw-r--  1 root root   4223 Sep 21 22:12 MANIFESTO
-rw-rw-r--  1 root root  20530 Sep 21 22:12 README.md
-rw-rw-r--  1 root root  57764 Sep 21 22:12 redis.conf
-rwxrwxr-x  1 root root    271 Sep 21 22:12 runtest
-rwxrwxr-x  1 root root    280 Sep 21 22:12 runtest-cluster
-rwxrwxr-x  1 root root    281 Sep 21 22:12 runtest-sentinel
-rw-rw-r--  1 root root   7606 Sep 21 22:12 sentinel.conf
drwxrwxr-x  3 root root   4096 Sep 21 22:12 src
drwxrwxr-x 10 root root   4096 Sep 21 22:12 tests
drwxrwxr-x  8 root root   4096 Sep 21 22:12 utils
[root@localhost redis-4.0.2]# make 
cd src && make all
make[1]: Entering directory '/root/redis-4.0.2/src'
    CC Makefile.dep
make[1]: Leaving directory '/root/redis-4.0.2/src'
make[1]: Entering directory '/root/redis-4.0.2/src'
...
...
    INSTALL redis-check-rdb
    INSTALL redis-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/root/redis-4.0.2/src'
[root@localhost redis-4.0.2]#

4 Start Redis;

 [root@localhost redis-4.0.2]# /root/redis-4.0.2/src/redis-server 
1745:C 22 Nov 14:39:22.049 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1745:C 22 Nov 14:39:22.049 # Redis version=4.0.2, bits=64, commit=00000000, modified=0, pid=1745, just started
1745:C 22 Nov 14:39:22.049 # Warning: no config file specified, using the default config. In order to specify a config file use /root/redis-4.0.2/src/redis-server /path/to/redis.conf
1745:M 22 Nov 14:39:22.051 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 4.0.2 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1745
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1745:M 22 Nov 14:39:22.052 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1745:M 22 Nov 14:39:22.052 # Server initialized
1745:M 22 Nov 14:39:22.052 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1745:M 22 Nov 14:39:22.052 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1745:M 22 Nov 14:39:22.052 * Ready to accept connections

At this point, Redis is installed.

5 Access and use Redis through redis-cli:

 [root@localhost ~]# /root/redis-4.0.2/src/redis-cli 
127.0.0.1:6379> info
# Server
redis_version:4.0.2
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:1e096713e5234ce6
redis_mode:standalone
os:Linux 3.10.0-327.el7.x86_64 x86_64
...
....
127.0.0.1:6379> set a testredis
OK
127.0.0.1:6379> get a
"testredis"
127.0.0.1:6379>
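Under the hood, redis-cli speaks Redis's simple RESP wire protocol: every command is sent as an array of bulk strings. A minimal pure-Python sketch of that encoding (no Redis server needed; the function name is just illustrative):

```python
def encode_resp(*args):
    """Encode a Redis command as a RESP array of bulk strings --
    the same bytes redis-cli writes to the socket."""
    out = b"*%d\r\n" % len(args)          # array header: element count
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        out += b"$%d\r\n%s\r\n" % (len(data), data)  # bulk string: length, payload
    return out

# The 'set a testredis' from the session above, on the wire:
print(encode_resp("SET", "a", "testredis"))
# b'*3\r\n$3\r\nSET\r\n$1\r\na\r\n$9\r\ntestredis\r\n'
```

The server's `OK` comes back as the simple string `+OK\r\n`, and `get a` returns the bulk string `$9\r\ntestredis\r\n`.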


Next: keep reading the documentation and keep practicing hands-on.

Java study notes 003: a summary of primitive types

Ten years ago in college, when I was learning Java and reached the chapter on primitive data types, I lacked the patience for it, found it tedious and long-winded, and simply skipped it; naturally, what little Java I learned ended with the final exam.

Now, working through *Thinking in Java*, I hit the same chapter on data types: the same material meeting the same reader, only this time with a different mindset. Even with far more patience, I could not absorb these types in one pass. After rereading the chapter several times, plus an article on Java variables and data types I found online and a summary diagram of the primitive types, it finally clicked.

Takeaways:

  1. You have to learn it sooner or later; a debt from ten years ago should be repaid now, promptly and thoroughly, and it is not too late.
  2. Learning something new takes real patience: repeated polishing and deliberate practice.
  3. The same topic sometimes has to be approached from several angles and dimensions.
  4. Primitive types fall into boolean, character, and numeric types; numeric types further split into integral and floating-point types.
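The numeric types in point 4 all have fixed, platform-independent widths in Java (per the Java Language Specification), so their value ranges follow directly from two's-complement arithmetic. A small illustrative sketch:

```python
# Bit widths of Java's integral primitive types -- fixed by the JLS,
# unlike C, where sizes vary by platform.
JAVA_INT_WIDTHS = {"byte": 8, "short": 16, "int": 32, "long": 64}

def value_range(bits):
    """Value range of a signed two's-complement integer of the given width."""
    return (-2 ** (bits - 1), 2 ** (bits - 1) - 1)

print(value_range(JAVA_INT_WIDTHS["int"]))  # (-2147483648, 2147483647)
```

For completeness: char is an unsigned 16-bit UTF-16 code unit, float and double are 32- and 64-bit IEEE 754 values, and boolean's in-memory size is left to the JVM.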


Fixing a ZooKeeper instance that dies right after startup

After upgrading a CentOS 6.5 Linux machine in place to CentOS 7.2, the ZooKeeper instance that previously ran on it failed to start. Notes below:

[root@zkserver1 bin]# sh zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@zkserver1 bin]# ps -ef|grep zook
root       891   843  0 14:10 pts/0    00:00:00 grep --color=auto zook
[root@zkserver1 bin]# cat zookeeper.out 
2017-06-22 14:10:04,154 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/local/zookeeper-3.4.5/bin/../conf/zoo.cfg
2017-06-22 14:10:04,166 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-06-22 14:10:04,167 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-06-22 14:10:04,168 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-06-22 14:10:04,168 [myid:] - WARN  [main:QuorumPeerMain@113] - Either no config or no quorum defined in config, running  in standalone mode
2017-06-22 14:10:04,190 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/local/zookeeper-3.4.5/bin/../conf/zoo.cfg
2017-06-22 14:10:04,191 [myid:] - INFO  [main:ZooKeeperServerMain@95] - Starting server
2017-06-22 14:10:04,206 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2017-06-22 14:10:04,207 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=zkserver1
2017-06-22 14:10:04,207 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.7.0_21
2017-06-22 14:10:04,208 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
2017-06-22 14:10:04,209 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.7.0_21/jre
2017-06-22 14:10:04,209 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/local/zookeeper-3.4.5/bin/../build/classes:/usr/local/zookeeper-3.4.5/bin/../build/lib/*.jar:/usr/local/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/usr/local/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/usr/local/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/usr/local/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/usr/local/zookeeper-3.4.5/bin/../conf:
2017-06-22 14:10:04,210 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-06-22 14:10:04,211 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2017-06-22 14:10:04,212 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=
2017-06-22 14:10:04,212 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
2017-06-22 14:10:04,213 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=amd64
2017-06-22 14:10:04,214 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=3.10.0-327.el7.x86_64
2017-06-22 14:10:04,215 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
2017-06-22 14:10:04,215 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
2017-06-22 14:10:04,216 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/usr/local/zookeeper-3.4.5/bin
2017-06-22 14:10:04,227 [myid:] - INFO  [main:ZooKeeperServer@726] - tickTime set to 2000
2017-06-22 14:10:04,228 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2017-06-22 14:10:04,229 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2017-06-22 14:10:04,274 [myid:] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-06-22 14:10:04,343 [myid:] - ERROR [main:FileTxnSnapLog@210] - Parent /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers missing for /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers/consumer%3A%2F%2F172.16.11.73%2Fcom.onlyou.es.bck.service.IUserStoreService%3Fapplication%3Des-client-web%26category%3Dconsumers%26check%3Dfalse%26default.check%3Dfalse%26default.reference.filter%3DdubboConsumerFilter%26default.retries%3D0%26default.timeout%3D1000000%26dubbo%3D2.5.3%26interface%3Dcom.onlyou.es.bck.service.IUserStoreService%26methods%3DgetUserStoreByCorpIdAndStoreId%2CdeleteStoreUserById%2ClistStoreCode%2CgetUserStoreByUserId%2CsearchStoreIdsByUserId%2CsearchUserStaffList%2CsaveStoreUser%2CgetUserStoreByCorpId%2CsearchStoreInfosByUserId%26pid%3D488%26revision%3Ddev-SNAPSHOT%26side%3Dconsumer%26timestamp%3D1496999541671
2017-06-22 14:10:04,345 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:377)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        ... 9 more
[root@zkserver1 bin]#

Cause: ZooKeeper's on-disk data files were probably "damaged" during the in-place OS upgrade, so ZooKeeper hit an exception replaying them at restart.

Fix: delete ZooKeeper's data directory /tmp/zookeeper and restart ZooKeeper. (This discards all stored znodes, so it is only an option when that data is expendable.)

Locating and handling a PostgreSQL performance problem

1. First, look at the top and iostat output from the server running the PostgreSQL database:

top:
[root@db2 ~ 11:14:42]$ top

top - 11:16:10 up 2 days, 13:01,  5 users,  load average: 51.62, 41.75, 29.06
Tasks: 948 total,   5 running, 943 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  16284812k total, 16159260k used,   125552k free,     5700k buffers
Swap:  8191992k total,   107980k used,  8084012k free,  8010540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
31450 postgres  20   0 4534m 2.3g 2.2g D 46.6 15.0   7:02.50 postgres                                                
30166 postgres  20   0 4528m 2.3g 2.2g D 46.0 14.9   4:05.40 postgres                                                
30185 postgres  20   0 4494m 2.2g 2.1g D 43.5 14.1   4:05.08 postgres                                                
31453 postgres  20   0 4529m 1.9g 1.8g D 42.9 12.3   3:39.13 postgres                                                
30232 postgres  20   0 4526m 2.3g 2.1g D 40.4 14.6   6:07.51 postgres                                                
 2811 postgres  20   0 4521m 874m 742m D 39.8  5.5   1:36.34 postgres                                                
31457 postgres  20   0 4534m 1.8g 1.7g R 39.5 11.6   3:55.23 postgres                                                
30956 postgres  20   0 4521m 1.7g 1.5g D 38.8 10.8   2:14.67 postgres                                                
 2714 postgres  20   0 4519m 1.4g 1.3g D 37.9  9.1   1:19.96 postgres                                                
30182 postgres  20   0 4525m 1.9g 1.8g D 37.9 12.5   3:40.31 postgres                                                
31444 postgres  20   0 4525m 2.2g 2.0g D 37.6 13.9   3:29.11 postgres                                                
31654 postgres  20   0 4526m 2.4g 2.2g D 36.7 15.3   5:04.19 postgres                                                
 2717 postgres  20   0 4519m 847m 718m D 36.1  5.3   1:37.20 postgres                                                
  753 postgres  20   0 4533m 3.7g 3.6g D 35.4 23.9  27:52.65 postgres                                                
31451 postgres  20   0 4433m 1.8g 1.7g D 35.4 11.5   2:36.85 postgres                                                
30701 postgres  20   0 4520m 1.7g 1.6g D 35.1 11.1   2:09.85 postgres                                                
31448 postgres  20   0 4435m 2.2g 2.1g D 33.3 13.9   4:16.70 postgres                                                
29192 postgres  20   0 4526m 2.3g 2.1g D 32.6 14.6   4:19.91 postgres                                                
30693 postgres  20   0 4525m 1.9g 1.8g D 32.0 12.4   2:29.72 postgres                                                
 3448 postgres  20   0 4451m 383m 305m D 31.4  2.4   0:49.98 postgres                                                
 3568 postgres  20   0 4388m 131m 113m D 29.5  0.8   0:10.03 postgres                                                
 3435 postgres  20   0 4376m 364m 355m D 29.2  2.3   0:12.23 postgres                                                
[root@db2 ~ 11:16:22]$ 
iostat:
[root@db2 ~ 11:16:25]$ iostat 1 10 /dev/emcpowerf 
Linux 2.6.32-279.el6.x86_64 (db2) 	06/02/2016 	_x86_64_	(24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.51    0.00    0.84    2.04    0.00   92.61

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf       210.87     10256.34     11724.86 2257542690 2580782824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.07    0.00   80.00    7.12    0.00    2.82

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf     34443.00   3095056.00   1747336.00    3095056    1747336

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.38    0.00   90.25    1.25    0.00    0.13

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf        83.79      7430.09     14483.79      89384     174240

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.00    0.00   70.16   10.69    0.00    1.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2887.25    323498.04     71545.10     329968      72976

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.76    0.00   86.36    3.17    0.00    0.71

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      1996.00    222072.00     34056.00     222072      34056

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.75    0.00   89.37    2.50    0.00    0.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2344.00    229568.00    149152.00     229568     149152

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.68    0.00   82.81    5.88    0.00    1.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2585.00    242224.00    154320.00     242224     154320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.57    0.00   24.36   34.20    0.00   26.87

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2692.08    283271.29    137940.59     286104     139320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.80    0.00   85.10    2.02    0.00    1.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      3367.33    268491.09    124879.21     271176     126128

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.77    0.00   85.36    4.32    0.00    0.54

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2792.00    274976.00    143192.00     274976     143192

[root@db2 ~ 11:23:07]$
And the filesystem layout:
[root@db2 ~ 11:03:54]$ df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext4     58G   14G   42G  25% /
tmpfs        tmpfs    7.8G     0  7.8G   0% /dev/shm
/dev/sda3     ext4    210G   54G  146G  27% /data/log
/dev/emcpowerd1
              ext4     99G   58G   37G  62% /data/pgsql/backup
/dev/emcpowerc1
              ext4     99G   89G  5.5G  95% /data/pgsql/data_old
/dev/emcpowerf
              ext4    197G  146G   42G  78% /data/pgsql/data
/dev/emcpowerg
              ext4    493G   66G  403G  14% /data/pgsql/backup/archive
[root@db2 ~ 11:04:14]$

II Conclusion: the database server has hit a performance bottleneck, and the problem is specifically system I/O. In most cases this is caused by poorly written application SQL, or by bad SQL execution plans that consume excessive I/O resources.

III Analysis

1 From top, the system load is already quite high: load average: 51.62, 41.75, 29.06
2 From the CPU line: Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st.

Here, 9.0%us spent on user-space processes is normal. But with the system under heavy load, 59.2%sy of CPU time is going into the kernel. The amount of time spent in the kernel should be as low as possible, although it can peak much higher, especially when there is a lot of I/O happening. In other words, the CPU is spending an unusually large share of its time on kernel work, which is exactly what heavy I/O produces. The 59.2%sy here is clearly elevated: the system is waiting on I/O.

Also, 14.1%id means idle CPU time is quite low. Normally we want it neither too low (the CPU is saturated) nor too high (the CPU is mostly sitting idle). And 17.2%wa is the share of CPU time spent waiting for I/O operations to complete, so the CPU is indeed blocked on I/O.

3 In the process state column, a large number of processes are in state D. The common process states are:

R: TASK_RUNNING, runnable. A process in this state is running on (or queued for) a CPU; at any moment only a small number of processes are in this state.

S: TASK_INTERRUPTIBLE, interruptible sleep. A process in this state is suspended because it is waiting for some event that has not yet happened (a socket connection, a semaphore, etc.); at any moment most processes are in this state.

D: TASK_UNINTERRUPTIBLE, uninterruptible sleep. The process is also asleep, but cannot be woken by other events or signals. This is commonly seen while a process is operating on a hardware device (for example, a read() system call that ends up in the device driver, interacting with the device itself); TASK_UNINTERRUPTIBLE protects the process from being interrupted mid-exchange, which could otherwise leave the device in an inconsistent state.

In this case, a large number of processes are in state D, uninterruptible sleep: many database backends are interacting with the OS I/O subsystem and cannot be interrupted. Put plainly, their I/O operations are taking a long time to complete and cannot be cancelled. The system is drowning in I/O waits.

4 iostat likewise shows the device holding the database files under heavy I/O: its request rate is very high.
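A quick way to confirm point 3 is to count D-state processes with ps. A minimal sketch; the ps output below is canned sample data so the filter logic is easy to follow — on a live box you would pipe real `ps -eo pid,stat,comm` output instead:

```shell
# Count processes in uninterruptible sleep (STAT starts with 'D').
# $sample stands in for live `ps -eo pid,stat,comm` output.
sample="  PID STAT COMMAND
 8939 Ss   postgres
24561 D    postgres
24562 D    postgres
24563 S    postgres
30701 D    postgres"

echo "$sample" | awk 'NR > 1 && $2 ~ /^D/ { n++ } END { print n " processes in D state" }'
```

On a healthy system this count hovers near zero; a persistently large number, as seen in the top output above, points at saturated storage.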

This is a PostgreSQL performance-triage case I handled exactly one year ago today (June 2, 2016); I did not deliberately save it for today (June 2, 2017). Rather, I have recently been reading several Linux documents, and earlier, while flipping through PostgreSQL 9 Administration Cookbook, I came across this passage on p. 249:

Not enough CPU power or disk I/O capacity for the current load

These are usually caused by suboptimal query plans, but sometimes you simply do not have a powerful enough computer.

Here, top is your friend for quick checks, and from the command line, run the following:

user@host:~$ top

First, watch the CPU idle reading in the top. If this is in low single-digits for most of the time, you probably have problems with CPU power.

If you have a high load average with still lots of CPU idle left, you are probably out of disk bandwidth. In this case, you should also have lots of postgres processes in status D.

 

So I decided to put together this write-up, so that when I next chase a performance problem I have a clear set of leads to work through.

IV References:

1 Understanding Linux CPU stats
2 top process states explained: R, S, D

Complete steps and caveats for upgrading CentOS 6.5 to CentOS 7

I Background:

1 A large number of in-house dev, test, and UAT machines (physical and virtual) run CentOS 6.5;
2 Most of the company's ECS instances on Alibaba Cloud likewise run CentOS 6.5;
3 The company has recently been pushing Docker, and a reasonably stable, not-too-old Docker release must be deployed on at least RHEL/CentOS 7.0.

II Notes:

This document records the steps for upgrading CentOS 6.5 to CentOS 7.2, along with the problems the upgrade caused and how they were resolved. Before the upgrade, the machine's OS version, kernel version, and filesystems were:

[root@dev-middleware ~]# cat /etc/redhat-release 
CentOS release 6.5 (Final)
[root@dev-middleware ~]# uname -rm
2.6.32-431.el6.x86_64 x86_64
[root@dev-middleware ~]# df -Th
Filesystem                   Type   Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root ext4    50G   40G  7.2G  85% /
tmpfs                        tmpfs  1.9G     0  1.9G   0% /dev/shm
/dev/sda1                    ext4   485M   32M  428M   7% /boot
/dev/mapper/VolGroup-lv_home ext4   144G  193M  136G   1% /home
[root@dev-middleware ~]#

Based on what I gathered from the web, the content and steps in this document are not a reference for upgrading other CentOS 6.x releases to CentOS 7. Also, this is an in-place upgrade, performed directly on the original machine, which carries some risk. The officially recommended approach is not in-place: back up the relevant files and data first, install CentOS 7 fresh, then restore the data. That is considerably safer and more controllable.
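Whichever route you take, back up anything you cannot recreate before touching the system. A minimal sketch of a dated pre-upgrade tarball; the directories here are temporary stand-ins, not paths from the original upgrade:

```shell
# Illustrative pre-upgrade backup: archive a source directory into a
# dated tarball. $backup_dir / $src_dir are temp stand-ins for real
# backup storage and for /etc, application data, etc.
backup_dir=$(mktemp -d)
src_dir=$(mktemp -d)
echo "config" > "$src_dir/demo.conf"

tar -czf "$backup_dir/pre-upgrade-$(date +%Y%m%d).tar.gz" -C "$src_dir" .
tar -tzf "$backup_dir"/pre-upgrade-*.tar.gz    # verify the archive is readable
```

On the real machine you would point `src_dir` at /etc, the database data directory, and any application trees, and store the tarball off-box.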

III Upgrade:

1 Add a yum repo file:

vi /etc/yum.repos.d/upgrade.repo
with the following content:

[upg]
name=CentOS-$releasever - Upgrade Tool
baseurl=http://dev.centos.org/centos/6/upg/x86_64/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
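Instead of editing interactively with vi, the repo file can be written non-interactively with a quoted heredoc (quoting EOF keeps $releasever literal). A sketch, writing to a temporary path so it can run anywhere; on the real machine the target is /etc/yum.repos.d/upgrade.repo:

```shell
# Write the upgrade repo file non-interactively. $repo_file is a temp
# stand-in for /etc/yum.repos.d/upgrade.repo.
repo_file=$(mktemp)
cat > "$repo_file" <<'EOF'
[upg]
name=CentOS-$releasever - Upgrade Tool
baseurl=http://dev.centos.org/centos/6/upg/x86_64/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
EOF
grep '^baseurl=' "$repo_file"    # sanity-check what was written
```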

2 Install the upgrade tools:

yum install redhat-upgrade-tool preupgrade-assistant-contents

3 Run the pre-upgrade check:

preupg -l

preupg -s CentOS6_7

If you see errors like the following:

I/O warning : failed to load external entity "/usr/share/openscap/xsl/security-guide.xsl"
compilation error: file /usr/share/preupgrade/xsl/preup.xsl line 40 element import
xsl:import : unable to load /usr/share/openscap/xsl/security-guide.xsl
I/O warning : failed to load external entity "/usr/share/openscap/xsl/oval-report.xsl"
compilation error: file /usr/share/preupgrade/xsl/preup.xsl line 41 element import
xsl:import : unable to load /usr/share/openscap/xsl/oval-report.xsl
I/O warning : failed to load external entity "/usr/share/openscap/xsl/sce-report.xsl"
compilation error: file /usr/share/preupgrade/xsl/preup.xsl line 42 element import
xsl:import : unable to load /usr/share/openscap/xsl/sce-report.xsl
OpenSCAP Error:: Could not parse XSLT file '/usr/share/preupgrade/xsl/preup.xsl' [oscapxml.c:416]
Unable to open file /root/preupgrade/result.html
Usage: preupg [options]
preupg: error: [Errno 2] No such file or directory: '/root/preupgrade/result.html'

the cause is that the installed openscap package is too new. Fix it by removing the package first, then installing the compatible build:

yum erase openscap
yum install http://dev.centos.org/centos/6/upg/x86_64/Packages/openscap-1.0.8-1.0.1.el6.centos.x86_64.rpm
yum install redhat-upgrade-tool preupgrade-assistant-contents

4 Run the pre-upgrade check again:

preupg -s CentOS6_7

Output:

....
094/096 ...done (NIS server maps check)
095/096 ...done (NIS server MAXUID and MAXGID limits check)
096/096 ...done (NIS server config file back-up)
Assessment finished (time 02:15s)
Result table with checks and their results for main contents:
....

|Content for enabling and disabling services based on CentOS 6 system |needs_action |
---------------------------------------------------------------------------------------------------------------
Tarball with results is stored here /root/preupgrade-results/preupg_results-170510111635.tar.gz .
The latest assessment is stored in directory /root/preupgrade .
Summary information:
We found some potential in-place upgrade risks.
Read the file /root/preupgrade/result.html for more details.
Upload results to UI by command:
e.g. preupg -u http://127.0.0.1:8099/submit/ -r /root/preupgrade-results/preupg_results-*.tar.gz .

Any incompatibilities that might surface during the upgrade are listed in the pre-upgrade report above; review and address them.

5 Import the repository's RPM signing key

 rpm --import http://172.16.11.36/yum/test/RPM-GPG-KEY-CentOS-7

Note that the key imported here belongs to a locally built yum repository, created from the CentOS-7-x86_64-DVD-1611.iso downloaded from the internet. Building a local repo is not mandatory: you can also take RPM-GPG-KEY-CentOS-7 straight out of the downloaded ISO and import it on the local machine. Why it is done this way at all is explained in detail further below.
The file's content is:

[root@localhost test]# cat RPM-GPG-KEY-CentOS-7
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.5 (GNU/Linux)

mQINBFOn/0sBEADLDyZ+DQHkcTHDQSE0a0B2iYAEXwpPvs67cJ4tmhe/iMOyVMh9
Yw/vBIF8scm6T/vPN5fopsKiW9UsAhGKg0epC6y5ed+NAUHTEa6pSOdo7CyFDwtn
4HF61Esyb4gzPT6QiSr0zvdTtgYBRZjAEPFVu3Dio0oZ5UQZ7fzdZfeixMQ8VMTQ
4y4x5vik9B+cqmGiq9AW71ixlDYVWasgR093fXiD9NLT4DTtK+KLGYNjJ8eMRqfZ
Ws7g7C+9aEGHfsGZ/SxLOumx/GfiTloal0dnq8TC7XQ/JuNdB9qjoXzRF+faDUsj
WuvNSQEqUXW1dzJjBvroEvgTdfCJfRpIgOrc256qvDMp1SxchMFltPlo5mbSMKu1
x1p4UkAzx543meMlRXOgx2/hnBm6H6L0FsSyDS6P224yF+30eeODD4Ju4BCyQ0jO
IpUxmUnApo/m0eRelI6TRl7jK6aGqSYUNhFBuFxSPKgKYBpFhVzRM63Jsvib82rY
438q3sIOUdxZY6pvMOWRkdUVoz7WBExTdx5NtGX4kdW5QtcQHM+2kht6sBnJsvcB
JYcYIwAUeA5vdRfwLKuZn6SgAUKdgeOtuf+cPR3/E68LZr784SlokiHLtQkfk98j
NXm6fJjXwJvwiM2IiFyg8aUwEEDX5U+QOCA0wYrgUQ/h8iathvBJKSc9jQARAQAB
tEJDZW50T1MtNyBLZXkgKENlbnRPUyA3IE9mZmljaWFsIFNpZ25pbmcgS2V5KSA8
c2VjdXJpdHlAY2VudG9zLm9yZz6JAjUEEwECAB8FAlOn/0sCGwMGCwkIBwMCBBUC
CAMDFgIBAh4BAheAAAoJECTGqKf0qA61TN0P/2730Th8cM+d1pEON7n0F1YiyxqG
QzwpC2Fhr2UIsXpi/lWTXIG6AlRvrajjFhw9HktYjlF4oMG032SnI0XPdmrN29lL
F+ee1ANdyvtkw4mMu2yQweVxU7Ku4oATPBvWRv+6pCQPTOMe5xPG0ZPjPGNiJ0xw
4Ns+f5Q6Gqm927oHXpylUQEmuHKsCp3dK/kZaxJOXsmq6syY1gbrLj2Anq0iWWP4
Tq8WMktUrTcc+zQ2pFR7ovEihK0Rvhmk6/N4+4JwAGijfhejxwNX8T6PCuYs5Jiv
hQvsI9FdIIlTP4XhFZ4N9ndnEwA4AH7tNBsmB3HEbLqUSmu2Rr8hGiT2Plc4Y9AO
aliW1kOMsZFYrX39krfRk2n2NXvieQJ/lw318gSGR67uckkz2ZekbCEpj/0mnHWD
3R6V7m95R6UYqjcw++Q5CtZ2tzmxomZTf42IGIKBbSVmIS75WY+cBULUx3PcZYHD
ZqAbB0Dl4MbdEH61kOI8EbN/TLl1i077r+9LXR1mOnlC3GLD03+XfY8eEBQf7137
YSMiW5r/5xwQk7xEcKlbZdmUJp3ZDTQBXT06vavvp3jlkqqH9QOE8ViZZ6aKQLqv
pL+4bs52jzuGwTMT7gOR5MzD+vT0fVS7Xm8MjOxvZgbHsAgzyFGlI1ggUQmU7lu3
uPNL0eRx4S1G4Jn5
=OGYX
-----END PGP PUBLIC KEY BLOCK-----
[root@localhost test]#

6 Run the upgrade:

 
centos-upgrade-tool-cli --network 7 --instrepo=http://vault.centos.org/centos/7.2.1511/os/x86_64/
setting up repos...
.treeinfo | 1.1 kB 00:00
Preupgrade assistant risk check found risks for this upgrade.
You can run preupg --riskcheck --verbose to view these risks.
Addressing high risk issues is required before the in-place upgrade
and ignoring these risks may result in a broken upgrade and unsupported upgrade.
Please backup your data.
...
...

(276/277): zlib-1.2.7-17.el7.x86_64.rpm | 90 kB 00:00
(277/277): zlib-devel-1.2.7-17.el7.x86_64.rpm | 50 kB 00:00
testing upgrade transaction
rpm transaction 100% [======================================================================================================================================================]
rpm install 100% [==========================================================================================================================================================]
setting up system for upgrade
Finished. Reboot to start upgrade.

7 Reboot as prompted to start the upgrade.

After the upgrade, the OS, kernel version, and filesystem info:

[root@dev-middleware conf.d]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@dev-middleware conf.d]# uname -rm
3.10.0-327.el7.x86_64 x86_64
[root@dev-middleware conf.d]# df -Th
文件系统 类型 容量 已用 可用 已用% 挂载点
/dev/mapper/VolGroup-lv_root ext4 50G 43G 4.3G 91% /
devtmpfs devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs tmpfs 1.9G 80M 1.8G 5% /run
tmpfs tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sda1 ext4 477M 110M 338M 25% /boot
/dev/mapper/VolGroup-lv_home ext4 144G 354M 136G 1% /home
[root@dev-middleware conf.d]#

IV Problems encountered, and fixes

1 Downloading failed: invalid data in .treeinfo: No option 'upgrade' in section: 'images-x86_64'
During the upgrade you may run into No option 'upgrade' in section: 'images-x86_64':

redhat-upgrade-tool --network 7.0 --force --instrepo  http://mirrors.aliyun.com/centos/7.3.1611/os/x86_64/
setting up repos...
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
 Eg. Invalid release/
removing mirrorlist with no valid mirrors: /var/tmp/system-upgrade/base/mirrorlist.txt
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
 Eg. Invalid release/
removing mirrorlist with no valid mirrors: /var/tmp/system-upgrade/extras/mirrorlist.txt
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
 Eg. Invalid release/
removing mirrorlist with no valid mirrors: /var/tmp/system-upgrade/updates/mirrorlist.txt
No upgrade available for the following repos: base extras updates
.treeinfo                                                                                                                                             |  946 B     00:00     
getting boot images...

Downloading failed: invalid data in .treeinfo: No option 'upgrade' in section: 'images-x86_64'

The cause: if you use a signing file and repo from the public internet, every public yum mirror throws the same error, .treeinfo: No option 'upgrade' in section: 'images-x86_64'; that is, the .treeinfo files on the public mirrors contain no upgrade option. After some testing, that is why the upgrade steps above import the RPM signing key from the locally built repo instead.

2 After the upgrade, nginx fails to start with: Starting nginx: /usr/sbin/nginx: error while loading shared libraries: libpcre.so.0: cannot open shared object file: No such file or directory

[root@dev-middleware ~]# systemctl status nginx.service
● nginx.service - LSB: start and stop nginx
Loaded: loaded (/etc/rc.d/init.d/nginx; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since 六 2017-05-27 16:49:28 CST; 5min ago
Docs: man:systemd-sysv-generator(8)

5月 27 16:49:28 dev-middleware systemd[1]: Starting LSB: start and stop nginx...
5月 27 16:49:28 dev-middleware nginx[1661]: Starting nginx: /usr/sbin/nginx: error while loading shared libraries: libpcre.so.0: cannot open shared object file: No such file or directory
5月 27 16:49:28 dev-middleware nginx[1661]: [失败]
5月 27 16:49:28 dev-middleware systemd[1]: nginx.service: control process exited, code=exited status=1
5月 27 16:49:28 dev-middleware systemd[1]: Failed to start LSB: start and stop nginx.
5月 27 16:49:28 dev-middleware systemd[1]: Unit nginx.service entered failed state.
5月 27 16:49:28 dev-middleware systemd[1]: nginx.service failed.
[root@dev-middleware ~]#

Fix:

a Locate the libpcre.so files
 
[root@dev-middleware ~]# find / -name libpcre.so*
/usr/lib64/libpcre.so.1.2.0
/usr/lib64/libpcre.so
/usr/lib64/libpcre.so.1
/usr/lib64/libpcre.so.0
[root@dev-middleware ~]#
b Create the link file
 
[root@dev-middleware ~]# link /usr/lib64/libpcre.so.1 /lib64/libpcre.so.0
c Export the LD_LIBRARY_PATH environment variable
 
[root@dev-middleware ~]# export LD_LIBRARY_PATH=/usr/local/lib:/lib64:$LD_LIBRARY_PATH
[root@dev-middleware ~]# echo $LD_LIBRARY_PATH
/usr/local/lib:/lib64:/usr/lib64:/usr/local/lib:
[root@dev-middleware ~]#
d Related links:

On the missing shared library:
http://sempike.blogspot.com/2016/02/update-to-centos-7-libpcreso0-no-such.html
On the nginx startup error:
http://blog.csdn.net/bzhxuexi/article/details/35821121
To keep this from breaking after the next login, also write export LD_LIBRARY_PATH=/usr/local/lib:/lib64:$LD_LIBRARY_PATH into root's profile.
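Appending that export to the profile can be done idempotently, so repeated runs do not stack duplicate lines. A sketch, using a temp file in place of root's actual profile:

```shell
# Append the LD_LIBRARY_PATH export to a profile only if not already
# present. $profile is a temp stand-in for /root/.bash_profile.
profile=$(mktemp)
line='export LD_LIBRARY_PATH=/usr/local/lib:/lib64:$LD_LIBRARY_PATH'
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"   # second run is a no-op
grep -c 'LD_LIBRARY_PATH' "$profile"                          # still exactly one line
```

`grep -qxF` matches the whole line as a fixed string, so the literal `$LD_LIBRARY_PATH` in the single-quoted value is compared verbatim rather than expanded.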

3 After the upgrade, a similar problem: /ppas/9.3as/bin/edb-postgres: error while loading shared libraries: libsasl2.so.2: cannot open shared object file: No such file or directory

This machine had been running CentOS 6.5 with a PostgreSQL (EnterpriseDB) instance on it; after upgrading to CentOS 7, the database failed to start.

[enterprisedb@localhost ~]$ pg_ctl start
/ppas/9.3as/bin/edb-postgres: error while loading shared libraries: libsasl2.so.2: cannot open shared object file: No such file or directory
no data was returned by command ""/ppas/9.3as/bin/edb-postgres" -V"
The program "edb-postgres" is needed by pg_ctl but was not found in the
same directory as "/ppas/9.3as/bin/pg_ctl".
Check your installation.
[enterprisedb@localhost ~]$

Fix:

a As root, create the link file:
 
[root@localhost ~]# find / -name libsasl2.*
/usr/lib64/libsasl2.so.3.0.0
/usr/lib64/libsasl2.so.3
[root@localhost ~]# link /usr/lib64/libsasl2.so.3 /lib64/libsasl2.so.2
b Then, as the PostgreSQL database user, export the environment variable:
 
[enterprisedb@localhost ~]$ export LD_LIBRARY_PATH=/usr/lib64/:/lib64
[enterprisedb@localhost ~]$ echo $LD_LIBRARY_PATH
/usr/lib64/:/lib64
[enterprisedb@localhost ~]$ pg_ctl status
pg_ctl: no server running
[enterprisedb@localhost ~]$ pg_ctl start
server starting
[enterprisedb@localhost ~]$ 2017-05-27 17:18:05 CST 日志:

** EnterpriseDB Dynamic Tuning Agent ********************************************
* System Utilization: 100% *
* Database Version: 9.3.11.33 *
* Operating System Version: *
* Number of Processors: 0 *
* Processor Type: *
* Processor Architecture: *
* Database Size: 0.1 GB *
* RAM: 15.5 GB *
* Shared Memory: 15839 MB *
* Max DB Connections: 0 *
* Autovacuum: on *
* Autovacuum Naptime: 60 Seconds *
* InfiniteCache: off *
* InfiniteCache Servers: 0 *
* InfiniteCache Size: 0.000 GB *
*********************************************************************************

2017-05-27 17:18:05 CST 日志: 已加载的库 "$libdir/dbms_pipe"
2017-05-27 17:18:05 CST 日志: 已加载的库 "$libdir/edb_gen"
2017-05-27 17:18:05 CST 日志: redirecting log output to logging collector process
2017-05-27 17:18:05 CST 提示: Future log output will appear in directory "pg_log".

[enterprisedb@localhost ~]$

V Reference links:

Remotely upgrading a cloud server from CentOS 6.x to CentOS 7.x

PostgreSQL: a summary of granting/revoking another user's access to all tables under the current user

Q1: In PostgreSQL, how do you grant another user the ability to query all tables under the current user?

A1: Two steps. First, grant the other user SELECT on all of the current user's tables; second, the other user must also have USAGE on the current user's schema.

Example: the current database is testdb, the current user testu1, the schema test_schema, and the tables t1 and t2. How do we let another user, testu2, query t1 and t2?

The database version is EDB 9.3. Connected to testdb as testu1, there are two tables under test_schema, t1 and t2, each holding one row.

testdb=> select version;
                                                       version                                                       
---------------------------------------------------------------------------------------------------------------------
 EnterpriseDB 9.3.11.33 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55), 64-bit
(1 row)

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> \dn
      List of schemas
    Name     |    Owner     
-------------+--------------
 public      | enterprisedb
 test_schema | testu1
(2 rows)

testdb=> \d            
          List of relations
   Schema    | Name | Type  | Owner  
-------------+------+-------+--------
 test_schema | t1   | table | testu1
 test_schema | t2   | table | testu1
(2 rows)

testdb=> select * from t1;
 id 
----
  1
(1 row)

testdb=> select * from t2;
 id 
----
  2
(1 row)

testdb=>

Meanwhile, when testu2 connects to testdb, it cannot access testu1's tables under test_schema; the error reports insufficient privilege on schema test_schema.

testdb=> \c
You are now connected to database "testdb" as user "testu2".
testdb=> \dn
      List of schemas
    Name     |    Owner     
-------------+--------------
 public      | enterprisedb
 test_schema | testu1
(2 rows)

testdb=> select * from test_schema.t1;
错误:  对模式 test_schema 权限不够
LINE 1: select * from test_schema.t1;
                      ^
testdb=>

Next, testu1 grants testu2 USAGE on test_schema.

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> grant USAGE on SCHEMA test_schema to testu2;
GRANT
testdb=>

After the grant, can testu2 access testu1's tables under test_schema?

 
postgres=# \c testdb testu2;
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t1;
错误:  对关系 t1 权限不够
testdb=>

The error is no longer about schema test_schema, but about insufficient SELECT privilege on the table itself.
Next, grant testu2 SELECT on all tables under test_schema:

 
testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> grant SELECT on all tables in schema test_schema to testu2;
GRANT
testdb=>

Finally, when testu2 logs in to testdb, it can freely query all of testu1's tables under test_schema:

testdb=> \c
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t1;
 id 
----
  1
(1 row)

testdb=> select * from test_schema.t2;
 id 
----
  2
(1 row)

testdb=>

Q2: Given that a user has already been granted SELECT on all tables in a schema, can that user access tables the schema owner creates later? If not, how do we grant query access to those newly created tables?

A2: No, the other user has no SELECT privilege on newly created tables. Access to future tables can be granted via alter default privileges. Details below:

testu1 creates a new table t3 under test_schema in testdb and, as the owner, accesses it without any problem; testu2, however, connecting to testdb, cannot query the newly created t3.

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> create table t3(id int);
CREATE TABLE
testdb=> insert into t3 values(3);
INSERT 0 1
testdb=> select * from t3;
 id 
----
  3
(1 row)

testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;
错误:  对关系 t3 权限不够
testdb=>

Either of the following two methods lets testu2 query the t3 table newly created by testu1:

Method 1

 testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> grant SELECT on all tables in schema test_schema to testu2;
GRANT
testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;                              
 id 
----
  3
(1 row)

testdb=>

Or, method 2:

 
testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> revoke SELECT on t3 from testu2;
REVOKE
testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;                  
错误:  对关系 t3 权限不够
testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> grant SELECT on t3 to testu2;
GRANT
testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t3;  
 id 
----
  3
(1 row)

testdb=>

Clearly, neither method is ideal: you cannot re-run a grant for every newly created table. alter default privileges solves this.

 
testdb=> \c testdb testu1
You are now connected to database "testdb" as user "testu1".
testdb=> alter default privileges in schema test_schema grant SELECT on tables to testu2;
ALTER DEFAULT PRIVILEGES
testdb=> create table t4(id int);
CREATE TABLE
testdb=> insert into t4 values(4);
INSERT 0 1
testdb=> select * from t4;
 id 
----
  4
(1 row)

testdb=> \c testdb testu2
You are now connected to database "testdb" as user "testu2".
testdb=> select * from test_schema.t4;                                                   
 id 
----
  4
(1 row)

testdb=>

In addition, default privileges can be inspected with the \ddp command (my guess is the abbreviation stands for \describe default privileges):

testdb=> \ddp+
            Default access privileges
 Owner  |   Schema    | Type  | Access privileges 
--------+-------------+-------+-------------------
 testu1 | test_schema | table | testu2=r/testu1
(1 row)

testdb=> 

Q3: When a user holds default privileges, how do you drop that user? That is, how do you deal with errors like DETAIL: privileges for default privileges on new relations belonging to role testu1 in schema test_schema?

A3: Three steps. 1, revoke USAGE on the schema; 2, revoke SELECT on all tables; 3, revoke the default privileges with something like alter default privileges, then drop the user.

Even the most privileged database administrator cannot drop the testu2 user at this point, and adding cascade does not make the drop succeed either.

postgres=# \c
You are now connected to database "postgres" as user "enterprisedb".
postgres=# drop user testu2;
错误:  无法删除"testu2"因为有其它对象倚赖它
DETAIL:  在数据库 testdb中的6个对象
postgres=# drop user testu2 cascade;
错误:  无法删除"testu2"因为有其它对象倚赖它
DETAIL:  在数据库 testdb中的6个对象
postgres=#

Revoke, in order: USAGE on the schema, SELECT on all its tables, and finally the default privileges via alter default privileges; afterwards \ddp shows no default privileges left:

testdb=> \c
You are now connected to database "testdb" as user "testu1".
testdb=> revoke USAGE on SCHEMA test_schema from testu2;
REVOKE
testdb=> revoke SELECT on all tables in schema test_schema from testu2;
REVOKE
testdb=> alter default privileges in schema test_schema revoke SELECT on tables from testu2;
ALTER DEFAULT PRIVILEGES
testdb=> \ddp
         Default access privileges
 Owner | Schema | Type | Access privileges 
-------+--------+------+-------------------
(0 rows)

testdb=> 

Finally, the administrator can drop user testu2 cleanly:

postgres=# drop user testu2;
DROP ROLE
postgres=# 

Summary:
1 To give another user query access to tables under one of your schemas, you must grant both USAGE on the schema and SELECT on the tables (either on specific tables, or on all tables at once);
2 To cover tables created in the future in a single step, grant default privileges with alter default privileges;
3 Revoking means revoking the schema USAGE, the table SELECT, and the default privileges;
4 \ddp shows the default privilege grants.
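The whole lifecycle from Q1 through Q3 condenses to the following recap of the statements used above (run as testu1, the schema owner; this only restates the grants and revokes demonstrated in the sessions):

```sql
-- Grant: schema usage, existing tables, and future tables
GRANT USAGE ON SCHEMA test_schema TO testu2;
GRANT SELECT ON ALL TABLES IN SCHEMA test_schema TO testu2;
ALTER DEFAULT PRIVILEGES IN SCHEMA test_schema GRANT SELECT ON TABLES TO testu2;

-- Revoke: all three must be undone before DROP USER testu2 can succeed
REVOKE USAGE ON SCHEMA test_schema FROM testu2;
REVOKE SELECT ON ALL TABLES IN SCHEMA test_schema FROM testu2;
ALTER DEFAULT PRIVILEGES IN SCHEMA test_schema REVOKE SELECT ON TABLES FROM testu2;
```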

Java study notes 002: primitive types are faster

Question: in Java, why is using a primitive type more efficient than creating the value with new?

First, Java's primitive types, literally the basic or raw data types, are char, byte, short, int, long, double, float, and boolean.

Second, Java stores data in the following five places:

  1. Registers, on the CPU chip itself. Extremely fast, but not directly manipulable;
  2. The stack, in RAM (random access memory). Slower than registers, but directly manipulable;
  3. The heap, also in RAM. Every object created with the new keyword lives in the heap;
  4. Read-only memory (ROM). Typically holds constants and static, unchanging data;
  5. Non-RAM storage. Plainly put, data that can live independently of the program is kept not in any of the above but on media such as disk or optical storage, much like an Oracle external table whose data sits outside the database on external storage.

The key point here: in Java, every object created with new is stored in the heap. Also, going from storage area 1 to 5, access speed drops step by step. So creating an object or variable with new can never match the speed of working on the stack.

Java therefore follows the C/C++ approach: it provides primitive types and handles their variables on the stack. When you need such a variable, you just use it directly, with no need to new one up. In the example below, the int variable n1 is faster than n2:

 
public class PrimitiveType {
	public static void main(String[] args) {
		int n1 = 47;
		Integer n2 = new Integer(47);
		System.out.println("Fast:n1" + n1);
		System.out.println("Slow:n2" + n2);
	}
}

 

Summary: Java's primitive types are handled on the stack, while variables created with new are handled on the heap; the former are faster than the latter.

Resolving a PostgreSQL incident caused by having too few log files

1 This morning, colleagues on one of our projects (which has been running a promotion whose deadline is today, 2017-04-09; presumably large numbers of users were voting heavily before the cutoff) reported that the system was extremely sluggish and pages would not load.

 

After logging in to the database, I saw a large number of queries waiting:

[root@ecs-yqb4 ~]# ps -ef|grep post
postgres 8939 1 0 2016 ? 00:02:33 /data/postgres/9.4.4/bin/postgres
postgres 8947 8939 0 2016 ? 00:02:23 postgres: checkpointer process
postgres 8948 8939 0 2016 ? 00:08:17 postgres: writer process
postgres 8949 8939 0 2016 ? 00:03:51 postgres: wal writer process
postgres 8950 8939 0 2016 ? 00:03:04 postgres: autovacuum launcher process
postgres 8951 8939 0 2016 ? 00:11:38 postgres: stats collector process
postgres 24561 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54151) SELECT
postgres 24562 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54152) SELECT
postgres 24563 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54153) SELECT
postgres 24564 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54154) SELECT
postgres 24565 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54155) SELECT
postgres 24566 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54156) SELECT
postgres 24567 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54157) SELECT
postgres 24568 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54158) SELECT
postgres 24569 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54159) SELECT
postgres 24570 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54160) SELECT
postgres 24748 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54210) SELECT
postgres 24749 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54211) SELECT
postgres 24750 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54212) SELECT
postgres 24751 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54213) SELECT
postgres 24752 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54214) SELECT
postgres 24753 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54215) SELECT
postgres 24754 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54216) SELECT
postgres 24755 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54217) SELECT
postgres 24756 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54218) SELECT
postgres 24757 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54219) SELECT
postgres 24758 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54220) SELECT
postgres 24759 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54221) SELECT
postgres 24760 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54222) SELECT
postgres 24761 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54223) SELECT
postgres 24762 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54224) SELECT
postgres 24763 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54225) SELECT
postgres 24764 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54226) SELECT
postgres 24765 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54227) SELECT
postgres 24766 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54228) SELECT
postgres 24767 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54229) SELECT
postgres 24768 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54230) SELECT
postgres 24769 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54231) SELECT
postgres 24770 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54232) SELECT
postgres 24771 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54233) idle in transaction
postgres 24773 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54234) SELECT
postgres 24774 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54235) SELECT
postgres 24775 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54236) SELECT
postgres 24776 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54237) SELECT
postgres 24778 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54238) SELECT
postgres 24779 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54239) SELECT
postgres 24780 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54240) SELECT
postgres 24781 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54241) SELECT
postgres 24782 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54242) SELECT
postgres 24783 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54243) SELECT
postgres 24784 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54244) SELECT
postgres 24785 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54245) SELECT
postgres 24786 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54246) SELECT
postgres 24787 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54247) SELECT
postgres 24788 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54248) SELECT
postgres 24789 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54249) SELECT
postgres 24790 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54250) SELECT
postgres 24791 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54251) SELECT
postgres 24793 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54252) SELECT
postgres 24795 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54253) SELECT
postgres 24797 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54254) SELECT
postgres 24798 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54255) SELECT
postgres 24799 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54256) SELECT
postgres 24800 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54257) SELECT
postgres 24801 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54258) SELECT
postgres 24803 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54259) idle
postgres 24804 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54260) SELECT
postgres 24805 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54261) SELECT
postgres 24806 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54262) SELECT
postgres 24808 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54263) SELECT
postgres 24809 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54264) SELECT
postgres 24810 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54265) SELECT
postgres 24811 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54266) SELECT
postgres 24812 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54267) SELECT
postgres 24813 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54268) SELECT
postgres 24814 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54269) SELECT
postgres 24815 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54270) SELECT
postgres 24816 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54271) SELECT
postgres 24817 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54272) SELECT
postgres 24818 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54273) idle
postgres 24819 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54274) SELECT
postgres 24820 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54275) SELECT
postgres 24821 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54276) SELECT
postgres 24822 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54277) SELECT
postgres 24823 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54278) SELECT
postgres 24826 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54279) SELECT
postgres 24827 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54280) SELECT
postgres 24828 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54281) SELECT
postgres 24829 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54282) SELECT
postgres 24830 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54283) SELECT
postgres 24831 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54284) SELECT
postgres 24832 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54285) SELECT
postgres 24833 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54286) SELECT
postgres 24834 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54287) SELECT
postgres 24835 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54288) SELECT
postgres 24836 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54289) SELECT
postgres 24837 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54290) SELECT
postgres 24838 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54291) SELECT
postgres 24840 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54292) SELECT
postgres 24841 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54293) SELECT
postgres 24842 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54294) SELECT
postgres 24843 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54295) SELECT
postgres 24844 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54296) SELECT
postgres 24845 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54297) SELECT
postgres 24846 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54298) SELECT
postgres 24849 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54299) SELECT
root 25338 25304 0 09:36 pts/2 00:00:00 su - postgres
postgres 25339 25338 0 09:36 pts/2 00:00:00 -bash
root 25404 25122 0 09:43 pts/0 00:00:00 su - postgres
postgres 25405 25404 0 09:43 pts/0 00:00:00 -bash
root 25462 25442 0 09:45 pts/3 00:00:00 grep post
[root@ecs-yqb4 ~]#

2 My first thought was that some slow SQL statement was consuming system resources. I checked with the following query:

 select t1.datname, t2.query, t2.calls, t2.total_time, t2.total_time/t2.calls
   from pg_database t1, pg_stat_statements t2
  where t1.oid = t2.dbid
  order by t2.total_time desc
  limit 10;

It turned out the pg_stat_statements extension was not enabled on this system, so I installed it on the live database:

create extension pg_stat_statements;

and edited the configuration file /data/postgres/9.4.4/data/postgresql.conf:

shared_preload_libraries = 'pg_stat_statements' # (change requires restart)

Changing this parameter requires a database restart:

 pg_ctl restart -m fast

I then checked again for resource-hungry SQL, but contrary to expectations there was no statement consuming excessive system resources.

3 Digging further, I found that the number of WAL segment files (checkpoint_segments) was set too low, so the database was constantly performing checkpoints in the background; it responded so slowly that it was effectively hung.

Having found the cause, I edited the configuration file /data/postgres/9.4.4/data/postgresql.conf again and raised the number of WAL segment files to 30:

checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
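
As a quick sanity check on the sizing (my own arithmetic, not from the original post): each WAL segment is 16MB, so the default checkpoint_segments = 3 forces a checkpoint roughly every 48MB of WAL written, while 30 segments allow about 480MB between WAL-triggered checkpoints (checkpoints are also triggered by checkpoint_timeout, whichever comes first):

```shell
# WAL written between WAL-triggered checkpoints = checkpoint_segments * 16MB
echo "default: $((3 * 16)) MB"   # checkpoint_segments = 3
echo "new:     $((30 * 16)) MB"  # checkpoint_segments = 30
```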

Reload the configuration to make the change take effect; for checkpoint_segments a reload is sufficient, though a restart also works:

 
pg_ctl reload

pg_ctl restart -m fast

The new setting took effect:

 
[postgres@ecs-yqb4 pg_xlog]$ pwd
/data/postgres/9.4.4/data/pg_xlog
[postgres@ecs-yqb4 pg_xlog]$ ll
-rw------- 1 postgres postgres 16777216 4月 9 10:24 000000010000000700000082
-rw------- 1 postgres postgres 16777216 4月 9 10:24 000000010000000700000083
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000084
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000085
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000086
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000087
-rw------- 1 postgres postgres 16777216 4月 9 10:26 000000010000000700000088
-rw------- 1 postgres postgres 16777216 4月 9 10:26 000000010000000700000089
-rw------- 1 postgres postgres 16777216 4月 9 10:26 00000001000000070000008A
-rw------- 1 postgres postgres 16777216 4月 9 10:26 00000001000000070000008B
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008C
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008D
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008E
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008F
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000090
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000091
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000092
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000093
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000094
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000095
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000096
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000097

4 The system quickly returned to normal use. I am jotting this down so I don't fall into the same pit again.

Java Study Notes 001: Variables in Java

In a Java program, a variable can hold either a value of a primitive type or a reference to an object.

The second half of that sentence is what I failed to fully grasp when I was learning Java a while ago. This is a big difference from a procedural language like C. My thinking had been shaped by C, so the variables that instinctively came to mind were always ones like these:

 
int n1=10;
char c1='C';

Once you move to Java, you have to keep the second half of that statement about variables in mind.

A short piece of Java source code:

 
public class JavaVariables{
	public static void main(String[] args){
		int n1=10;
		String s1="Hello,Java.";
		System.out.println(n1);
		System.out.println(s1);
	}
}

Compile and run it:

 
$ javac JavaVariables.java 
$ java JavaVariables
10
Hello,Java.
$
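
To make the "reference" half of the statement concrete, here is a minimal sketch (the class name and variable names are mine, not from the notes above): assigning a primitive copies the value itself, while assigning a reference makes two variables point at the same object, so a mutation through one is visible through the other.

```java
public class JavaReferences {
    public static void main(String[] args) {
        int n1 = 10;
        int n2 = n1;           // primitives: the value itself is copied
        n2 = 20;               // n1 is unaffected

        StringBuilder s1 = new StringBuilder("Hello");
        StringBuilder s2 = s1; // references: both variables point at one object
        s2.append(",Java.");   // mutate through s2 ...

        System.out.println(n1); // 10
        System.out.println(s1); // Hello,Java. -- ... the change shows through s1
    }
}
```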