解决PostgreSQL因日志文件过少导致的故障

1早上,公司一项目(近期在做一个活动,截止日期是今天,2017.4.9,估计有很大用户在截止日期之前,大量的刷票)同事反馈说,系统特别卡,极其的慢,页面刷不出来。

 

登录到数据库之后,看到大量的查询等待:

[root@ecs-yqb4 ~]# ps -ef|grep post
postgres 8939 1 0 2016 ? 00:02:33 /data/postgres/9.4.4/bin/postgres
postgres 8947 8939 0 2016 ? 00:02:23 postgres: checkpointer process
postgres 8948 8939 0 2016 ? 00:08:17 postgres: writer process
postgres 8949 8939 0 2016 ? 00:03:51 postgres: wal writer process
postgres 8950 8939 0 2016 ? 00:03:04 postgres: autovacuum launcher process
postgres 8951 8939 0 2016 ? 00:11:38 postgres: stats collector process
postgres 24561 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54151) SELECT
postgres 24562 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54152) SELECT
postgres 24563 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54153) SELECT
postgres 24564 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54154) SELECT
postgres 24565 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54155) SELECT
postgres 24566 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54156) SELECT
postgres 24567 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54157) SELECT
postgres 24568 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54158) SELECT
postgres 24569 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54159) SELECT
postgres 24570 8939 3 09:05 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54160) SELECT
postgres 24748 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54210) SELECT
postgres 24749 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54211) SELECT
postgres 24750 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54212) SELECT
postgres 24751 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54213) SELECT
postgres 24752 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54214) SELECT
postgres 24753 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54215) SELECT
postgres 24754 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54216) SELECT
postgres 24755 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54217) SELECT
postgres 24756 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54218) SELECT
postgres 24757 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54219) SELECT
postgres 24758 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54220) SELECT
postgres 24759 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54221) SELECT
postgres 24760 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54222) SELECT
postgres 24761 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54223) SELECT
postgres 24762 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54224) SELECT
postgres 24763 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54225) SELECT
postgres 24764 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54226) SELECT
postgres 24765 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54227) SELECT
postgres 24766 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54228) SELECT
postgres 24767 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54229) SELECT
postgres 24768 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54230) SELECT
postgres 24769 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54231) SELECT
postgres 24770 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54232) SELECT
postgres 24771 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54233) idle in transaction
postgres 24773 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54234) SELECT
postgres 24774 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54235) SELECT
postgres 24775 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54236) SELECT
postgres 24776 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54237) SELECT
postgres 24778 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54238) SELECT
postgres 24779 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54239) SELECT
postgres 24780 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54240) SELECT
postgres 24781 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54241) SELECT
postgres 24782 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54242) SELECT
postgres 24783 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54243) SELECT
postgres 24784 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54244) SELECT
postgres 24785 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54245) SELECT
postgres 24786 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54246) SELECT
postgres 24787 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54247) SELECT
postgres 24788 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54248) SELECT
postgres 24789 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54249) SELECT
postgres 24790 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54250) SELECT
postgres 24791 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54251) SELECT
postgres 24793 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54252) SELECT
postgres 24795 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54253) SELECT
postgres 24797 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54254) SELECT
postgres 24798 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54255) SELECT
postgres 24799 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54256) SELECT
postgres 24800 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54257) SELECT
postgres 24801 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54258) SELECT
postgres 24803 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54259) idle
postgres 24804 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54260) SELECT
postgres 24805 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54261) SELECT
postgres 24806 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54262) SELECT
postgres 24808 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54263) SELECT
postgres 24809 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54264) SELECT
postgres 24810 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54265) SELECT
postgres 24811 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54266) SELECT
postgres 24812 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54267) SELECT
postgres 24813 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54268) SELECT
postgres 24814 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54269) SELECT
postgres 24815 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54270) SELECT
postgres 24816 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54271) SELECT
postgres 24817 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54272) SELECT
postgres 24818 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54273) idle
postgres 24819 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54274) SELECT
postgres 24820 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54275) SELECT
postgres 24821 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54276) SELECT
postgres 24822 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54277) SELECT
postgres 24823 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54278) SELECT
postgres 24826 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54279) SELECT
postgres 24827 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54280) SELECT
postgres 24828 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54281) SELECT
postgres 24829 8939 3 09:06 ? 00:01:21 postgres: onlyou_activity onlyou_activity 10.20.1.12(54282) SELECT
postgres 24830 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54283) SELECT
postgres 24831 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54284) SELECT
postgres 24832 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54285) SELECT
postgres 24833 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54286) SELECT
postgres 24834 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54287) SELECT
postgres 24835 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54288) SELECT
postgres 24836 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54289) SELECT
postgres 24837 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54290) SELECT
postgres 24838 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54291) SELECT
postgres 24840 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54292) SELECT
postgres 24841 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54293) SELECT
postgres 24842 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54294) SELECT
postgres 24843 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54295) SELECT
postgres 24844 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54296) SELECT
postgres 24845 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54297) SELECT
postgres 24846 8939 3 09:06 ? 00:01:20 postgres: onlyou_activity onlyou_activity 10.20.1.12(54298) SELECT
postgres 24849 8939 3 09:06 ? 00:01:19 postgres: onlyou_activity onlyou_activity 10.20.1.12(54299) SELECT
root 25338 25304 0 09:36 pts/2 00:00:00 su - postgres
postgres 25339 25338 0 09:36 pts/2 00:00:00 -bash
root 25404 25122 0 09:43 pts/0 00:00:00 su - postgres
postgres 25405 25404 0 09:43 pts/0 00:00:00 -bash
root 25462 25442 0 09:45 pts/3 00:00:00 grep post
[root@ecs-yqb4 ~]#

2初步反应是有较慢的SQL语句,消耗系统资源导致的。通过下述SQL查看:

 select t1.datname, t2.query, t2.calls, t2.total_time, t2.total_time/t2.calls from pg_database t1, pg_stat_statements t2 where t1.oid=t2.dbid order by t2.total_time desc limit 10;

发现系统没有启用 pg_stat_statements这个扩展,于是,线上安装,

create extension pg_stat_statements;

并修改配置文件,/data/postgres/9.4.4/data/postgresql.conf:

shared_preload_libraries = ‘pg_stat_statements’ # (change requires restart)

修改该参数,需要重启数据库:

 pg_ctl restart -m fast;

再次,查看消耗系统资源的SQL。发现并没有如预期那样能看到消耗过高系统资源问题SQL语句存在。

3于是,进一步排查,发现是数据库的日志文件设置过少,导致数据库后台在做大量的checkpoint操作,导致数据库响应不及时,几乎是hang住挂起状态。

找到原因之后,再次修改数据库的配置文件:/data/postgres/9.4.4/data/postgresql.conf
增加日志文件到30个:
checkpoint_segments = 30 # in logfile segments, min 1, 16MB each

重新加载配置文件使之生效,或者重启数据库。

 
pg_ctl reload;

pg_ctl restart -m fast;

设置生效:

 
[postgres@ecs-yqb4 pg_xlog]$ pwd
/data/postgres/9.4.4/data/pg_xlog
[postgres@ecs-yqb4 pg_xlog]$ ll
-rw------- 1 postgres postgres 16777216 4月 9 10:24 000000010000000700000082
-rw------- 1 postgres postgres 16777216 4月 9 10:24 000000010000000700000083
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000084
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000085
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000086
-rw------- 1 postgres postgres 16777216 4月 9 10:25 000000010000000700000087
-rw------- 1 postgres postgres 16777216 4月 9 10:26 000000010000000700000088
-rw------- 1 postgres postgres 16777216 4月 9 10:26 000000010000000700000089
-rw------- 1 postgres postgres 16777216 4月 9 10:26 00000001000000070000008A
-rw------- 1 postgres postgres 16777216 4月 9 10:26 00000001000000070000008B
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008C
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008D
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008E
-rw------- 1 postgres postgres 16777216 4月 9 10:27 00000001000000070000008F
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000090
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000091
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000092
-rw------- 1 postgres postgres 16777216 4月 9 10:28 000000010000000700000093
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000094
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000095
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000096
-rw------- 1 postgres postgres 16777216 4月 9 10:29 000000010000000700000097

4 很快,系统恢复正常使用。做个简单的记录,免于再次掉坑。

发表评论

邮箱地址不会被公开。 必填项已用*标注