A Case of Diagnosing a PostgreSQL Database Performance Problem

I. First, the top and iostat output from a server running PostgreSQL:

top:

[root@db2 ~ 11:14:42]$ top

top - 11:16:10 up 2 days, 13:01,  5 users,  load average: 51.62, 41.75, 29.06
Tasks: 948 total,   5 running, 943 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  16284812k total, 16159260k used,   125552k free,     5700k buffers
Swap:  8191992k total,   107980k used,  8084012k free,  8010540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
31450 postgres  20   0 4534m 2.3g 2.2g D 46.6 15.0   7:02.50 postgres                                                
30166 postgres  20   0 4528m 2.3g 2.2g D 46.0 14.9   4:05.40 postgres                                                
30185 postgres  20   0 4494m 2.2g 2.1g D 43.5 14.1   4:05.08 postgres                                                
31453 postgres  20   0 4529m 1.9g 1.8g D 42.9 12.3   3:39.13 postgres                                                
30232 postgres  20   0 4526m 2.3g 2.1g D 40.4 14.6   6:07.51 postgres                                                
 2811 postgres  20   0 4521m 874m 742m D 39.8  5.5   1:36.34 postgres                                                
31457 postgres  20   0 4534m 1.8g 1.7g R 39.5 11.6   3:55.23 postgres                                                
30956 postgres  20   0 4521m 1.7g 1.5g D 38.8 10.8   2:14.67 postgres                                                
 2714 postgres  20   0 4519m 1.4g 1.3g D 37.9  9.1   1:19.96 postgres                                                
30182 postgres  20   0 4525m 1.9g 1.8g D 37.9 12.5   3:40.31 postgres                                                
31444 postgres  20   0 4525m 2.2g 2.0g D 37.6 13.9   3:29.11 postgres                                                
31654 postgres  20   0 4526m 2.4g 2.2g D 36.7 15.3   5:04.19 postgres                                                
 2717 postgres  20   0 4519m 847m 718m D 36.1  5.3   1:37.20 postgres                                                
  753 postgres  20   0 4533m 3.7g 3.6g D 35.4 23.9  27:52.65 postgres                                                
31451 postgres  20   0 4433m 1.8g 1.7g D 35.4 11.5   2:36.85 postgres                                                
30701 postgres  20   0 4520m 1.7g 1.6g D 35.1 11.1   2:09.85 postgres                                                
31448 postgres  20   0 4435m 2.2g 2.1g D 33.3 13.9   4:16.70 postgres                                                
29192 postgres  20   0 4526m 2.3g 2.1g D 32.6 14.6   4:19.91 postgres                                                
30693 postgres  20   0 4525m 1.9g 1.8g D 32.0 12.4   2:29.72 postgres                                                
 3448 postgres  20   0 4451m 383m 305m D 31.4  2.4   0:49.98 postgres                                                
 3568 postgres  20   0 4388m 131m 113m D 29.5  0.8   0:10.03 postgres                                                
 3435 postgres  20   0 4376m 364m 355m D 29.2  2.3   0:12.23 postgres                                                
[root@db2 ~ 11:16:22]$

iostat:

 
[root@db2 ~ 11:16:25]$ iostat 1 10 /dev/emcpowerf 
Linux 2.6.32-279.el6.x86_64 (db2) 	2016年06月02日 	_x86_64_	(24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.51    0.00    0.84    2.04    0.00   92.61

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf       210.87     10256.34     11724.86 2257542690 2580782824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.07    0.00   80.00    7.12    0.00    2.82

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf     34443.00   3095056.00   1747336.00    3095056    1747336

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.38    0.00   90.25    1.25    0.00    0.13

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf        83.79      7430.09     14483.79      89384     174240

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.00    0.00   70.16   10.69    0.00    1.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2887.25    323498.04     71545.10     329968      72976

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.76    0.00   86.36    3.17    0.00    0.71

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      1996.00    222072.00     34056.00     222072      34056

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.75    0.00   89.37    2.50    0.00    0.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2344.00    229568.00    149152.00     229568     149152

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.68    0.00   82.81    5.88    0.00    1.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2585.00    242224.00    154320.00     242224     154320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.57    0.00   24.36   34.20    0.00   26.87

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2692.08    283271.29    137940.59     286104     139320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.80    0.00   85.10    2.02    0.00    1.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      3367.33    268491.09    124879.21     271176     126128

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.77    0.00   85.36    4.32    0.00    0.54

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2792.00    274976.00    143192.00     274976     143192

[root@db2 ~ 11:23:07]$

And the filesystem layout:

[root@db2 ~ 11:03:54]$ df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext4     58G   14G   42G  25% /
tmpfs        tmpfs    7.8G     0  7.8G   0% /dev/shm
/dev/sda3     ext4    210G   54G  146G  27% /data/log
/dev/emcpowerd1
              ext4     99G   58G   37G  62% /data/pgsql/backup
/dev/emcpowerc1
              ext4     99G   89G  5.5G  95% /data/pgsql/data_old
/dev/emcpowerf
              ext4    197G  146G   42G  78% /data/pgsql/data
/dev/emcpowerg
              ext4    493G   66G  403G  14% /data/pgsql/backup/archive
[root@db2 ~ 11:04:14]$

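II. Conclusion: the database server has hit a performance bottleneck, and the problem lies in system I/O. The usual cause is poorly written application SQL, or a bad execution plan that consumes excessive I/O.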

III. Analysis

1. top shows that the system load is already high: load average: 51.62, 41.75, 29.06
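To put those numbers in proportion, a load average is usually read relative to the number of CPUs. The sketch below is only an illustration: the helper name `load_per_cpu` is made up for this example, and the CPU count of 24 is taken from the iostat banner in section I. On Linux the load average counts tasks in uninterruptible sleep as well as runnable ones, so a ratio above 1 does not by itself say whether CPU or I/O is the limit.

```python
def load_per_cpu(load_average, cpu_count):
    """Return the load average normalised by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Values copied from the top output; 24 CPUs per the iostat banner.
loads = {"1min": 51.62, "5min": 41.75, "15min": 29.06}
ratios = {name: round(load_per_cpu(value, 24), 2) for name, value in loads.items()}
print(ratios)
```

All three ratios are above 1, and the 1-minute figure is the largest, so the load was still climbing when the snapshot was taken.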

2. CPU usage: Cpu(s): 9.0%us, 59.2%sy, 0.0%ni, 14.1%id, 17.2%wa, 0.0%hi, 0.5%si, 0.0%st.

Of these, 9.0%us spent on user-space processes is fairly normal. But with the system under heavy load, 59.2%sy of CPU time is going to the kernel. The amount of time spent in the kernel should be as low as possible. This number can peak much higher, especially when there is a lot of I/O happening. Here it means that too much CPU time is being spent in kernel code, and 59% is high, which is consistent with processes waiting on I/O.

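Also, 14.1%id: CPU idle time is low. We generally do not want it too low, since that means the CPUs are too busy; very high idle is not ideal either, since it means the CPUs sit unused. 17.2%wa is the share of time the CPUs were idle while waiting for I/O to complete. Here, the CPUs are waiting on I/O.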
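The same reading can be done mechanically. The following is a minimal sketch that assumes the older procps line format shown above (`9.0%us, 59.2%sy, ...`); newer versions of top print the fields differently and would need a different pattern. The function name `parse_cpu_line` is hypothetical.

```python
import re


def parse_cpu_line(line):
    """Parse a 'Cpu(s):' summary line into {field_name: percent}."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


line = ("Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, "
        "17.2%wa,  0.0%hi,  0.5%si,  0.0%st")
cpu = parse_cpu_line(line)

# Time in the kernel plus time idle-waiting on I/O.
kernel_plus_iowait = round(cpu["sy"] + cpu["wa"], 1)
print(cpu["us"], cpu["sy"], cpu["id"], cpu["wa"], kernel_plus_iowait)
```

Adding sy and wa together is only a convenience for this case; the two fields measure different things and are normally read separately.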

3. In the process state column (S), a large number of processes are in state D. The common process states are:

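R: TASK_RUNNING, runnable. A process in this state is either running on a CPU or waiting in the run queue for one; at any moment only a few processes are in this state.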

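S: TASK_INTERRUPTIBLE, interruptible sleep. A process in this state is suspended because it is waiting for an event that has not happened yet (for example a socket connection or a semaphore); at any moment most processes are in this state.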

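D: TASK_UNINTERRUPTIBLE, uninterruptible sleep. The process is also asleep, but it cannot be interrupted by other events or signals. This is typical while a process is operating on a hardware device: for example, when a process issues the read system call on a device file, the call eventually runs the driver code for that device and interacts with it. The TASK_UNINTERRUPTIBLE state protects the process during that interaction, so that it is not interrupted partway through and the device left in an uncontrolled state.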

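In this case a large number of processes are in state D, uninterruptible sleep. That means many database processes are interacting with the operating system's I/O subsystem and cannot be interrupted. Put plainly, the I/O these processes issued is taking a long time to complete, and they cannot be interrupted in the meantime. The system is experiencing heavy I/O wait.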
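Counting the states is easy to script. This is a rough sketch over a few rows copied from the top output in section I; the helper `count_states` is hypothetical, and the default column index assumes the column order PID USER PR NI VIRT RES SHR S, which may differ if top has been customised.

```python
from collections import Counter


def count_states(top_rows, state_column=7):
    """Count process states found in top's per-process rows."""
    counts = Counter()
    for row in top_rows:
        fields = row.split()
        # Skip the header and any blank or truncated lines.
        if len(fields) > state_column and fields[0].isdigit():
            counts[fields[state_column]] += 1
    return counts


rows = [
    "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND",
    "31450 postgres  20   0 4534m 2.3g 2.2g D 46.6 15.0   7:02.50 postgres",
    "30166 postgres  20   0 4528m 2.3g 2.2g D 46.0 14.9   4:05.40 postgres",
    "31457 postgres  20   0 4534m 1.8g 1.7g R 39.5 11.6   3:55.23 postgres",
    " 2811 postgres  20   0 4521m 874m 742m D 39.8  5.5   1:36.34 postgres",
]
states = count_states(rows)
print(states["D"], states["R"])
```

Run over all 22 process rows shown in section I, the same count gives 21 processes in D and 1 in R.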

4. iostat likewise shows frequent I/O on the device that holds the database, with a very high I/O request rate.
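When reading the iostat output, two details matter: the first report gives averages since boot, not current activity, and sysstat reports the Blk_* columns in 512-byte blocks. The sketch below uses the first four emcpowerf lines from section I and converts the read rate to MiB/s; the names `device_samples` and `to_mib_per_s` are made up for this illustration.

```python
BLOCK_BYTES = 512  # sysstat reports Blk_* columns in 512-byte blocks


def device_samples(iostat_text, device):
    """Return (tps, blk_read_per_s, blk_wrtn_per_s) for each report of a device."""
    samples = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0] == device:
            tps, blk_read, blk_wrtn = (float(x) for x in fields[1:4])
            samples.append((tps, blk_read, blk_wrtn))
    return samples


def to_mib_per_s(blocks_per_s):
    """Convert a rate in 512-byte blocks per second to MiB per second."""
    return blocks_per_s * BLOCK_BYTES / (1024 * 1024)


text = """\
emcpowerf       210.87     10256.34     11724.86 2257542690 2580782824
emcpowerf     34443.00   3095056.00   1747336.00    3095056    1747336
emcpowerf        83.79      7430.09     14483.79      89384     174240
emcpowerf      2887.25    323498.04     71545.10     329968      72976
"""
samples = device_samples(text, "emcpowerf")
since_boot, intervals = samples[0], samples[1:]  # first report = since boot
peak_read = max(to_mib_per_s(blk_read) for _, blk_read, _ in intervals)
peak_tps = max(tps for tps, _, _ in intervals)
print(round(to_mib_per_s(since_boot[1]), 1), round(peak_read, 1), peak_tps)
```

Comparing the since-boot average with the interval samples shows how far above its normal level the device was being driven during the incident.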

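This is a PostgreSQL performance diagnosis case that I handled on this day last year (June 2, 2016). I did not deliberately save it for today (June 2, 2017); rather, I have recently been reading some Linux documentation, and earlier I had been leafing through the book PostgreSQL 9 Administration Cookbook, which says on p. 249: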

Not enough CPU power or disk I/O capacity for the current load

These are usually caused by suboptimal query plans, but sometimes you just have not a powerful enough computer.

Here, top is your friend for quick checks, and from the command line, run the following:

user@host:~$ top

First, watch the CPU idle reading in the top. If this is in low single-digits for most of the time, you probably have problems with CPU power.

If you have a high load average with still lots of CPU idle left, you are probably out of disk bandwidth. In this case, you should also have lots of postgres processes in status D.
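The book's rule of thumb can be written down as a small check. This is only a sketch: the function `quick_diagnosis` and all three thresholds (idle below 5%, load above 1 per CPU, at least 5 processes in D) are assumptions chosen for illustration, since the book speaks only of "low single-digits", "high load average" and "lots of" processes.

```python
def quick_diagnosis(idle_pct, load_1min, cpu_count, d_state_count,
                    low_idle_pct=5.0, high_load_ratio=1.0, many_d_state=5):
    """Apply the top-based rule of thumb; all thresholds are assumed values."""
    if idle_pct < low_idle_pct:
        return "cpu"    # almost no idle time left: probably short of CPU
    if load_1min / cpu_count > high_load_ratio and d_state_count >= many_d_state:
        return "disk"   # high load, idle CPU left, many D-state processes
    return "unclear"


# Figures from this case: 14.1% idle, load 51.62, 24 CPUs, 21 processes in D.
print(quick_diagnosis(14.1, 51.62, 24, 21))
```

With the figures from this case the check returns "disk", which agrees with the conclusion in section II.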

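So I decided to write up this summary, so that I have a clear line of thinking and clear clues to follow when handling performance problems.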