PostgreSQL主从同步配置,切换步骤

零 说明

本文档详细记录如何一步一步在两台Linux服务器上,搭建PostgreSQL数据库主从模式的高可用数据库环境,以及如何手工切换主备库,及其注意事项。

为了实现数据库对于应用层的透明,配合浮动IP来实现,即使数据库执行主备切换,应用层依然通过浮动IP来访问数据库。即:后端数据库执行主备切换前后,应用系统无需做任何更改。

搭建环境选择在CentOS 7+PostgreSQL 9.4.10上。

基本思路和流程大致如下:

1 主库真实IP为172.16.11.148,配置浮动IP为172.16.11.121,主机名为pgprimary,在其上安装并初始化PostgreSQL数据库;

2 备库真实IP为172.16.11.149,配置浮动IP为172.16.11.121,主机名为pgstandby,在其上只安装PostgreSQL数据库软件;

3 通常情况下,浮动IP 172.16.11.121运行在主库上,当主库出故障时,手工执行数据库的主备切换,备库成为新主库,然后新主库上启用浮动IP;

4 修复并重配原主库,使其成为新主库的备库;

一 环境

两台数据库服务器的主机名、文件系统、OS level、网络配置分别如下:

1主库:

[root@pgprimary ~]# hostname
pgprimary
[root@pgprimary ~]# df -Th
文件系统                类型      容量  已用  可用 已用% 挂载点
/dev/mapper/centos-root xfs        47G  1.1G   46G    3% /
devtmpfs                devtmpfs  908M     0  908M    0% /dev
tmpfs                   tmpfs     920M     0  920M    0% /dev/shm
tmpfs                   tmpfs     920M  8.8M  911M    1% /run
tmpfs                   tmpfs     920M     0  920M    0% /sys/fs/cgroup
/dev/sda1               xfs      1014M  142M  873M   14% /boot
tmpfs                   tmpfs     184M     0  184M    0% /run/user/0
[root@pgprimary ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 
[root@pgprimary ~]# uname -rm
3.10.0-862.el7.x86_64 x86_64
[root@pgprimary ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.148  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::58b2:50cd:9d7c:8b23  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)
        RX packets 21338  bytes 4481980 (4.2 MiB)
        RX errors 0  dropped 2378  overruns 0  frame 0
        TX packets 270  bytes 31662 (30.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgprimary ~]#

2 备库:

[root@pgstandby ~]# hostname
pgstandby
[root@pgstandby ~]# df -Th
文件系统                类型      容量  已用  可用 已用% 挂载点
/dev/mapper/centos-root xfs        47G  1.1G   46G    3% /
devtmpfs                devtmpfs  908M     0  908M    0% /dev
tmpfs                   tmpfs     920M     0  920M    0% /dev/shm
tmpfs                   tmpfs     920M  8.8M  911M    1% /run
tmpfs                   tmpfs     920M     0  920M    0% /sys/fs/cgroup
/dev/sda1               xfs      1014M  142M  873M   14% /boot
tmpfs                   tmpfs     184M     0  184M    0% /run/user/0
[root@pgstandby ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 
[root@pgstandby ~]# uname -rm
3.10.0-862.el7.x86_64 x86_64
[root@pgstandby ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.149  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::486a:f6d4:7976:4c9a  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:5f:53  txqueuelen 1000  (Ethernet)
        RX packets 22927  bytes 4822730 (4.5 MiB)
        RX errors 0  dropped 2560  overruns 0  frame 0
        TX packets 262  bytes 28988 (28.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgstandby ~]#

二 配置浮动IP

步骤3中的监控并启动浮动IP是可选项配置,不适用于本文的数据库主备切换。数据库主备切换还是推荐手工在主备库上启停浮动IP。

1 主库配置浮动IP:

a 拷贝网络配置文件,并修改:

[root@pgprimary ~]# cp /etc/sysconfig/network-scripts/ifcfg-ens160 /etc/sysconfig/network-scripts/ifcfg-ens160:1
[root@pgprimary ~]#

b 修改DEVICE为ens160:1,添加NM_CONTROLLED设置为no,修改IPADDR为浮动IP地址172.16.11.121,并且去掉网关信息,修改完如下:

[root@pgprimary ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens160:1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens160
UUID=be70afb7-a578-4dc0-b694-af530586a76e
DEVICE=ens160:1
ONBOOT=yes
IPADDR=172.16.11.121
NETMASK=255.255.255.0
NM_CONTROLLED=no
[root@pgprimary ~]#

c 启用ens160:1网络设备,并测试:

[root@pgprimary ~]# ifup ens160:1
[root@pgprimary ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:80:64:27 brd ff:ff:ff:ff:ff:ff
    inet 172.16.11.148/24 brd 172.16.11.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet 172.16.11.121/24 brd 172.16.11.255 scope global secondary ens160:1
       valid_lft forever preferred_lft forever
    inet6 fe80::58b2:50cd:9d7c:8b23/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@pgprimary ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.148  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::58b2:50cd:9d7c:8b23  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)
        RX packets 172902  bytes 59418679 (56.6 MiB)
        RX errors 0  dropped 15626  overruns 0  frame 0
        TX packets 15516  bytes 1443358 (1.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.121  netmask 255.255.255.0  broadcast 172.16.11.255
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgprimary ~]#

d 客户端去ping floating IP

$ ping 172.16.11.121
PING 172.16.11.121 (172.16.11.121): 56 data bytes
64 bytes from 172.16.11.121: icmp_seq=0 ttl=63 time=0.653 ms
64 bytes from 172.16.11.121: icmp_seq=1 ttl=63 time=0.766 ms
^C
--- 172.16.11.121 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.653/0.710/0.766/0.057 ms

asher at MacBook-Air-3 in ~
$

2 备库配置浮动IP

具体操作步骤,同上述主库配置浮动IP a,b,c步骤。

测试的时候,需要先在主库停止网络设备ens160:1,然后再在备库启动网络设备ens160:1。

主库停ens160:1

[root@pgprimary ~]# ifdown ens160:1
[root@pgprimary ~]#

备库启动ens160:1

[root@pgstandby ~]# ifup ens160:1
[root@pgstandby ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.149  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::486a:f6d4:7976:4c9a  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:5f:53  txqueuelen 1000  (Ethernet)
        RX packets 182284  bytes 61612908 (58.7 MiB)
        RX errors 0  dropped 16880  overruns 0  frame 0
        TX packets 15365  bytes 1381833 (1.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.121  netmask 255.255.255.0  broadcast 172.16.11.255
        ether 00:50:56:80:5f:53  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgstandby ~]#

至此,完成了对主备库的浮动IP配置。

3 监控并启停浮动IP【可选项】

a 主库监控并启动浮动IP

监控脚本为:

[root@pgprimary ~]# cat monitor_floating_ip.sh 
#!/bin/bash
#This script is used for monitoring the floating IP(172.16.11.121),if it is unreachable,
#then ifup the ens160:1 device on this postgres PRIMARY server.
#Created by 黄伟, whuang@onlyou.com,20181010.

FLOATING_IP=172.16.11.121

c=$(/usr/bin/ping -$FLOATING_IP -c 1|grep Unreachable|wc -l)
if [ $c -gt 0 ]
then 
   /usr/sbin/ifup ens160:1
fi
[root@pgprimary ~]#

crontab定时任务为:

[root@pgprimary ~]# crontab -l
* * * * * /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 10; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 20; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 30; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 40; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 50; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
[root@pgprimary ~]#

说明,脚本用于ping浮动IP,如果ping不通,则在主库上启动浮动IP。定时任务表示每隔10秒调用一次监控浮动IP脚本。

b 备库监控并启动浮动IP

监控脚本为:

[root@pgstandby ~]# cat monitor_floating_ip.sh 
#!/bin/bash
#This script is used for monitoring the floating IP(172.16.11.121),if it is unreachable,and also the PRIMARY
#IP(172.16.11.148) unreachable,then ifup the ens160:1 device on this postgres STANDBY server.
#Created by 黄伟, whuang@onlyou.com,20181010.

PRIMARY_IP=172.16.11.148
FLOATING_IP=172.16.11.121
c1=$(/usr/bin/ping $PRIMARY_IP -c 1|grep Unreachable|wc -l)
c2=$(/usr/bin/ping $FLOATING_IP -c 1|grep Unreachable|wc -l)
c3=$(/usr/sbin/ip addr|grep ens160:1)
if [ $c1 -gt 0 -a $c2 -gt 0 ]
then
   /usr/sbin/ifup ens160:1
elif [ $c1 -eq 0 -a $c2 -eq 0 -a "$c3" ]
then
   /usr/sbin/ifdown ens160:1
fi
[root@pgstandby ~]#

crontab定时任务为:

[root@pgstandby ~]# crontab -l
* * * * * /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 10; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 20; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 30; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 40; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
* * * * * sleep 50; /root/monitor_floating_ip.sh>>/root/monitor_floating_ip.log
[root@pgstandby ~]#

说明,脚本用于ping浮动IP,如果ping不通,且同时ping不通主库实际IP地址,那么意味着主库服务器已经宕机或者整个网络异常,则在备库上启动浮动IP。如果ping主、备库IP正常,且备库上浮动IP正常,则意味着主库服务器恢复正常或网络恢复正常,则在备库上停止浮动IP,把浮动IP交给主库托管。定时任务表示每隔10秒调用一次监控浮动IP脚本。

三 主库安装、配置数据库

1 创建用户、路径

[root@pgprimary ~]# groupadd postgres
[root@pgprimary ~]# useradd -g postgres postgres
[root@pgprimary ~]# passwd postgres
更改用户 postgres 的密码 。
新的 密码:
无效的密码: 密码包含用户名在某些地方
重新输入新的 密码:
passwd:所有的身份验证令牌已经成功更新。
[root@pgprimary ~]# mkdir -p /postgres/9.4.10
[root@pgprimary ~]# chown -R postgres:postgres /postgres/
[root@pgprimary ~]#

2 操作系统安装软件包gcc,zlib,readline,tar

[root@pgprimary ~]# yum install gcc*
具体略
[root@pgprimary ~]# yum install -y zlib*
具体略
[root@pgprimary ~]# yum install -y readline*
具体略
[root@pgprimary ~]# yum install -y tar*
具体略

3 下载软件,postgres用户执行:

[postgres@pgprimary ~]$ wget https://ftp.postgresql.org/pub/source/v9.4.10/postgresql-9.4.10.tar.gz
--2018-09-30 17:19:51--  https://ftp.postgresql.org/pub/source/v9.4.10/postgresql-9.4.10.tar.gz
正在解析主机 ftp.postgresql.org (ftp.postgresql.org)... 204.145.124.244, 217.196.149.55, 174.143.35.246, ...
正在连接 ftp.postgresql.org (ftp.postgresql.org)|204.145.124.244|:443... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:23333040 (22M) [application/x-gzip]
正在保存至: “postgresql-9.4.10.tar.gz”

100%[====================================================================================================================================================================================>] 23,333,040  1.32MB/s 用时 20s    

2018-09-30 17:20:14 (1.10 MB/s) - 已保存 “postgresql-9.4.10.tar.gz” [23333040/23333040])

[postgres@pgprimary ~]$ ll
总用量 22788
-rw-rw-r--. 1 postgres postgres 23333040 10月 25 2016 postgresql-9.4.10.tar.gz
[postgres@pgprimary ~]$

4 解压软件,configure,postgres用户执行:

[postgres@pgprimary ~]$ tar -zxvf postgresql-9.4.10.tar.gz 
...
postgresql-9.4.10/configure.in
postgresql-9.4.10/INSTALL
[postgres@pgprimary ~]$ cd postgresql-9.4.10 && ll
总用量 620
-rw-r--r--.  1 postgres postgres    385 10月 25 2016 aclocal.m4
drwxrwxr-x.  2 postgres postgres   4096 10月 25 2016 config
-rwxr-xr-x.  1 postgres postgres 448579 10月 25 2016 configure
-rw-r--r--.  1 postgres postgres  69119 10月 25 2016 configure.in
drwxrwxr-x. 59 postgres postgres   4096 10月 25 2016 contrib
-rw-r--r--.  1 postgres postgres   1192 10月 25 2016 COPYRIGHT
drwxrwxr-x.  3 postgres postgres    107 10月 25 2016 doc
-rw-r--r--.  1 postgres postgres   3620 10月 25 2016 GNUmakefile.in
-rw-r--r--.  1 postgres postgres    283 10月 25 2016 HISTORY
-rw-r--r--.  1 postgres postgres  75420 10月 25 2016 INSTALL
-rw-r--r--.  1 postgres postgres   1489 10月 25 2016 Makefile
-rw-r--r--.  1 postgres postgres   1209 10月 25 2016 README
drwxrwxr-x. 15 postgres postgres   4096 10月 25 2016 src
[postgres@pgprimary postgresql-9.4.10]$ ./configure --prefix=/postgres/9.4.10/
...
config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
[postgres@pgprimary postgresql-9.4.10]$

5 gmake world,postgres用户执行:

[postgres@pgprimary postgresql-9.4.10]$ gmake world
..
gmake[1]: 离开目录“/home/postgres/postgresql-9.4.10/contrib”
PostgreSQL, contrib, and documentation successfully made. Ready to install.
[postgres@pgprimary postgresql-9.4.10]$

6 gmake install-world,postgres用户执行:

[postgres@pgprimary postgresql-9.4.10]$ gmake install-world
...
PostgreSQL, contrib, and documentation installation complete.
[postgres@pgprimary postgresql-9.4.10]$

7 初始化数据库,postgres用户执行:

[postgres@pgprimary ~]$ /postgres/9.4.10/bin/initdb -D /postgres/9.4.10/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "zh_CN.UTF-8".
The default database encoding has accordingly been set to "UTF8".
initdb: could not find suitable text search configuration for locale "zh_CN.UTF-8"
The default text search configuration will be set to "simple".

Data page checksums are disabled.

creating directory /postgres/9.4.10/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /postgres/9.4.10/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /postgres/9.4.10/bin/postgres -D /postgres/9.4.10/data
or
    /postgres/9.4.10/bin/pg_ctl -D /postgres/9.4.10/data -l logfile start

[postgres@pgprimary ~]$

8 修改postgres用户环境变量

[postgres@pgprimary ~]$ cat ~/.bash_profile 
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin:/postgres/9.4.10/bin
PGDATA=/postgres/9.4.10/data
export PATH
export PGDATA
[postgres@pgprimary ~]$

postgres用户的/home/.bash_profile修改之后的信息如下:

9 修改数据库参数

修改/postgres/9.4.10/data/postgresql.conf文件,使其:

listen_addresses = '*'
port = 5432

其它参数暂时不动。
修改/postgres/9.4.10/data/pg_hba.conf,在其中添加下述;

host    all             all             0.0.0.0/0               md5

其它保持不变。

10 启动数据库

[postgres@pgprimary ~]$ pg_ctl start
server starting
[postgres@pgprimary ~]$ LOG:  database system was shut down at 2018-10-10 09:32:14 CST
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

[postgres@pgprimary ~]$

11 测试数据库

a 备库停用浮动IP

[root@pgstandby ~]# ifdown ens160:1
[root@pgstandby ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.149  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::486a:f6d4:7976:4c9a  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:5f:53  txqueuelen 1000  (Ethernet)
        RX packets 14738337  bytes 3468772296 (3.2 GiB)
        RX errors 0  dropped 1833331  overruns 0  frame 0
        TX packets 149019  bytes 13182295 (12.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1  bytes 104 (104.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgstandby ~]#

b 主库启用浮动IP

[root@pgprimary ~]# ifup  ens160:1
[root@pgprimary ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.148  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::58b2:50cd:9d7c:8b23  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)
        RX packets 14551224  bytes 3431186752 (3.1 GiB)
        RX errors 0  dropped 1817255  overruns 0  frame 0
        TX packets 108201  bytes 9678459 (9.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.121  netmask 255.255.255.0  broadcast 172.16.11.255
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 12  bytes 1089 (1.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 1089 (1.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgprimary ~]#

c 连接主库原来IP

MacBook-Air-3:~ postgres$ psql -h 172.16.11.148 -p 5432 -d postgres -U postgres
Password for user postgres: 
psql (9.5.2, server 9.4.10)
Type "help" for help.

postgres=# \c
psql (9.5.2, server 9.4.10)
You are now connected to database "postgres" as user "postgres".
postgres=#

d 连接主库浮动IP

MacBook-Air-3:~ postgres$ psql -h 172.16.11.121 -p 5432 -d postgres -U postgres
Password for user postgres: 
psql (9.5.2, server 9.4.10)
Type "help" for help.

postgres=# \c
psql (9.5.2, server 9.4.10)
You are now connected to database "postgres" as user "postgres".
postgres=#

从上,可以看到,当主库启用浮动IP之后,客户端既可以通过原IP访问主库,也可以通过浮动IP访问主库。

四 备库安装数据库软件

具体步骤参照步骤三的1-6和步骤8。其它不需要做更多的配置,需要注意的是,该备库只需要安装数据库软件即可,并不需要初始化数据库。

五 配置主备库同步

1 主库添加同步用户

[root@pgprimary ~]# su - postgres
上一次登录:三 10月 10 10:08:01 CST 2018pts/0 上
[postgres@pgprimary ~]$ psql
psql (9.4.10)
Type "help" for help.

postgres=# CREATE ROLE replica login replication encrypted password 'replica';
CREATE ROLE
postgres=# \q
[postgres@pgprimary ~]$

2 主库修改配置文件pg_hba.conf和postgresql.conf

修改/postgres/9.4.10/data/pg_hba.conf,添加下述行:

host    replication     replica     172.16.11.149/32           md5

修改/postgres/9.4.10/data/postgresql.conf,修改下述行:

wal_level = hot_standby  # 这个是设置主为wal的主机
max_wal_senders = 4 # 这个设置了可以最多有几个流复制连接,差不多有几个从,就设置几个
wal_keep_segments = 256 # 设置流复制保留的最多的xlog数目
wal_sender_timeout = 60s # 设置流复制主机发送数据的超时时间
max_connections = 100 # 这个设置要注意下,主库的该参数要根据实际情况设置。从库的max_connections必须要大于主库的该参数

3 主库重启

[postgres@pgprimary ~]$ pg_ctl restart -m fast
waiting for server to shut down...LOG:  received fast shutdown request
.LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped
server starting
[postgres@pgprimary ~]$ LOG:  database system was shut down at 2018-10-10 16:35:19 CST
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

[postgres@pgprimary ~]$

4 备库,执行主库的基础备份

[root@pgstandby ~]# su - postgres
上一次登录:三 10月 10 09:43:15 CST 2018pts/0 上
[postgres@pgstandby ~]$ ll /postgres/9.4.10/
总用量 16
drwxrwxr-x. 2 postgres postgres 4096 9月  30 17:30 bin
drwxrwxr-x. 6 postgres postgres 4096 9月  30 17:30 include
drwxrwxr-x. 4 postgres postgres 4096 9月  30 17:30 lib
drwxrwxr-x. 8 postgres postgres 4096 9月  30 17:30 share
[postgres@pgstandby ~]$ which pg_basebackup 
/postgres/9.4.10/bin/pg_basebackup
[postgres@pgstandby ~]$ pg_basebackup -F p --progress -D /postgres/9.4.10/data -h 172.16.11.121 -p 5432 -U replica --password -X stream
Password: 
21122/21122 kB (100%), 1/1 tablespace
[postgres@pgstandby ~]$ ll /postgres/9.4.10/
总用量 20
drwxrwxr-x.  2 postgres postgres 4096 9月  30 17:30 bin
drwx------. 18 postgres postgres 4096 10月 10 16:38 data
drwxrwxr-x.  6 postgres postgres 4096 9月  30 17:30 include
drwxrwxr-x.  4 postgres postgres 4096 9月  30 17:30 lib
drwxrwxr-x.  8 postgres postgres 4096 9月  30 17:30 share
[postgres@pgstandby ~]$ du -sh /postgres/9.4.10/data/
37M     /postgres/9.4.10/data/
[postgres@pgstandby ~]$

可以看到,备库已经生成/postgres/9.4.10/data/。

5 备库,配置recovery.conf文件

从安装文件里cp过来。然后编辑,更改为下面3个参数,其它保持不变。

[postgres@pgstandby ~]$ cp /home/postgres/postgresql-9.4.10/src/backend/access/transam/recovery.conf.sample /postgres/9.4.10/data/recovery.conf
[postgres@pgstandby ~]$ vi /postgres/9.4.10/data/recovery.conf 
...
standby_mode = on
primary_conninfo = 'host=172.16.11.148 port=5432 user=replica password=replica'         # e.g. 'host=localhost port=5432'
recovery_target_timeline = 'latest'
...

6 备库,修改配置文件postgresql.conf

修改下列4个参数即可:

[postgres@pgstandby ~]$ vi /postgres/9.4.10/data/postgresql.conf 
...
hot_standby = on  # 说明这台机器不仅仅是用于数据归档,也用于数据查询
max_standby_streaming_delay = 30s # 数据流备份的最大延迟时间
wal_receiver_status_interval = 1s  # 多久向主报告一次从的状态,当然从每次数据复制都会向主报告状态,这里只是设置最长的间隔时间
hot_standby_feedback = on # 如果有错误的数据复制,是否向主进行反馈
...

7 备库,启动

[postgres@pgstandby ~]$ pg_ctl status
pg_ctl: no server running
[postgres@pgstandby ~]$ pg_ctl start
server starting
[postgres@pgstandby ~]$ LOG:  database system was interrupted; last known up at 2018-10-10 16:38:47 CST
LOG:  entering standby mode
LOG:  redo starts at 0/2000028
LOG:  consistent recovery state reached at 0/2000128
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

[postgres@pgstandby ~]$ pg_ctl status
pg_ctl: server is running (PID: 28027)
/postgres/9.4.10/bin/postgres
[postgres@pgstandby ~]$ ps -ef|grep postgres
root     27400 25585  0 16:36 pts/0    00:00:00 su - postgres
postgres 27401 27400  0 16:36 pts/0    00:00:00 -bash
postgres 28027     1  0 16:48 pts/0    00:00:00 /postgres/9.4.10/bin/postgres
postgres 28028 28027  0 16:48 ?        00:00:00 postgres: startup process   recovering 000000010000000000000003
postgres 28029 28027  0 16:48 ?        00:00:00 postgres: checkpointer process  
postgres 28030 28027  0 16:48 ?        00:00:00 postgres: writer process   
postgres 28031 28027  0 16:48 ?        00:00:00 postgres: stats collector process  
postgres 28032 28027  0 16:48 ?        00:00:00 postgres: wal receiver process   streaming 0/3000210
postgres 28073 27401  0 16:49 pts/0    00:00:00 ps -ef
postgres 28074 27401  0 16:49 pts/0    00:00:00 grep --color=auto postgres
[postgres@pgstandby ~]$

从上,可以看到,备库的receiver进程,在接收主库的日志信息。

8 验证主备库同步

主库执行:

[postgres@pgprimary ~]$ ps -ef|grep postgres
root     20280 17302  0 16:30 pts/0    00:00:00 su - postgres
postgres 20281 20280  0 16:30 pts/0    00:00:00 -bash
postgres 20677     1  0 16:35 pts/0    00:00:00 /postgres/9.4.10/bin/postgres
postgres 20679 20677  0 16:35 ?        00:00:00 postgres: checkpointer process  
postgres 20680 20677  0 16:35 ?        00:00:00 postgres: writer process   
postgres 20681 20677  0 16:35 ?        00:00:00 postgres: wal writer process  
postgres 20682 20677  0 16:35 ?        00:00:00 postgres: autovacuum launcher process  
postgres 20683 20677  0 16:35 ?        00:00:00 postgres: stats collector process  
postgres 21656 20677  0 16:48 ?        00:00:00 postgres: wal sender process replica 172.16.11.149(51868) streaming 0/3000210
postgres 21939 20281  0 16:52 pts/0    00:00:00 ps -ef
postgres 21940 20281  0 16:52 pts/0    00:00:00 grep --color=auto postgres
[postgres@pgprimary ~]$ psql
psql (9.4.10)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 21656
usesysid         | 16385
usename          | replica
application_name | walreceiver
client_addr      | 172.16.11.149
client_hostname  | 
client_port      | 51868
backend_start    | 2018-10-10 16:48:44.010195+08
backend_xmin     | 1893
state            | streaming
sent_location    | 0/3000210
write_location   | 0/3000210
flush_location   | 0/3000210
replay_location  | 0/3000210
sync_priority    | 0
sync_state       | async

postgres=#

从上,看到主库有sender进程,发送同步信息给备库。同时,登录主库也可以查看到同步信息。

 

六 主备库切换

这里,模拟主库出现故障,先做一次switchover测试,即先切换主备库角色,主备互换。

然后,重建原主库或者直接将原主库当作新主库的备库。

最后,再执行一次switchover,主备互换,即恢复到之前的最初状态,主库还是原主库,备库还是原备库。

1 执行第一次主备切换

当原主库库服务器或者数据库故障时,我们可以先停止主库,然后启动备库。

a 停止主库

[postgres@pgprimary data]$ pg_ctl stop -m fast -l /home/postgres/pg.log 
waiting for server to shut down.... done
server stopped
[postgres@pgprimary data]$

b 启动备库

 [postgres@pgstandby data]$ pwd
/postgres/9.4.10/data
[postgres@pgstandby data]$ ll 
总用量 72
-rw-------. 1 postgres postgres   206 10月 10 16:38 backup_label.old
drwx------. 5 postgres postgres    41 10月 10 16:38 base
drwx------. 2 postgres postgres  4096 10月 26 15:57 global
drwx------. 2 postgres postgres    18 10月 10 16:38 pg_clog
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_dynshmem
-rw-------. 1 postgres postgres  4605 10月 16 16:35 pg_hba.conf
-rw-------. 1 postgres postgres  1636 10月 10 16:38 pg_ident.conf
drwx------. 4 postgres postgres    39 10月 10 16:38 pg_logical
drwx------. 4 postgres postgres    36 10月 10 16:38 pg_multixact
drwx------. 2 postgres postgres    18 10月 26 15:57 pg_notify
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_replslot
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_serial
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_snapshots
drwx------. 2 postgres postgres     6 10月 26 15:57 pg_stat
drwx------. 2 postgres postgres     6 10月 26 09:29 pg_stat_tmp
drwx------. 2 postgres postgres    18 10月 10 16:38 pg_subtrans
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_tblspc
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_twophase
-rw-------. 1 postgres postgres     4 10月 10 16:38 PG_VERSION
drwx------. 3 postgres postgres  4096 10月 26 15:57 pg_xlog
-rw-------. 1 postgres postgres    88 10月 10 16:38 postgresql.auto.conf
-rw-------. 1 postgres postgres 20810 10月 10 16:45 postgresql.conf
-rw-------. 1 postgres postgres    30 10月 26 15:57 postmaster.opts
-rw-------. 1 postgres postgres    71 10月 26 15:57 postmaster.pid
-rw-r--r--. 1 postgres postgres  5641 10月 16 16:54 recovery.conf
[postgres@pgstandby data]$ pg_ctl promote -l /home/postgres/pg.log 
server promoting
[postgres@pgstandby data]$ ll
总用量 72
-rw-------. 1 postgres postgres   206 10月 10 16:38 backup_label.old
drwx------. 5 postgres postgres    41 10月 10 16:38 base
drwx------. 2 postgres postgres  4096 10月 26 15:57 global
drwx------. 2 postgres postgres    18 10月 10 16:38 pg_clog
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_dynshmem
-rw-------. 1 postgres postgres  4605 10月 16 16:35 pg_hba.conf
-rw-------. 1 postgres postgres  1636 10月 10 16:38 pg_ident.conf
drwx------. 4 postgres postgres    39 10月 10 16:38 pg_logical
drwx------. 4 postgres postgres    36 10月 10 16:38 pg_multixact
drwx------. 2 postgres postgres    18 10月 26 15:57 pg_notify
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_replslot
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_serial
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_snapshots
drwx------. 2 postgres postgres     6 10月 26 15:57 pg_stat
drwx------. 2 postgres postgres    42 10月 26 16:10 pg_stat_tmp
drwx------. 2 postgres postgres    18 10月 10 16:38 pg_subtrans
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_tblspc
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_twophase
-rw-------. 1 postgres postgres     4 10月 10 16:38 PG_VERSION
drwx------. 3 postgres postgres  4096 10月 26 16:10 pg_xlog
-rw-------. 1 postgres postgres    88 10月 10 16:38 postgresql.auto.conf
-rw-------. 1 postgres postgres 20810 10月 10 16:45 postgresql.conf
-rw-------. 1 postgres postgres    30 10月 26 15:57 postmaster.opts
-rw-------. 1 postgres postgres    71 10月 26 15:57 postmaster.pid
-rw-r--r--. 1 postgres postgres  5641 10月 16 16:54 recovery.done
[postgres@pgstandby data]$

这里,需要注意的是,启动备库的命令是pg_ctl promote,之前备库是出在接受主库同步的状态,需要通过这个命令来将备库提升到primary状态。同时,备库切换为主库之后,/postgres/9.4.10/data/recovery.conf文件自动变为/postgres/9.4.10/data/recovery.done。我个人猜想,主要是为了下次重启该数据库时,直接进入“主库模式”,如果还是recovery.conf的话,意味着该库需要执行recovery操作。

c 备库启动浮动IP

先在主库上停止浮动IP

 [root@pgprimary ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.148  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::58b2:50cd:9d7c:8b23  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)
        RX packets 37310308  bytes 8976173370 (8.3 GiB)
        RX errors 0  dropped 4868601  overruns 0  frame 0
        TX packets 1426780  bytes 127364686 (121.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.121  netmask 255.255.255.0  broadcast 172.16.11.255
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 134634  bytes 25999628 (24.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 134634  bytes 25999628 (24.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

您在 /var/spool/mail/root 中有邮件
[root@pgprimary ~]# ifdown ens160:1
[root@pgprimary ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.148  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::58b2:50cd:9d7c:8b23  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:64:27  txqueuelen 1000  (Ethernet)
        RX packets 37310444  bytes 8976218410 (8.3 GiB)
        RX errors 0  dropped 4868628  overruns 0  frame 0
        TX packets 1426794  bytes 127368186 (121.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 134634  bytes 25999628 (24.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 134634  bytes 25999628 (24.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgprimary ~]#

然后,备库启动浮动IP

 [root@pgstandby ~]# ifup ens160:1
您在 /var/spool/mail/root 中有邮件
[root@pgstandby ~]# ifconfig 
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.149  netmask 255.255.255.0  broadcast 172.16.11.255
        inet6 fe80::486a:f6d4:7976:4c9a  prefixlen 64  scopeid 0x20
        ether 00:50:56:80:5f:53  txqueuelen 1000  (Ethernet)
        RX packets 37445369  bytes 8975382297 (8.3 GiB)
        RX errors 0  dropped 4884823  overruns 0  frame 0
        TX packets 1509890  bytes 162194380 (154.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.11.121  netmask 255.255.255.0  broadcast 172.16.11.255
        ether 00:50:56:80:5f:53  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 195774  bytes 35184330 (33.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 195774  bytes 35184330 (33.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@pgstandby ~]#

d 测试新主库是否可用

MacBook-Air-3:~ postgres$ psql -h 172.16.11.121 -p 5432 -d postgres -U postgres
Password for user postgres: 
psql (9.5.2, server 9.4.10)
Type "help" for help.

postgres=# select * from t;
 id 
----
  1
(1 row)

postgres=# delete from t;
DELETE 1
postgres=# select * from t;
 id 
----
(0 rows)

postgres=# 

客户端通过浮动IP访问数据库,并删除t表的数据。说明,新主库可用正常使用。同时,记录下,这里是从新主库里删除的数据,接下来,我们重建新备库之后,看看这条被删除的数据,是否也从备库里查询不到?

2 重建主库

这里,为了快速验证和测试主备互换的流程,只是假设主库出现故障,所以,在这里并不需要重建原主库。真实情况下,如果主库因出现故障而执行切换的话,就需要重建。重建的流程和步骤可以参考初始搭建备库的流程来完成。

当然,这里需要将原主库配置为新备库,同样需要参照搭建备库流程五 配置主备库同步中的5,6,7三个步骤。

a 新备库创建配置recovery.conf文件

[postgres@pgprimary ~]$ cp /home/postgres/postgresql-9.4.10/src/backend/access/transam/recovery.conf.sample /postgres/9.4.10/data/recovery.conf
[postgres@pgprimary ~]$ vi /postgres/9.4.10/data/recovery.conf 
...
standby_mode = on
primary_conninfo = 'host=172.16.11.149 port=5432 user=replica password=replica'         # e.g. 'host=localhost port=5432'
recovery_target_timeline = 'latest'
...

需要注意的是,这里主库的信息,需要指向新主库的IP,即原备库的IP 172.16.11.149。

b 新备库修改配置文件postgresql.conf

修改下列4个参数即可:

[postgres@pgprimary ~]$ vi /postgres/9.4.10/data/postgresql.conf 
...
hot_standby = on  # 说明这台机器不仅仅是用于数据归档,也用于数据查询
max_standby_streaming_delay = 30s # 数据流备份的最大延迟时间
wal_receiver_status_interval = 1s  # 多久向主报告一次从的状态,当然从每次数据复制都会向主报告状态,这里只是设置最长的间隔时间
hot_standby_feedback = on # 如果有错误的数据复制,是否向主进行反馈
...

c 启动新备库

[postgres@pgprimary data]$ pg_ctl start -l /home/postgres/pg.log 
server starting
[postgres@pgprimary data]$ ps -ef|grep postgres
root     13481 13458  0 09:22 pts/1    00:00:00 su - postgres
postgres 13482 13481  0 09:22 pts/1    00:00:00 -bash
postgres 14425     1  1 16:55 pts/1    00:00:00 /postgres/9.4.10/bin/postgres
postgres 14426 14425  0 16:55 ?        00:00:00 postgres: startup process   recovering 000000060000000000000005
postgres 14427 14425  0 16:55 ?        00:00:00 postgres: checkpointer process  
postgres 14428 14425  0 16:55 ?        00:00:00 postgres: writer process   
postgres 14429 14425  0 16:55 ?        00:00:00 postgres: stats collector process  
postgres 14430 14425  1 16:55 ?        00:00:00 postgres: wal receiver process   streaming 0/508DDD0
postgres 14431 13482  0 16:55 pts/1    00:00:00 ps -ef
postgres 14432 13482  0 16:55 pts/1    00:00:00 grep --color=auto postgres
[postgres@pgprimary data]$

启动之后,发现新备库上开始有wal receiver process 。

同时,登录新备库,查看t表的数据有没有同步:

[postgres@pgprimary data]$ psql
psql (9.4.10)
Type "help" for help.

postgres=# select * from t;
 id 
----
(0 rows)

postgres=#

显然,之前从新主库里删除的那条数据,在这个新备库里也看不到了。说明,新备库是OK的了。

3 再次执行主备切换

a 停掉新主库(原备库)

 [postgres@pgstandby data]$ ps -ef|grep postgres
postgres 16230     1  0 15:57 pts/1    00:00:00 /postgres/9.4.10/bin/postgres
postgres 16232 16230  0 15:57 ?        00:00:00 postgres: checkpointer process  
postgres 16233 16230  0 15:57 ?        00:00:00 postgres: writer process   
postgres 16234 16230  0 15:57 ?        00:00:00 postgres: stats collector process  
postgres 16998 16230  0 16:10 ?        00:00:00 postgres: wal writer process  
postgres 16999 16230  0 16:10 ?        00:00:00 postgres: autovacuum launcher process  
postgres 17546 16230  0 16:20 ?        00:00:00 postgres: postgres postgres 172.16.135.72(54471) idle
postgres 19262 16230  0 16:55 ?        00:00:00 postgres: wal sender process replica 172.16.11.148(53298) streaming 0/508DEA8
postgres 19520 29607  0 17:00 pts/1    00:00:00 ps -ef
postgres 19521 29607  0 17:00 pts/1    00:00:00 grep --color=auto postgres
root     29606 29605  0 09:23 pts/1    00:00:00 su - postgres
postgres 29607 29606  0 09:23 pts/1    00:00:00 -bash
[postgres@pgstandby data]$ pg_ctl stop -m fast -l /home/postgres/pg.log 
waiting for server to shut down.... done
server stopped
[postgres@pgstandby data]$ ll
总用量 68
-rw-------. 1 postgres postgres   206 10月 10 16:38 backup_label.old
drwx------. 5 postgres postgres    41 10月 10 16:38 base
drwx------. 2 postgres postgres  4096 10月 26 15:57 global
drwx------. 2 postgres postgres    18 10月 10 16:38 pg_clog
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_dynshmem
-rw-------. 1 postgres postgres  4605 10月 16 16:35 pg_hba.conf
-rw-------. 1 postgres postgres  1636 10月 10 16:38 pg_ident.conf
drwx------. 4 postgres postgres    39 10月 10 16:38 pg_logical
drwx------. 4 postgres postgres    36 10月 10 16:38 pg_multixact
drwx------. 2 postgres postgres    18 10月 26 15:57 pg_notify
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_replslot
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_serial
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_snapshots
drwx------. 2 postgres postgres    63 10月 26 17:00 pg_stat
drwx------. 2 postgres postgres     6 10月 26 17:00 pg_stat_tmp
drwx------. 2 postgres postgres    18 10月 10 16:38 pg_subtrans
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_tblspc
drwx------. 2 postgres postgres     6 10月 10 16:38 pg_twophase
-rw-------. 1 postgres postgres     4 10月 10 16:38 PG_VERSION
drwx------. 3 postgres postgres  4096 10月 26 16:10 pg_xlog
-rw-------. 1 postgres postgres    88 10月 10 16:38 postgresql.auto.conf
-rw-------. 1 postgres postgres 20810 10月 10 16:45 postgresql.conf
-rw-------. 1 postgres postgres    30 10月 26 15:57 postmaster.opts
-rw-r--r--. 1 postgres postgres  5641 10月 16 16:54 recovery.done
[postgres@pgstandby data]$

需要注意的是,这里的新主库停止之后,系统同样自动的将/postgres/9.4.10/data/recovery.conf文件变为/postgres/9.4.10/data/recovery.done。在我们重新启动这个备库的时候,需要把recovery.done改名为recovery.conf,使其在重新启动时,自己担任的是“备库角色”,自动同主库进行同步。

b 启动新备库(原主库)

[postgres@pgprimary data]$ ps -ef|grep postgres
root     13481 13458  0 09:22 pts/1    00:00:00 su - postgres
postgres 13482 13481  0 09:22 pts/1    00:00:00 -bash
postgres 14425     1  0 16:55 pts/1    00:00:00 /postgres/9.4.10/bin/postgres
postgres 14426 14425  0 16:55 ?        00:00:00 postgres: startup process   recovering 000000060000000000000005
postgres 14427 14425  0 16:55 ?        00:00:00 postgres: checkpointer process  
postgres 14428 14425  0 16:55 ?        00:00:00 postgres: writer process   
postgres 14429 14425  0 16:55 ?        00:00:00 postgres: stats collector process  
postgres 14894 13482  0 17:01 pts/1    00:00:00 ps -ef
postgres 14895 13482  0 17:01 pts/1    00:00:00 grep --color=auto postgres
[postgres@pgprimary data]$ pg_ctl promote -l /home/postgres/
server promoting
[postgres@pgprimary data]$ ps -ef|grep postgres
root     13481 13458  0 09:22 pts/1    00:00:00 su - postgres
postgres 13482 13481  0 09:22 pts/1    00:00:00 -bash
postgres 14425     1  0 16:55 pts/1    00:00:00 /postgres/9.4.10/bin/postgres
postgres 14427 14425  0 16:55 ?        00:00:00 postgres: checkpointer process  
postgres 14428 14425  0 16:55 ?        00:00:00 postgres: writer process   
postgres 14429 14425  0 16:55 ?        00:00:00 postgres: stats collector process  
postgres 14916 14425  0 17:01 ?        00:00:00 postgres: wal writer process  
postgres 14917 14425  0 17:01 ?        00:00:00 postgres: autovacuum launcher process  
postgres 14918 13482  0 17:01 pts/1    00:00:00 ps -ef
postgres 14919 13482  0 17:01 pts/1    00:00:00 grep --color=auto postgres
[postgres@pgprimary data]$

此时,这个库又恢复到最初的状态,成为主库。

c 启动原备库

在启动之前,修改recovery.done改名为recovery.conf。

 [postgres@pgstandby data]$ pwd
/postgres/9.4.10/data
[postgres@pgstandby data]$ mv recovery.done recovery.conf
[postgres@pgstandby data]$

启动备库:

 [postgres@pgstandby data]$ pg_ctl start -l /home/postgres/pg.log 
server starting
[postgres@pgstandby data]$ ps -ef|grep postgres
postgres 19871     1  0 17:07 pts/1    00:00:00 /postgres/9.4.10/bin/postgres
postgres 19872 19871  0 17:07 ?        00:00:00 postgres: startup process   recovering 000000070000000000000005
postgres 19873 19871  0 17:07 ?        00:00:00 postgres: checkpointer process  
postgres 19874 19871  0 17:07 ?        00:00:00 postgres: writer process   
postgres 19875 19871  0 17:07 ?        00:00:00 postgres: stats collector process  
postgres 19876 19871  0 17:07 ?        00:00:00 postgres: wal receiver process   streaming 0/508E108
postgres 19881 29607  0 17:07 pts/1    00:00:00 ps -ef
postgres 19882 29607  0 17:07 pts/1    00:00:00 grep --color=auto postgres
root     29606 29605  0 09:23 pts/1    00:00:00 su - postgres
postgres 29607 29606  0 09:23 pts/1    00:00:00 -bash
[postgres@pgstandby data]$

启动之后,看到有wal receiver process。说明,此时备库依然是备库的角色了,并开始同主库进行同步了。

至此,完成了对于主备库的相互切换,并且验证了数据同步,以及注意事项。

七 小结

本文档对于在CentOS7+PostgreSQL9.4.10的数据库环境,利用浮动IP来构建一套数据库的主备同步环境,给出详细搭建和配置的流程。通过自动监控启停浮动IP来实现网络的畅通。正常情况下,主库提供读写操作,备库提供只读操作。客户端和应用程序应该通过浮动IP来访问数据库,而不应该使用真实IP访问数据库。

 

如何在Linux上源码安装PostgreSQL数据库?

零 说明

本文记录在CentOS/RHEL Linux平台上通过源码安装PostgreSQL数据库、配置数据库的过程和步骤,及注意事项。

一 前提条件

要求OS上已经安装下述软件。如果没有,可以直接通过yum来安装。

1 gmake或者make要求至少3.80版本以上

[root@localhost ~]# make -v
GNU Make 3.82
Built for x86_64-redhat-linux-gnu
Copyright (C) 2010  Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
[root@localhost ~]# gmake -v
GNU Make 3.82
Built for x86_64-redhat-linux-gnu
Copyright (C) 2010  Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
[root@localhost ~]# which make
/usr/bin/make
[root@localhost ~]# which gmake
/usr/bin/gmake
[root@localhost ~]# ll /usr/bin/make
-rwxr-xr-x. 1 root root 182776 11月  6 2016 /usr/bin/make
[root@localhost ~]# ll /usr/bin/gmake
lrwxrwxrwx. 1 root root 4 1月   8 2018 /usr/bin/gmake -> make
[root@localhost ~]#

这里,看到gmake版本为3.82。gmake是一个链接文件,其指向了make。其实,在CentOS/RHEL平台上,默认情况下,gmake等同于make。

2 C编译器

[root@localhost ~]# which gcc
/usr/bin/gcc
[root@localhost ~]# gcc -v
使用内建 specs。
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
目标:x86_64-redhat-linux
配置为:../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
线程模型:posix
gcc 版本 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) 
[root@localhost ~]#

如果没有的话,可以直接安装一个最新的gcc即可。

3 tar

[root@localhost ~]# which tar
/usr/bin/tar
[root@localhost ~]# tar --version
tar (GNU tar) 1.26
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
这是自由软件:您可以自由地更改并重新发布它。
在法律所允许的范围内不含任何担保。

由 John Gilmore 和 Jay Fenlason 所写。
[root@localhost ~]#

用于解压缩源码格式的安装压缩包文件。

 如果下载的源文件格式类似于postgresql-9.4.10.tar.gz,则用tar -zxvf postgresql-9.4.10.tar.gz解压;
如果下载的源文件格式类似于postgresql-9.4.10.tar.bz2,则用tar jxvf postgresql-9.4.10.tar.bz2解压;
注意,第二种格式文件的解压,命令行选项中不带减号-。​

4 GNU readline library

该库文件默认启用。用于在psql命令行下,可以通过键盘的上下箭头调出历史命令以及编辑之前的命令。如果不需要此功能的话,可以在configure的时候,带上--without-readline选项。

zlib compression library

该库文件默认启用。启用该选项则意味着,使用pg_dump/pg_restore对数据库进行备份/恢复的时候,不支持压缩???如果不需要此功能的话,可以在configure的时候,带上–without-zlib选项。

经过我初步验证测试,暂时还没搞清楚该库文件的具体功能和作用。暂时记下这个问题,留待我后续进一步验证一下???

https://dan.langille.org/2013/06/10/using-compression-with-postgresqls-pg_dump/

http://peter.eisentraut.org/blog/2015/12/07/check-your-pg-dump-compression-levels/

二 源码安装

1 创建postgres用户

[root@dev-middleware ~]# groupadd postgres
[root@dev-middleware ~]# useradd -g postgres postgres
[root@dev-middleware ~]# passwd postgres
更改用户 postgres 的密码 。
新的 密码:
无效的密码: 它基于字典单词
重新输入新的 密码:
passwd:所有的身份验证令牌已经成功更新。
[root@dev-middleware ~]#

2 创建数据库安装路径

假定将来要把数据库软件安装在/data/postgres/10.4/路径下,数据库的数据存放在/data/postgres/10.4/data路径下。
这里,先把/data/postgres/10.4/路径创建出来即可,/data/postgres/10.4/data路径不需提前创建。

[root@dev-middleware ~]# mkdir -p /data/postgres/10.4/
[root@dev-middleware ~]# chown -R postgres:postgres /data/postgres/10.4/
[root@dev-middleware ~]#

3 获取源码

到PostgreSQL官方网站,获取源码格式的数据库安装文件。

[root@dev-middleware ~]# su - postgres
[postgres@dev-middleware ~]$ pwd
/home/postgres
[postgres@dev-middleware ~]$ wget https://ftp.postgresql.org/pub/source/v10.4/postgresql-10.4.tar.gz
--2018-08-06 11:25:51--  https://ftp.postgresql.org/pub/source/v10.4/postgresql-10.4.tar.gz
正在解析主机 ftp.postgresql.org (ftp.postgresql.org)... 87.238.57.227, 217.196.149.55, 204.145.124.244, ...
正在连接 ftp.postgresql.org (ftp.postgresql.org)|87.238.57.227|:443... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:26652442 (25M) [application/x-gzip]
正在保存至: “postgresql-10.4.tar.gz”

100%[======================================================================================================================================================================================================>] 26,652,442  5.83MB/s 用时 6.4s   

2018-08-06 11:26:03 (3.98 MB/s) - 已保存 “postgresql-10.4.tar.gz” [26652442/26652442])

[postgres@dev-middleware ~]$ ll
总用量 26028
-rw-rw-r-- 1 postgres postgres 26652442 5月   8 05:06 postgresql-10.4.tar.gz
[postgres@dev-middleware ~]$

4 解压源码

如果下载的源码文件格式为类似于postgresql-9.4.10.tar.bz2,则用tar jxvf postgresql-9.4.10.tar.bz2解压。

解压后,会生成一个新的postgresql-10.4路径,该路径下存放的就是源码格式的安装文件。

[postgres@dev-middleware ~]$ pwd
/home/postgres
[postgres@dev-middleware ~]$ ll
总用量 26028
-rw-rw-r-- 1 postgres postgres 26652442 5月   8 05:06 postgresql-10.4.tar.gz
[postgres@dev-middleware ~]$ 
[postgres@dev-middleware ~]$ tar -zxvf postgresql-10.4.tar.gz 
...
...
postgresql-10.4/GNUmakefile.in
postgresql-10.4/.gitattributes
postgresql-10.4/aclocal.m4
postgresql-10.4/configure.in
postgresql-10.4/INSTALL
[postgres@dev-middleware ~]$

5 执行configure

执行configure的命令行选项,–prefix参数,表示把PostgreSQL安装在哪个路径下。默认情况下,不带该参数时,则会安装在/usr/local/pgsql路径下。

[postgres@dev-middleware ~]$ pwd
/home/postgres
[postgres@dev-middleware ~]$ ll
总用量 26032
drwxrwxr-x 6 postgres postgres     4096 5月   8 05:06 postgresql-10.4
-rw-rw-r-- 1 postgres postgres 26652442 5月   8 05:06 postgresql-10.4.tar.gz
[postgres@dev-middleware ~]$ cd postgresql-10.4
[postgres@dev-middleware postgresql-10.4]$ ll
总用量 676
-rw-r--r--  1 postgres postgres    457 5月   8 04:51 aclocal.m4
drwxrwxr-x  2 postgres postgres   4096 5月   8 05:03 config
-rwxr-xr-x  1 postgres postgres 498962 5月   8 04:51 configure
-rw-r--r--  1 postgres postgres  76909 5月   8 04:51 configure.in
drwxrwxr-x 55 postgres postgres   4096 5月   8 05:03 contrib
-rw-r--r--  1 postgres postgres   1192 5月   8 04:51 COPYRIGHT
drwxrwxr-x  3 postgres postgres   4096 5月   8 05:03 doc
-rw-r--r--  1 postgres postgres   3638 5月   8 04:51 GNUmakefile.in
-rw-r--r--  1 postgres postgres    284 5月   8 04:51 HISTORY
-rw-r--r--  1 postgres postgres  71584 5月   8 05:06 INSTALL
-rw-r--r--  1 postgres postgres   1682 5月   8 04:51 Makefile
-rw-r--r--  1 postgres postgres   1212 5月   8 04:51 README
drwxrwxr-x 16 postgres postgres   4096 5月   8 05:06 src
[postgres@dev-middleware postgresql-10.4]$ ./configure --prefix=/data/postgres/10.4/
...
...
config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
[postgres@dev-middleware postgresql-10.4]$

6 执行make world

这里,也可以简单执行make就可以build安装文件的,但是,make world的意思是指把PostgreSQL相关的文档,HTML,以及其它的一些模块都会一起编译进去。比如,在有的环境下遇到无法使用uuid或者使用不了gin index的原因,就是在编译的时候,没有包含这些对应的模块。处理这种的问题的方法也不复杂,只需要进到当时安装时的那个源文件路径下,重新执行make world,然后make install-world。

建议,在初始安装的时候,就直接用make world。

[postgres@dev-middleware postgresql-10.4]$ make world
..
..
make[2]: 离开目录“/home/postgres/postgresql-10.4/contrib/vacuumlo”
make[1]: 离开目录“/home/postgres/postgresql-10.4/contrib”
PostgreSQL, contrib, and documentation successfully made. Ready to install.
[postgres@dev-middleware postgresql-10.4]$

7 执行make install-world

[postgres@dev-middleware postgresql-10.4]$ make install-world 
...
...
/bin/mkdir -p '/data/postgres/10.4/bin'
/bin/install -c  vacuumlo '/data/postgres/10.4/bin'
make[2]: 离开目录“/home/postgres/postgresql-10.4/contrib/vacuumlo”
make[1]: 离开目录“/home/postgres/postgresql-10.4/contrib”
PostgreSQL, contrib, and documentation installation complete.
[postgres@dev-middleware postgresql-10.4]$

8 初始化数据库

[postgres@dev-middleware postgresql-10.4]$ /data/postgres/10.4/bin/initdb -D /data/postgres/10.4/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "zh_CN.UTF-8".
The default database encoding has accordingly been set to "UTF8".
initdb: could not find suitable text search configuration for locale "zh_CN.UTF-8"
The default text search configuration will be set to "simple".

Data page checksums are disabled.

creating directory /data/postgres/10.4/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /data/postgres/10.4/bin/pg_ctl -D /data/postgres/10.4/data -l logfile start

[postgres@dev-middleware postgresql-10.4]$

9 启动数据库

[postgres@dev-middleware postgresql-10.4]$ cd
[postgres@dev-middleware ~]$ ps -ef|grep postgres
root     19521 10727  0 11:25 pts/0    00:00:00 su - postgres
postgres 19522 19521  0 11:25 pts/0    00:00:00 -bash
postgres 19882 19522  0 17:39 pts/0    00:00:00 ps -ef
postgres 19883 19522  0 17:39 pts/0    00:00:00 grep --color=auto postgres
[postgres@dev-middleware ~]$  /data/postgres/10.4/bin/pg_ctl -D /data/postgres/10.4/data -l /home/postgres/pg.log start
waiting for server to start.... done
server started
[postgres@dev-middleware ~]$ ps -ef|grep postgres
root     19521 10727  0 11:25 pts/0    00:00:00 su - postgres
postgres 19522 19521  0 11:25 pts/0    00:00:00 -bash
postgres 21158     1  0 17:39 pts/0    00:00:00 /data/postgres/10.4/bin/postgres -D /data/postgres/10.4/data
postgres 21160 21158  0 17:39 ?        00:00:00 postgres: checkpointer process   
postgres 21161 21158  0 17:39 ?        00:00:00 postgres: writer process   
postgres 21162 21158  0 17:39 ?        00:00:00 postgres: wal writer process   
postgres 21163 21158  0 17:39 ?        00:00:00 postgres: autovacuum launcher process   
postgres 21164 21158  0 17:39 ?        00:00:00 postgres: stats collector process   
postgres 21165 21158  0 17:39 ?        00:00:00 postgres: bgworker: logical replication launcher   
postgres 21168 19522  0 17:39 pts/0    00:00:00 ps -ef
postgres 21169 19522  0 17:39 pts/0    00:00:00 grep --color=auto postgres
[postgres@dev-middleware ~]$

10 修改环境变量

修改postgres用户的~/.bash_profile,PATH添加/data/postgres/10.4/bin,同时,添加PGDATA=/data/postgres/10.4/data环境变量。

PATH=$PATH:$HOME/.local/bin:$HOME/bin:/data/postgres/10.4/bin

export PATH

PGDATA=/data/postgres/10.4/data
export PGDATA

使其环境变量生效:

[postgres@dev-middleware ~]$ source ~/.bash_profile 
[postgres@dev-middleware ~]$

11 修改数据库参数

a 修改pg_hba.conf文件

在/data/postgres/10.4/data/pg_hba.conf文件中,添加下面的一行:

# IPv4 local connections:
host    all             all             0.0.0.0/0               md5

其,表示当前数据库服务器,对使用IPV4地址的客户端机器,允许其通过任意IP地址,通过任意数据库用户发起对该数据库服务器上的任意数据库的访问,但是必须得提供数据库用户密码。其认证方式,是通过MD5加密认证的。

一句话简介:对于使用IPV4地址的客户端,数据库服务器不做限制,可以通过任意用户访问所有数据库。

b 修改postgresql.conf 文件

在/data/postgres/10.4/data/postgresql.conf 文件中,修改#listen_addresses = ‘localhost’为#listen_addresses = ‘*’。

然后,reload使其生效。

原因是,如果不修改的话,则客户端只能通过数据库服务器的环路地址127.0.0.1来访问数据库,不能使用服务器的公网IP地址来访问。

如下,修改之前:

[postgres@dev-middleware ~]$ psql
psql (10.4)
Type "help" for help.

postgres=# show listen_addresses ;
 listen_addresses 
------------------
 localhost
(1 row)

postgres=# \q
[postgres@dev-middleware ~]$ psql -h 127.0.0.1 -p 5432 -d postgres
psql (10.4)
Type "help" for help.

postgres=# show listen_addresses ;
 listen_addresses 
------------------
 localhost
(1 row)

postgres=# \q
[postgres@dev-middleware ~]$ psql -h 172.16.11.44 -p 5432 -d postgres
psql: could not connect to server: 拒绝连接
        Is the server running on host "172.16.11.44" and accepting
        TCP/IP connections on port 5432?
[postgres@dev-middleware ~]$

修改listen_addresses = ‘*’,且重启数据库使其生效之后:

[postgres@dev-middleware ~]$ vi /data/postgres/10.4/data/postgresql.conf 
...
...

listen_addresses = '*'         # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
#port = 5432                            # (change requires restart)
max_connections = 100                   # (change requires restart)
#superuser_reserved_connections = 3     # (change requires restart)
#unix_socket_directories = '/tmp'       # comma-separated list of directories
                                        # (change requires restart)
#unix_socket_group = ''                 # (change requires restart)
#unix_socket_permissions = 0777         # begin with 0 to use octal notation
                                        # (change requires restart)
"/data/postgres/10.4/data/postgresql.conf" 658L, 22755C written
[postgres@dev-middleware ~]$ pg_ctl restart -m fast
waiting for server to shut down....2018-08-06 18:31:40.294 CST [12290] LOG:  received fast shutdown request
2018-08-06 18:31:40.306 CST [12290] LOG:  aborting any active transactions
2018-08-06 18:31:40.307 CST [12290] LOG:  worker process: logical replication launcher (PID 12297) exited with exit code 1
2018-08-06 18:31:40.314 CST [12292] LOG:  shutting down
2018-08-06 18:31:40.347 CST [12290] LOG:  database system is shut down
 done
server stopped
waiting for server to start....2018-08-06 18:31:40.427 CST [14493] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2018-08-06 18:31:40.427 CST [14493] LOG:  listening on IPv6 address "::", port 5432
2018-08-06 18:31:40.433 CST [14493] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-08-06 18:31:40.487 CST [14494] LOG:  database system was shut down at 2018-08-06 18:31:40 CST
2018-08-06 18:31:40.498 CST [14493] LOG:  database system is ready to accept connections
 done
server started
[postgres@dev-middleware ~]$

可以通过数据库服务器的实际IP地址,访问数据库,

[postgres@dev-middleware ~]$ psql 
psql (10.4)
Type "help" for help.

postgres=# show listen_addresses ;
 listen_addresses 
------------------
 *
(1 row)

postgres=# \q
[postgres@dev-middleware ~]$ psql -h 127.0.0.1 -p 5432 -d postgres
psql (10.4)
Type "help" for help.

postgres=# show listen_addresses ;
 listen_addresses 
------------------
 *
(1 row)

postgres=# \q
[postgres@dev-middleware ~]$ psql -h 172.16.11.44 -p 5432 -d postgres -U postgres
Password for user postgres: 
psql (10.4)
Type "help" for help.

postgres=# show listen_addresses ;
 listen_addresses 
------------------
 *
(1 row)

postgres=# \q
[postgres@dev-middleware ~]$

12 删除PostgreSQL数据库软件

postgres用户可以进入到之前解压的源码所在的路径,通过执行make uninstall来卸载已经安装的数据库软件。如下:

[postgres@dev-middleware ~]$ pwd
/home/postgres
[postgres@dev-middleware ~]$ ll
总用量 26036
-rw------- 1 postgres postgres      939 8月   6 18:29 pg.log
drwxrwxr-x 6 postgres postgres     4096 8月   6 11:29 postgresql-10.4
-rw-rw-r-- 1 postgres postgres 26652442 5月   8 05:06 postgresql-10.4.tar.gz
[postgres@dev-middleware ~]$ make uninstall
make: *** 没有规则可以创建目标“uninstall”。 停止。
[postgres@dev-middleware ~]$ cd postgresql-10.4
[postgres@dev-middleware postgresql-10.4]$ make uninstall 
...
...
make[1]: 进入目录“/home/postgres/postgresql-10.4/config”
rm -f '/data/postgres/10.4/lib/pgxs/config/install-sh'
rm -f '/data/postgres/10.4/lib/pgxs/config/missing'
make[1]: 离开目录“/home/postgres/postgresql-10.4/config”
[postgres@dev-middleware postgresql-10.4]$

13 重新编译安装PostgreSQL数据库软件

可以通过前面第4步的,重新解压源码,用新解压出来的源码文件,进行configure,make world,make install-world。

还有一种方式,就是把之前第5步骤执行configure之后的文件,恢复到configure之前的状态,然后通过执行configure,make world,make install-world。这个命令是make distclean。

如下是第二种方式的记录:

[postgres@dev-middleware postgresql-10.4]$ pwd
/home/postgres/postgresql-10.4
[postgres@dev-middleware postgresql-10.4]$ ll
总用量 1088
-rw-r--r--  1 postgres postgres    457 5月   8 04:51 aclocal.m4
drwxrwxr-x  2 postgres postgres   4096 5月   8 05:03 config
-rw-rw-r--  1 postgres postgres 373348 8月   6 11:29 config.log
-rwxrwxr-x  1 postgres postgres  39415 8月   6 11:29 config.status
-rwxr-xr-x  1 postgres postgres 498962 5月   8 04:51 configure
-rw-r--r--  1 postgres postgres  76909 5月   8 04:51 configure.in
drwxrwxr-x 55 postgres postgres   4096 5月   8 05:03 contrib
-rw-r--r--  1 postgres postgres   1192 5月   8 04:51 COPYRIGHT
drwxrwxr-x  3 postgres postgres   4096 5月   8 05:03 doc
-rw-rw-r--  1 postgres postgres   3638 8月   6 11:29 GNUmakefile
-rw-r--r--  1 postgres postgres   3638 5月   8 04:51 GNUmakefile.in
-rw-r--r--  1 postgres postgres    284 5月   8 04:51 HISTORY
-rw-r--r--  1 postgres postgres  71584 5月   8 05:06 INSTALL
-rw-r--r--  1 postgres postgres   1682 5月   8 04:51 Makefile
-rw-r--r--  1 postgres postgres   1212 5月   8 04:51 README
drwxrwxr-x 16 postgres postgres   4096 8月   6 11:29 src
[postgres@dev-middleware postgresql-10.4]$ make distclean
...
...
make[1]: 离开目录“/home/postgres/postgresql-10.4/src”
rm -rf tmp_install/
rm -f config.cache config.log config.status GNUmakefile
[postgres@dev-middleware postgresql-10.4]$ ll
总用量 676
-rw-r--r--  1 postgres postgres    457 5月   8 04:51 aclocal.m4
drwxrwxr-x  2 postgres postgres   4096 5月   8 05:03 config
-rwxr-xr-x  1 postgres postgres 498962 5月   8 04:51 configure
-rw-r--r--  1 postgres postgres  76909 5月   8 04:51 configure.in
drwxrwxr-x 55 postgres postgres   4096 5月   8 05:03 contrib
-rw-r--r--  1 postgres postgres   1192 5月   8 04:51 COPYRIGHT
drwxrwxr-x  3 postgres postgres   4096 5月   8 05:03 doc
-rw-r--r--  1 postgres postgres   3638 5月   8 04:51 GNUmakefile.in
-rw-r--r--  1 postgres postgres    284 5月   8 04:51 HISTORY
-rw-r--r--  1 postgres postgres  71584 5月   8 05:06 INSTALL
-rw-r--r--  1 postgres postgres   1682 5月   8 04:51 Makefile
-rw-r--r--  1 postgres postgres   1212 5月   8 04:51 README
drwxrwxr-x 16 postgres postgres   4096 8月   6 19:10 src
[postgres@dev-middleware postgresql-10.4]$

三 小结

本文用于记录如何在Linux平台上,通过编译源码的方式,来安装配置PostgreSQL数据库,作一个简单记录。更为详细的信息可以参考官方文档

mongodb connection refused的可能原因

一 OS环境:

 
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 
[root@localhost ~]# uname -rm
3.10.0-862.3.2.el7.x86_64 x86_64
[root@localhost ~]#

二 mongo信息:

MongoDB server version: 4.0.0

且防火墙,SELINUX全关闭的情况下,访问mongodb报错。

三 错误信息如下:

[root@localhost ~]# mongo 172.16.143.250:27017/admin
MongoDB shell version v4.0.0
connecting to: mongodb://172.16.143.250:27017/admin
2018-07-20T09:30:52.691+0800 E QUERY    [js] Error: couldn't connect to server 172.16.143.250:27017, connection attempt failed: SocketException: Error connecting to 172.16.143.250:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:6
exception: connect failed
[root@localhost ~]#

且在服务器本地及其它机器上telnet该机器的mongodb默认端口27017都不通:

$ telnet 172.16.143.250 27017
Trying 172.16.143.250...
telnet: Unable to connect to remote host: Connection refused

四 原因及解决办法:

如果服务器上有多个IP地址的话,需要将mongodb的服务绑定到多个IP地址,在配置文件里加上:

bind_ip_all=on

当该参数配置为bind_ip_all=all时,启动mongodb会报出下述错误:

 Error parsing INI config file: the argument ('all') for option 'bind_ip_all' is invalid. Valid choices are 'on|off', 'yes|no', '1|0' and 'true|false'​

重启mongodb,然后访问正常。

该机器172.16.143.250的IP地址如下:

[root@localhost mongodata]# ifconfig 
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.10.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::42:2cff:fef4:41cb  prefixlen 64  scopeid 0x20
        ether 02:42:2c:f4:41:cb  txqueuelen 0  (Ethernet)
        RX packets 15725  bytes 857602 (837.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20945  bytes 2675605 (2.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s31f6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.143.250  netmask 255.255.255.0  broadcast 172.16.143.255
        inet6 fe80::3bf1:41bf:bb41:1a74  prefixlen 64  scopeid 0x20
        ether 2c:fd:a1:be:65:d3  txqueuelen 1000  (Ethernet)
        RX packets 83706  bytes 17618724 (16.8 MiB)
        RX errors 0  dropped 102  overruns 0  frame 0
        TX packets 30918  bytes 6621448 (6.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xdf200000-df220000  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 332117  bytes 34574785 (32.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 332117  bytes 34574785 (32.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth6d13f43: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::38d0:60ff:fe0d:1867  prefixlen 64  scopeid 0x20
        ether 3a:d0:60:0d:18:67  txqueuelen 0  (Ethernet)
        RX packets 4059  bytes 273076 (266.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5426  bytes 580733 (567.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth8741a2f: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::14d2:d7ff:fe8e:75f7  prefixlen 64  scopeid 0x20
        ether 16:d2:d7:8e:75:f7  txqueuelen 0  (Ethernet)
        RX packets 3401  bytes 245833 (240.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4769  bytes 871463 (851.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vetha20a789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::e829:79ff:fea8:558d  prefixlen 64  scopeid 0x20
        ether ea:29:79:a8:55:8d  txqueuelen 0  (Ethernet)
        RX packets 4058  bytes 273390 (266.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5423  bytes 637072 (622.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vetha26bafa: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::58cc:3aff:fec9:ad8e  prefixlen 64  scopeid 0x20
        ether 5a:cc:3a:c9:ad:8e  txqueuelen 0  (Ethernet)
        RX packets 4063  bytes 273661 (267.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5430  bytes 594535 (580.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@localhost mongodata]#

配置scp,ssh,rsync到另外一台主机无需密码的另外一种方式

配置scp,ssh,rsync到另外一台主机无需密码的另外一种方式

一 场景需求:

一台AIX主机,一台Linux主机。
需用从AIX主机上通过oracle用户SCP传输文件到Linux主机,且要求SCP传输文件的过程中,要脚本自动化处理,无需人工干预,即免去手动键入密码的环节。

可是AIX上没有ssh-copy-id的可用命令



# ssh-copy-id   
ksh: ssh-copy-id:  not found
# uname -M
IBM,8202-E4B
# uname -n
usp720
# uname -a
AIX usp720 1 6 00F67A854C00
# oslevel 
6.1.0.0
# 

二 解决方法:

1 AIX上,Oracle用户执行,ssh-keygen -t rsa:

ssh-keygen -t rsa
执行过程中,一路Enter回车即可。

2 AIX上,Oracle用户执行,cat ~/.ssh/id_rsa.pub文件内容:

3 Linux上,Oracle用户执行,把步骤2中的文件内容,写入~/.ssh/authorized_keys:

同时,修改~/.ssh/authorized_keys的文件权限为700。

 
chmod 700~/.ssh/authorized_keys

4 AIX上,执行:

 
bash-4.3$ hostname
usp720
bash-4.3$ ssh oracle@172.18.1.12
Last login: Wed Mar 14 16:41:42 2018 from 172.18.1.1
localhost-> hostname
localhost.localdomain
localhost-> 

 

三 小记:

早在2008年的时候,在学习Oracle 10gR2 RAC在CentOS 4.8上安装部署时,配置RAC双节点,Oracle用户的互信,就采用的这种方法。10年前学的知识点被唤醒了。

在CentOS 7上安装redis

1 下载:从Redis官方网站下载安装最新的稳定版安装包;

[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 
[root@localhost ~]# uname -rm
3.10.0-327.el7.x86_64 x86_64
[root@localhost ~]# wget http://download.redis.io/releases/redis-4.0.2.tar.gz
--2017-11-22 14:32:03--  http://download.redis.io/releases/redis-4.0.2.tar.gz
正在解析主机 download.redis.io (download.redis.io)... 109.74.203.151
正在连接 download.redis.io (download.redis.io)|109.74.203.151|:80... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:1713990 (1.6M) [application/x-gzip]
正在保存至: “redis-4.0.2.tar.gz”

100%[===================================================================================================================================>] 1,713,990    890KB/s 用时 1.9s   

2017-11-22 14:32:09 (890 KB/s) - 已保存 “redis-4.0.2.tar.gz” [1713990/1713990])

[root@localhost ~]#

2 解压:解压安装包;

[root@localhost ~]# tar -zxvf redis-4.0.2.tar.gz 
...
...
redis-4.0.2/utils/releasetools/02_upload_tarball.sh
redis-4.0.2/utils/releasetools/03_test_release.sh
redis-4.0.2/utils/releasetools/04_release_hash.sh
redis-4.0.2/utils/releasetools/changelog.tcl
redis-4.0.2/utils/speed-regression.tcl
redis-4.0.2/utils/whatisdoing.sh
[root@localhost ~]#

3 编译:进入安装包,执行make;

 [root@localhost ~]# cd redis-4.0.2
[root@localhost redis-4.0.2]# pwd
/root/redis-4.0.2
[root@localhost redis-4.0.2]# ll
总用量 280
-rw-rw-r--  1 root root 131381 9月  21 22:12 00-RELEASENOTES
-rw-rw-r--  1 root root     53 9月  21 22:12 BUGS
-rw-rw-r--  1 root root   1815 9月  21 22:12 CONTRIBUTING
-rw-rw-r--  1 root root   1487 9月  21 22:12 COPYING
drwxrwxr-x  6 root root   4096 9月  21 22:12 deps
-rw-rw-r--  1 root root     11 9月  21 22:12 INSTALL
-rw-rw-r--  1 root root    151 9月  21 22:12 Makefile
-rw-rw-r--  1 root root   4223 9月  21 22:12 MANIFESTO
-rw-rw-r--  1 root root  20530 9月  21 22:12 README.md
-rw-rw-r--  1 root root  57764 9月  21 22:12 redis.conf
-rwxrwxr-x  1 root root    271 9月  21 22:12 runtest
-rwxrwxr-x  1 root root    280 9月  21 22:12 runtest-cluster
-rwxrwxr-x  1 root root    281 9月  21 22:12 runtest-sentinel
-rw-rw-r--  1 root root   7606 9月  21 22:12 sentinel.conf
drwxrwxr-x  3 root root   4096 9月  21 22:12 src
drwxrwxr-x 10 root root   4096 9月  21 22:12 tests
drwxrwxr-x  8 root root   4096 9月  21 22:12 utils
[root@localhost redis-4.0.2]# make 
cd src && make all
make[1]: 进入目录“/root/redis-4.0.2/src”
    CC Makefile.dep
make[1]: 离开目录“/root/redis-4.0.2/src”
make[1]: 进入目录“/root/redis-4.0.2/src”
...
...
    INSTALL redis-check-rdb
    INSTALL redis-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: 离开目录“/root/redis-4.0.2/src”
[root@localhost redis-4.0.2]#

4 启动Redis;

 [root@localhost redis-4.0.2]# /root/redis-4.0.2/src/redis-server 
1745:C 22 Nov 14:39:22.049 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1745:C 22 Nov 14:39:22.049 # Redis version=4.0.2, bits=64, commit=00000000, modified=0, pid=1745, just started
1745:C 22 Nov 14:39:22.049 # Warning: no config file specified, using the default config. In order to specify a config file use /root/redis-4.0.2/src/redis-server /path/to/redis.conf
1745:M 22 Nov 14:39:22.051 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 4.0.2 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1745
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1745:M 22 Nov 14:39:22.052 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1745:M 22 Nov 14:39:22.052 # Server initialized
1745:M 22 Nov 14:39:22.052 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1745:M 22 Nov 14:39:22.052 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1745:M 22 Nov 14:39:22.052 * Ready to accept connections

 

 

至此,Redis已经安装完毕。

5 通过redis-cli,访问和使用Redis:

 [root@localhost ~]# /root/redis-4.0.2/src/redis-cli 
127.0.0.1:6379> info
# Server
redis_version:4.0.2
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:1e096713e5234ce6
redis_mode:standalone
os:Linux 3.10.0-327.el7.x86_64 x86_64
...
....
127.0.0.1:6379> set a testredis
OK
127.0.0.1:6379> get a
"testredis"
127.0.0.1:6379>

 

接着看文档,继续动手实践。

zookeeper启动之后又异常死掉的处理

通过in-place的方式把一台CentOS 6.5的Linux机器升级到Centos 7.2之后,发现该机器上之前运行的zookeeper启动异常,记录如下:

 
[root@zkserver1 bin]# sh zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@zkserver1 bin]# ps -ef|grep zook
root       891   843  0 14:10 pts/0    00:00:00 grep --color=auto zook
[root@zkserver1 bin]# cat zookeeper.out 
2017-06-22 14:10:04,154 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/local/zookeeper-3.4.5/bin/../conf/zoo.cfg
2017-06-22 14:10:04,166 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-06-22 14:10:04,167 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-06-22 14:10:04,168 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-06-22 14:10:04,168 [myid:] - WARN  [main:QuorumPeerMain@113] - Either no config or no quorum defined in config, running  in standalone mode
2017-06-22 14:10:04,190 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/local/zookeeper-3.4.5/bin/../conf/zoo.cfg
2017-06-22 14:10:04,191 [myid:] - INFO  [main:ZooKeeperServerMain@95] - Starting server
2017-06-22 14:10:04,206 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2017-06-22 14:10:04,207 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=zkserver1
2017-06-22 14:10:04,207 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.7.0_21
2017-06-22 14:10:04,208 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
2017-06-22 14:10:04,209 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.7.0_21/jre
2017-06-22 14:10:04,209 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/local/zookeeper-3.4.5/bin/../build/classes:/usr/local/zookeeper-3.4.5/bin/../build/lib/*.jar:/usr/local/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/usr/local/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/usr/local/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/usr/local/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/usr/local/zookeeper-3.4.5/bin/../conf:
2017-06-22 14:10:04,210 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-06-22 14:10:04,211 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2017-06-22 14:10:04,212 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=
2017-06-22 14:10:04,212 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
2017-06-22 14:10:04,213 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=amd64
2017-06-22 14:10:04,214 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=3.10.0-327.el7.x86_64
2017-06-22 14:10:04,215 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
2017-06-22 14:10:04,215 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
2017-06-22 14:10:04,216 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/usr/local/zookeeper-3.4.5/bin
2017-06-22 14:10:04,227 [myid:] - INFO  [main:ZooKeeperServer@726] - tickTime set to 2000
2017-06-22 14:10:04,228 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2017-06-22 14:10:04,229 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2017-06-22 14:10:04,274 [myid:] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-06-22 14:10:04,343 [myid:] - ERROR [main:FileTxnSnapLog@210] - Parent /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers missing for /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers/consumer%3A%2F%2F172.16.11.73%2Fcom.onlyou.es.bck.service.IUserStoreService%3Fapplication%3Des-client-web%26category%3Dconsumers%26check%3Dfalse%26default.check%3Dfalse%26default.reference.filter%3DdubboConsumerFilter%26default.retries%3D0%26default.timeout%3D1000000%26dubbo%3D2.5.3%26interface%3Dcom.onlyou.es.bck.service.IUserStoreService%26methods%3DgetUserStoreByCorpIdAndStoreId%2CdeleteStoreUserById%2ClistStoreCode%2CgetUserStoreByUserId%2CsearchStoreIdsByUserId%2CsearchUserStaffList%2CsaveStoreUser%2CgetUserStoreByCorpId%2CsearchStoreInfosByUserId%26pid%3D488%26revision%3Ddev-SNAPSHOT%26side%3Dconsumer%26timestamp%3D1496999541671
2017-06-22 14:10:04,345 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:377)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /dubbo/com.onlyou.es.bck.service.IUserStoreService/consumers
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        ... 9 more
[root@zkserver1 bin]#

原因:可能是服务器升级的过程中,导致zookeeper的数据文件被”破坏”了,导致重启zookeeper时异常。

解决:删除zookeeper的数据文件/tmp/zookeeper,然后重启zookeeper。

一则PostgreSQL数据库的性能问题定位处理

一、先看一台运行PostgreSQL数据库服务器的top和iostat相关信息图:

top:
[root@db2 ~ 11:14:42]$ top

top - 11:16:10 up 2 days, 13:01,  5 users,  load average: 51.62, 41.75, 29.06
Tasks: 948 total,   5 running, 943 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  16284812k total, 16159260k used,   125552k free,     5700k buffers
Swap:  8191992k total,   107980k used,  8084012k free,  8010540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
31450 postgres  20   0 4534m 2.3g 2.2g D 46.6 15.0   7:02.50 postgres                                                
30166 postgres  20   0 4528m 2.3g 2.2g D 46.0 14.9   4:05.40 postgres                                                
30185 postgres  20   0 4494m 2.2g 2.1g D 43.5 14.1   4:05.08 postgres                                                
31453 postgres  20   0 4529m 1.9g 1.8g D 42.9 12.3   3:39.13 postgres                                                
30232 postgres  20   0 4526m 2.3g 2.1g D 40.4 14.6   6:07.51 postgres                                                
 2811 postgres  20   0 4521m 874m 742m D 39.8  5.5   1:36.34 postgres                                                
31457 postgres  20   0 4534m 1.8g 1.7g R 39.5 11.6   3:55.23 postgres                                                
30956 postgres  20   0 4521m 1.7g 1.5g D 38.8 10.8   2:14.67 postgres                                                
 2714 postgres  20   0 4519m 1.4g 1.3g D 37.9  9.1   1:19.96 postgres                                                
30182 postgres  20   0 4525m 1.9g 1.8g D 37.9 12.5   3:40.31 postgres                                                
31444 postgres  20   0 4525m 2.2g 2.0g D 37.6 13.9   3:29.11 postgres                                                
31654 postgres  20   0 4526m 2.4g 2.2g D 36.7 15.3   5:04.19 postgres                                                
 2717 postgres  20   0 4519m 847m 718m D 36.1  5.3   1:37.20 postgres                                                
  753 postgres  20   0 4533m 3.7g 3.6g D 35.4 23.9  27:52.65 postgres                                                
31451 postgres  20   0 4433m 1.8g 1.7g D 35.4 11.5   2:36.85 postgres                                                
30701 postgres  20   0 4520m 1.7g 1.6g D 35.1 11.1   2:09.85 postgres                                                
31448 postgres  20   0 4435m 2.2g 2.1g D 33.3 13.9   4:16.70 postgres                                                
29192 postgres  20   0 4526m 2.3g 2.1g D 32.6 14.6   4:19.91 postgres                                                
30693 postgres  20   0 4525m 1.9g 1.8g D 32.0 12.4   2:29.72 postgres                                                
 3448 postgres  20   0 4451m 383m 305m D 31.4  2.4   0:49.98 postgres                                                
 3568 postgres  20   0 4388m 131m 113m D 29.5  0.8   0:10.03 postgres                                                
 3435 postgres  20   0 4376m 364m 355m D 29.2  2.3   0:12.23 postgres                                                
[root@db2 ~ 11:16:22]$ 
iostat:
 
[root@db2 ~ 11:16:25]$ iostat 1 10 /dev/emcpowerf 
Linux 2.6.32-279.el6.x86_64 (db2) 	2016年06月02日 	_x86_64_	(24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.51    0.00    0.84    2.04    0.00   92.61

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf       210.87     10256.34     11724.86 2257542690 2580782824

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.07    0.00   80.00    7.12    0.00    2.82

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf     34443.00   3095056.00   1747336.00    3095056    1747336

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.38    0.00   90.25    1.25    0.00    0.13

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf        83.79      7430.09     14483.79      89384     174240

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.00    0.00   70.16   10.69    0.00    1.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2887.25    323498.04     71545.10     329968      72976

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.76    0.00   86.36    3.17    0.00    0.71

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      1996.00    222072.00     34056.00     222072      34056

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.75    0.00   89.37    2.50    0.00    0.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2344.00    229568.00    149152.00     229568     149152

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.68    0.00   82.81    5.88    0.00    1.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2585.00    242224.00    154320.00     242224     154320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.57    0.00   24.36   34.20    0.00   26.87

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2692.08    283271.29    137940.59     286104     139320

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.80    0.00   85.10    2.02    0.00    1.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      3367.33    268491.09    124879.21     271176     126128

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.77    0.00   85.36    4.32    0.00    0.54

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
emcpowerf      2792.00    274976.00    143192.00     274976     143192

[root@db2 ~ 11:23:07]$
以及,文件系统结构图:
[root@db2 ~ 11:03:54]$ df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext4     58G   14G   42G  25% /
tmpfs        tmpfs    7.8G     0  7.8G   0% /dev/shm
/dev/sda3     ext4    210G   54G  146G  27% /data/log
/dev/emcpowerd1
              ext4     99G   58G   37G  62% /data/pgsql/backup
/dev/emcpowerc1
              ext4     99G   89G  5.5G  95% /data/pgsql/data_old
/dev/emcpowerf
              ext4    197G  146G   42G  78% /data/pgsql/data
/dev/emcpowerg
              ext4    493G   66G  403G  14% /data/pgsql/backup/archive
[root@db2 ~ 11:04:14]$

二 结论:数据库服务器已经出现性能瓶颈,具体问题出在系统IO上。通常情况是,应用系统的SQL写的不够好,或者是糟糕的SQL执行计划导致消耗过高的IO资源。

三 分析思路

1 从top上,看到系统负载已经比较高:load average: 51.62, 41.75, 29.06
2 CPU负载上看,Cpu(s):  9.0%us, 59.2%sy,  0.0%ni, 14.1%id, 17.2%wa,  0.0%hi,  0.5%si,  0.0%st 。

其中, 9.0%us用于user space process处理较为正常。但是在系统负载较高的情况下,且有59.2%sy的CPU资源用于system kernel running。The amount of time spent in the kernel should be as low as possible.This number can peak much higher, especially when there is a lot of I/O happening.这里,意味着CPU消耗在用于系统调度上的时间偏高了,尤其是当系统出现大量IO操作时,该值可能非常高。这里的59%略高,系统有IO等待操作。

另,14.1%id, 系统CPU闲置时间比较低,通常情况下我们希望它不要太低,太低了说明CPU太忙,太高也不好,太高了意味着CPU一直处于空闲状态。17.2%wa,CPU用于等待IO操作完成的时间占比。这里,CPU在等待IO完成。

3 进程状态列,有大量的进程处于D状态。进程状态通常有:

R:TASK_RUNNING,可执行状态。处于此状态的进程正在CPU上运行;同一时间,少量进程处于该状态;

S:TASK_INTERRUPTIBLE,可中断睡眠状态。处于该状态的进程因为正在等待某些事件发生(如等待socket连接,等待信号量),而事件还未发生,所以被挂起;同一时间,大量进程处于该状态

D:TASK_UNINTERRUPTIBLE,不可中断睡眠状态。该状态的进程同样处于休眠状态,但是该进程不能被其它事件或信号所打断。常见于,在进程对硬件设备进行操作时(如进程调用read系统来对某个设备文件进行读操作,read系统需要调用到最终执行对应设备的驱动代码,并且与对应设备进行交互时),可能需要使用TASK_UNINTERRUPTIBLE状态来保护进程,以避免进程在与设备进行交互的过程中被打断,最终造成设备陷入不可控的状态。

结合本例,可以看到有大量进程处于D状态,即不可中断的睡眠状态。说明有大量的数据库进程在与操作系统的IO系统进行交互,且这些进程不能被中断。说白了,就是进程在进行的IO操作迟迟未完成,且该进程不能被打断。就是系统出现了大量的IO等待。

4 从iostat上,也可以看到用于存放数据库的设备上IO操作频繁,该设备的IO请求很高。

这是去年今日(2016年6月2日)处理的一则PostgreSQL数据库性能问题定位的case,不是特意要放在今天(2017年6月2日)来说。而是,近期在看几篇Linux文档,加上之前翻看《PostgreSQL 9 Administration Cookbook》这本书,P249里面提到:

Not enough CPU power or disk I/O capacity for the current load

These are usually caused by suboptimal query plans, but sometimes you just have not a powerful enough computer.

Here, top is your friend for quick checks, and from the command line, run the following:

user@host:~$ top

First, watch the CPU idle reading in the top. If this is in low single-digits for most of the time, you probably have problems with CPU power.

If you have a high load average with still lots of CPU idle left, you are probably out of disk bandwidth. In this case, you should also have lots of postgres processes in status D.

 

索性,做一个整理总结,便于理清自己处理性能问题时,能有一个个清晰的思路和线索。

四 参考:

1 Understanding Linux CPU stats
2  top进程状态详解R,S,D  

Linux环境设置zookeeper、memcached、activemq、dubbo自动启动

一 zookeeper

1 修改zkServer.sh(/usr/local/zookeeper-3.4.5/bin/zkServer.sh)文件的第一行:

#!/usr/bin/env bash

为:

#!/bin/bash

# description: Zookeeper Start Stop Restart

# processname: zookeeper

# chkconfig: 244 30 80

 

2 修改:

# use POSTIX interface, symlink is followed automatically

ZOOBIN=”${BASH_SOURCE-$0}”

ZOOBIN=`dirname ${ZOOBIN}`

ZOOBINDIR=`cd ${ZOOBIN}; pwd`

 

if [ -e “$ZOOBIN/../libexec/zkEnv.sh” ]; then

# use POSTIX interface, symlink is followed automatically

ZOOSH=`readlink $0`

ZOOBIN=`dirname $ZOOSH`

ZOOBINDIR=`cd $ZOOBIN; pwd`

ZOO_LOG_DIR=`echo $ZOOBIN`

 

if [ -e “$ZOOBIN/../libexec/zkEnv.sh” ]; then

 

3 创建软链接:

[root@dev-middleware ~]# ln -s /usr/local/zookeeper-3.4.5/bin/zkServer.sh  /etc/init.d/zookeeper

 

4 添加自动启动

[root@dev-middleware ~]# chkconfig –list zookeeper

zookeeper 服务支持 chkconfig,但它在任何级别中都没有被引用(运行“chkconfig –add zookeeper”)

[root@dev-middleware ~]# chkconfig –add zookeeper

[root@dev-middleware ~]# chkconfig –level 2345 zookeeper on

[root@dev-middleware ~]# chkconfig –list zookeeper

zookeeper       0:关闭  1:关闭  2:启用  3:启用  4:启用  5:启用  6:关闭

[root@dev-middleware ~]#

 

5 参考

http://positivealex.github.io/blog/posts/how-to-install-zookeeper-as-service-on-centos/

二 memcached

[root@dev-middleware ~]# chkconfig –list memcached

memcached 服务支持 chkconfig,但它在任何级别中都没有被引用(运行“chkconfig –add memcached”)

[root@dev-middleware ~]# chkconfig –add memcached

[root@dev-middleware ~]# chkconfig –level 2345 memcached on

[root@dev-middleware ~]# chkconfig –list memcached

memcached       0:关闭  1:关闭  2:启用  3:启用  4:启用  5:启用  6:关闭

[root@dev-middleware ~]#

三 activemq

1 修改文件activemq (/usr/local/apache-activemq-5.9.0/bin/activemq),第1行:

#!/bin/sh

为:

#!/bin/sh

### BEGIN INIT INFO

# Provides:             activemq

# Required-Start:       $remote_fs $syslog

# Required-Stop:        $remote_fs $syslog

# Default-Start:        2 3 4 5

# Default-Stop:         0 6

# Short-Description:    ActiveMQ server

### END INIT INFO

 

2 创建软链接:

[root@dev-middleware bin]# ln -s /usr/local/apache-activemq-5.9.0/bin/activemq /etc/init.d/activemq

[root@dev-middleware bin]#

 

3 添加自动启动

[root@dev-middleware bin]# chkconfig –list activemq

activemq 服务支持 chkconfig,但它在任何级别中都没有被引用(运行“chkconfig –add activemq”)

[root@dev-middleware bin]# chkconfig –add activemq

[root@dev-middleware bin]# chkconfig –level 2345 activemq on

[root@dev-middleware bin]# chkconfig –list activemq

activemq        0:关闭  1:关闭  2:启用  3:启用  4:启用  5:启用  6:关闭

[root@dev-middleware bin]#

 

4 参考

http://www.liaoxuefeng.com/article/0013738918072162b1c2a36eb0f40e690d3902acf60c8fb000

四 dubbo

 

[root@dev-middleware ~]# ll /etc/init.d/tomcat-dubbo-admin

-rwxr-xr-x 1 root root 1969 9月   8 13:49 /etc/init.d/tomcat-dubbo-admin

[root@dev-middleware ~]# /etc/init.d/tomcat-dubbo-admin

start/stop

[root@dev-middleware ~]# chkconfig –list tomcat-dubbo-admin

tomcat-dubbo-admin 服务支持 chkconfig,但它在任何级别中都没有被引用(运行“chkconfig –add tomcat-dubbo-admin”)

[root@dev-middleware ~]# chkconfig –add tomcat-dubbo-admin

[root@dev-middleware ~]# chkconfig –level 2345 tomcat-dubbo-admin on

[root@dev-middleware ~]# chkconfig –list tomcat-dubbo-admin

tomcat-dubbo-admin      0:关闭  1:关闭  2:启用  3:启用  4:启用  5:启用  6:关闭

[root@dev-middleware ~]#

Linux环境配置SFTP小结

说明:简明扼要的记录在Linux环境上配置SFTP的注意事项,主要是自己在该问题的小细节上犯过几次同样的错误,我必须要动手把它记录、写下来,自己还要时不时的翻看一下,避免在同样的问题上连续跌倒(*^__^*)

一 配置步骤

1.创建SFTP用户组

groupadd sftp

2.添加用户并设置为sftp

 useradd -g sftp -s /sbin/nologin -M  sftpUser

其中:(/sbin/nologin为禁止登录shell的用户,即该用户sftpUser无法获取shell,也就限制其登录主机)

3.设置用户密码

passwd sftpUser

4.创建用户目录,并设置权限

cd /home

mkdir sftpUserDir

chown root:sftp sftpUserDir

chmod 755 sftpUserDir

这里:设置sftp用户sftpUser将来通过SFTP来上传、下载的目录为/home/sftpUserDir。同时,设置该目录目录/home/sftpUserDir的所属用户和组为root:sftp,且其权限务必为755,否则SFTP会报错!

5.修改SSH配置

修改/etc/ssh/sshd_config配置文件

修改Subsystem为:

Subsystem sftp internal-sftp

6.sshd_config添加用户配置

Match User sftpUser  #配置允许SFTP操作的用户名为sftpUser

Match Group sftp   #配置允许SFTP操作的用户组为sftp

X11Forwarding no

AllowTcpForwarding no

ForceCommand internal-sftp

ChrootDirectory /home/sftpUserDir  #配置用户的根目录

说明:Match User sftpUser和Match Group sftp的配置,可以有选择性的配置,也可以组合配置。中间3行为允许SFTP转发的配置。

7.最后重启SSH

/etc/init.d/ssh restart

二 验证测试

在其它的机器上执行:

sftp sftpUser@SFTP_server_IP

三 小结

1  ChrootDirectory 的所有父目录的权限最高只能是755 ,否则会抛出下述错误!

Read from remote host 172.16.xx.xxx: Connection reset by peer

Couldn’t read packet: Connection reset by peer

2 SFTP Permission denied处理

com.jcraft.jsch.SftpException: Permission denied

问题的原因是,SFTP服务器的SELINUX没关闭。

解决办法:

查看SELINUX的配置:getenforce

关闭SELINUX:setenforce 0

 勿以善小而不为,不要以为不起眼的小问题就不够引起你对它的重视。

How to solve No termcap nor curses library found

Linux平台下,想在Oracle数据库的SQL*PLUS的命令行提示符下,利用键盘上的上下箭头,调出历史执行的SQL命令,可以考虑采用rlwrap这一开源工具

现,简单记录一下安装过程(Red Hat Enterprise Linux as 5.4环境)。以前在4版本的操作系统上安装时很顺利,就不再记录。

①解压并安装rlwrap工具:

[root@server1 ~]# tar -zxvf rlwrap-0.30.tar.gz

.......

.......

[root@server1 ~]# cd rlwrap-0.30 && ll

total 464

-rw-rw-r-- 1 hekan hekan  32266 Jan  8  2008 aclocal.m4

-rw-r--r-- 1 hekan hekan    638 Aug 15  2006 AUTHORS

-rw-r--r-- 1 hekan hekan   4740 Nov 10  2007 BUGS

-rw-r--r-- 1 hekan hekan    105 Aug 10  2003 ChangeLog

drwxrwxrwx 2 hekan hekan   4096 Jan  8  2008 completions

-rw-rw-r-- 1 hekan hekan   5489 Jan  8  2008 config.h.in

-rwxr-xr-x 1 hekan hekan 217282 Jan  8  2008 configure

-rw-r--r-- 1 hekan hekan   9860 Jan  8  2008 configure.ac

-rw-r--r-- 1 hekan hekan  17992 Aug  8  2006 COPYING

drwxrwxrwx 2 hekan hekan   4096 Jan  8  2008 distribution

drwxrwxrwx 2 hekan hekan   4096 Jan  8  2008 doc

-rw-r--r-- 1 hekan hekan   1927 Nov 10  2007 INSTALL

-rw-r--r-- 1 hekan hekan    436 Nov 10  2007 Makefile.am

-rw-rw-r-- 1 hekan hekan  22987 Jan  8  2008 Makefile.in

-rw-r--r-- 1 hekan hekan  11187 Jan  3  2008 NEWS

-rw-r--r-- 1 hekan hekan   2745 Nov 10  2007 README

drwxrwxrwx 2 hekan hekan   4096 Jan  8  2008 src

drwxrwxrwx 2 hekan hekan   4096 Jan  8  2008 test

-rw-r--r-- 1 hekan hekan    480 Nov 10  2007 TODO

drwxrwxrwx 2 hekan hekan   4096 Jan  8  2008 tools

[root@server1 rlwrap-0.30]# ./configure

......

checking for tgetent... no

checking for tgetent in -lcurses... no

checking for tgetent in -lncurses... no

checking for tgetent in -ltermcap... no

configure: WARNING: No termcap nor curses library found

checking for readline in -lreadline... no

configure: error:

You need the GNU readline library(ftp://ftp.gnu.org/gnu/readline/ ) to build

this program!

[root@server1 rlwrap-0.30]#

报错!提示需要安装readline-devel-5.1-1.1.i386.rpm包。
② 安装readline-devel-5.1-1.1.i386.rpm 包:

[root@server1 ~]# rpm -ivh readline-devel-5.1-1.1.i386.rpm

warning: readline-devel-5.1-1.1.i386.rpm: Header V3 DSA signature: NOKEY, key ID 37017186

error: Failed dependencies:

libtermcap-devel is needed by readline-devel-5.1-1.1.i386

[root@server1 ~]#

又报错提示缺失libtermcap-devel包!

继续安装:

[root@server1 ~]# rpm -ivh libtermcap-devel-2.0.8-46.1.i386.rpm

warning: libtermcap-devel-2.0.8-46.1.i386.rpm: Header V3 DSA signature: NOKEY, key ID 37017186

Preparing...                ########################################### [100%]

1:libtermcap-devel       ########################################### [100%]

[root@server1 ~]# rpm -ivh readline-devel-5.1-1.1.i386.rpm

warning: readline-devel-5.1-1.1.i386.rpm: Header V3 DSA signature: NOKEY, key ID 37017186

Preparing...                ########################################### [100%]

1:readline-devel         ########################################### [100%]

[root@server1 ~]# cd rlwrap-0.30

[root@server1 rlwrap-0.30]# ./configure

checking build system type... i686-pc-linux-gnu

checking host system type... i686-pc-linux-gnu

checking for a BSD-compatible install... /usr/bin/install -c

checking whether build environment is sane... yes

checking for a thread-safe mkdir -p... /bin/mkdir -p

checking for gawk... gawk

checking whether make sets $(MAKE)... yes

............

............

configure: creating ./config.status

config.status: creating Makefile

config.status: creating doc/Makefile

config.status: creating src/Makefile

config.status: creating doc/rlwrap.man

config.status: creating distribution/rlwrap.spec

config.status: creating config.h

config.status: executing depfiles commands

Now do:

make (or gmake)  to build rlwrap

make check       for instructions how to test it

make install     to install it

[root@server1 rlwrap-0.30]#

③ 根据执行执行,make,make install:

[root@server1 rlwrap-0.30]# make

make  all-recursive

......

[root@server1 rlwrap-0.30]# make install

Making install in doc

......

make[2]: Leaving directory `/root/rlwrap-0.30'

make[1]: Leaving directory `/root/rlwrap-0.30'

[root@server1 rlwrap-0.30]#

④ 修改oracle 用户的配置文件,

[oracle@server1 ~]$ pwd

/home/oracle

[oracle@server1 ~]$ vi .bash_profile

在其中,添加类似配置信息:

alias sqlplus='rlwrap sqlplus /nolog'

⑤至此,oracle在SHELL提示符下,只需要执行sqlplus就可以进入SQL*PLUS的命令行提示符,并且可以利用键盘的上下箭头调出之前执行过的历史命令。

这是在RHEL 5.4版本的操作系统上安装时的简单记录,而之前没遇到过类似的问题,故简单记录之。

即先安装libtermcap-devel-2.0.8-46.1.i386.rpm

然后安装 readline-devel-5.1-1.1.i386.rpm

最后安装rlwrap工具包。