March 26, 2014
 

heartbeat – Messaging and membership subsystem for High-Availability Linux
heartbeat-devel – Heartbeat development package
heartbeat-libs – Heartbeat libraries

Set the hostname and the hosts entries
[root@localhost ~]# vi /etc/sysconfig/network
NETWORKING=yes
#HOSTNAME=localhost.localdomain
HOSTNAME=ha01

[root@localhost ~]# vi /etc/hosts
192.168.2.217 ha01
192.168.2.218 ha02

[root@localhost ~]# init 6

[root@ha01 ~]# yum install httpd mysql-server
[root@ha01 ~]# yum install epel-release-6-8.noarch.rpm
[root@ha01 ~]# yum install heartbeat

Review the documentation and copy the sample configuration files
[root@ha01 ~]# ls /usr/share/doc/heartbeat-3.0.4/
apphbd.cf AUTHORS COPYING ha.cf README
authkeys ChangeLog COPYING.LGPL haresources
[root@ha01 ~]#

[root@ha01 ~]# cd /usr/share/doc/heartbeat-3.0.4/
[root@ha01 heartbeat-3.0.4]# cp authkeys /etc/ha.d/
[root@ha01 heartbeat-3.0.4]# cp ha.cf /etc/ha.d/
[root@ha01 heartbeat-3.0.4]# cp haresources /etc/ha.d/
[root@ha02 ~]# vi /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0
auto_failback on
watchdog /dev/watchdog
node ha01
node ha02
ping 192.168.1.254
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
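
The hostname step at the top matters because heartbeat compares each node line in ha.cf with the output of `uname -n` on the machine. The following is a minimal sketch of a check to run on each node before starting the service. The function name `node_in_cluster` is hypothetical and not part of heartbeat; the default path is the one used in this post.

```shell
#!/bin/sh
# Hypothetical helper, not part of heartbeat: succeed if this machine's
# `uname -n` is listed on a "node" line of ha.cf.
# Optional argument: path to ha.cf (defaults to /etc/ha.d/ha.cf).
node_in_cluster() {
    conf="${1:-/etc/ha.d/ha.cf}"
    awk -v me="$(uname -n)" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == me) ok = 1 }
        END { exit !ok }
    ' "$conf"
}
```

For example, `node_in_cluster || echo "hostname not listed in ha.cf"` prints a warning when the hostname and the node lines disagree.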

[root@ha01 ~]# vi /etc/ha.d/haresources
ha01 192.168.2.100 mysqld httpd
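
The haresources line reads left to right: the preferred node, then the resources to start in order. A bare IP address is handled by the IPaddr script, and the other names here are init scripts, which matches the "Running ..." lines in the log further down. The sketch below only prints the start commands implied by such a line. `describe_haresources` is a hypothetical helper, and it simplifies by treating every non-IP resource as an init script.

```shell
#!/bin/sh
# Hypothetical helper: print the start commands implied by one haresources
# line. Simplification: anything that is not an IPv4 address is assumed to
# be a script under /etc/init.d.
describe_haresources() {
    set -- $1                  # split the line on whitespace
    echo "preferred node: $1"
    shift
    for res in "$@"; do
        case "$res" in
            [0-9]*.[0-9]*.[0-9]*.[0-9]*)
                echo "/etc/ha.d/resource.d/IPaddr $res start" ;;
            *)
                echo "/etc/init.d/$res start" ;;
        esac
    done
}

describe_haresources "ha01 192.168.2.100 mysqld httpd"
```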

[root@ha02 ~]# vi /etc/ha.d/authkeys
#auth 1
#1 crc
#2 sha1 HI!
#3 md5 Hello!
auth 1
1 crc
[root@ha02 ~]# chmod 600 /etc/ha.d/authkeys
Error analysis

heartbeat: udpport setting must precede media statements
heartbeat[1495]: 2014/03/28_17:24:50 ERROR: Bad permissions on keyfile [/etc/ha.d//authkeys], 600 recommended.
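
Both errors can be caught before starting the service: the first appears when udpport comes after a media line such as bcast in ha.cf, the second when authkeys is not mode 600. Below is a rough pre-flight sketch. `check_ha_config` is a hypothetical helper, it assumes GNU `stat` (as on CentOS), and it treats exactly 600 as the accepted mode because that is what the message recommends.

```shell
#!/bin/sh
# Hypothetical helper, not part of heartbeat: check for the two errors above.
# Optional argument: config directory (defaults to /etc/ha.d).
check_ha_config() {
    dir="${1:-/etc/ha.d}"
    rc=0

    # udpport must appear before any media directive (bcast, ucast, mcast).
    awk '
        /^[ \t]*#/ { next }
        $1 == "bcast" || $1 == "ucast" || $1 == "mcast" { media = 1 }
        $1 == "udpport" && media { bad = 1 }
        END { exit bad }
    ' "$dir/ha.cf" || { echo "ha.cf: udpport must precede media statements"; rc=1; }

    # authkeys should be mode 600 (assumes GNU stat).
    mode=$(stat -c '%a' "$dir/authkeys")
    [ "$mode" = "600" ] || { echo "authkeys: mode is $mode, 600 recommended"; rc=1; }

    return $rc
}
```

Running `check_ha_config` with no argument checks /etc/ha.d and returns non-zero if either problem is found.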

[root@ha01 ~]# echo "hello ha01 is here" >/var/www/html/index.html
[root@ha02 ~]# echo "hello ha02 is here" >/var/www/html/index.html

Start the heartbeat service
[root@ha01 log]# service heartbeat start
Starting High-Availability services: INFO: Resource is stopped
Done.

[root@ha01 log]#

Confirm that the virtual IP is up and the services have started
C:\Users\Harvey Mei>ping 192.168.2.100 -t

Pinging 192.168.2.100 with 32 bytes of data:
Reply from 192.168.3.10: Destination host unreachable.
Reply from 192.168.3.10: Destination host unreachable.
Reply from 192.168.3.10: Destination host unreachable.
Reply from 192.168.3.10: Destination host unreachable.
Reply from 192.168.3.10: Destination host unreachable.

Reply from 192.168.2.100: bytes=32 time=2478ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time=1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64
Reply from 192.168.2.100: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.2.100:
Packets: Sent = 46, Received = 46, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 2478ms, Average = 247ms
Control-C
^C
C:\Users\Harvey Mei>
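
The ping above only shows that some host answers on 192.168.2.100. To see which node actually holds the address, it can be looked up in the interface list on each node. This is a minimal sketch that assumes the `ip` command from iproute is installed; `holds_vip` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if this host currently holds the given IPv4
# address. Relies on the one-line output of `ip -4 -o addr show`, where the
# fourth field is ADDRESS/PREFIX.
holds_vip() {
    ip -4 -o addr show | awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}
```

For example, `holds_vip 192.168.2.100 && echo "VIP is on $(uname -n)"` should print on ha01 after startup and on ha02 after a takeover.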

Confirm heartbeat's working state from the log
[root@ha01 ~]# less /var/log/ha-debug

Acquiring the virtual IP and starting the backend services
Mar 28 16:26:41 ha01 heartbeat: [2605]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[2605]: 2014/03/28_16:26:41 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[2605]: 2014/03/28_16:26:41 received ip-request-resp 192.168.2.100 OK yes
ResourceManager(default)[2628]: 2014/03/28_16:26:41 info: Acquiring resource group: ha01 192.168.2.100 mysqld httpd
Mar 28 16:26:41 ha01 ipfail: [2449]: debug: Setting message filter mode
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.2.100)[2656]: 2014/03/28_16:26:41 INFO: Resource is stopped
ResourceManager(default)[2628]: 2014/03/28_16:26:41 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr(IPaddr_192.168.2.100)[2754]: 2014/03/28_16:26:42 INFO: Adding inet address 192.168.2.100/22 with broadcast address 192.168.3.255 to device eth0
IPaddr(IPaddr_192.168.2.100)[2754]: 2014/03/28_16:26:42 INFO: Bringing device eth0 up
IPaddr(IPaddr_192.168.2.100)[2754]: 2014/03/28_16:26:42 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.2.100 eth0 192.168.2.100 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.2.100)[2740]: 2014/03/28_16:26:42 INFO: Success
INFO: Success
ResourceManager(default)[2628]: 2014/03/28_16:26:42 info: Running /etc/init.d/mysqld start
Mar 28 16:26:42 ha01 ipfail: [2449]: debug: Starting node walk
Mar 28 16:26:42 ha01 ipfail: [2449]: debug: Cluster node: 192.168.1.254: status: ping
Starting mysqld: [ OK ]
ResourceManager(default)[2628]: 2014/03/28_16:26:43 info: Running /etc/init.d/httpd start
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.2.217 for ServerName
[ OK ]

Status of ha01 and ha02 as seen after starting ha01
Mar 28 16:26:43 ha01 ipfail: [2449]: debug: Cluster node: ha02: status: dead
Mar 28 16:26:43 ha01 ipfail: [2449]: debug: [They are ha02]
Mar 28 16:26:44 ha01 ipfail: [2449]: debug: Cluster node: ha01: status: active

After starting ha02, check in its log how it detects ha01
Mar 28 17:25:17 ha02 ipfail: [1635]: debug: [We are ha02]
Mar 28 17:25:18 ha02 heartbeat: [1624]: info: Status update for node ha01: status active
Mar 28 17:25:18 ha02 heartbeat: [1624]: info: ha01 wants to go standby [foreign]
Mar 28 17:25:20 ha02 ipfail: [1635]: debug: [They are ha01]
Mar 28 17:25:20 ha02 ipfail: [1635]: debug: Setting message signal
Mar 28 17:25:21 ha02 ipfail: [1635]: debug: Waiting for messages...
Mar 28 17:25:22 ha02 ipfail: [1635]: debug: Other side is now stable.
Mar 28 17:25:22 ha02 ipfail: [1635]: info: Status update: Node ha01 now has status active
Failover test
Keep pinging 192.168.2.100 and cut ha01's network connection

[root@ha01 ~]# ifdown eth0

Confirm from the log that ha02 detects that ha01 is unreachable and takes over the services
Mar 28 17:34:16 ha02 heartbeat: [1624]: WARN: node ha01: is dead
Mar 28 17:34:16 ha02 heartbeat: [1624]: WARN: No STONITH device configured.
Mar 28 17:34:16 ha02 heartbeat: [1624]: WARN: Shared disks are not protected.
Mar 28 17:34:16 ha02 heartbeat: [1624]: info: Resources being acquired from ha01.
Mar 28 17:34:16 ha02 heartbeat: [1624]: info: Link ha01:eth0 dead.
Mar 28 17:34:16 ha02 ipfail: [1635]: info: Status update: Node ha01 now has status dead
harc(default)[2130]: 2014/03/28_17:34:16 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[2167]: 2014/03/28_17:34:16 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[2167]: 2014/03/28_17:34:16 info: mach_down takeover complete for node ha01.
Mar 28 17:34:16 ha02 heartbeat: [1624]: info: mach_down takeover complete.
Mar 28 17:34:16 ha02 ipfail: [1635]: info: NS: We are still alive!
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.2.100)[2198]: 2014/03/28_17:34:16 INFO: Running OK
Mar 28 17:34:16 ha02 heartbeat: [2131]: info: Local Resource acquisition completed.
Mar 28 17:34:16 ha02 ipfail: [1635]: info: Link Status update: Link ha01/eth0 now has status dead
Mar 28 17:34:18 ha02 ipfail: [1635]: info: Asking other side for ping node count.
Mar 28 17:34:18 ha02 ipfail: [1635]: info: Checking remote count of ping nodes.
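
The takeover is easier to follow when only the key lines are pulled out of the log. Here is a small sketch that filters for the messages seen above. `failover_events` is a hypothetical helper, the patterns are taken from this log excerpt only, and the default path is the logfile set in ha.cf.

```shell
#!/bin/sh
# Hypothetical helper: show only the lines that mark a node death and the
# resulting takeover. Optional argument: log file (defaults to
# /var/log/ha-log, the logfile configured in ha.cf above).
failover_events() {
    grep -E 'node [^ ]+: is dead|Resources being acquired|takeover complete|Local Resource acquisition completed' \
        "${1:-/var/log/ha-log}"
}
```

On ha02, `failover_events` prints nothing until a takeover has happened, so it can also be used as a quick check after the test.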

Log entries on ha01
Mar 28 17:34:23 ha01 heartbeat: [2446]: ERROR: glib: Error sending packet: Network is unreachable
Mar 28 17:34:23 ha01 heartbeat: [2446]: info: glib: euid=0 egid=0
Mar 28 17:34:23 ha01 heartbeat: [2446]: ERROR: write_child: write failure on ping 192.168.1.254.: Network is unreachable
Mar 28 17:34:25 ha01 heartbeat: [2446]: ERROR: glib: Error sending packet: Network is unreachable

 
