Tuesday, May 27, 2014

OpenStack High Availability I-MySQL, RabbitMQ HA

There are two main approaches to OpenStack HA: Master-Master and Master-Slave. As the name suggests, Master-Master means two sets of Control/Network Nodes serve requests at the same time. This post covers the Master-Slave deployment.

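For MySQL HA, we need to deploy DRBD, corosync, and pacemaker. DRBD works like a software RAID 1, synchronizing a partition between two machines. corosync carries messages and heartbeats between the cluster nodes, and pacemaker manages application failover (for example, which node runs MySQL while the other stays on standby).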

Installing DRBD

Installing DRBD only requires the drbd8-utils package.

# apt-get install drbd8-utils

Configuring DRBD

Suppose we want to add one resource each for mysql and rabbitmq. On master, create the files /etc/drbd.d/mysql.res and /etc/drbd.d/rabbitmq.res:
resource mysql {
on master {
 device /dev/drbd0;
 disk /dev/mapper/master-mysql; # partition on master to synchronize
 address 10.109.36.58:7788; # master's IP
 meta-disk internal;
}
on slave {
 device /dev/drbd0;
 disk /dev/mapper/master-mysql; # partition on slave to synchronize
 address 10.109.36.59:7788; # slave's IP
 meta-disk internal;
}
}

resource rabbitmq {
on master {
 device /dev/drbd1;
 disk /dev/mapper/master-rabbitmq; # partition on master to synchronize
 address 10.109.36.58:7789; # master's IP
 meta-disk internal;
}
on slave {
 device /dev/drbd1;
 disk /dev/mapper/master-rabbitmq; # partition on slave to synchronize
 address 10.109.36.59:7789; # slave's IP
 meta-disk internal;
}
}
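
Since the two resource files differ only in name, device, disk, and port, they can be generated from one template. The following is a minimal sketch, not part of drbd8-utils: make_drbd_res is a hypothetical helper name, and the hostnames master and slave are taken from the listing above.

```shell
#!/bin/sh
# Print a two-node DRBD resource stanza in the same shape as mysql.res.
# make_drbd_res is a hypothetical helper, not something DRBD ships.
# Args: resource name, DRBD device, backing disk, master IP, slave IP, TCP port.
make_drbd_res() {
    name=$1 device=$2 disk=$3 master_ip=$4 slave_ip=$5 port=$6
    cat <<EOF
resource $name {
on master {
 device $device;
 disk $disk;
 address $master_ip:$port;
 meta-disk internal;
}
on slave {
 device $device;
 disk $disk;
 address $slave_ip:$port;
 meta-disk internal;
}
}
EOF
}
```

For example, make_drbd_res mysql /dev/drbd0 /dev/mapper/master-mysql 10.109.36.58 10.109.36.59 7788 > /etc/drbd.d/mysql.res reproduces the first file. Running drbdadm dump mysql afterwards is one way to confirm that DRBD can parse the result.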

# Copy the config files from master to slave


# scp /etc/drbd.d/mysql.res slave:/etc/drbd.d/
# scp /etc/drbd.d/rabbitmq.res slave:/etc/drbd.d/

# Start DRBD on both master and slave


# /etc/init.d/drbd start

# Initialize the metadata storage

master:

# drbdadm create-md mysql
# drbdadm create-md rabbitmq

# Promote master to primary

master:

# drbdadm -- --overwrite-data-of-peer primary mysql
# drbdadm -- --overwrite-data-of-peer primary rabbitmq

# Verify the installation

Running # service drbd status should show:
master:
drbd driver loaded OK; device status:
version: 8.3.13 (api:88/proto:86-96)
srcversion: 697DE8B1973B1D8914F04DB
m:res   cs         ro                 ds                 p  mounted  fstype
0:mysql  Connected  Primary/Secondary  UpToDate/UpToDate  C
1:rabbitmq  Connected  Primary/Secondary  UpToDate/UpToDate  C
slave:

drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: 2931F0123213F7DB1364EA7
m:res   cs         ro                 ds                 p  mounted  fstype
0:mysql  Connected  Secondary/Primary  UpToDate/UpToDate  C
1:rabbitmq  Connected  Secondary/Primary  UpToDate/UpToDate  C
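
Reading the status table by eye gets tedious, so here is a small sketch that checks one resource automatically. It assumes the column layout shown above (connection state in column 2, disk state in column 4). drbd_res_healthy is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 when the named resource is Connected and UpToDate on both sides.
# Reads `service drbd status` output on stdin.
# drbd_res_healthy is a hypothetical helper name.
drbd_res_healthy() {
    awk -v res="$1" '
        # Resource rows look like "0:mysql  Connected  Primary/Secondary  UpToDate/UpToDate  C"
        {
            n = split($1, id, ":")
            if (n == 2 && id[2] == res) {
                found = 1
                ok = ($2 == "Connected" && $4 == "UpToDate/UpToDate")
            }
        }
        # Exit status 0 only if the resource was listed and looked healthy
        END { exit !(found && ok) }
    '
}
```

Usage: service drbd status | drbd_res_healthy mysql && echo "mysql is in sync". A resource that is missing from the output counts as unhealthy.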


DRBD Troubleshooting

  1. Running service drbd status shows '0:mysql  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----'
If dmesg also shows
kernel: block drbd0: Split-Brain detected, dropping connection!

the nodes have hit a split-brain. Resolve it as follows:
    • Disconnect the resource (on secondary node)
# drbdadm disconnect mysql
    • Demote the node to secondary (on secondary node)
# drbdadm secondary mysql
    • Force-discard all local changes (on secondary node)
# drbdadm -- --discard-my-data connect mysql
    • Reconnect (on primary node)
# drbdadm connect mysql
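
The detection and recovery steps above can be sketched as two small helpers. This is only an illustration: has_split_brain and print_split_brain_recovery are hypothetical names, and the second one deliberately prints the commands instead of running them, because discarding data on the wrong node is not reversible.

```shell
#!/bin/sh
# Return 0 if kernel log text on stdin reports a DRBD split-brain.
# has_split_brain is a hypothetical helper name.
has_split_brain() {
    grep -q 'Split-Brain detected'
}

# Print (do not run) the recovery commands listed above for one resource.
# print_split_brain_recovery is a hypothetical helper name.
print_split_brain_recovery() {
    res=$1
    echo "# on the secondary node (its local changes are discarded)"
    echo "drbdadm disconnect $res"
    echo "drbdadm secondary $res"
    echo "drbdadm -- --discard-my-data connect $res"
    echo "# on the primary node"
    echo "drbdadm connect $res"
}
```

Usage: dmesg | has_split_brain && print_split_brain_recovery mysql. Review the printed plan and run each command on the node named in the comment.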



Corosync Installation

Installing Corosync

Installing Corosync only requires the corosync package.

On both master and master-1 (the slave node):
# apt-get install corosync

Configuring Corosync

# Edit /etc/corosync/corosync.conf

On both master and master-1:
# Please read the openais.conf.5 manual page

totem {
version: 2

# How long before declaring a token lost (ms)
token: 3000

# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10

# How long to wait for join messages in the membership protocol (ms)
join: 60

# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
consensus: 3600

# Turn off the virtual synchrony filter
vsftype: none

# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20

# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes

# Disable encryption
secauth: off

# How many threads to use for encryption/decryption
threads: 0

# Optionally assign a fixed node id (integer)
# nodeid: 1234

# This specifies the mode of redundant ring, which may be none, active, or passive.
rrp_mode: none

interface {
# The following values need to be set based on your environment
ringnumber: 0
bindnetaddr: 10.109.36.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
}

amf {
mode: disabled
}

service {
# Load the Pacemaker Cluster Resource Manager
ver:       0
name:      pacemaker
}

aisexec {
       user:   root
       group:  root
}

logging {
       fileline: off
       to_stderr: yes
       to_logfile: no
       to_syslog: yes
       syslog_facility: daemon
       debug: off
       timestamp: on
       logger_subsys {
               subsys: AMF
               debug: off
               tags: enter|leave|trace1|trace2|trace3|trace4|trace6
       }
}
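
The one value in this file that depends on your network is bindnetaddr, which takes the network address of the interface rather than the host address (10.109.36.0 for the /24 used here). A minimal sketch for deriving it from an IPv4 address and prefix length follows. network_addr is a hypothetical helper name.

```shell
#!/bin/sh
# Compute the IPv4 network address for an address and prefix length,
# which is the form bindnetaddr takes.
# network_addr is a hypothetical helper name.
network_addr() {
    ip=$1 prefix=$2
    echo "$ip" | awk -F. -v p="$prefix" '{
        out = ""
        for (i = 1; i <= 4; i++) {
            bits = p - 8 * (i - 1)        # prefix bits that fall in this octet
            if (bits >= 8) {
                o = $i                    # octet fully inside the network part
            } else if (bits <= 0) {
                o = 0                     # octet fully inside the host part
            } else {
                block = 2 ^ (8 - bits)    # subnet step size within this octet
                o = $i - ($i % block)
            }
            out = out (i > 1 ? "." : "") o
        }
        print out
    }'
}
```

For example, network_addr 10.109.36.58 24 prints 10.109.36.0, which matches the bindnetaddr value in the listing above.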

# Copy the config files to the slave

# scp -r /etc/corosync master-1:/etc/

  • Enable start at boot

    1. Edit /etc/default/corosync and change 'no' to 'yes'
  • Verify the installation

On master, running # corosync-objctl runtime.totem.pg.mrp.srp.members should show output like this:

runtime.totem.pg.mrp.srp.1763994890.ip=r(0) ip(10.109.36.105)
runtime.totem.pg.mrp.srp.1763994890.join_count=1
runtime.totem.pg.mrp.srp.1763994890.status=joined
runtime.totem.pg.mrp.srp.1797549322.ip=r(0) ip(10.109.36.107)
runtime.totem.pg.mrp.srp.1797549322.join_count=1
runtime.totem.pg.mrp.srp.1797549322.status=joined
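
To check membership from a script rather than by reading the output, counting the status=joined lines is enough. This is a sketch that assumes the key format shown above. joined_members is a hypothetical helper name.

```shell
#!/bin/sh
# Count members whose status is "joined" in corosync-objctl output on stdin.
# joined_members is a hypothetical helper name.
joined_members() {
    grep -c '\.status=joined'
}
```

Usage: [ "$(corosync-objctl runtime.totem.pg.mrp.srp.members | joined_members)" -eq 2 ] && echo "both nodes joined".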


Pacemaker Installation

Verifying the System

After rebooting both Master and Slave, running # crm_mon on each should show:

Master
============
Last updated: Wed Jun 19 16:52:11 2013
Last change: Wed Jun 19 13:14:15 2013 via crmd on master-1
Stack: openais
Current DC: master - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ master master-1 ]

Slave
============
Last updated: Wed Jun 19 16:52:11 2013
Last change: Wed Jun 19 13:14:15 2013 via crmd on master-1
Stack: openais
Current DC: master - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ master master-1 ]
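
The line that matters in this output is Online: [ ... ]. A minimal sketch for checking it from a script follows, assuming the one-shot mode crm_mon -1 prints the same Online line as the interactive view. nodes_online is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if every node named in the arguments appears on the
# "Online: [ ... ]" line of crm_mon output read from stdin.
# nodes_online is a hypothetical helper name.
nodes_online() {
    # Extract the space-separated node list between the brackets
    online=$(sed -n 's/^Online: \[ \(.*\) ].*/\1/p')
    for node in "$@"; do
        # Pad with spaces so "master" does not match inside "master-1"
        case " $online " in
            *" $node "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

Usage: crm_mon -1 | nodes_online master master-1 && echo "both nodes online".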

Basic Configuration

# On master, run # crm configure, then enter:

crm(live)configure# property no-quorum-policy="ignore" \
                             pe-warn-series-max="1000" \
                             pe-input-series-max="1000" \
                             pe-error-series-max="1000" \
                             cluster-recheck-interval="5min"

crm(live)configure# commit
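
The same properties can also be set without the interactive prompt, which is handy when scripting a rebuild. The sketch below assumes the crm shell accepts a configure subcommand on the command line. run and set_base_properties are hypothetical helper names, and DRY_RUN defaults to 1 so that nothing is changed until you opt in.

```shell
#!/bin/sh
# Apply the cluster properties from the interactive session above.
# DRY_RUN defaults to 1 so commands are only printed; set DRY_RUN=0 to run them.
# run and set_base_properties are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_base_properties() {
    for prop in \
        no-quorum-policy=ignore \
        pe-warn-series-max=1000 \
        pe-input-series-max=1000 \
        pe-error-series-max=1000 \
        cluster-recheck-interval=5min
    do
        run crm configure property "$prop"
    done
}
```

Calling set_base_properties prints the five crm commands. DRY_RUN=0 set_base_properties executes them on the node where it is run.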

HA for MySQL

# Edit the pacemaker configuration
Run # crm configure

crm(live)configure# edit

Then enter:

primitive p_drbd_mysql ocf:linbit:drbd \
       params drbd_resource="mysql" \
       op start interval="0" timeout="90s" \
       op stop interval="0" timeout="180s" \
       op promote interval="0" timeout="180s" \
       op demote interval="0" timeout="180s" \
       op monitor interval="30s" role="Slave" \
       op monitor interval="29s" role="Master"
primitive p_drbd_rabbitmq ocf:linbit:drbd \
       params drbd_resource="rabbitmq" \
       op start interval="0" timeout="90s" \
       op stop interval="0" timeout="180s" \
       op promote interval="0" timeout="180s" \
       op demote interval="0" timeout="180s" \
       op monitor interval="30s" role="Slave" \
       op monitor interval="29s" role="Master"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
       params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="xfs" options="relatime" \
       op start interval="0" timeout="60s" \
       op stop interval="0" timeout="180s" \
       op monitor interval="60s" timeout="60s"
primitive p_fs_rabbitmq ocf:heartbeat:Filesystem \
       params device="/dev/drbd/by-res/rabbitmq" directory="/var/lib/rabbitmq" fstype="xfs"
primitive p_ip_mysql ocf:heartbeat:IPaddr2 \
       params ip="10.109.36.198" cidr_netmask="24" \
       op monitor interval="30s"
primitive p_ip_rabbitmq ocf:heartbeat:IPaddr2 \
       params ip="10.109.36.198" cidr_netmask="24" \
       op monitor interval="10s"
primitive p_rabbitmq ocf:rabbitmq:rabbitmq-server \
       params nodename="rabbit@localhost" mnesia_base="/var/lib/rabbitmq" \
       op monitor interval="20s" timeout="10s"
primitive p_mysql ocf:heartbeat:mysql \
       params additional_parameters="--bind-address=0.0.0.0 config=/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" log="/var/log/mysql/mysqld.log" \
       op monitor interval="20s" timeout="10s" \
       op start interval="0" timeout="120s" \
       op stop interval="0" timeout="120s"
group g_rabbitmq p_ip_rabbitmq p_fs_rabbitmq p_rabbitmq \
       meta target-role="Started"
group g_mysql p_ip_mysql p_fs_mysql p_mysql
ms ms_drbd_mysql p_drbd_mysql \
       meta notify="true" clone-max="2"
ms ms_drbd_rabbitmq p_drbd_rabbitmq \
       meta notify="true" master-max="1" clone-max="2" target-role="Started"
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
colocation c_rabbitmq_on_drbd inf: g_rabbitmq ms_drbd_rabbitmq:Master
order o_drbd_before_rabbitmq inf: ms_drbd_rabbitmq:promote g_rabbitmq:start
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
order order1 inf: g_rabbitmq:start g_mysql
property $id="cib-bootstrap-options" \
       dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
       cluster-infrastructure="openais" \
       expected-quorum-votes="3" \
       no-quorum-policy="ignore" \
       pe-warn-series-max="1000" \
       pe-input-series-max="1000" \
       pe-error-series-max="1000" \
       cluster-recheck-interval="5min" \
       stonith-enabled="false"

crm(live)configure# commit

Then run # crm_mon again. MySQL and RabbitMQ should now be up.
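
Resources can take a while to start after the commit, so a script that follows this step may want to poll instead of checking once. Here is a minimal sketch of such a polling loop. wait_for is a hypothetical helper name and is not part of pacemaker.

```shell
#!/bin/sh
# Poll a command about once per second until it succeeds
# or the timeout (in seconds) expires.
# wait_for is a hypothetical helper name.
wait_for() {
    timeout=$1; shift
    elapsed=0
    until "$@"; do
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}
```

One possible use is wait_for 120 sh -c 'crm_mon -1 | grep -q "p_mysql.*Started"'. This assumes that crm_mon prints the resource name and the word Started on the same line, so check the pattern against your own output first.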
