Tuesday, November 4, 2014

Docker on Ubuntu 14.04

Install Docker
# echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list

# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
# apt-get update
# apt-get install -y lxc-docker
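
A quick sanity check that the daemon came up after the install:
# docker version
# docker info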


Pull Ubuntu image
With a proxy:
# service docker stop
# HTTP_PROXY=[proxy] docker -d &
# docker pull ubuntu
Without a proxy:
# docker pull ubuntu
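
To make the proxy setting persistent instead of starting the daemon by hand, the http_proxy export can also go into /etc/default/docker, which the lxc-docker init script sources ([proxy] is the same placeholder as above):
# echo 'export http_proxy="[proxy]"' >> /etc/default/docker
# service docker restart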

Run a Docker Container
# docker run -i -t ubuntu /bin/bash

Frequently used commands
1. List all images
# docker images

2. List all running containers
# docker ps

3. List all containers, including paused and exited ones
# docker ps -a

4. Remove a container
# docker rm [container id]

5. Remove an image
# docker rmi [image id]
note: if the image is still in use by a container, that container must be removed before the image can be removed

6. Build an image
# docker build .
note: this command must be run in the same directory as the Dockerfile (see the sample Dockerfile after 6-3)

6-1. Build an image with a tag
# docker build -t [tag] .

6-2. Build an image without using the cache
# docker build --no-cache .
note: if the previous build failed (a mistake in the Dockerfile, a broken environment, etc.), reusing the cache will leave you stuck at the failed state. Another way to avoid stale state is to discard the intermediate containers, as in 6-3.

6-3. Build an image without keeping the intermediate containers
# docker build --rm .
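
For reference, a minimal Dockerfile that these build commands could be pointed at (the base image and nginx payload are only an illustration):

# Minimal example image: Ubuntu base with nginx installed
FROM ubuntu:14.04

# Install the example payload
RUN apt-get update && apt-get install -y nginx

# Port the service listens on
EXPOSE 80

# Keep nginx in the foreground so the container stays running
CMD ["nginx", "-g", "daemon off;"]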

7. Run a container
# docker run [image id or tag]

7-1. Run a container with (i)nteractive mode and a (t)ty
# docker run -t -i [image id or tag] /bin/bash
#note: starts the container and drops you into a bash shell

7-2. Run a container in the background
# docker run -i -t -d [image id or tag]
#note: to get back into the container, run docker attach [container id]

7-3. Run a container with a local directory mounted
# docker run -v /local/dir:/docker/dir [image id or tag]

7-4. Run a container with port forwarding
# docker run -p host_port:container_port -p host_port:container_port/udp [image id or tag]
#note: each -p takes one host_port:container_port pair; append /udp for UDP ports
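
For example, to map a web server in the container to host port 8080 and a DNS service over UDP (ports chosen only for illustration):
# docker run -d -p 8080:80 -p 53:53/udp [image id or tag]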

7-5. Run a container with the host's local time settings
# docker run -v /etc/localtime:/etc/localtime:ro [image id or tag]

7-6. Run a container with environment variables
# docker run --env var_name=var_value [image id or tag]

7-7. Run a container and give it a name
# docker run --name [name] [image id or tag]

8. Detach from a container
When you are inside a container and want to return to the host shell without stopping the container, press
Ctrl-p followed by Ctrl-q

9. Reattach to a container
# docker attach [container id]

10. Start a container
# docker start [container id]
#note: when a container is in the exited state, it must be started before you can attach to it

11. Commit changes
# docker commit -m "commit msg" -a "author" [container id] [repository]
#note: -a and the repository are optional

12. Show image/container information
# docker inspect [image or container id]
#note: checking $? after running docker inspect is a handy way to tell whether an image or container exists

13. Export a container
# docker export [container id] >image_name
#note: to compress at the same time: docker export [container id] | gzip -c > image_name

14. Import an exported container
# cat [image path] |docker import - [tag]
#note: to import a compressed image: # gzip -dc [image path] | docker import - [tag]

15. Limit a container's memory usage
# docker run -m [memory size, e.g. 2g] [image id or tag]
#note: if you see the warning "WARNING: Your kernel does not support swap limit capabilities. Limitation discarded."
you must edit /etc/default/grub and change

GRUB_CMDLINE_LINUX=""
to
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

then run sudo update-grub and reboot.
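
After rebooting, you can confirm the kernel picked up the new parameters before retrying the -m run; the output should contain cgroup_enable=memory swapaccount=1:
# cat /proc/cmdline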

Thursday, October 30, 2014

Tracing an Illegal instruction

One day a program crashed with an Illegal instruction. If you run
# dmesg
you should see a message similar to:
XXXX[4444] trap invalid opcode rip:42a016 rsp:415104b8 error:0


This usually happens because the program uses an instruction set the CPU does not support (there are exceptions). So how do we find out which instruction (opcode) is the culprit?

1. Enable core dumps
  Ubuntu does not produce core dump files by default. To enable them, run
# ulimit -c unlimited

  After that, when the Illegal instruction occurs again, a core.XXX file should appear in the working directory; that is the core dump file.
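
If no core file appears even after raising the limit, Ubuntu may be piping cores to apport; the current behaviour can be checked and, if needed, pointed back to a plain core file (a temporary, root-only change):
# cat /proc/sys/kernel/core_pattern
# echo core > /proc/sys/kernel/core_pattern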

2. Trace it with gdb
  gdb tells us where the fault happened:
  # gdb [program] [core dump file]

  In my case the output was:
  Program terminated with signal 4, Illegal instruction.
  #0  0x000000000042a016 in FUNCTION_NAME ()
  so the fault is at address 0x000000000042a016.

3. Inspect the assembly code
   Still inside gdb, run
  (gdb) disassemble FUNCTION_NAME

0x000000000042a00d <blake2b_init_param_avx+45>: cmp    %rax,%rdx
0x000000000042a010 <blake2b_init_param_avx+48>: jbe    0x42a0c0 <blake2b_init_param_avx+224>
0x000000000042a016 <blake2b_init_param_avx+54>: vmovdqu (%rsi),%xmm1

and the offending instruction is right there. Here vmovdqu is an AVX instruction, so the CPU simply lacks AVX support.
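
To confirm the diagnosis, check whether the CPU advertises the corresponding flag (avx in this case); no output means the CPU does not support it:
# grep avx /proc/cpuinfo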


Tuesday, May 27, 2014

OpenStack High Availability I: MySQL and RabbitMQ HA

There are two main approaches to OpenStack HA: Master-Master and Master-Slave. As the name suggests, Master-Master has two sets of Control/Network Nodes serving at the same time. This post covers the Master-Slave deployment.

For MySQL HA we deploy DRBD, Corosync, and Pacemaker. DRBD works like a software RAID 1, keeping a partition synchronized between two machines. Corosync handles cluster messaging and heartbeats, and Pacemaker manages where each application runs (for example, which node runs MySQL while the other stands by).

DRBD Installation

Installing DRBD only requires the drbd8-utils package:

# apt-get install drbd8-utils

DRBD Configuration

Suppose we want to add one resource each for MySQL and RabbitMQ. On the master, create /etc/drbd.d/mysql.res and /etc/drbd.d/rabbitmq.res with the following contents:
resource mysql {
on master {
 device /dev/drbd0;
 disk /dev/mapper/master-mysql; # partition on the master to synchronize
 address 10.109.36.58:7788; # master's IP
 meta-disk internal;
}
on slave {
 device /dev/drbd0;
 disk /dev/mapper/master-mysql; # partition on the slave to synchronize
 address 10.109.36.59:7788; # slave's IP
 meta-disk internal;
}
}

resource rabbitmq {
on master {
 device /dev/drbd1;
 disk /dev/mapper/master-rabbitmq; # partition on the master to synchronize
 address 10.109.36.58:7789; # master's IP
 meta-disk internal;
}
on slave {
 device /dev/drbd1;
 disk /dev/mapper/master-rabbitmq; # partition on the slave to synchronize
 address 10.109.36.59:7789; # slave's IP
 meta-disk internal;
}
}

# Copy the config files from the master to the slave


# scp /etc/drbd.d/mysql.res slave:/etc/drbd.d/
# scp /etc/drbd.d/rabbitmq.res slave:/etc/drbd.d/

# Start DRBD on both the master and the slave


# /etc/init.d/drbd start

# Initialize the metadata storage

master:

# drbdadm create-md mysql
# drbdadm create-md rabbitmq

# Set the master as primary

master:

# drbdadm -- --overwrite-data-of-peer primary mysql
# drbdadm -- --overwrite-data-of-peer primary rabbitmq
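
The Pacemaker Filesystem resources later in this post mount these devices as xfs, so create a filesystem on each DRBD device first; a minimal sketch, run on the primary (master) only:
# apt-get install xfsprogs
# mkfs.xfs /dev/drbd0
# mkfs.xfs /dev/drbd1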

# Verify the setup

Running # service drbd status should show:
master:
drbd driver loaded OK; device status:
version: 8.3.13 (api:88/proto:86-96)
srcversion: 697DE8B1973B1D8914F04DB
m:res   cs         ro                 ds                 p  mounted  fstype
0:mysql  Connected  Primary/Secondary  UpToDate/UpToDate  C
1:rabbitmq  Connected  Primary/Secondary  UpToDate/UpToDate  C
slave:

drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: 2931F0123213F7DB1364EA7
m:res   cs         ro                 ds                 p  mounted  fstype
0:mysql  Connected  Secondary/Primary  UpToDate/UpToDate  C
1:rabbitmq  Connected  Secondary/Primary  UpToDate/UpToDate  C


DRBD Troubleshooting

  1. service drbd status shows '0:mysql  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----'
If dmesg shows
kernel: block drbd0: Split-Brain detected, dropping connection!

you have hit a split-brain. To recover:
    • Disconnect the resource (on the secondary node)
# drbdadm disconnect mysql
    • Demote the node to secondary (on the secondary node)
# drbdadm secondary mysql
    • Forcibly discard all local changes (on the secondary node)
# drbdadm -- --discard-my-data connect mysql
    • Reconnect (on the primary node)
# drbdadm connect mysql


Corosync Installation

Installing Corosync only requires the corosync package.

On both master & master-1:
# apt-get install corosync

Corosync Configuration

# Edit /etc/corosync/corosync.conf

On both master & master-1:
# Please read the openais.conf.5 manual page

totem {
version: 2

# How long before declaring a token lost (ms)
token: 3000

# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10

# How long to wait for join messages in the membership protocol (ms)
join: 60

# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
consensus: 3600

# Turn off the virtual synchrony filter
vsftype: none

# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20

# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes

# Disable encryption
secauth: off

# How many threads to use for encryption/decryption
threads: 0

# Optionally assign a fixed node id (integer)
# nodeid: 1234

# This specifies the mode of redundant ring, which may be none, active, or passive.
rrp_mode: none

interface {
# The following values need to be set based on your environment
ringnumber: 0
bindnetaddr: 10.109.36.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
}

amf {
mode: disabled
}

service {
# Load the Pacemaker Cluster Resource Manager
ver:       0
name:      pacemaker
}

aisexec {
       user:   root
       group:  root
}

logging {
       fileline: off
       to_stderr: yes
       to_logfile: no
       to_syslog: yes
syslog_facility: daemon
       debug: off
       timestamp: on
       logger_subsys {
               subsys: AMF
               debug: off
               tags: enter|leave|trace1|trace2|trace3|trace4|trace6
       }
}

# Copy the config to the slave (master-1)

# scp -r /etc/corosync master-1:/etc/

  • Enable start at boot

    1. Edit /etc/default/corosync and change START=no to START=yes (a one-liner is sketched below)
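
A one-liner for that edit, assuming the stock file ships with START=no (run on both nodes):
# sed -i 's/^START=no/START=yes/' /etc/default/corosync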

  • Verify the setup

On the master, run # corosync-objctl runtime.totem.pg.mrp.srp.members and you should see something like:

runtime.totem.pg.mrp.srp.1763994890.ip=r(0) ip(10.109.36.105)
runtime.totem.pg.mrp.srp.1763994890.join_count=1
runtime.totem.pg.mrp.srp.1763994890.status=joined
runtime.totem.pg.mrp.srp.1797549322.ip=r(0) ip(10.109.36.107)
runtime.totem.pg.mrp.srp.1797549322.join_count=1
runtime.totem.pg.mrp.srp.1797549322.status=joined

Pacemaker Installation
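
Pacemaker itself comes from the standard Ubuntu repositories; installing it on both master & master-1 should be enough before continuing:
# apt-get install pacemaker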

Verify the cluster

After rebooting the master and the slave, run # crm_mon on both nodes; you should see:

Master
============
Last updated: Wed Jun 19 16:52:11 2013
Last change: Wed Jun 19 13:14:15 2013 via crmd on master-1
Stack: openais
Current DC: master - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ master master-1 ]

Slave
============
Last updated: Wed Jun 19 16:52:11 2013
Last change: Wed Jun 19 13:14:15 2013 via crmd on master-1
Stack: openais
Current DC: master - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ master master-1 ]

Basic settings

# On the master, run # crm configure and then enter:

crm(live)configure# property no-quorum-policy="ignore" \
        pe-warn-series-max="1000" \
        pe-input-series-max="1000" \
        pe-error-series-max="1000" \
        cluster-recheck-interval="5min"

crm(live)configure# commit
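
The committed properties can be double-checked from the regular shell with:
# crm configure show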

HA for MySQL and RabbitMQ

# Edit the Pacemaker configuration
Run # crm configure

crm(live)configure# edit

and enter:

primitive p_drbd_mysql ocf:linbit:drbd \
       params drbd_resource="mysql" \
       op start interval="0" timeout="90s" \
       op stop interval="0" timeout="180s" \
       op promote interval="0" timeout="180s" \
       op demote interval="0" timeout="180s" \
       op monitor interval="30s" role="Slave" \
       op monitor interval="29s" role="Master"
primitive p_drbd_rabbitmq ocf:linbit:drbd \
       params drbd_resource="rabbitmq" \
       op start interval="0" timeout="90s" \
       op stop interval="0" timeout="180s" \
       op promote interval="0" timeout="180s" \
       op demote interval="0" timeout="180s" \
       op monitor interval="30s" role="Slave" \
       op monitor interval="29s" role="Master"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
       params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="xfs" options="relatime" \
       op start interval="0" timeout="60s" \
       op stop interval="0" timeout="180s" \
       op monitor interval="60s" timeout="60s"
primitive p_fs_rabbitmq ocf:heartbeat:Filesystem \
       params device="/dev/drbd/by-res/rabbitmq" \  
directory="/var/lib/rabbitmq" fstype="xfs"
primitive p_ip_mysql ocf:heartbeat:IPaddr2 \
       params ip="10.109.36.198" cidr_netmask="24" \
       op monitor interval="30s"
primitive p_ip_rabbitmq ocf:heartbeat:IPaddr2 \
       params ip="10.109.36.198" cidr_netmask="24" \
       op monitor interval="10s"
primitive p_rabbitmq ocf:rabbitmq:rabbitmq-server \
       params nodename="rabbit@localhost" mnesia_base="/var/lib/rabbitmq" \
       op monitor interval="20s" timeout="10s"
primitive p_mysql ocf:heartbeat:mysql \
       params additional_parameters="--bind-address=0.0.0.0 config=/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" log="/var/log/mysql/mysqld.log" \
       op monitor interval="20s" timeout="10s" \
       op start interval="0" timeout="120s" \
       op stop interval="0" timeout="120s"
group g_rabbitmq p_ip_rabbitmq p_fs_rabbitmq p_rabbitmq \
       meta target-role="Started"
group g_mysql p_ip_mysql p_fs_mysql p_mysql
ms ms_drbd_mysql p_drbd_mysql \
       meta notify="true" clone-max="2"
ms ms_drbd_rabbitmq p_drbd_rabbitmq \
       meta notify="true" master-max="1" clone-max="2" target-role="Started"
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
colocation c_rabbitmq_on_drbd inf: g_rabbitmq ms_drbd_rabbitmq:Master
order o_drbd_before_rabbitmq inf: ms_drbd_rabbitmq:promote g_rabbitmq:start
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
order order1 inf: g_rabbitmq:start g_mysql
property $id="cib-bootstrap-options" \
       dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
       cluster-infrastructure="openais" \
       expected-quorum-votes="3" \
       no-quorum-policy="ignore" \
       pe-warn-series-max="1000" \
       pe-input-series-max="1000" \
       pe-error-series-max="1000" \
       cluster-recheck-interval="5min" \
       stonith-enabled="false"

crm(live)configure# commit

Then run # crm_mon again and you should see MySQL and RabbitMQ up and running.
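
A simple way to exercise the failover (using the same crm shell; the resources should migrate to master-1 and back):
# crm node standby master
# crm_mon -1
# crm node online master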