SETUP A HIGH-AVAILABILITY CLUSTER USING PACEMAKER WITH A FLOATING IP
The example cluster will use:
- Ubuntu 20.04 LTS as the operating system
- PCS version 0.10.4
- Corosync to provide messaging and membership services
- 3 nodes with the following IP addresses and hostnames:
10.0.10.11 (pcmk-1)
10.0.10.12 (pcmk-2)
10.0.10.13 (pcmk-3)
SIDE NOTE:
$ is used when logged in as USER.
# is used when logged in as ROOT.
Step 1) Configure OS
1) Apply updates on all nodes using the apt commands below:
$ sudo apt update && sudo apt upgrade -y
2) Configure Hostname
Edit /etc/hosts on each node using vi or nano and add the IP address and hostname of every node:
$ sudo vi /etc/hosts
Insert IP and hostname.
10.0.10.11 pcmk-1
10.0.10.12 pcmk-2
10.0.10.13 pcmk-3
Save and Exit
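To confirm that name resolution works through /etc/hosts, you can query an entry with getent (optional sanity check); the output should look similar to:
$ getent hosts pcmk-1
10.0.10.11      pcmk-1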
Step 2) Configure communication between nodes.
1) Confirm that the nodes can communicate with each other by pinging each node, both by IP address and by hostname:
$ ping -c 3 10.0.10.12
PING 10.0.10.12 (10.0.10.12) 56(84) bytes of data.
64 bytes from 10.0.10.12: icmp_seq=1 ttl=64 time=0.181 ms
64 bytes from 10.0.10.12: icmp_seq=2 ttl=64 time=0.135 ms
64 bytes from 10.0.10.12: icmp_seq=3 ttl=64 time=0.166 ms
--- 10.0.10.12 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2049ms
rtt min/avg/max/mdev = 0.135/0.160/0.181/0.019 ms
$ ping -c 3 pcmk-2
PING pcmk-2 (10.0.10.12) 56(84) bytes of data.
64 bytes from pcmk-2 (10.0.10.12): icmp_seq=1 ttl=64 time=0.114 ms
64 bytes from pcmk-2 (10.0.10.12): icmp_seq=2 ttl=64 time=0.090 ms
64 bytes from pcmk-2 (10.0.10.12): icmp_seq=3 ttl=64 time=0.179 ms
--- pcmk-2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2039ms
rtt min/avg/max/mdev = 0.090/0.127/0.179/0.037 ms
2) Configure SSH
Create an SSH key pair so that the nodes can log in to one another without a password:
# ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:JMAN81ASJSAD09MKAYAH12UOLAP09ZMANASQ2 root@pcmk-1
The key's randomart image is:
+---[RSA 4096]----+
| . .+=.o.*+. |
|o o .o* *o+ |
|.+ = . B.o |
|+o= .. o o . |
|oX . .. S |
|X.. .o |
|.O….. |
|o o..E. |
| . .. |
+----[SHA256]-----+
After creating the SSH key, copy it to every node using the command below:
# ssh-copy-id <user>@<hostname>
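For example, assuming the cluster is administered as root, the key can be distributed to all three nodes in one loop:
# for h in pcmk-1 pcmk-2 pcmk-3; do ssh-copy-id root@$h; done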
Step 3) Install cluster software.
1) Install cluster software.
$ sudo apt-get install pacemaker pcs psmisc policycoreutils-python-utils -y
2) Add rules to the firewall.
Allow the cluster-related services through the firewall. If you are using firewalld:
# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success
If you are using a firewall other than firewalld, open the following ports instead: 2224/TCP, 3121/TCP, 21064/TCP, and 5405/UDP.
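On Ubuntu, where ufw is the usual firewall front end, a minimal sketch of the equivalent rules (assuming ufw is installed and enabled) would be:
$ sudo ufw allow 2224/tcp
$ sudo ufw allow 3121/tcp
$ sudo ufw allow 21064/tcp
$ sudo ufw allow 5405/udp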
3) Enable the pcs daemon.
Before the cluster can be configured, the pcs daemon must be started on every node and enabled to start at boot. Start and enable the pcsd service using the commands below:
# systemctl start pcsd.service
# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
The installed packages create a hacluster user with a disabled password. We will need that user and its password later when authenticating the cluster nodes, so set a password for the hacluster user (use the same password on every node):
# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
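If you prefer to set the password non-interactively, for example from a provisioning script, one option is chpasswd (MySecretPassword is a placeholder; replace it with your own):
# echo 'hacluster:MySecretPassword' | chpasswd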
SIDE NOTE:
AFTER THIS STEP, RUN THE REMAINING COMMANDS ONLY ON PCMK-1 (THE CONTROL NODE).
4) Configure Corosync
# pcs host auth pcmk-1 pcmk-2 pcmk-3
Username: hacluster
Password:
pcmk-2: Authorized
pcmk-1: Authorized
pcmk-3: Authorized
After the authentication is complete, we are going to generate and synchronize the corosync configuration, using mycluster as the cluster name:
# pcs cluster setup mycluster pcmk-1 pcmk-2 pcmk-3
No addresses specified for host 'pcmk-1', using 'pcmk-1'
No addresses specified for host 'pcmk-2', using 'pcmk-2'
No addresses specified for host 'pcmk-3', using 'pcmk-3'
Destroying cluster on hosts: 'pcmk-1', 'pcmk-2', 'pcmk-3'...
pcmk-1: Successfully destroyed cluster
pcmk-2: Successfully destroyed cluster
pcmk-3: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'pcmk-1', 'pcmk-2', 'pcmk-3'
pcmk-1: successful removal of the file 'pcsd settings'
pcmk-2: successful removal of the file 'pcsd settings'
pcmk-3: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'pcmk-1', 'pcmk-2', 'pcmk-3'
pcmk-2: successful distribution of the file 'corosync authkey'
pcmk-2: successful distribution of the file 'pacemaker authkey'
pcmk-1: successful distribution of the file 'corosync authkey'
pcmk-1: successful distribution of the file 'pacemaker authkey'
pcmk-3: successful distribution of the file 'corosync authkey'
pcmk-3: successful distribution of the file 'pacemaker authkey'
Synchronizing pcsd SSL certificates on nodes 'pcmk-1', 'pcmk-2', 'pcmk-3'...
pcmk-1: Success
pcmk-2: Success
pcmk-3: Success
Sending 'corosync.conf' to 'pcmk-1', 'pcmk-2', 'pcmk-3'
pcmk-2: successful distribution of the file 'corosync.conf'
pcmk-1: successful distribution of the file 'corosync.conf'
pcmk-3: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
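If you want to inspect what was generated, the configuration is written to /etc/corosync/corosync.conf on every node:
# cat /etc/corosync/corosync.conf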
Step 4) Start and Verify the Cluster
1) Start Cluster
After the cluster is configured, the next thing to do is start it:
# pcs cluster start --all
pcmk-1: Starting Cluster...
pcmk-2: Starting Cluster...
pcmk-3: Starting Cluster...
We are not enabling the corosync and pacemaker services at boot time, so if one or more nodes go down or are rebooted, you need to start the cluster on them again using the command above, or pcs cluster start <hostname> for a single node.
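If you would rather have the cluster services come back automatically after a reboot, pcs can enable them at boot on every node:
# pcs cluster enable --all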
2) Verify Corosync Installation
Use the command below to verify corosync installation.
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 10.0.10.11
status = ring 0 active with no faults
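You can also ask corosync for its view of the cluster membership; all three node IDs should be listed:
# corosync-cmapctl | grep members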
3) Verify Pacemaker Installation
Once we have verified that corosync is functioning, the next service to verify is pacemaker, using the command below:
# ps axf
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
…lots of processes…
11635 ? SLsl 0:03 corosync
11642 ? Ss 0:00 /usr/sbin/pacemakerd -f
11643 ? Ss 0:00 \_ /usr/lib/pacemaker/pacemaker-based
11644 ? Ss 0:00 \_ /usr/lib/pacemaker/pacemaker-fenced
11645 ? Ss 0:00 \_ /usr/lib/pacemaker/pacemaker-execd
11646 ? Ss 0:00 \_ /usr/lib/pacemaker/pacemaker-attrd
11647 ? Ss 0:00 \_ /usr/lib/pacemaker/pacemaker-schedulerd
11648 ? Ss 0:00 \_ /usr/lib/pacemaker/pacemaker-controld
After that command, check the pcs status.
# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: pcmk-2 (version 2.0.3) - partition with quorum
Last updated: Tue Dec 19 16:15:34 2023
Last change: Tue Dec 19 16:15:34 2023 by hacluster via crmd on pcmk-1
3 nodes configured
0 resources configured
Online: [ pcmk-1 pcmk-2 pcmk-3 ]
No resources
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Step 5) Create an Active/Passive Cluster
1) Configure fencing
Fencing protects your data from being corrupted, and your application from becoming unavailable, due to unintended concurrent access by a rogue node.
Fencing is also known as STONITH, an acronym for "Shoot The Other Node In The Head".
It is possible to disable STONITH, but this is not recommended for a production cluster. Since this example cluster has no fencing devices configured, we disable it and then verify the configuration:
# pcs property set stonith-enabled=false
# crm_verify -L
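crm_verify -L checks the live cluster configuration and prints any errors it finds; after disabling STONITH it should return silently. You can also confirm that the property was set (pcs 0.10 syntax):
# pcs property show stonith-enabled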
2) Add a Floating IP Resource
The first resource will be a unique IP address that the cluster can bring up on any node, regardless of where the cluster services are running. For this example, we will use 10.0.10.14 as the floating IP and monitor the resource every 30 seconds to check whether it is running.
# pcs resource create floating_ip ocf:heartbeat:IPaddr2 ip=10.0.10.14 cidr_netmask=24 op monitor interval=30s
Once the floating IP resource is created, we can check it, and see where it is running, by using the pcs status command:
# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 2.0.3) - partition with quorum
Last updated: Tue Dec 19 16:15:34 2023
Last change: Tue Dec 19 16:15:34 2023 by root via cibadmin on pcmk-1
3 nodes configured
1 resource configured
Online: [ pcmk-1 pcmk-2 pcmk-3 ]
Full list of resources:
  floating_ip (ocf::heartbeat:IPaddr2): Started pcmk-1
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
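pcs status shows the resource started on pcmk-1. As an extra sanity check, you can confirm on that node that the IPaddr2 agent actually added the address to an interface (the interface name depends on your system):
# ip -o addr show | grep 10.0.10.14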
3) Perform a Failover
Since our goal is high availability, we are going to test the cluster with the floating_ip resource. We can do this by stopping the cluster on the node where floating_ip is started and checking whether floating_ip moves to a healthy node.
# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 2.0.3) - partition with quorum
Last updated: Tue Dec 19 16:15:34 2023
Last change: Tue Dec 19 16:15:34 2023 by root via cibadmin on pcmk-1
3 nodes configured
1 resource configured
Online: [ pcmk-1 pcmk-2 pcmk-3 ]
Full list of resources:
  floating_ip (ocf::heartbeat:IPaddr2): Started pcmk-1
As expected, the pcs status command shows that the control node (pcmk-1) is running the floating_ip resource. The next thing to do is stop the cluster on pcmk-1 by running the command below:
# pcs cluster stop pcmk-1
pcmk-1: Stopping Cluster (pacemaker)...
pcmk-1: Stopping Cluster (corosync)...
Now check the cluster status again using the pcs status command:
# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 2.0.3) - partition with quorum
Last updated: Tue Dec 19 16:15:34 2023
Last change: Tue Dec 19 16:15:34 2023 by root via cibadmin on pcmk-1
3 nodes configured
1 resource configured
Online: [ pcmk-2 pcmk-3 ]
Offline: [ pcmk-1 ]
Full list of resources:
  floating_ip (ocf::heartbeat:IPaddr2): Started pcmk-2
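The floating_ip resource failed over to pcmk-2, so the test succeeded. To finish, bring pcmk-1 back into the cluster. Whether the resource migrates back depends on the cluster's placement scores; a common way to avoid unnecessary moves is to give resources some stickiness (pcs 0.10 syntax; 100 is an example value):
# pcs cluster start pcmk-1
# pcs resource defaults resource-stickiness=100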