SET UP A HIGH-AVAILABILITY CLUSTER USING PACEMAKER WITH A FLOATING IP

The example cluster will use:

  1. Ubuntu 20.04 LTS as the operating system
  2. PCS version 0.10.4
  3. Corosync to provide the messaging and membership services.
  4. 3 nodes with the following IP addresses and hostnames

10.0.10.11 (pcmk-1)

10.0.10.12 (pcmk-2)

10.0.10.13 (pcmk-3)

SIDE NOTE:

$ is used when logged in as USER.

# is used when logged in as ROOT.

Step 1) Configure OS

1)     Apply updates on all nodes using the commands below:

$ sudo apt-get update && sudo apt-get upgrade -y

2)     Configure Hostname

Edit /etc/hosts on each node using vi or nano and add the IP address and hostname of every node:

$ sudo vi /etc/hosts

Insert the IP addresses and hostnames:

10.0.10.11 pcmk-1

10.0.10.12 pcmk-2

10.0.10.13 pcmk-3

Save and Exit
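Optionally, you can confirm that /etc/hosts resolves each name to the expected address (an extra check not in the original steps; shown here for pcmk-2):

$ getent hosts pcmk-2

10.0.10.12      pcmk-2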

Step 2) Configure communication between nodes.

1)     Confirm that the nodes can communicate with each other by pinging them, both by IP address and by hostname:

$ ping -c 3 10.0.10.12

PING 10.0.10.12 (10.0.10.12) 56(84) bytes of data.

64 bytes from 10.0.10.12: icmp_seq=1 ttl=64 time=0.181 ms

64 bytes from 10.0.10.12: icmp_seq=2 ttl=64 time=0.135 ms

64 bytes from 10.0.10.12: icmp_seq=3 ttl=64 time=0.166 ms

--- 10.0.10.12 ping statistics ---

3 packets transmitted, 3 received, 0% packet loss, time 2049ms

rtt min/avg/max/mdev = 0.135/0.160/0.181/0.019 ms

$ ping -c 3 pcmk-2

PING pcmk-2 (10.0.10.12) 56(84) bytes of data.

64 bytes from pcmk-2 (10.0.10.12): icmp_seq=1 ttl=64 time=0.114 ms

64 bytes from pcmk-2 (10.0.10.12): icmp_seq=2 ttl=64 time=0.090 ms

64 bytes from pcmk-2 (10.0.10.12): icmp_seq=3 ttl=64 time=0.179 ms

--- pcmk-2 ping statistics ---

3 packets transmitted, 3 received, 0% packet loss, time 2039ms

rtt min/avg/max/mdev = 0.090/0.127/0.179/0.037 ms

2)     Configure SSH

Create an SSH key pair so that any node holding the key can log in to the others without a password.

# ssh-keygen -t rsa -b 4096

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa

Your public key has been saved in /root/.ssh/id_rsa.pub

The key fingerprint is:

SHA256:JMAN81ASJSAD09MKAYAH12UOLAP09ZMANASQ2 root@pcmk-1

The key’s randomart image is:

+---[RSA 4096]----+

| .    .+=.o.*+.  |

|o    o .o* *o+   |

|.+    = . B.o    |

|+o= .. o o .     |

|oX . .. S        |

|X.. .o           |

|.O.....          |

|o o..E.          |

|   .  ..         |

+----[SHA256]-----+

After creating the SSH key, copy it to every node using the command below:

# ssh-copy-id <user>@<hostname>
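For example, from pcmk-1 you might copy the key to the other two nodes and confirm that passwordless login works (hostnames from the table above; substitute your own user if you are not working as root):

# ssh-copy-id root@pcmk-2

# ssh-copy-id root@pcmk-3

# ssh pcmk-2 hostname

pcmk-2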

Step 3) Install cluster software.

1)     Install the cluster packages:

$ sudo apt-get install pacemaker pcs psmisc policycoreutils-python-utils -y

2)     Add firewall rules.

Allow the cluster-related services through the firewall. If you are using firewalld:

# firewall-cmd --permanent --add-service=high-availability

success

# firewall-cmd --reload

success

If you are using a firewall application other than firewalld, open the following ports instead: 2224/TCP, 3121/TCP, 21064/TCP, and 5405/UDP.
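For instance, if you rely on ufw (Ubuntu's default firewall front end) rather than firewalld, a rough equivalent would be:

$ sudo ufw allow 2224/tcp

$ sudo ufw allow 3121/tcp

$ sudo ufw allow 21064/tcp

$ sudo ufw allow 5405/udp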

3)     Enable pcs Daemon.

Before the cluster can be configured, the pcs daemon must be started and enabled so that it comes back up at boot. Start and enable the pcsd service with the commands below:

# systemctl start pcsd.service

# systemctl enable pcsd.service

Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
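A quick optional check that the daemon is actually running on each node:

# systemctl is-active pcsd.service

active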

The installed packages create a hacluster user with a disabled password. We will need this user and its password later to authenticate the cluster nodes to each other, so set a password for the hacluster user on every node.

# passwd hacluster

Changing password for user hacluster.

New password:

Retype new password:

passwd: all authentication tokens updated successfully.

SIDE NOTE:

FROM THIS POINT ON, RUN THE FOLLOWING STEPS ONLY ON PCMK-1 (THE CONTROL NODE).

4)     Configure Corosync

Authenticate pcs to all cluster nodes as the hacluster user, using the password set earlier:

# pcs host auth pcmk-1 pcmk-2 pcmk-3

Username: hacluster

Password:

pcmk-2: Authorized

pcmk-1: Authorized

pcmk-3: Authorized

After authentication is complete, generate and synchronize the corosync configuration with the command below, replacing <cluster name> with a name of your choice (the output later in this guide uses mycluster):

# pcs cluster setup <cluster name> pcmk-1 pcmk-2 pcmk-3

No addresses specified for host ‘pcmk-1’, using ‘pcmk-1’

No addresses specified for host ‘pcmk-2’, using ‘pcmk-2’

No addresses specified for host ‘pcmk-3’, using ‘pcmk-3’

Destroying cluster on hosts: ‘pcmk-1’, ‘pcmk-2’, ‘pcmk-3’…

pcmk-1: Successfully destroyed cluster

pcmk-2: Successfully destroyed cluster

pcmk-3: Successfully destroyed cluster

Requesting remove ‘pcsd settings’ from ‘pcmk-1’, ‘pcmk-2’, ‘pcmk-3’

pcmk-1: successful removal of the file ‘pcsd settings’

pcmk-2: successful removal of the file ‘pcsd settings’

pcmk-3: successful removal of the file ‘pcsd settings’

Sending ‘corosync authkey’, ‘pacemaker authkey’ to ‘pcmk-1’, ‘pcmk-2’,’pcmk-3’

pcmk-2: successful distribution of the file ‘corosync authkey’

pcmk-2: successful distribution of the file ‘pacemaker authkey’

pcmk-1: successful distribution of the file ‘corosync authkey’

pcmk-1: successful distribution of the file ‘pacemaker authkey’

pcmk-3: successful distribution of the file ‘corosync authkey’

pcmk-3: successful distribution of the file ‘pacemaker authkey’

Synchronizing pcsd SSL certificates on nodes ‘pcmk-1’, ‘pcmk-2’,’pcmk-3’…

pcmk-1: Success

pcmk-2: Success

pcmk-3: Success

Sending ‘corosync.conf’ to ‘pcmk-1’, ‘pcmk-2’,’pcmk-3’

pcmk-2: successful distribution of the file ‘corosync.conf’

pcmk-1: successful distribution of the file ‘corosync.conf’

pcmk-3: successful distribution of the file ‘corosync.conf’

Cluster has been successfully set up.
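If you want to see what was generated, the corosync configuration now lives on every node and should list all three hosts in its nodelist section (path per the default packaging):

# cat /etc/corosync/corosync.conf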

Step 4) Start and Verify the Cluster

1)     Start Cluster

After the cluster is configured, the next step is to start it on all nodes:

# pcs cluster start --all

pcmk-1: Starting Cluster…

pcmk-2: Starting Cluster…

pcmk-3: Starting Cluster…

We are not enabling the corosync and pacemaker services at boot time, so if a node goes down or is rebooted you need to start the cluster on it again, either with the command above or with pcs cluster start <hostname>.
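If you would rather have corosync and pacemaker come back automatically after a reboot, pcs can enable them on all nodes instead (optional; run on the control node):

# pcs cluster enable --all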

2)     Verify Corosync Installation

Use the command below to verify that corosync is running correctly:

# corosync-cfgtool -s

Printing ring status.

Local node ID 1

RING ID 0

        id      = 10.0.10.11

        status  = ring 0 active with no faults
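As an additional sanity check, you can ask corosync for its current membership; the output should contain entries for all three node IDs:

# corosync-cmapctl | grep members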

3)     Verify Pacemaker Installation

Once we have verified that corosync is functioning, verify that the pacemaker daemons are running using the command below:

# ps axf 

PID TTY      STAT   TIME COMMAND   

2 ?        S      0:00 [kthreadd]

…lots of processes…

11635 ?        SLsl   0:03 corosync

11642 ?        Ss     0:00 /usr/sbin/pacemakerd -f

11643 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib

11644 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd

11645 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd

11646 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd

11647 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine

11648 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd

After that, check the cluster status with pcs status:

# pcs status

Cluster name: mycluster

WARNING: no stonith devices and stonith-enabled is not false

Stack: corosync

Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

Last updated: Tue Dec 19 16:15:34 2023

Last change: Tue Dec 19 16:15:34 2023 by hacluster via crmd on pcmk-1 

3 nodes configured

0 resources configured 

Online: [ pcmk-1 pcmk-2 pcmk-3] 

No resources  

Daemon Status:

  corosync: active/disabled

  pacemaker: active/disabled

  pcsd: active/enabled

Step 5) Create an Active/Passive Cluster

1)    Configure fencing

Fencing protects your data from corruption, and your application from becoming unavailable, due to unintended concurrent access by a rogue node.

Fencing is also known as STONITH, or “Shoot The Other Node In The Head”.

It is possible to disable STONITH, but this is not recommended for a production cluster. To disable STONITH and then verify the resulting configuration, use the commands below:

# pcs property set stonith-enabled=false

# crm_verify -L
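crm_verify -L should now return without complaints. To confirm the property took effect, you can list the cluster properties; the output should include stonith-enabled: false:

# pcs property list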

2)     Add a Floating IP Resource

The first resource will be a unique IP address that the cluster can bring up on any node, regardless of where any cluster services are running. For this example we will use 10.0.10.14 as the floating IP and check every 30 seconds that the resource is still running.

# pcs resource create floating_ip ocf:heartbeat:IPaddr2 ip=10.0.10.14 cidr_netmask=24 op monitor interval=30s

Once the floating IP resource is created, we can check which node it is running on using the pcs status command:

# pcs status

Cluster name: mycluster

Stack: corosync

Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

Last updated: Tue Dec 19 16:15:34 2023

Last change: Tue Dec 19 16:15:34 2023 by root via cibadmin on pcmk-1 

3 nodes configured

1 resource configured 

Online: [ pcmk-1 pcmk-2 pcmk-3] 

Full list of resources:

 floating_ip   (ocf::heartbeat:IPaddr2):       Started pcmk-1

Daemon Status:

  corosync: active/disabled

  pacemaker: active/disabled

  pcsd: active/enabled

3)     Perform a Failover

Since our goal is high availability, we are going to test failover of the floating_ip resource. We do this by stopping the cluster on the node where floating_ip is currently running and checking whether the resource moves to a healthy node.

# pcs status

Cluster name: mycluster

Stack: corosync

Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

Last updated: Tue Dec 19 16:15:34 2023

Last change: Tue Dec 19 16:15:34 2023 by root via cibadmin on pcmk-1

3 nodes configured

1 resource configured 

Online: [ pcmk-1 pcmk-2 pcmk-3] 

Full list of resources:

 floating_ip   (ocf::heartbeat:IPaddr2):       Started pcmk-1

As expected, the pcs status output shows that the control node (pcmk-1) is running the floating_ip resource. The next step is to stop the cluster on pcmk-1 by running the command below:

# pcs cluster stop pcmk-1

pcmk-1: Stopping Cluster (pacemaker)…

pcmk-1: Stopping Cluster (corosync)…

Now check the cluster status again using the pcs status command:

# pcs status

Cluster name: mycluster

Stack: corosync

Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

Last updated: Tue Dec 19 16:15:34 2023

Last change: Tue Dec 19 16:15:34 2023 by root via cibadmin on pcmk-1 

3 nodes configured

1 resource configured 

Online: [pcmk-2 pcmk-3]

Offline: [ pcmk-1]

Full list of resources:

 floating_ip   (ocf::heartbeat:IPaddr2):       Started pcmk-2
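The floating IP has failed over to pcmk-2 as hoped. To complete the test, bring pcmk-1 back into the cluster; note that the resource will not necessarily move back to pcmk-1, since that depends on the resource-stickiness setting:

# pcs cluster start pcmk-1

pcmk-1: Starting Cluster…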
