Create a Highly Available NFS Service with Gluster and Oracle Linux


Introduction

In this lab, we will create an NFS service hosted by three instances: ol-node01, ol-node02, and ol-node03. These instances will replicate a Gluster volume for data redundancy and use clustering tools for service redundancy.

A fourth instance named ol-client will mount this NFS service for demonstration and testing.

This tutorial is targeted at Oracle Linux 8 users, but the commands also apply to other Oracle Linux releases.

Components

  • Corosync provides clustering infrastructure to manage which nodes are involved, their communication, and quorum.
  • Pacemaker manages cluster resources and rules of their behavior.
  • Gluster is a scalable and distributed filesystem.
  • Ganesha is an NFS server that can use many different backing filesystem types, including Gluster.

Objectives

In this lab, you'll learn to:

  • Create a Gluster volume
  • Configure Ganesha
  • Create a Cluster
  • Create Cluster services

Prerequisites

  • Four Oracle Linux 8 instances installed with the following configuration:
    • a non-root user with sudo permissions
    • ssh keypair for the non-root user
    • ability to ssh from one host (ol-node01) to the others (ol-node02,ol-node03) using passwordless ssh login
    • additional block volume for use with gluster

Setup Lab Environment

Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.

This lab involves multiple instances, and we will need to perform different steps on each. We recommend opening three terminal windows and connecting to ol-node01, ol-node02, and ol-node03, which avoids repeatedly logging in and out.

  1. If not already connected, open a terminal and connect via ssh to each instance mentioned above.

    ssh oracle@<ip_address_of_instance>

    Note: When a step says "(On all nodes)" in the lab, perform those actions on ol-node01, ol-node02, and ol-node03. We word the instruction in this way to remove redundancy as the action and result are identical.

Install Software

Enable the required Oracle Linux repositories before installing the Corosync, Ganesha, Gluster, and Pacemaker software.

  1. (On all nodes) Install the Gluster yum repository configuration.

    sudo dnf install -y oracle-gluster-release-el8
  2. (On all nodes) Enable the repositories.

    sudo dnf config-manager --enable ol8_addons ol8_UEKR6 ol8_appstream
  3. (On all nodes) Install the software.

    sudo dnf install -y corosync glusterfs-server nfs-ganesha-gluster pacemaker pcs pcp-zeroconf fence-agents-all

Create the Gluster volume

Prepare each attached block volume to create and activate a replicated Gluster volume.

  1. (On all nodes) Create an XFS filesystem on /dev/sdb with a label of gluster-000.

    sudo mkfs.xfs -f -i size=512 -L gluster-000 /dev/sdb
    • -f: Forces overwriting the device when detecting an existing filesystem.
    • -i size: Sets the filesystem's inode size, which defaults to a value of 256 bytes.
    • -L: Sets the filesystem label, which cannot exceed 12 characters in length.
  2. (On all nodes) Create a mountpoint, add a fstab(5) entry for a disk with the label gluster-000, and mount the filesystem.

    sudo mkdir -p /data/glusterfs/sharedvol/mybrick
    echo 'LABEL=gluster-000 /data/glusterfs/sharedvol/mybrick xfs defaults  0 0' | sudo tee -a /etc/fstab > /dev/null
    sudo mount /data/glusterfs/sharedvol/mybrick
  3. (On all nodes) Enable and start the Gluster service.

    sudo systemctl enable --now glusterd
  4. (On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by Gluster.

    sudo firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24
    sudo firewall-cmd --permanent --zone=trusted --add-service=glusterfs
    sudo firewall-cmd --reload
  5. (Optional) Ensure that each node has a resolvable name across all the nodes in the pool.

    Configure DNS resolution for each hostname, or use the /etc/hosts file instead. When using the hosts file, edit it on each node and add entries for all Gluster nodes.

    The free lab environment already has name resolution configured.
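
For self deployments that use the hosts file, the entries might look like the following on each node. These addresses are illustrative, not the lab's actual assignments:

```
10.0.0.151  ol-node01
10.0.0.152  ol-node02
10.0.0.153  ol-node03
```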

  6. (On ol-node01) Create the Gluster environment by adding peers.

    sudo gluster peer probe ol-node02
    sudo gluster peer probe ol-node03
  7. (On all nodes) Show that the peers have joined the environment.

    sudo gluster peer status

    Example Output:

    Number of Peers: 2
    
    Hostname: ol-node02
    Uuid: 2607976e-7004-47e8-821c-7c6985961cda
    State: Peer in Cluster (Connected)
    
    Hostname: ol-node03
    Uuid: c51cb4aa-fccd-47f7-9fb2-edb5766991d2
    State: Peer in Cluster (Connected)
  8. (On ol-node01) Create a Gluster volume named sharedvol, which replicates across the three hosts: ol-node01, ol-node02, and ol-node03.

    sudo gluster volume create sharedvol replica 3 ol-node0{1,2,3}:/data/glusterfs/sharedvol/mybrick/brick

    For more details on volume types, see the Creating and Managing Volumes section of the Oracle Linux Gluster Storage documentation.
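
The ol-node0{1,2,3} argument in the command above relies on shell brace expansion. You can preview the brick list the gluster command actually receives:

```shell
# Preview the brick specifications produced by brace expansion
# (one host:path pair per node)
printf '%s\n' ol-node0{1,2,3}:/data/glusterfs/sharedvol/mybrick/brick
```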

  9. (On ol-node01) Start the sharedvol Gluster volume.

    sudo gluster volume start sharedvol
  10. (On ol-node01) Verify that the replicated Gluster volume is now available from any node.

    sudo gluster volume info

    Example Output:

    Volume Name: sharedvol
    Type: Replicate
    Volume ID: 1608bc61-cd4e-4b64-a5f3-f5800b717f76
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: ol-node01:/data/glusterfs/sharedvol/mybrick/brick
    Brick2: ol-node02:/data/glusterfs/sharedvol/mybrick/brick
    Brick3: ol-node03:/data/glusterfs/sharedvol/mybrick/brick
    Options Reconfigured:
    storage.fips-mode-rchecksum: on
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: off
  11. (On ol-node01) Get the status of the Gluster volume.

    sudo gluster volume status

    Example Output:

    Status of volume: sharedvol
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick ol-node01:/data/glusterfs/sharedvol/m
    ybrick/brick                                49152     0          Y       78082
    Brick ol-node02:/data/glusterfs/sharedvol/m
    ybrick/brick                                49152     0          Y       77832
    Brick ol-node03:/data/glusterfs/sharedvol/m
    ybrick/brick                                49152     0          Y       77851
    Self-heal Daemon on localhost               N/A       N/A        Y       78099
    Self-heal Daemon on ol-node02               N/A       N/A        Y       77849
    Self-heal Daemon on ol-node03               N/A       N/A        Y       77868
     
    Task Status of Volume sharedvol
    ------------------------------------------------------------------------------
    There are no active volume tasks

Configure Ganesha

Ganesha is the NFS server that shares out the Gluster volume. In this example, we allow any NFS client to connect to our NFS share with read/write permissions.

  1. (On all nodes) Populate the file /etc/ganesha/ganesha.conf with the given configuration.

    sudo tee /etc/ganesha/ganesha.conf > /dev/null <<'EOF'
    EXPORT{
        Export_Id = 1 ;       # Unique identifier for each EXPORT (share)
        Path = "/sharedvol";  # Export path of our NFS share
    
        FSAL {
            name = GLUSTER;          # Backing type is Gluster
            hostname = "localhost";  # Hostname of Gluster server
            volume = "sharedvol";    # The name of our Gluster volume
        }
    
        Access_type = RW;          # Export access permissions
        Squash = No_root_squash;   # Control NFS root squashing
        Disable_ACL = FALSE;       # Enable NFSv4 ACLs
        Pseudo = "/sharedvol";     # NFSv4 pseudo path for our NFS share
        Protocols = "3","4" ;      # NFS protocols supported
        Transports = "UDP","TCP" ; # Transport protocols supported
        SecType = "sys";           # NFS Security flavors supported
    }
    EOF

For more options to control permissions, see the EXPORT {CLIENT{}} section of config_samples-export in the Additional Information section.
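
As an illustration, a CLIENT sub-block can narrow permissions for a specific set of hosts. The subnet below matches the lab network but is an assumption, not part of the lab configuration:

```
EXPORT {
    # ... existing export settings ...
    CLIENT {
        Clients = 10.0.0.0/24;  # Hosts this sub-block applies to
        Access_Type = RW;       # Access permissions for these clients
    }
}
```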

Create a Cluster

Create and start a Pacemaker/Corosync cluster using the three ol-nodes.

  1. (On all nodes) Set a shared password for the user hacluster.

    echo "hacluster:oracle" | sudo chpasswd

    NOTE: The password oracle is just an example for this lab. In production, set the password with the command sudo passwd hacluster and adhere to the password complexity requirements; otherwise, errors will appear.

  2. (On all nodes) Enable the Corosync and Pacemaker services.

    sudo systemctl enable corosync
    sudo systemctl enable pacemaker
  3. (On all nodes) Enable and start the configuration system service.

    sudo systemctl enable --now pcsd
  4. (On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by High Availability.

    sudo firewall-cmd --permanent --zone=trusted --add-service=high-availability
    sudo firewall-cmd --reload
  5. (On ol-node01) Authenticate with all cluster nodes using the hacluster user and password defined above.

    sudo pcs host auth ol-node01 ol-node02 ol-node03 -u hacluster -p oracle
  6. (On ol-node01) Create a cluster named HA-NFS.

    sudo pcs cluster setup HA-NFS ol-node01 ol-node02 ol-node03
  7. (On ol-node01) Start the cluster on all nodes.

    sudo pcs cluster start --all
  8. (On ol-node01) Enable the cluster to run on all nodes at boot time.

    sudo pcs cluster enable --all
  9. (On ol-node01) Disable STONITH.

    STONITH is a fencing mechanism for maintaining the integrity of nodes in a high-availability (HA) cluster. It automatically powers down, or fences, a node that is not working correctly, such as a node that has become unreachable by the other nodes in the cluster.

    STONITH is disabled for simplicity in this lab, but disabling it is not recommended in production.

    sudo pcs property set stonith-enabled=false
  10. (On any node) Check the cluster status.

    The cluster is now running.

    sudo pcs cluster status

    Example Output:

    Cluster Status:
     Cluster Summary:
       * Stack: corosync
       * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
       * Last updated: Wed May  4 16:47:55 2022
       * Last change:  Wed May  4 16:47:47 2022 by hacluster via crmd on ol-node03
       * 3 nodes configured
       * 0 resource instances configured
     Node List:
       * Online: [ ol-node01 ol-node02 ol-node03 ]
    
    PCSD Status:
      ol-node01: Online
      ol-node03: Online
      ol-node02: Online
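
The "partition with quorum" text in the summary means this partition holds a majority of the cluster's votes. With Corosync's default majority rule, a cluster of N nodes needs floor(N/2)+1 votes; a quick sketch of the arithmetic for the lab's three nodes:

```shell
# Majority quorum: more than half of the configured votes are required.
nodes=3
quorum=$(( nodes / 2 + 1 ))
echo "A $nodes-node cluster keeps quorum with $quorum votes (tolerates $(( nodes - quorum )) failure)."
```

This is why three nodes is the practical minimum for an HA cluster: a two-node cluster would lose quorum as soon as either node failed.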
  11. (On any node) Check the cluster's details, including resources, pacemaker status, and node details.

    sudo pcs status

    Example Output:

    Cluster name: HA-NFS
    Cluster Summary:
      * Stack: corosync
      * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
      * Last updated: Wed May  4 16:50:21 2022
      * Last change:  Wed May  4 16:47:47 2022 by hacluster via crmd on ol-node03
      * 3 nodes configured
      * 0 resource instances configured
    
    Node List:
      * Online: [ ol-node01 ol-node02 ol-node03 ]
    
    Full List of Resources:
      * No resources
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

Create Cluster Services

Create a Pacemaker resource group containing the resources necessary to host NFS services from the hostname nfs (10.0.0.100) defined as a floating secondary IP address on ol-node01.

  1. (On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by NFS.

    sudo firewall-cmd --permanent --zone=trusted --add-service=nfs
    sudo firewall-cmd --reload
  2. (On ol-node01) Create a systemd-based cluster resource to ensure nfs-ganesha is running.

    sudo pcs resource create nfs_server systemd:nfs-ganesha op monitor interval=10s
  3. (On ol-node01) Create an IP cluster resource used to present the NFS server.

    sudo pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24 op monitor interval=10s
  4. (On ol-node01) Join the Ganesha service and IP resource in a group to ensure they remain together on the same host.

    sudo pcs resource group add nfs_group nfs_server nfs_ip
  5. (On ol-node01) Verify the service is now running.

    sudo pcs status

    Example Output:

    Cluster name: HA-NFS
    Cluster Summary:
      * Stack: corosync
      * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
      * Last updated: Wed May  4 16:52:56 2022
      * Last change:  Wed May  4 16:52:39 2022 by root via cibadmin on ol-node01
      * 3 nodes configured
      * 2 resource instances configured
    
    Node List:
      * Online: [ ol-node01 ol-node02 ol-node03 ]
    
    Full List of Resources:
      * Resource Group: nfs_group:
        * nfs_server	(systemd:nfs-ganesha):	 Started ol-node01
        * nfs_ip	(ocf::heartbeat:IPaddr2):	 Started ol-node01
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

    Note: The DC (Designated Controller) node is where all the decisions get made, and if the current DC fails, Corosync elects a new one from the remaining cluster nodes. The choice of DC has no significance to an administrator beyond the fact that its logs are generally more interesting.

Update the IPaddr2 resource agent

When a node in the cluster stops responding, Pacemaker and Corosync invoke the IPaddr2 resource agent.

We will customize this agent to include details of our deployment (such as the VNIC OCIDs and IP addresses), which it uses when calling the Oracle Cloud Infrastructure Command Line Interface (OCI CLI). The CLI does the heavy lifting by asking OCI to migrate the secondary IP address from one node to another.

  1. (On all nodes) Install the Oracle Linux Developer repository.

    sudo dnf install -y oraclelinux-developer-release-el8

    The repository is already installed and available in the free lab environment.

  2. (On all nodes) Install the OCI CLI.

    sudo dnf install -y python36-oci-cli
  3. (On ol-node01) Verify the OCI CLI install.

    The free lab environment uses Instance Principal authorization for the OCI CLI. For self deployments, configure the same or set up the OCI CLI configuration file.

    export LC_ALL=C.UTF-8
    oci os ns get --auth instance_principal
  4. (On all nodes) Make a backup of the IPaddr2 file.

    sudo cp /usr/lib/ocf/resource.d/heartbeat/IPaddr2 /usr/lib/ocf/resource.d/heartbeat/IPaddr2.bak
  5. (On all nodes) Run the script to update the IPaddr2 file.

    The script makes its changes within the add_interface() function because, when a node fails, Corosync/Pacemaker run IPaddr2 to move the resources to another node in the cluster, and IPaddr2 calls this function during that process.

    sudo ./update-ipaddr2.sh

    A sample version of the script is available for reference:

    https://luna.oracle.com/api/v1/labs/2bf5d9a2-7afc-4286-97ef-386427e3ebea/gitlab/tutorial/files/update-ipaddr2.sh
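
For reference, the core of such a script is an OCI CLI call of the following general shape. The VNIC OCID below is a placeholder you would look up for the local instance, so this is a sketch rather than the lab script itself:

```shell
# Illustrative sketch: move the floating secondary IP to this node's VNIC.
# Replace <vnic_ocid> with the local instance's VNIC OCID.
oci network vnic assign-private-ip \
    --vnic-id <vnic_ocid> \
    --ip-address 10.0.0.100 \
    --unassign-if-already-assigned \
    --auth instance_principal
```

The --unassign-if-already-assigned flag is what makes the migration work: it releases the secondary IP from whichever VNIC currently holds it before assigning it to the new one.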

Test NFS availability using a client

If not already open and connected, we recommend opening two terminal windows for these steps as we test failover with ol-node01 and ol-client.

  1. If not already connected, open a terminal and connect via ssh to ol-node01 and ol-client.

    ssh oracle@<ip_address_of_instance>
  2. (On ol-client) Mount the NFS service provided by our cluster and create a file.

    sudo dnf install -y nfs-utils
    sudo mkdir /sharedvol
    sudo mount -t nfs nfs:/sharedvol /sharedvol
    df -h /sharedvol/
    echo "Hello from Oracle CloudWorld" | sudo tee /sharedvol/hello > /dev/null
  3. (On ol-node01) Identify the host running the nfs_group resources and put it in standby mode to stop running services.

    sudo pcs status

    Example Output:

    Cluster name: HA-NFS
    Cluster Summary:
      * Stack: corosync
      * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
      * Last updated: Thu May  5 00:48:07 2022
      * Last change:  Thu May  5 00:47:50 2022 by root via crm_resource on ol-node01
      * 3 nodes configured
      * 2 resource instances configured
    
    Node List:
      * Online: [ ol-node01 ol-node02 ol-node03 ]
    
    Full List of Resources:
      * Resource Group: nfs_group:
        * nfs_server	(systemd:nfs-ganesha):	    Started ol-node01
        * nfs_ip	      (ocf::heartbeat:IPaddr2):	 Started ol-node01
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

    Put the node currently running the resources into standby mode:

    sudo pcs node standby ol-node01
  4. (On ol-node01) Verify that the nfs_group resources have moved to another node.

    sudo pcs status

    Example Output:

    Cluster name: HA-NFS
    Cluster Summary:
      * Stack: corosync
      * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
      * Last updated: Thu May  5 00:53:19 2022
      * Last change:  Thu May  5 00:53:08 2022 by root via cibadmin on ol-node01
      * 3 nodes configured
      * 2 resource instances configured
    
    Node List:
      * Node ol-node01: standby
      * Online: [ ol-node02 ol-node03 ]
    
    Full List of Resources:
      * Resource Group: nfs_group:
        * nfs_server	(systemd:nfs-ganesha):	    Started ol-node02
        * nfs_ip	      (ocf::heartbeat:IPaddr2):	 Started ol-node02
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
  5. (On ol-node02) Verify the floating IP address moved from ol-node01 to ol-node02.

    ip a

    Example Output:

    ...
    2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 02:00:17:06:6a:dd brd ff:ff:ff:ff:ff:ff
        inet 10.0.0.151/24 brd 10.0.0.255 scope global dynamic ens3
           valid_lft 83957sec preferred_lft 83957sec
        inet 10.0.0.100/24 brd 10.0.0.255 scope global secondary ens3
           valid_lft forever preferred_lft forever
        inet6 fe80::17ff:fe06:6add/64 scope link 
           valid_lft forever preferred_lft forever
  6. (On ol-client) Verify the file is still accessible.

    This action has a short delay as the service moves from one node to another.

    sudo ls -la /sharedvol/
    sudo cat /sharedvol/hello
  7. (On ol-node01) Bring the standby node back into the cluster.

    sudo pcs node unstandby ol-node01
  8. (On ol-node01) Verify that the node is back in the cluster.

    sudo pcs status
  9. (On ol-node01) Move resources back to ol-node01.

    sudo pcs resource move nfs_ip ol-node01
  10. (On ol-node01) Verify that the resources moved back to ol-node01.

    sudo pcs status
  11. (On ol-node01) Verify the floating IP address moved from ol-node02 to ol-node01.

    ip a

We now understand how to use Pacemaker/Corosync to create highly available services backed by Gluster.

(Optional) Enable Gluster encryption

Create a self-signed certificate for each node and have its peers trust it.

For more options, see Setting up Transport Layer Security in the Gluster Storage for Oracle Linux User's Guide

  1. (On all nodes) Create a private key and create a certificate for this host signed with this key.

    sudo openssl genrsa -out /etc/ssl/glusterfs.key 2048
    sudo openssl req -new -x509 -days 365 -key /etc/ssl/glusterfs.key \
                                          -out /etc/ssl/glusterfs.pem \
                                          -subj "/CN=${HOSTNAME}/"
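
Before distributing the certificates, you can confirm the CN that peers will see by reading back a certificate's subject. This self-contained sketch repeats the same openssl invocation against a throwaway key in a temporary directory, so no sudo is needed; the CN example-host is illustrative:

```shell
# Generate a throwaway key and certificate the same way as above,
# then read back the subject that peers will see
tmp=$(mktemp -d)
openssl genrsa -out "$tmp/test.key" 2048
openssl req -new -x509 -days 365 -key "$tmp/test.key" \
        -out "$tmp/test.pem" -subj "/CN=example-host/"
openssl x509 -noout -subject -in "$tmp/test.pem"
rm -rf "$tmp"
```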
  2. (On ol-node01) Combine the certificate from each node into one file all nodes can trust.

    cat /etc/ssl/glusterfs.pem > ~/combined.ca.pem
    ssh ol-node02 cat /etc/ssl/glusterfs.pem >> ~/combined.ca.pem 
    ssh ol-node03 cat /etc/ssl/glusterfs.pem >> ~/combined.ca.pem 
  3. (On ol-node01) Copy the combined list of trusted certificates to the local system of each node for Gluster use.

    sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca
    scp ~/combined.ca.pem ol-node02:~
    scp ~/combined.ca.pem ol-node03:~
    ssh -t ol-node02 sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca  > /dev/null 2>&1
    ssh -t ol-node03 sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca > /dev/null 2>&1
    • The -t option allows running remote ssh commands with sudo.
  4. (On all nodes) Enable encryption for Gluster management traffic.

    sudo touch /var/lib/glusterd/secure-access
  5. (On ol-node01) Enable encryption on the Gluster volume sharedvol.

    sudo gluster volume set sharedvol client.ssl on
    sudo gluster volume set sharedvol server.ssl on
  6. (On all nodes) Restart the Gluster service.

    sudo systemctl restart glusterd
  7. (On any node) Verify the Gluster volume has transport encryption enabled.

    sudo gluster volume info

    Example Output:

    Volume Name: sharedvol
    Type: Replicate
    Volume ID: 674b73a8-8c09-457e-8996-4417db16651e
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: ol-node01:/data/glusterfs/sharedvol/mybrick/brick
    Brick2: ol-node02:/data/glusterfs/sharedvol/mybrick/brick
    Brick3: ol-node03:/data/glusterfs/sharedvol/mybrick/brick
    Options Reconfigured:
    performance.client-io-threads: off
    nfs.disable: on
    transport.address-family: inet
    storage.fips-mode-rchecksum: on
    client.ssl: on
    server.ssl: on
