Backup Control Plane Nodes on Oracle Cloud Native Environment
Introduction
Oracle Cloud Native Environment ships with a module that allows an administrator to back up and restore the control plane configuration files. This tutorial covers performing a backup, inspecting the backup file, and then restoring from the backup file.
Objectives
In this lab, you'll learn how to:
- Back up a control plane configuration
- Inspect the backup file
- Restore the backup file
Prerequisites
Minimum of a 3-node Oracle Cloud Native Environment cluster:
- Operator node
- Kubernetes control plane node
- Kubernetes worker node
Each system should have Oracle Linux installed and configured with:
- An Oracle user account (used during the installation) with sudo access
- Key-based SSH, also known as password-less SSH, between the hosts
- Installation of Oracle Cloud Native Environment
Deploy Oracle Cloud Native Environment
Note: If running in your own tenancy, read the linux-virt-labs
GitHub project README.md and complete the prerequisites before deploying the lab environment.
Open a terminal on the Luna Desktop.
Clone the linux-virt-labs GitHub project.
git clone https://github.com/oracle-devrel/linux-virt-labs.git
Change into the working directory.
cd linux-virt-labs/ocne
Install the required collections.
ansible-galaxy collection install -r requirements.yml
Deploy the lab environment.
ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6"
The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.
Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Back Up the Control Plane Node
A proper backup strategy for the control plane node, especially the etcd database, is essential for cluster administration. Although high-availability clusters gain resilience from replication and failover, that does not replace the need for backups.
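This tutorial runs the backup manually, but the same olcnectl command (shown in the steps below) could be scheduled as part of a backup strategy. A minimal, hypothetical sketch that stages a nightly cron entry; it writes to a temporary file here for illustration, whereas on a real operator node the line would go in /etc/cron.d/ (the environment and cluster names are the ones used in this lab):

```shell
# Hypothetical sketch: stage a nightly 02:00 control plane backup job for the
# oracle user. A temp file stands in for /etc/cron.d/ocne-backup.
cronfile=$(mktemp)
echo '0 2 * * * oracle olcnectl module backup --environment-name myenvironment --name mycluster' > "$cronfile"
cat "$cronfile"
```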
Open a terminal and connect via SSH to the ocne-operator node.
ssh oracle@<ip_address_of_node>
Confirm installation of the cluster.
olcnectl module instances --environment-name myenvironment
Confirm the cluster is running.
ssh ocne-control-01 kubectl get nodes
You can keep the cluster up and running while performing a backup as part of your disaster recovery plan.
Create a backup file.
olcnectl module backup --environment-name myenvironment --name mycluster
Important: The backup only contains the key containers required for the Kubernetes control plane node. It does not back up any application containers.
Check for the new backup files.
The backup module writes the backup files to
/var/olcne/backups/<environment-name>/<module-name>/<cluster-name>
in a timestamped directory.
Example:
total 2368
drwxr-x---. 2 olcne olcne      75 Apr 22 18:53 .
drwxr-x---. 3 olcne olcne      28 Apr 22 18:53 ..
-rwxr-x---. 1 olcne olcne 2304000 Apr 22 18:53 etcd.tar
-rw-r--r--. 1 olcne olcne    1100 Apr 22 18:53 module-config.json
-rwxr-x---. 1 olcne olcne  112640 Apr 22 18:53 ocne-control-01.tar
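Because each backup lands in a directory named by timestamp, the newest backup is simply the last directory name in sorted order. A self-contained sketch of that lookup (a temporary directory and illustrative timestamp names stand in for the real /var/olcne/backups path):

```shell
# Self-contained sketch: timestamped directory names sort lexicographically,
# so the most recent backup sorts last. mktemp stands in for
# /var/olcne/backups/<environment-name>/<module-name>/<cluster-name>.
backup_root=$(mktemp -d)
mkdir -p "$backup_root/20240422-185300" "$backup_root/20240423-091500"
latest=$(ls -1 "$backup_root" | sort | tail -n 1)
echo "latest backup: $latest"
```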
Verify the Backup File
One of the most critical parts of the backup is etcd. The name etcd comes from the Linux /etc directory plus "d" for distributed: like /etc, it holds configuration data, but for a distributed system such as Oracle Cloud Native Environment. Oracle Cloud Native Environment uses etcd as the primary data store for the Kubernetes cluster's configuration data and cluster settings.
Within the backup file, the etcd.tar
file contains the etcd
database, while the ocne-control-01.tar
file contains other configurations such as certificates, endpoints, and deployment tracking.
Change into the directory containing the backup files.
List the files within the etcd.tar file.
tar -tvf etcd.tar
The output shows the backup of the etcd database and its member list.
Example:
-rw------- root/root 2220064 2024-04-24 12:51 var/olcne/scratch/etcd.backup
-rw-r--r-- root/root      39 2024-04-24 12:51 var/olcne/scratch/etcd.member
Show the contents of the etcd.member file.
tar xfO etcd.tar var/olcne/scratch/etcd.member
Example:
ocne-control-01=https://10.0.0.54:2380
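Each line in etcd.member is a name=peer-URL pair. If you ever need the pieces separately in a script, shell parameter expansion splits them cleanly; this sketch uses the example value from this lab:

```shell
# Split an etcd member entry (name=peer-URL) into its parts using
# parameter expansion; the value below is the example from this lab.
member='ocne-control-01=https://10.0.0.54:2380'
name=${member%%=*}   # everything before the first "="
peer=${member#*=}    # everything after the first "="
echo "name: $name"
echo "peer URL: $peer"
```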
Perform a diff between the backup and active files.
ssh ocne-control-01 sudo cat /etc/kubernetes/manifests/etcd.yaml | diff - <(tar xfO ocne-control-01.tar etc/kubernetes/manifests/etcd.yaml)
This one-liner grabs the contents of the active file over SSH to the control plane node and pipes them to the diff command. On the diff side, the - is the placeholder for the piped-in contents, and the <( ) process substitution supplies the output of tar, which extracts the file's contents from within the backup archive. If the command returns nothing, no differences exist between the files.
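You can reproduce the same compare-against-archive pattern locally to see how the - placeholder behaves. This self-contained sketch builds a tiny archive with an illustrative one-line manifest, then diffs the live file against the copy inside the tar:

```shell
# Self-contained demo of diffing a live file against its copy inside a tar.
# A scratch directory and a one-line stand-in manifest replace the real backup.
work=$(mktemp -d)
mkdir -p "$work/etc/kubernetes/manifests"
echo 'apiVersion: v1' > "$work/etc/kubernetes/manifests/etcd.yaml"
tar -C "$work" -cf "$work/backup.tar" etc/kubernetes/manifests/etcd.yaml
# Pipe the archived copy into diff; "-" reads the piped contents, and an
# empty result means the live file and the backup copy are identical.
difference=$(tar xfO "$work/backup.tar" etc/kubernetes/manifests/etcd.yaml \
  | diff "$work/etc/kubernetes/manifests/etcd.yaml" -)
echo "differences: ${difference:-none}"
```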
Restore from Backup
To show that the backup is working, we need to change the configuration of the Kubernetes cluster. We'll do this by creating a new pod running Nginx.
Check for existing pods in the default namespace.
The SSH command runs this check on the control plane node.
ssh ocne-control-01 kubectl get pod
The result shows
No resources found in default namespace.
Deploy a new pod running Nginx.
ssh ocne-control-01 kubectl run newpod --image=nginx
Verify the new pod is running.
ssh ocne-control-01 kubectl get pod
Example:
NAME     READY   STATUS    RESTARTS   AGE
newpod   1/1     Running   0          16s
If the STATUS does not report as Running, run the command a few times until the deployment is complete.
Restore the backup.
olcnectl module restore --environment-name myenvironment --name mycluster --log-level info
Reply y to the prompt to continue restoring the backup.
Note: Kubernetes cordons and drains the nodes during the restore, which can take 15-20 minutes to complete. Adding the --log-level option to the restore command displays more details; without it, the command runs silently without showing progress.
Recheck for existing pods in the default namespace.
ssh ocne-control-01 kubectl get pod
The result shows
No resources found in default namespace
, confirming the restore succeeded: it removed the configuration changes made to the cluster after the initial backup.
Summary
A successful restore demonstrates how to back up and restore the control plane on Oracle Cloud Native Environment. It also illustrates why administrators should take regular backups after configuration changes and deployments, so the cluster can be restored to those specific points in time without losing the changes.