Scale a Kubernetes Cluster on Oracle Cloud Native Environment
Introduction
This tutorial demonstrates how to scale an existing Kubernetes cluster in Oracle Cloud Native Environment.
Scaling up a Kubernetes cluster means adding nodes; likewise, scaling down means removing nodes. Nodes can be either control plane or worker nodes. Oracle recommends against scaling the cluster up and down at the same time. Instead, perform a scale up, then a scale down, in two separate commands.
To avoid split-brain scenarios and maintain quorum, it is recommended to scale the Kubernetes cluster control plane or worker nodes in odd numbers. For example, using 3, 5, or 7 control plane or worker nodes helps ensure the reliability of the cluster.
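The reasoning behind odd numbers can be illustrated with simple majority-quorum arithmetic. The following is a minimal sketch, assuming a plain majority rule among voting members (such as the etcd members running on the control plane nodes); the function names quorum and fault_tolerance are hypothetical and only used for illustration.

```shell
# Majority quorum for a group of N voting members: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority still remains.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare odd and even member counts.
for n in 3 4 5 6 7; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Under this rule, going from 3 to 4 members raises the quorum without allowing any additional failures, which is why an even member count adds cost but no extra resilience.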
This tutorial uses an existing Highly Available Kubernetes cluster running on Oracle Cloud Native Environment, and has three modules deployed:
- Kubernetes (kubernetes)
- Helm (helm)
- Oracle Cloud Infrastructure Cloud Controller Manager Module (oci-ccm)
The starting deployment consists of the following:
- 1 Operator Node
- 3 Control Plane Nodes
- 5 Worker Nodes
It builds upon the labs:
- Deploy Oracle Cloud Native Environment
- Deploy an External Load Balancer with Oracle Cloud Native Environment
- Use OCI Cloud Controller Manager on Oracle Cloud Native Environment
Objectives
This tutorial/lab steps through configuring and adding two new control plane nodes and two new worker nodes to the cluster. The tutorial/lab then demonstrates how to scale down the cluster by removing the same nodes from the cluster.
In this scenario, X.509 Private CA Certificates are used to secure communication between the nodes. There are other methods to manage and deploy the certificates, such as by using HashiCorp Vault secrets manager, or by using your own certificates, signed by a trusted Certificate Authority (CA). These other methods are not included in this tutorial.
Prerequisites
Note: If using the free lab environment these prerequisites are provided as the starting point.
In addition to the requirement of a Highly Available Kubernetes cluster running on Oracle Cloud Native Environment, the following is needed:
4 additional Oracle Linux systems to use as:
- 2 Kubernetes control plane nodes
- 2 Kubernetes worker nodes
Access to a Load Balancer (the free lab environment uses the OCI Load Balancer)
Systems should have:
- at a minimum, the latest Oracle Linux 8 (x86_64) installed and running the Unbreakable Enterprise Kernel Release 6 (UEK R6).
- completed the prerequisite steps to install Oracle Cloud Native Environment
Set Up Lab Environment
Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.
Information: The free lab environment deploys Oracle Cloud Native Environment on the provided nodes, ready for creating environments. This deployment takes approximately 20-25 minutes to finish after launch. Therefore, you might want to step away while this runs and then return to complete the lab.
Unless otherwise stated, all steps within the free lab environment can be executed from the ocne-operator node, so it is recommended to start by opening a terminal window and connecting to that node. In a multi-node installation of Oracle Cloud Native Environment, the kubectl commands are run on either the operator, a control plane node, or another system configured for kubectl.
Open a terminal and connect via ssh to the ocne-operator system.
ssh oracle@<ip_address_of_ol_node>
Install the Kubernetes Module
Important: All operations, unless stated otherwise, are executed from the ocne-operator node.
The free lab environment creates a Highly Available Oracle Cloud Native Environment install during deployment, including preparing the environment and module configuration.
View the myenvironment.yaml file.
cat ~/myenvironment.yaml
The free lab environment deployment uses three control plane nodes and five worker nodes when creating the cluster.
Example Output:
[oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
environments:
  - environment-name: myenvironment
    globals:
      api-server: 127.0.0.1:8091
      secret-manager-type: file
      olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
      olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
      olcne-node-key-path: /etc/olcne/configs/certificates/production/node.key
    modules:
      - module: kubernetes
        name: mycluster
        args:
          container-registry: container-registry.oracle.com/olcne
          load-balancer: 10.0.0.168:6443
          master-nodes:
            - ocne-control01.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-control02.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-control03.lv.vcnf998d566.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090
            - ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090
          selinux: enforcing
          restrict-service-externalip: true
          restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
          restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
          restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
      - module: helm
        name: myhelm
        args:
          helm-kubernetes-module: mycluster
      - module: oci-ccm
        name: myoci
        oci-ccm-helm-module: myhelm
        oci-use-instance-principals: true
        oci-compartment: ocid1.compartment.oc1..aaaaaaaau2g2k23u6mp3t43ky3i4ky7jpyeiqcdcobpbcb7z6vjjlrdnuufq
        oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaaw6qx2pia2xkfmnnknpk3jll6emb76gtcza3ttbqqofxmwjb45rka
        oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaawfjs5zrb6wdmg43522a4l5aak5zr6vvkaaa6xogttha2ufsip7fq
The domain part of the FQDN for the nodes will be unique on each deployment of the free lab environment.
Install the Kubernetes module.
olcnectl module install --config-file myenvironment.yaml
Note: The deployment of Kubernetes to the nodes will take 20-25 minutes to complete.
Example Output:
[oracle@ocne-operator ~]$ olcnectl module install --config-file myenvironment.yaml
Modules installed successfully.
Modules installed successfully.
Modules installed successfully.
Why are there three Modules installed successfully responses? This is because the myenvironment.yaml file used in this example defines three separate modules:
- module: kubernetes
- module: helm
- module: oci-ccm
It is important to understand this because later in these steps some responses will also be returned three times - once for each module defined in the myenvironment.yaml file.
Verify the deployment of the Kubernetes module.
olcnectl module instances --config-file myenvironment.yaml
Example Output:
[oracle@ocne-operator ~]$ olcnectl module instances --config-file myenvironment.yaml
INSTANCE                                           MODULE      STATE
mycluster                                          kubernetes  installed
myhelm                                             helm        installed
myoci                                              oci-ccm     installed
ocne-control01.lv.vcnf998d566.oraclevcn.com:8090   node        installed
ocne-control02.lv.vcnf998d566.oraclevcn.com:8090   node        installed
ocne-control03.lv.vcnf998d566.oraclevcn.com:8090   node        installed
ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090    node        installed
ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090    node        installed
Set up kubectl
Set up the kubectl command.
Copy the configuration file from one of the control plane nodes.
mkdir -p $HOME/.kube
ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
Example Output:
[oracle@ocne-operator ~]$ mkdir -p $HOME/.kube
[oracle@ocne-operator ~]$ ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
Warning: Permanently added '10.0.0.150' (ECDSA) to the list of known hosts.
Export the configuration for use by the kubectl command.
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
Verify kubectl works.
kubectl get nodes
Example Output:
[oracle@ocne-operator ~]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE   VERSION
ocne-control01   Ready    control-plane,master   17m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   16m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   15m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 16m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 15m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 14m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 15m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 15m   v1.23.7+1.el8
[oracle@ocne-operator ~]$
Confirm the Oracle Cloud Infrastructure Cloud Controller Manager Module is Ready
Before proceeding it is important to wait for the Oracle Cloud Infrastructure Cloud Controller Manager module to establish communication with the OCI API. The Oracle Cloud Infrastructure Cloud Controller Manager module runs a pod on each node that handles functionality such as attaching the block storage. After being installed, this controller prevents any pods from being scheduled until this dedicated pod confirms it is initialized, running and communicating with the OCI API. Until this communication has been successfully established, any attempt to proceed is likely to prevent successful use of cloud storage or load balancers by Kubernetes.
Retrieve the status of the component oci-cloud-controller-manager pods.
kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
Example Output:
[oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
NAME                                 READY   STATUS    RESTARTS      AGE
oci-cloud-controller-manager-9d9gh   1/1     Running   1 (48m ago)   50m
oci-cloud-controller-manager-jqzs6   1/1     Running   0             50m
oci-cloud-controller-manager-xfm9w   1/1     Running   0             50m
Retrieve the status of the role csi-oci pods.
kubectl -n kube-system get pods -l role=csi-oci
Example Output:
[oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l role=csi-oci
NAME                                  READY   STATUS    RESTARTS      AGE
csi-oci-controller-7fcbddd746-2hb5c   4/4     Running   2 (50m ago)   51m
csi-oci-node-7jd6t                    3/3     Running   0             51m
csi-oci-node-fc5x5                    3/3     Running   0             51m
csi-oci-node-jq8sm                    3/3     Running   0             51m
csi-oci-node-jqkvl                    3/3     Running   0             51m
csi-oci-node-jwq8g                    3/3     Running   0             51m
csi-oci-node-jzxqt                    3/3     Running   0             51m
csi-oci-node-rmmmb                    3/3     Running   0             51m
csi-oci-node-zc287                    1/3     Running   0             51m
Note: Wait for both of these commands to show the STATUS as Running before proceeding further.
If the values under the READY column do not show all of the containers as started, and those under the STATUS column do not show as Running after 15 minutes, please restart the lab.
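The readiness check described in the note can be automated instead of being read by eye. The following is a minimal sketch that parses the default kubectl get pods table from standard input; the function name all_pods_ready is hypothetical, and the column positions (NAME, READY, STATUS) are assumed to match the example output shown above.

```shell
# Reads `kubectl get pods` output on stdin. Succeeds only if at least one
# pod is listed and every pod has STATUS "Running" and READY of the form n/n.
all_pods_ready() {
  awk '
    $1 == "NAME" { next }   # skip the header row
    NF == 0      { next }   # skip blank lines
    {
      rows++
      split($2, r, "/")     # READY column, for example 3/3
      if ($3 != "Running" || r[1] != r[2]) {
        bad++
        print "not ready: " $1
      }
    }
    END { if (rows == 0 || bad > 0) exit 1 }
  '
}

# Example use against a live cluster (requires kubectl):
#   until kubectl -n kube-system get pods -l role=csi-oci | all_pods_ready; do
#     sleep 30
#   done
```

With the example output above, the check would report csi-oci-node-zc287 as not ready, because only 1 of its 3 containers has started.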
(Optional) Set up the New Kubernetes Nodes
Note: The steps in this section are not required in the free lab environment because they have already been completed during the initial deployment of the lab. Please skip forward to the next section and continue from there.
When scaling up (adding nodes), any new nodes require all of the prerequisites listed in the Prerequisites section of this tutorial to be met.
In this tutorial/lab, the nodes ocne-control04 and ocne-control05 are the new control plane nodes, while the nodes ocne-worker06 and ocne-worker07 are the new worker nodes. Besides the prerequisites, these new nodes require installing and enabling the Oracle Cloud Native Environment Platform Agent.
Install and enable the Platform Agent.
sudo dnf install olcne-agent olcne-utils
sudo systemctl enable olcne-agent.service
If using a proxy server, configure it with CRI-O. On each Kubernetes node, create a CRI-O systemd configuration directory. Create a file named proxy.conf in the directory and add the proxy server information.
sudo mkdir /etc/systemd/system/crio.service.d
sudo vi /etc/systemd/system/crio.service.d/proxy.conf
Substitute the appropriate proxy values for your environment, using this example proxy.conf file:
[Service]
Environment="HTTP_PROXY=proxy.example.com:80"
Environment="HTTPS_PROXY=proxy.example.com:80"
Environment="NO_PROXY=.example.com,192.0.2.*"
If the docker or containerd service is running, stop and disable it.
sudo systemctl disable --now docker.service
sudo systemctl disable --now containerd.service
Set up X.509 Private CA Certificates
Set up X.509 Private CA Certificates for the new control plane nodes and the worker nodes.
Create a list of new nodes.
VAR1=$(hostname -d)
for NODE in 'ocne-control04' 'ocne-control05' 'ocne-worker06' 'ocne-worker07'; do VAR2+="${NODE}.$VAR1,"; done
VAR2=${VAR2%,}
The provided bash script grabs the domain name of the operator node and creates a comma separated list of the nodes to add to the cluster during the scale up procedure.
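Because the loop above appends to VAR2 with +=, running it a second time in the same shell session extends the previous value rather than replacing it. As an alternative, the list can be built inside a small function so that each call starts from an empty string. This is a sketch only; the function name build_node_list is hypothetical.

```shell
# Builds a comma-separated list of FQDNs from a domain and short host names.
# Usage: build_node_list <domain> <host> [<host> ...]
build_node_list() {
  local domain=$1 list="" host
  shift
  for host in "$@"; do
    # Prefix a comma only when the list already has an entry.
    list="${list:+$list,}${host}.${domain}"
  done
  printf '%s\n' "$list"
}

# Example use on the operator node:
#   VAR2=$(build_node_list "$(hostname -d)" ocne-control04 ocne-control05 ocne-worker06 ocne-worker07)
```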
Generate a set of certificates for the new nodes, signed by the existing private CA.
Use the --byo-ca-cert option to specify the location of the existing CA Certificate, and the --byo-ca-key option to specify the location of the existing CA Key. Use the --nodes option and provide the FQDN of the new control plane and worker nodes.
cd /etc/olcne
sudo ./gen-certs-helper.sh \
--cert-dir /etc/olcne/configs/certificates/ \
--byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
--byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
--nodes $VAR2
Example Output:
[oracle@ocne-operator ~]$ cd /etc/olcne
[oracle@ocne-operator olcne]$ sudo ./gen-certs-helper.sh \
> --cert-dir /etc/olcne/configs/certificates/ \
> --byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
> --byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
> --nodes $VAR2
[INFO] Generating certs for ocne-control04.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
.............................+++++
....................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
[INFO] Generating certs for ocne-control05.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
...+++++
...........................................................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
[INFO] Generating certs for ocne-worker06.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
......+++++
.......................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
[INFO] Generating certs for ocne-worker07.lv.vcnf998d566.oraclevcn.com
Generating RSA private key, 2048 bit long modulus (2 primes)
....................................................................................+++++
.................................+++++
e is 65537 (0x010001)
Signature ok
subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
Getting CA Private Key
-----------------------------------------------------------
Script To Transfer Certs: /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
-----------------------------------------------------------
[SUCCESS] Generated certs and file transfer script!
[INFO] CA Cert: /etc/olcne/configs/certificates/production/ca.key
[INFO] CA Key: /etc/olcne/configs/certificates/production/ca.cert
[WARNING] The CA Key is the only way to generate more certificates, ensure it is stored in long term storage
[USER STEP #1] Please ensure you have ssh access from this machine to: ocne-control04.lv.vcnf998d566.oraclevcn.com,ocne-control05.lv.vcnf998d566.oraclevcn.com,ocne-worker06.lv.vcnf998d566.oraclevcn.com,ocne-worker07.lv.vcnf998d566.oraclevcn.com
Transfer Certificates
Transfer the newly created certificates from the operator node to all of the new nodes.
Update the user details in the provided transfer script.
sudo sed -i 's/USER=opc/USER=oracle/g' configs/certificates/olcne-tranfer-certs.sh
Note: The tutorial requires this step because the script's default user is opc. Since both this tutorial and the free lab environment install the product using the user oracle, update the USER variable within the script accordingly.
Update the permissions for each node.key generated by the certificate creation script.
sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-control*/node.key
sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-operator*/node.key
sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-worker*/node.key
Transfer the certificates to each of the new nodes.
Note: This step requires passwordless SSH configured between the nodes. Configuring this is outside the scope of this tutorial, but it is pre-configured in the free lab environment.
bash -ex /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
Configure the Platform Agent to Use the Certificates
Configure the Platform Agent on each new node to use the certificates copied over in the previous step. We accomplish this task from the operator node by running the command over ssh.
Configure the ocne-control04 node.
ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-control04,10.0.0.153' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:29:37 GMT; 2s ago
 Main PID: 152809 (olcne-agent)
    Tasks: 8 (limit: 202294)
   Memory: 11.1M
   CGroup: /system.slice/olcne-agent.service
           └─152809 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:29:37 ocne-control04 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:29:37 ocne-control04 olcne-agent[152809]: time=30/08/22 15:29:37 level=info msg=Started server on[::]:8090
Configure the ocne-control05 node.
ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-control05,10.0.0.154' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:34:13 GMT; 2s ago
 Main PID: 153413 (olcne-agent)
    Tasks: 7 (limit: 202294)
   Memory: 9.1M
   CGroup: /system.slice/olcne-agent.service
           └─153413 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:34:13 ocne-control05 systemd[1]: olcne-agent.service: Succeeded.
Aug 30 15:34:13 ocne-control05 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:34:13 ocne-control05 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:34:13 ocne-control05 olcne-agent[153413]: time=30/08/22 15:34:13 level=info msg=Started server on[::]:8090
Configure the ocne-worker06 node.
ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-worker06,10.0.0.165' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:41:08 GMT; 2s ago
 Main PID: 153988 (olcne-agent)
    Tasks: 8 (limit: 202294)
   Memory: 5.2M
   CGroup: /system.slice/olcne-agent.service
           └─153988 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:41:08 ocne-worker06 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:41:08 ocne-worker06 olcne-agent[153988]: time=30/08/22 15:41:08 level=info msg=Started server on[::]:8090
Configure the ocne-worker07 node.
ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
--olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
--olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
--olcne-component agent'
Example Output:
[oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
> --secret-manager-type file \
> --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
> --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
> --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
> --olcne-component agent'
Warning: Permanently added 'ocne-worker07,10.0.0.166' (ECDSA) to the list of known hosts.
● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
   Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/olcne-agent.service.d
           └─10-auth.conf
   Active: active (running) since Tue 2022-08-30 15:43:23 GMT; 2s ago
 Main PID: 154734 (olcne-agent)
    Tasks: 8 (limit: 202294)
   Memory: 9.1M
   CGroup: /system.slice/olcne-agent.service
           └─154734 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
Aug 30 15:43:23 ocne-worker07 systemd[1]: olcne-agent.service: Succeeded.
Aug 30 15:43:23 ocne-worker07 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:43:23 ocne-worker07 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
Aug 30 15:43:23 ocne-worker07 olcne-agent[154734]: time=30/08/22 15:43:23 level=info msg=Started server on[::]:8090
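The four bootstrap commands in this section differ only in the node name, so they can also be expressed as a loop. The following is a minimal sketch, assuming the same certificate paths and passwordless SSH used in this tutorial; the function name bootstrap_agents and the DRY_RUN variable are hypothetical helpers introduced here for illustration and are not part of Oracle Cloud Native Environment.

```shell
# Certificate directory as used throughout this tutorial.
CERT_DIR=/etc/olcne/configs/certificates/production

# Runs the Platform Agent bootstrap script on each node given as an argument.
# Set DRY_RUN=1 to print the commands instead of executing them.
bootstrap_agents() {
  local node remote_cmd
  remote_cmd="sudo /etc/olcne/bootstrap-olcne.sh \
--secret-manager-type file \
--olcne-node-cert-path ${CERT_DIR}/node.cert \
--olcne-ca-path ${CERT_DIR}/ca.cert \
--olcne-node-key-path ${CERT_DIR}/node.key \
--olcne-component agent"
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'ssh -o StrictHostKeyChecking=no %s %s\n' "$node" "$remote_cmd"
    else
      # Stop at the first node that fails so the problem is easy to spot.
      ssh -o StrictHostKeyChecking=no "$node" "$remote_cmd" || return 1
    fi
  done
}

# Example use:
#   bootstrap_agents ocne-control04 ocne-control05 ocne-worker06 ocne-worker07
```

Running it once with DRY_RUN=1 is a cheap way to review the exact commands before they touch the new nodes.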
Access the OCI Load Balancer and View the Backends
Because having more than one node defined for the Kubernetes control plane requires a Load Balancer, it is interesting to view the configuration that was automatically set up when the free lab environment was deployed. This shows the three nodes deployed and configured when the lab is created as having a Healthy status and the two nodes that will be added in the upcoming steps as being in Critical status.
Switch from the Terminal to the Luna desktop
Open the Luna Lab details page using the Luna Lab icon.
Click on the OCI Console link.
The Oracle Cloud Console login page displays.
Enter the User Name and Password (found on the Luna Lab tab in the Credentials section).
Click on the hamburger menu (top-left), then Networking and Load Balancers.
The Load Balancers page displays.
Locate the Compartment being used from the drop-down list.
Click on the Load Balancer listed in the table (ocne-load-balancer).
Scroll down the page and click on the link to the Backend Sets (on the left-hand side in the Resources section).
The Backend Sets table is displayed. Click on the link called ocne-lb-backend-set in the Name column.
Click on the link to the Backends (on the left-hand side in the Resources section).
The Backends representing the control plane nodes are displayed.
Note: Two of the backend nodes are in the Critical - connection failed state because these nodes are not yet part of the Kubernetes control plane cluster. Keep this browser tab open, as we'll recheck the status of the backend nodes after completing the scale-up steps.
View the Kubernetes Nodes
Check the currently available Kubernetes nodes in the cluster. Note that there are three control plane nodes and five worker nodes.
Confirm that the nodes are all in Ready status.
kubectl get nodes
Example Output:
[oracle@ocne-operator olcne]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE     VERSION
ocne-control01   Ready    control-plane,master   5h15m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   5h14m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   5h13m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 5h14m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 5h13m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 5h12m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 5h13m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 5h14m   v1.23.7+1.el8
Add Control Plane and Worker Nodes to the Deployment Configuration File
Add the Fully Qualified Domain Name (FQDN) and Platform Agent access port (8090) of each control plane and worker node being added to the cluster.
Edit the YAML deployment configuration file to include the new cluster nodes. Add the control plane nodes under the master-nodes section and the worker nodes under the worker-nodes section.
The filename for the configuration file in this tutorial is myenvironment.yaml and currently includes three control plane and five worker nodes.
Confirm the current environment uses three control plane nodes and five worker nodes.
cat ~/myenvironment.yaml
Example Output:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
...
Add the new control plane and worker nodes to the myenvironment.yaml file.
cd ~
sed -i '19 i \            - ocne-control04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '20 i \            - ocne-control05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '27 i \            - ocne-worker06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
sed -i '28 i \            - ocne-worker07.'"$(hostname -d)"':8090' ~/myenvironment.yaml
Confirm the control plane and worker nodes have been added to the myenvironment.yaml file.
cat ~/myenvironment.yaml
Example Excerpt:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
...
The configuration file now includes the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07). This represents all of the control plane and worker nodes that should be in the cluster after the scale-up completes.
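The sed commands used above address fixed line numbers, so they only work while the file keeps the exact layout of the lab's myenvironment.yaml. A less layout-dependent approach is to duplicate an existing node entry and change only the host name, which preserves the indentation, domain, and port of the original line. The following is a sketch under those assumptions; it relies on GNU sed (for -i, -E, and \n in the replacement), and the function name clone_node_entry is hypothetical.

```shell
# Adds a new node entry directly below an existing one in a YAML node list.
# Usage: clone_node_entry <file> <existing-short-hostname> <new-short-hostname>
clone_node_entry() {
  local file=$1 existing=$2 new=$3
  # Group 1 captures the indentation and list marker, group 2 the domain
  # and port. The matched line is kept (&) and a copy with the new host
  # name is written on the following line.
  sed -i -E "s/^([[:space:]]*-[[:space:]]+)${existing}(\..*)\$/&\n\1${new}\2/" "$file"
}

# Example use, equivalent in effect to the line-numbered commands:
#   clone_node_entry ~/myenvironment.yaml ocne-control03 ocne-control04
#   clone_node_entry ~/myenvironment.yaml ocne-control04 ocne-control05
#   clone_node_entry ~/myenvironment.yaml ocne-worker05 ocne-worker06
#   clone_node_entry ~/myenvironment.yaml ocne-worker06 ocne-worker07
```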
Scale Up the Control Plane and Worker Nodes
Run the module update command.
Use the olcnectl module update command with the --config-file option to specify the location of the configuration file. The Platform API Server validates the configuration file against the state of the cluster and recognises that more nodes should be added to the cluster. Answer y when prompted.
Note: There will be a delay between the prompts in the Terminal window while each of the modules is updated. In the free lab environment, this delay may be up to 10-15 minutes.
olcnectl module update --config-file myenvironment.yaml
Example Output:
[oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
(In the Cloud Console) Confirm that the Load Balancer's Backend Set shows five healthy Backend nodes.
Confirm that the new control plane and worker nodes have been added to the cluster.
kubectl get nodes
Example Output:
[oracle@ocne-operator ~]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE   VERSION
ocne-control01   Ready    control-plane,master   99m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   97m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   96m   v1.23.7+1.el8
ocne-control04   Ready    control-plane,master   13m   v1.23.7+1.el8
ocne-control05   Ready    control-plane,master   12m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 99m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 98m   v1.23.7+1.el8
ocne-worker06    Ready    <none>                 13m   v1.23.7+1.el8
ocne-worker07    Ready    <none>                 13m   v1.23.7+1.el8
Notice that the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07) are now included in the cluster. This confirms that the scale up operation worked.
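The same check can be scripted rather than read by eye. The following is a minimal sketch, assuming the column layout shown in the example output (NAME, STATUS, ROLES). The helper name count_nodes is hypothetical, and the sample rows are copied from the example output rather than read from a live cluster.

```shell
#!/bin/sh
# count_nodes is a hypothetical helper: it reads node rows (no header line)
# on stdin and prints "<ready control plane nodes> <ready worker nodes>".
count_nodes() {
  awk '
    $2 == "Ready" && $3 ~ /control-plane/ { cp++ }
    $2 == "Ready" && $3 == "<none>"       { wk++ }
    END { printf "%d %d\n", cp, wk }
  '
}

# Sample rows copied from the example output (assumption: same column order).
sample_nodes='ocne-control01 Ready control-plane,master 99m v1.23.7+1.el8
ocne-control02 Ready control-plane,master 97m v1.23.7+1.el8
ocne-control03 Ready control-plane,master 96m v1.23.7+1.el8
ocne-control04 Ready control-plane,master 13m v1.23.7+1.el8
ocne-control05 Ready control-plane,master 12m v1.23.7+1.el8
ocne-worker01 Ready <none> 99m v1.23.7+1.el8
ocne-worker02 Ready <none> 98m v1.23.7+1.el8
ocne-worker03 Ready <none> 98m v1.23.7+1.el8
ocne-worker04 Ready <none> 98m v1.23.7+1.el8
ocne-worker05 Ready <none> 98m v1.23.7+1.el8
ocne-worker06 Ready <none> 13m v1.23.7+1.el8
ocne-worker07 Ready <none> 13m v1.23.7+1.el8'

counts=$(printf '%s\n' "$sample_nodes" | count_nodes)
control_count=${counts% *}   # text before the space
worker_count=${counts#* }    # text after the space
echo "control plane nodes: $control_count, worker nodes: $worker_count"
```

On a live cluster the rows could instead come from kubectl get nodes --no-headers piped into count_nodes.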
Scale Down the Control Plane Nodes
To demonstrate that the control plane and worker nodes can scale independently, we'll just scale down (remove) the control plane nodes in this step.
Confirm the current environment uses five control plane nodes and seven worker nodes.
cat ~/myenvironment.yaml
Example Output:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
...
To scale the cluster back down to the original three control plane nodes, remove the ocne-control04 and ocne-control05 control plane nodes from the configuration file. In this environment they are on lines 19 and 20 of the file.
sed -i '19d;20d' ~/myenvironment.yaml
Confirm the configuration file now contains only three control plane nodes and the seven worker nodes.
cat ~/myenvironment.yaml
Example Excerpt:
...
          master-nodes:
            - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
...
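Deleting by line number only works while the file layout stays exactly as expected. As an alternative, the entries can be matched by host name. This is a minimal sketch that works on a scratch file, not on ~/myenvironment.yaml; the scratch file contents are copied from the excerpt above, and the variable names are illustrative.

```shell
#!/bin/sh
# Build a scratch file with the node lists from the example excerpt.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
master-nodes:
  - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
  - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
  - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
  - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
  - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
worker-nodes:
  - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
EOF

# Delete only the list entries for ocne-control04 and ocne-control05,
# writing the result to a new file so the input is left untouched.
sed '/^ *- ocne-control0[45]\./d' "$cfg" > "$cfg.new"

control_left=$(grep -c 'ocne-control' "$cfg.new")
worker_left=$(grep -c 'ocne-worker' "$cfg.new")
echo "control plane entries: $control_left, worker entries: $worker_left"

rm -f "$cfg" "$cfg.new"
```

Writing to a separate file first makes it easy to review the result with diff before replacing the real configuration file.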
Suppress the module update warning message.
It is possible to suppress the confirmation prompt during a module update by adding the force: true directive to the configuration file. This directive needs to be placed immediately under the name: <xxxx> directive for each module defined. The inserted line must have the same indentation as the name: line for the YAML to remain valid.
cd ~
sed -i '12 i \ force: true' ~/myenvironment.yaml
sed -i '35 i \ force: true' ~/myenvironment.yaml
sed -i '40 i \ force: true' ~/myenvironment.yaml
Confirm the configuration file now contains the force: true directive.
cat ~/myenvironment.yaml
Example Excerpt:
[oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
environments:
  - environment-name: myenvironment
    globals:
      api-server: 127.0.0.1:8091
      secret-manager-type: file
      olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
      olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
      olcne-node-key-path: /etc/olcne/configs/certificates/production/node.key
    modules:
      - module: kubernetes
        name: mycluster
        force: true
        args:
          container-registry: container-registry.oracle.com/olcne
          load-balancer: 10.0.0.18:6443
          master-nodes:
            - ocne-control01.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-control02.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-control03.lv.vcn1174e41d.oraclevcn.com:8090
          worker-nodes:
            - ocne-worker01.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker02.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker03.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker04.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker05.lv.vcn1174e41d.oraclevcn.com:8090
            - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
            - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
          selinux: enforcing
          restrict-service-externalip: true
          restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
          restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
          restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
      - module: helm
        name: myhelm
        force: true
        args:
          helm-kubernetes-module: mycluster
      - module: oci-ccm
        name: myoci
        force: true
        oci-ccm-helm-module: myhelm
        oci-use-instance-principals: true
        oci-compartment: ocid1.compartment.oc1..aaaaaaaanr6cysadeswwxc7sczdsrlamzhfh6scdyvuh4s4fmvecob6e2cha
        oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaag7acy3iat3duvrym376oax7nxdyqd56mqxtjaws47t4g7vqthgja
        oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaa6rt6chugbkfhyjyl4exznpxrlvnus2bgkzcgm7fljfkqbxkva6ya
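Inserting at fixed line numbers depends on the exact length of the file. A pattern-based alternative is to add force: true after every module name: line and copy that line's indentation. The following is a minimal sketch that works on a trimmed scratch file; the file contents and its indentation are assumptions for illustration, not the layout of a real ~/myenvironment.yaml.

```shell
#!/bin/sh
# Trimmed scratch configuration (assumed layout, for illustration only).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
environments:
  - environment-name: myenvironment
    modules:
      - module: kubernetes
        name: mycluster
        args:
          container-registry: container-registry.oracle.com/olcne
      - module: helm
        name: myhelm
        args:
          helm-kubernetes-module: mycluster
      - module: oci-ccm
        name: myoci
EOF

# After each line that consists of indentation followed by "name: ...",
# print a "force: true" line with the same indentation.
# The "- environment-name:" line is not matched because of the leading "- ".
awk '
  /^ *name: / {
    print
    indent = $0
    sub(/name:.*/, "", indent)   # keep only the leading spaces
    print indent "force: true"
    next
  }
  { print }
' "$cfg" > "$cfg.new"

force_count=$(grep -c '^ *force: true$' "$cfg.new")
echo "force directives added: $force_count"
cat "$cfg.new"

rm -f "$cfg" "$cfg.new"
```

The sketch is not idempotent: running it twice on the same file adds the directive twice, so it suits a one-off edit that is reviewed before the file is used.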
Run the command to update the cluster and remove the nodes.
Note: This may take a few minutes to complete.
olcnectl module update --config-file myenvironment.yaml
Example Output:
[oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
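With the prompts suppressed, the output is easy to check from a script. This is a minimal sketch that counts the "Update successful" lines and compares the count with the three modules defined in this tutorial; the log text is copied from the example output, and the variable names are illustrative.

```shell
#!/bin/sh
# Sample log copied from the example output. On a real system the text could
# be captured by piping the update command through tee into a file.
update_log='Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful
Taking backup of modules before update
Backup of modules succeeded.
Updating modules
Update successful'

expected_modules=3   # kubernetes, helm and oci-ccm in this tutorial
success_count=$(printf '%s\n' "$update_log" | grep -c '^Update successful$')

if [ "$success_count" -eq "$expected_modules" ]; then
  update_status="all modules updated"
else
  update_status="only $success_count of $expected_modules modules updated"
fi
echo "$update_status"
```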
(In the Cloud Console) Confirm that the Load Balancer's Backend Set shows three healthy (Health = 'OK') and two unhealthy (Health = 'Critical - Connection failed') nodes. The two nodes report a critical status because they have been removed from the Kubernetes cluster.
Confirm that the Platform API Server has removed the control plane nodes (ocne-control04 and ocne-control05) from the cluster.
kubectl get nodes
Example Output:
[oracle@ocne-operator ~]$ kubectl get nodes
NAME             STATUS   ROLES                  AGE    VERSION
ocne-control01   Ready    control-plane,master   164m   v1.23.7+1.el8
ocne-control02   Ready    control-plane,master   163m   v1.23.7+1.el8
ocne-control03   Ready    control-plane,master   162m   v1.23.7+1.el8
ocne-worker01    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker02    Ready    <none>                 163m   v1.23.7+1.el8
ocne-worker03    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker04    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker05    Ready    <none>                 164m   v1.23.7+1.el8
ocne-worker06    Ready    <none>                 13m    v1.23.7+1.el8
ocne-worker07    Ready    <none>                 13m    v1.23.7+1.el8
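A final cross-check is to compare the node entries in the configuration file with the node names the cluster reports. The following is a minimal sketch using sample data from this tutorial; it assumes each Kubernetes node name equals the short host name of the matching configuration entry, as in the example output.

```shell
#!/bin/sh
# Node entries as listed in the configuration file after the scale down
# (sample data taken from the excerpts in this tutorial).
config_nodes='  - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
  - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
  - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
  - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090'

# Node names as reported by the cluster (sample data from the example output).
cluster_nodes='ocne-control01
ocne-control02
ocne-control03
ocne-worker01
ocne-worker02
ocne-worker03
ocne-worker04
ocne-worker05
ocne-worker06
ocne-worker07'

# Strip the list marker, then everything from the first dot onward,
# which leaves the short host name.
expected=$(printf '%s\n' "$config_nodes" | sed -e 's/^ *- *//' -e 's/\..*$//' | sort)
actual=$(printf '%s\n' "$cluster_nodes" | sort)

if [ "$expected" = "$actual" ]; then
  drift="no"
  echo "cluster matches the configuration file"
else
  drift="yes"
  echo "cluster and configuration file differ"
fi
```

On a live system the cluster side could be produced with kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name instead of the sample list.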
Summary
This completes the demonstration of how to add Kubernetes nodes to the cluster and then remove them. Although this exercise scaled up the control plane and worker nodes in a single update, this is not the recommended approach to scaling an Oracle Cloud Native Environment Kubernetes cluster up or down. In a production environment, the control plane and worker nodes would most likely be scaled separately.