Scale a Kubernetes Cluster on Oracle Cloud Native Environment


Introduction

This tutorial demonstrates how to scale an existing Kubernetes cluster in Oracle Cloud Native Environment.

To scale up a Kubernetes cluster means adding nodes; likewise, to scale down means removing nodes. Nodes can be either control plane or worker nodes. Oracle recommends against scaling the cluster up and down at the same time. Instead, perform a scale up, then a scale down, as two separate commands.

To avoid split-brain scenarios and maintain quorum, it is recommended to scale the Kubernetes cluster's control plane nodes, or worker nodes, in odd numbers. For example, 3, 5, or 7 control plane or worker nodes helps ensure the reliability of the cluster.
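
As a concrete illustration of the two-phase recommendation, each operation is run as its own olcnectl module update invocation against a configuration file that reflects only that phase's change (a sketch of the same pattern used later in this tutorial):

    # Phase 1: scale up - the configuration file lists the existing nodes plus the new ones
    olcnectl module update --config-file myenvironment.yaml

    # Phase 2: scale down - once the scale up completes, remove the unwanted nodes
    # from the configuration file and run the update again
    olcnectl module update --config-file myenvironment.yaml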

This tutorial uses an existing Highly Available Kubernetes cluster running on Oracle Cloud Native Environment with three modules deployed:

  • Kubernetes (kubernetes)
  • Helm (helm)
  • Oracle Cloud Infrastructure Cloud Controller Manager Module (oci-ccm)

The starting deployment consists of the following:

  • 1 Operator Node
  • 3 Control Plane Nodes
  • 5 Worker Nodes

It builds upon previous labs in this series.

Objectives

This tutorial/lab steps through configuring and adding two new control plane nodes and two new worker nodes to the cluster. It then demonstrates how to scale the cluster back down by removing the same nodes.

In this scenario, X.509 Private CA Certificates are used to secure communication between the nodes. There are other methods to manage and deploy the certificates, such as by using HashiCorp Vault secrets manager, or by using your own certificates, signed by a trusted Certificate Authority (CA). These other methods are not included in this tutorial.
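
For reference, a private CA certificate like the one used in this scenario can be inspected with a standard openssl query; this is only a sketch, and it assumes the CA path shown later in the configuration file:

    # Display the subject and validity period of the private CA certificate
    sudo openssl x509 -in /etc/olcne/configs/certificates/production/ca.cert -noout -subject -dates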

Prerequisites

Note: If using the free lab environment these prerequisites are provided as the starting point.

In addition to the requirement of a Highly Available Kubernetes cluster running on Oracle Cloud Native Environment, the following is needed:

  • 4 additional Oracle Linux systems to use as:

    • 2 Kubernetes control plane nodes
    • 2 Kubernetes worker nodes
  • Access to a Load Balancer (the free lab environment uses the OCI Load Balancer)

  • Systems should have:

Set Up Lab Environment

Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.

Information: The free lab environment deploys Oracle Cloud Native Environment on the provided nodes, ready for creating environments. This deployment takes approximately 20-25 minutes to finish after launch. Therefore, you might want to step away while this runs and then return to complete the lab.

Unless otherwise stated, all steps within the free lab environment can be executed from the ocne-operator node, and it is recommended to start by opening a terminal window and connecting to that node. In a multi-node installation of Oracle Cloud Native Environment, the kubectl commands are run on the operator node, a control plane node, or another system configured for kubectl.

  1. Open a terminal and connect via ssh to the ocne-operator system.

    ssh oracle@<ip_address_of_ol_node>

Install the Kubernetes Module

Important: All operations, unless stated otherwise, are executed from the ocne-operator node.

The free lab environment creates a Highly Available Oracle Cloud Native Environment install during deployment, including preparing the environment and module configuration.

  1. View the myenvironment.yaml file.

    cat ~/myenvironment.yaml

    The free lab environment deployment uses three control plane nodes and five worker nodes when creating the cluster.

    Example Output:

    [oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
    environments:
      - environment-name: myenvironment
        globals:
          api-server: 127.0.0.1:8091
          secret-manager-type: file
          olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
          olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
          olcne-node-key-path:  /etc/olcne/configs/certificates/production/node.key
        modules:
          - module: kubernetes
            name: mycluster
            args:
              container-registry: container-registry.oracle.com/olcne
              load-balancer: 10.0.0.168:6443
              master-nodes:
                - ocne-control01.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-control02.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-control03.lv.vcnf998d566.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090
                - ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090
              selinux: enforcing
              restrict-service-externalip: true
              restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
              restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
              restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
          - module: helm
            name: myhelm
            args:
              helm-kubernetes-module: mycluster      
          - module: oci-ccm
            name: myoci
            oci-ccm-helm-module: myhelm
            oci-use-instance-principals: true
            oci-compartment: ocid1.compartment.oc1..aaaaaaaau2g2k23u6mp3t43ky3i4ky7jpyeiqcdcobpbcb7z6vjjlrdnuufq
            oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaaw6qx2pia2xkfmnnknpk3jll6emb76gtcza3ttbqqofxmwjb45rka
            oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaawfjs5zrb6wdmg43522a4l5aak5zr6vvkaaa6xogttha2ufsip7fq         

    The domain part of the FQDN for the nodes will be unique on each deployment of the free lab environment.

  2. Install the Kubernetes module.

    olcnectl module install --config-file myenvironment.yaml

    Note: The deployment of Kubernetes to the nodes will take 20-25 minutes to complete.

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module install --config-file myenvironment.yaml
    Modules installed successfully.
    Modules installed successfully.
    Modules installed successfully.

    Why are there three Modules installed successfully responses? This is because the myenvironment.yaml file used in this example defines three separate modules:

    • - module: kubernetes
    • - module: helm
    • - module: oci-ccm

    It is important to understand this because later in these steps some responses will also be returned three times - once for each module defined in the myenvironment.yaml file.

  3. Verify the deployment of the Kubernetes module.

    olcnectl module instances --config-file myenvironment.yaml

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module instances --config-file myenvironment.yaml
    INSTANCE                                        	MODULE    	STATE    
    mycluster                                       	kubernetes	installed
    myhelm                                          	helm      	installed
    myoci                                           	oci-ccm   	installed
    ocne-control01.lv.vcnf998d566.oraclevcn.com:8090	node      	installed
    ocne-control02.lv.vcnf998d566.oraclevcn.com:8090	node      	installed
    ocne-control03.lv.vcnf998d566.oraclevcn.com:8090	node      	installed
    ocne-worker01.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker02.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker03.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker04.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
    ocne-worker05.lv.vcnf998d566.oraclevcn.com:8090 	node      	installed
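
Beyond listing module instances, the Platform CLI can report more detail about a deployed module. A hedged sketch, assuming the olcnectl module report command is available in your release, using the environment and module names from myenvironment.yaml:

    olcnectl module report --environment-name myenvironment --name mycluster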

Set up kubectl

  1. Set up the kubectl command.

    1. Copy the configuration file from one of the control plane nodes.

      mkdir -p $HOME/.kube
      ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config

      Example Output:

      [oracle@ocne-operator ~]$ mkdir -p $HOME/.kube
      [oracle@ocne-operator ~]$ ssh -o StrictHostKeyChecking=no 10.0.0.150 "sudo cat /etc/kubernetes/admin.conf" > $HOME/.kube/config
      Warning: Permanently added '10.0.0.150' (ECDSA) to the list of known hosts.
    2. Export the configuration for use by the kubectl command.

      sudo chown $(id -u):$(id -g) $HOME/.kube/config
      export KUBECONFIG=$HOME/.kube/config
      echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
  2. Verify kubectl works.

    kubectl get nodes

    Example Output:

    [oracle@ocne-operator ~]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE   VERSION
    ocne-control01   Ready    control-plane,master   17m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   16m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   15m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 16m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 15m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 14m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 15m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 15m   v1.23.7+1.el8
    [oracle@ocne-operator ~]$

Confirm the Oracle Cloud Infrastructure Cloud Controller Manager Module is Ready

Before proceeding, it is important to wait for the Oracle Cloud Infrastructure Cloud Controller Manager module to establish communication with the OCI API. This module runs a pod on each node that handles functionality such as attaching block storage. After installation, the controller prevents any pods from being scheduled until its dedicated pod confirms that it is initialized, running, and communicating with the OCI API. Until this communication is established, proceeding is likely to prevent Kubernetes from successfully using cloud storage or load balancers.

  1. Retrieve the status of the component oci-cloud-controller-manager pods.

    kubectl -n kube-system get pods -l component=oci-cloud-controller-manager

    Example Output:

    [oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l component=oci-cloud-controller-manager
    NAME                                 READY   STATUS    RESTARTS      AGE
    oci-cloud-controller-manager-9d9gh   1/1     Running   1 (48m ago)   50m
    oci-cloud-controller-manager-jqzs6   1/1     Running   0             50m
    oci-cloud-controller-manager-xfm9w   1/1     Running   0             50m
  2. Retrieve the status of the role csi-oci pods.

    kubectl -n kube-system get pods -l role=csi-oci

    Example Output:

    [oracle@ocne-operator ~]$ kubectl -n kube-system get pods -l role=csi-oci
    NAME                                  READY   STATUS             RESTARTS      AGE
    csi-oci-controller-7fcbddd746-2hb5c   4/4     Running            2 (50m ago)   51m
    csi-oci-node-7jd6t                    3/3     Running            0             51m
    csi-oci-node-fc5x5                    3/3     Running            0             51m
    csi-oci-node-jq8sm                    3/3     Running            0             51m
    csi-oci-node-jqkvl                    3/3     Running            0             51m
    csi-oci-node-jwq8g                    3/3     Running            0             51m
    csi-oci-node-jzxqt                    3/3     Running            0             51m
    csi-oci-node-rmmmb                    3/3     Running            0             51m
    csi-oci-node-zc287                    1/3     Running            0             51m

Note: Wait for both of these commands to show the STATUS as Running before proceeding further.

If the values under the READY column do not show all of the containers as started, and those under the STATUS column do not show Running after 15 minutes, please restart the lab.
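
Rather than re-running these checks manually, the wait can be scripted with kubectl wait. A minimal sketch, using the same label selectors and mirroring the 15 minute guidance above:

    kubectl -n kube-system wait --for=condition=Ready pod -l component=oci-cloud-controller-manager --timeout=15m
    kubectl -n kube-system wait --for=condition=Ready pod -l role=csi-oci --timeout=15m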

(Optional) Set up the New Kubernetes Nodes

Note: The steps in this section are not required in the free lab environment because they have already been completed during the initial deployment of the lab. Please skip forward to the next section and continue from there.

When scaling up (adding nodes), any new nodes require all of the prerequisites listed in the Prerequisites section of this tutorial to be met.

In this tutorial/lab, the nodes ocne-control04 and ocne-control05 are the new control plane nodes, while the nodes ocne-worker06 and ocne-worker07 are the new worker nodes. Besides the prerequisites, these new nodes require installing and enabling the Oracle Cloud Native Environment Platform Agent.

  1. Install and enable the Platform Agent.

    sudo dnf install olcne-agent olcne-utils
    sudo systemctl enable olcne-agent.service
  2. If using a proxy server, configure CRI-O to use it. On each Kubernetes node, create a CRI-O systemd configuration directory, then create a file named proxy.conf in that directory and add the proxy server information (a short follow-up sketch appears after this list).

    sudo mkdir /etc/systemd/system/crio.service.d
    sudo vi /etc/systemd/system/crio.service.d/proxy.conf
  3. Substitute the appropriate proxy values for those in the environment using the example proxy.conf file:

    [Service]
    Environment="HTTP_PROXY=proxy.example.com:80"
    Environment="HTTPS_PROXY=proxy.example.com:80"
    Environment="NO_PROXY=.example.com,192.0.2.*"
  4. If the docker or containerd service is running, stop and disable them.

    sudo systemctl disable --now docker.service
    sudo systemctl disable --now containerd.service
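
If a proxy drop-in was created, reloading systemd ensures it is picked up, and checking that the Platform Agent is enabled confirms the node is ready for the later bootstrap step. A small sketch, run on each new node:

    sudo systemctl daemon-reload
    systemctl is-enabled olcne-agent.service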

Set up X.509 Private CA Certificates

Set up X.509 Private CA Certificates for the new control plane nodes and the worker nodes.

  1. Create a list of new nodes.

    VAR1=$(hostname -d)
    for NODE in 'ocne-control04' 'ocne-control05' 'ocne-worker06' 'ocne-worker07'; do VAR2+="${NODE}.$VAR1,"; done
    VAR2=${VAR2%,}

    The provided bash script captures the domain name of the operator node and builds a comma-separated list of the nodes to add to the cluster during the scale-up procedure.

  2. Generate a private CA and set of certificates for the new nodes (an optional verification sketch appears at the end of this section).

    Use the --byo-ca-cert option to specify the location of the existing CA Certificate, and the --byo-ca-key option to specify the location of the existing CA Key. Use the --nodes option and provide the FQDN of the new control plane and worker nodes.

    cd /etc/olcne
    sudo ./gen-certs-helper.sh \
    --cert-dir /etc/olcne/configs/certificates/ \
    --byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
    --byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
    --nodes $VAR2

    Example Output:

    [oracle@ocne-operator ~]$ cd /etc/olcne
    [oracle@ocne-operator olcne]$ sudo ./gen-certs-helper.sh \
    > --cert-dir /etc/olcne/configs/certificates/ \
    > --byo-ca-cert /etc/olcne/configs/certificates/production/ca.cert \
    > --byo-ca-key /etc/olcne/configs/certificates/production/ca.key \
    > --nodes $VAR2
    [INFO] Generating certs for ocne-control04.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    .............................+++++
    ....................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    [INFO] Generating certs for ocne-control05.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ...+++++
    ...........................................................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    [INFO] Generating certs for ocne-worker06.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ......+++++
    .......................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    [INFO] Generating certs for ocne-worker07.lv.vcnf998d566.oraclevcn.com
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ....................................................................................+++++
    .................................+++++
    e is 65537 (0x010001)
    Signature ok
    subject=C = US, ST = North Carolina, L = Whynot, O = your-company, OU = private cloud, CN = example.com
    Getting CA Private Key
    -----------------------------------------------------------
    Script To Transfer Certs: /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
    -----------------------------------------------------------
    [SUCCESS] Generated certs and file transfer script!
    [INFO]    CA Cert: /etc/olcne/configs/certificates/production/ca.key
    [INFO]    CA Key:  /etc/olcne/configs/certificates/production/ca.cert
    [WARNING] The CA Key is the only way to generate more certificates, ensure it is stored in long term storage
    [USER STEP #1]    Please ensure you have ssh access from this machine to: ocne-control04.lv.vcnf998d566.oraclevcn.com,ocne-control05.lv.vcnf998d566.oraclevcn.com,ocne-worker06.lv.vcnf998d566.oraclevcn.com,ocne-worker07.lv.vcnf998d566.oraclevcn.com
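
Before transferring anything, the generated node certificates can optionally be verified against the private CA. A hedged sketch, assuming the helper script places a node.cert alongside each node.key in the per-node directories under tmp-olcne (the same directories referenced in the next section):

    sudo sh -c 'for CERT in /etc/olcne/configs/certificates/tmp-olcne/ocne-*/node.cert; do
      openssl verify -CAfile /etc/olcne/configs/certificates/production/ca.cert "$CERT"
    done'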

Transfer Certificates

Transfer the newly created certificates from the operator node to all of the new nodes.

  1. Update the user details in the provided transfer script.

    sudo sed -i 's/USER=opc/USER=oracle/g' configs/certificates/olcne-tranfer-certs.sh

    Note: This step is required because the script's default user is opc. Since both this tutorial and the free lab environment install the product as the oracle user, update the USER variable within the script accordingly.

  2. Update the permissions for each node.key generated by the certificate creation script.

    sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-control*/node.key
    sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-operator*/node.key
    sudo chmod 644 /etc/olcne/configs/certificates/tmp-olcne/ocne-worker*/node.key
  3. Transfer the certificates to each of the new nodes (a quick verification sketch follows this step).

    Note: This step requires passwordless SSH configured between the nodes. Configuring this is outside the scope of this tutorial but is pre-configured in the free lab environment.

    bash -ex /etc/olcne/configs/certificates/olcne-tranfer-certs.sh
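
To confirm the transfer worked, one of the new nodes can be checked for the expected files. A quick sketch against ocne-control04; the production path matches the one used by the bootstrap step that follows:

    ssh -o StrictHostKeyChecking=no ocne-control04 'sudo ls -l /etc/olcne/configs/certificates/production/'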

Configure the Platform Agent to Use the Certificates

Configure the Platform Agent on each new node to use the certificates copied over in the previous step. This is done from the operator node by running the command over ssh against each node in turn (a looped alternative is sketched at the end of this section).

  1. Configure the ocne-control04 node.

    ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control04 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-control04,10.0.0.153' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
       Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:29:37 GMT; 2s ago
     Main PID: 152809 (olcne-agent)
        Tasks: 8 (limit: 202294)
       Memory: 11.1M
       CGroup: /system.slice/olcne-agent.service
               └─152809 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:29:37 ocne-control04 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:29:37 ocne-control04 olcne-agent[152809]: time=30/08/22 15:29:37 level=info msg=Started server on[::]:8090
  2. Configure the ocne-control05 node.

    ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-control05 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-control05,10.0.0.154' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
      Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:34:13 GMT; 2s ago
     Main PID: 153413 (olcne-agent)
        Tasks: 7 (limit: 202294)
       Memory: 9.1M
       CGroup: /system.slice/olcne-agent.service
               └─153413 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:34:13 ocne-control05 systemd[1]: olcne-agent.service: Succeeded.
    Aug 30 15:34:13 ocne-control05 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:34:13 ocne-control05 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:34:13 ocne-control05 olcne-agent[153413]: time=30/08/22 15:34:13 level=info msg=Started server on[::]:8090
  3. Configure the ocne-worker06 node.

    ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker06 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-worker06,10.0.0.165' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
       Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:41:08 GMT; 2s ago
     Main PID: 153988 (olcne-agent)
        Tasks: 8 (limit: 202294)
       Memory: 5.2M
       CGroup: /system.slice/olcne-agent.service
               └─153988 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:41:08 ocne-worker06 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:41:08 ocne-worker06 olcne-agent[153988]: time=30/08/22 15:41:08 level=info msg=Started server on[::]:8090
  4. Configure the ocne-worker07 node.

    ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
    --secret-manager-type file \
    --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    --olcne-component agent'

    Example Output:

    [oracle@ocne-operator olcne]$ ssh -o StrictHostKeyChecking=no ocne-worker07 'sudo /etc/olcne/bootstrap-olcne.sh \
    > --secret-manager-type file \
    > --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
    > --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
    > --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
    > --olcne-component agent'
    Warning: Permanently added 'ocne-worker07,10.0.0.166' (ECDSA) to the list of known hosts.
    ● olcne-agent.service - Agent for Oracle Linux Cloud Native Environments
       Loaded: loaded (/usr/lib/systemd/system/olcne-agent.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/olcne-agent.service.d
               └─10-auth.conf
       Active: active (running) since Tue 2022-08-30 15:43:23 GMT; 2s ago
     Main PID: 154734 (olcne-agent)
        Tasks: 8 (limit: 202294)
       Memory: 9.1M
       CGroup: /system.slice/olcne-agent.service
               └─154734 /usr/libexec/olcne-agent --secret-manager-type file --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key
    
    Aug 30 15:43:23 ocne-worker07 systemd[1]: olcne-agent.service: Succeeded.
    Aug 30 15:43:23 ocne-worker07 systemd[1]: Stopped Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:43:23 ocne-worker07 systemd[1]: Started Agent for Oracle Linux Cloud Native Environments.
    Aug 30 15:43:23 ocne-worker07 olcne-agent[154734]: time=30/08/22 15:43:23 level=info msg=Started server on[::]:8090
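
Because the four invocations above differ only in the host name, the same bootstrap can be expressed as a loop. A compact sketch, equivalent to the per-node steps in this section:

    for NODE in ocne-control04 ocne-control05 ocne-worker06 ocne-worker07; do
      ssh -o StrictHostKeyChecking=no $NODE 'sudo /etc/olcne/bootstrap-olcne.sh \
      --secret-manager-type file \
      --olcne-node-cert-path /etc/olcne/configs/certificates/production/node.cert \
      --olcne-ca-path /etc/olcne/configs/certificates/production/ca.cert \
      --olcne-node-key-path /etc/olcne/configs/certificates/production/node.key \
      --olcne-component agent'
    done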

Access the OCI Load Balancer and View the Backends

Because defining more than one node for the Kubernetes control plane requires a Load Balancer, it is worth viewing the configuration that was automatically set up when the free lab environment was deployed. The three nodes deployed and configured when the lab was created show a Healthy status, while the two nodes to be added in the upcoming steps show a Critical status.

  1. Switch from the Terminal to the Luna desktop

  2. Open the Luna Lab details page using the Luna Lab icon.

  3. Click on the OCI Console link.

    oci-link

  4. The Oracle Cloud Console login page displays.

    console-login

  5. Enter the User Name and Password (found on the Luna Lab tab in the Credentials section).

    un-pw

  6. Click on the hamburger menu (top-left), then Networking and Load Balancers.

    select-networking

  7. The Load Balancers page displays.

    load-balancer

  8. Locate the Compartment being used from the drop-down list.

    oci-compartment

  9. Click on the Load Balancer listed in the table (ocne-load-balancer).

    load-balancer

  10. Scroll down the page and click on the link to the Backend Sets (on the left-hand side in the Resources section).

    backend-set

  11. The Backend Sets table is displayed. Click on the link called ocne-lb-backend-set in the Name column.

    load-balancer

  12. Click on the link to the Backends (on the left-hand side in the Resources section).

    backend-link

  13. The Backends representing the control plane nodes are displayed.

    Note: Two of the backend nodes are in the Critical - connection failed state because these nodes are not yet part of the Kubernetes control plane cluster. Keep this browser tab open, as we'll recheck the status of the backend nodes after completing the scale-up steps.

    backend-table

View the Kubernetes Nodes

Check the currently available Kubernetes nodes in the cluster. Note that there are three control plane nodes and five worker nodes.

  1. Confirm that all of the nodes show a STATUS of Ready.

    kubectl get nodes

    Example Output:

    [oracle@ocne-operator olcne]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE     VERSION
    ocne-control01   Ready    control-plane,master   5h15m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   5h14m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   5h13m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 5h14m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 5h13m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 5h12m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 5h13m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 5h14m   v1.23.7+1.el8

Add Control Plane and Worker Nodes to the Deployment Configuration File

Add the Fully Qualified Domain Name (FQDN) and Platform Agent access port (8090) for each control plane and worker node being added to the cluster.

Edit the YAML deployment configuration file to include the new cluster nodes. Add the control plane nodes under the master-nodes section and the worker nodes under the worker-nodes section.

The configuration file in this tutorial is named myenvironment.yaml and currently includes three control plane and five worker nodes.

  1. Confirm the current environment uses three control plane nodes and five worker nodes.

    cat ~/myenvironment.yaml

    Example Output:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
    ...
  2. Add the new control plane and worker nodes to the myenvironment.yaml file.

    cd ~
    sed -i '19 i \            - ocne-control04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '20 i \            - ocne-control05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '27 i \            - ocne-worker06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '28 i \            - ocne-worker07.'"$(hostname -d)"':8090' ~/myenvironment.yaml
  3. Confirm the control plane and worker nodes have been added to the myenvironment.yaml file.

    cat ~/myenvironment.yaml

    Example Excerpt:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090   
    ...

The configuration file now includes the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07). This represents all of the control plane and worker nodes that should be in the cluster after the scale-up completes.
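
A quick sanity check before running the update is to count the node entries in the file; with the additions above it should report five control plane entries and seven worker entries:

    grep -c 'ocne-control' ~/myenvironment.yaml
    grep -c 'ocne-worker' ~/myenvironment.yaml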

Scale Up the Control Plane and Worker Nodes

  1. Run the module update command.

    Use the olcnectl module update command with the --config-file option to specify the location of the configuration file. The Platform API Server validates the configuration file against the state of the cluster and recognises that more nodes should be added to the cluster. Answer y when prompted.

    Note: There will be a delay between the prompts in the Terminal window while each module is updated. In the free lab environment this delay may be up to 10-15 minutes.

    olcnectl module update --config-file myenvironment.yaml

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
    ? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    ? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    ? [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? Yes
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
  2. (In the Cloud Console) Confirm that the Load Balancer's Backend Set shows five healthy Backend nodes.

    lb-healthy

  3. Confirm that the new control plane and worker nodes have been added to the cluster.

    kubectl get nodes

    Example Output:

    [oracle@ocne-operator ~]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE   VERSION
    ocne-control01   Ready    control-plane,master   99m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   97m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   96m   v1.23.7+1.el8
    ocne-control04   Ready    control-plane,master   13m   v1.23.7+1.el8
    ocne-control05   Ready    control-plane,master   12m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 99m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 98m   v1.23.7+1.el8
    ocne-worker06    Ready    <none>                 13m   v1.23.7+1.el8
    ocne-worker07    Ready    <none>                 13m   v1.23.7+1.el8

    Notice that the new control plane nodes (ocne-control04 and ocne-control05) and the new worker nodes (ocne-worker06 and ocne-worker07) are now included in the cluster, confirming that the scale-up operation worked.

Scale Down the Control Plane Nodes

To demonstrate that the control plane and worker nodes can scale independently, we'll just scale down (remove) the control plane nodes in this step.

  1. Confirm the current environment uses five control plane nodes and seven worker nodes.

    cat ~/myenvironment.yaml

    Example Output:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control05.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
    ...
  2. To scale the cluster back down to the original three control plane nodes, remove ocne-control04 and ocne-control05 from the configuration file.

    sed -i '19d;20d' ~/myenvironment.yaml
  3. Confirm the configuration file now contains only three control plane nodes and the seven worker nodes.

    cat ~/myenvironment.yaml

    Example Excerpt:

    ...
              master-nodes:
                - ocne-control01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control03.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
    ...
  4. Suppress the module update warning message.

    The confirmation prompt during module update can be suppressed by adding the force: true directive to the configuration file. This directive must be placed immediately under the name: <xxxx> directive for each module defined.

    cd ~
    sed -i '12 i \        force: true' ~/myenvironment.yaml
    sed -i '35 i \        force: true' ~/myenvironment.yaml
    sed -i '40 i \        force: true' ~/myenvironment.yaml
  5. Confirm the configuration file now contains the force: true directive.

    cat ~/myenvironment.yaml

    Example Excerpt:

    [oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
    environments:
      - environment-name: myenvironment
        globals:
          api-server: 127.0.0.1:8091
          secret-manager-type: file
          olcne-ca-path: /etc/olcne/configs/certificates/production/ca.cert
          olcne-node-cert-path: /etc/olcne/configs/certificates/production/node.cert
          olcne-node-key-path:  /etc/olcne/configs/certificates/production/node.key
        modules:
          - module: kubernetes
            name: mycluster
            force: true
            args:
              container-registry: container-registry.oracle.com/olcne
              load-balancer: 10.0.0.18:6443
              master-nodes:
                - ocne-control01.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-control02.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-control03.lv.vcn1174e41d.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker01.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker02.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker03.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker04.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker05.lv.vcn1174e41d.oraclevcn.com:8090
                - ocne-worker06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker07.lv.vcneea798df.oraclevcn.com:8090
              selinux: enforcing
              restrict-service-externalip: true
              restrict-service-externalip-ca-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/ca.cert
              restrict-service-externalip-tls-cert: /etc/olcne/configs/certificates/restrict_external_ip/production/node.cert
              restrict-service-externalip-tls-key: /etc/olcne/configs/certificates/restrict_external_ip/production/node.key
          - module: helm
            name: myhelm
            force: true
            args:
              helm-kubernetes-module: mycluster      
          - module: oci-ccm
            name: myoci
            force: true
            oci-ccm-helm-module: myhelm
            oci-use-instance-principals: true
            oci-compartment: ocid1.compartment.oc1..aaaaaaaanr6cysadeswwxc7sczdsrlamzhfh6scdyvuh4s4fmvecob6e2cha
            oci-vcn: ocid1.vcn.oc1.eu-frankfurt-1.amaaaaaag7acy3iat3duvrym376oax7nxdyqd56mqxtjaws47t4g7vqthgja
            oci-lb-subnet1: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaa6rt6chugbkfhyjyl4exznpxrlvnus2bgkzcgm7fljfkqbxkva6ya         
  6. Run the command to update the cluster and remove the nodes.

    Note: This may take a few minutes to complete.

    olcnectl module update --config-file myenvironment.yaml

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
  7. (In the Cloud Console) Confirm that the Load Balancer's Backend Set shows three healthy (Health = 'OK') and two unhealthy (Health = 'Critical - Connection failed') nodes. The two nodes show a critical status because they have been removed from the Kubernetes cluster.

    lb-healthy

  8. Show that the Platform API Server removed the control plane nodes from the cluster. Confirm that ocne-control04 and ocne-control05 are no longer listed.

    kubectl get nodes

    Example Output:

    [oracle@ocne-operator ~]$ kubectl get nodes
    NAME             STATUS   ROLES                  AGE    VERSION
    ocne-control01   Ready    control-plane,master   164m   v1.23.7+1.el8
    ocne-control02   Ready    control-plane,master   163m   v1.23.7+1.el8
    ocne-control03   Ready    control-plane,master   162m   v1.23.7+1.el8
    ocne-worker01    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker02    Ready    <none>                 163m   v1.23.7+1.el8
    ocne-worker03    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker04    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker05    Ready    <none>                 164m   v1.23.7+1.el8
    ocne-worker06    Ready    <none>                 13m    v1.23.7+1.el8
    ocne-worker07    Ready    <none>                 13m    v1.23.7+1.el8

Summary

This completes the demonstration of how to add, and then remove, Kubernetes nodes from the cluster. While this exercise demonstrated updating both the control plane and worker nodes simultaneously, this is not the recommended approach to scaling an Oracle Cloud Native Environment Kubernetes cluster up or down; in a production environment, these operations would most likely be undertaken separately.
