Scale a Kubernetes Cluster on Oracle Cloud Native Environment

Introduction

This tutorial demonstrates how to scale an existing Kubernetes cluster in Oracle Cloud Native Environment.

Scaling up a Kubernetes cluster means adding nodes, while scaling down means removing nodes. Nodes can be either control plane or worker nodes. Oracle recommends against scaling the cluster up and down at the same time; instead, perform the scale up and the scale down in two separate commands.

We recommend running an odd number of control plane nodes, for example 3, 5, or 7, to maintain quorum and avoid split-brain scenarios: an etcd cluster stays available as long as more than half of its members are healthy, so a three-node control plane tolerates one failure and a five-node control plane tolerates two. Worker nodes have no quorum requirement, but running several of them likewise improves the cluster's resilience.

We start with an existing Highly Available Kubernetes cluster running on Oracle Cloud Native Environment that consists of the following:

  • 1 Operator Node
  • 3 Control Plane Nodes
  • 5 Worker Nodes

The deployment for this tutorial builds upon labs covered earlier in this series.

Objectives

At the end of this tutorial, you should be able to do the following:

  • Add two new control plane nodes and two new worker nodes to a cluster
  • Scale down the cluster by removing the same nodes from the cluster

Prerequisites

Note: These prerequisites are the starting point and are deployed automatically when using the free lab environment.

  • A Highly Available Kubernetes cluster running on Oracle Cloud Native Environment

  • 4 additional Oracle Linux instances to use as:

    • 2 Kubernetes control plane nodes
    • 2 Kubernetes worker nodes
  • Access to a Load Balancer such as OCI Load Balancer

  • The additional Oracle Linux instances need the following:

    • The same OS and patch level as the original cluster
    • The completion of the prerequisite steps to install Oracle Cloud Native Environment
    • The completion of the prerequisite steps to set up the Kubernetes control plane and worker nodes

Set Up Lab Environment

Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.

Information: The free lab environment deploys a fully functional Oracle Cloud Native Environment on the provided nodes. This deployment takes approximately 60-65 minutes to finish after launch. Therefore, you might want to step away while this runs and then return to complete the lab.

  1. Open a terminal and connect via ssh to the ocne-operator system.

    ssh oracle@<ip_address_of_operator_node>
  2. Verify the deployment of the Kubernetes and OCI-CCM modules.

    olcnectl module instances --config-file myenvironment.yaml

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module instances --config-file myenvironment.yaml
    INSTANCE                                            MODULE      STATE    
    mycluster                                           kubernetes  installed
    myoci                                               oci-ccm     installed
    ocne-control-01.lv.vcn03957132.oraclevcn.com:8090   node        installed
    ocne-control-02.lv.vcn03957132.oraclevcn.com:8090   node        installed
    ocne-control-03.lv.vcn03957132.oraclevcn.com:8090   node        installed
    ocne-worker-01.lv.vcn03957132.oraclevcn.com:8090    node        installed
    ocne-worker-02.lv.vcn03957132.oraclevcn.com:8090    node        installed
    ocne-worker-03.lv.vcn03957132.oraclevcn.com:8090    node        installed
    ocne-worker-04.lv.vcn03957132.oraclevcn.com:8090    node        installed
    ocne-worker-05.lv.vcn03957132.oraclevcn.com:8090    node        installed
  3. Verify that kubectl works.

    ssh ocne-control-01 "kubectl get nodes"

    Example Output:

    [oracle@ocne-operator ~]$ ssh ocne-control-01 "kubectl get nodes"
    NAME              STATUS   ROLES           AGE   VERSION
    ocne-control-01   Ready    control-plane   35m   v1.28.3+3.el8
    ocne-control-02   Ready    control-plane   34m   v1.28.3+3.el8
    ocne-control-03   Ready    control-plane   32m   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          33m   v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          33m   v1.28.3+3.el8
    ocne-worker-03    Ready    <none>          31m   v1.28.3+3.el8
    ocne-worker-04    Ready    <none>          35m   v1.28.3+3.el8
    ocne-worker-05    Ready    <none>          33m   v1.28.3+3.el8
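
    If you run this check before the cluster has fully settled, some nodes may briefly report NotReady. As an optional convenience, and not part of the lab steps, you can wait for all nodes to become Ready using standard kubectl behavior:

    ssh ocne-control-01 "kubectl wait --for=condition=Ready nodes --all --timeout=300s"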

Set up the New Kubernetes Nodes

Note: The free lab environment completes the prerequisite steps during the initial deployment.

When scaling up, any new nodes require all of the prerequisites listed in this tutorial's Prerequisites section.

In the free lab environment, we use the nodes ocne-control-04 and ocne-control-05 as the new control plane nodes, while the nodes ocne-worker-06 and ocne-worker-07 are the new worker nodes. Because the free lab environment handles the prerequisites and enables the Oracle Cloud Native Environment Platform Agent service, we can proceed directly to generating certificates.
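
If you are working outside the free lab environment, you can optionally confirm that the Platform Agent service is enabled on each new node before continuing. The following is a minimal sketch; it assumes the agent's systemd unit is named olcne-agent, as in current Oracle Cloud Native Environment releases.

    for host in ocne-control-04 ocne-control-05 ocne-worker-06 ocne-worker-07
    do
    ssh $host "sudo systemctl is-enabled olcne-agent"
    done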

Create X.509 Private CA Certificates

The free lab environments use X.509 Private CA Certificates to secure node communication. Other methods exist to manage and deploy the certificates, such as using the HashiCorp Vault secrets manager or certificates signed by a trusted Certificate Authority (CA). Covering the usage of these other methods is outside the scope of this tutorial.

  1. Create a list of new nodes.

    VAR1=$(hostname -d)
    for NODE in 'ocne-control-04' 'ocne-control-05' 'ocne-worker-06' 'ocne-worker-07'; do VAR2+="${NODE}.$VAR1,"; done
    VAR2=${VAR2%,}

    This bash snippet grabs the domain name of the operator node and builds a comma-separated list of the nodes to add to the cluster during the scale-up procedure. A quick way to confirm the resulting list appears after step 2.

  2. Generate and distribute a set of certificates for the new nodes using the existing private CA.

    Use the --byo-ca-cert option to specify the location of the existing CA Certificate and the --byo-ca-key option to specify the location of the existing CA Key. Use the --nodes option and provide the FQDN of the new control plane and worker nodes.

    olcnectl certificates distribute \
    --cert-dir $HOME/certificates \
    --byo-ca-cert $HOME/certificates/ca/ca.cert \
    --byo-ca-key $HOME/certificates/ca/ca.key \
    --nodes $VAR2
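
    As an optional sanity check, confirm how the node list from step 1 expanded and inspect the CA certificate used to sign the new node certificates. The domain shown in the comment is an illustrative placeholder; the openssl call reuses the ca.cert path from the command above.

    # Expect a comma-separated list of FQDNs such as:
    #   ocne-control-04.<your-vcn-domain>,ocne-control-05.<your-vcn-domain>,...
    echo $VAR2

    # Display the CA certificate's subject and expiry date
    openssl x509 -in $HOME/certificates/ca/ca.cert -noout -subject -enddate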

Configure the Platform Agent to Use the Certificates

Configure the Platform Agent on each new node to use the certificates copied over in the previous step. We accomplish this task from the operator node by running the command over ssh.

  1. Configure the Platform Agent on each additional control plane and worker node; an optional verification follows this step.

    for host in ocne-control-04 ocne-control-05 ocne-worker-06 ocne-worker-07
    do
    ssh $host /bin/bash <<EOF
      sudo /etc/olcne/bootstrap-olcne.sh --secret-manager-type file --olcne-component agent
    EOF
    done
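
    To verify that each agent restarted cleanly and is listening on the Platform Agent port (8090), you can run an optional check such as the sketch below, which assumes the same olcne-agent service name noted earlier.

    for host in ocne-control-04 ocne-control-05 ocne-worker-06 ocne-worker-07
    do
    ssh $host "sudo systemctl is-active olcne-agent && sudo ss -tln | grep -w 8090"
    done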
    

Access the OCI Load Balancer and View the Backends

Because a Kubernetes control plane with more than one node requires a load balancer, it is worth reviewing the configuration that the free lab environment sets up automatically. The following steps show that the three control plane nodes deployed when creating the lab have a Healthy status, while the two nodes we add in the upcoming steps show a Critical status.

  1. Switch from the Terminal to the Luna desktop.

  2. Open the Luna Lab details page using the Luna Lab icon.

  3. Click on the OCI Console link.

  4. The Oracle Cloud Console login page displays.

  5. Enter the User Name and Password (found on the Luna Lab tab in the Credentials section).

  6. Click on the navigation menu in the page's top-left corner, then Networking and Load Balancers.

  7. The Load Balancers page displays.

  8. Locate the Compartment being used from the drop-down list.

  9. Click on the Load Balancer listed in the table (ocne-load-balancer).

  10. Under the Resources section in the navigation panel on the left-hand side of the browser window, scroll down the page and click on the link to the Backend Sets.

    The Backend Sets table displays.

  11. Click on the ocne-lb-backend-set link in the Name column.

  12. Under the Resources section in the navigation panel on the left-hand side of the browser window, scroll down the page and click the Backends link.

  13. The page displays the Backends representing the control plane nodes.

    Note: Two of the backend nodes are in the Critical - connection failed state because these nodes are not yet part of the Kubernetes control plane. Keep this browser tab open, as we'll recheck the status of the backend nodes after completing the scale-up steps. An optional command-line reachability check follows below.
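
    If you also want to confirm reachability from the command line, the sketch below queries the Kubernetes API through the load balancer from the operator node. The <lb_ip> placeholder stands for the load balancer IP address shown on this details page, and the 6443 listener port is an assumption based on the lab's default configuration; substitute your own values.

    curl -k https://<lb_ip>:6443/version

    Any HTTP response, even a 401 or 403 from the API server, confirms the listener forwards traffic to a healthy backend.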

View the Kubernetes Nodes

Check the currently available Kubernetes nodes in the cluster. Note that there are three control plane nodes and five worker nodes.

  1. Confirm that the nodes are all in READY status.

    ssh ocne-control-01 "kubectl get nodes"

    Example Output:

    [oracle@ocne-operator olcne]$ ssh ocne-control-01 "kubectl get nodes"
    NAME              STATUS   ROLES           AGE     VERSION
    ocne-control-01   Ready    control-plane   5h15m   v1.28.3+3.el8
    ocne-control-02   Ready    control-plane   5h14m   v1.28.3+3.el8
    ocne-control-03   Ready    control-plane   5h13m   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          5h14m   v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          5h13m   v1.28.3+3.el8
    ocne-worker-03    Ready    <none>          5h12m   v1.28.3+3.el8
    ocne-worker-04    Ready    <none>          5h13m   v1.28.3+3.el8
    ocne-worker-05    Ready    <none>          5h14m   v1.28.3+3.el8

Add Control Plane and Worker Nodes to the Deployment Configuration File

Before scaling the Kubernetes cluster, you must add the Fully Qualified Domain Name (FQDN) and Platform Agent access port of 8090 for each new node to the appropriate section of the deployment configuration file.

  1. Confirm the current environment uses three control plane nodes and five worker nodes.

    cat ~/myenvironment.yaml

    Example Output:

    ...
              control-plane-nodes:
                - ocne-control-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-03.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-05.lv.vcneea798df.oraclevcn.com:8090
    ...
  2. Add the new control plane and worker nodes to the deployment configuration file.

    The free lab environment uses a file named myenvironment.yaml. The line numbers in the following sed commands assume that file's default layout; a quick way to confirm them appears at the end of this section.

    cd ~
    sed -i '20 i \            - ocne-control-04.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '21 i \            - ocne-control-05.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '28 i \            - ocne-worker-06.'"$(hostname -d)"':8090' ~/myenvironment.yaml
    sed -i '29 i \            - ocne-worker-07.'"$(hostname -d)"':8090' ~/myenvironment.yaml
  3. Confirm the addition of the control plane and worker nodes in the deployment configuration file.

    cat ~/myenvironment.yaml

    Example Excerpt:

    ...
              control-plane-nodes:
                - ocne-control-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-05.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-07.lv.vcneea798df.oraclevcn.com:8090   
    ...

The deployment configuration file now includes the new control plane nodes (ocne-control-04 and ocne-control-05) and the new worker nodes (ocne-worker-06 and ocne-worker-07). This change represents all of the control plane and worker nodes that should be in the cluster after the scale-up completes.
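
If you are adapting these steps outside the free lab environment, note that the sed commands above (and the deletion commands in the scale-down section later) rely on hard-coded line numbers that assume the lab's default file layout. A quick, non-destructive way to confirm where the node entries sit before editing is sketched below.

    # List the node-related lines together with their line numbers
    grep -n -E 'nodes:|:8090' ~/myenvironment.yaml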

Scale Up the Control Plane and Worker Nodes

  1. (Optional) Avoid using the --api-server flag in future olcnectl commands.

    Get a list of the module instances and add the --update-config flag.

    olcnectl module instances \
    --config-file myenvironment.yaml \
    --update-config

    Note: The myenvironment.yaml file already includes this option, with the value set to true.

  2. Run the module update command.

    Use the olcnectl module update command with the --config-file option to specify the configuration file's location. The Platform API Server validates the configuration file and compares it with the state of the cluster. After the comparison, it recognizes there are more nodes to add to the cluster.

    olcnectl module update --config-file myenvironment.yaml --log-level debug

    Note: The --log-level debug option prints the command's output to the console in debug mode, allowing you to follow its progress.

    Respond with y to the following prompt during the update process.

    [WARNING] Update will shift your workload and some pods will lose data if they rely on local storage. Do you want to continue? (y/N) y
  3. Switch to the browser and the Cloud Console window.

  4. Confirm that the Load Balancer's Backend Set shows five healthy Backend nodes.

  5. Confirm the addition of the new control plane and worker nodes to the cluster.

    ssh ocne-control-01 "kubectl get nodes"

    Example Output:

    [oracle@ocne-operator ~]$ ssh ocne-control-01 "kubectl get nodes"
    NAME              STATUS   ROLES           AGE   VERSION
    ocne-control-01   Ready    control-plane   99m   v1.28.3+3.el8
    ocne-control-02   Ready    control-plane   97m   v1.28.3+3.el8
    ocne-control-03   Ready    control-plane   96m   v1.28.3+3.el8
    ocne-control-04   Ready    control-plane   13m   v1.28.3+3.el8
    ocne-control-05   Ready    control-plane   12m   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          99m   v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          98m   v1.28.3+3.el8
    ocne-worker-03    Ready    <none>          98m   v1.28.3+3.el8
    ocne-worker-04    Ready    <none>          98m   v1.28.3+3.el8
    ocne-worker-05    Ready    <none>          98m   v1.28.3+3.el8
    ocne-worker-06    Ready    <none>          13m   v1.28.3+3.el8
    ocne-worker-07    Ready    <none>          13m   v1.28.3+3.el8

    Notice that the new control plane nodes (ocne-control-04 and ocne-control-05) and the new worker nodes (ocne-worker-06 and ocne-worker-07) are now part of the cluster. This output confirms a successful scale-up operation; an optional etcd check follows.
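
    Because control plane quorum depends on etcd, you can optionally confirm that an etcd member now runs on each of the five control plane nodes. This sketch assumes the cluster uses kubeadm-style static etcd pods carrying the standard component=etcd label; adjust the label if your deployment differs.

    ssh ocne-control-01 "kubectl get pods -n kube-system -l component=etcd -o wide"

    Expect five etcd pods, one per control plane node, all in the Running state.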

Scale Down the Control Plane Nodes

Next, we scale down only the control plane nodes to demonstrate that the control plane and worker nodes can scale independently.

  1. Confirm the current environment uses five control plane nodes and seven worker nodes.

    cat ~/myenvironment.yaml

    Example Output:

    ...
              control-plane-nodes:
                - ocne-control-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-05.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-07.lv.vcneea798df.oraclevcn.com:8090
    ...
  2. To scale the cluster down to the original three control plane nodes, remove the ocne-control-04 and ocne-control-05 entries from the configuration file.

    sed -i '19d;20d' ~/myenvironment.yaml
  3. Confirm the configuration file now contains only three control plane nodes and the seven worker nodes.

    cat ~/myenvironment.yaml

    Example Excerpt:

    ...
              control-plane-nodes:
                - ocne-control-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-control-03.lv.vcneea798df.oraclevcn.com:8090
              worker-nodes:
                - ocne-worker-01.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-02.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-03.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-04.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-05.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-06.lv.vcneea798df.oraclevcn.com:8090
                - ocne-worker-07.lv.vcneea798df.oraclevcn.com:8090
    ...
  4. Suppress the module update warning message.

    You can suppress the confirmation prompt during the module update by adding the force: true directive to the configuration file. Place this directive immediately under the name: <xxxx> directive for each module defined.

    cd ~
    sed -i '13 i \        force: true' ~/myenvironment.yaml
    sed -i '37 i \        force: true' ~/myenvironment.yaml
  5. Confirm the configuration file contains the force: true directive.

    cat ~/myenvironment.yaml

    Example Excerpt:

    [oracle@ocne-operator ~]$ cat ~/myenvironment.yaml
    ...
          - module: kubernetes
            name: mycluster
            force: true
            args:
              container-registry: container-registry.oracle.com/olcne
    ...
          - module: oci-ccm
            name: myoci
            force: true
            oci-ccm-kubernetes-module: mycluster
    ...
  6. Update the cluster and remove the nodes.

    Note: This may take a few minutes to complete.

    olcnectl module update --config-file myenvironment.yaml

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module update --config-file myenvironment.yaml
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
    Taking backup of modules before update
    Backup of modules succeeded.
    Updating modules
    Update successful
  7. Switch to the browser and the Cloud Console window.

  8. Confirm the Load Balancer's Backend Set status.

    The page shows three healthy (Health = 'OK') and two unhealthy (Health = 'Critical - Connection failed') nodes. Because the removed nodes are no longer part of the Kubernetes cluster, the load balancer reports them as critical once they stop responding to its health checks.

  9. Confirm the removal of the control plane nodes.

    ssh ocne-control-01 "kubectl get nodes"

    Example Output:

    [oracle@ocne-operator ~]$ ssh ocne-control-01 "kubectl get nodes"
    NAME              STATUS   ROLES           AGE    VERSION
    ocne-control-01   Ready    control-plane   164m   v1.28.3+3.el8
    ocne-control-02   Ready    control-plane   163m   v1.28.3+3.el8
    ocne-control-03   Ready    control-plane   162m   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          164m   v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          163m   v1.28.3+3.el8
    ocne-worker-03    Ready    <none>          164m   v1.28.3+3.el8
    ocne-worker-04    Ready    <none>          164m   v1.28.3+3.el8
    ocne-worker-05    Ready    <none>          164m   v1.28.3+3.el8
    ocne-worker-06    Ready    <none>           13m   v1.28.3+3.el8
    ocne-worker-07    Ready    <none>           13m   v1.28.3+3.el8
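
    As an optional bookend to the earlier etcd sketch, and under the same component=etcd label assumption, the etcd pod count should now be back to three, one per remaining control plane node.

    ssh ocne-control-01 "kubectl get pods -n kube-system -l component=etcd -o wide"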

Summary

That completes the demonstration of adding and removing Kubernetes nodes from a cluster. While this exercise updated the control plane and worker nodes simultaneously during the scale up, that is not the recommended approach to scaling an Oracle Cloud Native Environment Kubernetes cluster up or down. In production environments, administrators should perform these tasks separately.
