Deploy Internal Load Balancer with Oracle Cloud Native Environment


Introduction

Oracle Cloud Native Environment is a fully integrated suite for developing and managing cloud-native applications. The Kubernetes module is the core module. It deploys and manages containers and automatically installs and configures CRI-O and RunC. CRI-O manages the container runtime for a Kubernetes cluster, which defaults to RunC.

Objectives

In this lab, you will learn how to:

  • Configure the Kubernetes cluster with an internal load balancer to enable high availability
  • Configure Oracle Cloud Native Environment on a 5-node cluster
  • Verify keepalived failover between the control plane nodes completes successfully

Support Note: Using the internal load balancer is NOT recommended for production deployments. Instead, please use a correctly configured (external) load balancer.

Prerequisites

  • Minimum of 6 Oracle Linux instances for the Oracle Cloud Native Environment cluster:

    • Operator node
    • 3 Kubernetes control plane nodes
    • 2 Kubernetes worker nodes
  • Each system should have Oracle Linux installed and configured with:

    • An Oracle user account (used during the installation) with sudo access
    • Key-based SSH, also known as password-less SSH, between the hosts (a minimal setup sketch follows this list)
    • Prerequisites for Oracle Cloud Native Environment
  • Additional requirements include:

    • A virtual IP address for the primary control plane node.
      • Do not use this IP address on any of the nodes.

      • The load balancer dynamically sets the IP address to the control plane node assigned as the primary controller.

        Note: If you are deploying to Oracle Cloud Infrastructure, your tenancy must have Layer 2 Networking for VLANs enabled within your virtual cloud networks (VCNs). This OCI feature is not generally available, although the free lab environment's tenancy has it enabled.

        If you have a use case for it, work with your technical team to request access to this feature for your tenancy.
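
If you are building the hosts yourself rather than using the free lab environment, you can set up key-based SSH from the operator node with the standard OpenSSH tools. The following is only a minimal sketch; the host names are the ones used later in this lab and are assumptions for any other environment.

    # Minimal sketch: run as the oracle user on the operator node.
    ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa
    for host in ocne-control-01 ocne-control-02 ocne-control-03 ocne-worker-01 ocne-worker-02
    do
      # Copy the public key to each node; prompts once per node for the oracle password.
      ssh-copy-id oracle@$host
    done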

Deploy Oracle Cloud Native Environment

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
  3. Change into the working directory.

    cd linux-virt-labs/ocne
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yml
  5. Update the Oracle Cloud Native Environment configuration.

    cat << EOF | tee instances.yml > /dev/null
    compute_instances:
      1:
        instance_name: "ocne-operator"
        type: "operator"
      2:
        instance_name: "ocne-control-01"
        type: "controlplane"
      3:
        instance_name: "ocne-worker-01"
        type: "worker"
      4:
        instance_name: "ocne-worker-02"
        type: "worker"
      5:
        instance_name: "ocne-control-02"
        type: "controlplane"
      6:
        instance_name: "ocne-control-03"
        type: "controlplane"
    EOF
  6. Deploy the lab environment.

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e ocne_type=vlan -e use_vlan=true -e "@instances.yml"

    The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, which is located under the python3.6 modules.

    Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.

Set Firewall Rules on Control Plane Nodes

  1. Open a terminal and connect via SSH to the ocne-operator node.

    ssh oracle@<ip_address_of_node>
  2. Set the firewall rules and enable the Virtual Router Redundancy Protocol (VRRP) on each control plane node.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh $host "sudo firewall-cmd --zone=public --add-port=6444/tcp --permanent; sudo firewall-cmd --zone=public --add-protocol=vrrp --permanent; sudo firewall-cmd --reload"
    done

    Note: You must complete this step before proceeding to ensure the load balancer process can communicate between the control plane nodes.
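
    (Optional) You can confirm the rules took effect by listing the open ports and protocols on each control plane node; both firewall-cmd queries shown here are standard firewalld options. The output for each node should include 6444/tcp among the ports and vrrp among the protocols.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh $host "echo -e '\n---- $host ----\n'; sudo firewall-cmd --zone=public --list-ports; sudo firewall-cmd --zone=public --list-protocols"
    done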

Create a Platform CLI Configuration File

Administrators can use a configuration file to simplify creating and managing environments and modules. The configuration file, written in valid YAML syntax, includes all information about the environments and modules to deploy. Using a configuration file saves repeated entries of Platform CLI command options.

Note: When entering more than one control plane node in the myenvironment.yaml file while configuring Oracle Cloud Native Environment, the olcnectl command requires setting the virtual IP address option in the configuration file. In the Kubernetes module section, enter the argument virtual-ip: <vrrp-ip-address>.

More information on creating a configuration file is in the documentation at Using a Configuration File.

  1. Create a configuration file.

    cat << EOF | tee ~/myenvironment.yaml > /dev/null
    environments:
      - environment-name: myenvironment
        globals:
          api-server: 127.0.0.1:8091
          secret-manager-type: file
          olcne-ca-path: /etc/olcne/certificates/ca.cert
          olcne-node-cert-path: /etc/olcne/certificates/node.cert
          olcne-node-key-path:  /etc/olcne/certificates/node.key
        modules:
          - module: kubernetes
            name: mycluster
            args:
              container-registry: container-registry.oracle.com/olcne
              virtual-ip: 10.0.12.111
              control-plane-nodes:
                - 10.0.12.10:8090
                - 10.0.12.11:8090
                - 10.0.12.12:8090
              worker-nodes:
                - 10.0.12.20:8090
                - 10.0.12.21:8090
              selinux: enforcing
              restrict-service-externalip: true
              restrict-service-externalip-ca-cert: /home/oracle/certificates/ca/ca.cert
              restrict-service-externalip-tls-cert: /home/oracle/certificates/restrict_external_ip/node.cert
              restrict-service-externalip-tls-key: /home/oracle/certificates/restrict_external_ip/node.key
    EOF
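
    The node and virtual IP addresses above match the free lab environment's VLAN subnet (10.0.12.0/24). If you are working in a different environment, you can list each node's VLAN interface address before editing the file; this sketch assumes the ens5 device name used in this lab.

    for host in ocne-control-01 ocne-control-02 ocne-control-03 ocne-worker-01 ocne-worker-02
    do
      # Print each node's address on the VLAN interface (ens5 is an assumption outside this lab).
      ssh $host "echo -n '$host: '; ip -br a show dev ens5"
    done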

Create the Environment and Kubernetes Module

  1. Create the environment.

    cd ~
    olcnectl environment create --config-file myenvironment.yaml
  2. Create the Kubernetes module.

    olcnectl module create --config-file myenvironment.yaml
  3. Validate the Kubernetes module.

    olcnectl module validate --config-file myenvironment.yaml

    In the free lab environment, there should be no validation errors. If errors occur, the command's output provides the steps required to fix the nodes.

  4. Install the Kubernetes module.

    olcnectl module install --config-file myenvironment.yaml

    The deployment of Kubernetes to the nodes may take several minutes to complete.

  5. Validate the deployment of the Kubernetes module.

    olcnectl module instances --config-file myenvironment.yaml

    Example Output:

    [oracle@ocne-operator ~]$ olcnectl module instances --config-file myenvironment.yaml
    INSTANCE        MODULE      STATE    
    10.0.12.10:8090 node        installed
    10.0.12.11:8090 node        installed
    10.0.12.12:8090 node        installed
    10.0.12.20:8090 node        installed
    10.0.12.21:8090 node        installed
    mycluster       kubernetes  installed
    [oracle@ocne-operator ~]$
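
    (Optional) A more detailed view of the deployment is available through the module report subcommand, which summarizes the properties reported by each node in the module. Its availability and output format depend on your olcnectl release, so treat this as an assumption and skip it if the subcommand is not present.

    olcnectl module report --environment-name myenvironment --name mycluster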

Set up kubectl

  1. Set up the kubectl command on the control plane nodes.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
    ssh $host /bin/bash <<EOF
      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config
      export KUBECONFIG=$HOME/.kube/config
      echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
    EOF
    done

    Repeating this step on each control plane node is essential because any one of the nodes could go offline. In practice, install the kubectl utility on a separate node outside of the cluster.
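
    A minimal sketch of that external approach, assuming kubectl is already installed on the external machine (for example, the operator node) and key-based SSH to ocne-control-01 works:

    # Sketch: run on a machine outside the cluster, not on a control plane node.
    mkdir -p ~/.kube
    scp ocne-control-01:~/.kube/config ~/.kube/config
    export KUBECONFIG=~/.kube/config
    kubectl get nodes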

  2. Verify that kubectl works and that the install completed successfully, with all nodes listed as being in the Ready status.

    ssh ocne-control-01 kubectl get nodes

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes
    NAME              STATUS   ROLES           AGE     VERSION
    ocne-control-01   Ready    control-plane   12m     v1.28.3+3.el8
    ocne-control-02   Ready    control-plane   10m     v1.28.3+3.el8
    ocne-control-03   Ready    control-plane   9m28s   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          8m36s   v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          8m51s   v1.28.3+3.el8
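
    (Optional) You can also verify the control plane components themselves by listing the pods in the kube-system namespace; the -o wide option shows which node each pod runs on. Both are standard kubectl options.

    ssh ocne-control-01 kubectl get pods -n kube-system -o wide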

Confirm Failover Between Control Plane Nodes

The Oracle Cloud Native Environment installation with three control plane nodes behind an internal load balancer is complete.

The following steps confirm that the internal load balancer (using keepalived) detects when the primary control plane node fails and passes control to one of the surviving control plane nodes. Likewise, when the missing node recovers, it automatically rejoins the cluster.

Locate the Primary Control Plane Node

Determine which control plane node currently holds the virtual IP address.

  1. List each control plane node's network devices and IP addresses.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh $host "echo -e '\n---- $host ----\n'; ip -br a"
    done

    The -br option summarizes the IP address information assigned to a device.

  2. Look at the results and find which host's output contains the virtual IP associated with the ens5 NIC.

    In the free lab environment, the virtual IP address used by the keepalived daemon is set to 10.0.12.111, so the line to look for resembles the following:

    ens5             UP             10.0.12.11/24 10.0.12.111/32 fe80::41f6:2a0d:9a89:13d0/64 

    Example Output:

    ---- ocne-control-01 ----
    
    lo               UNKNOWN        127.0.0.1/8 ::1/128 
    ens3             UP             10.0.0.150/24 fe80::17ff:fe06:56dd/64 
    ens5             UP             10.0.12.10/24 fe80::b5ee:51ab:8efd:92de/64 
    flannel.1        UNKNOWN        10.244.0.0/32 fe80::e80c:9eff:fe27:e0b3/64 
    
    ---- ocne-control-02 ----
    
    lo               UNKNOWN        127.0.0.1/8 ::1/128 
    ens3             UP             10.0.0.151/24 fe80::17ff:fe02:f7d0/64 
    ens5             UP             10.0.12.11/24 fe80::d7b2:5f5b:704b:369/64 
    flannel.1        UNKNOWN        10.244.1.0/32 fe80::48ae:67ff:fef1:862c/64 
    
    ---- ocne-control-03 ----
    
    lo               UNKNOWN        127.0.0.1/8 ::1/128 
    ens3             UP             10.0.0.152/24 fe80::17ff:fe0b:7feb/64 
    ens5             UP             10.0.12.12/24 10.0.12.111/32 fe80::ba57:2df0:d652:f8b0/64 
    flannel.1        UNKNOWN        10.244.2.0/32 fe80::10db:c4ff:fee9:2121/64 
    cni0             UP             10.244.2.1/24 fe80::bc0b:26ff:fe2d:383e/64 
    veth1814a0d4@if2 UP             fe80::ec22:ddff:fe41:1695/64 
    veth321409fc@if2 UP             fe80::485e:c4ff:feec:6914/64 

    The ocne-control-03 node contains the virtual IP address in the example output.

    Important: Take note of which control plane node currently holds the virtual IP address in your running environment.
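
    To identify the holder without reading through each output block, you can filter for the virtual IP address directly. This sketch assumes the lab's virtual IP of 10.0.12.111 and the ens5 device shown above.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      # grep runs locally against the remote output; the echo fires only on a match.
      ssh $host "ip -br a show dev ens5" | grep -q '10.0.12.111' && echo "$host holds the virtual IP"
    done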

  3. (Optional) Confirm which control plane node holds the virtual IP address by querying the keepalived.service logs on each control plane node.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
       printf "======= $host =======\n\n"
       ssh $host "sudo journalctl -u keepalived | grep vrrp | grep -i ENTER"
    done

    Look at the last entry in the specific host's output list.

    Example Output:

    ...
    Aug 10 23:47:26 ocne-control01 Keepalived_vrrp[55605]: (VI_1) Entering MASTER STATE
    ...

    This control plane node has the virtual IP address assigned.

    Example Output:

    ...
    Aug 10 23:54:59 ocne-control02 Keepalived_vrrp[59961]: (VI_1) Entering BACKUP STATE (init)
    ...

    This control plane node does not.
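
    A shorter variant of the same check prints only each node's most recent VRRP state transition; the node whose last entry shows MASTER STATE currently holds the virtual IP. This is just a filtered form of the journalctl query above.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      printf "======= $host =======\n"
      ssh $host "sudo journalctl -u keepalived | grep -E 'Entering (MASTER|BACKUP) STATE' | tail -n 1"
    done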

Force the keepalived Daemon to Move to a Different Control Plane Node

Double-click the Luna Lab icon on the desktop in the free lab environment and navigate to the Luna Lab tab. Then click on the OCI Console hyperlink. Sign on using the provided User Name and Password values. After logging on, proceed with the following steps:

  1. Start from this screen.

    oci-main-page

  2. Click on the navigation menu at the top left corner of the page.

    Then click on the Compute menu item and the Instances pinned link in the main panel.

    oci-navigation-menu-compute

  3. Click the checkbox next to the instance name matching the one that contains the virtual IP address.

    oci-stop-instance

  4. Select Stop from the Actions drop-down list of values.

  5. Click the Stop button in the Stop instances panel.

  6. Once the instance stops, click the Close button in the Stop instances panel.

  7. Switch from the browser back to the terminal session.

  8. Confirm that the control plane node you just shut down is reporting as NotReady.

    Note: You may need to repeat this step several times, or wait through a connection timeout, before the status changes.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh -o ConnectTimeout=10 $host "echo -e '\n---- $host ----\n'; kubectl get nodes"
    done
    • -o ConnectTimeout=<time-in-seconds> overrides the default TCP timeout so that ssh stops trying to connect to a system that is down or unreachable.

    Example Output:

    NAME              STATUS     ROLES           AGE   VERSION
    ocne-control-01   NotReady   control-plane   51m   v1.28.3+3.el8
    ocne-control-02   Ready      control-plane   50m   v1.28.3+3.el8
    ocne-control-03   Ready      control-plane   49m   v1.28.3+3.el8
    ocne-worker-01    Ready      <none>          48m   v1.28.3+3.el8
    ocne-worker-02    Ready      <none>          48m   v1.28.3+3.el8
  9. Determine which node now holds the virtual IP address (the active keepalived MASTER).

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh -o ConnectTimeout=10 $host "echo -e '\n---- $host ----\n'; ip -br a"
    done

    Example Output:

    ssh: connect to host ocne-control-01 port 22: Connection timed out
    
    ---- ocne-control-02 ----
    
    lo               UNKNOWN        127.0.0.1/8 ::1/128 
    ens3             UP             10.0.0.56/28 fe80::17ff:fe0e:5b87/64 
    ens5             UP             10.0.12.11/24 fe80::f3aa:21fd:c0d4:7636/64 
    flannel.1        UNKNOWN        10.244.2.0/32 fe80::ac89:deff:fee8:b005/64 
    
    ---- ocne-control-03 ----
    
    lo               UNKNOWN        127.0.0.1/8 ::1/128 
    ens3             UP             10.0.0.50/28 fe80::17ff:fe20:2db5/64 
    ens5             UP             10.0.12.12/24 10.0.12.111/32 fe80::729d:a432:4a0f:f922/64 
    flannel.1        UNKNOWN        10.244.3.0/32 fe80::4460:27ff:fe27:86bf/64 

    The virtual IP address is now associated with ocne-control-03 in the sample output.

  10. Switch back to the browser.

  11. In the Cloud Console, start the Instance that was previously shut down by clicking the checkbox next to it and then selecting Start from the Actions drop-down list of values.

    oci-start-instance

  12. Click the Start button in the Start instances panel and then the Close button once the instance starts.

  13. Switch back to the terminal session.

  14. Confirm that kubectl shows all control plane nodes as being Ready.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh -o ConnectTimeout=10 $host "echo -e '\n---- $host ----\n'; kubectl get nodes"
    done

    Note: It may be necessary to repeat this step several times until the status changes.

    Example Output:

    NAME              STATUS     ROLES           AGE   VERSION
    ocne-control-01   Ready      control-plane   62m   v1.28.3+3.el8
    ocne-control-02   Ready      control-plane   60m   v1.28.3+3.el8
    ocne-control-03   Ready      control-plane   59m   v1.28.3+3.el8
    ocne-worker-01    Ready      <none>          58m   v1.28.3+3.el8
    ocne-worker-02    Ready      <none>          58m   v1.28.3+3.el8
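
    Instead of rerunning the loop until every node reports Ready, you can block until the condition is met with kubectl wait, a standard kubectl subcommand; the 300-second timeout is an arbitrary value for this sketch.

    ssh ocne-control-02 "kubectl wait --for=condition=Ready nodes --all --timeout=300s"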
  15. Confirm which control plane node now holds the virtual IP address.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      ssh -o ConnectTimeout=10 $host "echo -e '\n---- $host ----\n'; ip -br a"
    done

    The keepalived daemon keeps the virtual IP address on the currently active host even after the original host restarts. This behavior occurs because Oracle Cloud Native Environment sets each node's weight equally in the keepalived configuration file.
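
    (Optional) You can inspect the generated keepalived settings to see this for yourself. The sketch below assumes keepalived's conventional configuration path of /etc/keepalived/keepalived.conf; the exact path and contents written by Oracle Cloud Native Environment may differ.

    for host in ocne-control-01 ocne-control-02 ocne-control-03
    do
      # Path assumed; adjust if your deployment stores the keepalived configuration elsewhere.
      ssh $host "echo -e '\n---- $host ----\n'; sudo grep -E 'state|priority|virtual_ipaddress' -A1 /etc/keepalived/keepalived.conf"
    done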

Summary

These steps confirm that the keepalived-based internal load balancer is configured correctly and continues to accept requests for the Oracle Cloud Native Environment cluster when a control plane node fails.
