Use Kubectl to Manage Kubernetes Clusters and Nodes on Oracle Cloud Native Environment

Introduction

Although graphical tools can manage Kubernetes, many administrators prefer to use command-line tools. The command-line tool provided within the Kubernetes ecosystem is called kubectl. Kubectl is a versatile tool used to deploy and inspect the configurations and logs of cluster resources and applications. Kubectl achieves this by using the Kubernetes API to authenticate with the control plane node of the Kubernetes cluster and complete any management actions requested by the administrator.

Most of the operations and commands available in kubectl allow administrators to deploy and manage applications on the Kubernetes cluster and to inspect and manage the cluster's resources.

Note: Many kubectl commands accept the --all-namespaces option. Because it is used so frequently, the -A flag exists as a shorthand, and this tutorial uses kubectl -A in preference to kubectl --all-namespaces.
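
For example, both of the following commands list the pods in every namespace:

    kubectl get pods --all-namespaces
    kubectl get pods -A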

Objectives

This tutorial builds on the basic commands introduced in Introducing Kubectl with Oracle Cloud Native Environment. If this is your first time using kubectl, you may find it beneficial to start there. This tutorial introduces how kubectl can manage individual Kubernetes nodes and any applications deployed onto them. The specific areas of node management introduced in this tutorial are:

  • Querying Cluster Information
  • Querying Node information
  • Deploying an example application (Nginx)
  • Introducing new concepts such as:
    • Cordoning/Uncordoning and Draining Nodes
    • Taints and Tolerations

This tutorial uses only kubectl to view and manage the cluster's current configuration.

Prerequisites

  • Minimum of a 5-node Oracle Cloud Native Environment cluster:

    • Operator node
    • Kubernetes control plane node
    • 3 Kubernetes worker nodes
  • Each system should have Oracle Linux installed and configured with:

    • An Oracle user account (used during the installation) with sudo access
    • Key-based SSH, also known as password-less SSH, between the hosts
    • Installation of Oracle Cloud Native Environment

Deploy Oracle Cloud Native Environment

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
  3. Change into the working directory.

    cd linux-virt-labs/ocne
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yaml
  5. Update the Oracle Cloud Native Environment configuration.

    cat << EOF | tee instances.yaml > /dev/null
    compute_instances:
      1:
        instance_name: "ocne-operator"
        type: "operator"
      2:
        instance_name: "ocne-control-01"
        type: "controlplane"
      3:
        instance_name: "ocne-worker-01"
        type: "worker"
      4:
        instance_name: "ocne-worker-02"
        type: "worker"
      5:
        instance_name: "ocne-worker-03"
        type: "worker"
    
    EOF
  6. Deploy the lab environment.

    ansible-playbook create_instance.yaml -e ansible_python_interpreter="/usr/bin/python3.6" -e "@instances.yaml"

     The free lab environment requires the extra variable ansible_python_interpreter because it installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, which places its modules under python3.6.

    Important: Wait for the playbook to run successfully and reach the pause task. The Oracle Cloud Native Environment installation is complete at this stage of the playbook, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys.

Review Existing Cluster and Node Information

An essential precursor to administering any Kubernetes cluster is discovering what nodes are present, the pods executing on those nodes, and so on. This action allows you to plan and anticipate temporarily disabling pod scheduling on nodes while they undergo any required maintenance or troubleshooting.

  1. Open a terminal and connect via ssh to the ocne-control node.

    ssh oracle@<ip_address_of_ol_node>
  2. Query a complete list of existing nodes.

    kubectl get nodes

    Note that the output returns a list of all the deployed nodes with status details and the Kubernetes version.
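
     If you also want each node's internal and external IP addresses, operating system image, kernel version, and container runtime in the same listing, the standard -o wide output option adds those columns:

     kubectl get nodes -o wide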

  3. Request more details about one of the nodes.

    kubectl describe node <your-preferred-node-name>

    This command returns a wealth of information related to the Kubernetes node, starting with the following:

    • Name: confirms the Kubernetes node name
    • Labels: key/value pairs used to identify object attributes relevant to end-users
    • Annotations: key/value pairs used to store extra information about a Kubernetes node
    • Unschedulable: false indicates the node accepts any deployed pod

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl describe node ocne-worker-01
    Name:               ocne-worker-01
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        failure-domain.beta.kubernetes.io/zone=EU-FRANKFURT-1-AD-1
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=ocne-worker-01
                        kubernetes.io/os=linux
                        oci.oraclecloud.com/fault-domain=FAULT-DOMAIN-2
                        topology.kubernetes.io/zone=EU-FRANKFURT-1-AD-1
    Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.0.160
                        csi.volume.kubernetes.io/nodeid: {"blockvolume.csi.oraclecloud.com":"ocne-worker-01","fss.csi.oraclecloud.com":"ocne-worker-01"}
                        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"3a:c1:fe:56:38:93"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 10.0.0.160
                        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/crio/crio.sock
                        node.alpha.kubernetes.io/ttl: 0
                        oci.oraclecloud.com/compartment-id: ocid1.compartment.oc1..aaaaaaaau2g2k23u6mp3t43ky3i4ky7jpyeiqcdcobpbcb7z6vjjlrdnuufq
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Mon, 14 Aug 2023 11:05:34 +0000
    Taints:             <none>
    Unschedulable:      false
    ...
    ...
    Events:
      Type    Reason                   Age                From             Message
      ----    ------                   ----               ----             -------
      Normal  Starting                 12m                kube-proxy       
      Normal  NodeHasSufficientMemory  12m (x8 over 12m)  kubelet          Node ocne-worker-01 status is now: NodeHasSufficientMemory
      Normal  NodeHasNoDiskPressure    12m (x8 over 12m)  kubelet          Node ocne-worker-01 status is now: NodeHasNoDiskPressure
      Normal  RegisteredNode           12m                node-controller  Node ocne-worker-01 event: Registered Node ocne-worker-01 in Controller
      Normal  Starting                 5m18s              kubelet          Starting kubelet.
      Normal  NodeHasSufficientMemory  5m18s              kubelet          Node ocne-worker-01 status is now: NodeHasSufficientMemory
      Normal  NodeHasNoDiskPressure    5m18s              kubelet          Node ocne-worker-01 status is now: NodeHasNoDiskPressure
      Normal  NodeHasSufficientPID     5m18s              kubelet          Node ocne-worker-01 status is now: NodeHasSufficientPID
      Normal  NodeNotReady             5m18s              kubelet          Node ocne-worker-01 status is now: NodeNotReady
      Normal  NodeAllocatableEnforced  5m18s              kubelet          Updated Node Allocatable limit across pods
      Normal  NodeReady                5m18s              kubelet          Node ocne-worker-01 status is now: NodeReady
      Normal  RegisteredNode           5m18s              node-controller  Node ocne-worker-01 event: Registered Node ocne-worker-01 in Controller

    NOTE: Don't clear the output from your terminal because the next steps highlight some areas of interest in this output.

  4. This output excerpt shows the internal and external IPs, internal DNS name, and hostname assigned to the node.

    Example Output (excerpt):

    Addresses:
      InternalIP:  10.0.0.160
      ExternalIP:  130.61.232.251
      Hostname:    ocne-worker-01
    Capacity:
      cpu:                2
       ephemeral-storage:  37177616Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             32568508Ki
      pods:               110
  5. A little further down the output, the Non-terminated pods: section shows what pods are running on the node with details of CPU and memory requests and any limits per pod.

    Example Output (excerpt):

    Non-terminated Pods:          (3 in total)
      Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------                   ----                     ------------  ----------  ---------------  -------------  ---
      kube-system                 csi-oci-node-87qf5       0 (0%)        0 (0%)      0 (0%)           0 (0%)         6m48s
      kube-system                 kube-flannel-ds-cpgnk    100m (5%)     100m (5%)   50Mi (0%)        50Mi (0%)      12m
      kube-system                 kube-proxy-jfw2w         0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  6. Finally, the Allocated resources: section outlines any resources assigned to the node.

    Example Output (excerpt):

    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests    Limits
      --------           --------    ------
      cpu                200m (10%)  200m (10%)
      memory             70Mi (0%)   80Mi (0%)
      ephemeral-storage  0 (0%)      0 (0%)
      hugepages-1Gi      0 (0%)      0 (0%)
      hugepages-2Mi      0 (0%)      0 (0%)

    In summary, the kubectl describe node command is very useful in providing the administrator with a wealth of information about a Kubernetes node that can assist with planning or troubleshooting deployments.
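
     If you only need a specific field rather than the full report, the standard -o jsonpath output option can extract it directly. For example, against the example node used above:

     kubectl get node ocne-worker-01 -o jsonpath='{.status.capacity}{"\n"}'
     kubectl get node ocne-worker-01 -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}'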

Deploy Nginx

Currently, no applications are deployed to the three worker nodes, which makes demonstrating the effects of any Node management commands more difficult. The following steps will deploy an Nginx pod onto each worker node.

  1. Generate a deployment file to create Nginx Deployments onto each of the three worker nodes.

    cat << EOF | tee ~/deployment.yaml > /dev/null
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 3 # tells deployment to run 3 pods matching the template
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.24.0
            ports:
            - containerPort: 80   
    EOF

    Where:

    • name: nginx-deployment represents the deployment's name
    • replicas: 3 represents the number of pods to deploy
    • image: nginx:1.24.0 represents the Nginx version the Kubernetes pod will deploy

     Note: There are numerous possibilities for describing a deployment in its associated YAML file. For more detail, refer to the upstream documentation.

  2. Deploy Nginx onto the three OCNE worker Nodes.

    kubectl create -f ~/deployment.yaml

    The -f switch indicates which YAML file to use and its location.
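
     An alternative you may also see is kubectl apply -f, which creates the deployment if it does not exist and updates it in place if it does:

     kubectl apply -f ~/deployment.yaml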

  3. Confirm Nginx deployed across all three worker Nodes.

    kubectl get pods --namespace default -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-5c46dbdf89-5qnsw   1/1     Running   0          86m   10.244.3.2   ocne-worker-01   <none>           <none>
    nginx-deployment-5c46dbdf89-j6dcl   1/1     Running   0          86m   10.244.3.3   ocne-worker-02   <none>           <none>
    nginx-deployment-5c46dbdf89-lcjv6   1/1     Running   0          86m   10.244.2.4   ocne-worker-03   <none>           <none>
  4. (Optional) It is also possible to review which pods are deployed onto a single node, for example, ocne-worker-01.

    kubectl get pods --field-selector spec.nodeName=ocne-worker-01 -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --field-selector spec.nodeName=ocne-worker-01 -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-5c46dbdf89-5qnsw   1/1     Running   0          89m   10.244.3.2   ocne-worker-01   <none>           <none>

     Using these options helps when planning maintenance operations in busy Kubernetes environments, where there are potentially many nodes, each hosting many deployments.
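
     The same field selector combines with the -A flag introduced earlier to include system pods, giving a complete picture of everything running on a node before any maintenance:

     kubectl get pods -A --field-selector spec.nodeName=ocne-worker-01 -o wide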

Cordoning a Node

Like any system, Kubernetes nodes occasionally require maintenance, which may involve replacing physical hardware or updating the node's operating system or kernel. Cordons and drains are two mechanisms that safely prepare the target node so that any applications deployed on it do not affect the end user's experience.

Note: Manually using the kubectl cordon or kubectl uncordon commands to upgrade the Kubernetes environment with Oracle Cloud Native Environment is not supported. When used with a new Kubernetes version, the olcnectl module update command traverses the deployed nodes in sequence, invoking kubectl cordon and kubectl drain, performing the in-place upgrade, and finally issuing kubectl uncordon, so the Kubernetes cluster upgrades smoothly without incurring any application outages.

  1. Identify existing nodes.

    kubectl get nodes
  2. Cordon one of the worker nodes.

    kubectl cordon ocne-worker-01

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl cordon ocne-worker-01
     node/ocne-worker-01 cordoned
  3. Confirm the node is 'cordoned'.

    kubectl get nodes

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes
    NAME             STATUS                     ROLES           AGE   VERSION
    ocne-control-01   Ready                      control-plane   21m   v1.28.3+3.el8
    ocne-worker-01    Ready,SchedulingDisabled   <none>          20m   v1.28.3+3.el8
    ocne-worker-02    Ready                      <none>          21m   v1.28.3+3.el8
    ocne-worker-03    Ready                      <none>          20m   v1.28.3+3.el8

     Notice that the cordoned node now lists with the SchedulingDisabled status. Consequently, no new applications will deploy to that node, while any existing applications continue to service current sessions.
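
     In a cluster with many nodes, you can list only the cordoned ones; spec.unschedulable is one of the field selectors the Node API supports:

     kubectl get nodes --field-selector spec.unschedulable=true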

Draining a Node

Before undertaking any maintenance on the newly cordoned node, any pods deployed onto that node need to be removed/evicted. The drain command gracefully terminates the pod's containers. Once the drain command is complete, it is safe to complete whatever actions have been planned for that node, for example, scheduled maintenance such as upgrading the Operating System or the Kernel.

  1. Drain the ocne-worker-01 node.

    kubectl drain ocne-worker-01 --delete-emptydir-data --ignore-daemonsets --force

     Note: Why are the --ignore-daemonsets and --force options needed? The --ignore-daemonsets option lets the drain proceed by skipping DaemonSet-managed pods, which the DaemonSet controller would immediately recreate on the node anyway, while --force allows the eviction of pods that are not managed by a controller.

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl drain ocne-worker-01 --delete-emptydir-data --ignore-daemonsets --force
    node/ocne-worker-01 already cordoned
    Warning: ignoring DaemonSet-managed Pods: kube-system/csi-oci-node-87qf5, kube-system/kube-flannel-ds-cpgnk, kube-system/kube-proxy-jfw2w
    evicting pod default/nginx-deployment-5c46dbdf89-pdsp8
    pod/nginx-deployment-5c46dbdf89-pdsp8 evicted
    node/ocne-worker-01 drained
  2. Confirm Nginx is no longer running on ocne-worker-01.

    kubectl get pods --namespace default -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    NAME                                READY   STATUS    RESTARTS   AGE     IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-5c46dbdf89-75kk2   1/1     Running   0          111s    10.244.3.4   ocne-worker-03   <none>           <none>
    nginx-deployment-5c46dbdf89-g58dr   1/1     Running   0          4m34s   10.244.3.3   ocne-worker-03   <none>           <none>
    nginx-deployment-5c46dbdf89-nwhct   1/1     Running   0          4m34s   10.244.1.6   ocne-worker-02   <none>           <none>

    Note: Because the deployment.yaml file instructed Kubernetes to deploy three pods, notice there are still three pods listed in the output. This example shows the third pod has redeployed to ocne-worker-03. However, this may differ in your environment because the Kubernetes scheduler may decide differently.
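
     Only DaemonSet-managed pods, such as kube-proxy and kube-flannel, should remain on the drained node, which you can confirm with the field selector shown earlier:

     kubectl get pods -A --field-selector spec.nodeName=ocne-worker-01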

Confirm the kubectl cordon Command Works

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl delete deployment nginx-deployment
    deployment.apps "nginx-deployment" deleted
  2. Confirm Nginx is no longer present.

    kubectl get pods --namespace default -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    No resources found in default namespace.
  3. Deploy Nginx again.

    kubectl create -f ~/deployment.yaml
  4. Confirm no pods deploy onto a Cordoned worker node.

    kubectl get pods --namespace default -o wide

    Notice that pods do not deploy onto the 'cordoned' node, which is ocne-worker-01 in this example.

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-5c46dbdf89-bsf54   1/1     Running   0          7s    10.244.2.9   ocne-worker-03   <none>           <none>
    nginx-deployment-5c46dbdf89-cm25f   1/1     Running   0          7s    10.244.3.6   ocne-worker-02   <none>           <none>
    nginx-deployment-5c46dbdf89-qwtg7   1/1     Running   0          7s    10.244.2.8   ocne-worker-03   <none>           <none>

    Note: Which node displays with two pods may differ in your environment.

    This output confirms that the kubectl cordon command issued to the ocne-worker-01 node has successfully prevented any pod from deploying onto it.

Uncordon the Node

Once the maintenance is complete, the previously cordoned node is returned to the pool using the kubectl uncordon command.

  1. Uncordon the ocne-worker-01 node.

    kubectl uncordon ocne-worker-01

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl uncordon ocne-worker-01
    node/ocne-worker-01 uncordoned
  2. Confirm the ocne-worker-01 node is available for scheduling again.

    kubectl get nodes

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes
    NAME             STATUS   ROLES           AGE   VERSION
    ocne-control-01   Ready    control-plane   11m   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          10m   v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          10m   v1.28.3+3.el8
    ocne-worker-03    Ready    <none>          10m   v1.28.3+3.el8

     Note: The SchedulingDisabled flag under the STATUS column has been removed, confirming that the ocne-worker-01 node has returned to the pool.

(Optional) Confirm the Uncordoned Node is Available Again

Note: Please be aware that the following steps are not required on a live system because they would cause an application outage. Under normal circumstances, the Kubernetes scheduler determines where individual pods deploy based on many factors, such as the overall load across the cluster, and is free to schedule or evict pods as it sees fit. The steps provided here are for illustrative purposes only, demonstrating that once a node is uncordoned, it is again available for the Kubernetes scheduler to place pods on whenever it chooses.

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment
  2. Deploy Nginx again.

    kubectl create -f ~/deployment.yaml
  3. Confirm Nginx has deployed across all three worker Nodes.

    kubectl get pods --namespace default -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-5c46dbdf89-6wrd6   1/1     Running   0          8s    10.244.3.7    ocne-worker-02   <none>           <none>
    nginx-deployment-5c46dbdf89-w55wt   1/1     Running   0          8s    10.244.2.10   ocne-worker-03   <none>           <none>
    nginx-deployment-5c46dbdf89-x8vwz   1/1     Running   0          8s    10.244.1.7    ocne-worker-01   <none>           <none>

     Notice that, as expected, all worker nodes now host an Nginx pod, confirming the node is available for deployments again.

Introducing Taints and Tolerations

Managing where application pods deploy within a Kubernetes cluster is an essential and skilled aspect of being a Kubernetes administrator. Effective scheduling management can help companies improve the efficient use of their resources, control costs, and manage applications at scale across a cluster. This section does not provide in-depth coverage of all aspects of this complicated aspect of Kubernetes administration. Instead, it introduces Taints and Tolerations and how they can aid an administrator in their role.

What are Taints and Tolerations?

Taints are a Kubernetes property assigned to nodes to repel certain pods from being scheduled onto them. Tolerations, on the other hand, are a property defined in an application's deployment.yaml file indicating that its pods can be scheduled onto any node with a matching taint. Setting these properties is how a Kubernetes administrator can exercise some control over which nodes an application's pods deploy onto.

Are they guaranteed to work?

Taints and tolerations help repel pods from a defined node. However, they cannot ensure that a specific pod deploys to a predetermined node. That requires another advanced scheduling technique, node affinity, which Kubernetes administrators use together with taints and tolerations to control where pods execute. Node affinity is outside the scope of this tutorial.

Why use Taints and Tolerations?

The most common use cases for which an administrator would choose to use taints and tolerations include the following:

  • Configure dedicated nodes: Taints and tolerations, combined with a node affinity definition, can help ensure matching pods deploy to these nodes.

  • Indicate nodes with special hardware: When a pod requires specific hardware to be present to either run or run most efficiently, using taints and tolerations allows administrators to ensure the most relevant pods will be scheduled onto these nodes by the Kubernetes scheduler process.

  • Eviction of pods based on node conditions: If an administrator assigns a taint to a node that already has pods assigned to it, then existing pods that do not possess a 'toleration' will eventually be automatically evicted from that node by the Kubernetes Scheduler.
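
Kubernetes itself relies on the last of these patterns: the node lifecycle controller automatically applies taints such as node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute to unhealthy nodes, which is what triggers the eviction of pods that do not tolerate them. A node's current taints appear in its describe output:

    kubectl describe node ocne-worker-01 | grep Taints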

Review Existing Taints across all Nodes

Before making any changes to the existing nodes, the administrator must establish whether any taint definitions already apply across the existing nodes.

  1. Confirm what taints exist currently across all of the nodes.

     kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
    NodeName         TaintKey                                TaintValue   TaintEffect
    ocne-control-01   node-role.kubernetes.io/control-plane   <none>       NoSchedule
    ocne-worker-01    <none>                                  <none>       <none>
    ocne-worker-02    <none>                                  <none>       <none>
    ocne-worker-03    <none>                                  <none>       <none>

     Notice that only the ocne-control-01 node has an existing taint assigned to it, node-role.kubernetes.io/control-plane:NoSchedule, which prevents pods from deploying onto the ocne-control-01 node itself.

     Note: The kubeadm tool applies the node-role.kubernetes.io/control-plane:NoSchedule taint automatically while bootstrapping the Kubernetes control plane node.

Apply a Taint to a Worker Node

Like labels, taints can be applied to any number of nodes within the Kubernetes cluster. The Kubernetes scheduler only schedules pods onto a tainted node when the deployment.yaml file defines a matching toleration, allowing them to run on that node.

Taints are applied to a node using a command in the format: kubectl taint nodes nodename key1=value1:taint-effect. The command declares a taint as a key-value pair, where key1 is the key, value1 is the value, and taint-effect is one of the three available taint effects:

  • The NoSchedule or strong effect taint instructs the Kubernetes scheduler to allow only newly deployed pods possessing tolerations to execute on this node. Any existing pods will continue executing unaffected.

  • The PreferNoSchedule or soft effect taint instructs the Kubernetes scheduler to try to avoid scheduling newly deployed pods on this node unless they have a toleration.

  • The NoExecute taint instructs the Kubernetes scheduler to evict any running pods from the node unless they have tolerations for the tainted node.

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl delete deployment nginx-deployment
    deployment.apps "nginx-deployment" deleted
  2. Apply a NoSchedule taint to the ocne-worker-01 node.

    kubectl taint nodes ocne-worker-01 app=nginx:NoSchedule

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl taint nodes ocne-worker-01 app=nginx:NoSchedule
    node/ocne-worker-01 tainted

    Where:

     • app=nginx is the key-value pair, chosen here to mirror the app: nginx label defined in the template section of the deployment.yaml file.
     • NoSchedule is the taint effect applied.
  3. Confirm the taint has applied to the ocne-worker-01 node.

    kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
    NodeName         TaintKey                                TaintValue   TaintEffect
    ocne-control-01   node-role.kubernetes.io/control-plane   <none>       NoSchedule
    ocne-worker-01    app                                     nginx        NoSchedule
    ocne-worker-02    <none>                                  <none>       <none>
    ocne-worker-03    <none>                                  <none>       <none>

    Where:

    • The Taintkey column shows the value app for ocne-worker-01
    • The TaintValue column shows the value nginx for ocne-worker-01
  4. Deploy Nginx again.

    kubectl create -f ~/deployment.yaml
  5. Confirm no pods have deployed onto the tainted ocne-worker-01 node.

    kubectl get pods --namespace default -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-5c46dbdf89-kt9g6   1/1     Running   0          8s    10.244.2.6   ocne-worker-02   <none>           <none>
    nginx-deployment-5c46dbdf89-twh86   1/1     Running   0          8s    10.244.2.5   ocne-worker-02   <none>           <none>
    nginx-deployment-5c46dbdf89-xd5jr   1/1     Running   0          8s    10.244.3.6   ocne-worker-03   <none>           <none>

     Notice that, as expected, no pods have been deployed onto the ocne-worker-01 node, demonstrating the effect of the NoSchedule taint.
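
     For reference, removing a taint uses the same kubectl taint command with a trailing hyphen appended to the effect. Don't run this yet; the next section relies on the taint remaining in place:

     kubectl taint nodes ocne-worker-01 app=nginx:NoSchedule-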

Define a Toleration in a Deployment File

The next step is to use a deployment file containing a toleration that allows deploying new pods to a node containing a taint.

  1. The first step is to delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment
  2. Create a new deployment file to deploy Nginx to the cluster.

    cat << EOF | tee ~/deployment-toleration.yaml > /dev/null
    apiVersion: apps/v1
    kind: Deployment
    metadata:
       name: nginx-deployment
    spec:
       selector:
          matchLabels:
             app: nginx
       replicas: 3 # tells deployment to run 3 pods matching the template
       template:
          metadata:
             labels:
                app: nginx
          spec:
             containers:
             - name: nginx
               image: nginx:1.24.0
               ports:
               - containerPort: 80
             tolerations:
             - key: "app"
               operator: "Equal"
               value: "nginx"
               effect: "NoSchedule"
    EOF

    Where the toleration is described in the last section of the deployment descriptor, as shown below:

     tolerations:
     - key: "app"
       operator: "Equal"
       value: "nginx"
       effect: "NoSchedule"

     Remember that the taint declared earlier was app=nginx:NoSchedule, where:

    • The key used was app
    • The value used was nginx

     That taint matches the values defined in the tolerations section of the deployment file's YAML. The toleration uses the Equal operator, so the taint's key and value must match the toleration exactly. When they match, the Kubernetes scheduler can deploy the pod onto the node; when they do not, it will not.
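
     The Equal operator requires both the key and the value to match the taint. Kubernetes also supports the Exists operator, which matches any value of the given key; the value field is omitted in that case, for example:

     tolerations:
     - key: "app"
       operator: "Exists"
       effect: "NoSchedule"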

  3. Deploy the Nginx using the deployment-toleration.yaml file.

    kubectl create -f ~/deployment-toleration.yaml

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl create -f ~/deployment-toleration.yaml
    deployment.apps/nginx-deployment created
  4. Confirm the deployment used all three worker nodes.

    kubectl get pods --namespace default -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods --namespace default -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-86d6d8c585-8jsxq   1/1     Running   0          8s    10.244.3.6   ocne-worker-03   <none>           <none>
    nginx-deployment-86d6d8c585-g65xw   1/1     Running   0          8s    10.244.1.3   ocne-worker-01   <none>           <none>
    nginx-deployment-86d6d8c585-j7czg   1/1     Running   0          8s    10.244.2.6   ocne-worker-02   <none>           <none>

     Notice the output shows an nginx-deployment-86d6d8c585-xxxxx pod on each of the three ocne-worker-xx nodes, confirming the toleration has enabled the Kubernetes scheduler to deploy the pods across all three available nodes.

     For now, this serves only as an introduction to the wide variety of options available to a Kubernetes administrator for fine-tuning application deployments and the broader maintenance of their Kubernetes cluster. This ability will become one of several tools the administrator uses to manage and troubleshoot the cluster for which they're responsible.

Summary

This concludes this very brief introduction to how an experienced administrator can use the kubectl command-line tool to manage pod scheduling operations on a Kubernetes cluster. The examples presented here cover the general scheduling principles of a Kubernetes cluster. To learn more, please refer to the official documentation.
