Use Kubectl to Manage Kubernetes Clusters and Nodes on Oracle Cloud Native Environment

Introduction

Although graphical tools can manage Kubernetes, many administrators prefer command-line tools. The command-line tool provided within the Kubernetes ecosystem is called kubectl. Kubectl is a versatile tool used to deploy applications and to inspect the configuration and logs of cluster resources and applications. Kubectl achieves this by using the Kubernetes API to authenticate with the control plane node of the Kubernetes cluster and complete any management actions the administrator requests.

Most of the commands available in kubectl allow administrators to deploy and manage applications on the Kubernetes cluster and to inspect and manage the cluster's resources.

Note: Many kubectl commands accept the --all-namespaces option. The -A flag is shorthand for this option, and this tutorial often uses kubectl -A instead of kubectl --all-namespaces.
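
For example, both of the following commands list pods from every namespace and produce the same output; the -A form is simply the shorter spelling:

    kubectl get pods --all-namespaces
    kubectl get pods -A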

Objectives

In this tutorial, you will learn how to:

  • Query cluster information
  • Query node information
  • Deploy an example application (Nginx)
  • Work with new concepts such as:
    • Cordoning, uncordoning, and draining nodes
    • Taints and tolerations

Prerequisites

  • Installation of Oracle Cloud Native Environment
    • a single control node and three worker nodes

Deploy Oracle Cloud Native Environment

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
  3. Change into the working directory.

    cd linux-virt-labs/ocne2
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yml
  5. Deploy the lab environment.

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e install_ocne_rpm=true -e create_ocne_cluster=true -e "ocne_cluster_node_options='-n 1 -w 3'"

    The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.

    The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.

    Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.

Review Existing Cluster and Node Information

An essential precursor to administering any Kubernetes cluster is discovering which nodes are present, which pods are running on those nodes, and so on. This information allows you to plan for temporarily disabling pod scheduling on nodes while they undergo required maintenance or troubleshooting.
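
For a quick, high-level view before inspecting individual nodes, the following commands print the cluster's control plane endpoint and a one-line summary per node (the exact output depends on your deployment):

    kubectl cluster-info
    kubectl get nodes -o wide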

  1. Open a terminal and connect via SSH to the ocne instance.

    ssh oracle@<ip_address_of_node>
  2. Query a complete list of existing nodes.

    kubectl get nodes

    Note that the output returns a list of all the deployed nodes with status details and the Kubernetes version.

  3. Request more details about one of the nodes.

    kubectl describe node $(kubectl get nodes | awk 'FNR==3 {print $1}')

    This command returns a wealth of information related to the Kubernetes node, starting with the following:

    • Name: confirms the Kubernetes node name
    • Labels: key/value pairs used to identify object attributes relevant to end-users
    • Annotations: key/value pairs used to store extra information about a Kubernetes node
    • Unschedulable: false indicates the node accepts newly scheduled pods

    Example Output:

    Name:               ocne-worker-1
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=ocne-worker-1
                        kubernetes.io/os=linux
    Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"02:86:99:e2:8a:95"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 192.168.122.169
                        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/crio/crio.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Thu, 26 Sep 2024 17:56:13 +0000
    Taints:             <none>
    Unschedulable:      false
    Lease:
      HolderIdentity:  ocne-worker-1
      AcquireTime:     <unset>
      RenewTime:       Thu, 26 Sep 2024 17:56:54 +0000
    Conditions:
      Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----                 ------  -----------------                 ------------------                ------                       -------
      NetworkUnavailable   False   Thu, 26 Sep 2024 17:56:18 +0000   Thu, 26 Sep 2024 17:56:18 +0000   FlannelIsUp                  Flannel is running on this node
      MemoryPressure       False   Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:13 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
      DiskPressure         False   Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:13 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
      PIDPressure          False   Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:13 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
      Ready                True    Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:16 +0000   KubeletReady                 kubelet is posting ready status
    Addresses:
      InternalIP:  192.168.122.169
      Hostname:    ocne-worker-1
    Capacity:
      cpu:                2
      ephemeral-storage:  20153324Ki
      hugepages-2Mi:      0
      memory:             3695444Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  18573303368
      hugepages-2Mi:      0
      memory:             3593044Ki
      pods:               110
    System Info:
      Machine ID:                 284851e5afa74b1088cde1ccaf570d8c
      System UUID:                284851e5-afa7-4b10-88cd-e1ccaf570d8c
      Boot ID:                    0979d758-9979-4df1-b428-c645562f871f
      Kernel Version:             5.15.0-209.161.7.2.el8uek.x86_64
      OS Image:                   Oracle Linux Server 8.10
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  cri-o://1.30.3
      Kubelet Version:            v1.30.3+1.el8
      Kube-Proxy Version:         v1.30.3+1.el8
    PodCIDR:                      10.244.1.0/24
    PodCIDRs:                     10.244.1.0/24
    Non-terminated Pods:          (4 in total)
      Namespace                   Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------                   ----                             ------------  ----------  ---------------  -------------  ---
      kube-flannel                kube-flannel-ds-xltn9            100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         47s
      kube-system                 kube-proxy-gmxsx                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         47s
      ocne-system                 ocne-catalog-578c959566-4qbcj    0 (0%)        0 (0%)      0 (0%)           0 (0%)         78s
      ocne-system                 ui-84dd57ff69-szphp              0 (0%)        0 (0%)      0 (0%)           0 (0%)         79s
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests   Limits
      --------           --------   ------
      cpu                100m (5%)  0 (0%)
      memory             50Mi (1%)  0 (0%)
      ephemeral-storage  0 (0%)     0 (0%)
      hugepages-2Mi      0 (0%)     0 (0%)
    Events:
      Type    Reason                   Age                From             Message
      ----    ------                   ----               ----             -------

    NOTE: Don't clear the output from your terminal because the next steps highlight some areas of interest in this output.

  4. This output excerpt shows the internal IP address and hostname assigned to the node, followed by the node's total resource capacity.

    Example Output (excerpt):

    Addresses:
      InternalIP:  192.168.122.169
      Hostname:    ocne-worker-1
    Capacity:
      cpu:                2
      ephemeral-storage:  20153324Ki
      hugepages-2Mi:      0
      memory:             3695444Ki
      pods:               110
  5. A little further down the output, the Non-terminated pods: section shows what pods are running on the node with details of CPU and memory requests and any limits per pod.

    Example Output (excerpt):

    Non-terminated Pods:          (4 in total)
      Namespace                   Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------                   ----                             ------------  ----------  ---------------  -------------  ---
      kube-flannel                kube-flannel-ds-xltn9            100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         47s
      kube-system                 kube-proxy-gmxsx                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         47s
      ocne-system                 ocne-catalog-578c959566-4qbcj    0 (0%)        0 (0%)      0 (0%)           0 (0%)         78s
      ocne-system                 ui-84dd57ff69-szphp              0 (0%)        0 (0%)      0 (0%)           0 (0%)         79s
  6. Finally, the Allocated resources: section summarizes the total resource requests and limits of the pods scheduled on the node.

    Example Output (excerpt):

    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests   Limits
      --------           --------   ------
      cpu                100m (5%)  0 (0%)
      memory             50Mi (1%)  0 (0%)
      ephemeral-storage  0 (0%)     0 (0%)
      hugepages-2Mi      0 (0%)     0 (0%)

    In summary, the kubectl describe node command provides the administrator with a wealth of information about a Kubernetes node that can assist with planning or troubleshooting deployments.
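
When only a specific field is needed rather than the full description, JSONPath output can extract it directly. A minimal sketch, assuming the same ocne-worker-1 node used above:

    # Print the node's allocatable resources as a single JSON object
    kubectl get node ocne-worker-1 -o jsonpath='{.status.allocatable}{"\n"}'

    # Print just the kubelet version reported by the node
    kubectl get node ocne-worker-1 -o jsonpath='{.status.nodeInfo.kubeletVersion}{"\n"}'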

Deploy Nginx

Currently, we do not have any applications deployed to the three worker nodes, which makes demonstrating the effects of any Node management commands more difficult. The following steps will deploy an Nginx pod onto each worker node.

  1. Generate a deployment file that creates an Nginx Deployment with three replicas, which the scheduler spreads across the three worker nodes.

    cat << EOF | tee ~/deployment.yaml > /dev/null
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 3 # tells deployment to run 3 pods matching the template
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: ghcr.io/oracle/oraclelinux9-nginx:1.20
            ports:
            - containerPort: 80   
    EOF

    Where:

    • name: nginx-deployment represents the deployment's name
    • replicas: 3 represents the number of pods to deploy
    • image: ghcr.io/oracle/oraclelinux9-nginx:1.20 represents the Nginx container image and version the Kubernetes pods will run

    Note: There are numerous possibilities for describing a deployment in its associated YAML file. For more detail, refer to the upstream documentation.

  2. Deploy Nginx onto the three OCNE worker nodes.

    kubectl create -f ~/deployment.yaml

    The -f switch indicates which YAML file to use and its location.

  3. Confirm Nginx deployed across all three worker nodes.

    kubectl get pods --namespace default -o wide

    Example Output:

    NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-64845dcbf6-4mg54   1/1     Running   0          11s   10.244.1.5   ocne-worker-1   <none>           <none>
    nginx-deployment-64845dcbf6-tkpcd   1/1     Running   0          19s   10.244.3.3   ocne-worker-2   <none>           <none>
    nginx-deployment-64845dcbf6-xqdqg   1/1     Running   0          27s   10.244.2.3   ocne-worker-3   <none>           <none>
  4. Review which pods are deployed onto a single node.

    kubectl get pods --field-selector spec.nodeName=ocne-worker-1 -o wide

    Example Output:

    NAME                                READY   STATUS    RESTARTS   AGE    IP           NODE            NOMINATED NODE   READINESS GATES
    nginx-deployment-64845dcbf6-4mg54   1/1     Running   0          115s   10.244.1.5   ocne-worker-1   <none>           <none>

    These options are helpful when planning maintenance in busy Kubernetes environments, where potentially many nodes each host many deployments.
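
The same field selector also works across every namespace, which gives a complete picture of what is running on a node before any maintenance. A minimal sketch, again assuming the ocne-worker-1 node:

    kubectl get pods -A -o wide --field-selector spec.nodeName=ocne-worker-1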

Cordoning a Node

Kubernetes nodes occasionally require maintenance, such as replacing physical hardware or updating the node's operating system or kernel. Cordons and drains are two mechanisms that safely prepare the target node so that any applications deployed on it continue to serve end users without disruption.

  1. Identify existing nodes.

    kubectl get nodes
  2. Cordon one of the worker nodes.

    kubectl cordon ocne-worker-1
  3. Confirm the node is cordoned.

    kubectl get nodes

    Notice that the cordoned node now shows the SchedulingDisabled status. Subsequently, no new pods will be scheduled onto that node, while any existing applications continue to service current sessions. An optional check appears after these steps.
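
As an optional check, you can read the node's spec.unschedulable field directly. This sketch assumes ocne-worker-1 is the node you cordoned; it prints true while the cordon is in place:

    kubectl get node ocne-worker-1 -o jsonpath='{.spec.unschedulable}{"\n"}'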

Draining a Node

Before undertaking any maintenance on the newly cordoned node, any pods deployed onto that node must be evicted. The drain command evicts those pods and gracefully terminates their containers. Once the drain command completes, it is safe to carry out whatever actions are planned for that node, for example, scheduled maintenance such as upgrading the operating system or the kernel.

  1. Drain the ocne-worker-1 node.

    kubectl drain ocne-worker-1 --delete-emptydir-data --ignore-daemonsets --force

    Why are the --ignore-daemonsets and --force options needed? DaemonSet-managed pods cannot be evicted because the DaemonSet controller immediately replaces any removed pod, so --ignore-daemonsets tells the drain to skip them. The --force option additionally allows eviction of pods that are not managed by a controller. A dry-run sketch that previews the drain appears after these steps.

    Example Output:

    node/ocne-worker-1 already cordoned
    Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-xltn9, kube-system/kube-proxy-gmxsx
    evicting pod ocne-system/ui-84dd57ff69-szphp
    evicting pod default/nginx-deployment-64845dcbf6-4mg54
    evicting pod ocne-system/ocne-catalog-578c959566-4qbcj
    pod/ui-84dd57ff69-szphp evicted
    pod/ocne-catalog-578c959566-4qbcj evicted
    pod/nginx-deployment-64845dcbf6-4mg54 evicted
    node/ocne-worker-1 drained
  2. Confirm Nginx is no longer running on the drained node.

    kubectl get pods --namespace default -o wide

    The NODE column shows the nginx pods running only on the ocne-worker-2 and ocne-worker-3 nodes.

    Because the deployment.yaml file instructed Kubernetes to deploy three pods, notice there are still three pods listed in the output. This example shows the third pod has redeployed to ocne-worker-3. However, this may differ in your environment because the Kubernetes scheduler may decide differently.
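
To preview which pods a drain would evict without actually evicting anything, kubectl drain accepts a client-side dry run. A minimal sketch using the same node and options as above:

    kubectl drain ocne-worker-1 --delete-emptydir-data --ignore-daemonsets --force --dry-run=client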

Confirm the Cordon Command Works

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment
  2. Confirm Nginx is no longer present.

    kubectl get pods --namespace default -o wide

    The output shows a No resources found in default namespace message.

  3. Deploy Nginx again.

    kubectl create -f ~/deployment.yaml
  4. Confirm no pods deploy onto the cordoned worker node.

    kubectl get pods --namespace default -o wide

    The output shows that no pods deploy onto the cordoned node. The cluster decides which of the remaining nodes receive multiple pods to meet the deployment setting of three replicas.

Uncordon the Node

Once the maintenance is complete, the previously cordoned node is returned to the pool using the kubectl uncordon command.

  1. Uncordon the ocne-worker-1 node.

    kubectl uncordon ocne-worker-1
  2. Confirm the ocne-worker-1 node is available for scheduling again.

    kubectl get nodes

    The SchedulingDisabled status no longer appears under the STATUS column, confirming the cluster has returned the cordoned node to the pool of schedulable nodes. An optional check appears after this step.
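
You can also confirm this from the node spec itself. The following sketch lists each node alongside its spec.unschedulable value, which shows <none> (or false) once a node is uncordoned:

    kubectl get nodes -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable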

Confirm the Uncordoned Node is Available Again

Important: The following steps are not required on a live system because they would cause an application outage. Under normal circumstances, the Kubernetes scheduler determines where individual pods are deployed based on many factors, such as the overall load across the cluster, and it is free to schedule and evict pods as it sees fit. The steps provided here are for illustrative purposes only and demonstrate that once a node is uncordoned, it is available for the scheduler to place pods on whenever it chooses.

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment
  2. Deploy Nginx again.

    kubectl create -f ~/deployment.yaml
  3. Confirm Nginx has deployed across all three worker nodes.

    kubectl get pods --namespace default -o wide

    The results show all worker nodes now host an Nginx pod, confirming that the node is available again for deployments.

Introducing Taints and Tolerations

Managing where application pods deploy within a Kubernetes cluster is an essential and skilled aspect of being a Kubernetes administrator. Effective scheduling management can help companies use their resources efficiently, control costs, and manage applications at scale across a cluster. This section does not cover every part of this complicated area of Kubernetes administration. Instead, it introduces taints and tolerations and how they can aid an administrator in their role.

What are Taints and Tolerations?

A taint is a Kubernetes property assigned to a node that repels certain pods from being scheduled onto it. A toleration, on the other hand, is a property defined for an application in its deployment.yaml file, indicating that its pods may be scheduled onto nodes carrying a matching taint. Setting these properties is how a Kubernetes administrator exercises some control over which nodes an application's pods deploy onto.

Are they guaranteed to work?

Taints and tolerations help repel pods from a given node. However, they cannot guarantee that a specific pod deploys to a predetermined node. For that, Kubernetes administrators use another advanced scheduling technique which, combined with taints and tolerations, can control where pods run. This additional technique is known as node affinity and is outside the scope of this tutorial.

Why use Taints and Tolerations?

The most common use cases for taints and tolerations include the following:

  • Configure dedicated nodes: Taints and tolerations, combined with a node affinity definition, can help ensure matching pods deploy to these nodes.

  • Indicate nodes with special hardware: When a pod requires specific hardware to run, or to run most efficiently, taints and tolerations allow administrators to ensure the Kubernetes scheduler places the most relevant pods onto these nodes.

  • Eviction of pods based on node conditions: If an administrator applies a NoExecute taint to a node that already has pods running on it, any existing pods without a matching toleration are eventually evicted from that node automatically.

Review Existing Taints across all Nodes

Before making any changes to the existing nodes, the administrator must establish whether any taint definitions already apply to them.

  1. Confirm what taints exist currently across all of the nodes.

    kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect

    The output shows that only the control plane node has an existing taint, node-role.kubernetes.io/control-plane:NoSchedule, which prevents pods from deploying onto the control plane node itself. The kubeadm tool applies this taint automatically while bootstrapping the Kubernetes control plane node.
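
The same information is available per node from kubectl describe. A quick sketch, assuming the control plane node is named ocne-control-plane-1, as it is in this lab:

    kubectl describe node ocne-control-plane-1 | grep Taints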

Apply a Taint to a Worker Node

Taints, like labels, can be applied to any number of nodes within the Kubernetes cluster. The Kubernetes scheduler only schedules pods onto a tainted node when the deployment.yaml file defines a matching toleration that allows them to run on that node.

Taints are applied to a node using a command in the format: kubectl taint nodes nodename key1=value1:taint-effect. The command declares a taint as a key-value pair plus an effect, where key1 is the key, value1 is the value, and taint-effect is one of the three available effects. (A sketch showing how to remove a taint appears after this list.) The three taint effects are:

  • The NoSchedule or strong effect taint instructs the Kubernetes scheduler to allow only newly deployed pods possessing tolerations to execute on this node. Any existing pods will continue executing unaffected.

  • The PreferNoSchedule or soft effect taint instructs the Kubernetes scheduler to try to avoid scheduling newly deployed pods on this node unless they have a toleration.

  • The NoExecute taint causes Kubernetes to evict any running pods from the node unless they tolerate the taint, and prevents new pods without a matching toleration from being scheduled there.
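
For reference, a taint is removed by repeating the taint command with a trailing hyphen after the effect. A minimal sketch matching the taint applied in the next steps (run it only after that taint exists):

    kubectl taint nodes ocne-worker-1 app=nginx:NoSchedule-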

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment
  2. Apply a NoSchedule taint to the ocne-worker-1 node.

    kubectl taint nodes ocne-worker-1 app=nginx:NoSchedule

    Where:

    • The key app and the value nginx match the app: nginx label defined in the template section of the deployment.yaml file.
    • The effect NoSchedule determines how the scheduler treats pods that do not tolerate the taint.
  3. Confirm the taint has applied to the ocne-worker-1 node.

    kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect

    Example Output:

    NodeName               TaintKey                                TaintValue   TaintEffect
    ocne-control-plane-1   node-role.kubernetes.io/control-plane   <none>       NoSchedule
    ocne-worker-1          app                                     nginx        NoSchedule
    ocne-worker-2          <none>                                  <none>       <none>
    ocne-worker-3          <none>                                  <none>       <none>

    Where:

    • The TaintKey column shows the value app for ocne-worker-1
    • The TaintValue column shows the value nginx for ocne-worker-1
  4. Deploy Nginx again.

    kubectl create -f ~/deployment.yaml
  5. Confirm no pods have deployed onto the tainted ocne-worker-1 node.

    kubectl get pods --namespace default -o wide

    As expected, no pods have been deployed onto the ocne-worker-1 node, demonstrating the effect of the NoSchedule taint.

Define a Toleration in a Deployment File

The next step is to use a deployment file containing a toleration that allows new pods to be deployed onto a node carrying a matching taint.

  1. Delete the existing Nginx deployment.

    kubectl delete deployment nginx-deployment
  2. Create a new deployment file to deploy Nginx to the cluster.

    cat << EOF | tee ~/deployment-toleration.yaml > /dev/null
    apiVersion: apps/v1
    kind: Deployment
    metadata:
       name: nginx-deployment
    spec:
       selector:
          matchLabels:
             app: nginx
       replicas: 3 # tells deployment to run 3 pods matching the template
       template:
          metadata:
             labels:
                app: nginx
          spec:
             containers:
             - name: nginx
               image: ghcr.io/oracle/oraclelinux9-nginx:1.20
               ports:
               - containerPort: 80
             tolerations:
             - key: "app"
               operator: "Equal"
               value: "nginx"
               effect: "NoSchedule"
    EOF

    The toleration is described in the last section of the deployment descriptor under spec: template: spec: tolerations:

    Remember the taint app=nginx:NoSchedule declared earlier, where:

    • The key used was app
    • The value used was nginx

    That taint matches the values defined in the tolerations section of the deployment file's YAML. The toleration uses the Equal operator to check that the taint's key and value match the toleration's key and value. If they match, the Kubernetes scheduler can deploy the pod onto the node; if not, it will not. (A sketch for inspecting a running pod's tolerations appears after these steps.)

  3. Deploy Nginx using the deployment-toleration.yaml file.

    kubectl create -f ~/deployment-toleration.yaml
  4. Confirm the deployment used all three worker nodes.

    kubectl get pods --namespace default -o wide

    The NODE column confirms the pods deploy across all three worker nodes, proving that the toleration enabled the Kubernetes scheduler to use all available nodes, including the tainted one.

    For now, this serves only as an introduction to the wide variety of options available to a Kubernetes administrator for fine-tuning application deployments and maintaining the cluster. Taints and tolerations will become one of several tools the administrator uses to manage and troubleshoot the Kubernetes clusters for which they are responsible.
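
To confirm the toleration is present on the running pods, you can inspect a pod's spec with JSONPath. A minimal sketch using the app=nginx label; note that Kubernetes also adds its own default tolerations (such as not-ready and unreachable) alongside the one defined in the deployment file:

    kubectl get pods -l app=nginx -o jsonpath='{.items[0].spec.tolerations}{"\n"}'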

Next Steps

That concludes this brief introduction to using the kubectl command-line tool to manage pod scheduling on a Kubernetes cluster. The examples shown here cover only the general scheduling principles. If you want to learn more, refer to the official documentation for more details.
