Use Kubectl to Manage Kubernetes Clusters and Nodes on Oracle Cloud Native Environment
Introduction
Although graphical tools can manage Kubernetes, many administrators prefer to use command-line tools. The command-line tool provided within the Kubernetes ecosystem is called kubectl. Kubectl is a versatile tool used to deploy and inspect the configurations and logs of the cluster resources and applications. Kubectl achieves this by using the Kubernetes API to authenticate with the control plane node of the Kubernetes cluster and complete any management actions requested by the administrator.
Most kubectl operations and commands allow administrators to deploy and manage applications on the Kubernetes cluster and to inspect and manage the cluster's resources.
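For example, one quick way to confirm which cluster and API endpoint kubectl authenticates against is with the following standard commands (shown here only as general examples; the exact output depends on your environment):

kubectl cluster-info

kubectl config view --minify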
Note: Many kubectl commands accept the --all-namespaces option. The shorthand for this option is the -A flag, so this tutorial often uses kubectl -A instead of kubectl --all-namespaces.
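For example, the following two commands are equivalent ways of listing the pods in all namespaces (shown only to illustrate the shorthand):

kubectl get pods --all-namespaces

kubectl get pods -A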
Objectives
In this tutorial, you will learn:
- Querying Cluster Information
- Querying Node information
- Deploying an example application (Nginx)
- Introducing new concepts such as:
- Cordoning/Uncordoning and Draining Nodes
- Taints and Tolerations
Prerequisites
- Installation of Oracle Cloud Native Environment
- a single control plane node and three worker nodes
Deploy Oracle Cloud Native Environment
Note: If running in your own tenancy, read the linux-virt-labs
GitHub project README.md and complete the prerequisites before deploying the lab environment.
Open a terminal on the Luna Desktop.
Clone the linux-virt-labs GitHub project.

git clone https://github.com/oracle-devrel/linux-virt-labs.git
Change into the working directory.
cd linux-virt-labs/ocne2
Install the required collections.
ansible-galaxy collection install -r requirements.yml
Deploy the lab environment.
ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e install_ocne_rpm=true -e create_ocne_cluster=true -e "ocne_cluster_node_options='-n 1 -w 3'"
The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.

The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.

Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Review Existing Cluster and Node Information
An essential precursor to administering any Kubernetes cluster is discovering what nodes are present, the pods executing on those nodes, and so on. This action allows you to plan and anticipate temporarily disabling pod scheduling on nodes while they undergo any required maintenance or troubleshooting.
Open a terminal and connect via SSH to the ocne instance.
ssh oracle@<ip_address_of_node>
Query a complete list of existing nodes.
kubectl get nodes
Note that the output returns a list of all the deployed nodes with status details and the Kubernetes version.
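If you want a little more detail without a full describe, the standard -o wide output option adds columns such as the internal IP address, OS image, kernel version, and container runtime (shown as an example; the column values vary by environment):

kubectl get nodes -o wide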
Request more details about one of the nodes.
kubectl describe node $(kubectl get nodes | awk 'FNR==3 {print $1}')
This command returns a wealth of information related to the Kubernetes node, starting with the following:
- Name: confirms the Kubernetes node name
- Labels: key/value pairs used to identify object attributes relevant to end users
- Annotations: key/value pairs used to store extra information about a Kubernetes node
- Unschedulable: false indicates the node accepts any deployed pod
Example Output:
Name:               ocne-worker-1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ocne-worker-1
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"02:86:99:e2:8a:95"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.122.169
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/crio/crio.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 26 Sep 2024 17:56:13 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ocne-worker-1
  AcquireTime:     <unset>
  RenewTime:       Thu, 26 Sep 2024 17:56:54 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 26 Sep 2024 17:56:18 +0000   Thu, 26 Sep 2024 17:56:18 +0000   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:13 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:13 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:13 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 26 Sep 2024 17:56:16 +0000   Thu, 26 Sep 2024 17:56:16 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.122.169
  Hostname:    ocne-worker-1
Capacity:
  cpu:                2
  ephemeral-storage:  20153324Ki
  hugepages-2Mi:      0
  memory:             3695444Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  18573303368
  hugepages-2Mi:      0
  memory:             3593044Ki
  pods:               110
System Info:
  Machine ID:                 284851e5afa74b1088cde1ccaf570d8c
  System UUID:                284851e5-afa7-4b10-88cd-e1ccaf570d8c
  Boot ID:                    0979d758-9979-4df1-b428-c645562f871f
  Kernel Version:             5.15.0-209.161.7.2.el8uek.x86_64
  OS Image:                   Oracle Linux Server 8.10
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.30.3
  Kubelet Version:            v1.30.3+1.el8
  Kube-Proxy Version:         v1.30.3+1.el8
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (4 in total)
  Namespace                   Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                             ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-xltn9            100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         47s
  kube-system                 kube-proxy-gmxsx                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         47s
  ocne-system                 ocne-catalog-578c959566-4qbcj    0 (0%)        0 (0%)      0 (0%)           0 (0%)         78s
  ocne-system                 ui-84dd57ff69-szphp              0 (0%)        0 (0%)      0 (0%)           0 (0%)         79s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                100m (5%)   0 (0%)
  memory             50Mi (1%)   0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type    Reason    Age    From    Message
  ----    ------    ----   ----    -------
NOTE: Don't clear the output from your terminal because the next steps highlight some areas of interest in this output.
This output excerpt shows the internal IP address and hostname assigned to the node, together with its resource capacity.
Example Output (excerpt):
Addresses:
  InternalIP:  192.168.122.169
  Hostname:    ocne-worker-1
Capacity:
  cpu:                2
  ephemeral-storage:  20153324Ki
  hugepages-2Mi:      0
  memory:             3695444Ki
  pods:               110
A little further down the output, the Non-terminated Pods: section shows which pods are running on the node, with details of CPU and memory requests and any limits per pod.

Example Output (excerpt):
Non-terminated Pods:          (4 in total)
  Namespace                   Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                             ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-xltn9            100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         47s
  kube-system                 kube-proxy-gmxsx                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         47s
  ocne-system                 ocne-catalog-578c959566-4qbcj    0 (0%)        0 (0%)      0 (0%)           0 (0%)         78s
  ocne-system                 ui-84dd57ff69-szphp              0 (0%)        0 (0%)      0 (0%)           0 (0%)         79s
Finally, the Allocated resources: section outlines the resources allocated on the node.

Example Output (excerpt):
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                100m (5%)   0 (0%)
  memory             50Mi (1%)   0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
In summary, the kubectl describe node command provides the administrator with a wealth of information about a Kubernetes node that can assist with planning or troubleshooting deployments.
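If you only need a single value from this information, one option is the standard -o jsonpath output format. For example, the following commands (illustrative only; the node name comes from this lab) print the CPU capacity and allocatable memory reported in the output above:

kubectl get node ocne-worker-1 -o jsonpath='{.status.capacity.cpu}'

kubectl get node ocne-worker-1 -o jsonpath='{.status.allocatable.memory}'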
Deploy Nginx
Currently, we do not have any applications deployed to the three worker nodes, which makes demonstrating the effects of any Node management commands more difficult. The following steps will deploy an Nginx pod onto each worker node.
Generate a deployment file to create an Nginx Deployment with one pod on each of the three worker nodes.
cat << EOF | tee ~/deployment.yaml > /dev/null
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3 # tells deployment to run 3 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: ghcr.io/oracle/oraclelinux9-nginx:1.20
        ports:
        - containerPort: 80
EOF
Where:
- name: nginx-deployment represents the deployment's name
- replicas: 3 represents the number of pods to deploy
- image: ghcr.io/oracle/oraclelinux9-nginx:1.20 represents the Nginx image and version the Kubernetes pods will run
Note: There are numerous possibilities for describing a deployment in its associated YAML file. For more detail, refer to the upstream documentation.
Deploy Nginx onto the three OCNE worker Nodes.
kubectl create -f ~/deployment.yaml
The -f switch indicates which YAML file to use and its location.

Confirm Nginx deployed across all three worker nodes.
kubectl get pods --namespace default -o wide
Example Output:
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-64845dcbf6-4mg54   1/1     Running   0          11s   10.244.1.5   ocne-worker-1   <none>           <none>
nginx-deployment-64845dcbf6-tkpcd   1/1     Running   0          19s   10.244.3.3   ocne-worker-2   <none>           <none>
nginx-deployment-64845dcbf6-xqdqg   1/1     Running   0          27s   10.244.2.3   ocne-worker-3   <none>           <none>
Review what Pods deployed onto a single node.
kubectl get pods --field-selector spec.nodeName=ocne-worker-1 -o wide
Example Output:
NAME                                READY   STATUS    RESTARTS   AGE    IP           NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-64845dcbf6-4mg54   1/1     Running   0          115s   10.244.1.5   ocne-worker-1   <none>           <none>
Using these options is helpful in busy Kubernetes environments where there are potentially many nodes, each hosting many deployments, to assist in planning maintenance operations.
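Another common way to narrow the output is a label selector. Because the deployment labels its pods with app: nginx, commands such as the following (shown only as examples) list just the Nginx pods, and the label selector can be combined with the field selector used above:

kubectl get pods -l app=nginx -o wide

kubectl get pods -l app=nginx --field-selector spec.nodeName=ocne-worker-1 -o wide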
Cordoning a Node
Like anything else, Kubernetes nodes occasionally require maintenance, which may involve replacing physical hardware or updating the node's operating system or kernel. Cordons and drains are two mechanisms that safely prepare the target node so that taking it out of service does not affect the end user's experience of the applications deployed on it.
Identify existing nodes.
kubectl get nodes
Cordon one of the worker nodes.
kubectl cordon ocne-worker-1
Confirm the node is cordoned.
kubectl get nodes
Notice that the cordoned node now shows the SchedulingDisabled status. From this point, no new pods will be scheduled onto that node, while any existing pods continue to service current sessions.
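If you prefer to check the underlying node object rather than the STATUS column, cordoning sets the node's spec.unschedulable field to true. For example (illustrative only), the following should print true for the cordoned node:

kubectl get node ocne-worker-1 -o jsonpath='{.spec.unschedulable}'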
Draining a Node
Before undertaking any maintenance on the newly cordoned node, any pods deployed onto that node must be removed/evicted. The drain command gracefully terminates the pod's containers. Once the drain command is complete, it is safe to complete whatever actions have been planned for that node, for example, scheduled maintenance such as upgrading the Operating System or the Kernel.
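If you want to preview which pods a drain would evict before running it, kubectl drain accepts a --dry-run option. The following example (optional, and safe to skip in this lab) reports what would be evicted without actually evicting anything:

kubectl drain ocne-worker-1 --delete-emptydir-data --ignore-daemonsets --force --dry-run=client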
Drain ocne-worker-1 node.
kubectl drain ocne-worker-1 --delete-emptydir-data --ignore-daemonsets --force
Why are the --ignore-daemonsets and --force options needed? DaemonSet-managed pods cannot be evicted because the DaemonSet controller immediately replaces any closed pod with a new instance, so --ignore-daemonsets tells the drain to skip them, while --force allows the drain to evict pods that are not managed by a controller.

Example Output:
node/ocne-worker-1 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-xltn9, kube-system/kube-proxy-gmxsx
evicting pod ocne-system/ui-84dd57ff69-szphp
evicting pod default/nginx-deployment-64845dcbf6-4mg54
evicting pod ocne-system/ocne-catalog-578c959566-4qbcj
pod/ui-84dd57ff69-szphp evicted
pod/ocne-catalog-578c959566-4qbcj evicted
pod/nginx-deployment-64845dcbf6-4mg54 evicted
node/ocne-worker-1 drained
Confirm Nginx is no longer running on the drained node.
kubectl get pods --namespace default -o wide
The NODE column shows the nginx pod running only on the ocne-worker-2 and ocne-worker-3 nodes.
Because the
deployment.yaml
file instructed Kubernetes to deploy three pods, notice there are still three pods listed in the output. This example shows the third pod has redeployed to ocne-worker-3. However, this may differ in your environment because the Kubernetes scheduler may decide differently.
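If you want to watch the scheduler replace evicted pods in real time, one option is the standard --watch flag (press Ctrl+C to stop watching); this is purely illustrative and optional:

kubectl get pods --namespace default -o wide --watch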
Confirm the Cordon Command Works
Delete the existing Nginx deployment.
kubectl delete deployment nginx-deployment
Confirm Nginx is no longer present.
kubectl get pods --namespace default -o wide
The output shows a No resources found in default namespace message.
Deploy Nginx again.
kubectl create -f ~/deployment.yaml
Confirm no pods deploy onto a Cordoned worker node.
kubectl get pods --namespace default -o wide
The output shows that pods do not deploy onto the 'cordoned' node. The cluster decides which nodes get multiple pods to meet the deployment setting of three replicas.
Uncordon the Node
Once the maintenance is complete, the previously cordoned node is returned to the pool using the kubectl uncordon
command.
Uncordon the ocne-worker-1 node.
kubectl uncordon ocne-worker-1
Confirm the ocne-worker-1 node is available for scheduling again.
kubectl get nodes
The command removes the SchedulingDisabled flag under the STATUS column, confirming the cluster returns the cordoned node to the pool.
Confirm the Uncordoned Node is Available Again
Important: The following steps are not required on a live system and would cause an application outage if used there. Under normal circumstances, the Kubernetes scheduler determines where individual pods deploy based on many factors, such as the overall load across the cluster, and it is free to schedule or evict pods as it sees fit. The steps provided here are for illustrative purposes only, to demonstrate that once a node is uncordoned, it is available again for the Kubernetes scheduler to place pods on whenever it chooses.
Delete the existing Nginx deployment.
kubectl delete deployment nginx-deployment
Deploy Nginx again.
kubectl create -f ~/deployment.yaml
Confirm Nginx has deployed across all three worker Nodes.
kubectl get pods --namespace default -o wide
The results show all worker nodes now host an Nginx pod, confirming that the node is available again for deployments.
Introducing Taints and Tolerations
Managing where application pods deploy within a Kubernetes cluster is an essential and skilled aspect of being a Kubernetes administrator. Effective scheduling management can help companies improve the efficient use of their resources, control costs, and manage applications at scale across a cluster. This section does not provide in-depth coverage of this complex area of Kubernetes administration. Instead, it introduces Taints
and Tolerations
and how they can aid an administrator in their role.
What are Taints and Tolerations?
Taints represent a Kubernetes property assigned to nodes to repel certain pods when deployed onto the cluster. Tolerations, on the other hand, represent a property defined for an application as part of its deployment.yaml file, indicating that its pods may be scheduled onto any node having a matching taint. Setting these properties is how a Kubernetes administrator can exercise some control over which nodes an application's pods deploy onto.
Are they guaranteed to work?
Taints and tolerations help to repel pods from deploying to a defined node. However, they cannot ensure that a specific pod deploys to a predetermined node. Kubernetes administrators use another advanced scheduling technique, which, combined with taints and tolerations, can control where pods deploy. This additional technique is known as node affinity but is outside the scope of this tutorial.
Why use Taints and Tolerations?
The most common use cases for taints and tolerations include the following:
Configure dedicated nodes: Taints and tolerations, combined with a node affinity definition, can help ensure matching pods deploy to these nodes.
Indicate nodes with special hardware: When a pod requires specific hardware to be present to either run or run most efficiently, using taints and tolerations allows administrators to ensure the most relevant pods will be scheduled onto these nodes by the Kubernetes scheduler process.
Eviction of pods based on node conditions: If an administrator assigns a taint to a node that already has pods assigned to it, then existing pods that do not possess a 'toleration' will eventually be automatically evicted from that node by the Kubernetes Scheduler.
Review Existing Taints across all Nodes
Before making any changes to the existing nodes, the administrator must establish whether any taint definitions already apply to the existing nodes.
Confirm what taints exist currently across all of the nodes.
kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
The output shows that only the control plane node has an existing taint, node-role.kubernetes.io/control-plane:NoSchedule, which prevents pods from deploying onto the control plane node itself. The kubeadm tool applies this taint automatically while bootstrapping the Kubernetes control plane node.
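As an alternative to the custom-columns output, you can also read a node's taints directly from its describe output. For example (the node name here comes from this lab's output):

kubectl describe node ocne-control-plane-1 | grep -i -A 2 taints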
Apply a Taint to a Worker Node
Taints are very similar to labels and can be applied to any number of nodes within the Kubernetes cluster. The Kubernetes scheduler will only schedule pods onto a tainted node if their deployment.yaml file defines a matching toleration, thus allowing them to run on that node.
Taints are applied to a node using a command in the format: kubectl taint nodes nodename key1=value1:taint-effect. The command declares a taint as a key-value pair plus an effect, where key1 is the key, value1 is the value, and taint-effect is one of the three available taint effects described below (a general example of applying and removing a taint follows this list):
- The NoSchedule (strong) effect instructs the Kubernetes scheduler to allow only newly deployed pods possessing a matching toleration to execute on this node. Any existing pods continue executing unaffected.
- The PreferNoSchedule (soft) effect instructs the Kubernetes scheduler to try to avoid scheduling newly deployed pods on this node unless they have a matching toleration.
- The NoExecute effect instructs the Kubernetes scheduler to evict any running pods from the node unless they have a toleration for the tainted node.
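As a general illustration only (do not run the removal command during this tutorial, or the later toleration steps will have nothing to demonstrate), a taint is added with a key=value:effect argument and removed by appending a trailing hyphen to the same argument:

kubectl taint nodes <nodename> key1=value1:NoSchedule

kubectl taint nodes <nodename> key1=value1:NoSchedule-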
Delete the existing Nginx deployment.
kubectl delete deployment nginx-deployment
Apply a NoSchedule taint to the ocne-worker-1 node.
kubectl taint nodes ocne-worker-1 app=nginx:NoSchedule
Where:
- The key is app and the value is nginx, matching the app: nginx label defined in the spec: section of the deployment.yaml file
- The taint effect applied is NoSchedule
Confirm the taint has applied to the ocne-worker-1 node.
kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
Example Output:
NodeName               TaintKey                                TaintValue   TaintEffect
ocne-control-plane-1   node-role.kubernetes.io/control-plane   <none>       NoSchedule
ocne-worker-1          app                                     nginx        NoSchedule
ocne-worker-2          <none>                                  <none>       <none>
ocne-worker-3          <none>                                  <none>       <none>
Where:
- The TaintKey column shows the value app for ocne-worker-1
- The TaintValue column shows the value nginx for ocne-worker-1
Deploy Nginx again.
kubectl create -f ~/deployment.yaml
Confirm no pods have deployed onto the tainted ocne-worker-1 node.
kubectl get pods --namespace default -o wide
As expected, no pods have been deployed onto the ocne-worker-1 node, demonstrating the effect of the NoSchedule taint.
Define a Toleration in a Deployment File
The next step is to use a deployment file containing a toleration that allows deploying new pods to a node containing a taint.
The first step is to delete the existing Nginx deployment.
kubectl delete deployment nginx-deployment
Create a new deployment file to deploy Nginx to the cluster.
cat << EOF | tee ~/deployment-toleration.yaml > /dev/null
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3 # tells deployment to run 3 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: ghcr.io/oracle/oraclelinux9-nginx:1.20
        ports:
        - containerPort: 80
      tolerations:
      - key: "app"
        operator: "Equal"
        value: "nginx"
        effect: "NoSchedule"
EOF
Where the toleration is described in the last section of the deployment descriptor under spec: template: spec: tolerations:
Remember the taint app=nginx:NoSchedule declared earlier, where:
- The key used was app
- The value used was nginx
That taint matches the values defined in the tolerations section of the deployment file's YAML. The toleration uses the Equal operator to check that the taint's key and value match the toleration. If they match, the Kubernetes scheduler can deploy the pod onto the node; if not, it will not.
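For reference, Kubernetes also supports an Exists operator in a toleration, which matches any value of the given taint key. A fragment such as the following (not used in this tutorial) would also tolerate the app taint regardless of its value:

tolerations:
- key: "app"
  operator: "Exists"
  effect: "NoSchedule"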
Deploy the Nginx using the deployment-toleration.yaml file.
kubectl create -f ~/deployment-toleration.yaml
Confirm the deployment used all three worker nodes.
kubectl get pods --namespace default -o wide
The NODE column confirms the pods deploy across all three worker nodes, proving that the toleration enabled the Kubernetes scheduler to place pods on the tainted node as well.
For now, this serves only as an introduction to the wide variety of options available to a Kubernetes administrator for fine-tuning the deployment of applications and the broader maintenance of their Kubernetes cluster. This ability will become one of several tools the administrator uses to manage and troubleshoot the cluster for which they are responsible.
Next Steps
That concludes this very brief introduction to how an experienced administrator can use the kubectl command-line tool to manage Pod Scheduling operations on their Kubernetes cluster. The examples introduced here provide a basic introduction to the general scheduling principles on a Kubernetes cluster. If you want to learn more, please refer to the official documentation for more details.