Use Affinity with Oracle Cloud Native Environment
Introduction
The ability to influence how Kubernetes schedules Pods to provide the best performance, reduce running costs, and simplify cluster management is an essential skill for an administrator to master.
But what happens if you have several applications deployed to your Kubernetes cluster that would run more efficiently on some nodes rather than others? Administrators use several ways to influence how the Kubernetes scheduler assigns application Pods to specific nodes within your cluster. Node Affinity, Pod affinity, and Pod anti-affinity help by providing flexible rules that govern how the Kubernetes scheduler deploys Pods to nodes in the cluster.
Objectives
In this tutorial, you will learn:
- How to use affinity
- How to use anti-affinity
Prerequisites
- Installation of Oracle Cloud Native Environment
- A single control node and four worker nodes
Deploy Oracle Cloud Native Environment
Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.
Open a terminal on the Luna Desktop.
Clone the linux-virt-labs GitHub project.
git clone https://github.com/oracle-devrel/linux-virt-labs.git
Change into the working directory.
cd linux-virt-labs/ocne2
Install the required collections.
ansible-galaxy collection install -r requirements.yml
Deploy the lab environment.
ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e install_ocne_rpm=true -e create_ocne_cluster=true -e "ocne_cluster_node_options='-n 1 -w 4'"
The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.
The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.
Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Confirm the Number of Nodes
It helps to know the number and names of nodes in your Kubernetes cluster.
Open a terminal and connect via SSH to the ocne instance.
ssh oracle@<ip_address_of_node>
List the nodes in the cluster.
kubectl get nodes
The output confirms the control plane node and all four worker nodes are in a Ready state.
Apply New Labels to the Worker Nodes
Apply new labels to the worker nodes.
kubectl label node ocne-worker-1 region=west disktype=ssd
kubectl label node ocne-worker-2 region=west
kubectl label node ocne-worker-3 region=east disktype=ssd
kubectl label node ocne-worker-4 region=east
Confirm the region labels applied to the nodes.
kubectl get nodes --show-labels | grep region
Confirm the disktype labels applied to ocne-worker-1 and ocne-worker-3 nodes.
kubectl get nodes --show-labels | grep disktype
An Overview of Affinity
Affinity is superficially similar to nodeSelector but allows you to define more complex scheduling criteria, such as:
- Not scheduling a Pod at all if the defined criteria are not satisfied, as with nodeSelector
- Using affinity and anti-affinity definitions to allow for more nuance in controlling how the Kubernetes scheduler deploys a Pod
When using affinity and anti-affinity, you can choose to define whether a rule is required or is preferred. These act as hard and soft rules within the scheduler. You can also influence scheduling by referencing node and Pod labels, thus controlling whether or not Pods co-locate on a node.
Affinity is available as Node Affinity and Inter-pod affinity/anti-affinity. Node Affinity is similar to node selector but with more flexibility. Meanwhile, the inter-pod affinity/anti-affinity controls how Pods deploy to nodes relative to other Pods already deployed.
A simple way to view affinity is that it determines whether Pods are attracted to run beside other Pods or on a specific node. Conversely, if your Pods have an anti-affinity to specific Pods, they avoid those Pods and run on a different node.
Let's start by looking at node affinity.
Node Affinity
Node affinity provides administrator-defined rules to influence how the Kubernetes scheduler deploys Pods to the cluster's nodes. It is very similar to nodeSelector but with more flexibility. Node affinity provides a way to influence how Pods are scheduled based on using labels applied to nodes and label selectors defined in the deployment YAML files. Node affinity uses two types of node affinity rules:
- requiredDuringSchedulingIgnoredDuringExecution: The criteria must be satisfied before the Pod schedules to a node. If no node matches the requirement, the Pod does not deploy at all.
- preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to locate a node meeting the criteria. If none exists, the scheduler still deploys the Pod to any node that satisfies the required rules.
The ...IgnoredDuringExecution part of both types means that if the node labels change after the Kubernetes scheduler has scheduled the Pod to a node, the Pod will continue to run.
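For comparison, the simpler nodeSelector form that node affinity generalizes can be sketched as follows; the Pod name is hypothetical, and the region label is the one applied earlier in this tutorial:

```yaml
# A nodeSelector achieves the same effect as a single required node
# affinity rule, but offers no preferred (soft) alternative.
apiVersion: v1
kind: Pod
metadata:
  name: web-backend-simple   # hypothetical Pod name for illustration
spec:
  containers:
  - name: web-backend
    image: ghcr.io/oracle/oraclelinux9-nginx:1.20
  nodeSelector:
    region: west             # Pod stays Pending if no node has region=west
```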
Deploy Using Node Affinity
This example shows how to use node affinity rules to ensure the Kubernetes scheduler applies two node affinity rules. The first affinity rule is that the application must only deploy to nodes in the west region. The second affinity rule is that the application provides the best performance when deployed to a node having an SSD.
Create the web-backend deployment YAML file.
cat << EOF | tee node-affinity.yaml > /dev/null
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-backend
  labels:
    app: web-backend
spec:
  selector:
    matchLabels:
      app: web-backend
  replicas: 1
  template:
    metadata:
      labels:
        app: web-backend
    spec:
      containers:
      - name: web-backend
        image: ghcr.io/oracle/oraclelinux9-nginx:1.20
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: region
                operator: In
                values:
                - west
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
EOF
The affinity: section of the deployment file determines the node affinity. We illustrate a required and a preferred node affinity rule. Let's examine these rules in more detail:
- The required... rule is interpreted by the Kubernetes scheduler as a must-be true value. So if the labels on the nodes do not match, the Pods will not schedule there. Think of it as a hard rule that Kubernetes must meet. The Kubernetes scheduler looks for a node with a label matching region=west.
- The preferred... rule, on the other hand, is a soft rule. Provided the required rule of region=west is met, the Pod schedules even if no node matches the preferred rule of disktype=ssd. However, the scheduler gives precedence to a node that matches the preferred rule over one that does not.
- The weight option is typically used where you define multiple preferred rules that could potentially equally apply in certain circumstances. The weight rule allows you to influence the choice made by the Kubernetes scheduler. The range available is 1 through 100, with 100 being the highest priority.
- The ...IgnoredDuringExecution term used in both the required and the preferred rules, means that if, for any reason, the label matched during deployment is, say, altered or removed, then the Kubernetes scheduler will not move the Pod from the node. In other words, Kubernetes only applies affinity rules when creating and scheduling the Pod.
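As a sketch of how weight works, the following fragment defines two preferred rules; the gpu label is hypothetical and not part of this tutorial's cluster:

```yaml
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80                 # strongly prefer SSD-backed nodes
  preference:
    matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd
- weight: 20                 # mildly prefer GPU nodes (hypothetical label)
  preference:
    matchExpressions:
    - key: gpu
      operator: Exists
```

For each candidate node, the scheduler sums the weights of every preferred rule that node satisfies and factors the total into its scoring, so a node matching both rules above scores higher than one matching either alone.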
Deploy the application.
kubectl apply -f node-affinity.yaml
Confirm the Pod deploys to the west region and prefers a node with the ssd label.
kubectl get pods -o wide
Notice the output shows that the Kubernetes scheduler always places the Pod on the ocne-worker-1 node because it is in the west region and labeled as an ssd node.
Scale Up the Number of Pods
As long as the required node has sufficient resources, new Pods should continue being scheduled by Kubernetes to the ocne-worker-1 node.
Increase the number of Pods.
kubectl scale deploy web-backend --replicas 10
Check where the Kubernetes scheduler deployed the Pods.
kubectl get pods -o wide
Repeat this command until all the Pods STATUS shows as Running. As expected, the Kubernetes scheduler is honoring the required and the preferred affinity rules and placing all the newly created Pods onto the ocne-worker-1 node.
Nodes with Preferred Rules Unavailable
As you have seen, the Kubernetes scheduler will honor the preferred rule as long as the node(s) have sufficient amounts of the indicated resources (GPU, CPU, memory, etc.). But what happens if that resource is no longer available on the node(s)? Let's find out.
Apply a taint to the ocne-worker-1 node to simulate the SSD resource being unavailable.
kubectl taint nodes ocne-worker-1 disktype=ssd:NoSchedule
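A NoSchedule taint repels new Pods unless their spec declares a matching toleration. This tutorial deliberately adds no toleration so the taint takes effect, but for reference, a sketch of a toleration in a Pod spec looks like this:

```yaml
# Fragment of a Pod spec; allows scheduling onto nodes carrying
# the disktype=ssd:NoSchedule taint applied above.
tolerations:
- key: "disktype"
  operator: "Equal"
  value: "ssd"
  effect: "NoSchedule"
```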
Scale up the number of Pods to 15.
kubectl scale deploy web-backend --replicas 15
Check where Kubernetes schedules the Pods.
kubectl get pods -o wide
The output confirms that the Kubernetes scheduler is honoring the required affinity rule and placing all the newly created Pods onto the ocne-worker-2 node. This behavior occurs because the required rule states that all Pods must deploy to a node in the west region, and the ocne-worker-2 is in the west region.
No Nodes Matching Affinity Rules
As you have seen, the Kubernetes scheduler will honor the preferred rule as long as the node(s) have a sufficient amount of the indicated resource. But what happens when the indicated resource is no longer available on the node(s) or if something happens to all the nodes in the west region? Let's find out.
Apply a taint to the ocne-worker-2 node to ensure no available nodes in the west region, thus simulating an outage.
kubectl taint nodes ocne-worker-2 disktype=ssd:NoSchedule
Scale up the number of Pods to 20.
kubectl scale deploy web-backend --replicas 20
Check the scheduling of the Pods.
kubectl get pods -o wide
Notice the output shows a status of Pending for the five newly requested Pods. They'll remain in a Pending status because the required rule states that the Pods must only deploy in the west region. So even though the cluster has resources available in the east region, the Kubernetes scheduler will not deploy them there.
Clean Up the Cluster
Before you move on and look at Pod affinity, remove the taints you applied and scale the deployment back to a single Pod.
Remove the taints.
kubectl taint nodes ocne-worker-1 disktype=ssd:NoSchedule-
kubectl taint nodes ocne-worker-2 disktype=ssd:NoSchedule-
Scale the deployment back to a single Pod.
kubectl scale deploy web-backend --replicas 0
kubectl scale deploy web-backend --replicas 1
We scaled back to 0 and then up to 1 to ensure that the single remaining Pod fully complies with both node affinity rules in the deployment YAML file.
Verify the Pods status.
kubectl get pods -o wide
During the scaling back of the Pods, you may notice one or more of the Pods with a STATUS of Terminating. We expect this behavior, and you can ignore it. Repeat the check until you have a single Pod in a Running status.
Delete the existing deployments.
kubectl delete deployment web-backend
Pod Affinity
Pod affinity and anti-affinity influence how Pods are scheduled based on the labels of Pods already running on a node. While Pod affinity is a way to attract Pods to other Pods based on their labels, Pod anti-affinity is the opposite and makes Pods avoid other Pods based on their labels.
So think of Pod affinity as a way to influence how Kubernetes schedules Pods that work nicely together or use Pod anti-affinity to avoid scheduling Pods together that may impact each other.
Deploy Using Pod Affinity
Remember, Pod affinity is a way to influence how Pods are scheduled to the cluster's nodes using the labels on Pods already deployed to either attract or repel them from a node. We'll use Pod affinity to instruct the Kubernetes scheduler to place the Pod on a node where a specific Pod is already running. If a matching Pod isn't already running on a node, the scheduler will not deploy the Pod to that node. This behavior occurs because of the required rule in the deployment YAML file.
Create the web-gui deployment YAML file.
cat << EOF | tee pod-affinity.yaml > /dev/null
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-gui
  labels:
    app: web-gui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-gui
  template:
    metadata:
      labels:
        app: web-gui
    spec:
      containers:
      - name: web-gui
        image: ghcr.io/oracle/oraclelinux9-nginx:1.20
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-backend
            topologyKey: "kubernetes.io/hostname"
EOF
Like the previous Node affinity definition, the affinity: section of the deployment file determines the affinity type, which we set as podAffinity for this deployment.
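The topologyKey field defines the domain within which the rule applies. Using kubernetes.io/hostname makes each node its own domain, so matching Pods co-locate on the same node. As a sketch, a broader zone-level domain (assuming the nodes carry the standard topology.kubernetes.io/zone label, which this lab's nodes may not) would look like:

```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - web-backend
    # Any node in the same zone as a running web-backend Pod satisfies the rule
    topologyKey: "topology.kubernetes.io/zone"
```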
Deploy the application.
kubectl apply -f node-affinity.yaml -f pod-affinity.yaml
Confirm the location of the Pods.
kubectl get pods -o wide
As expected, the web-gui and web-backend Pods run on the ocne-worker-1 node.
Scale Up the Number of Pods
Let's try scaling up the number of both backend and gui Pods to see if the behavior remains consistent.
Scale up the number of Pods.
kubectl scale deploy web-backend --replicas 2
kubectl scale deploy web-gui --replicas 2
Verify the Pods location.
kubectl get pods -o wide
The Kubernetes scheduler continued to behave as expected by placing all the newly created Pods onto the ocne-worker-1 node.
This case provides a simple example illustrating how Pod affinity works. However, running multiple copies of the same Pod on the node may reduce performance. Can this effect be mitigated? Yes, using anti-affinity can help prevent numerous copies of the same Pod from running on the same node.
Delete the existing deployments to prepare for the next section.
kubectl delete deployment web-backend
kubectl delete deployment web-gui
Pod Anti-affinity
Pod anti-affinity is a way of defining a rule that allows you to prevent Pods from deploying on nodes based on the labels of other Pods on that node.
Modify the web-backend deployment YAML file.
cat << EOF | tee node-affinity.yaml > /dev/null
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-backend
  labels:
    app: web-backend
spec:
  selector:
    matchLabels:
      app: web-backend
  replicas: 1
  template:
    metadata:
      labels:
        app: web-backend
    spec:
      containers:
      - name: web-backend
        image: ghcr.io/oracle/oraclelinux9-nginx:1.20
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-backend
            topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: region
                operator: In
                values:
                - west
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
EOF
Deploy the application.
kubectl apply -f node-affinity.yaml -f pod-affinity.yaml
Confirm the Pods location.
kubectl get pods -o wide
The web-gui and web-backend Pods deploy on the ocne-worker-1 node.
Scale Up the Number of Pods
Scale up the number of Pods.
kubectl scale deploy web-backend --replicas 3
kubectl scale deploy web-gui --replicas 3
Check the Pods status.
kubectl get pods -o wide
The Kubernetes scheduler deployed one pair of Pods to the ssd node and another to a non-ssd node. The third pair splits the deployment where the web-backend is left in a Pending state, while the web-gui deploys successfully to ocne-worker-1. The cluster does this because the required and the preferred rules for the web-backend deployment only allow one set of the application Pods to run on each node. The only way to run three copies of the web-backend application in this example is to either increase the number of nodes in the west region or alter the required and the preferred rules to allow Kubernetes to schedule the new Pods to the east region.
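A third option, not used in this tutorial, is to soften the anti-affinity into a preferred rule so the scheduler spreads Pods across nodes when it can but still co-locates them when it must. A sketch of that change to the web-backend deployment's affinity section:

```yaml
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - web-backend
      # Spread across nodes when possible; co-locate rather than stay Pending
      topologyKey: "kubernetes.io/hostname"
```

Note that preferred anti-affinity rules wrap the selector in a podAffinityTerm alongside a weight, unlike the required form.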
Next Steps
This tutorial shows how using affinity provides a way to introduce flexibility and control over how your applications deploy on a Kubernetes cluster. Unlike nodeSelector, which imposes a single hard rule, affinity's extra rule flexibility gives administrators a way to influence how Kubernetes reacts to changes in the cluster's environment. You have also seen how the affinity rules determine how the scheduler selects a preferred node and an alternative node if the preferred node is unavailable.
That completes our walkthrough, which introduces affinity and demonstrates how it can help you manage your application deployments.