Use Affinity with Oracle Cloud Native Environment


Introduction

The ability to influence how Kubernetes schedules Pods to provide the best performance, reduce running costs and simplify cluster management is an important skill for an administrator to master.

But what happens if several applications deployed to your Kubernetes cluster would run more efficiently on some nodes than on others? Administrators have several ways to influence how the Kubernetes scheduler assigns application Pods to specific nodes within the cluster. Node Affinity and Pod Affinity/Anti-affinity help by providing flexible rules that govern how the Kubernetes scheduler deploys Pods to nodes in the cluster. This tutorial covers using Affinity and Anti-Affinity.

Objectives

In this lab, you will learn:

  • How to use Affinity
  • How to use Anti-Affinity

Prerequisites

  • Minimum of a 6-node Oracle Cloud Native Environment cluster:

    • Operator node
    • Kubernetes control plane node
    • 4 Kubernetes worker nodes
  • Each system should have Oracle Linux installed and configured with:

    • An Oracle user account (used during the installation) with sudo access
    • Key-based SSH, also known as password-less SSH, between the hosts
    • Installation of Oracle Cloud Native Environment

Additional Information

Before proceeding: This tutorial builds on earlier tutorials in this series. If you are unfamiliar with using Kubernetes Labels, nodeSelector, or Taints & Tolerations, reviewing those tutorials first will help.

Deploy Oracle Cloud Native Environment

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
  3. Change into the working directory.

    cd linux-virt-labs/ocne
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yaml
  5. Update the Oracle Cloud Native Environment configuration.

    cat << EOF | tee instances.yaml > /dev/null
    compute_instances:
      1:
        instance_name: "ocne-operator"
        type: "operator"
      2:
        instance_name: "ocne-control-01"
        type: "controlplane"
      3:
        instance_name: "ocne-worker-01"
        type: "worker"
      4:
        instance_name: "ocne-worker-02"
        type: "worker"
      5:
        instance_name: "ocne-worker-03"
        type: "worker"
      6:
        instance_name: "ocne-worker-04"
        type: "worker"
    EOF
  6. Deploy the lab environment.

    ansible-playbook create_instance.yaml -e ansible_python_interpreter="/usr/bin/python3.6" -e "@instances.yaml"

    The free lab environment requires the extra variable ansible_python_interpreter because it installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, which installs its modules under python3.6.

    Important: Wait for the playbook to run successfully and reach the pause task. The Oracle Cloud Native Environment installation is complete at this stage of the playbook, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys.

Confirm the Number of Nodes

It helps to know the number and names of nodes in your Kubernetes cluster.

  1. Open a terminal and connect via SSH to the ocne-control-01 node.

    ssh oracle@<ip_address_of_ocne-control-01>

    Note: All steps are completed on the ocne-control-01 node.

  2. List the nodes in the cluster.

    kubectl get nodes

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes
    NAME              STATUS   ROLES           AGE     VERSION
    ocne-control-01   Ready    control-plane   9m47s   v1.28.3+3.el8
    ocne-worker-01    Ready    <none>          8m7s    v1.28.3+3.el8
    ocne-worker-02    Ready    <none>          6m34s   v1.28.3+3.el8
    ocne-worker-03    Ready    <none>          8m10s   v1.28.3+3.el8
    ocne-worker-04    Ready    <none>          5m1s    v1.28.3+3.el8

    This output confirms that the control plane node and all four worker nodes are in a Ready state.

Apply New Labels to the Worker Nodes

  1. Apply new labels to the worker nodes.

    kubectl label node ocne-worker-01 region=west disktype=ssd
    kubectl label node ocne-worker-02 region=west
    kubectl label node ocne-worker-03 region=east disktype=ssd
    kubectl label node ocne-worker-04 region=east
    

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl label node ocne-worker-01 region=west disktype=ssd
    node/ocne-worker-01 labeled
    [oracle@ocne-control-01 ~]$ kubectl label node ocne-worker-02 region=west
    node/ocne-worker-02 labeled
    [oracle@ocne-control-01 ~]$ kubectl label node ocne-worker-03 region=east disktype=ssd
    node/ocne-worker-03 labeled
    [oracle@ocne-control-01 ~]$ kubectl label node ocne-worker-04 region=east
    node/ocne-worker-04 labeled
  2. Confirm the region labels applied to the nodes.

    kubectl get nodes --show-labels | grep region

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes --show-labels | grep region
    ocne-worker-01    Ready    <none>          16m   v1.28.3+3.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocne-worker-01,kubernetes.io/os=linux,region=west
    ocne-worker-02    Ready    <none>          15m   v1.28.3+3.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocne-worker-02,kubernetes.io/os=linux,region=west
    ocne-worker-03    Ready    <none>          16m   v1.28.3+3.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocne-worker-03,kubernetes.io/os=linux,region=east
    ocne-worker-04    Ready    <none>          13m   v1.28.3+3.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocne-worker-04,kubernetes.io/os=linux,region=east
  3. Confirm the disktype labels applied to ocne-worker-01 and ocne-worker-03 nodes.

    kubectl get nodes --show-labels | grep disktype

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get nodes --show-labels | grep disktype
    ocne-worker-01    Ready    <none>          16m   v1.28.3+3.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocne-worker-01,kubernetes.io/os=linux,region=west
    ocne-worker-03    Ready    <none>          16m   v1.28.3+3.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocne-worker-03,kubernetes.io/os=linux,region=east
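
Grep works for a quick check, but kubectl can also filter by label selector directly, which avoids scanning the full label listing. The commands in the comments below are the live-cluster form; the runnable snippet simply replays the same region=west filter over the node/label pairs applied above.

```shell
# On a live cluster, label selectors do this filtering server-side:
#   kubectl get nodes -l region=west
#   kubectl get nodes -l region=west,disktype=ssd
# Here we apply the same region=west filter to the labels applied above.
printf '%s\n' \
  'ocne-worker-01 region=west,disktype=ssd' \
  'ocne-worker-02 region=west' \
  'ocne-worker-03 region=east,disktype=ssd' \
  'ocne-worker-04 region=east' |
  grep 'region=west' | cut -d' ' -f1
```

The snippet prints ocne-worker-01 and ocne-worker-02, matching what the grep-based check above shows for the west region.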

What is Affinity?

Affinity is superficially similar to nodeSelector (covered in an earlier tutorial in this series), but allows you to define more complex scheduling criteria. For example:

  • Using nodeSelector means that a Pod will not schedule if the defined criteria are not satisfied.
  • Affinity/anti-affinity definitions allow more nuance in controlling how the Kubernetes scheduler deploys a Pod. For example:
    • You can define whether a rule is 'required' or 'preferred' (visualize these as hard and soft rules).
    • Pod deployment can be influenced by referencing both node and Pod labels, which means you can control whether or not Pods co-locate on a node.

Two types of affinity are available:

  • Node Affinity - this is similar to nodeSelector, but with more flexibility.
  • Pod Affinity/anti-affinity - provides control over how Pods deploy to nodes relative to other Pods already deployed.

A simple way to view affinity is that it determines whether Pods are attracted to run beside other Pods, or on a specific node. If your Pods instead have an anti-affinity to specific Pods, they will avoid those Pods and run on a different node.

Let's start by looking at node affinity.

Node Affinity

Node affinity provides administrator-defined rules that influence how the Kubernetes scheduler deploys Pods to the cluster's nodes. Like nodeSelector, it matches labels applied to nodes against label selectors defined in the deployment YAML files, but with greater flexibility. Node affinity uses two types of rules:

  • requiredDuringSchedulingIgnoredDuringExecution: The criteria must be satisfied before the Pod schedules to a node. If no node matches the requirement, the Pod will not deploy at all.
  • preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to locate a node meeting the criteria. If there is none, the scheduler deploys to any node that satisfies the 'required' rules.

Note: The ...IgnoredDuringExecution part of both types means that if the node labels change after the Kubernetes scheduler has scheduled the Pod to a node, the Pod will continue to run.
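
The examples in this tutorial use only the In operator, but matchExpressions also accepts NotIn, Exists, DoesNotExist, Gt, and Lt. A minimal sketch follows; it reuses this tutorial's label names, but the rule itself is illustrative and not part of the lab steps:

```yaml
# Sketch only: a required node affinity rule using other operators.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: region
          operator: NotIn      # match any region except 'east'
          values:
          - east
        - key: disktype
          operator: Exists     # the label key must be present; no values list
```

Multiple expressions inside a single matchExpressions entry must all match (logical AND), while multiple nodeSelectorTerms entries are ORed.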

Deploy Using Node Affinity

This example shows how to use node affinity rules to ensure the Kubernetes scheduler applies two node affinity rules. The first affinity rule is that the application must only deploy to nodes in the west region. The second affinity rule is that the application provides the best performance when deployed to a node having an SSD.

  1. Create the deployment YAML file.

    cat << EOF | tee node-affinity.yaml > /dev/null
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-backend
      labels:
        app: web-backend
    spec:
      selector:
        matchLabels:
          app: web-backend
      replicas: 1
      template:
        metadata:
          labels:
            app: web-backend
        spec:
          containers:
          - name: web-backend
            image: nginx:latest
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: region
                    operator: In
                    values:
                    - west
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 1
                preference:
                  matchExpressions:
                  - key: disktype
                    operator: In
                    values:
                    - ssd
    EOF
    

    Where:

    The name and type of the application deployed in this example are solely for illustrative purposes and can be ignored. Instead, the section of the deployment file shown below determines the node affinity:

    ...
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: region
              operator: In
              values:
              - west
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: disktype
              operator: In
              values:
              - ssd

    This example illustrates two node affinity rules - requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. Let's examine these rules in more detail:

    • The required... rule is interpreted by the Kubernetes scheduler as a must-be-true condition. If the labels on a node do not match, Pods will not schedule there. Think of it as a 'hard' rule. In this example, the Kubernetes scheduler looks for a node with a label matching region=west. This has to be met.
    • The preferred... rule, on the other hand, is a 'soft' rule. As long as the required rule is matched, the Pod will schedule to a node even if the preferred rule does not match. So in this example, provided the required affinity is met, the Kubernetes scheduler tries to schedule a Pod to a node matching disktype=ssd. However, if the only available nodes in the west region lack an SSD, the scheduler still places the Pod on one of them.
    • The weight: option is normally used where multiple preferred... rules have been defined that could apply equally in certain circumstances. The weight allows you to influence the choice made by the Kubernetes scheduler. The range available is 1 (low) to 100 (high).
    • The ...IgnoredDuringExecution term used in both the required and the preferred rules means that if the label matched during deployment is later altered or removed, the Kubernetes scheduler will not move the Pod from the node. In other words, affinity rules apply only while Kubernetes is creating and scheduling the Pod.
  2. Deploy the application.

    kubectl apply -f node-affinity.yaml

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl apply -f node-affinity.yaml 
    deployment.apps/web-backend created
  3. Confirm the Pods always deploy to the 'west' region and prefer nodes with the 'ssd' label.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-7dtq2   1/1     Running   0          20s   10.244.1.2   ocne-worker-01   <none>           <none>

    Notice that the Kubernetes scheduler always places the Pod on the 'ocne-worker-01' node because it is in the west region and is labelled as an 'ssd' node.
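
The weight field matters most when several preferred rules are defined: for each candidate node, the scheduler adds up the weights of the matching preference terms and favors the highest total. The following is a sketch using this tutorial's labels, with arbitrarily chosen weights, rather than a step in the lab:

```yaml
# Sketch: two preferred rules. A west-region SSD node scores 80 + 20 = 100,
# a west-region node without an SSD scores only 20, so SSD nodes win
# whenever one has capacity.
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
  preference:
    matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd
- weight: 20
  preference:
    matchExpressions:
    - key: region
      operator: In
      values:
      - west
```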

Scale Up the Number of Pods

As long as the required node has sufficient resources, Kubernetes should continue scheduling new Pods to the ocne-worker-01 node, so let's confirm it.

  1. The initial deployment file only deployed one Pod. What happens if you increase the number of Pods you want deployed?

    kubectl scale deploy web-backend --replicas 10

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl scale deploy web-backend --replicas 10
    deployment.apps/web-backend scaled
  2. Check where the Kubernetes scheduler has scheduled the Pods.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-6qbqr   1/1     Running   0          44s     10.244.1.6    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-7dtq2   1/1     Running   0          2m53s   10.244.1.2    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-bf6zj   1/1     Running   0          44s     10.244.1.10   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-g9b27   1/1     Running   0          44s     10.244.1.8    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-gjp58   1/1     Running   0          44s     10.244.1.3    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-kmpsd   1/1     Running   0          44s     10.244.1.5    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-m78pp   1/1     Running   0          44s     10.244.1.9    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-nssdm   1/1     Running   0          44s     10.244.1.7    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-sddk6   1/1     Running   0          44s     10.244.1.11   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-zhb8g   1/1     Running   0          44s     10.244.1.4    ocne-worker-01   <none>           <none>

    Note: You may need to repeat the last kubectl command a few times before the STATUS of every Pod changes from ContainerCreating to Running. This is expected behavior while the Pods deploy.

    Notice that the Kubernetes scheduler is honoring the required and the preferred affinity rules and placing all the newly created Pods onto the 'ocne-worker-01' node.

What Happens if the preferred Rule Can't Be Applied?

As you have seen, the Kubernetes scheduler will honor the preferred rule for as long as the node(s) have sufficient amounts of the indicated resources (GPU, CPU, memory, etc.). But what happens if that resource is no longer available on the node(s)? Let's find out.

  1. Apply a 'taint' to the ocne-worker-01 node to simulate the SSD resource being unavailable.

    kubectl taint nodes ocne-worker-01 disktype=ssd:NoSchedule

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl taint nodes ocne-worker-01 disktype=ssd:NoSchedule
    node/ocne-worker-01 tainted
  2. Scale up the number of Pods deployed to 15.

    kubectl scale deploy web-backend --replicas 15

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl scale deploy web-backend --replicas 15
    deployment.apps/web-backend scaled
  3. Check where the Kubernetes scheduler has scheduled the Pods.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-54mvn   1/1     Running   0          63s   10.244.2.8    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-6qbqr   1/1     Running   0          35m   10.244.1.6    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-7dtq2   1/1     Running   0          37m   10.244.1.2    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-9dql6   1/1     Running   0          63s   10.244.2.7    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-bf6zj   1/1     Running   0          35m   10.244.1.10   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-g9b27   1/1     Running   0          35m   10.244.1.8    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-gjp58   1/1     Running   0          35m   10.244.1.3    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-kmpsd   1/1     Running   0          35m   10.244.1.5    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-m78pp   1/1     Running   0          35m   10.244.1.9    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-nssdm   1/1     Running   0          35m   10.244.1.7    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-p46bt   1/1     Running   0          63s   10.244.2.4    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-sddk6   1/1     Running   0          35m   10.244.1.11   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-szjsg   1/1     Running   0          63s   10.244.2.6    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-v4vzn   1/1     Running   0          63s   10.244.2.5    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-zhb8g   1/1     Running   0          35m   10.244.1.4    ocne-worker-01   <none>           <none>

    Notice that the Kubernetes scheduler is honoring the required affinity rule and placing all the newly created Pods onto the 'ocne-worker-02' node. Why? Because the required rule states that all Pods must deploy to a node in the west region (and 'ocne-worker-02' is in the west region).

What Happens if Neither Rule Can Be Applied?

As you have seen, the Kubernetes scheduler will honor the preferred rule for as long as the node(s) have a sufficient amount of the indicated resource available (GPU, CPU, memory, etc.). But what happens when, or if, the indicated resource is no longer available on the node(s), or if something happens to all the nodes in the west region? Let's find out.

  1. Apply a taint to the ocne-worker-02 node to make sure there are no available nodes in the west region (e.g., simulating an outage).

    kubectl taint nodes ocne-worker-02 disktype=ssd:NoSchedule
  2. Scale up the number of Pods deployed to 20.

    kubectl scale deploy web-backend --replicas 20
  3. Check where the Kubernetes scheduler has scheduled the Pods.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-4fllb   0/1     Pending   0          7s    <none>        <none>           <none>           <none>
    web-backend-f78d7d444-54mvn   1/1     Running   0          17m   10.244.2.8    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-5ln65   0/1     Pending   0          7s    <none>        <none>           <none>           <none>
    web-backend-f78d7d444-6qbqr   1/1     Running   0          51m   10.244.1.6    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-7dtq2   1/1     Running   0          53m   10.244.1.2    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-9dql6   1/1     Running   0          17m   10.244.2.7    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-bf6zj   1/1     Running   0          51m   10.244.1.10   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-g9b27   1/1     Running   0          51m   10.244.1.8    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-gjp58   1/1     Running   0          51m   10.244.1.3    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-kmpsd   1/1     Running   0          51m   10.244.1.5    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-m78pp   1/1     Running   0          51m   10.244.1.9    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-nqkjh   0/1     Pending   0          7s    <none>        <none>           <none>           <none>
    web-backend-f78d7d444-nssdm   1/1     Running   0          51m   10.244.1.7    ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-p46bt   1/1     Running   0          17m   10.244.2.4    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-pj5xg   0/1     Pending   0          7s    <none>        <none>           <none>           <none>
    web-backend-f78d7d444-sddk6   1/1     Running   0          51m   10.244.1.11   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-szjsg   1/1     Running   0          17m   10.244.2.6    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-v4vzn   1/1     Running   0          17m   10.244.2.5    ocne-worker-02   <none>           <none>
    web-backend-f78d7d444-vrxcc   0/1     Pending   0          7s    <none>        <none>           <none>           <none>
    web-backend-f78d7d444-zhb8g   1/1     Running   0          51m   10.244.1.4    ocne-worker-01   <none>           <none>

    Notice that the Kubernetes scheduler cannot deploy the five newly requested Pods, so they remain in a Pending status: the required rule states that the Pods must only deploy in the west region. Even though resources are available in the east region, the Kubernetes scheduler will not place them there.

So that explains what Node Affinity is and how it works.

Before you move on and look at Pod Affinity, remove the taints you applied and scale the deployment back to a single Pod.

  1. Remove the taints.

    kubectl taint nodes ocne-worker-01 disktype=ssd:NoSchedule-
    kubectl taint nodes ocne-worker-02 disktype=ssd:NoSchedule-
    

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl taint nodes ocne-worker-01 disktype=ssd:NoSchedule-
    node/ocne-worker-01 untainted
    [oracle@ocne-control-01 ~]$ kubectl taint nodes ocne-worker-02 disktype=ssd:NoSchedule-
    node/ocne-worker-02 untainted
  2. Scale the deployment back to a single Pod.

    kubectl scale deploy web-backend --replicas 0
    kubectl scale deploy web-backend --replicas 1
    

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl scale deploy web-backend --replicas 0
    deployment.apps/web-backend scaled
    [oracle@ocne-control-01 ~]$ kubectl scale deploy web-backend --replicas 1
    deployment.apps/web-backend scaled

    Note: The reason you scaled down to 0 and then back up to 1 is to ensure that the single remaining Pod fully complies with both the required and preferred node affinity rules in the deployment YAML file.

  3. Check where the Kubernetes scheduler has scheduled the Pods.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-swrg7   1/1     Running   0          64s   10.244.1.19   ocne-worker-01   <none>           <none>

    Note: You may notice one, or more, of the deployed pods with a STATUS of Terminating. This is normal and can be ignored. Repeat the command every few seconds and the output will match the example shown above.

  4. Delete the existing deployments ready for the next section.

    kubectl delete deployment web-backend
    

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl delete deployment web-backend
    deployment.apps "web-backend" deleted

What is Pod Affinity?

Pod affinity refers to scheduling Pods based on the labels of the Pods already running on a node. There are two types of Pod affinity:

  • Pod affinity: A way to attract Pods to other Pods based on their labels.
  • Pod anti-affinity: This is the opposite of pod affinity, and makes Pods avoid other Pods based on their labels.

So think of this as a way to influence how Kubernetes schedules Pods that work better together (pod affinity), or alternatively avoid scheduling Pods together that may affect each other (pod anti-affinity).

Deploy using Pod Affinity

Remember, Pod affinity influences how Pods are scheduled to the cluster's nodes by using the labels on Pods already deployed to either attract, or repel, new Pods. This example uses Pod affinity to instruct the Kubernetes scheduler to place the Pod on a node where a specific Pod is already running. It also means that if no matching Pod is already running on a node, the scheduler will not place the Pod on that node. Why? Because of the required rule in the deployment YAML file.

  1. Create the deployment YAML file.

    cat << EOF | tee pod-affinity.yaml > /dev/null
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-gui
      labels:
        app: web-gui
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: web-gui
      template:
        metadata:
          labels:
            app: web-gui
        spec:
          containers:
          - name: web-gui
            image: nginx:latest
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web-backend
                topologyKey: "kubernetes.io/hostname"
    EOF
    

    Where:

    The format is very similar to the node affinity definition seen earlier, except this time the affinity type is podAffinity instead of nodeAffinity. The name and type of application deployed here are solely for illustrative purposes. Instead, the section of this deployment descriptor used to define the Pod affinity is:

    ...
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web-backend
          topologyKey: "kubernetes.io/hostname"
  2. Deploy the application.

    kubectl apply -f node-affinity.yaml
    kubectl apply -f pod-affinity.yaml
    
  3. Confirm where the Pods have deployed.

    kubectl get pods -o wide

    Example Output:

    NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-xjmj7   1/1     Running   0          2m33s   10.244.1.4   ocne-worker-01   <none>           <none>
    web-gui-5b4bb448b9-b4skg      1/1     Running   0          14s   10.244.1.5   ocne-worker-01   <none>           <none>

    Notice that the web-gui and web-backend Pods are running on the same node ('ocne-worker-01') as expected.
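
One field in the deployment above that deserves a closer look is topologyKey. It names the node label whose value defines what 'together' means: with kubernetes.io/hostname, co-location means the same node. Any node label can be used, so as a sketch (not a lab step), reusing this tutorial's region label would only require web-gui Pods to land in the same region as a web-backend Pod:

```yaml
# Sketch: with topologyKey set to the 'region' label applied earlier,
# the scheduler treats all nodes sharing a region value as one domain,
# so a web-gui Pod may land on any node in a region running web-backend,
# not necessarily on the same node.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - web-backend
      topologyKey: "region"
```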

Scale Up the Number of Pods

Now let's try scaling up the number of both 'backend' and 'gui' Pods to see if the behavior is consistent.

  1. Scale up the number of Pods deployed.

    kubectl scale deploy web-backend --replicas 2
    kubectl scale deploy web-gui --replicas 2
    
  2. Check where the Kubernetes scheduler has scheduled the Pods.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE    IP           NODE             NOMINATED NODE   READINESS GATES
    web-backend-f78d7d444-8wbwf   1/1     Running   0          5s     10.244.1.6   ocne-worker-01   <none>           <none>
    web-backend-f78d7d444-xjmj7   1/1     Running   0          17m    10.244.1.4   ocne-worker-01   <none>           <none>
    web-gui-5b4bb448b9-b4skg      1/1     Running   0          115s   10.244.1.5   ocne-worker-01   <none>           <none>
    web-gui-5b4bb448b9-jpwbx      1/1     Running   0          5s     10.244.1.7   ocne-worker-01   <none>           <none>

    Notice that the Kubernetes scheduler continued to behave as expected by placing all the newly created Pods onto the 'ocne-worker-01' node.

This provided a simple example illustrating how Pod affinity works. However, running multiple copies of the same Pod on the same node may result in reduced performance. Can this effect be mitigated? Yes, anti-affinity can help by providing a way to prevent multiple copies of the same Pod from running on the same node.

  1. Delete the existing deployments ready for the next section.

    kubectl delete deployment web-backend
    kubectl delete deployment web-gui
    

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl delete deployment web-backend
    deployment.apps "web-backend" deleted
    [oracle@ocne-control-01 ~]$ kubectl delete deployment web-gui
    deployment.apps "web-gui" deleted
    

What is Pod Anti-affinity?

Pod anti-affinity is a way of defining a rule that prevents Pods from deploying to a node based on the labels of other Pods already running on that node.

  1. Remove the old deployment YAML file and create a new one.

    rm node-affinity.yaml
    cat << EOF | tee -a node-affinity.yaml > /dev/null
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-backend
      labels:
        app: web-backend
    spec:
      selector:
        matchLabels:
          app: web-backend
      replicas: 1
      template:
        metadata:
          labels:
            app: web-backend
        spec:
          containers:
          - name: web-backend
            image: nginx:latest
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web-backend
                topologyKey: "kubernetes.io/hostname"
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: region
                    operator: In
                    values:
                    - west
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 1
                preference:
                  matchExpressions:
                  - key: disktype
                    operator: In
                    values:
                    - ssd
    EOF
    

    Where:

    The name and type of application deployed here are solely for illustrative purposes. Instead, the section of this deployment descriptor used to define Pod anti-affinity is:

    ...
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web-backend
          topologyKey: "kubernetes.io/hostname"
  2. Deploy the application.

    kubectl apply -f node-affinity.yaml
    kubectl apply -f pod-affinity.yaml
    
  3. Confirm where the Pods have deployed.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                           READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
    web-backend-6997486dc9-n62j8   1/1     Running   0          47s   10.244.1.8   ocne-worker-01   <none>           <none>
    web-gui-5b4bb448b9-85hqm       1/1     Running   0          47s   10.244.1.9   ocne-worker-01   <none>           <none>

    Notice the web-gui and web-backend Pods deploy on the same node ('ocne-worker-01').

Scale Up the Number of Pods

Now let's try scaling up the number of both 'backend' and 'gui' Pods to see if the behavior is consistent.

  1. Scale up the number of Pods deployed.

    kubectl scale deploy web-backend --replicas 2
    kubectl scale deploy web-gui --replicas 2
    
  2. Check where the Kubernetes scheduler has scheduled the Pods.

    kubectl get pods -o wide

    Example Output:

    [oracle@ocne-control-01 ~]$ kubectl get pods -o wide
    NAME                           READY   STATUS    RESTARTS   AGE    IP           NODE             NOMINATED NODE   READINESS GATES
    web-backend-6997486dc9-f9g9x   1/1     Running   0          4s     10.244.2.3   ocne-worker-02   <none>           <none>
    web-backend-6997486dc9-n62j8   1/1     Running   0          115s   10.244.1.8   ocne-worker-01   <none>           <none>
    web-gui-5b4bb448b9-2ztz7       1/1     Running   0          4s     10.244.2.4   ocne-worker-02   <none>           <none>
    web-gui-5b4bb448b9-85hqm       1/1     Running   0          115s   10.244.1.9   ocne-worker-01   <none>           <none>

    Notice that the Kubernetes scheduler deployed one pair of Pods to the ssd node and the other pair to a non-ssd node. The anti-affinity rule allows only one web-backend Pod per node, so a third replica would remain in a Pending state: the required and preferred rules permit only one set of the application Pods (web-backend and web-gui) on each node, and only two nodes exist in the west region. The only way to run three copies of the application in this example is to do one of the following:

    • Increase the number of nodes in the 'west' region

    or

    • Alter the required and the preferred rules to allow the Kubernetes scheduler to schedule the new Pods to the east region.
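
A third option, if running every replica matters more than strict spreading, is to soften the anti-affinity from required to preferred. The following is a sketch of that change, not part of the lab steps: the scheduler spreads web-backend Pods across nodes when it can, but co-locates them rather than leaving replicas Pending.

```yaml
# Sketch: 'soft' anti-affinity. Note that the preferred form wraps the
# selector in a podAffinityTerm and adds a weight, unlike the required
# form used in the deployment above.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-backend
        topologyKey: "kubernetes.io/hostname"
```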

Summary

This tutorial showed how affinity introduces flexibility and control over how your applications deploy on a Kubernetes cluster. Unlike nodeSelector, which applies only 'hard' rules, affinity's more flexible rules give administrators a way to influence how the Kubernetes scheduler reacts to changes in the cluster's environment. You have also seen how affinity rules determine how the scheduler selects a preferred node, and an alternative node when the preferred node is unavailable.

This concludes the walkthrough introducing Affinity and demonstrating how it can help to manage your application deployments.
