Use Horizontal Pod Autoscaling with Oracle Cloud Native Environment
Introduction
The HorizontalPodAutoscaler (HPA) is the Kubernetes object for managing horizontal pod autoscaling. We refer to it as horizontal scaling because it allows Kubernetes to automatically deploy more Pods to match variations in the incoming workload, such as the number of users making requests. Once demand abates, HPA can scale the number of deployed Pods back down again. Kubernetes administrators can apply HPA to workload resources such as a Deployment or a StatefulSet because they include support for the scale subresource, which allows the number of replicas to be altered depending on their current state. Not all Kubernetes objects support the scale subresource; one example is the DaemonSet.
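For illustration, the scale subresource can also be driven manually with kubectl. The commands below are a sketch that assumes a deployment named hpa-demo, like the one created later in this tutorial; the --subresource flag requires a reasonably recent kubectl release.

```shell
# Read the current replica count through the scale subresource
kubectl get deployment hpa-demo --subresource=scale -o yaml

# Write through the same subresource; HPA automates this kind of update
kubectl scale deployment hpa-demo --replicas=3
```

HPA performs essentially this replica update automatically, based on observed metrics.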
Benefits of HPA
Kubernetes Horizontal Pod Autoscaling helps by providing the following:
- Automatically increase the number of Pods deployed to meet sustained changes to incoming workload.
- Contributes to cost savings by automatically reducing the number of Pods as external demand reduces.
- Monitors the Kubernetes cluster's metrics to ensure deployed applications remain available.
- Allows you to configure expected capacity needs for both normal and busy workloads.
Limitations of HPA
Some limitations of the Kubernetes Horizontal Pod Autoscaling include:
- It does not work with a DaemonSet.
- It only works if your cluster has sufficient resources, allowing it to add more Pods.
- HPA cannot scale the Kubernetes cluster nodes if the current cluster runs out of capacity.
- The CPU and Memory limits need to be tuned to prevent Pods from unexpectedly terminating if you set them too low or, conversely, wasting resources that could be used for other deployments if they are set too high.
- It does not consider other potential bottlenecks, such as network I/O, disk I/O, or disk space.
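As a sketch of what the CPU and Memory tuning mentioned above looks like, a container's resources stanza pairs requests (which the scheduler reserves, and against which HPA measures CPU utilization) with limits (which cap usage). The values below are illustrative, not recommendations:

```yaml
resources:
  requests:
    cpu: 200m      # reserved for the container; HPA's CPU % is measured relative to this
    memory: 128Mi
  limits:
    cpu: 500m      # CPU usage is throttled above this
    memory: 256Mi  # exceeding this memory limit gets the container OOM-killed
```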
Given these benefits and limitations, Kubernetes HPA is best used to provide automation for predictable variations in demand, but the overall cluster health still needs to be monitored.
While similar to HPA, Kubernetes also provides a mechanism called Vertical Pod Autoscaling (VPA), but administrators would use it to adjust the available CPU and Memory resources to existing Pods in a deployment rather than changing the number of Pods deployed. Upstream documentation recommends that administrators not use HPA and VPA on the same resource metric.
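For comparison, a VPA object targets resource amounts rather than replica counts. The fragment below is an illustrative sketch only; it assumes the separate VerticalPodAutoscaler components and CRDs are installed, which this tutorial does not cover:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  updatePolicy:
    updateMode: "Auto"   # VPA adjusts CPU/memory requests instead of replica counts
```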
Objectives
In this tutorial, you will learn to:
- Install Kubernetes Metric Server
- Use Horizontal Pod Autoscaling to enable deployments to respond to a changing workload
Prerequisites
- Installation of Oracle Cloud Native Environment
- A single control plane node and one worker node
Deploy Oracle Cloud Native Environment
Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.
Open a terminal on the Luna Desktop.
Clone the linux-virt-labs GitHub project.

git clone https://github.com/oracle-devrel/linux-virt-labs.git
Change into the working directory.
cd linux-virt-labs/ocne2
Install the required collections.
ansible-galaxy collection install -r requirements.yml
Deploy the lab environment.
ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e install_ocne_rpm=true -e create_ocne_cluster=true -e "ocne_cluster_node_options='-n 1 -w 1'"
The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.

The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.

Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Access the Kubernetes Cluster
It helps to know the number and names of nodes in your Kubernetes cluster.
Open a terminal and connect via SSH to the ocne instance.
ssh oracle@<ip_address_of_node>
Show the two nodes and verify that they are running.
kubectl get nodes
Install the Metrics Server
HPA leverages the resource usage metrics from the metrics.k8s.io Metrics API, which the Metrics Server usually provides. The Metrics Server records CPU and memory data for the Kubernetes cluster's nodes and Pods, and HPA uses these metrics to decide whether to scale the number of Pods up or down based on the rules you define in the HPA YAML definition file.
HPA currently supports two API versions:
- Version 1 (autoscaling/v1 API) - scales based on the Metrics Server's CPU utilization data.
- Version 2 (autoscaling/v2 API) - adds support for scaling based on memory usage and on custom and external metrics.
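One quick way to confirm which of these autoscaling API versions your cluster serves (the output varies by Kubernetes release):

```shell
kubectl api-versions | grep autoscaling
```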
Check whether the Metrics API is available.
kubectl get apiservices | grep metrics.k8s.io
The lack of output confirms that the Metrics API server is unavailable by default on an Oracle Cloud Native Environment cluster.
Deploy the Metrics Server.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Patch the deployment to trust the self-signed X509 certificates used in the default install.
kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
Confirm the Metrics Server Pod is running.
kubectl get pods -w -A | grep metrics
Wait for the metrics-server Pod to stabilize and report its STATUS as Running. Enter Ctrl-C to exit the watch command.

Confirm the Metrics API is available.
This command is an alternative to using kubectl get apiservices.

kubectl get --raw "/apis/metrics.k8s.io/" | jq
Example Output:
[oracle@ocne ~]$ kubectl get --raw "/apis/metrics.k8s.io/" | jq
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "metrics.k8s.io",
  "versions": [
    {
      "groupVersion": "metrics.k8s.io/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "metrics.k8s.io/v1beta1",
    "version": "v1beta1"
  }
}
Confirm the reporting of metrics.
Show the metrics for the Pods within the cluster.
kubectl top pods -A
Example Output:
[oracle@ocne ~]$ kubectl top pods -A
NAMESPACE      NAME                                           CPU(cores)   MEMORY(bytes)
kube-flannel   kube-flannel-ds-78vdr                          15m          15Mi
kube-flannel   kube-flannel-ds-jwx9h                          16m          16Mi
kube-system    coredns-f7d444b54-7p4j2                        2m           15Mi
kube-system    coredns-f7d444b54-m5pm6                        2m           15Mi
kube-system    etcd-ocne-control-plane-1                      16m          30Mi
kube-system    kube-apiserver-ocne-control-plane-1            44m          193Mi
kube-system    kube-controller-manager-ocne-control-plane-1   15m          56Mi
kube-system    kube-proxy-dlz2l                               1m           18Mi
kube-system    kube-proxy-s89gq                               1m           19Mi
kube-system    kube-scheduler-ocne-control-plane-1            3m           21Mi
kube-system    metrics-server-b79d5c976-vz8p4                 3m           19Mi
ocne-system    ocne-catalog-578c959566-88vff                  1m           5Mi
ocne-system    ui-84dd57ff69-gtrgf                            1m           14Mi
Next, show the metrics for the nodes.
kubectl top nodes
Example Output:
[oracle@ocne ~]$ kubectl top nodes
NAME                   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ocne-control-plane-1   264m         13%    713Mi           20%
ocne-worker-1          34m          1%     373Mi           10%
Deploy an Application
You can deploy any containerized application as long as the YAML manifest file includes a resource limitation or request parameter. In this example, you will deploy a web server and define resource values for both the limits: and requests: fields.
Create the application's deployment definition and service configuration file.
cat << EOF | tee hpa-demo.yaml > /dev/null
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo
spec:
  selector:
    matchLabels:
      run: hpa-demo
  template:
    metadata:
      labels:
        run: hpa-demo
    spec:
      containers:
      - name: hpa-demo
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo
  labels:
    run: hpa-demo
spec:
  ports:
  - port: 80
  selector:
    run: hpa-demo
EOF
Deploy the application and service file.
kubectl apply -f ./hpa-demo.yaml
Confirm the successful creation of the deployment and service.
kubectl get deploy,svc
Example Output:
[oracle@ocne ~]$ kubectl get deploy,svc
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hpa-demo   1/1     1            1           2m58s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/hpa-demo     ClusterIP   10.106.17.104   <none>        80/TCP    2m58s
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   46m
Repeat the command until you see the deployment hpa-demo show 1/1 as READY and 1 AVAILABLE.
Create the HPA
This example tells Kubernetes to scale the targeted deployment between one and five Pods while maintaining an average CPU utilization of 50% across the monitored deployment's Pods.
Create the HPA configuration file.
cat << EOF | tee hpa-test.yaml > /dev/null
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
EOF
Where:

- scaleTargetRef: The deployment monitored by the HPA
- minReplicas: The minimum number of Pods running
- maxReplicas: The maximum number of Pods to scale up to
- targetCPUUtilizationPercentage: The CPU percentage at which the HPA begins to scale up
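Behind the scenes, the HPA controller uses (approximately) the upstream formula desiredReplicas = ceil(currentReplicas x currentUtilization / targetUtilization), then clamps the result between minReplicas and maxReplicas. The shell sketch below illustrates that arithmetic; the function name is ours, not part of Kubernetes.

```shell
#!/bin/sh
# Integer ceiling-division version of the HPA scaling formula:
#   desired = ceil(current_replicas * current_cpu_pct / target_cpu_pct)
desired_replicas() {
  current=$1; current_util=$2; target_util=$3
  echo $(( (current * current_util + target_util - 1) / target_util ))
}

# 2 replicas running at 250% of a 50% target -> the formula yields 10,
# which the controller then clamps to maxReplicas (5 in this tutorial)
desired_replicas 2 250 50
```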
Apply the HPA.
kubectl apply -f hpa-test.yaml -n default
Verify the HPA deployment before increasing the load.
kubectl get hpa -n default -w
Wait for the TARGETS column to show cpu: 0%/50%, and then enter Ctrl-C to exit the command.
Increase the Load
Increase the load on the cluster so you can see how the HPA responds and manages the number of deployed Pods.
Open a new terminal and connect via SSH to the ocne instance.
ssh oracle@<ip_address_of_node>
Increase the CPU load.
kubectl -n default run -i --tty load-generator --rm --image=ghcr.io/hlesey/busybox:latest --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo; done"
The command starts a busybox container that runs an infinite loop querying the hpa-demo service, printing OK! repeatedly. Leave it running for now.
Monitor the HPA Under Load
Switch to the previous terminal window connected to the ocne instance.
Watch the HPA increase the number of replicas to match the increasing load.
kubectl get hpa -w -n default
Enter Ctrl-C to exit the watch command after the HPA scales up to the configured maximum of five Pods.

The -w option watches the kubectl output and prints changes to the terminal as they occur.

Example Output:
[oracle@ocne ~]$ kubectl get hpa -w -n default
NAME       REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hpa-test   Deployment/hpa-demo   cpu: 250%/50%   1         5         2          4h10m
hpa-test   Deployment/hpa-demo   cpu: 169%/50%   1         5         4          4h10m
hpa-test   Deployment/hpa-demo   cpu: 115%/50%   1         5         5          4h10m
hpa-test   Deployment/hpa-demo   cpu: 64%/50%    1         5         5          4h10m
hpa-test   Deployment/hpa-demo   cpu: 70%/50%    1         5         5          4h10m
Confirm the number of Pods scaled as expected.
kubectl get deployment hpa-demo
Check the Pod usage metrics.
kubectl top pods -A | grep hpa-demo
You should see five instances of the Pod running.
Check for any issues.
kubectl get events -n default
Check the default namespace, which is where the autoscaled Pod deploys. This output is useful if you need to troubleshoot any issues.
Review how HPA performed by checking the HPA deployment events.
kubectl describe deploy hpa-demo
Look at the Events: section to see how the cluster scaled up the deployed Pods in response to the increased load.

Example Output:
[oracle@ocne ~]$ kubectl describe deploy hpa-demo
Name:                   hpa-demo
Namespace:              default
CreationTimestamp:      Wed, 04 Dec 2024 14:23:08 +0000
...
NewReplicaSet:   hpa-demo-5b4cb5d744 (5/5 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  15m   deployment-controller  Scaled up replica set hpa-demo-5b4cb5d744 to 1
  Normal  ScalingReplicaSet  12m   deployment-controller  Scaled up replica set hpa-demo-5b4cb5d744 to 3 from 1
  Normal  ScalingReplicaSet  12m   deployment-controller  Scaled up replica set hpa-demo-5b4cb5d744 to 5 from 3
Decrease the Load to Scale Down
Switch to the terminal window connected to the ocne instance running the busybox container.
Stop the CPU load by entering Ctrl-C to exit the container.

Switch to the previous terminal window connected to the ocne instance.
Verify the HPA scales down.
kubectl get hpa -w
Wait for the REPLICAS to reach 1, indicating the scale-down process is complete. Enter Ctrl-C to exit the watch command.

Example Output:
[oracle@ocne ~]$ kubectl get hpa -w
NAME       REFERENCE             TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         5          53m
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         5          56m
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         5          56m
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         1          56m
Note: Kubernetes controls the timing of the scale-down process, and the time it takes to complete can vary.
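Part of the delay comes from the HPA controller's downscale stabilization window, which defaults to 300 seconds. If you need different behavior, the autoscaling/v2 API exposes a behavior stanza; the fragment below is illustrative only and does not apply to this tutorial's autoscaling/v1 manifest:

```yaml
# autoscaling/v2 only; shown for illustration
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # default is 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 30              # remove at most one Pod every 30 seconds
```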
Get a list of scaling events.
kubectl get event | grep -i scale
You can also use the kubectl describe deploy hpa-demo command to review the scaling events.
Next Steps
Configuring your deployments with horizontal pod autoscaling allows the Kubernetes cluster to respond automatically to predictable variations in demand. Continue to expand your knowledge of Kubernetes and Oracle Cloud Native Environment by taking a look at our other tutorials posted to the Oracle Linux Training Station.