Use Horizontal Pod Autoscaling with Oracle Cloud Native Environment
Introduction
The HorizontalPodAutoscaler (HPA) is the Kubernetes object for managing horizontal pod autoscaling. We refer to it as horizontal scaling because it allows Kubernetes to automatically deploy more Pods to match variations in the incoming workload, such as the number of users making requests. Once demand abates, HPA can scale the number of deployed Pods back down again. Kubernetes administrators can apply HPA to workload resources such as a Deployment or a StatefulSet because they include support for the scale subresource, which allows the number of replicas to be altered depending on their current state. Not all Kubernetes objects support the scale subresource; one example is the DaemonSet.
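For illustration, the scale subresource can also be driven manually with kubectl. The commands below are a sketch that assumes a deployment named hpa-demo, like the one created later in this tutorial; the --subresource flag requires a reasonably recent kubectl release.

```shell
# Read the current replica count through the scale subresource
kubectl get deployment hpa-demo --subresource=scale -o yaml

# Write through the same subresource; HPA automates this kind of update
kubectl scale deployment hpa-demo --replicas=3
```

HPA performs essentially this replica update automatically, based on observed metrics.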
Benefits of HPA
Kubernetes Horizontal Pod Autoscaling helps by providing the following:
- Automatically increase the number of Pods deployed to meet sustained changes to incoming workload.
- Contributes to cost savings by automatically reducing the number of Pods as external demand reduces.
- Monitors the Kubernetes cluster's metrics to ensure deployed applications remain available.
- Allows you to configure expected capacity needs for both normal and busy workloads.
Limitations of HPA
Some limitations of the Kubernetes Horizontal Pod Autoscaling include:
- It does not work with a DaemonSet.
- It only works if your cluster has sufficient resources, allowing it to add more Pods.
- HPA cannot scale the Kubernetes cluster nodes if the current cluster runs out of capacity.
- The CPU and Memory limits need to be tuned to prevent Pods from unexpectedly terminating if you set them too low or, conversely, wasting resources that could be used for other deployments if they are set too high.
- It does not consider other potential bottlenecks, such as network I/O, disk I/O, or disk space.
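As a sketch of what the CPU and Memory tuning mentioned above looks like, a container's resources stanza pairs requests (which the scheduler reserves, and against which HPA measures CPU utilization) with limits (which cap usage). The values below are illustrative, not recommendations:

```yaml
resources:
  requests:
    cpu: 200m      # reserved for the container; HPA's CPU % is measured relative to this
    memory: 128Mi
  limits:
    cpu: 500m      # CPU usage is throttled above this
    memory: 256Mi  # exceeding this memory limit gets the container OOM-killed
```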
Given these benefits and limitations, Kubernetes HPA is best used to provide automation for predictable variations in demand, but the overall cluster health still needs to be monitored.
While similar to HPA, Kubernetes also provides a mechanism called Vertical Pod Autoscaling (VPA), but administrators would use it to adjust the available CPU and Memory resources to existing Pods in a deployment rather than changing the number of Pods deployed. Upstream documentation recommends that administrators not use HPA and VPA on the same resource metric.
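For comparison, a VPA object targets resource amounts rather than replica counts. The fragment below is an illustrative sketch only; it assumes the separate VerticalPodAutoscaler components and CRDs are installed, which this tutorial does not cover:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  updatePolicy:
    updateMode: "Auto"   # VPA adjusts CPU/memory requests instead of replica counts
```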
Objectives
In this tutorial, you will learn to:
- Install Kubernetes Metric Server
- Use Horizontal Pod Autoscaling to enable deployments to respond to a changing workload
Prerequisites
- Installation of Oracle Cloud Native Environment
- A single control plane node and one worker node
Deploy Oracle Cloud Native Environment
Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.
Open a terminal on the Luna Desktop.
Clone the linux-virt-labs GitHub project.

git clone https://github.com/oracle-devrel/linux-virt-labs.git
Change into the working directory.
cd linux-virt-labs/ocne2
Install the required collections.
ansible-galaxy collection install -r requirements.yml
Deploy the lab environment.
ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e install_ocne_rpm=true -e create_ocne_cluster=true -e "ocne_cluster_node_options='-n 1 -w 1'"
The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.

The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.

Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Access the Kubernetes Cluster
It helps to know the number and names of nodes in your Kubernetes cluster.
Open a terminal and connect via SSH to the ocne instance.
ssh oracle@<ip_address_of_node>
Show the two nodes and verify that they are running.
kubectl get nodes
Install the Metrics Server
HPA leverages the resource usage metrics from the metrics.k8s.io Metrics API, which the Metrics Server usually provides. The Metrics Server records CPU and memory data for the Kubernetes cluster's nodes and Pods, and HPA uses these metrics to decide whether to scale the number of Pods up or down based on the rules you define in the HPA YAML definition file.
HPA currently supports two API versions:
- Version 1 (autoscaling/v1 API) - scales based on the Metrics Server's CPU utilization data.
- Version 2 (autoscaling/v2 API) - adds support for scaling based on memory usage and on custom and external metrics.
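One quick way to confirm which of these autoscaling API versions your cluster serves (the output varies by Kubernetes release):

```shell
kubectl api-versions | grep autoscaling
```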
Check whether the Metrics API is available.
kubectl get apiservices | grep metrics.k8s.io
The lack of output confirms that the Metrics API server is unavailable by default on an Oracle Cloud Native Environment cluster.
Deploy the Metrics Server.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Patch the deployment to trust the self-signed X509 certificates used in the default install.
kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
Confirm the Metrics Server Pod is running.
kubectl get pods -w -A | grep metrics
Wait for the metrics-server Pod to stabilize and report its STATUS as Running. Enter Ctrl-C to exit the watch command.

Confirm the Metrics API is available.
This command is an alternative to using kubectl get apiservices.

kubectl get --raw "/apis/metrics.k8s.io/" | jq
Example Output:
[oracle@ocne ~]$ kubectl get --raw "/apis/metrics.k8s.io/" | jq
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "metrics.k8s.io",
  "versions": [
    {
      "groupVersion": "metrics.k8s.io/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "metrics.k8s.io/v1beta1",
    "version": "v1beta1"
  }
}
Confirm the reporting of metrics.
Show the metrics for the Pods within the cluster.
kubectl top pods -A
Example Output:
[oracle@ocne ~]$ kubectl top pods -A
NAMESPACE      NAME                                           CPU(cores)   MEMORY(bytes)
kube-flannel   kube-flannel-ds-78vdr                          15m          15Mi
kube-flannel   kube-flannel-ds-jwx9h                          16m          16Mi
kube-system    coredns-f7d444b54-7p4j2                        2m           15Mi
kube-system    coredns-f7d444b54-m5pm6                        2m           15Mi
kube-system    etcd-ocne-control-plane-1                      16m          30Mi
kube-system    kube-apiserver-ocne-control-plane-1            44m          193Mi
kube-system    kube-controller-manager-ocne-control-plane-1   15m          56Mi
kube-system    kube-proxy-dlz2l                               1m           18Mi
kube-system    kube-proxy-s89gq                               1m           19Mi
kube-system    kube-scheduler-ocne-control-plane-1            3m           21Mi
kube-system    metrics-server-b79d5c976-vz8p4                 3m           19Mi
ocne-system    ocne-catalog-578c959566-88vff                  1m           5Mi
ocne-system    ui-84dd57ff69-gtrgf                            1m           14Mi
Next, show the metrics for the nodes.
kubectl top nodes
Example Output:
[oracle@ocne ~]$ kubectl top nodes
NAME                   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ocne-control-plane-1   264m         13%    713Mi           20%
ocne-worker-1          34m          1%     373Mi           10%
Deploy an Application
You can deploy any containerized application as long as the YAML manifest file includes a resource limitation or request parameter. In this example, you will deploy a web server and define resource values for both the limits: and requests: fields.
Create the application's deployment definition and service configuration file.
cat << EOF | tee hpa-demo.yaml > /dev/null
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo
spec:
  selector:
    matchLabels:
      run: hpa-demo
  template:
    metadata:
      labels:
        run: hpa-demo
    spec:
      containers:
      - name: hpa-demo
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo
  labels:
    run: hpa-demo
spec:
  ports:
  - port: 80
  selector:
    run: hpa-demo
EOF
Deploy the application and service file.
kubectl apply -f ./hpa-demo.yaml
Confirm the successful creation of the deployment and service.
kubectl get deploy,svc
Example Output:
[oracle@ocne ~]$ kubectl get deploy,svc
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hpa-demo   1/1     1            1           2m58s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/hpa-demo     ClusterIP   10.106.17.104   <none>        80/TCP    2m58s
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   46m
Repeat the command until you see the deployment hpa-demo show 1/1 as READY and 1 AVAILABLE.
Create the HPA
This example tells Kubernetes to scale the targeted deployment between one and five Pods while maintaining an average CPU utilization of 50% across the monitored deployment's Pods.
Create the HPA configuration file.
cat << EOF | tee hpa-test.yaml > /dev/null
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
EOF
Where:

- scaleTargetRef: The deployment monitored by the HPA
- minReplicas: The minimum number of Pods running
- maxReplicas: The maximum number of Pods to scale up to
- targetCPUUtilizationPercentage: The CPU percentage at which the HPA begins to scale up
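Behind the scenes, the HPA controller uses (approximately) the upstream formula desiredReplicas = ceil(currentReplicas x currentUtilization / targetUtilization), then clamps the result between minReplicas and maxReplicas. The shell sketch below illustrates that arithmetic; the function name is ours, not part of Kubernetes.

```shell
#!/bin/sh
# Integer ceiling-division version of the HPA scaling formula:
#   desired = ceil(current_replicas * current_cpu_pct / target_cpu_pct)
desired_replicas() {
  current=$1; current_util=$2; target_util=$3
  echo $(( (current * current_util + target_util - 1) / target_util ))
}

# 2 replicas running at 250% of a 50% target -> the formula yields 10,
# which the controller then clamps to maxReplicas (5 in this tutorial)
desired_replicas 2 250 50
```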
Apply the HPA.
kubectl apply -f hpa-test.yaml -n default
Verify the HPA deployment before increasing the load.
kubectl get hpa -n default -w
Wait for the TARGETS column to show cpu: 0%/50%, and then enter Ctrl-C to exit the command.
Increase the Load
Increase the load on the cluster so you can see how the HPA responds and manages the number of deployed Pods.
Open a new terminal and connect via SSH to the ocne instance.
ssh oracle@<ip_address_of_node>
Increase the CPU load.
kubectl -n default run -i --tty load-generator --rm --image=ghcr.io/hlesey/busybox:latest --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo; done"
The command starts a busybox container that runs an infinite loop querying the hpa-demo service, printing OK! repeatedly. Leave it running for now.
Monitor the HPA Under Load
Switch to the previous terminal window connected to the ocne instance.
Watch the HPA increase the number of replicas to match the increasing load.
kubectl get hpa -w -n default
Enter Ctrl-C to exit the watch command after the HPA scales up to the configured maximum of five Pods.

The -w option watches the kubectl output and prints changes to the terminal as they occur.

Example Output:
[oracle@ocne ~]$ kubectl get hpa -w -n default
NAME       REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hpa-test   Deployment/hpa-demo   cpu: 250%/50%   1         5         2          4h10m
hpa-test   Deployment/hpa-demo   cpu: 169%/50%   1         5         4          4h10m
hpa-test   Deployment/hpa-demo   cpu: 115%/50%   1         5         5          4h10m
hpa-test   Deployment/hpa-demo   cpu: 64%/50%    1         5         5          4h10m
hpa-test   Deployment/hpa-demo   cpu: 70%/50%    1         5         5          4h10m
Confirm the number of Pods scaled as expected.
kubectl get deployment hpa-demo
Check the Pod usage metrics.
kubectl top pods -A | grep hpa-demo
You should see five instances of the Pod running.
Check for any issues.
kubectl get events -n default
Check the default namespace, which is where the autoscaled Pod deploys. This output is useful if you need to troubleshoot any issues.
Review how HPA performed by checking the HPA deployment events.
kubectl describe deploy hpa-demo
Look at the Events: section to see how the cluster scaled up the deployed Pods in response to the increased load.

Example Output:
[oracle@ocne ~]$ kubectl describe deploy hpa-demo
Name:                   hpa-demo
Namespace:              default
CreationTimestamp:      Wed, 04 Dec 2024 14:23:08 +0000
...
NewReplicaSet:   hpa-demo-5b4cb5d744 (5/5 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  15m   deployment-controller  Scaled up replica set hpa-demo-5b4cb5d744 to 1
  Normal  ScalingReplicaSet  12m   deployment-controller  Scaled up replica set hpa-demo-5b4cb5d744 to 3 from 1
  Normal  ScalingReplicaSet  12m   deployment-controller  Scaled up replica set hpa-demo-5b4cb5d744 to 5 from 3
Decrease the Load to Scale Down
Switch to the terminal window connected to the ocne instance running the busybox container.
Stop the CPU load by entering Ctrl-C to exit the container.

Switch to the previous terminal window connected to the ocne instance.
Verify the HPA scales down.
kubectl get hpa -w
Wait for the REPLICAS to reach 1, indicating the scale-down process is complete. Enter Ctrl-C to exit the watch command.

Example Output:
[oracle@ocne ~]$ kubectl get hpa -w
NAME       REFERENCE             TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         5          53m
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         5          56m
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         5          56m
hpa-test   Deployment/hpa-demo   cpu: 0%/50%   1         5         1          56m
Note: Kubernetes controls the timing of the scale-down process, and the time it takes to complete can vary.
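Part of the delay comes from the HPA controller's downscale stabilization window, which defaults to 300 seconds. If you need different behavior, the autoscaling/v2 API exposes a behavior stanza; the fragment below is illustrative only and does not apply to this tutorial's autoscaling/v1 manifest:

```yaml
# autoscaling/v2 only; shown for illustration
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # default is 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 30              # remove at most one Pod every 30 seconds
```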
Get a list of scaling events.
kubectl get event | grep -i scale
You can also use the kubectl describe deploy hpa-demo command to review the scaling events.
Next Steps
Configuring your deployments with horizontal pod autoscaling allows the Kubernetes cluster to respond automatically to predictable variations in demand. Continue to expand your knowledge of Kubernetes and Oracle Cloud Native Environment by taking a look at our other tutorials posted to the Oracle Linux Training Station.