Run Control Group Version 2 on Oracle Linux
Introduction
Control Group (cgroup) is a Linux kernel feature for limiting, prioritizing, and allocating resources such as CPU time, memory, and network bandwidth for running processes.
This tutorial guides you through limiting the CPU time for user processes using cgroup v2.
Objectives
In this tutorial, you will learn how to:
- Enable control group version 2
- Set a soft CPU limit for a user process
- Set a hard CPU limit for a user process
Prerequisites
Minimum of a single Oracle Linux system
Each system should have Oracle Linux installed and configured with:
- A non-root user account with sudo access
- Access to the Internet
Deploy Oracle Linux
Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.
Open a terminal on the Luna Desktop.
Clone the linux-virt-labs GitHub project.
git clone https://github.com/oracle-devrel/linux-virt-labs.git
Change into the working directory.
cd linux-virt-labs/ol
Install the required collections.
ansible-galaxy collection install -r requirements.yml
Deploy the lab environment.
ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6"
The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, located under the python3.6 modules.
The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.
Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Linux is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
Create a Load-generating Script
Open a terminal and connect via SSH to the ol-node-01 instance.
ssh oracle@<ip_address_of_instance>
Create the foo.exe script.
echo '#!/bin/bash
/usr/bin/sha1sum /dev/zero' > foo.exe
Copy the foo.exe script to a location in your $PATH and set the proper permissions.
sudo mv foo.exe /usr/local/bin/foo.exe
sudo chown root:root /usr/local/bin/foo.exe
sudo chmod 755 /usr/local/bin/foo.exe
Fix the SELinux labels after copying and changing permissions on the foo.exe script.
sudo /sbin/restorecon -v /usr/local/bin/foo.exe
Note: Oracle Linux runs with SELinux set to enforcing mode by default. You can verify this by running sudo sestatus.
Create a Load-generating Service
Create the foo.service file.
echo '[Unit]
Description=the foo service
After=network.target

[Service]
ExecStart=/usr/local/bin/foo.exe

[Install]
WantedBy=multi-user.target' > foo.service
Copy the foo.service file to the default systemd scripts directory and set the proper permissions.
sudo mv foo.service /etc/systemd/system/foo.service
sudo chown root:root /etc/systemd/system/foo.service
sudo chmod 644 /etc/systemd/system/foo.service
Fix the SELinux labels.
sudo /sbin/restorecon -v /etc/systemd/system/foo.service
Reload the systemd daemon so it recognizes the new service.
sudo systemctl daemon-reload
Start foo.service and check its status.
sudo systemctl start foo.service
sudo systemctl status foo.service
Create Users
Creating additional users lets you run the load-generating script under different accounts with different CPU weights.
Create users and set passwords.
sudo useradd -u 8000 ralph
sudo useradd -u 8001 alice
echo "ralph:oracle" | sudo chpasswd
echo "alice:oracle" | sudo chpasswd
Allow SSH connections.
Copy the SSH key from the oracle user account for the ralph user.
sudo mkdir /home/ralph/.ssh
sudo cp /home/oracle/.ssh/authorized_keys /home/ralph/.ssh/authorized_keys
sudo chown -R ralph:ralph /home/ralph/.ssh
sudo chmod 700 /home/ralph/.ssh
sudo chmod 600 /home/ralph/.ssh/authorized_keys
Repeat for the alice user.
sudo mkdir /home/alice/.ssh
sudo cp /home/oracle/.ssh/authorized_keys /home/alice/.ssh/authorized_keys
sudo chown -R alice:alice /home/alice/.ssh
sudo chmod 700 /home/alice/.ssh
sudo chmod 600 /home/alice/.ssh/authorized_keys
Open a new terminal and verify both SSH connections work.
ssh -l ralph -o StrictHostKeyChecking=accept-new <ip_address_of_instance> true
The -o StrictHostKeyChecking=accept-new option automatically accepts previously unseen keys but refuses connections for changed or invalid host keys. This option is a safer subset of the behavior of StrictHostKeyChecking=no. The true command runs on the remote host and always returns a value of 0, which indicates that the SSH connection was successful. If there are no errors, the terminal returns to the command prompt after running the SSH command.
Repeat for the other user.
ssh -l alice -o StrictHostKeyChecking=accept-new <ip_address_of_instance> true
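If you want to confirm that a connection actually succeeded, an optional check (not part of the lab steps) is to print the exit status of the SSH command; 0 means the remote true command ran.
ssh -l alice -o StrictHostKeyChecking=accept-new <ip_address_of_instance> true
echo $?
# 0 indicates success; any non-zero value means the connection or the remote command failed.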
Exit the current terminal and switch to the other existing terminal connected to ol-node-01.
Enable Control Group Version 2
Note: Oracle Linux 9 and later ship with cgroup v2 enabled by default.
For Oracle Linux 8, you must manually configure the boot kernel parameters to enable cgroup v2 as it mounts cgroup v1 by default.
If you are not using Oracle Linux 8, skip to the next section.
Update grub with the cgroup v2 systemd kernel parameter.
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
You can instead specify only your current boot entry by running sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=1".
Confirm the changes.
cat /etc/default/grub |grep systemd.unified_cgroup_hierarchy
Reboot the instance for the changes to take effect.
sudo systemctl reboot
Note: Wait a few minutes for the instance to restart.
Reconnect to the ol-node-01 instance using SSH.
Verify that Cgroup v2 is Enabled
Check the cgroup controller list.
cat /sys/fs/cgroup/cgroup.controllers
The output should return similar results:
cpuset cpu io memory hugetlb pids rdma
Check the cgroup2 mounted file system.
mount |grep cgroup2
The output should return similar results:
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
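Another quick, optional check is to ask for the file system type at the cgroup mount point. On a cgroup v2 system this prints cgroup2fs, while a v1 layout typically reports tmpfs.
# Print the file system type of the cgroup mount point.
stat -fc %T /sys/fs/cgroup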
Inspect the contents of the cgroup mounted directory.
ll /sys/fs/cgroup
Example output:
total 0
-r--r--r--.  1 root root 0 Mar 13 21:20 cgroup.controllers
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.max.depth
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.max.descendants
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.procs
-r--r--r--.  1 root root 0 Mar 13 21:20 cgroup.stat
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.subtree_control
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.threads
-rw-r--r--.  1 root root 0 Mar 13 21:20 cpu.pressure
-r--r--r--.  1 root root 0 Mar 13 21:20 cpuset.cpus.effective
-r--r--r--.  1 root root 0 Mar 13 21:20 cpuset.mems.effective
drwxr-xr-x.  2 root root 0 Mar 13 21:20 init.scope
-rw-r--r--.  1 root root 0 Mar 13 21:20 io.pressure
-rw-r--r--.  1 root root 0 Mar 13 21:20 memory.pressure
drwxr-xr-x. 87 root root 0 Mar 13 21:20 system.slice
drwxr-xr-x.  4 root root 0 Mar 13 21:24 user.slice
The output shows the root control group at its default location. The directory contains interface files, most prefixed with cgroup, and directories related to systemd that end in .scope and .slice.
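If you want to separate the two kinds of entries yourself, a simple filtered listing works (an optional aside, using standard shell globbing):
# Kernel interface files at the cgroup root (the cgroup.* core files).
ls /sys/fs/cgroup/cgroup.*
# Child cgroups managed by systemd (.slice and .scope directories).
ls -d /sys/fs/cgroup/*.slice /sys/fs/cgroup/*.scope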
Work with the Virtual File System
Before we get started, we need to learn a bit about the cgroup virtual file system mounted at /sys/fs/cgroup.
Show which CPUs participate in the cpuset for everyone.
cat /sys/fs/cgroup/cpuset.cpus.effective
The output shows a range starting at 0 that indicates the system's effective CPUs, which consist of a combination of CPU cores and threads.
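As an optional cross-check, the range should agree with the CPU count reported by nproc; for example, 0-1 corresponds to two CPUs.
# Number of CPUs available to the scheduler.
nproc
# The same CPUs as seen by the root cgroup.
cat /sys/fs/cgroup/cpuset.cpus.effective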
Show which controllers are active.
cat /sys/fs/cgroup/cgroup.controllers
Example output:
cpuset cpu io memory hugetlb pids rdma misc
It's good to see the cpuset controller present as we'll use it later in this tutorial.
Show processes spawned by oracle.
First, we need to determine oracle's user ID (UID).
who
id
Example output:
[oracle@ol-node-01 ~]$ who
oracle   pts/0        2022-03-13 21:23 (10.39.209.157)
[oracle@ol-node-01 ~]$ id
uid=1001(oracle) gid=1001(oracle) groups=1001(oracle),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
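If you only need the numeric UID for the next step, id can print it on its own:
# Print just the UID of the current user (1001 for oracle in this lab).
id -u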
Using the UID, we can find the oracle user's slice.
cd /sys/fs/cgroup/user.slice
ls
Example output:
[oracle@ol-node-01 ~]$ cd /sys/fs/cgroup/user.slice
[oracle@ol-node-01 user.slice]$ ls
cgroup.controllers      cgroup.subtree_control  memory.events        memory.pressure      pids.max
cgroup.events           cgroup.threads          memory.events.local  memory.stat          user-0.slice
cgroup.freeze           cgroup.type             memory.high          memory.swap.current  user-1001.slice
cgroup.max.depth        cpu.pressure            memory.low           memory.swap.events   user-989.slice
cgroup.max.descendants  cpu.stat                memory.max           memory.swap.max
cgroup.procs            io.pressure             memory.min           pids.current
cgroup.stat             memory.current          memory.oom.group     pids.events
Systemd assigns every user a slice named user-<UID>.slice. So, what's under that directory?
cd user-1001.slice
ls
Example output:
[oracle@ol-node-01 user.slice]$ cd user-1001.slice/
[oracle@ol-node-01 user-1001.slice]$ ls
cgroup.controllers  cgroup.max.descendants  cgroup.threads  io.pressure      user-runtime-dir@1001.service
cgroup.events       cgroup.procs            cgroup.type     memory.pressure
cgroup.freeze       cgroup.stat             cpu.pressure    session-3.scope
cgroup.max.depth    cgroup.subtree_control  cpu.stat        user@1001.service
This is the top-level cgroup for the oracle user. However, there are no processes listed in cgroup.procs. So, where is the list of user processes?
cat cgroup.procs
Example output:
[oracle@ol-node-01 user-1001.slice]$ cat cgroup.procs
[oracle@ol-node-01 user-1001.slice]$
When oracle opened the SSH session at the beginning of this tutorial, the user session created a scope sub-unit. Under this scope, we can check the cgroup.procs for a list of processes spawned under that session.
Note: The user might have multiple sessions based on the number of connections to the system; therefore, replace the 3 in the sample below as necessary.
cd session-3.scope
ls
cat cgroup.procs
Example output:
[oracle@ol-node-01 user-1001.slice]$ cd session-3.scope/
[oracle@ol-node-01 session-3.scope]$ ls
cgroup.controllers  cgroup.max.depth        cgroup.stat             cgroup.type   io.pressure
cgroup.events       cgroup.max.descendants  cgroup.subtree_control  cpu.pressure  memory.pressure
cgroup.freeze       cgroup.procs            cgroup.threads          cpu.stat
[oracle@ol-node-01 session-3.scope]$ cat cgroup.procs
3189
3200
3201
54217
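Rather than guessing the session number, you can also ask the kernel directly which cgroup the current shell belongs to; the path it prints includes the session scope. This is an optional shortcut, not part of the original steps.
# Prints something like 0::/user.slice/user-1001.slice/session-3.scope
cat /proc/self/cgroup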
Now that we have found the processes the hard way, we can use systemd-cgls to show the same information in a tree-like view.
Note: When run from within the virtual file system, systemd-cgls limits the cgroup output to the current working directory.
cd /sys/fs/cgroup/user.slice/user-1001.slice
systemd-cgls
Example output:
[oracle@ol-node-01 user-1001.slice]$ systemd-cgls
Working directory /sys/fs/cgroup/user.slice/user-1001.slice:
├─session-3.scope
│ ├─ 3189 sshd: oracle [priv]
│ ├─ 3200 sshd: oracle@pts/0
│ ├─ 3201 -bash
│ ├─55486 systemd-cgls
│ └─55487 less
└─user@1001.service
  └─init.scope
    ├─3193 /usr/lib/systemd/systemd --user
    └─3195 (sd-pam)
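As an optional variation, systemd-cgls also accepts a control group path as an argument, so you can inspect a slice without changing directories first (exact path handling may vary slightly between systemd versions):
# Show only the subtree of the oracle user's slice.
systemd-cgls /user.slice/user-1001.slice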
Limit the CPU Cores Used
With cgroup v2, systemd has complete control of the cpuset controller. This level of control enables an administrator to schedule work on only a specific CPU core.
Check CPUs for user.slice.
cd /sys/fs/cgroup/user.slice
ls
cat ../cpuset.cpus.effective
Example output:
[oracle@ol-node-01 cgroup]$ cd /sys/fs/cgroup/user.slice/
[oracle@ol-node-01 user.slice]$ ls
cgroup.controllers      cgroup.subtree_control  memory.events        memory.pressure      pids.max
cgroup.events           cgroup.threads          memory.events.local  memory.stat          user-0.slice
cgroup.freeze           cgroup.type             memory.high          memory.swap.current  user-1001.slice
cgroup.max.depth        cpu.pressure            memory.low           memory.swap.events   user-989.slice
cgroup.max.descendants  cpu.stat                memory.max           memory.swap.max
cgroup.procs            io.pressure             memory.min           pids.current
cgroup.stat             memory.current          memory.oom.group     pids.events
[oracle@ol-node-01 user.slice]$ cat ../cpuset.cpus.effective
0-1
The cpuset.cpus.effective shows the actual cores used by the user.slice. If a parameter does not exist in the specific cgroup directory, or we don't set it, the value gets inherited from the parent, which in this case happens to be the top-level cgroup root directory.
Restrict the system slice and the user 0, 1001, and 989 slices to CPU core 0.
cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
sudo systemctl set-property system.slice AllowedCPUs=0
cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
Example output:
[oracle@ol-node-01 user.slice]$ cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
cat: /sys/fs/cgroup/system.slice/cpuset.cpus.effective: No such file or directory
[oracle@ol-node-01 user.slice]$ sudo systemctl set-property system.slice AllowedCPUs=0
[oracle@ol-node-01 user.slice]$ cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
0
Note: The No such file or directory indicates that by default, the system slice inherits its cpuset.cpus.effective value from the parent.
sudo systemctl set-property user-0.slice AllowedCPUs=0
sudo systemctl set-property user-1001.slice AllowedCPUs=0
sudo systemctl set-property user-989.slice AllowedCPUs=0
Restrict the ralph user to CPU core 1.
sudo systemctl set-property user-8000.slice AllowedCPUs=1
cat /sys/fs/cgroup/user.slice/user-8000.slice/cpuset.cpus.effective
Example output:
[oracle@ol-node-01 ~]$ sudo systemctl set-property user-8000.slice AllowedCPUs=1
[oracle@ol-node-01 ~]$ cat /sys/fs/cgroup/user.slice/user-8000.slice/cpuset.cpus.effective
1
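You can also confirm the setting from systemd's point of view rather than through the virtual file system (an optional check, assuming your systemd version exposes the property through systemctl show):
# Expected to print something like: AllowedCPUs=1
systemctl show -p AllowedCPUs user-8000.slice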
Open a new terminal and connect via SSH as ralph to the ol-node-01 system.
ssh ralph@<ip_address_of_instance>
Test using the foo.exe script.
foo.exe &
Verify the results.
top
Once top is running, hit the 1 key to show the CPUs individually.
Example output:
top - 18:23:55 up 21:03,  2 users,  load average: 1.03, 1.07, 1.02
Tasks: 155 total,   2 running, 153 sleeping,   0 stopped,   0 zombie
%Cpu0  :  6.6 us,  7.0 sy,  0.0 ni, 84.8 id,  0.0 wa,  0.3 hi,  0.3 si,  1.0 st
%Cpu1  : 93.0 us,  6.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  0.0 st
MiB Mem :  14707.8 total,  13649.1 free,    412.1 used,    646.6 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13993.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 226888 ralph     20   0  228492   1808   1520 R  99.7   0.0 199:34.27 sha1sum
 269233 root      20   0  223724   6388   1952 S   1.3   0.0   0:00.04 pidstat
   1407 root      20   0  439016  41116  39196 S   0.3   0.3   0:17.81 sssd_nss
   1935 root      20   0  236032   3656   3156 S   0.3   0.0   0:34.34 OSWatcher
   2544 root      20   0  401900  40292   9736 S   0.3   0.3   0:10.62 ruby
      1 root      20   0  388548  14716   9508 S   0.0   0.1   0:21.21 systemd
...
Type q to quit top.
An alternate way to check the processor running a process is with ps.
ps -eo pid,psr,user,cmd | grep ralph
Example output:
[ralph@ol-node-01 ~]$ ps -eo pid,psr,user,cmd | grep ralph
 226715   1 root     sshd: ralph [priv]
 226719   1 ralph    /usr/lib/systemd/systemd --user
 226722   1 ralph    (sd-pam)
 226727   1 ralph    sshd: ralph@pts/2
 226728   1 ralph    -bash
 226887   1 ralph    /bin/bash /usr/local/bin/foo.exe
 226888   1 ralph    /usr/bin/sha1sum /dev/zero
 269732   1 ralph    ps -eo pid,psr,user,cmd
 269733   1 ralph    grep --color=auto ralph
The psr column is the CPU number of the cmd or actual process.
Exit and close the current terminal and switch to the other existing terminal connected to ol-node-01.
Kill the foo.exe job.
sudo pkill sha1sum
Adjust the CPU Weight for Users
Time to have alice join in the fun. She has some critical work to complete, so we'll give her twice the normal priority on the CPU.
Assign alice to the same CPU as ralph.
sudo systemctl set-property user-8001.slice AllowedCPUs=1
cat /sys/fs/cgroup/user.slice/user-8001.slice/cpuset.cpus.effective
Set CPUWeight.
sudo systemctl set-property user-8001.slice CPUWeight=200
cat /sys/fs/cgroup/user.slice/user-8001.slice/cpu.weight
The default weight is 100, so 200 is twice that number.
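To see why that gives alice roughly two-thirds of the shared core, remember that weights are relative: each contender receives its weight divided by the sum of the competing weights. A quick back-of-the-envelope check:
# Expected split of CPU 1 when alice (weight 200) and ralph (weight 100) both run foo.exe:
echo "alice: $(( 200 * 100 / (200 + 100) ))%   ralph: $(( 100 * 100 / (200 + 100) ))%"
# alice: 66%   ralph: 33%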
Open a new terminal and connect via SSH as ralph to the ol-node-01 system.
ssh ralph@<ip_address_of_instance>
Run foo.exe as ralph.
foo.exe &
Open another new terminal and connect via SSH as alice to the ol-node-01 system.
ssh alice@<ip_address_of_instance>
Run foo.exe as alice.
foo.exe &
Verify via top that alice is getting the higher priority.
top
Once top is running, hit the 1 key to show the CPUs individually.
Example output:
top - 20:10:55 up 25 min,  3 users,  load average: 1.29, 0.46, 0.20
Tasks: 164 total,   3 running, 161 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  3.2 si,  0.3 st
%Cpu1  : 92.4 us,  7.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15715.8 total,  14744.6 free,    438.5 used,    532.7 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  15001.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   7934 alice     20   0   15800   1768   1476 R  67.0   0.0   0:36.15 sha1sum
   7814 ralph     20   0   15800   1880   1592 R  33.3   0.0   0:34.60 sha1sum
      1 root      20   0  388476  14440   9296 S   0.0   0.1   0:02.22 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
...
Switch to the terminal logged in as the oracle user.
Load the system.slice using the foo.service.
sudo systemctl start foo.service
Look now at the top output, which is still running in the alice terminal window. See that the foo.service consumes CPU 0 while the users split CPU 1 based on their weights.
Example output:
top - 19:18:15 up 21:57,  3 users,  load average: 2.15, 2.32, 2.25
Tasks: 159 total,   4 running, 155 sleeping,   0 stopped,   0 zombie
%Cpu0  : 89.1 us,  7.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  2.6 st
%Cpu1  : 93.7 us,  5.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  0.0 st
MiB Mem :  14707.8 total,  13640.1 free,    420.5 used,    647.2 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13984.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 280921 root      20   0  228492   1776   1488 R  93.4   0.0   0:07.74 sha1sum
 279185 alice     20   0  228492   1816   1524 R  65.6   0.0   7:35.18 sha1sum
 279291 ralph     20   0  228492   1840   1552 R  32.8   0.0   7:00.30 sha1sum
   2026 oracle-+  20   0  935920  29280  15008 S   0.3   0.2   1:03.31 gomon
      1 root      20   0  388548  14716   9508 S   0.0   0.1   0:22.30 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.10 kthreadd
...
Assign a CPU Quota
Lastly, we will cap the CPU time for ralph.
Return to the terminal logged in as the oracle user.
Set the quota to 5%.
sudo systemctl set-property user-8000.slice CPUQuota=5%
The change takes effect immediately, as seen in the top output, which still runs in the alice user terminal.
Example output:
top - 19:24:53 up 22:04,  3 users,  load average: 2.21, 2.61, 2.45
Tasks: 162 total,   4 running, 158 sleeping,   0 stopped,   0 zombie
%Cpu0  : 93.0 us,  4.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.0 si,  1.7 st
%Cpu1  : 91.7 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.0 hi,  1.0 si,  0.7 st
MiB Mem :  14707.8 total,  13639.4 free,    420.0 used,    648.4 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13984.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 280921 root      20   0  228492   1776   1488 R  97.4   0.0   6:26.75 sha1sum
 279185 alice     20   0  228492   1816   1524 R  92.1   0.0  12:21.12 sha1sum
 279291 ralph     20   0  228492   1840   1552 R   5.3   0.0   8:44.84 sha1sum
      1 root      20   0  388548  14716   9508 S   0.0   0.1   0:22.48 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.10 kthreadd
...
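Under the hood, systemd translates CPUQuota into the cgroup v2 cpu.max interface file: 5% of the default 100000-microsecond period is 5000 microseconds. You can confirm what was written from the oracle terminal (the exact numbers assume the default period):
cat /sys/fs/cgroup/user.slice/user-8000.slice/cpu.max
# Expected output with the default period:
# 5000 100000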
Revert the cap on the ralph user using the oracle terminal window, which should still be in the /sys/fs/cgroup/user.slice directory.
echo "max 100000" | sudo tee -a user-8000.slice/cpu.max
The quota gets written to the cpu.max file, and the defaults are max 100000.
Example output:
[oracle@ol-node-01 user.slice]$ echo "max 100000" | sudo tee -a user-8000.slice/cpu.max
max 100000
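As an optional alternative, you can reset the quota through systemd itself rather than writing to cpu.max directly (assuming your systemd version supports clearing a property with an empty assignment):
sudo systemctl set-property user-8000.slice CPUQuota=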
You now know how to enable cgroup v2, limit users to a specific CPU when the system is under load, and lock them to using only a percentage of that CPU.
Next Steps
Thank you for completing this tutorial. Hopefully, these steps have given you a better understanding of installing, configuring, and using control group version 2 on Oracle Linux.