Run Control Groups Version 2 on Oracle Linux
Introduction
Control Groups (cgroups) is a Linux kernel feature for limiting, prioritizing, and allocating resources such as CPU time, memory, and network bandwidth for running processes.
This tutorial guides you through limiting the CPU time for user processes using cgroups v2.
Objectives
In this lab, you'll learn to:
- Enable cgroups v2
- Set a soft CPU limit for a user process
- Set a hard CPU limit for a user process
Prerequisites
- A system with Oracle Linux 8 installed with the following configuration:
  - a non-root user with sudo permissions
Setup Lab Environment
Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.
Before getting started with the lab, we need to complete a few housekeeping items. The items created are used to demonstrate the limiting capabilities of cgroups.
Create load-generating script
If not already connected, open a terminal and connect via ssh to the ol-server system.
ssh oracle@<ip_address_of_ol-server>
Create the foo.exe script.

echo '#!/bin/bash
/usr/bin/sha1sum /dev/zero' > foo.exe
Copy the foo.exe script to a location in your $PATH and set the proper permissions.

sudo mv foo.exe /usr/local/bin/foo.exe
sudo chown root:root /usr/local/bin/foo.exe
sudo chmod 755 /usr/local/bin/foo.exe
Note: (Optional) Check whether SELinux is running in enforcing mode:

sudo sestatus

If it is, fix the SELinux labels after copying and changing permissions by running the following command:

sudo /sbin/restorecon -v /usr/local/bin/foo.exe
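Before relying on the service, you can sanity-check that the installed script really is a two-line bash script (a shebang line followed by the sha1sum command). A minimal sketch using a scratch copy so nothing under /usr/local/bin is touched (the /tmp path is only for illustration):

```shell
# Recreate the two-line load generator at a scratch location and confirm
# it has a proper shebang line followed by the sha1sum command.
printf '#!/bin/bash\n/usr/bin/sha1sum /dev/zero\n' > /tmp/foo-check.exe
chmod 755 /tmp/foo-check.exe
head -n 1 /tmp/foo-check.exe   # prints #!/bin/bash
wc -l < /tmp/foo-check.exe     # prints 2
```

If the shebang and the command share a line, the script will not behave as a load generator.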
Create load-generating service
Create the foo.service file.

echo '[Unit]
Description=the foo service
After=network.target

[Service]
ExecStart=/usr/local/bin/foo.exe

[Install]
WantedBy=multi-user.target' > foo.service
Copy the foo.service file to where systemd unit files are located and set the proper permissions.

sudo mv foo.service /etc/systemd/system/foo.service
sudo chown root:root /etc/systemd/system/foo.service
sudo chmod 644 /etc/systemd/system/foo.service
Note: (Optional) If running with SELinux enforcing, fix the SELinux labels after copying and changing permissions by running the following command:

sudo /sbin/restorecon -v /etc/systemd/system/foo.service
Reload the daemon, so systemd recognizes the new service.
sudo systemctl daemon-reload
Start foo.service and check its status.

sudo systemctl start foo.service
sudo systemctl status foo.service
Create users
Additional user accounts allow running the load-generating script under different users, each with a different CPU weight.
Create users and set passwords.
sudo useradd -u 8000 ralph
sudo useradd -u 8001 alice
echo "ralph:oracle" | sudo chpasswd
echo "alice:oracle" | sudo chpasswd
Allow SSH connections by copying the SSH key from the oracle user account.

sudo mkdir /home/ralph/.ssh
sudo cp /home/oracle/.ssh/authorized_keys /home/ralph/.ssh/authorized_keys
sudo chown -R ralph:ralph /home/ralph/.ssh
sudo chmod 700 /home/ralph/.ssh
sudo chmod 600 /home/ralph/.ssh/authorized_keys
Repeat for the alice user.

sudo mkdir /home/alice/.ssh
sudo cp /home/oracle/.ssh/authorized_keys /home/alice/.ssh/authorized_keys
sudo chown -R alice:alice /home/alice/.ssh
sudo chmod 700 /home/alice/.ssh
sudo chmod 600 /home/alice/.ssh/authorized_keys
Open a new terminal and verify both SSH connections work.
ssh ralph@<ip_address_of_ol-server>
Then exit the session and repeat for the alice user.

ssh alice@<ip_address_of_ol-server>
Exit the session, and close the terminal window.
Mount cgroups v2
Oracle Linux mounts cgroups v1 by default at boot time. To use cgroups v2, you must manually configure the boot kernel parameters.
Return to the terminal where you are logged in as oracle.

Add the cgroups v2 systemd kernel parameter.

sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
You can instead specify only your current boot entry by running:

sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=1"

Reboot.

sudo reboot
The reboot will take a few minutes to complete.
Note: You will not be able to ssh into the system until the reboot completes and the sshd daemon is running.
Connect again via ssh to the ol-server system.
ssh oracle@<ip_address_of_ol-server>
Verify cgroups v2 was mounted.
sudo mount -l | grep cgroup
Example output:
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
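The same check can be scripted: a cgroups v2 hierarchy root exposes a cgroup.controllers file, while v1 roots do not. A hedged sketch (the helper name is invented, and it is demonstrated against a mock directory rather than the live system):

```shell
# Hypothetical helper: report 2 if the given directory looks like a
# cgroups v2 hierarchy root (contains cgroup.controllers), else 1.
cgroup_version() {
  if [ -f "$1/cgroup.controllers" ]; then echo 2; else echo 1; fi
}

# Demonstrate against a mock hierarchy root:
mkdir -p /tmp/mock-cgroup-root
touch /tmp/mock-cgroup-root/cgroup.controllers
cgroup_version /tmp/mock-cgroup-root   # prints 2
```

On the lab system you would call it as `cgroup_version /sys/fs/cgroup`.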
Inspect the contents of the cgroups mounted directory.
ll /sys/fs/cgroup
Example output:
total 0
-r--r--r--.  1 root root 0 Mar 13 21:20 cgroup.controllers
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.max.depth
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.max.descendants
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.procs
-r--r--r--.  1 root root 0 Mar 13 21:20 cgroup.stat
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.subtree_control
-rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.threads
-rw-r--r--.  1 root root 0 Mar 13 21:20 cpu.pressure
-r--r--r--.  1 root root 0 Mar 13 21:20 cpuset.cpus.effective
-r--r--r--.  1 root root 0 Mar 13 21:20 cpuset.mems.effective
drwxr-xr-x.  2 root root 0 Mar 13 21:20 init.scope
-rw-r--r--.  1 root root 0 Mar 13 21:20 io.pressure
-rw-r--r--.  1 root root 0 Mar 13 21:20 memory.pressure
drwxr-xr-x. 87 root root 0 Mar 13 21:20 system.slice
drwxr-xr-x.  4 root root 0 Mar 13 21:24 user.slice
The output shows the root control group at its default location. The directory contains interface files, all prefixed with cgroup, and directories related to systemd that end in .scope and .slice.
Work with the Virtual File System
Before we get started, we need to learn a bit about the cgroups virtual file system mounted at /sys/fs/cgroup.
Show which CPUs participate in the cpuset for the entire system.
cat /sys/fs/cgroup/cpuset.cpus.effective
Example output:
[oracle@ol-server ~]$ cat /sys/fs/cgroup/cpuset.cpus.effective
0-1
Our test box was an Oracle Linux 8 instance deployed on a VM.Standard2.1 shape, which is a dual-core system.
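The 0-1 value is a cpuset list: comma-separated entries, each either a single CPU or an inclusive range. A small sketch of expanding such a string into individual CPU numbers (the function name is invented for illustration):

```shell
# Expand a cpuset list string such as "0-1" or "0,2-3" into one CPU per line.
expand_cpuset() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    seq "$lo" "${hi:-$lo}"   # a bare "N" entry has no hi, so use lo
  done
}
expand_cpuset "0-1"    # prints 0 and 1 on separate lines
```

So the dual-core test box reports two usable CPUs, 0 and 1.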
Show which controllers are active.
cat /sys/fs/cgroup/cgroup.controllers
Example output:
[oracle@ol-server ~]$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory pids rdma
It's good to see the cpuset controller present as we'll use it later in this lab.
Show processes spawned by oracle.

First, we need to determine oracle's user ID (UID).

who
id
Example output:
[oracle@ol-server ~]$ who
oracle   pts/0   2022-03-13 21:23 (10.39.209.157)
[oracle@ol-server ~]$ id
uid=1001(oracle) gid=1001(oracle) groups=1001(oracle),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Using the UID, we can find the oracle user's slice.

cd /sys/fs/cgroup/user.slice
ls
Example output:
[oracle@ol-server ~]$ cd /sys/fs/cgroup/user.slice
[oracle@ol-server user.slice]$ ls
cgroup.controllers      cgroup.subtree_control  memory.events        memory.pressure      pids.max
cgroup.events           cgroup.threads          memory.events.local  memory.stat          user-0.slice
cgroup.freeze           cgroup.type             memory.high          memory.swap.current  user-1001.slice
cgroup.max.depth        cpu.pressure            memory.low           memory.swap.events   user-989.slice
cgroup.max.descendants  cpu.stat                memory.max           memory.swap.max
cgroup.procs            io.pressure             memory.min           pids.current
cgroup.stat             memory.current          memory.oom.group     pids.events
Systemd assigns every user a slice named user-<UID>.slice. So what's under that directory?

cd user-1001.slice
ls
Example output:
[oracle@ol-server user.slice]$ cd user-1001.slice/
[oracle@ol-server user-1001.slice]$ ls
cgroup.controllers  cgroup.max.descendants  cgroup.threads  io.pressure      user-runtime-dir@1001.service
cgroup.events       cgroup.procs            cgroup.type     memory.pressure
cgroup.freeze       cgroup.stat             cpu.pressure    session-3.scope
cgroup.max.depth    cgroup.subtree_control  cpu.stat        user@1001.service
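The slice directory name follows mechanically from the UID. A throwaway sketch of that convention (the helper name is invented for illustration):

```shell
# systemd places each logged-in user's cgroup under
# user.slice/user-<UID>.slice; build that path from a UID.
user_slice_path() {
  echo "/sys/fs/cgroup/user.slice/user-$1.slice"
}
user_slice_path 1001   # prints /sys/fs/cgroup/user.slice/user-1001.slice
```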
These are the top-level cgroups for the oracle user. However, there are no processes listed in cgroup.procs. So, where is the list of user processes?

cat cgroup.procs
Example output:
[oracle@ol-server user-1001.slice]$ cat cgroup.procs
[oracle@ol-server user-1001.slice]$
When oracle opened the SSH session at the beginning of this lab, the user session created a scope sub-unit. Under this scope, we can check cgroup.procs for a list of processes spawned under that session.

Note: The user might have multiple sessions based on the number of connections to the system; therefore, replace the 3 in the sample below as necessary.

cd session-3.scope
ls
cat cgroup.procs
Example output:
[oracle@ol-server user-1001.slice]$ cd session-3.scope/
[oracle@ol-server session-3.scope]$ ls
cgroup.controllers  cgroup.max.depth        cgroup.stat             cgroup.type   io.pressure
cgroup.events       cgroup.max.descendants  cgroup.subtree_control  cpu.pressure  memory.pressure
cgroup.freeze       cgroup.procs            cgroup.threads          cpu.stat
[oracle@ol-server session-3.scope]$ cat cgroup.procs
3189
3200
3201
54217
Now that we found the processes the hard way, we can use systemd-cgls to show the same information in a tree-like view.

Note: When run from within the virtual file system, systemd-cgls limits the cgroup output to the current working directory.

cd /sys/fs/cgroup/user.slice/user-1001.slice
systemd-cgls
Example output:
[oracle@ol-server user-1001.slice]$ systemd-cgls
Working directory /sys/fs/cgroup/user.slice/user-1001.slice:
├─session-3.scope
│ ├─ 3189 sshd: oracle [priv]
│ ├─ 3200 sshd: oracle@pts/0
│ ├─ 3201 -bash
│ ├─55486 systemd-cgls
│ └─55487 less
└─user@1001.service
  └─init.scope
    ├─3193 /usr/lib/systemd/systemd --user
    └─3195 (sd-pam)
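Each process also records its own cgroup membership in /proc/<pid>/cgroup; under cgroups v2 that file contains a single line of the form 0::<path>. A minimal parsing sketch against a sample line (the line itself is illustrative, shaped like what a session process in this lab would report):

```shell
# Under cgroups v2, /proc/<pid>/cgroup holds one line: "0::<cgroup path>".
# Strip the "0::" prefix to recover the path within the hierarchy.
line='0::/user.slice/user-1001.slice/session-3.scope'
path="${line#0::}"
echo "$path"   # prints /user.slice/user-1001.slice/session-3.scope
```

On a live system you could run `cat /proc/self/cgroup` to see your shell's own entry.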
Limit the CPU Cores Used
With cgroups v2, systemd has full control of the cpuset controller. This level of control enables an administrator to schedule work on only a specific CPU core.
Check CPUs for user.slice.

cd /sys/fs/cgroup/user.slice
ls
cat ../cpuset.cpus.effective
Example output:
[oracle@ol-server cgroup]$ cd /sys/fs/cgroup/user.slice/
[oracle@ol-server user.slice]$ ls
cgroup.controllers      cgroup.subtree_control  memory.events        memory.pressure      pids.max
cgroup.events           cgroup.threads          memory.events.local  memory.stat          user-0.slice
cgroup.freeze           cgroup.type             memory.high          memory.swap.current  user-1001.slice
cgroup.max.depth        cpu.pressure            memory.low           memory.swap.events   user-989.slice
cgroup.max.descendants  cpu.stat                memory.max           memory.swap.max
cgroup.procs            io.pressure             memory.min           pids.current
cgroup.stat             memory.current          memory.oom.group     pids.events
[oracle@ol-server user.slice]$ cat ../cpuset.cpus.effective
0-1
The cpuset.cpus.effective file shows the actual cores used by user.slice. If the parameter does not exist in a specific cgroup directory, or we don't set it, the value gets inherited from the parent, which in this case is the top-level cgroup root directory.

Restrict the system slice and the slices for users 0, 1001, and 989 to CPU core 0.

cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
sudo systemctl set-property system.slice AllowedCPUs=0
cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
Example output:
[oracle@ol-server user.slice]$ cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
cat: /sys/fs/cgroup/system.slice/cpuset.cpus.effective: No such file or directory
[oracle@ol-server user.slice]$ sudo systemctl set-property system.slice AllowedCPUs=0
[oracle@ol-server user.slice]$ cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
0
Note: The No such file or directory error indicates that, by default, the system slice inherits its cpuset.cpus.effective value from the parent.

sudo systemctl set-property user-0.slice AllowedCPUs=0
sudo systemctl set-property user-1001.slice AllowedCPUs=0
sudo systemctl set-property user-989.slice AllowedCPUs=0
Restrict the ralph user to CPU core 1.

sudo systemctl set-property user-8000.slice AllowedCPUs=1
cat /sys/fs/cgroup/user.slice/user-8000.slice/cpuset.cpus.effective
Example output:
[oracle@ol-server ~]$ sudo systemctl set-property user-8000.slice AllowedCPUs=1
[oracle@ol-server ~]$ cat /sys/fs/cgroup/user.slice/user-8000.slice/cpuset.cpus.effective
1
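AllowedCPUs=1 confines the slice to a single core; in CPU-affinity terms that is a bitmask with only bit 1 set, which is how tools such as taskset report affinity. A small arithmetic sketch:

```shell
# The bitmask equivalent of "allowed CPU 1": only bit 1 is set,
# giving mask 0x2 (CPU 0 would be 0x1, CPUs 0-1 together 0x3).
cpu=1
printf '0x%x\n' $((1 << cpu))   # prints 0x2
```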
Open a new terminal and connect via ssh as ralph to the ol-server system.

ssh ralph@<ip_address_of_ol-server>
Test using the foo.exe script.

foo.exe &

Verify the results.

top

Once top is running, hit the 1 key to show the CPUs individually.

Example output:
top - 18:23:55 up 21:03,  2 users,  load average: 1.03, 1.07, 1.02
Tasks: 155 total,   2 running, 153 sleeping,   0 stopped,   0 zombie
%Cpu0  :  6.6 us,  7.0 sy,  0.0 ni, 84.8 id,  0.0 wa,  0.3 hi,  0.3 si,  1.0 st
%Cpu1  : 93.0 us,  6.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  0.0 st
MiB Mem :  14707.8 total,  13649.1 free,    412.1 used,    646.6 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13993.0 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
226888 ralph     20   0  228492   1808   1520 R  99.7   0.0 199:34.27 sha1sum
269233 root      20   0  223724   6388   1952 S   1.3   0.0   0:00.04 pidstat
  1407 root      20   0  439016  41116  39196 S   0.3   0.3   0:17.81 sssd_nss
  1935 root      20   0  236032   3656   3156 S   0.3   0.0   0:34.34 OSWatcher
  2544 root      20   0  401900  40292   9736 S   0.3   0.3   0:10.62 ruby
     1 root      20   0  388548  14716   9508 S   0.0   0.1   0:21.21 systemd
...
Type q to quit top.

Alternatively, check which processor is running a process with ps.

ps -eo pid,psr,user,cmd | grep ralph
Example output:
[ralph@ol-server ~]$ ps -eo pid,psr,user,cmd | grep ralph
226715   1 root     sshd: ralph [priv]
226719   1 ralph    /usr/lib/systemd/systemd --user
226722   1 ralph    (sd-pam)
226727   1 ralph    sshd: ralph@pts/2
226728   1 ralph    -bash
226887   1 ralph    /bin/bash /usr/local/bin/foo.exe
226888   1 ralph    /usr/bin/sha1sum /dev/zero
269732   1 ralph    ps -eo pid,psr,user,cmd
269733   1 ralph    grep --color=auto ralph
The psr column is the CPU number of the cmd, or actual process.

Exit and close the terminal window used to log in as ralph.

Kill the foo.exe job. Switch back to the terminal where you are logged in as oracle and run the following command.

sudo pkill sha1sum
Adjust the CPU Weight for Users
Time to have alice join in the fun. She has some critical work to complete, and therefore, we'll give her twice the normal priority on the CPU.
Assign alice to the same CPU as ralph.

sudo systemctl set-property user-8001.slice AllowedCPUs=1
cat /sys/fs/cgroup/user.slice/user-8001.slice/cpuset.cpus.effective
Set CPUWeight.

sudo systemctl set-property user-8001.slice CPUWeight=200
cat /sys/fs/cgroup/user.slice/user-8001.slice/cpu.weight
The default weight is 100, so 200 is twice that number.
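Under contention on the shared core, each slice's share is its weight divided by the sum of the competing weights, so alice should get about two-thirds of the CPU and ralph about one-third. A quick arithmetic check of that expectation:

```shell
# CPUWeight model on a fully contended CPU: share_i = weight_i / sum(weights).
awk 'BEGIN {
  alice = 200; ralph = 100; total = alice + ralph
  printf "alice=%.0f%% ralph=%.0f%%\n", 100*alice/total, 100*ralph/total
}'
# prints alice=67% ralph=33%
```

These figures match the roughly 67%/33% split you will see in top below.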
Open a new terminal and connect via ssh as ralph to the ol-server system.

ssh ralph@<ip_address_of_ol-server>
Run foo.exe as ralph.

foo.exe &
Open another new terminal and connect via ssh as alice to the ol-server system.

ssh alice@<ip_address_of_ol-server>
Run foo.exe as alice.

foo.exe &
Verify via top that alice is getting the higher priority.

top
Once top is running, hit the 1 key to show the CPUs individually.

Example output:
top - 20:10:55 up 25 min,  3 users,  load average: 1.29, 0.46, 0.20
Tasks: 164 total,   3 running, 161 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  3.2 si,  0.3 st
%Cpu1  : 92.4 us,  7.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15715.8 total,  14744.6 free,    438.5 used,    532.7 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  15001.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 7934 alice     20   0   15800   1768   1476 R  67.0   0.0   0:36.15 sha1sum
 7814 ralph     20   0   15800   1880   1592 R  33.3   0.0   0:34.60 sha1sum
    1 root      20   0  388476  14440   9296 S   0.0   0.1   0:02.22 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
...
Return to the terminal logged in as the oracle user.

Load the system.slice using the foo.service.

sudo systemctl start foo.service
Look now at the top output still running in the alice terminal window. See that the foo.service is consuming CPU 0, while the users split CPU 1 based on their weights.

Example output:
top - 19:18:15 up 21:57,  3 users,  load average: 2.15, 2.32, 2.25
Tasks: 159 total,   4 running, 155 sleeping,   0 stopped,   0 zombie
%Cpu0  : 89.1 us,  7.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  2.6 st
%Cpu1  : 93.7 us,  5.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  0.0 st
MiB Mem :  14707.8 total,  13640.1 free,    420.5 used,    647.2 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13984.3 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM    TIME+ COMMAND
280921 root      20   0  228492   1776   1488 R  93.4   0.0  0:07.74 sha1sum
279185 alice     20   0  228492   1816   1524 R  65.6   0.0  7:35.18 sha1sum
279291 ralph     20   0  228492   1840   1552 R  32.8   0.0  7:00.30 sha1sum
  2026 oracle-+  20   0  935920  29280  15008 S   0.3   0.2  1:03.31 gomon
     1 root      20   0  388548  14716   9508 S   0.0   0.1  0:22.30 systemd
     2 root      20   0       0      0      0 S   0.0   0.0  0:00.10 kthreadd
...
Assign a CPU Quota
Lastly, we will cap the CPU time for ralph.
Return to the terminal logged in as the oracle user.

Set the quota to 5%.
sudo systemctl set-property user-8000.slice CPUQuota=5%
The change takes effect immediately, as seen in the top output still running in the alice user terminal.

Example output:
top - 19:24:53 up 22:04,  3 users,  load average: 2.21, 2.61, 2.45
Tasks: 162 total,   4 running, 158 sleeping,   0 stopped,   0 zombie
%Cpu0  : 93.0 us,  4.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.0 si,  1.7 st
%Cpu1  : 91.7 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.0 hi,  1.0 si,  0.7 st
MiB Mem :  14707.8 total,  13639.4 free,    420.0 used,    648.4 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13984.7 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM    TIME+ COMMAND
280921 root      20   0  228492   1776   1488 R  97.4   0.0  6:26.75 sha1sum
279185 alice     20   0  228492   1816   1524 R  92.1   0.0 12:21.12 sha1sum
279291 ralph     20   0  228492   1840   1552 R   5.3   0.0  8:44.84 sha1sum
     1 root      20   0  388548  14716   9508 S   0.0   0.1  0:22.48 systemd
     2 root      20   0       0      0      0 S   0.0   0.0  0:00.10 kthreadd
...
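CPUQuota=5% translates to the pair "5000 100000" in the slice's cpu.max file: a quota of 5000 microseconds of CPU time in every 100000-microsecond period. A sketch of converting such a pair back into a percentage (the helper name is invented for illustration):

```shell
# cpu.max holds "<quota> <period>" in microseconds; a quota of "max"
# means the cgroup is uncapped.
cpu_max_pct() {
  set -- $1   # split the "quota period" pair into $1 and $2
  if [ "$1" = "max" ]; then
    echo "unlimited"
  else
    echo "$((100 * $1 / $2))%"
  fi
}
cpu_max_pct "5000 100000"   # prints 5%
cpu_max_pct "max 100000"    # prints unlimited
```

This explains the roughly 5.3% CPU that ralph's sha1sum shows in the top output above.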
Revert the cap on the ralph user using the oracle terminal window. Run the following from the /sys/fs/cgroup/user.slice directory.

echo "max 100000" | sudo tee -a user-8000.slice/cpu.max

The quota gets written to the cpu.max file, and the default values are max 100000.
Example output:
[oracle@ol-server user.slice]$ echo "max 100000" | sudo tee -a user-8000.slice/cpu.max
max 100000
You can enable cgroups v2, limit users to a specific CPU when the system is under load, and lock them to using only a percentage of that CPU. Check out our other resources for more on Oracle Linux.
For More Information
See other related resources: