Run Control Groups Version 2 on Oracle Linux


Introduction

Control Groups (cgroups) is a Linux kernel feature for limiting, prioritizing, and allocating resources such as CPU time, memory, and network bandwidth for running processes.

This tutorial guides you through limiting the CPU time for user processes using cgroups v2.

Objectives

In this lab, you'll learn to:

  • Enable cgroups v2
  • Set a soft CPU limit for a user process
  • Set a hard CPU limit for a user process

Prerequisites

  • A system with Oracle Linux 8 installed with the following configuration:
    • a non-root user with sudo permissions

Setup Lab Environment

Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.

Before getting started with the lab, we need to complete a few housekeeping items. The script, service, and user accounts created here are used to demonstrate the limiting capabilities of cgroups.

Create load-generating script

  1. If not already connected, open a terminal and connect via ssh to the ol-server system.

    ssh oracle@<ip_address_of_ol-server>
  2. Create the foo.exe script.

    echo '#!/bin/bash
    
    /usr/bin/sha1sum /dev/zero' > foo.exe
  3. Move the foo.exe script to a location in your $PATH and set the proper permissions.

    sudo mv foo.exe /usr/local/bin/foo.exe
    sudo chown root:root /usr/local/bin/foo.exe
    sudo chmod 755 /usr/local/bin/foo.exe

    Note: (Optional) Check whether SELinux is in enforcing mode:

    sudo sestatus

    If it is, fix the SELinux labels after moving the file and changing permissions by running the following command:

    sudo /sbin/restorecon -v /usr/local/bin/foo.exe
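Before wiring the script into a service, you can sanity-check the load generator. Because sha1sum reads the endless /dev/zero stream, foo.exe never exits on its own; running the underlying command under timeout (a quick check, not part of the lab steps) confirms it generates CPU load without leaving a stray process behind:

```shell
# sha1sum /dev/zero hashes an infinite stream of zero bytes, so it
# keeps one CPU busy until it is killed. timeout sends SIGTERM after
# 2 seconds; its exit status 124 signals that the time limit was hit.
timeout 2 sha1sum /dev/zero
echo "exit status: $?"   # prints "exit status: 124"
```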

Create load-generating service

  1. Create the foo.service file.

    echo '[Unit]
    Description=the foo service
    After=network.target
    
    [Service]
    ExecStart=/usr/local/bin/foo.exe
    
    [Install]
    WantedBy=multi-user.target' > foo.service
  2. Move the foo.service file to the directory where systemd unit files are located and set the proper permissions.

    sudo mv foo.service /etc/systemd/system/foo.service
    sudo chown root:root /etc/systemd/system/foo.service
    sudo chmod 644 /etc/systemd/system/foo.service

    Note: (Optional) If running with SELinux enforcing, fix the SELinux labels after copying and changing permissions by running the following command:

    sudo /sbin/restorecon -v /etc/systemd/system/foo.service
  3. Reload the daemon, so systemd recognizes the new service.

    sudo systemctl daemon-reload
  4. Start foo.service and check its status.

    sudo systemctl start foo.service
    sudo systemctl status foo.service

Create users

Creating additional users allows running the load-generating script under different accounts, each with its own CPU settings.

  1. Create users and set passwords.

    sudo useradd -u 8000 ralph
    sudo useradd -u 8001 alice
    echo "ralph:oracle" | sudo chpasswd
    echo "alice:oracle" | sudo chpasswd
  2. Allow SSH connections.

    Copy the SSH key from the oracle user account.

    sudo mkdir /home/ralph/.ssh
    sudo cp /home/oracle/.ssh/authorized_keys /home/ralph/.ssh/authorized_keys
    sudo chown -R ralph:ralph /home/ralph/.ssh
    sudo chmod 700 /home/ralph/.ssh
    sudo chmod 600 /home/ralph/.ssh/authorized_keys
  3. Repeat for the alice user.

    sudo mkdir /home/alice/.ssh
    sudo cp /home/oracle/.ssh/authorized_keys /home/alice/.ssh/authorized_keys
    sudo chown -R alice:alice /home/alice/.ssh
    sudo chmod 700 /home/alice/.ssh
    sudo chmod 600 /home/alice/.ssh/authorized_keys
  4. Open a new terminal and verify both SSH connections work.

    ssh ralph@<ip_address_of_ol-server>

    Then exit the session, and repeat for the following user.

    ssh alice@<ip_address_of_ol-server>

    Exit the session, and close the terminal window.

Mount cgroups v2

Oracle Linux mounts cgroups v1 by default at boot time. To use cgroups v2, you must manually configure the boot kernel parameters.

  1. Return to the terminal where you are logged in as oracle.

  2. Add the cgroups v2 systemd kernel parameter.

    sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"

    You can instead specify only your current boot entry by running sudo grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=1".

  3. Reboot.

    sudo reboot

    The reboot will take a few minutes to complete.

    Note: You will not be able to ssh into the system until the reboot completes and the sshd daemon is running.

  4. Connect again via ssh to the ol-server system.

    ssh oracle@<ip_address_of_ol-server>
  5. Verify cgroups v2 was mounted.

    sudo mount -l | grep cgroup

    Example output:

    cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
  6. Inspect the contents of the cgroups mounted directory.

    ll /sys/fs/cgroup

    Example output:

    total 0
    -r--r--r--.  1 root root 0 Mar 13 21:20 cgroup.controllers
    -rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.max.depth
    -rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.max.descendants
    -rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.procs
    -r--r--r--.  1 root root 0 Mar 13 21:20 cgroup.stat
    -rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.subtree_control
    -rw-r--r--.  1 root root 0 Mar 13 21:20 cgroup.threads
    -rw-r--r--.  1 root root 0 Mar 13 21:20 cpu.pressure
    -r--r--r--.  1 root root 0 Mar 13 21:20 cpuset.cpus.effective
    -r--r--r--.  1 root root 0 Mar 13 21:20 cpuset.mems.effective
    drwxr-xr-x.  2 root root 0 Mar 13 21:20 init.scope
    -rw-r--r--.  1 root root 0 Mar 13 21:20 io.pressure
    -rw-r--r--.  1 root root 0 Mar 13 21:20 memory.pressure
    drwxr-xr-x. 87 root root 0 Mar 13 21:20 system.slice
    drwxr-xr-x.  4 root root 0 Mar 13 21:24 user.slice

    The output shows the root control group at its default location. The directory contains cgroup-prefixed interface files, controller interface files such as cpu.pressure, and systemd-related directories ending in .scope and .slice.
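As an alternative to grepping the mount list, you can query the filesystem type of the mount point directly. This one-liner distinguishes the two cgroup versions:

```shell
# Under cgroups v2 the unified hierarchy is a cgroup2 filesystem,
# so this prints "cgroup2fs". On a v1 setup /sys/fs/cgroup is a
# tmpfs holding per-controller mounts, and it prints "tmpfs".
stat -fc %T /sys/fs/cgroup
```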

Work with the Virtual File System

Before we get started, we need to learn a bit about the cgroups virtual file system mounted at /sys/fs/cgroup.

  1. Show which CPUs participate in the cpuset for everyone.

    cat /sys/fs/cgroup/cpuset.cpus.effective

    Example output:

    [oracle@ol-server ~]$ cat /sys/fs/cgroup/cpuset.cpus.effective
    0-1

    Our test box was an Oracle Linux 8 instance deployed on a VM.Standard2.1 shape, which is a dual-core system.

  2. Show which controllers are active.

    cat /sys/fs/cgroup/cgroup.controllers

    Example output:

    [oracle@ol-server ~]$ cat /sys/fs/cgroup/cgroup.controllers
    cpuset cpu io memory pids rdma

    It's good to see the cpuset controller present as we'll use it later in this lab.

  3. Show processes spawned by oracle.

    First, we need to determine oracle's user id (UID).

    who
    id

    Example output:

    [oracle@ol-server ~]$ who
    oracle   pts/0        2022-03-13 21:23 (10.39.209.157)
    [oracle@ol-server ~]$ id
    uid=1001(oracle) gid=1001(oracle) groups=1001(oracle),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

    Using the UID, we can find the oracle user's slice.

    cd /sys/fs/cgroup/user.slice
    ls

    Example output:

    [oracle@ol-server ~]$ cd /sys/fs/cgroup/user.slice
    [oracle@ol-server user.slice]$ ls
    cgroup.controllers      cgroup.subtree_control  memory.events        memory.pressure      pids.max
    cgroup.events           cgroup.threads          memory.events.local  memory.stat          user-0.slice
    cgroup.freeze           cgroup.type             memory.high          memory.swap.current  user-1001.slice
    cgroup.max.depth        cpu.pressure            memory.low           memory.swap.events   user-989.slice
    cgroup.max.descendants  cpu.stat                memory.max           memory.swap.max
    cgroup.procs            io.pressure             memory.min           pids.current
    cgroup.stat             memory.current          memory.oom.group     pids.events

    Systemd assigns every user a slice named user-<UID>.slice. So what's under that directory?

    cd user-1001.slice
    ls

    Example output:

    [oracle@ol-server user.slice]$ cd user-1001.slice/
    [oracle@ol-server user-1001.slice]$ ls
    cgroup.controllers  cgroup.max.descendants  cgroup.threads  io.pressure        user-runtime-dir@1001.service
    cgroup.events       cgroup.procs            cgroup.type     memory.pressure
    cgroup.freeze       cgroup.stat             cpu.pressure    session-3.scope
    cgroup.max.depth    cgroup.subtree_control  cpu.stat        user@1001.service

    These are the top-level cgroups for the oracle user. However, there are no processes listed in cgroup.procs. So, where is the list of user processes?

    cat cgroup.procs

    Example output:

    [oracle@ol-server user-1001.slice]$ cat cgroup.procs
    [oracle@ol-server user-1001.slice]$

    When oracle opened the SSH session at the beginning of this lab, systemd created a scope sub-unit for that session. Under this scope, we can check cgroup.procs for the list of processes spawned in that session.

    Note: The user might have multiple sessions based on the number of connections to the system; therefore, replace the 3 in the sample below as necessary.

    cd session-3.scope
    ls
    cat cgroup.procs

    Example output:

    [oracle@ol-server user-1001.slice]$ cd session-3.scope/
    [oracle@ol-server session-3.scope]$ ls
    cgroup.controllers  cgroup.max.depth        cgroup.stat             cgroup.type   io.pressure
    cgroup.events       cgroup.max.descendants  cgroup.subtree_control  cpu.pressure  memory.pressure
    cgroup.freeze       cgroup.procs            cgroup.threads          cpu.stat
    [oracle@ol-server session-3.scope]$ cat cgroup.procs
    3189
    3200
    3201
    54217

    Now that we found the processes the hard way, we can use systemd-cgls to show the same information in a tree-like view.

    Note: When run from within the virtual filesystem, systemd-cgls limits the cgroup output to the current working directory.

    cd /sys/fs/cgroup/user.slice/user-1001.slice
    systemd-cgls

    Example output:

    [oracle@ol-server user-1001.slice]$ systemd-cgls
    Working directory /sys/fs/cgroup/user.slice/user-1001.slice:
    ├─session-3.scope
    │ ├─ 3189 sshd: oracle [priv]
    │ ├─ 3200 sshd: oracle@pts/0
    │ ├─ 3201 -bash
    │ ├─55486 systemd-cgls
    │ └─55487 less
    └─user@1001.service
      └─init.scope
        ├─3193 /usr/lib/systemd/systemd --user
        └─3195 (sd-pam)
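Rather than guessing the session number, every process can report its own cgroup. Under cgroups v2, /proc/self/cgroup holds a single line beginning with 0::, and the remainder is the path relative to /sys/fs/cgroup. A small sketch (the exact scope name will vary by session):

```shell
# Show the cgroup of the current shell, e.g.:
#   0::/user.slice/user-1001.slice/session-3.scope
cat /proc/self/cgroup

# Build the absolute path of the shell's cgroup directory from it
# (-f3- keeps everything after the second colon):
my_cgroup=/sys/fs/cgroup$(cut -d: -f3- /proc/self/cgroup)
echo "$my_cgroup"
```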

Limit the CPU Cores Used

With cgroups v2, systemd has full control of the cpuset controller. This level of control enables an administrator to schedule work on only a specific CPU core.
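For the cpuset controller to be usable in the per-user slices, it must be delegated down the hierarchy: a controller is available in a child cgroup only when the parent lists it in its cgroup.subtree_control file. You can inspect the delegation at the root (paths assume the default mount point shown earlier):

```shell
# Controllers available in the root cgroup:
cat /sys/fs/cgroup/cgroup.controllers
# Controllers the root delegates to its children; systemd manages
# this file, so it may list a subset of the line above:
cat /sys/fs/cgroup/cgroup.subtree_control
```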

  1. Check CPUs for user.slice.

    cd /sys/fs/cgroup/user.slice
    ls
    cat ../cpuset.cpus.effective

    Example output:

    [oracle@ol-server cgroup]$ cd /sys/fs/cgroup/user.slice/
    [oracle@ol-server user.slice]$ ls
    cgroup.controllers      cgroup.subtree_control  memory.events        memory.pressure      pids.max
    cgroup.events           cgroup.threads          memory.events.local  memory.stat          user-0.slice
    cgroup.freeze           cgroup.type             memory.high          memory.swap.current  user-1001.slice
    cgroup.max.depth        cpu.pressure            memory.low           memory.swap.events   user-989.slice
    cgroup.max.descendants  cpu.stat                memory.max           memory.swap.max
    cgroup.procs            io.pressure             memory.min           pids.current
    cgroup.stat             memory.current          memory.oom.group     pids.events
    [oracle@ol-server user.slice]$ cat ../cpuset.cpus.effective
    0-1

    The cpuset.cpus.effective file shows the actual cores used by the user.slice. If the parameter does not exist in a specific cgroup directory, or is not explicitly set, the value is inherited from the parent, which in this case is the top-level root cgroup.

  2. Restrict the system and user 0, 1001, and 989 slices to CPU core 0.

    cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
    sudo systemctl set-property system.slice AllowedCPUs=0
    cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective

    Example output:

    [oracle@ol-server user.slice]$ cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
    cat: /sys/fs/cgroup/system.slice/cpuset.cpus.effective: No such file or directory
    [oracle@ol-server user.slice]$ sudo systemctl set-property system.slice AllowedCPUs=0
    [oracle@ol-server user.slice]$ cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
    0

    Note: The No such file or directory error indicates that, by default, the system slice inherits its cpuset.cpus.effective value from the parent.

    sudo systemctl set-property user-0.slice AllowedCPUs=0
    sudo systemctl set-property user-1001.slice AllowedCPUs=0
    sudo systemctl set-property user-989.slice AllowedCPUs=0
  3. Restrict the ralph user to CPU core 1.

    sudo systemctl set-property user-8000.slice AllowedCPUs=1
    cat /sys/fs/cgroup/user.slice/user-8000.slice/cpuset.cpus.effective

    Example output:

    [oracle@ol-server ~]$ sudo systemctl set-property user-8000.slice AllowedCPUs=1
    [oracle@ol-server ~]$ cat /sys/fs/cgroup/user.slice/user-8000.slice/cpuset.cpus.effective
    1
  4. Open a new terminal and connect via ssh as ralph to the ol-server system.

    ssh ralph@<ip_address_of_ol-server>
  5. Test using the foo.exe script.

    foo.exe &

    Verify the results.

    top

    Once top is running, hit the 1 key to show the CPUs individually.

    Example output:

    top - 18:23:55 up 21:03,  2 users,  load average: 1.03, 1.07, 1.02
    Tasks: 155 total,   2 running, 153 sleeping,   0 stopped,   0 zombie
    %Cpu0  :  6.6 us,  7.0 sy,  0.0 ni, 84.8 id,  0.0 wa,  0.3 hi,  0.3 si,  1.0 st
    %Cpu1  : 93.0 us,  6.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  0.0 st
    MiB Mem :  14707.8 total,  13649.1 free,    412.1 used,    646.6 buff/cache
    MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13993.0 avail Mem
    
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    226888 ralph     20   0  228492   1808   1520 R  99.7   0.0 199:34.27 sha1sum
    269233 root      20   0  223724   6388   1952 S   1.3   0.0   0:00.04 pidstat
       1407 root      20   0  439016  41116  39196 S   0.3   0.3   0:17.81 sssd_nss
       1935 root      20   0  236032   3656   3156 S   0.3   0.0   0:34.34 OSWatcher
       2544 root      20   0  401900  40292   9736 S   0.3   0.3   0:10.62 ruby
          1 root      20   0  388548  14716   9508 S   0.0   0.1   0:21.21 systemd
    ...

    Type q to quit top.

  6. An alternate way to check which processor is running a process.

    ps -eo pid,psr,user,cmd | grep ralph

    Example output:

    [ralph@ol-server ~]$ ps -eo pid,psr,user,cmd | grep ralph
     226715   1 root     sshd: ralph [priv]
     226719   1 ralph    /usr/lib/systemd/systemd --user
     226722   1 ralph    (sd-pam)
     226727   1 ralph    sshd: ralph@pts/2
     226728   1 ralph    -bash
     226887   1 ralph    /bin/bash /usr/local/bin/foo.exe
     226888   1 ralph    /usr/bin/sha1sum /dev/zero
     269732   1 ralph    ps -eo pid,psr,user,cmd
     269733   1 ralph    grep --color=auto ralph

    The psr column shows the processor (CPU) number on which each process, listed under cmd, is running.

  7. Exit and close the terminal window used to log in as ralph.

  8. Kill the foo.exe job.

    Switch back to the terminal where you are logged in as oracle and run the following command.

    sudo pkill sha1sum
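A note on persistence: the systemctl set-property calls used in this section take effect immediately and are also written out as unit drop-ins, so they survive a reboot. On a default systemd layout (an assumption worth verifying on your system), the AllowedCPUs setting for ralph lands in a drop-in like this:

```ini
# /etc/systemd/system.control/user-8000.slice.d/50-AllowedCPUs.conf
# Written by "systemctl set-property user-8000.slice AllowedCPUs=1";
# the path and file name follow systemd's defaults.
[Slice]
AllowedCPUs=1
```

To apply a property for the current boot only, add the --runtime flag to systemctl set-property.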

Adjust the CPU Weight for Users

Time to have alice join in the fun. She has some critical work to complete, and therefore, we'll give her twice the normal priority on the CPU.

  1. Assign alice to the same CPU as ralph.

    sudo systemctl set-property user-8001.slice AllowedCPUs=1
    cat /sys/fs/cgroup/user.slice/user-8001.slice/cpuset.cpus.effective
  2. Set CPUWeight.

    sudo systemctl set-property user-8001.slice CPUWeight=200
    cat /sys/fs/cgroup/user.slice/user-8001.slice/cpu.weight

    The default CPUWeight is 100, so a value of 200 gives alice twice the default share.

  3. Open a new terminal and connect via ssh as ralph to the ol-server system.

    ssh ralph@<ip_address_of_ol-server>
  4. Run foo.exe as ralph.

    foo.exe &
  5. Open another new terminal and connect via ssh as alice to the ol-server system.

    ssh alice@<ip_address_of_ol-server>
  6. Run foo.exe as alice.

    foo.exe &
  7. Verify via top that alice is getting the higher priority.

    top

    Once top is running, hit the 1 key to show the CPUs individually.

    Example output:

    top - 20:10:55 up 25 min,  3 users,  load average: 1.29, 0.46, 0.20
    Tasks: 164 total,   3 running, 161 sleeping,   0 stopped,   0 zombie
    %Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  3.2 si,  0.3 st
    %Cpu1  : 92.4 us,  7.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    MiB Mem :  15715.8 total,  14744.6 free,    438.5 used,    532.7 buff/cache
    MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  15001.1 avail Mem 
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
       7934 alice     20   0   15800   1768   1476 R  67.0   0.0   0:36.15 sha1sum  
       7814 ralph     20   0   15800   1880   1592 R  33.3   0.0   0:34.60 sha1sum  
          1 root      20   0  388476  14440   9296 S   0.0   0.1   0:02.22 systemd  
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
    ...
  8. Return to the terminal logged in as the oracle user.

  9. Load the system.slice using the foo.service.

    sudo systemctl start foo.service

    Look now at the top output still running in the alice terminal window. Notice that foo.service consumes CPU 0, while the users split CPU 1 according to their weights.

    Example output:

    top - 19:18:15 up 21:57,  3 users,  load average: 2.15, 2.32, 2.25
    Tasks: 159 total,   4 running, 155 sleeping,   0 stopped,   0 zombie
    %Cpu0  : 89.1 us,  7.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  2.6 st
    %Cpu1  : 93.7 us,  5.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.3 si,  0.0 st
    MiB Mem :  14707.8 total,  13640.1 free,    420.5 used,    647.2 buff/cache
    MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13984.3 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     280921 root      20   0  228492   1776   1488 R  93.4   0.0   0:07.74 sha1sum
     279185 alice     20   0  228492   1816   1524 R  65.6   0.0   7:35.18 sha1sum
     279291 ralph     20   0  228492   1840   1552 R  32.8   0.0   7:00.30 sha1sum
       2026 oracle-+  20   0  935920  29280  15008 S   0.3   0.2   1:03.31 gomon
          1 root      20   0  388548  14716   9508 S   0.0   0.1   0:22.30 systemd
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.10 kthreadd
    ...
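The roughly 2:1 split that top shows between alice and ralph follows directly from the weights: on a contended core, each cgroup receives its weight divided by the sum of the competing weights. A quick sketch of the expected numbers:

```shell
# cpu.weight is proportional: alice (200) and ralph (100) contending
# for one core should split it roughly 2:1.
awk 'BEGIN { total = 200 + 100
             printf "alice: %.0f%%  ralph: %.0f%%\n",
                    200 / total * 100, 100 / total * 100 }'
# prints "alice: 67%  ralph: 33%"
```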

Assign a CPU Quota

Lastly, we will cap the CPU time for ralph.

  1. Return to the terminal logged in as the oracle user.

  2. Set the quota to 5%.

    sudo systemctl set-property user-8000.slice CPUQuota=5%

    The change takes effect immediately, as seen in the top output still running in the alice user terminal.

    Example output:

    top - 19:24:53 up 22:04,  3 users,  load average: 2.21, 2.61, 2.45
    Tasks: 162 total,   4 running, 158 sleeping,   0 stopped,   0 zombie
    %Cpu0  : 93.0 us,  4.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.0 si,  1.7 st
    %Cpu1  : 91.7 us,  5.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.0 hi,  1.0 si,  0.7 st
    MiB Mem :  14707.8 total,  13639.4 free,    420.0 used,    648.4 buff/cache
    MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13984.7 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     280921 root      20   0  228492   1776   1488 R  97.4   0.0   6:26.75 sha1sum
     279185 alice     20   0  228492   1816   1524 R  92.1   0.0  12:21.12 sha1sum
     279291 ralph     20   0  228492   1840   1552 R   5.3   0.0   8:44.84 sha1sum
          1 root      20   0  388548  14716   9508 S   0.0   0.1   0:22.48 systemd
          2 root      20   0       0      0      0 S   0.0   0.0   0:00.10 kthreadd
    ...
  3. Revert the cap on the ralph user using the oracle terminal window. Run the command from within the /sys/fs/cgroup/user.slice directory, as it uses a relative path.

    echo "max 100000" | sudo tee -a user-8000.slice/cpu.max

    The quota is written to the cpu.max file, and its default value is max 100000.

    Example output:

    [oracle@ol-server user.slice]$ echo "max 100000" | sudo tee -a user-8000.slice/cpu.max
    max 100000
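For reference, CPUQuota and the revert above both go through the cpu.max file, which holds two microsecond values, quota and period: the cgroup may run for at most quota microseconds within each period. With the default 100000 microsecond period, the 5% quota set earlier corresponds to:

```shell
# 5% of the default 100000 us period:
awk 'BEGIN { period = 100000; printf "%d %d\n", 0.05 * period, period }'
# prints "5000 100000"

# On the lab system, the current pair can be read back directly
# (the slice exists while ralph is logged in):
#   cat /sys/fs/cgroup/user.slice/user-8000.slice/cpu.max
```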

In this lab, you enabled cgroups v2, limited users to a specific CPU while the system was under load, and capped them at a percentage of that CPU's time. Check out our other resources for more on Oracle Linux.

For More Information

See other related resources:
