Build a Software RAID Array on Oracle Linux

Introduction

A Redundant Array of Independent Disks (RAID) device is a virtual device created from two or more real block devices. This functionality allows multiple devices (typically disk drives or partitions of a disk) to be combined into a single device to hold a single filesystem. Some RAID levels include redundancy, allowing the filesystem to survive some degree of device failure.

The Oracle Linux kernel uses the Multiple Device (MD) driver to support Linux software RAID. This driver enables you to organize disk drives into RAID devices and implement different RAID levels.
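
If you want to confirm that the RAID1 personality of the MD driver is available on your system, you can query the kernel module directly. This is an optional check that assumes the standard module name raid1; the exact output depends on your kernel version:

    modinfo raid1

The module might not appear in lsmod until an array that uses it exists, so modinfo is the more reliable check before creating any arrays.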

For more information on these different RAID levels, see the Oracle documentation.

This tutorial will work with the MD utility (mdadm) to create a RAID1 device with a spare and then address a disk failure.

Objectives

In this tutorial, you will learn how to:

  • Create a RAID1 device with a spare
  • Recover a failed RAID1 device

Prerequisites

  • Minimum of a single Oracle Linux system

  • Each system should have Oracle Linux installed and configured with:

    • A non-root user account with sudo access
    • Access to the Internet
    • Two or more block devices attached to the system

Deploy Oracle Linux

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
  3. Change into the working directory.

    cd linux-virt-labs/ol
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yml
  5. Deploy the lab environment.

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e add_block_storage=true -e block_count=3

    The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, which places its modules under python3.6.

    The default deployment shape uses the AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.

    Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Linux is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.
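
    For reference, if you want both of the optional settings mentioned above, add the extra variables to the same deployment command. This is only needed when deploying the Intel shape together with Oracle Linux 9:

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e add_block_storage=true -e block_count=3 -e instance_shape="VM.Standard3.Flex" -e os_version="9"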

Connect to the System

  1. Open a terminal and connect via SSH to the ol-node-01 instance.

    ssh oracle@<ip_address_of_instance>
  2. Verify the block volumes exist.

    sudo lsblk

    The output for the free lab environment should show /dev/sda for the existing file system and the available disks /dev/sdb, /dev/sdc, and /dev/sdd.

Install the MD Utility

  1. Install the MD utility.

    Check if mdadm is installed.

    sudo dnf list --installed mdadm

    If not installed, install mdadm.

    sudo dnf -y install mdadm
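
    After the installation, you can confirm the tool is available and note its version. This is an optional check, and the version string depends on the installed package:

    mdadm --version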

Create a RAID Device

RAID1 provides data redundancy and resilience by writing identical data to each drive in the array. If one drive fails, the remaining mirror can satisfy I/O requests. Mirroring is an expensive solution because the system writes the same information to all of the disks in the array.

Features of RAID1:

  • Includes redundancy
  • Uses two or more disks with zero or more spare disks
  • Maintains an exact mirror of the data written on each disk
  • Disk devices should be of equal size
    • If one disk device is larger than another, the RAID device will be the size of the smallest disk (a quick size check follows this list)
  • Allows up to n-1 disk devices to be removed or fail while all data remains intact
  • Provided the system survives a crash and spare disks are available, recovery of the RAID1 mirror happens automatically and immediately upon detection of the fault
  • Slower write performance occurs compared to a single disk due to writing the same data to multiple disks in the mirror set
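
Because the array capacity is limited by the smallest member, it can help to confirm that the candidate disks are the same size before creating the array. A minimal check, assuming the free lab environment device names:

    lsblk -b -o NAME,SIZE /dev/sdb /dev/sdc /dev/sdd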
  1. List the options available to create a RAID device.

    Using mdadm --create --help shows how to use the --create option to create a new array from unused devices.

    sudo mdadm --create --help
  2. Create a RAID1 device with one spare disk.

    sudo mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc --spare-devices=1 /dev/sdd
    • --create : Creates the new array
    • --level : The raid level
    • --raid-devices : The number of active devices in the array
    • --spare-devices: The number of spare (extra) devices in the initial array

    In this command, we name the device (array) /dev/md0 and use /dev/sdb and /dev/sdc to create the RAID1 device. The device /dev/sdd is automatically used as a spare to recover from any active device's failure.

    Accept the Continue creating array? prompt by typing y and hitting ENTER.

    Example Output:

    [oracle@ol-mdadm-2022-06-04-180415 ~]$ sudo mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc --spare-devices=1 /dev/sdd
    mdadm: Note: this array has metadata at the start and
        may not be suitable as a boot device. If you plan to
        store '/boot' on this device please ensure that
        your boot-loader understands md/v1.x metadata, or use
        --metadata=0.90
    mdadm: size set to 52395008K
    Continue creating array? y
    mdadm: Defaulting to version 1.2 metadata
    mdadm: array /dev/md0 started.
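
    At this point, lsblk should show the new md0 device stacked on top of each member disk. This is a quick structural check; the sizes shown depend on the block volumes attached to your instance:

    lsblk /dev/sdb /dev/sdc /dev/sdd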

Create a File System

  1. Create an ext4 filesystem on the RAID device and mount it.

    sudo mkfs.ext4 -F /dev/md0
    sudo mkdir /u01
    sudo mount /dev/md0 /u01
    
  2. Report the file system disk usage.

    df -h

    Example Output:

    [oracle@ol-mdadm-2022-06-04-180415 ~]$ df -h
    Filesystem                  Size  Used Avail Use% Mounted on
    ...
    /dev/md0                     49G   53M   47G   1% /u01
  3. Add an entry to /etc/fstab and make the mount point persistent across reboots.

    echo "/dev/md0    /data01    ext4    defaults    0 0" | sudo tee -a /etc/fstab > /dev/null

Verify RAID Device

  1. Get details about the array.

    sudo mdadm --detail /dev/md0

    Example Output:

    [oracle@ol-mdadm-2022-06-04-180415 ~]$ sudo mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Sat Jun  4 20:08:32 2022
            Raid Level : raid1
            Array Size : 52395008 (49.97 GiB 53.65 GB)
         Used Dev Size : 52395008 (49.97 GiB 53.65 GB)
          Raid Devices : 2
         Total Devices : 3
           Persistence : Superblock is persistent
    
           Update Time : Sat Jun  4 20:28:58 2022
                 State : clean, resyncing 
        Active Devices : 2
       Working Devices : 3
        Failed Devices : 0
         Spare Devices : 1
    
    Consistency Policy : resync
    
         Resync Status : 59% complete
    
                  Name : ol-mdadm-2022-06-04-180415:0  (local to host ol-mdadm-2022-06-04-180415)
                  UUID : f6c35144:66a24ae9:5b96e616:f7252a9f
                Events : 9
    
        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       8       32        1      active sync   /dev/sdc
    
           2       8       48        -      spare   /dev/sdd

    In the output, the State shows the array is clean and resyncing. A resync always occurs after the initial creation of the array or after a recovery. The output shows the resync is 59% complete.

  2. Check real-time information from the kernel.

    sudo cat /proc/mdstat

    Example Output:

    [oracle@ol-mdadm-2022-06-04-180415 ~]$ cat /proc/mdstat 
    Personalities : [raid1] 
    md0 : active raid1 sdd[2](S) sdc[1] sdb[0]
          52395008 blocks super 1.2 [2/2] [UU]
          [==================>..]  resync = 92.2% (48341824/52395008) finish=2.7min speed=24677K/sec
          
    unused devices: <none>
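
    While the initial resync runs, two optional approaches can help: watch refreshes the /proc/mdstat output every few seconds, and mdadm --wait blocks until the resync or recovery completes. The array remains usable during the resync, so neither command is required:

    watch -n 5 cat /proc/mdstat
    sudo mdadm --wait /dev/md0

    Press CTRL+C to exit the watch command.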

Create RAID Configuration File

  1. Add the RAID configuration to the mdadm configuration file.

    The configuration file identifies which devices are RAID devices and to which array a specific device belongs. Based on this configuration file, mdadm can assemble the arrays at boot time.

    sudo mdadm --examine --scan | sudo tee -a /etc/mdadm.conf

    Example Output:

    [oracle@ol-mdadm-2022-06-04-180415 ~]$ sudo mdadm --examine --scan | sudo tee -a /etc/mdadm.conf
    ARRAY /dev/md/0  metadata=1.2 UUID=34a52537:38660137:d8804219:dbfd7531 name=ol-node-01:0
       spares=1
  2. Adjust the name value in the configuration file.

    Due to a known issue in the latest mdadm package, the name value that mdadm --examine --scan writes to the configuration file causes a Not POSIX compatible warning, so we must remove the trailing :0 from the name value.

    sudo sed -i 's/ol-node-01:0/ol-node-01/g' /etc/mdadm.conf
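
    You can confirm the change by displaying the file; the ARRAY line should now end with name=ol-node-01 rather than name=ol-node-01:0:

    sudo cat /etc/mdadm.conf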

Manage RAID Devices

The --manage option of mdadm manages the component devices within an array, such as adding, removing, or faulting a device.

  1. List the options available to manage a RAID device.

    sudo mdadm --manage --help
    • --add : Hotadd subsequent devices.
    • --remove : Remove subsequent non-active devices.
    • --fail : Mark subsequent devices as faulty.
  2. Synchronize cached writes to persistent storage.

    Before running any disk management commands, you must run the sync command to flush all cached writes to disk.

    sudo sync
  3. Mark a disk as failed.

    sudo mdadm --manage /dev/md0 --fail /dev/sdb
  4. Get array details.

    sudo mdadm --detail /dev/md0 

    Example Output:

    [oracle@ol-mdadm-2022-06-04-180415 ~]$ sudo mdadm --detail /dev/md0 
    /dev/md0:
               Version : 1.2
         Creation Time : Sat Jun  4 20:08:32 2022
            Raid Level : raid1
            Array Size : 52395008 (49.97 GiB 53.65 GB)
         Used Dev Size : 52395008 (49.97 GiB 53.65 GB)
          Raid Devices : 2
         Total Devices : 3
           Persistence : Superblock is persistent
    
           Update Time : Sat Jun  4 21:34:19 2022
                 State : clean, degraded, recovering 
        Active Devices : 1
       Working Devices : 2
        Failed Devices : 1
         Spare Devices : 1
    
    Consistency Policy : resync
    
        Rebuild Status : 1% complete
    
                  Name : ol-mdadm-2022-06-04-180415:0  (local to host ol-mdadm-2022-06-04-180415)
                  UUID : f6c35144:66a24ae9:5b96e616:f7252a9f
                Events : 19
    
        Number   Major   Minor   RaidDevice State
           2       8       48        0      spare rebuilding   /dev/sdd
           1       8       32        1      active sync   /dev/sdc
    
           0       8       16        -      faulty   /dev/sdb

    The array is marked as degraded and recovering. The output also shows that the spare device /dev/sdd is automatically rebuilding the array, while /dev/sdb is faulty.

  5. Remove the failed disk.

    sudo mdadm --manage /dev/md0 --remove /dev/sdb
  6. Replace the failed disk.

    If this were a physical system, this is when you would replace the server's failed physical disk with a new one. In a virtual environment, you can repurpose the disk without any changes.

  7. Remove previous linux_raid_member signature.

    A signature (metadata) is written to a disk when it is used in a RAID array, and the disk cannot be moved to another system or repurposed until those signatures are removed.

    sudo wipefs -a -f /dev/sdb

    Warning: The wipefs command is destructive and removes the entire partition table on the target disk (/dev/sdb) and any signatures.

  8. Add a new spare to the array.

    sudo mdadm --manage /dev/md0 --add /dev/sdb
  9. Verify the spare disk exists.

    sudo mdadm --detail /dev/md0 

    At the bottom of the output, the device /dev/sdb should appear in the list with the State set to spare.
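
    As an optional follow-up, /proc/mdstat should again list three members, with /dev/sdb flagged (S) as the spare. You can also run wipefs with no options, which is read-only and only reports signatures; it should now show the fresh linux_raid_member signature that mdadm wrote when the disk rejoined the array:

    cat /proc/mdstat
    sudo wipefs /dev/sdb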

Next Steps

You should now be able to create a RAID1 device with a spare and know how to recover when a disk fails. Check out our other content on the Oracle Linux Training Station.
