Configure RAID Logical Volumes on Oracle Linux

Introduction

LVM RAID is a way to create a Logical Volume (LV) that uses multiple physical devices to improve performance or tolerate device failures. In LVM, the physical devices are Physical Volumes (PVs) in a single Volume Group (VG).

This tutorial uses the Oracle Linux Volume Manager utilities to create a RAID logical volume and then address a disk failure.

Objectives

  • Create a RAID logical volume
  • Resize a RAID logical volume
  • Recover a failed RAID device

Prerequisites

  • Minimum of a single Oracle Linux system

  • Each system should have Oracle Linux installed and configured with:

    • A non-root user account with sudo access
    • Access to the Internet
    • Six or more block devices attached to the system

Deploy Oracle Linux

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
  3. Change into the working directory.

    cd linux-virt-labs/ol
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yml
  5. Deploy the lab environment.

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e add_block_storage=true -e block_count=6

    The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, which places its modules under python3.6.

    The default deployment shape uses an AMD CPU and Oracle Linux 8. To use an Intel CPU or Oracle Linux 9, add -e instance_shape="VM.Standard3.Flex" or -e os_version="9" to the deployment command.
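
    For example, deploying Oracle Linux 9 on an Intel shape combines those variables with the original command:

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6" -e add_block_storage=true -e block_count=6 -e instance_shape="VM.Standard3.Flex" -e os_version="9"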

    Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Linux is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.

Connect to the System

  1. Open a terminal and connect via SSH to the ol-node-01 instance.

    ssh oracle@<ip_address_of_instance>
  2. Verify the block volumes exist.

    sudo lsblk

    The output should show the sda device for the existing file system and the available disks sdb through sdg.

Physical Volume (PV)

  1. Create the physical volumes (PV) using the available disks.

    sudo pvcreate -v /dev/sd[b-e]

    Run the command with the -v option to get verbose information.

  2. Verify PV creation.

    sudo pvs

    Example Output:

    [oracle@ol-node01 ~]$ sudo pvs
      PV         VG        Fmt  Attr PSize  PFree 
      /dev/sda3  ocivolume lvm2 a--  45.47g     0 
      /dev/sdb             lvm2 ---  50.00g 50.00g
      /dev/sdc             lvm2 ---  50.00g 50.00g
      /dev/sdd             lvm2 ---  50.00g 50.00g
      /dev/sde             lvm2 ---  50.00g 50.00g

Volume Group (VG)

  1. Create the volume group (VG) using the newly created physical volumes.

    sudo vgcreate -v foo /dev/sd[b-e]
  2. Verify VG creation.

    sudo vgs

    Example Output:

    [oracle@ol-node01 ~]$ sudo vgs
      VG             #PV #LV #SN Attr   VSize   VFree  
      foo              4   0   0 wz--n- 199.98g 199.98g
      ocivolume        1   2   0 wz--n-  45.47g      0 

Logical Volume (LV)

  1. Create the RAID logical volume (LV).

    sudo lvcreate --type raid5 -i 3 -L 5G -n rr foo
    • --type: Set the RAID level. LVM supports RAID levels 0, 1, 4, 5, 6, and 10.
    • -i: Set the number (n) of stripes (devices) for a RAID 4/5/6 logical volume. A raid5 LV requires n+1 devices.
    • -L: Size of the logical volume.
    • -n: Name of the logical volume.

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvcreate --type raid5 -i 3 -L 5G -n rr foo
      Using default stripesize 64.00 KiB.
      Rounding size 5.00 GiB (1280 extents) up to stripe boundary size 5.00 GiB (1281 extents).
      Logical volume "rr" created.

    Check the lvmraid(7) manual page for more information.
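
    Other RAID levels take different sizing options. For example, a RAID 1 mirror is defined by the number of extra mirror images (-m) rather than stripes. A minimal illustrative command (not run in this lab), assuming the same foo VG and a hypothetical LV name rr1:

    sudo lvcreate --type raid1 -m 1 -L 5G -n rr1 foo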

  2. Verify LV creation.

    sudo lvdisplay foo

    The output shows all logical volumes contained within the foo VG.

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvdisplay foo
      --- Logical volume ---
      LV Path                /dev/foo/rr
      LV Name                rr
      VG Name                foo
      LV UUID                vghyRi-nKGM-3b9t-tB1I-biJX-10h6-UJWvm2
      LV Write Access        read/write
      LV Creation host, time ol-node01, 2022-05-19 01:23:46 +0000
      LV Status              available
      # open                 0
      LV Size                5.00 GiB
      Current LE             1281
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     1024
      Block device           252:10
  3. Display the LV type.

    sudo lvs -o name,segtype foo/rr
    • The lvs command accepts either the VG/LV name (foo/rr) or the full LV path (/dev/foo/rr) to narrow the results.

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvs -o name,segtype /dev/foo/rr
      LV     Type 
      rr     raid5

Create a File System

  1. Create an XFS file system on the RAID LV.

    sudo mkfs.xfs -f /dev/foo/rr
    • -f: Forces the overwrite of an existing file system.

    Example Output:

    [oracle@ol-node01 ~]$ sudo mkfs.xfs -f /dev/foo/rr
    meta-data=/dev/foo/rr            isize=512    agcount=8, agsize=163952 blks
             =                       sectsz=4096  attr=2, projid32bit=1
             =                       crc=1        finobt=1, sparse=1, rmapbt=0
             =                       reflink=1
    data     =                       bsize=4096   blocks=1311616, imaxpct=25
             =                       sunit=16     swidth=48 blks
    naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
    log      =internal log           bsize=4096   blocks=2560, version=2
             =                       sectsz=4096  sunit=1 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0

    Note: You cannot reduce the size of the XFS file system after its creation. However, the xfs_growfs command can enlarge it.

Mount the RAID LV

  1. Mount the file system.

    sudo mkdir -p /u01
    sudo mount /dev/foo/rr /u01
  2. Report the file system disk usage.

    df -h

    Example Output:

    [oracle@ol-node01 ~]$ df -h
    Filesystem                         Size  Used Avail Use% Mounted on
    ...
    /dev/mapper/foo-rr                 5.0G   69M  5.0G   2% /u01

Resize a RAID LV

There are several ways to resize a RAID LV:

  • Use lvresize or lvextend to increase the LV (see the combined example after the note below).
  • Use lvresize or lvreduce to shrink the LV.
  • Use lvconvert with the --stripes N parameter to change the number of stripes.

Important: Shrinking an LV is risky and may result in data loss. When running an XFS file system on the LV, avoid shrinking the LV, as XFS does not permit reducing the file system size.
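
In addition to the two-step approach used in this lab (lvresize followed by xfs_growfs), lvresize and lvextend accept the --resizefs (-r) option, which grows a supported file system in the same operation. A minimal sketch, assuming foo/rr is mounted with an XFS file system:

    sudo lvextend -L +5G -r foo/rr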

Increase the RAID LV Capacity

  1. Using the available free space in the VG, increase the RAID LV size to 10G.

    sudo lvresize -L 10G foo/rr

    To increase the size by 10G, use the option -L +10G instead.

  2. Verify the LV increased to 10G.

    sudo lvs foo/rr

    The LSize should show 10g.

  3. Grow the file system.

    sudo xfs_growfs /u01
  4. Report the updated file system disk usage.

    df -h
  5. Check the RAID synchronization status before proceeding.

    Warning: Proceeding too quickly to the next step may show an error because foo/rr is not yet in sync.

    This error occurs when synchronization has not completed after resizing the RAID LV. Check the RAID LV with watch sudo lvs foo/rr and wait for the Cpy%Sync field to reach 100%. Once Cpy%Sync reaches 100%, use Ctrl-c to exit the watch command. See the lvresize(8), lvextend(8), and lvreduce(8) man pages for more information.

Increase Stripes on RAID LV

Increasing the number of stripes on a RAID LV increases its overall capacity, which is possible with RAID 4/5/6/10. Each additional stripe requires an equal number of unallocated physical volumes (devices) within the volume group.
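
For example, the 10G RAID 5 LV in this lab spreads its data across three stripes of roughly 3.33G each. Adding a fourth stripe of the same size adds another ~3.33G of capacity, which is why the LV grows to about 13.34G later in this section.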

  1. Check which physical volumes (PV) exist in VG foo.

    sudo pvs

    The output shows that /dev/sdb, /dev/sdc, /dev/sdd, and /dev/sde are all associated with VG foo.

  2. Determine if there are any unused physical volumes.

    sudo pvdisplay -m /dev/sd[b-e]

    Example Output:

      --- Physical volume ---
      PV Name               /dev/sdb
      VG Name               foo
      PV Size               50.00 GiB / not usable 4.00 MiB
      Allocatable           yes 
      PE Size               4.00 MiB
      Total PE              12799
      Free PE               11944
      Allocated PE          855
      PV UUID               Q1uEMC-0zL1-dgrA-9rIT-1xrA-Vnfr-2E8tJT
       
      --- Physical Segments ---
      Physical extent 0 to 0:
        Logical volume	/dev/foo/rr_rmeta_0
        Logical extents	0 to 0
      Physical extent 1 to 854:
        Logical volume	/dev/foo/rr_rimage_0
        Logical extents	0 to 853
      Physical extent 855 to 12798:
        FREE
    ...

    The pvdisplay command with the -m option shows the mapping of physical extents to logical volumes and logical extents. The PV /dev/sdb in the example output shows physical extents associated with the RAID LV. The same should appear for /dev/sdc, /dev/sdd, and /dev/sde.

  3. Add another PV to the VG.

    Because the existing RAID LV uses all the existing physical volumes, add /dev/sdf to the VG foo.

    sudo vgextend foo /dev/sdf

    The output shows the vgextend command converts /dev/sdf to a PV before adding it to the VG foo.

  4. Add a stripe to the RAID LV.

    sudo lvconvert --stripes 4 foo/rr

    Respond with y to the prompt.

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvconvert --stripes 4 foo/rr
      Using default stripe size 64.00 KiB.
      WARNING: Adding stripes to active and open logical volume foo/rr will grow it from 2562 to 3416 extents!
      Run "lvresize -l2562 foo/rr" to shrink it or use the additional capacity.
      Are you sure you want to add 1 images to raid5 LV foo/rr? [y/n]: y
      Logical volume foo/rr successfully converted.
  5. Verify the LV's new size.

    sudo lvs foo/rr

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvs foo/rr
      LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      rr   foo rwi-aor--- 13.34g                                    2.24          

    The capacity (LSize) grew by 3.34g, and the synchronization (Cpy%Sync) began. Synchronization is the process that makes all the devices in a RAID LV consistent with each other, and a full sync becomes necessary when devices in the RAID LV are modified or replaced.

  6. Check the status of the synchronization.

    Run the check until the progress reaches 100%.

    watch sudo lvs foo/rr

    Once Cpy%Sync reaches 100%, use Ctrl-c to exit the watch command.

    Other ways to use the watch command include:

    • Run watch -n 5 sudo lvs foo/rr to refresh every 5s instead of the default 2s.
    • Run timeout 60 watch -n 5 sudo lvs foo/rr to automatically exit after 60s.
  7. Show the new segment range and PV, which now includes /dev/sdf.

    sudo lvs -a -o lv_name,attr,segtype,seg_pe_ranges,dataoffset foo

Recover a Failed RAID Device in a LV

RAID arrays can continue to run with failed devices. However, removing a device from RAID types other than RAID 1 would require converting to a lower RAID level (from RAID 5 to RAID 0 in this case).

Instead, LVM lets you replace a failed device in a RAID volume in a single step with the lvconvert --repair command, rather than removing the failed drive and manually adding a replacement.
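
For comparison, lvconvert also provides a --replace option for swapping out a device that is still working. A hypothetical example (not run in this lab), assuming the device to retire were /dev/sdd and a spare PV such as /dev/sdg were already part of the VG:

    sudo lvconvert --replace /dev/sdd foo/rr /dev/sdg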

  1. Check the current RAID LV layout.

    sudo lvs --all --options name,copy_percent,devices foo
  2. Simulate a failure on /dev/sdd.

    echo 1 | sudo tee /sys/block/sdd/device/delete
  3. After failure, recheck the RAID LV layout.

    sudo lvs --all --options name,copy_percent,devices foo

    Notice the [unknown] devices.

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvs --all --options name,copy_percent,devices foo
      WARNING: Couldn't find device with uuid o1JwCl-DTpi-anww-rYt3-1LCq-vmLV-FQCKyc.
      WARNING: VG foo is missing PV o1JwCl-DTpi-anww-rYt3-1LCq-vmLV-FQCKyc (last written to /dev/sdd).
      LV            Cpy%Sync Devices                                                                   
      rr            100.00   rr_rimage_0(0),rr_rimage_1(0),rr_rimage_2(0),rr_rimage_3(0),rr_rimage_4(0)
      [rr_rimage_0]          /dev/sdb(855)                                                             
      [rr_rimage_0]          /dev/sdb(1)                                                               
      [rr_rimage_1]          /dev/sdc(855)                                                             
      [rr_rimage_1]          /dev/sdc(1)                                                               
      [rr_rimage_2]          [unknown](855)                                                            
      [rr_rimage_2]          [unknown](1)                                                              
      [rr_rimage_3]          /dev/sde(855)                                                             
      [rr_rimage_3]          /dev/sde(1)                                                               
      [rr_rimage_4]          /dev/sdf(855)                                                             
      [rr_rimage_4]          /dev/sdf(1)                                                               
      [rr_rmeta_0]           /dev/sdb(0)                                                               
      [rr_rmeta_1]           /dev/sdc(0)                                                               
      [rr_rmeta_2]           [unknown](0)                                                              
      [rr_rmeta_3]           /dev/sde(0)                                                               
      [rr_rmeta_4]           /dev/sdf(0)        
  4. Replace the failed device.

    sudo lvconvert --repair foo/rr

    Respond with y to the prompt. The command fails because it cannot find enough free space or a spare device in the VG.

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvconvert --repair foo/rr
      WARNING: Couldn't find device with uuid o1JwCl-DTpi-anww-rYt3-1LCq-vmLV-FQCKyc.
      WARNING: VG foo is missing PV o1JwCl-DTpi-anww-rYt3-1LCq-vmLV-FQCKyc (last written to /dev/sdd).
      WARNING: Couldn't find device with uuid o1JwCl-DTpi-anww-rYt3-1LCq-vmLV-FQCKyc.
    Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
      Insufficient free space: 856 extents needed, but only 0 available
      Failed to replace faulty devices in foo/rr.

    Warning: If the error contains an "Unable to replace devices in foo/rr while it is not in-sync" message, verify that the RAID LV is in sync by running watch sudo lvs foo/rr and confirming Cpy%Sync is 100%. Then try the lvconvert command again.

  5. Add the device /dev/sdg to the VG.

    sudo vgextend foo /dev/sdg

    The WARNING messages in the output appear because the failed drive is still missing from the VG.

  6. Retry replacing the failed drive.

    sudo lvconvert --repair foo/rr

    Respond again with y to the prompt. The output again shows the WARNING messages about the missing drive, but this time the command successfully replaces the faulty device.

  7. Examine the layout.

    sudo lvs --all --options name,copy_percent,devices foo

    Notice /dev/sdg replaced all the [unknown] device entries.

  8. Remove the failed device from the VG.

    LVM utilities will continue reporting that LVM cannot find the failed device until you remove it from the VG.

    sudo vgreduce --removemissing foo

    The WARNING messages in the output appear because the failed drive is still missing from the VG.

  9. Check the RAID synchronization status before proceeding.

    Warning: Proceeding too quickly to the next section may show the following error message:

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvchange --syncaction check foo/rr
      foo/rr state is currently "recover". Unable to switch to "check".

    This error occurs when synchronization has not completed after repairing the RAID LV. Check the RAID LV with watch sudo lvs foo/rr and wait for the Cpy%Sync field to reach 100%.

Check Data Coherency in RAID LV (Scrubbing)

LVM provides the ability to scrub a RAID LV, which reads all the data and parity blocks in an array and checks for coherency.

  1. Initiate a scrub in checking mode.

    sudo lvchange --syncaction check foo/rr
  2. Show the status of the scrubbing action.

    watch sudo lvs -a -o name,raid_sync_action,sync_percent foo/rr

    Example Output:

    [oracle@ol-node01 ~]$ sudo lvs -a -o name,raid_sync_action,sync_percent foo/rr
      LV   SyncAction Cpy%Sync
      rr   check      30.08   
  3. After scrubbing (synchronization) is complete, display the number of inconsistent blocks found.

    sudo lvs -o +raid_sync_action,raid_mismatch_count foo/rr

    The raid_sync_action option displays the SyncAction field with one of the following values:

    • idle: All sync actions complete.
    • resync: Initializing or recovering after a system failure.
    • recover: Replacing a device in the array.
    • check: Looking for differences.
    • repair: Looking for and repairing differences.

    Example Output:

    [oracle@ol-node01 ~]$ lvs -o +raid_sync_action,raid_mismatch_count foo/rr
      LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert SyncAction Mismatches
      rr   foo rwi-aor--- 13.34g                                    44.42            check               0

    The output shows 0 inconsistencies (Mismatches).

  4. (Optional) Fix the differences in the array.

    This step is optional because no differences are likely to exist in this sample array.

    sudo lvchange --syncaction repair foo/rr
  5. (Optional) Check the status of the repair.

    sudo lvs -o +raid_sync_action,raid_mismatch_count foo/rr

    Notice the SyncAction field changed to repair. See the lvchange(8) and lvmraid(7) man pages for more information.

Next Steps

You should now be able to create and resize a RAID logical volume while also being able to recover a failed RAID device. Check out our other content on the Oracle Linux Training Station.
