XFS filesystems

The XFS filesystem may be used as an alternative to ext3 when large disks (>1 TB) and/or large files are needed. XFS filesystems can be up to 8 exabytes in size.

XFS was created by Silicon Graphics in 1993 for their IRIX OS, but is available in Linux as well. See the Wikipedia article on XFS.

XFS documentation

There is an XFS information page containing an XFS_FAQ.

The developers at SGI have a Developer Central Open Source page at http://oss.sgi.com/projects/xfs/.

There are some useful remarks on the page http://www.mythtv.org/wiki/XFS_Filesystem.

XFS on CentOS

XFS is included in the CentOS Extras repository. Please read the CentOS Known Issues regarding XFS restrictions.

You first have to install XFS packages:

yum install xfsprogs

You create an XFS filesystem on a disk device with the command:

mkfs.xfs

See man mkfs.xfs.
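
For example, to create an XFS filesystem directly on a partition (a minimal sketch, assuming a partition named /dev/sdb1 as prepared in the sections below):

mkfs.xfs /dev/sdb1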

The commands to backup and restore XFS filesystems are installed by:

yum install xfsdump

Label disk for large filesystems

Before using the disk a suitable disk label must be created using the parted command. We assume below that the disk device is named /dev/sdb.

Create an XFS primary partition (see the parted manual page) with a GPT label on the device:

# parted /dev/sdb
 (parted) mklabel gpt          # Makes a "GPT" label permitting large filesystems
 (parted) print
 (parted) mkpart primary xfs 0 100%   # Allocate 100% of the disk
 (parted) set 1 lvm on         # Set LVM flag
 (parted) quit

Here we have set the LVM flag on the partition. The new disk partition no. 1 will be named /dev/sdb1.
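
The same label and partition can also be created non-interactively (a sketch, assuming the same device /dev/sdb):

parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100% set 1 lvm on
parted -s /dev/sdb print     # verify the result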

Test a disk with errors in the disk label

If the disk label (as shown by parted) is missing or defective due to hardware errors, you can search the disk for backup partition tables.

Install this tool from EPEL:

yum install testdisk

There is a testdisk homepage.
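
To scan a disk interactively for lost partitions (assuming the damaged device is /dev/sdb):

testdisk /dev/sdb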

Use LVM to control disks

Normally the system-config-lvm tool is used to manage LVM disks and filesystems. Unfortunately, this tool does not understand XFS filesystems, so we have to use manual LVM commands.

Initialize the disk for LVM and create a new volume group:

pvcreate /dev/sdb1
vgcreate vgxfs /dev/sdb1

Now create an LVM logical volume using 100% of the disk space:

lvcreate -n lvxfs -l 100%VG vgxfs

Now you have this logical volume available:

# lvdisplay /dev/mapper/vgxfs-lvxfs
--- Logical volume ---
LV Name                /dev/vgxfs/lvxfs
VG Name                vgxfs
LV UUID                yKMgnS-Pe57-0MBV-fHmf-TBHY-mCAd-ZSsLh3
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                6.82 TB
Current LE             1788431
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:3

Adding LVM disks to the system

Useful tools for working with LVM disks added to a running system:

pvscan - scan all disks for physical volumes
pvdisplay - display attributes of a physical volume
vgscan - scan all disks for volume groups and rebuild caches
vgdisplay - display attributes of volume groups
lvscan - scan (all disks) for logical volumes
lvdisplay - display attributes of a logical volume

To activate an LVM volume that has been added to the system, run:

vgchange -a y
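
For example, to activate only the vgxfs volume group from this page:

vgchange -a y vgxfs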

Display the available space on a physical volume:

pvdisplay -s

Striping an LVM volume across multiple disks

If you create a logical volume on multiple physical disks (or disk shelves on a RAID controller), you can stripe the volume across disks for increased performance using the lvcreate flags -i and -I (see man lvcreate). For example, to stripe across 2 disks with stripe size 256 kbytes (must be a power of 2):

lvcreate -n lvxfs -l 100%VG vgxfs -i 2 -I 256

To display striping information about the LVM volumes in a volume group:

lvs --segments vgxfs

Create XFS filesystem

If your disk is a multi-disk RAID device, please read the section on performance optimization for striped volumes below. If you just have a simple disk or a mirrored disk, you can now create a filesystem on the new partition:

# mkfs.xfs /dev/mapper/vgxfs-lvxfs
meta-data=/dev/mapper/vgxfs-lvxfs isize=256    agcount=32, agsize=57229792 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=1831353344, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

Mount the filesystem:

mkdir -p /u3/raid
mount /dev/mapper/vgxfs-lvxfs /u3/raid

Checking an XFS filesystem

You may have to check the sanity of an XFS file system; the filesystem must be unmounted first, and the -n flag means check-only (no modifications are made):

xfs_repair -n /dev/device
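
For example, using the volume from this page:

umount /u3/raid
xfs_repair -n /dev/mapper/vgxfs-lvxfs   # check only, no modifications
xfs_repair /dev/mapper/vgxfs-lvxfs      # repair any problems found
mount /dev/mapper/vgxfs-lvxfs /u3/raid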

Mount large XFS filesystems with inode64 option

XFS allocates inodes to reflect their on-disk location by default. However, because some 32-bit userspace applications are not compatible with inode numbers greater than 2^32, XFS will allocate all inodes in disk locations which result in 32-bit inode numbers. This can lead to decreased performance on very large filesystems (that is, larger than 2 terabytes), because inodes are skewed to the beginning of the block device, while data is skewed towards the end. To address this, use the inode64 mount option. This option configures XFS to allocate inodes and data across the entire file system, which can improve performance:

mount -o inode64 /dev/device /mount/point

See XFS_inode64 and 8.2. Mounting an XFS File System.

Mounting with inode64 may be the solution, but then one cannot revert this mount option later for kernels < 2.6.35!

There are also some warnings about applications and NFS when using XFS_inode64: https://hpc.uni.lu/blog/2014/xfs-and-inode64/
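
To make inode64 persistent across reboots, add it to the mount options in /etc/fstab, for example (a sketch using the volume from this page; heed the warnings above before enabling it):

/dev/mapper/vgxfs-lvxfs /u3/raid  xfs  defaults,inode64,noatime,nodiratime  1 2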

XFS performance optimization

Dell has a Guide on the page Dell HPC NFS Storage Solutions (NSS) which describes system performance tuning of a RHEL server with XFS filesystem.

On RHEL6 the tuned utility may be used to optimize system settings.

  • If relevant, create a striped LVM volume over multiple disk shelves with correct number of stripes and a suitable stripesize, for example:

    lvcreate -n lvxfs -l 100%VG vgxfs -i 2 -I 4096
    
  • Choose XFS stripe units consistent with the underlying disks, for example:

    mkfs.xfs -d su=256k,sw=20 /dev/mapper/vgxfs-lvxfs
    

See man mkfs.xfs about the su and sw parameters, and see below how to determine the stripe size.
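
As a worked example (assuming, for illustration, a RAID-6 set of 22 disks with a controller strip size of 256 kbytes): 2 of the 22 disks hold parity, leaving 20 data disks, so su=256k matches the per-disk strip and sw=20 the number of data disks; a full stripe is then 20 x 256 kbytes = 5120 kbytes.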

Finally, check the filesystem parameters with:

xfs_info /dev/mapper/vgxfs-lvxfs

Stripe size

The logical volume stripe size used by the HP SmartArray controllers may be viewed by the local script:

/root/smartshow -l

Look for Strip size (which may vary), or identify the correct PCI slot of the controller (for example, with lspci) and run:

/usr/sbin/hpacucli controller slot=4 logicaldrive all show detail
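
If the slot number is unknown, the RAID controllers and their PCI addresses may, for example, be listed with:

lspci | grep -i raid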

Mounting XFS volumes

There are some articles on the net discussing performance tuning of the XFS filesystem. It may therefore be a good idea to disable file access time logging (noatime,nodiratime) and increase the number of log buffers (logbufs=X) with the mount options in /etc/fstab:

/dev/mapper/vgxfs-lvxfs /u3/raid  xfs  defaults,quota,noatime,nodiratime,logbufs=8,nosuid,nodev  1 2
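
After editing /etc/fstab, the new options can be applied without a reboot (note that some XFS-specific options may only take effect after a full unmount and mount):

mount -o remount /u3/raid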

XFS quotas

Quotas are administered differently in XFS; see the man xfs_quota section QUOTA ADMINISTRATION. Quotas are enabled by default, provided the filesystem is mounted with the quota option in /etc/fstab:

defaults,quota,...

The xfs_quota administrative commands require the -x expert flag.

To set the default user quotas to 100/120 GB on the /u2/raid disk use the -d flag:

xfs_quota -x -c "limit bsoft=100g bhard=120g isoft=100000 ihard=120000 -d" /u2/raid

To set a specific user’s quota:

xfs_quota -x -c "limit bsoft=240g bhard=300g isoft=100000 ihard=120000 abild" /u2/raid

Note that a user’s quota doesn’t get updated until the user’s files are modified.

To list the current disk quotas in “human” units:

xfs_quota -x -c "report -h" /u2/raid

Extend a Logical Volume and XFS filesystem

To add additional disks to a logical volume the procedure is as follows. We assume that a new disk /dev/sdc1 is available.

Initialize the disk for LVM and add it to the volume group:

pvcreate /dev/sdc1
vgextend vgxfs /dev/sdc1

Now that the Volume Group has available free space, the easiest way to extend the Logical Volume and XFS filesystem is using the GUI:

system-config-lvm

If you prefer to do this with manual commands, here is an example. Add 80% of the free space in the Volume Group (here, the newly added disk) to the previously created Logical Volume:

lvextend -l +80%FREE /dev/mapper/vgxfs-lvxfs

Use the flag lvextend -r to resize the filesystem automatically.
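
For example, to grow the Logical Volume by 80% of the free space and resize the XFS filesystem in one step (instead of running xfs_growfs manually as shown below):

lvextend -r -l +80%FREE /dev/mapper/vgxfs-lvxfs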

The XFS filesystem can now be extended to occupy all of the available disk space added above. Assume that the XFS filesystem is mounted on /u3/raid, then it is extended to occupy all available free disk space by:

# xfs_growfs /u3/raid
meta-data=/dev/mapper/vgxfs-lvxfs isize=256    agcount=32, agsize=28614880 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=915676160, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 915676160 to 1831352320

Shrinking an XFS filesystem?

Shrinking an XFS filesystem is not possible, as explained in the XFS_FAQ entry “Is there a way to make a XFS filesystem larger or smaller?”.

Remove a disk from a volume group

If you want to remove a physical volume (disk) from a volume group, see 4.3. Volume Group Administration, section 4.3.7. Removing Physical Volumes from a Volume Group.

For example, if you have 2 disks D and E in a volume group vgtest, migrate all data from disk D to disk E:

pvmove /dev/sdd1 /dev/sde1

This operation may take several hours!

Make sure that disk D is actually empty (PV Status = available):

pvdisplay /dev/sdd1

Then remove the empty disk D from the volume group vgtest:

vgreduce vgtest /dev/sdd1
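
If the disk is to be reused outside LVM, the LVM label can afterwards be wiped with:

pvremove /dev/sdd1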

Extend an ext3 filesystem

Like an XFS filesystem, an ext3 filesystem can also be extended on the fly. First you must enlarge the logical volume as above. Then you can extend the ext3 filesystem to the full size of the logical volume:

resize2fs -f /dev/VolGroup00/LogVol00

An optional size parameter can also be specified, see the man-page.
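
For example, to grow the filesystem to a hypothetical target size of 500 GB instead of the full volume:

resize2fs -f /dev/VolGroup00/LogVol00 500G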

XFS performance measurement

We can measure the filesystem write performance by creating a file full of zeroes.

Hardware config for this measurement:

  • HP DL380 G5 server with a P800 Smart Array controller (512 MB RAM).

  • MSA60 disk shelf with 12 SATA disks 750 GB in a RAID-6 configuration.

We run this command to create a file of size 10 GB:

time dd if=/dev/zero of=temp bs=1024k count=10240

We obtain these timings from the dd command:

  • XFS filesystem: 10737418240 bytes (11 GB) copied, 55.6954 seconds, 193 MB/s

  • ext3 filesystem: 10737418240 bytes (11 GB) copied, 104.275 seconds, 103 MB/s
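
Note that these timings include the effect of the Linux page cache; forcing a flush of the data before dd exits gives more conservative numbers:

time dd if=/dev/zero of=temp bs=1024k count=10240 conv=fdatasync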

XFS fragmentation

There is some advice on XFS fragmentation. For example the allocsize mount option may be useful.

Run the following command to determine how fragmented your filesystem is:

xfs_db -c frag -r /dev/mapper/vgxfs-lvxfs

For files which are already heavily fragmented, the xfs_fsr command (from the xfsdump package) can be used to defragment individual files, or an entire filesystem:

xfs_fsr /dev/mapper/vgxfs-lvxfs
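
For example, to defragment a single (hypothetical) file with verbose output:

xfs_fsr -v /u3/raid/bigfile.dat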

XFS filesystem gives “No space left on device” errors

This error message may be given even when there is plenty of free disk space. The reason is that all inodes are confined to the first 1 TB of the disk; see this thread, the XFS FAQ on No space left, and the XFS_FAQ on XFS_inode64.

There is a Red Hat Knowledge base article about this.

Mounting with inode64 may be the solution, but then one cannot revert this mount option later for kernels < 2.6.35!

Listing LVM disk segments and stripes

There is a useful LVM summary page: a cheat sheet on LVM using Linux.

To list the LVM logical volumes in one or more physical volumes:

pvdisplay -m

where the -m flag means display the mapping of physical extents to logical volumes and logical extents.

To list the LVM disk segments used:

pvs -v --segments

To list the number of disk segments, as well as the stripes of an LVM volume group, use the lvs command:

# lvs -v --segments vgxfs
Using logical volume(s) on command line
LV     VG    Attr       Start  SSize    #Str Type    Stripe  Chunk
lvxfs2 vgxfs -wi-ao----     0    23.44t    4 striped 512.00k    0
lvxfs2 vgxfs -wi-ao---- 23.44t   12.84t    4 striped 512.00k    0
lvxfs2 vgxfs -wi-ao---- 36.27t    1.34t    4 striped 512.00k    0
lvxfs2 vgxfs -wi-ao---- 37.62t    1.45t    4 striped 512.00k    0
lvxfs3 vgxfs -wi-ao----     0     9.77t    1 linear       0     0
lvxfs4 vgxfs -wi-ao----     0     6.61t    1 linear       0     0
lvxfs4 vgxfs -wi-ao----  6.61t    1.20t    1 linear       0     0
lvxfs4 vgxfs -wi-ao----  7.81t    2.93t    1 linear       0     0
lvxfs4 vgxfs -wi-ao---- 10.74t 1000.00g    1 linear       0     0
lvxfs6 vgxfs -wi-ao----     0     4.88t    4 striped 512.00k    0
lvxfs7 vgxfs -wi-ao----     0     8.43t    1 linear       0     0
lvxfs7 vgxfs -wi-ao----  8.43t    2.31t    1 linear       0     0

This example shows how many physical disk volume stripes are being used by the logical volumes.