LVM thin pool problem: xfs_repair is always needed to bring the operating system up
What problem/issue/behavior are you having trouble with? What do you expect to see?
- Can you give a solution for this?
- Is this issue related to LVM thin provisioning? If so, what is your suggestion?
- Or is this a bug in RHEL 7.6?
Where are you experiencing the behavior? What environment?
The hang is gone.
When does the behavior occur? Frequency? Repeatedly? At certain times?
I think the issue occurs about every week.
What information can you provide around timeframes and the business impact?
It affects promoting the server to production.
o Server details:
System:
Mfr: VMware, Inc.
Prod: VMware Virtual Platform
o OS details
Hostname: idcbpjnksapp001
Distro: [redhat-release] Red Hat Enterprise Linux Server release 7.6 (Maipo)
Booted kernel: 3.10.0-957.el7.x86_64
GRUB default: 3.10.0-957.el7.x86_64
o Logs
Before reboots there are error messages related to 'dm-3'
Jun 7 21:11:48 idcbpjnksapp001 kernel: buffer_io_error: 4856 callbacks suppressed
Jun 7 21:11:48 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 9897400, lost async page write
Jun 7 21:11:48 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 9897401, lost async page write
Jun 7 21:11:48 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 9897402, lost async page write
[..]
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS: Failing async write: 4490 callbacks suppressed
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3e0. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3c0. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3e0. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3c0. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3e0. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3c0. Retrying async write.
Jun 8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun 8 09:16:48 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0x4b82db8 ("xfs_buf_iodone_callback_error") error 5 numblks 8
Jun 8 09:16:53 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0x4b82db8 ("xfs_buf_iodone_callback_error") error 5 numblks 8
[...Reboot...]
Jun 8 09:21:01 idcbpjnksapp001 kernel: Linux version 3.10.0-957.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Oct 4 20:48:51 UTC 2018
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/root: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/swap: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/home: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/var: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/root: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/swap: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/home: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/var: Thin's thin-pool needs inspection.
Jun 12 19:20:23 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0xc2710 ("xfs_buf_iodone_callback_error") error 5 numblks 8
Jun 12 19:20:28 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0xc2710 ("xfs_buf_iodone_callback_error") error 5 numblks 8
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS: Failing async write: 2989 callbacks suppressed
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:20 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0xc2710 ("xfs_buf_iodone_callback_error") error 5 numblks 8
[...Reboot...]
Jun 12 20:14:29 idcbpjnksapp001 kernel: Linux version 3.10.0-957.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Oct 4 20:48:51 UTC 2018
o Latest logs
Jun 12 19:19:56 idcbpjnksapp001 kernel: buffer_io_error: 323 callbacks suppressed
Jun 12 19:19:56 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 77082, lost async page write <---
Jun 12 19:19:56 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 77083, lost async page write
Jun 12 19:19:56 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 77084, lost async page write
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: No free metadata blocks
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: 253:2: switching pool to read-only mode <---
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: 253:2: metadata operation 'dm_pool_commit_metadata' failed: error = -1
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: 253:2: aborting current metadata transaction <---
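The kernel messages above are the key symptom: the pool ran out of metadata blocks and was switched to read-only mode. As a minimal sketch for spotting this in other logs, the helper below counts such lines in a syslog-style file. The function name `thin_meta_events` is hypothetical; the patterns are copied from the messages quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count thin-pool metadata exhaustion events in a log file.
# $1 = path to a syslog-style file (for example /var/log/messages).
# Matches the two kernel messages seen in this case:
#   "device-mapper: thin: No free metadata blocks"
#   "device-mapper: thin: <maj:min>: switching pool to read-only mode"
thin_meta_events() {
    logfile="$1"
    grep -c -E 'device-mapper: thin: (No free metadata blocks|.*switching pool to read-only mode)' "$logfile"
}
```

Example use on a live system or an extracted sosreport: `thin_meta_events /var/log/messages`. A non-zero count suggests checking the pool's Meta% with lvs.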
o LVM status
$ cat sos_commands/lvm2/lvs_-a_-o_lv_tags_devices_--config_global_locking_type_0
WARNING: Locking disabled. Be careful! This could corrupt your metadata.
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert LV Tags Devices
home rhel Vwi-aotz-- 8.00g pool00 84.52
[lvol0_pmspare] rhel ewi------- 28.00m /dev/sda3(0)
pool00 rhel twi-cotzM- 91.55g 55.14 100.00 pool00_tdata(0) <--- Thin pool metadata is full (Meta% 100.00). Attr: (t)hin pool, (w)ritable, (i)nherited, -, (c)heck needed, (o)pen, (t)hin, (z)ero, (M)etadata read only
[pool00_tdata] rhel Twi-ao---- 91.55g /dev/sda3(7)
[pool00_tmeta] rhel ewi-ao---- 28.00m /dev/sda3(23445)
root rhel Vwi-aotz-- <49.95g pool00 52.15
swap rhel Vwi-aotz-- 25.60g pool00 29.82
var rhel Vwi-aotz-- 10.00g pool00 100.00 <--- Thin LV fully allocated (Data% 100.00)
u01lv u01vg -wi-ao---- 249.00g /dev/sdb(0)
u01lv u01vg -wi-ao---- 249.00g /dev/sdd(0)
u02lv u02vg -wi-ao---- 190.25g /dev/sdc(0)
$ cat sos_commands/lvm2/pvs_-a_-v_-o_pv_mda_free_pv_mda_size_pv_mda_count_pv_mda_used_count_pe_start_--config_global_locking_type_0
Reloading config files
WARNING: Locking disabled. Be careful! This could corrupt your metadata.
WARNING: /dev/rhel/root: Thin's thin-pool needs inspection.
WARNING: /dev/rhel/swap: Thin's thin-pool needs inspection.
WARNING: /dev/rhel/home: Thin's thin-pool needs inspection.
WARNING: /dev/rhel/var: Thin's thin-pool needs inspection.
PV VG Fmt Attr PSize PFree DevSize PV UUID PMdaFree PMdaSize #PMda #PMdaUse 1st PE
/dev/sda1 --- 0 0 2.00m 0 0 0 0 0
/dev/sda2 --- 0 0 500.00m 0 0 0 0 0
/dev/sda3 rhel lvm2 a-- <114.52g 22.91g 114.52g AsIdPe-ohEQ-w0po-yR1A-Y1im-CgJX-mBbn2Y 0 1020.00k 1 1 1.00m <-- rhel VG still has 22+g space
/dev/sdb u01vg lvm2 a-- 49.75g 0 50.00g B5CWf8-yBdU-0KFz-GR6E-GFhj-Ft7x-Yf5oNM 0 1020.00k 1 1 1.00m
/dev/sdc u02vg lvm2 a-- 199.75g 9.50g 200.00g hjqcqe-vMNI-STUH-vAz1-4e2t-Pv7G-zkCzNz 0 1020.00k 1 1 1.00m
/dev/sdd u01vg lvm2 a-- 199.75g 512.00m 200.00g ud9IlR-1nZT-kzi5-S1dD-v8e5-HYYi-7IaDF5 0 1020.00k 1 1 1.00m
/dev/u01vg/u01lv --- 0 0 249.00g 0 0 0 0 0
/dev/u02vg/u02lv --- 0 0 190.25g 0 0 0 0 0
Reloading config files
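The lvs output above shows the condition worth monitoring: Meta% of pool00 and Data% of 'var' both reached 100.00. A minimal sketch of a usage check follows. It reads comma-separated `lv_name,data_percent,metadata_percent` lines on stdin and prints the LVs at or above a threshold. The function name `thin_usage_alerts` and the default threshold of 80 are assumptions for illustration, not values from this case.

```shell
#!/bin/sh
# Hypothetical helper: list LVs whose Data% or Meta% is at or above a threshold.
# stdin: lines "lv_name,data_percent,metadata_percent" (leading blanks allowed)
# $1   : threshold in percent (assumed default: 80)
thin_usage_alerts() {
    threshold="${1:-80}"
    awk -F, -v t="$threshold" '
        {
            gsub(/^[ \t]+/, "", $1)          # lvs indents its output
            data = $2; meta = $3
            # Empty fields mean the LV is not thin, so skip them.
            if ((data != "" && data + 0 >= t + 0) ||
                (meta != "" && meta + 0 >= t + 0))
                print $1
        }'
}
```

It could be fed from a command such as `lvs --noheadings --separator , -o lv_name,data_percent,metadata_percent rhel | thin_usage_alerts 80`. With the values shown above, home, pool00 and var would be reported.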
o df status of 'var'
/dev/mapper/rhel-var 10475520 6944108 3531412 67% /var
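Note the gap between the two views of 'var': df reports 67% used, while lvs reports Data% 100.00. A thin LV keeps blocks allocated in the pool until the file system discards them, which is why fstrim appears in the action plan. The sketch below gives a rough upper bound on the space involved. The function name `thin_reclaimable_kib` is hypothetical, and the result ignores file system overhead such as the XFS log.

```shell
#!/bin/sh
# Hypothetical helper: rough upper bound (KiB) on pool space held by a thin LV
# beyond what the file system reports as used.
#   $1 = thin LV size in KiB
#   $2 = Data% of the thin LV, from lvs
#   $3 = "Used" column from df, in KiB
thin_reclaimable_kib() {
    awk -v size="$1" -v pct="$2" -v used="$3" 'BEGIN {
        allocated = size * pct / 100      # KiB currently allocated in the pool
        diff = allocated - used
        if (diff < 0) diff = 0            # never report a negative amount
        printf "%d\n", diff
    }'
}
```

For 'var' (10.00g = 10485760 KiB, Data% 100.00, 6944108 KiB used), `thin_reclaimable_kib 10485760 100.00 6944108` prints 3541652, about 3.4 GiB that fstrim may be able to return to the pool.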
Action Plan:
-----------
[1] Extend the thin pool metadata LV (tmeta)
# lvextend --poolmetadatasize +1000M rhel/pool00
# lvs -ao+devices
[2] Then do one of the following:
[A] Run fstrim to discard unused blocks on the mounted file system
# fstrim /var
OR
[B] Extend the thin LV 'var' if you want to give /var more than 10g
[a] Run the following command against thin LV 'var'
# lvextend -L+100M rhel/var
[b] Grow the XFS file system on '/var', see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/storage_administration_guide/xfsgrow
[c] Check status
# lvs -ao+devices
[3] If the "M" (metadata read only) attribute is still shown for pool00 after extending the metadata, refresh the pool:
pool00 rhel twi-cotzM- 91.55g 55.14 100.00 pool00_tdata(0)
# lvchange --refresh rhel/pool00
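The steps above can be sketched as one script. This is a minimal sketch and not a tested procedure: it follows option [A] (fstrim) rather than [B], the function names `run` and `fix_thin_pool` are hypothetical, and it defaults to a dry run that only prints the commands. The VG, pool and mount point names are the ones from this case.

```shell
#!/bin/sh
# Sketch of the action plan. With DRY_RUN=1 (the assumed default) commands are
# only printed; set DRY_RUN=0 to execute them as root.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = volume group, $2 = thin pool, $3 = mount point to trim
fix_thin_pool() {
    vg="$1"; pool="$2"; mnt="$3"
    run lvextend --poolmetadatasize +1000M "$vg/$pool"   # step [1]
    run fstrim -v "$mnt"                                 # step [2][A]
    run lvchange --refresh "$vg/$pool"                   # step [3], only needed if "M" persists
    run lvs -a -o +devices "$vg"                         # check the result
}
```

Running `fix_thin_pool rhel pool00 /var` with the default dry run prints the four commands so they can be reviewed before anything is changed.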