LVM-Thin problem: xfs_repair is always needed to bring up the operating system

What problem/issue/behavior are you having trouble with?  What do you expect to see?
- Can you provide a solution for this?
- Is this issue related to LVM thin provisioning? If so, what do you suggest?
- Or is this a bug in RHEL 7.6?

Where are you experiencing the behavior? What environment?
The hang is gone.

When does the behavior occur? Frequency? Repeatedly? At certain times?
I think the issue occurs about once a week.

What information can you provide around timeframes and the business impact?
It affects promotion to the production server.

o Server details:

 System:
    Mfr:  VMware, Inc.
    Prod: VMware Virtual Platform


o OS details
  Hostname: idcbpjnksapp001
  Distro:   [redhat-release] Red Hat Enterprise Linux Server release 7.6 (Maipo)
    Booted kernel:  3.10.0-957.el7.x86_64
    GRUB default:   3.10.0-957.el7.x86_64  

o Logs

Before the reboots, there are error messages related to 'dm-3':

Jun  7 21:11:48 idcbpjnksapp001 kernel: buffer_io_error: 4856 callbacks suppressed
Jun  7 21:11:48 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 9897400, lost async page write
Jun  7 21:11:48 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 9897401, lost async page write
Jun  7 21:11:48 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 9897402, lost async page write

[..]

Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS: Failing async write: 4490 callbacks suppressed
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3e0. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3c0. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3e0. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3c0. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3e0. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b8d3c0. Retrying async write.
Jun  8 09:16:45 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0x4b82db8. Retrying async write.
Jun  8 09:16:48 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0x4b82db8 ("xfs_buf_iodone_callback_error") error 5 numblks 8
Jun  8 09:16:53 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0x4b82db8 ("xfs_buf_iodone_callback_error") error 5 numblks 8

[...Reboot...]

Jun  8 09:21:01 idcbpjnksapp001 kernel: Linux version 3.10.0-957.el7.x86_64 (mockbuild@x86-040.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Oct 4 20:48:51 UTC 2018


Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/root: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/swap: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/home: Thin's thin-pool needs inspection.
Jun 12 19:19:57 idcbpjnksapp001 container-storage-setup: WARNING: /dev/rhel/var: Thin's thin-pool needs inspection.




Jun 12 19:20:23 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0xc2710 ("xfs_buf_iodone_callback_error") error 5 numblks 8
Jun 12 19:20:28 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0xc2710 ("xfs_buf_iodone_callback_error") error 5 numblks 8



Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS: Failing async write: 2989 callbacks suppressed
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:19 idcbpjnksapp001 kernel: XFS (dm-3): Failing async write on buffer block 0xc2710. Retrying async write.
Jun 12 20:12:20 idcbpjnksapp001 kernel: XFS (dm-3): metadata I/O error: block 0xc2710 ("xfs_buf_iodone_callback_error") error 5 numblks 8

[...Reboot...]

Jun 12 20:14:29 idcbpjnksapp001 kernel: Linux version 3.10.0-957.el7.x86_64 (mockbuild@x86-040.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Oct 4 20:48:51 UTC 2018
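The kernel names the failing device only as 'dm-3', and the "error 5" in the XFS messages is EIO. A minimal sketch, assuming bash and the standard sysfs layout, that maps a dm-N name to its device-mapper name by reading /sys/block/dm-N/dm/name; `dmsetup info -c` or `ls -l /dev/mapper` show the same mapping. The function name `dm_to_lv` and its optional sysfs-root argument are hypothetical, added only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: resolve a kernel "dm-N" name (as printed in the I/O errors above)
# to its device-mapper name, e.g. "rhel-<lv>".
# The optional second argument (sysfs root) is a hypothetical knob so the
# function can be exercised against a fake sysfs tree.
dm_to_lv() {
    local dev="$1" sysroot="${2:-/sys}"
    local f="$sysroot/block/$dev/dm/name"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown device: $dev" >&2
        return 1
    fi
}
```

Usage on the affected host:

    # dm_to_lv dm-3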


o Latest logs

Jun 12 19:19:56 idcbpjnksapp001 kernel: buffer_io_error: 323 callbacks suppressed
Jun 12 19:19:56 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 77082, lost async page write	<---
Jun 12 19:19:56 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 77083, lost async page write
Jun 12 19:19:56 idcbpjnksapp001 kernel: Buffer I/O error on dev dm-3, logical block 77084, lost async page write


Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: No free metadata blocks
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: 253:2: switching pool to read-only mode		<---
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: 253:2: metadata operation 'dm_pool_commit_metadata' failed: error = -1
Jun 12 20:21:33 idcbpjnksapp001 kernel: device-mapper: thin: 253:2: aborting current metadata transaction	<---
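These device-mapper messages say that pool 253:2 ran out of metadata blocks and switched to read-only mode, which is consistent with the write failures on the thin LVs. A sketch, assuming the kernel log is available via `journalctl -k` or /var/log/messages, that filters for exactly the three thin-pool messages quoted above. The function name `thin_pool_events` is hypothetical, and the pattern is not an exhaustive list of dm-thin errors.

```shell
#!/usr/bin/env bash
# Sketch: read kernel log lines on stdin and keep only the dm-thin messages
# seen in this case (metadata exhausted, pool read-only, transaction aborted).
thin_pool_events() {
    grep -E 'device-mapper: thin: .*(No free metadata blocks|switching pool to read-only mode|aborting current metadata transaction)'
}
```

Usage:

    # journalctl -k | thin_pool_events
    # thin_pool_events < /var/log/messages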


o LVM status

$ cat sos_commands/lvm2/lvs_-a_-o_lv_tags_devices_--config_global_locking_type_0 
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  LV              VG    Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert LV Tags Devices         
  home            rhel  Vwi-aotz--   8.00g pool00        84.52                                                           
  [lvol0_pmspare] rhel  ewi-------  28.00m                                                               /dev/sda3(0)    
  pool00          rhel  twi-cotzM-  91.55g               55.14  100.00                                   pool00_tdata(0) <--- Metadata 100% full; attr: (t)hin pool, (w)ritable, (i)nherited, -, thin-pool (c)heck needed, (o)pen, (t)hin, (z)eroed, (M)etadata read only
  [pool00_tdata]  rhel  Twi-ao----  91.55g                                                               /dev/sda3(7)    
  [pool00_tmeta]  rhel  ewi-ao----  28.00m                                                               /dev/sda3(23445)
  root            rhel  Vwi-aotz-- <49.95g pool00        52.15                                                           
  swap            rhel  Vwi-aotz--  25.60g pool00        29.82                                                           
  var             rhel  Vwi-aotz--  10.00g pool00        100.00                                                          <--- Full
  u01lv           u01vg -wi-ao---- 249.00g                                                               /dev/sdb(0)     
  u01lv           u01vg -wi-ao---- 249.00g                                                               /dev/sdd(0)     
  u02lv           u02vg -wi-ao---- 190.25g                                                               /dev/sdc(0)  
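The Attr column packs several flags into one string. A sketch, assuming bash, that decodes only the characters relevant to this case, following the lv_attr table in lvs(8); `describe_pool_attr` is a hypothetical helper and the other characters are ignored.

```shell
#!/usr/bin/env bash
# Sketch: explain the lv_attr flags of a thin pool, e.g. "twi-cotzM-".
# Position 1 = volume type, 5 = state, 6 = open, 9 = volume health (lvs(8)).
describe_pool_attr() {
    local a="$1"
    [ "${#a}" -ge 9 ] || { echo "attr too short: $a" >&2; return 1; }
    [ "${a:0:1}" = "t" ] && echo "type: thin pool"
    case "${a:4:1}" in
        a) echo "state: active" ;;
        c) echo "state: thin-pool check needed" ;;
        C) echo "state: suspended thin-pool, check needed" ;;
    esac
    [ "${a:5:1}" = "o" ] && echo "device open"
    case "${a:8:1}" in
        M) echo "health: metadata read only" ;;
        D) echo "health: out of data space" ;;
        F) echo "health: failed" ;;
        -) echo "health: no problem flagged" ;;
    esac
}
```

Usage:

    # describe_pool_attr "$(lvs --noheadings -o lv_attr rhel/pool00 | tr -d ' ')"

For "twi-cotzM-" this prints the pool type, "check needed", "device open" and "metadata read only".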

$ cat sos_commands/lvm2/pvs_-a_-v_-o_pv_mda_free_pv_mda_size_pv_mda_count_pv_mda_used_count_pe_start_--config_global_locking_type_0 
    Reloading config files
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  WARNING: /dev/rhel/root: Thin's thin-pool needs inspection.
  WARNING: /dev/rhel/swap: Thin's thin-pool needs inspection.
  WARNING: /dev/rhel/home: Thin's thin-pool needs inspection.
  WARNING: /dev/rhel/var: Thin's thin-pool needs inspection.
  PV               VG    Fmt  Attr PSize    PFree   DevSize PV UUID                                PMdaFree  PMdaSize  #PMda #PMdaUse 1st PE 
  /dev/sda1                   ---        0       0    2.00m                                               0         0      0        0      0 
  /dev/sda2                   ---        0       0  500.00m                                               0         0      0        0      0 
  /dev/sda3        rhel  lvm2 a--  <114.52g  22.91g 114.52g AsIdPe-ohEQ-w0po-yR1A-Y1im-CgJX-mBbn2Y        0   1020.00k     1        1   1.00m <-- rhel VG still has 22.91g free
  /dev/sdb         u01vg lvm2 a--    49.75g      0   50.00g B5CWf8-yBdU-0KFz-GR6E-GFhj-Ft7x-Yf5oNM        0   1020.00k     1        1   1.00m
  /dev/sdc         u02vg lvm2 a--   199.75g   9.50g 200.00g hjqcqe-vMNI-STUH-vAz1-4e2t-Pv7G-zkCzNz        0   1020.00k     1        1   1.00m
  /dev/sdd         u01vg lvm2 a--   199.75g 512.00m 200.00g ud9IlR-1nZT-kzi5-S1dD-v8e5-HYYi-7IaDF5        0   1020.00k     1        1   1.00m
  /dev/u01vg/u01lv            ---        0       0  249.00g                                               0         0      0        0      0 
  /dev/u02vg/u02lv            ---        0       0  190.25g                                               0         0      0        0      0 
    Reloading config files


o df status of 'var'

Filesystem              1K-blocks      Used Available Use% Mounted on
/dev/mapper/rhel-var     10475520   6944108   3531412  67% /var
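lvs reports the 'var' thin LV as 100.00% mapped (10.00g), while df shows only 6944108 KiB in use. The gap is roughly what fstrim could hand back to the pool: 10 GiB = 10485760 KiB, and 10485760 - 6944108 = 3541652 KiB, about 3.4 GiB. A sketch of that arithmetic, assuming bash integer math (so Data% is passed as a whole number); `reclaimable_kib` is a hypothetical helper, and the result is only a rough upper bound because chunk granularity and the XFS log are ignored.

```shell
#!/usr/bin/env bash
# Sketch: estimate mapped-but-unused space in a thin LV, in KiB.
# $1 = LV size (KiB), $2 = lvs Data% as an integer, $3 = df "Used" (KiB).
# Rough upper bound: partially used pool chunks cannot be discarded.
reclaimable_kib() {
    local lv_kib="$1" data_pct="$2" fs_used_kib="$3"
    local mapped=$(( lv_kib * data_pct / 100 ))
    local r=$(( mapped - fs_used_kib ))
    (( r > 0 )) || r=0
    echo "$r"
}
```

Usage with the values above:

    # reclaimable_kib 10485760 100 6944108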


Action Plan:
-----------


[1] Extend the thin pool metadata LV (pool00_tmeta), which is 100% full

    # lvextend --poolmetadatasize +1000M rhel/pool00
    # lvs -ao+devices
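Before step [1], it is worth confirming that the rhel VG has room; pvs above shows 22.91g free on /dev/sda3, roughly 23459.84 MiB. A sketch, assuming bash and awk, that compares the VG free space against the requested extension. `enough_vg_space` is hypothetical, and the default factor of 2 is an assumption that LVM may also grow lvol0_pmspare to match the new metadata size.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the VG free space covers the metadata extension.
# $1 = VG free space in MiB (may be fractional), $2 = requested MiB,
# $3 = safety factor (default 2, assuming the pmspare LV may grow too).
enough_vg_space() {
    local free_mib="$1" want_mib="$2" factor="${3:-2}"
    awk -v f="$free_mib" -v w="$want_mib" -v k="$factor" \
        'BEGIN { exit !(f >= w * k) }'
}
```

Usage:

    # free=$(vgs --noheadings --nosuffix --units m -o vg_free rhel | tr -d ' ')
    # enough_vg_space "$free" 1000 && lvextend --poolmetadatasize +1000M rhel/pool00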

[2] Then free up or add space for 'var' using one of the following:

    [A] Run fstrim to discard unused blocks on the mounted filesystem, returning them to the thin pool

        # fstrim /var

    OR

    [B] Extend the thin LV 'var' if you want to give /var more than 10 GiB

      [a] Extend the thin LV 'var':

          # lvextend -L+100M rhel/var
    
      [b] Grow the XFS file system on /var with xfs_growfs; see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/storage_administration_guide/xfsgrow


      [c] Check status

         # lvs -ao+devices
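Both options of step [2] can be sketched as follows, assuming bash and root access. Option [A] trims /var and shows the 'var' Data% before and after; option [B] grows the thin LV and then the XFS filesystem. Growing a thin LV only raises its virtual size, so the pool (91.55g, Data% 55.14) must still have room. `run`, `DRY_RUN`, `trimmed_bytes`, `option_a_trim` and `option_b_grow` are hypothetical names; `lvextend -r` is an alternative that resizes the filesystem in the same command.

```shell
#!/usr/bin/env bash
# Sketch of step [2]. Set DRY_RUN=1 to print commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Extract the byte count from `fstrim -v` output on stdin. Depending on the
# util-linux version the line reads "/var: N bytes were trimmed" or
# "/var: X GiB (N bytes) trimmed"; both contain "<digits> bytes".
trimmed_bytes() {
    grep -oE '[0-9]+ bytes' | head -n 1 | cut -d' ' -f1
}

# Option [A]: discard unused blocks and compare Data% before/after.
option_a_trim() {
    run lvs -o lv_name,data_percent rhel/var
    run fstrim -v /var
    run lvs -o lv_name,data_percent rhel/var
}

# Option [B]: grow the LV, then the filesystem (XFS can grow, not shrink;
# xfs_growfs takes the mount point).
option_b_grow() {
    local size="${1:-+100M}"   # same +100M as in [a] by default
    run lvextend -L"$size" rhel/var || return 1
    run xfs_growfs /var || return 1
    run lvs -ao+devices
}
```

Usage:

    # fstrim -v /var | trimmed_bytes
    # DRY_RUN=1 option_b_grow +1G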

[3] If the "M" attribute is still shown after extending the pool00 metadata:

    pool00          rhel  twi-cotzM-  91.55g               55.14  100.00                                   pool00_tdata(0)

    # lvchange --refresh rhel/pool00
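Step [3] can be sketched as a conditional refresh, assuming bash and root access: read pool00's lv_attr, run the refresh only while the 9th character (volume health) is still "M", then re-read it. The helper names are hypothetical; if the flag persists, the sketch only reports it and returns non-zero.

```shell
#!/usr/bin/env bash
# Sketch: succeed if an lv_attr string shows (M)etadata read only,
# i.e. the 9th character, as in "twi-cotzM-".
metadata_read_only() {
    [ "${1:8:1}" = "M" ]
}

refresh_pool_if_needed() {
    local attr
    attr=$(lvs --noheadings -o lv_attr rhel/pool00 | tr -d ' ')
    if metadata_read_only "$attr"; then
        lvchange --refresh rhel/pool00
        # Re-read the flags to see whether the refresh cleared "M".
        attr=$(lvs --noheadings -o lv_attr rhel/pool00 | tr -d ' ')
        if metadata_read_only "$attr"; then
            echo "pool00 still metadata read only: $attr" >&2
            return 1
        fi
    fi
    return 0
}
```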