
KVM HDD Issues

Munzy

Active Member
Alright, so I am getting this in my syslog....


[75120.290408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[75120.291355] rs:main Q:Reg D ffff88001fd12780 0 1880 1 0x00000000
[75120.291358] ffff88001f0ff8c0 0000000000000086 ffff880000000000 ffff88001dd5a180
[75120.291361] 0000000000012780 ffff88001c627fd8 ffff88001c627fd8 ffff88001f0ff8c0
[75120.291364] 0000000000000246 0000000181350ef1 ffff88001ffc5cd8 ffff88000d9d8438
[75120.291367] Call Trace:
[75120.291371] [<ffffffffa00fa672>] ? do_get_write_access+0x1ad/0x36a [jbd2]
[75120.291374] [<ffffffff8105fe2d>] ? autoremove_wake_function+0x2a/0x2a
[75120.291382] [<ffffffffa011363b>] ? ext4_dirty_inode+0x2a/0x45 [ext4]
[75120.291386] [<ffffffffa00fa923>] ? jbd2_journal_get_write_access+0x21/0x38 [jbd2]
[75120.291394] [<ffffffffa013339b>] ? __ext4_journal_get_write_access+0x4f/0x5e [ext4]
[75120.291400] [<ffffffffa0111bee>] ? ext4_reserve_inode_write+0x37/0x7a [ext4]
[75120.291403] [<ffffffff8102bb5c>] ? pvclock_clocksource_read+0x42/0xb2
[75120.291408] [<ffffffffa0111c99>] ? ext4_mark_inode_dirty+0x68/0x1da [ext4]
[75120.291414] [<ffffffffa0113625>] ? ext4_dirty_inode+0x14/0x45 [ext4]
[75120.291421] [<ffffffffa0128791>] ? ext4_journal_start_sb+0x139/0x14f [ext4]
[75120.291427] [<ffffffffa011363b>] ? ext4_dirty_inode+0x2a/0x45 [ext4]
[75120.291433] [<ffffffffa0113611>] ? ext4_evict_inode+0x2a6/0x2a6 [ext4]
[75120.291436] [<ffffffff81117f03>] ? __mark_inode_dirty+0x22/0x17a
[75120.291439] [<ffffffff8110d4dd>] ? file_update_time+0xda/0x105
[75120.291442] [<ffffffff810b645c>] ? __generic_file_aio_write+0x160/0x278
[75120.291445] [<ffffffff8106d6ca>] ? futex_wait_queue_me+0xba/0xd5
[75120.291448] [<ffffffff81036638>] ? should_resched+0x5/0x23
[75120.291450] [<ffffffff810b65d1>] ? generic_file_aio_write+0x5d/0xb5
[75120.291455] [<ffffffffa010bbd4>] ? ext4_file_write+0x1e1/0x235 [ext4]
[75120.291458] [<ffffffff8106e4fb>] ? futex_wake+0xe9/0xfb
[75120.291461] [<ffffffff810face8>] ? do_sync_write+0xb4/0xec
[75120.291463] [<ffffffff8106f622>] ? do_futex+0xd7/0x80c
[75120.291466] [<ffffffff8102bb5c>] ? pvclock_clocksource_read+0x42/0xb2
[75120.291469] [<ffffffff811654b9>] ? security_file_permission+0x16/0x2d
[75120.291471] [<ffffffff810fb3d9>] ? vfs_write+0xa2/0xe9
[75120.291473] [<ffffffff810fb5b6>] ? sys_write+0x45/0x6b
[75120.291476] [<ffffffff81355f92>] ? system_call_fastpath+0x16/0x1b


During these periods I can't even cd into /root/, and the load spikes to massive levels.
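In case it helps, a quick way to confirm the spike is I/O wait rather than CPU is to look for processes stuck in uninterruptible sleep (D state, the same state the trace above shows) and to watch the iowait column while the hang is going on. Nothing here is specific to this box, just stock tools (iostat needs the sysstat package):

Code:
# Processes stuck in uninterruptible sleep (D state) during the hang
ps -eo state,pid,comm,wchan | awk '$1 == "D"'

# Watch blocked processes ("b") and iowait ("wa") once a second
vmstat 1 10

# Per-device await/utilisation, if sysstat is installed
iostat -x 1 5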

I asked the company if they could check the RAID / HDD for any issues, and they responded with:


Our internal monitoring hasn't notified us of any issues with the array; I have checked manually and see nothing wrong on our end.

We can see our test VM on the node is running perfectly as well (albeit it has nothing running on it).

I really think it is something on their end, but I'd like a little confirmation that I'm not being an idiot.
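If it helps convince anyone, a small direct-I/O test from inside the VM gives the provider hard numbers instead of just the traces. This is only a rough sketch; ioping may need installing first, and the dd file path is just an example:

Code:
# Direct write that bypasses the page cache; multi-second results during
# a hang point at the storage underneath, not the guest
dd if=/dev/zero of=/root/ddtest bs=1M count=256 oflag=direct
rm -f /root/ddtest

# Per-request latency, if ioping is available (apt-get install ioping)
ioping -c 20 /root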

Thanks
 

Munzy

Active Member
Code:
tune2fs -l /dev/vda1
tune2fs 1.42.5 (29-Jul-2012)

Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          9b88c5bf-51af-4705-b265-bdd8bffdd34d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              3217536
Block count:              12845056
Reserved block count:     642265
Free blocks:              12426062
Free inodes:              3191269
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1020
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8208
Inode blocks per group:   513
Flex block group size:    16
Filesystem created:       Mon Apr 20 09:45:58 2015
Last mount time:          Tue Apr 21 15:40:30 2015
Last write time:          Tue Apr 21 15:40:30 2015
Mount count:              1
Maximum mount count:      -1
Last checked:             Tue Apr 21 15:40:30 2015
Check interval:           0 (<none>)
Lifetime writes:          3465 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      6de64466-cbd4-4912-b0db-b8d2ef9e9236
Journal backup:           inode blocks
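Since the guest only sees a virtio disk, there's no SMART data visible from inside the VM, so about the only other guest-side evidence is whatever the kernel and ext4 have logged against vda. Rough sketch:

Code:
# Block-layer or hung-task messages the guest kernel logged against the disk
dmesg | grep -iE 'vda|i/o error|blocked for more'

# Error-related fields ext4 keeps in the superblock
tune2fs -l /dev/vda1 | grep -i error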
 

Munzy

Active Member
Do an fsck scan on your server. Set your system to run it on your next reboot.
I did.

Code:
Log of fsck -C -f -a -t ext4 /dev/vda1
Tue Apr 21 15:40:30 2015

fsck from util-linux 2.20.1
/dev/vda1: 26267/3217536 files (0.3% non-contiguous), 418994/12845056 blocks

Tue Apr 21 15:40:30 2015
----------------
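For anyone else following the "run it on your next reboot" advice: on a sysvinit-era Debian/Ubuntu install it can usually be scheduled without rescue media, roughly like this (systemd boxes use the fsck.mode=force kernel parameter instead):

Code:
# Force a check of all filesystems on the next boot (sysvinit-era Debian/Ubuntu)
touch /forcefsck
reboot

# Or lower the mount-count threshold so e2fsck runs automatically next mount
tune2fs -c 1 /dev/vda1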
 

Munzy

Active Member
... and the answer is: a faulty LSI RAID controller will cause a lot of fun things like the above to happen.
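For completeness, the checks that matter here are on the host, not in the guest. Assuming the node uses an LSI/MegaRAID card with MegaCli installed (storcli has equivalents), the provider-side commands would look roughly like:

Code:
# Virtual drive state and BBU health
MegaCli -LDInfo -Lall -aALL
MegaCli -AdpBbuCmd -GetBbuStatus -aALL

# Per-disk media error / predictive failure counters and firmware state
MegaCli -PDList -aALL | grep -iE 'slot|error|failure|state'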
 