
KVM guest with higher CPU than host

libro22

Member
Is it normal for a KVM guest to have much higher CPU usage when the host is pretty much idle?


I have a dedicated server with only one cPanel guest on it, using 8 vCPUs (no other tuning in the virsh XML).
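
For reference, "no other tuning" means the domain XML is basically just the default definition with the vCPU count set, a sketch along these lines:

    <!-- relevant part of the domain XML (virsh edit): only the vCPU count is set,
         no topology, pinning, or CPU model tuning -->
    <vcpu placement='static'>8</vcpu>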


I did a forced upcp update that hogged a lot of resources: roughly 80-95% wait time in the guest for about 5-10 minutes straight, and yes, sites were visibly loading slowly during this time.


I checked the host, and it was 89-95% idle with a comfortable 0% wait time.


As for I/O, it's on a good virtio setup (LVM/raw/native/no-cache): when the host gets 189 MB/s read, the guest usually gets up to 120+ MB/s under low load (software RAID).
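
For anyone curious, the disk definition is the usual LVM-backed virtio block device, roughly this sketch (the volume path is just an example, not my actual LV):

    <disk type='block' device='disk'>
      <!-- raw LVM volume, host page cache bypassed, native AIO -->
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/vg0/guest-root'/>  <!-- illustrative path -->
      <target dev='vda' bus='virtio'/>
    </disk>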


Where might the bottleneck be, and what should I fine-tune next?
 

HalfEatenPie

The Irrational One
Retired Staff
Huh.


That's fairly odd.  Have you checked the I/O during this period of higher load?  That's all I can think of right now.
 

libro22

Member
HalfEatenPie said:

Huh.

That's fairly odd.  Have you checked the I/O during this period of higher load?  That's all I can think of right now.

During recent normal usage, with the load average ranging from 0 to 2, there are just quick spikes from the ext4 journal (noatime is set in fstab) reaching up to 99%, coming from mysql/lsphp processes.


During this recent (forced) cPanel update, the journal seems fine (peaking at a quick ~25% spike) and the disk load seems to come from the update itself (mostly cagefsctl).


A dd benchmark stays in the 100 MB/s range at any point in time.
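
(That figure is from the usual quick dd write test, something along these lines; I'm not claiming these exact flags or paths, just a sketch of the idea:)

    # sequential write test, flushing to disk at the end so cache doesn't inflate the number
    dd if=/dev/zero of=/root/ddtest bs=64k count=16k conv=fdatasync
    rm -f /root/ddtest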


And Munin shows disk utilization maxing out at 100% for /root while /home averages 10-60%.


The 24-hour sar I/O wait maxes out at 25% (I'm quite happy it's going back to normal). But any other heavy process pushes I/O wait high enough that sites begin to slow down, and that is not reflected in the host's load.
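
For reference, those numbers come from checks along these lines, run in both the guest and the host (assuming the sysstat tools are installed):

    # CPU utilization including %iowait, five 1-second samples
    sar -u 1 5
    # per-device extended stats (utilization, await), 5-second samples
    iostat -x 5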
 

libro22

Member
This seems to be a case of MySQL filling up its cache, and possibly KVM over-committing CPU.


Most of my MySQL queries are reads, making up at least 80% of total queries. Right now MySQL is using about 20% of my 16 GB of RAM (no swap) and the server has been stable for at least 48 hours. In previous months (when it was also pretty stable), it was reaching up to 35%.
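
Those figures are based on rough checks like the following (assuming InnoDB; these are standard status/variable names, the values will obviously differ per box):

    -- buffer pool size and how much of it is in use
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';
    -- rough read vs write query mix
    SHOW GLOBAL STATUS LIKE 'Com_select';
    SHOW GLOBAL STATUS LIKE 'Com_insert';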


I also found out that a 25 Mbps I/O limit in CloudLinux LVE can heavily drag down your server.


I'm going to experiment with the vCPU count as well; using 8 vCPUs on an 8-core hyper-threaded node seems more harmful than beneficial. The vCPU count should match the physical cores rather than the HT threads. CPU wait in the guest doesn't match CPU wait on the node (even though there's only one guest, top shows all CPU cores being utilized on the node).
 

libro22

Member
Fixed this! 


Never set vCPUs to the number of HT threads, just the physical cores :) I/O increased from 80-100 MB/s to 130+ MB/s.
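
Concretely, the change is just lowering the vCPU count in the domain XML to the physical core count, optionally with pinning. The sketch below assumes, purely for illustration, a 4-core/8-thread host where CPUs 0-3 are distinct physical cores; check your own topology before copying it:

    <!-- vCPUs set to the physical core count (4 here, illustrative),
         optionally pinned one per physical core so HT siblings aren't counted -->
    <vcpu placement='static'>4</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='0'/>
      <vcpupin vcpu='1' cpuset='1'/>
      <vcpupin vcpu='2' cpuset='2'/>
      <vcpupin vcpu='3' cpuset='3'/>
    </cputune>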


And 1 MB for the sort/read/join buffers on MariaDB seems to be the magic number.
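
For the record, that translates to something like this in the MariaDB config (per-connection buffers; 1 MB each is what worked here, so treat it as a starting point rather than a rule):

    [mysqld]
    sort_buffer_size = 1M
    read_buffer_size = 1M
    join_buffer_size = 1M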
 