I'm absolutely backing off from Ploop, as I was telling another provider not long ago. I originally had positive results testing Ploop in dev/QA. Then I got my horror story...
http://www.johnedel.me/ext4-file-system-corruption-openvz-ploop/
Sorry for the lousy markup; I recently migrated a bunch of WP posts to Ghost and still have some cleaning up to do.
Jun 30 20:18:00 ovz-angelica kernel: [526746.951505] EXT4-fs error (device ploop60273p1): mb_free_blocks: double-free of inode 0's block 79217420(bit 17164 in group 2417)
Jun 30 20:18:00 ovz-angelica kernel: [526746.951563] EXT4-fs error (device ploop60273p1): mb_free_blocks: double-free of inode 0's block 79217421(bit 17165 in group 2417)
Jun 30 20:18:00 ovz-angelica kernel: [553166.612375] JBD: Spotted dirty metadata buffer (dev = ploop43717p1, blocknr = 0). There's a risk of filesystem corruption in case of system crash
The next day, on another of my nodes with identical specs (HW RAID 10 across 8x 4TB WD RE4 on a 3ware controller), the BBU failed and showed 0 capacity, the node went into a hard reboot, and when it came back online the group of Ploop containers on that node reported read-only and required repairing. In both cases, writeback was enabled. The BBU situation was my own fault of course; I know better than to miss something like that, but it got me thinking that Ploop may end up being too much hassle for the client in the long term. I found myself losing out on certain...
I also don't feel like having to run vzctl compact to correct weirdness with resizing... automated or not, I've grown quite fond of the quick vertical scaling SimFS gives us. I also noticed a while back that there are at least 5 new Ploop bugs reported each month, and twice that for broken live migrations as of late -- try and train yourselves to use the '-r no' switch if you haven't yet.
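For anyone who hasn't touched these yet, this is roughly what I'm talking about (CTID 101 and the destination hostname are just placeholders; check the man pages on your own version):

# reclaim unused blocks from a ploop image after deletes inside the CT
vzctl compact 101

# live migration, but keep the private area on the source node ('-r no')
# so a broken migration still leaves you something to roll back to
vzmigrate -r no --online dest-node.example.com 101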
I'm not sure why Kir and the guys threw us the equivalent of a *vz overhaul with so little warning (I had about 2 days to test netfilter in dev). Most of it I wasn't too worried about and it was easy enough to tune up again, but it created a lot of confusion for the blissfully unaware who run yum updates without paying attention...
Good luck switching between SimFS and Ploop if you use SolusVM. Right now it's either one or the other: turn off Ploop and disk usage stops reporting; enable it and VE_LAYOUT=simfs gets overwritten and the container spins up as Ploop anyway, and I believe a blank $CTID.conf was created (or one with no parameters set beyond physpages/swappages) then too.
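For context, outside of a panel the layout is normally chosen like this (CTID and template name are only examples):

# global default for new containers, set in /etc/vz/vz.conf
VE_LAYOUT="simfs"

# or per container at creation time -- this is what the panel ends up overriding
vzctl create 101 --ostemplate centos-6-x86_64 --layout simfs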
Virtualizor seems to have gotten around that problem though... except that I'm not their #1 fan these days.
Also, Ploop containers and SolusVM Quick Backup = unusable as far as I can tell; that was the start of the problem I blogged about. Interestingly enough, you can create a template from a Ploop container, deploy it with --layout simfs, and for some reason it actually works. I haven't figured out why yet, but it does.
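If you want to try that yourself, the deploy side of the workaround amounts to something like this (CTID and template name are placeholders; I'm only describing what I understand worked for me):

# create a container from a template that was built off a ploop CT,
# but lay it back down as simfs
vzctl create 102 --ostemplate my-ploop-built-template --layout simfs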
Also, if you go to do a manual vzdump on a Ploop container, the dump is stored in /var/tmp/vzdumptmp -- greaaaat fun if /vz is its own partition and your root filesystem is only 10 or 15 GB. It can be tweaked to change the destination, but that's a lot of work...
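If memory serves, vzdump lets you point both the temp area and the finished dump somewhere else, so something along these lines should keep it off the root filesystem (paths are examples; double-check the flags against your version's man page):

# keep the working area and the finished dump on the /vz partition
vzdump --tmpdir /vz/dumptmp --dumpdir /vz/dump 101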
This was kind of jumbled together between tickets while trying to remember everything I wrote about in May... but I've rolled all but a few containers back to simfs, where everything's comfortable and functions how the clients want.
Didn't have writeback enabled on the one server using a P410i, but it did have RAID degradation caused by power supply issues. The other servers were software RAID with software SSD caching and hard reboots. It seems to be occurring on all types of systems, as the two other users involved were using HW RAID if I recall correctly; I'm not sure if they had writeback enabled, but I'd assume there's a good chance they did. The major problem I'm seeing is not necessarily the corruption but the inability to repair: there is no way to fsck the filesystem to correct the corruption, so the whole disk images are lost. The fact that this has happened on three nodes in the past ~3 weeks has made me decide to abandon ploop altogether. It isn't worth the risk or trouble, and the benefits of ploop can be substituted by other, more reliable means (as I suggested in my previous post).