I'm absolutely backing off from Ploop, as I was telling another provider not long ago. I originally had positive results testing Ploop in dev/QA. Then I got my horror story...
http://www.johnedel.me/ext4-file-system-corruption-openvz-ploop/
Sorry for the lousy markup; I recently migrated a bunch of WP posts to Ghost and still have some cleaning up to do.
Jun 30 20:18:00 ovz-angelica kernel: [526746.951505] EXT4-fs error (device ploop60273p1): mb_free_blocks: double-free of inode 0's block 79217420(bit 17164 in group 2417)
Jun 30 20:18:00 ovz-angelica kernel: [526746.951563] EXT4-fs error (device ploop60273p1): mb_free_blocks: double-free of inode 0's block 79217421(bit 17165 in group 2417)
Jun 30 20:18:00 ovz-angelica kernel: [553166.612375] JBD: Spotted dirty metadata buffer (dev = ploop43717p1, blocknr = 0). There's a risk of filesystem corruption in case of system crash
The next day, on another of my nodes with identical specs (HW RAID 10 across 8x 4TB WD RE4 on a 3ware controller), the BBU failed and showed 0 capacity, the node went into a hard reboot, and when it came back online the group of Ploop containers on that node reported read-only and required repairing. In both cases, writeback was enabled. The BBU situation was my own fault of course; I know better than to miss something like that, but it got me thinking that Ploop may end up being too much hassle for the client in the long term. I found myself losing out on certain...
I also don't feel like having to run vzctl compact to correct weirdness with resizing... automated or not, I've grown quite fond of the quick vertical scaling SimFS gives us. I also noticed a while back that there are at least 5 new Ploop bugs reported each month, and twice that for broken live migrations as of late -- try and train yourselves to use the '-r no' switch if you haven't yet.
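For anyone who hasn't touched these yet, this is roughly what I'm talking about (CTID 101 and the destination hostname are just placeholders; check the man pages on your own version):

# reclaim unused blocks from a ploop image after deletes inside the CT
vzctl compact 101

# live migration, but keep the private area on the source node ('-r no')
# so a broken migration still leaves you something to roll back to
vzmigrate -r no --online dest-node.example.com 101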
I'm not sure why Kir and the guys threw us the equivalent of a *vz overhaul with so little warning (I had about 2 days to test netfilter in dev). Most of it I wasn't too worried about and it was easy enough to tune up again, but it created a lot of confusion for the blissfully unaware who run yum updates without paying attention...
Good luck switching between SimFS and Ploop if you use SolusVM. Right now it's either one or the other: turn off Ploop and disk usage stops reporting; enable it and VE_LAYOUT=simfs gets overwritten and the container spins up as Ploop anyway, and I believe a blank $CTID.conf was created (or one with no parameters set beyond physpages/swappages) then too.
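For context, outside of a panel the layout is normally chosen like this (CTID and template name are only examples):

# global default for new containers, set in /etc/vz/vz.conf
VE_LAYOUT="simfs"

# or per container at creation time -- this is what the panel ends up overriding
vzctl create 101 --ostemplate centos-6-x86_64 --layout simfs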
Virtualizor seems to have gotten around that problem though... except that I'm not their #1 fan these days.
Also, Ploop containers and SolusVM Quick Backup = unusable as far as I can tell; that was the start of the problem I blogged about. Interestingly enough, you can create a template from a Ploop container, deploy it with --layout simfs, and for some reason it actually works. I haven't figured out why yet, but it does.
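If you want to try that yourself, the deploy side of the workaround amounts to something like this (CTID and template name are placeholders; I'm only describing what I understand worked for me):

# create a container from a template that was built off a ploop CT,
# but lay it back down as simfs
vzctl create 102 --ostemplate my-ploop-built-template --layout simfs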
Also, if you go to do a manual vzdump on a Ploop container, the dump is stored in /var/tmp/vzdumptmp -- greaaaat fun if /vz is its own partition and your root filesystem is only 10 or 15 GB. It can be tweaked to change the destination, but that's a lot of work...
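If memory serves, vzdump lets you point both the temp area and the finished dump somewhere else, so something along these lines should keep it off the root filesystem (paths are examples; double-check the flags against your version's man page):

# keep the working area and the finished dump on the /vz partition
vzdump --tmpdir /vz/dumptmp --dumpdir /vz/dump 101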
This was kind of jumbled together between tickets while trying to remember everything I wrote about in May... but I've rolled all but a few containers back to simfs, where everything's comfortable and functions how the clients want.
Didn't have writeback enabled on the one server using a P410i, but it did have RAID degradation caused by power supply issues. The other servers were software RAID with software SSD caching and hard reboots. It seems to be occurring on all types of systems, as the two other users involved were using HW RAID if I recall correctly; I'm not sure if they had writeback enabled, but I'd assume there's a good chance they did. The major problem I'm seeing is not necessarily the corruption but the inability to repair: there is no way to fsck the filesystem to correct the corruption, so the whole disk images are lost. The fact that this has happened on three nodes in the past ~3 weeks has made me decide to abandon ploop altogether. It isn't worth the risk or trouble, and the benefits of ploop can be substituted by other, more reliable means (as I suggested in my previous post).