
Heads up: OpenVZ updates will probably break your system

Geek

Technolojesus
Verified Provider
While a very passionate post, I feel it was wasted on a bot who's just here to increase post count and post copy+paste ads once they post enough nonsense. :(

I'm going to watch the YouTube video now though so it wasn't a total waste. :D
Yeh, perhaps you're right, but I'd still prefer to contribute something worthwhile over fluff or copy/paste like you said.  Plus I just started posting here yesterday, and even though I see a lot of familiar faces, first impressions are important, and I'd hate for people to think that I have a hidden agenda when I really just enjoy contributing... if that makes sense. Hope you enjoyed the vid though. :D
 

Geek

Technolojesus
Verified Provider
Just a little FYI: it seems that if you convert a container where second-level quotas are enabled, the /etc/mtab symlink to /proc/mounts gets broken and isn't correctly re-established, so after a reboot the container keeps showing the former simfs layout until you re-create the symlink.  Observe....

root@ctdev19 [~]# mount; echo; cat /proc/mounts; echo; ls -al /etc/mtab; echo;
/dev/simfs on / type reiserfs (rw,usrquota,grpquota)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
none on /dev type devtmpfs (rw,relatime,mode=755)
none on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /var/tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

/dev/ploop63231p1 / ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
none /dev devtmpfs rw,relatime,mode=755 0 0
none /dev/pts devpts rw,relatime,mode=600,ptmxmode=000 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /var/tmp tmpfs rw,nosuid,nodev,noexec,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0

-rw-r--r-- 1 root root 383 Aug 22 04:45 /etc/mtab


----------------------------------------

root@ctdev19 [~]# rm -rf /etc/mtab
root@ctdev19 [~]# ln -s /proc/mounts /etc/mtab


 

root@ctdev19 [~]# mount; echo; cat /proc/mounts; echo; ls -al /etc/mtab; echo;
/dev/ploop63231p1 on / type ext4 (rw,relatime,barrier=1,data=ordered,balloon_ino=12,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
none on /dev type devtmpfs (rw,relatime,mode=755)
none on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /var/tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

/dev/ploop63231p1 / ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
none /dev devtmpfs rw,relatime,mode=755 0 0
none /dev/pts devpts rw,relatime,mode=600,ptmxmode=000 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /var/tmp tmpfs rw,nosuid,nodev,noexec,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0

lrwxrwxrwx 1 root root 12 Aug 27 08:55 /etc/mtab -> /proc/mounts



The behavior of the conversion is normal if second-level quotas are temporarily disabled prior to converting to Ploop.  If anyone feels like dissecting the conversion to find out why, be my guest. Otherwise I guess we'll just have to remember.
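
For anyone who hits this, a rough sketch of the order of operations I'd use (the --quotaugidlimit value of 1000 is only an example, so double-check on a test CT first):

# Sketch only: convert a simfs CT to ploop with second-level quotas disabled first
CTID=101                                          # example container ID
vzctl stop $CTID                                  # the CT has to be stopped to convert
vzctl set $CTID --quotaugidlimit 0 --save         # temporarily disable second-level quotas
vzctl convert $CTID --layout ploop                # simfs -> ploop
vzctl set $CTID --quotaugidlimit 1000 --save      # re-enable; 1000 is just an example value
vzctl start $CTID
# If /etc/mtab still shows the old simfs layout, re-create the symlink inside the CT:
vzctl exec $CTID "ln -sf /proc/mounts /etc/mtab"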
 

Francisco

Company Lube
Verified Provider
@Geek - Have you tried the latest ploop? It looks like they provided a way to zero out bad blocks, and they feel that the issue is due to writeback mode being set on EXT4.

Francisco
 

devonblzx

New Member
Verified Provider
@Geek - Have you tried the latest ploop? It looks like they provided a way to zero out bad blocks, and they feel that the issue is due to writeback mode being set on EXT4.

Francisco
I'm not entirely confident in that resolution, but I didn't have enough time to debug further, or the expertise in ploop, to state otherwise.  We've already started moving servers away from ploop to a custom solution.

While writeback mode did seem dangerous with how ploop works, one of the other servers that experienced data loss without any hard drive issues was running in ordered mode, so I just don't trust it for the time being, not to mention the compact issue.  Ploop images will continue to grow if you don't compact them regularly, and if the underlying filesystem runs out of space because of this, there will be data integrity issues inside the VPS.  The problem is that when files are deleted, they are only marked as free inside the ploop image; the space isn't released back to the underlying filesystem.  Therefore, from my understanding, if a VPS deletes and recreates a 10GB file 100 times, the ploop image could be as large as 1TB if you haven't compacted it.
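
For reference, a periodic compact across every container is simple enough to script; something like this sketch (vzctl 4.0+ provides the compact subcommand), run from cron weekly or so:

for CT in $(vzlist -H -o ctid); do   # list running containers, no header
    vzctl compact "$CT"              # release unused ploop blocks back to the host fs
done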
 

Francisco

Company Lube
Verified Provider
I'm not entirely confident in that resolution, but I didn't have enough time to debug further, or the expertise in ploop, to state otherwise.  We've already started moving servers away from ploop to a custom solution.

While writeback mode did seem dangerous with how ploop works, one of the other servers that experienced data loss without any hard drive issues was running in ordered mode, so I just don't trust it for the time being, not to mention the compact issue.  Ploop images will continue to grow if you don't compact them regularly, and if the underlying filesystem runs out of space because of this, there will be data integrity issues inside the VPS.  The problem is that when files are deleted, they are only marked as free inside the ploop image; the space isn't released back to the underlying filesystem.  Therefore, from my understanding, if a VPS deletes and recreates a 10GB file 100 times, the ploop image could be as large as 1TB if you haven't compacted it.
I don't think that's how it works.....

I'm pretty sure EXT4 would just re-allocate the same sectors for the last 10GB file.

I've looked into an LVM based solution but was unsure of it.

Francisco
 

devonblzx

New Member
Verified Provider
I don't think that's how it works.....


I'm pretty sure EXT4 would just re-allocate the same sectors for the last 10GB file.


I've looked into an LVM based solution but was unsure of it.


Francisco
Ext4 isn't being made aware of the delete; that is the problem.  Just from normal usage before we started doing compacts, we saw ploop images that were over 100GB on servers that were only using ~40GB.
 

Geek

Technolojesus
Verified Provider
Sorry I'm late.  Had a root canal this week & I delegate everything when Vicodin's involved... :p

Ext4 isn't being made aware of the delete; that is the problem.  Just from normal usage before we started doing compacts, we saw ploop images that were over 100GB on servers that were only using ~40GB.
So this is the one where Kir said "that's just how it is, use cron if you have to" or something?  Yet more resource-intensive rituals for people like us. I did experience data loss running in ordered mode, then there was my little fiasco with the BBU failure in writeback mode which damaged all the ploops.  I'm certain this will spell trouble for some.  Gonna be sticking with ol' trusty for a while longer.  Sorry I don't have more to contribute this time. Freaking jaw is throbbing... thanks for the info Devon!  :)
 

Geek

Technolojesus
Verified Provider
vzctl compact/ploop-balloon discard isn't quite making the cut, it seems...

[root@mulva ~]# vzctl exec 1230 "df -h"; echo
Filesystem         Size  Used Avail Use% Mounted on
/dev/ploop18019p1   60G  9.7G   47G  18% /
none               768M  4.0K  768M   1% /dev
none               768M     0  768M   0% /dev/shm
tmpfs              768M     0  768M   0% /tmp

[root@mulva ~]# du -sh /vz/private/1230/root.hdd/*
4.0K    /vz/private/1230/root.hdd/DiskDescriptor.xml
0       /vz/private/1230/root.hdd/DiskDescriptor.xml.lck
16G     /vz/private/1230/root.hdd/root.hdd
4.0K    /vz/private/1230/root.hdd/root.hdd.mnt

[root@mulva ~]# ploop-balloon discard /vz/private/1230/root.hdd/DiskDescriptor.xml
Trying to find free extents bigger than 0 bytes
Waiting
Call FITRIM, for minlen=33554432
Call FITRIM, for minlen=16777216
Call FITRIM, for minlen=8388608
Call FITRIM, for minlen=4194304
Call FITRIM, for minlen=2097152
Call FITRIM, for minlen=1048576
0 clusters have been relocated

...so I guess that leaves me with 7.3GB of wasted space until they figure it out?

 

 

Edit: Meant to add that there's an entry in OVZ Bugzilla about vzctl compact not doing the whole job... apparently you can use vzctl and shrink the fs to recover a bit of that, but even on the QA node I don't really want to screw around with it much yet.
 

devonblzx

New Member
Verified Provider
I saw similar results with my testing of ploop.  It wasn't a huge deal for me because we tend to have plenty of free space on our nodes.  I'm not sure why there 

I think the discard issue will always be present on ploop because of its design of a filesystem on top of a filesystem.  The first filesystem won't automatically pass deletions to the second filesystem unless it is mounted with a discard option.

Ext4 supports trim with the discard option, which is made to pass deletions to the next level (mainly for SSDs).  It seems like they could mount the ploop device with the discard option to resolve this if they have configured ploop to allow for that.

I haven't tested and I don't have a dev box to test with right now, but does this command work?


fstrim -v /vz/root/VEID
I wouldn't run it on a production node, but that is the standard method of doing a manual trim.  If that works, then mounting the ploop device with discard should also.
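
If anyone does want to try it on a dev node, looping it over the running containers would look roughly like this (untested sketch, same warning about production):

for CT in $(vzlist -H -o ctid); do   # running containers only, no header
    fstrim -v /vz/root/$CT           # ask ext4 inside the ploop mount to discard free blocks
done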
 

Geek

Technolojesus
Verified Provider
Nothing that changed the issue with CT 1230 in dev.  I thought I saw something a while back on the *vz forum saying that mounting with discard wasn't an option.

 

I'm going to shrink container 2222 and see if I can grab any of that 18GB of wasted space.  Just saw that the testing kernel was updated to include some of your bug reports.  Might just boot into it after I try resizing that CT.

 

[root@mulva ~]# fstrim -v /vz/root/2222
/vz/root/2222: 294179635200 bytes were trimmed

[root@mulva ~]# du -sh /vz/private/2222/root.hdd/*
4.0K    /vz/private/2222/root.hdd/DiskDescriptor.xml
0       /vz/private/2222/root.hdd/DiskDescriptor.xml.lck
138G    /vz/private/2222/root.hdd/root.hdd

[root@mulva ~]# vzctl exec 2222 "df -h"
Filesystem         Size  Used Avail Use% Mounted on
/dev/ploop16657p1  394G  120G  254G  33% /
none               768M  4.0K  768M   1% /dev
none               154M  1.1M  153M   1% /run
none               5.0M     0  5.0M   0% /run/lock
none               768M     0  768M   0% /run/shm

Oh, you're welcome to have access to my dev box if you want a safe place to play.  It's going away next month after I get settled into my larger office ...anyhow, if we kill it, we kill it.  Keep ya posted
 

Geek

Technolojesus
Verified Provider
Well, wouldja look at that...

Sooo.... now what?  Another thing to add to the already long list of concerns?  Like you, I have some beefy arrays where it would likely go unnoticed for some time, but still, that feeling of knowing you have wasted space on a production node?  I don't think I like that much....

[root@mulva ~]# vzctl set 2222 --diskspace 264246648 --save
Completing an on-going operation RELOC for device /dev/ploop16657
TRUNCATED: 16163 cluster-blocks (0 bytes)
dumpe2fs 1.41.12 (17-May-2010)
CT configuration saved to /etc/vz/conf/2222.conf

[root@mulva ~]# vzctl exec 2222 "df -h"
Filesystem         Size  Used Avail Use% Mounted on
/dev/ploop16657p1  246G  120G  106G  54% /
none               768M  4.0K  768M   1% /dev
none               154M  1.1M  153M   1% /run
none               5.0M     0  5.0M   0% /run/lock
none               768M     0  768M   0% /run/shm

[root@mulva ~]# du -sh /vz/private/2222/root.hdd/*
4.0K    /vz/private/2222/root.hdd/DiskDescriptor.xml
0       /vz/private/2222/root.hdd/DiskDescriptor.xml.lck
122G    /vz/private/2222/root.hdd/root.hdd
 

devonblzx

New Member
Verified Provider
So what'd you do with the last attempt?  Just reset the diskspace and it trimmed it on its own?
 

devonblzx

New Member
Verified Provider
Precisely.
Interesting, seems simple enough and a useful workaround for now for ploop admins.  Do you have to set a different diskspace amount or can you just reset the same value repeatedly?   I don't use ploop anymore on my systems so I can't test but thanks for sharing.
 

Geek

Technolojesus
Verified Provider
I set the diskspace about 35% lower as the initial test, it trimmed, and then I set it back. I hadn't thought to reset it to the same value. I haven't done much with Ploop this summer either.  Frankly, using it takes away from the density and scalability that containers are known for.
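
In script form that sequence would be roughly the sketch below; the vzlist diskspace.h field and its 1K-block units are my assumptions, so sanity-check the numbers before trying it anywhere that matters:

CTID=2222                                                 # the CT from the test above
ORIG=$(vzlist -H -o diskspace.h $CTID)                    # current hard limit (assumed 1K blocks)
vzctl set $CTID --diskspace $((ORIG * 65 / 100)) --save   # ~35% lower; triggers RELOC/TRUNCATE
vzctl set $CTID --diskspace $ORIG --save                  # then put the original limit back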
 

devonblzx

New Member
Verified Provider
I set the diskspace about 35% lower as the initial test, it trimmed, and then I set it back. I hadn't thought to reset it to the same value. I haven't done much with Ploop this summer either.  Frankly, using it takes away from the density and scalability that containers are known for.
Well that may be the right type of approach as it may require a lower setting to attempt to truncate unused blocks.  Using that logic, I wonder if this approach would work:


vzctl set VEID --diskspace 1M
So by setting a ridiculously low value, my assumption would be that OpenVZ would first attempt to truncate then check to see if the setting is lower than the used disk space, thus failing to actually change the setting.

This may truncate without altering anything.  You'd also want to skip the --save option.  Of course, I haven't tested this; it's only a theory.
 