
How Do You Backup Your VPS/Shared? Deep Thoughts Inside

raindog308

vpsBoard Premium Member
Moderator
I religiously back up my VPSes.  I've gone through several evolutions in thought regarding the best way to do this, balancing security, bandwidth, and efficiency.
 
How do you back up your VPSes?  Below are my thoughts.
 
The Basic Requirements/Assumptions

  • Backups need to be off-site, not at the same provider.  I use dedicated backup VPSes/servers.
  • I want to back up shared hosting accounts where I don't have root
  • Traffic needs to be encrypted
  • I'm not going to pay for R1Soft, etc.
  • I also like reports (last backup, alert when a backup fails, etc.), but that's common to all methods
  • Assume backup prep work (creating mysql/pg dumps, etc.), which I'm not discussing here
  • Assume firewall rules as appropriate to control access
Backup Methods I Don't Use
 
 
rsync-over-ssh by Backup Server
 
If the backup server is compromised, it has ssh keys to all clients. Par-tay!  Unfortunately, there's no way to allow rsync over ssh without also allowing a shell (i.e., via the authorized_keys file's command= param).  If you allow rsync-over-ssh, you implicitly allow that same user to open a shell.  Also, it's very tempting to have the backup user come in as root because you want to back up /home or something...
 

dump-over-ssh by Backup Server
 

This is reasonably secure.  You set command= in authorized_keys and the backup server invokes dump over ssh.  There's an example on the sshd man page.  This also allows a full/incremental cycle.
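A sketch of what that looks like (the dump arguments, key type, and paths are illustrative, not lifted from the man page):

Code:
# ~/.ssh/authorized_keys on the client: only this exact command can run
command="/sbin/dump -0u -f - /home",no-pty,no-port-forwarding ssh-ed25519 AAAA... backup@backupserver

# on the backup server -- the forced command runs no matter what we request
ssh -T root@client.example.com > /backups/client/home.0.dump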
 
The problem is that you have to code different commands for different filesystems (thank you, Linux) and operating systems.  And it doesn't work (or at least isn't installed) on most shared hosts.  Also, restores are a headache - no quickly scping back a DB backup, etc.
 
tar over ssh (with authorized_keys' command=) is possible but awkward because command= hardcodes the invocation...you can't say "let him run tar with his own args".
 

rsync by Backup Server
 

In this scenario, the client runs rsyncd and the backup server runs a regular rsync against it.  You can lock down who can do this rsync - both with firewall rules and with rsyncd's conf (and also only allow one rsync at a time, etc.).  If the backup server were compromised, the worst an attacker could do is delete backups, not cascade the compromise.
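A minimal sketch of that lockdown (the module name, paths, and the backup server's address are illustrative):

Code:
# /etc/rsyncd.conf on the client
[backup]
    path = /
    uid = root
    read only = yes
    list = no
    hosts allow = 203.0.113.10   # the backup server only
    max connections = 1

# on the backup server, pull the module:
rsync -az --numeric-ids client.example.com::backup/ /backups/client/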
 
The problem is that everything is transmitted in the clear.  It's possible to tunnel this over stunnel or spiped, but of course you can't do any of this on shared hosting.
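For example, a spiped tunnel might look like this sketch (ports, addresses, and the key path are illustrative):

Code:
# on the client: decrypt incoming connections and pass them to rsyncd
spiped -d -s '[0.0.0.0:8873]' -t '[127.0.0.1:873]' -k /etc/spiped/backup.key

# on the backup server: encrypt outgoing connections to the client
spiped -e -s '[127.0.0.1:8873]' -t '[client.example.com:8873]' -k /etc/spiped/backup.key
rsync -az --numeric-ids rsync://127.0.0.1:8873/backup/ /backups/client/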
 
 
Better Methods
 
rsync-over-ssh from Client
 
For a while, I had my VPSes run a nightly job that rsync'd over ssh to a couple of different backup VPSes.  The problem with this method is that if the VPS is ever compromised, the backups will likely be swiftly deleted.
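That nightly job can be as simple as this sketch (the host, paths, and excludes are illustrative):

Code:
# /etc/cron.d/backup on the client
30 3 * * * root rsync -az --delete --exclude=/proc --exclude=/sys --exclude=/dev / backup1.example.com:/backups/myvps/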
 
Unfortunately, chflags (a BSDism) and its ilk don't allow "create directory but can't delete/modify files after creation" which would fix this.
 
One possible fix is, on the backup server, to duplicate the backup or move it out of the way once it's done (though if you want the benefit of rsync, you have to move it back :)  The tradeoff is more disk space/IO work on the backup server.
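A sketch of that duplication using hardlinks, which keeps the extra disk and IO cost low (paths and snapshot names are illustrative):

Code:
# on the backup server, after the client's rsync window closes
cd /backups/client1
rm -rf snapshot.2
[ -d snapshot.1 ] && mv snapshot.1 snapshot.2
cp -al current snapshot.1   # hardlink copy: cheap, owned by root, out of the client's reach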
 
Unfortunately, you still have the shell problem.  You can limit the damage with chroot.
 
sftp from Client
 
This method is less elegant and has some drawbacks.  A script on the client does the following:
 
- reads the list of dirs to back up
- creates a compressed tarball of each, sftping each tarball to the backup server (a sketch follows below)
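A minimal sketch of that client script, assuming a dirs list in /etc/backup-dirs.txt and a staging area (both names are illustrative):

Code:
#!/bin/bash
# tar each listed dir, push the tarballs via sftp, then send the "done" flag
STAGE=/var/backups/stage
DEST=backupuser@backupserver.example.com
mkdir -p "$STAGE"
while read -r dir; do
    name=$(echo "$dir" | tr / _)
    tar -zcf "$STAGE/$name.tar.gz" "$dir"
done < /etc/backup-dirs.txt
touch "$STAGE/backups-done"
# upload tarballs first, the flag last, so the server never sees a premature flag
{ echo "put $STAGE/*.tar.gz"; echo "put $STAGE/backups-done"; } | sftp -b - "$DEST"
rm -rf "$STAGE"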
 
On the backup server:
 
- each backup client has a unique user
- user's shell is sftp-server and the user is chrooted into his backup directory (with YYYY-MM-DD subdirs if you want; a config sketch follows this list)
- a cron job executes every 5 minutes, waiting to see if the client has placed a "my backups are done for the day" flag in his dir
- if so, the files are chown'd to root and protected so that if the VPS is compromised, the backups cannot be modified.  I could also move them somewhere else, but this way they are readily available if needed
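Both pieces sketched out - the chroot comes from sshd_config, the lockdown from cron (the group name, paths, and flag file name are illustrative):

Code:
# /etc/ssh/sshd_config: sftp-only, chrooted backup users
Match Group backupusers
    ChrootDirectory /backups/%u
    ForceCommand internal-sftp
    AllowTcpForwarding no

Code:
#!/bin/bash
# run from cron every 5 minutes on the backup server
# (sshd requires the chroot dir itself to be root-owned, so the client
# uploads into a dated subdir it can write to)
for userdir in /backups/*/; do
    today="$userdir$(date +%Y-%m-%d)"
    if [ -f "$today/backups-done" ]; then
        rm -f "$today/backups-done"
        # lock the finished backup away from the client's user
        chown -R root:root "$today"
        chmod -R go-w "$today"
    fi
done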
 
The disadvantages of this method are:
 
- it's a "default don't include" model, which I've always found dangerous.  I'd rather default-include and specify excludes.
- you need staging space for the tarballs on the client, so it doesn't really scale well
- much more bandwidth is used than with rsync
- backing up from a home system (i.e., in my house) is slightly more complicated, as I have to rsync from the backup server unless I do something silly like dynamic DNS, etc.  This isn't a big deal, as the backup server typically consumes a lot of incoming bandwidth (which is free from some providers) and the rsync from home consumes outgoing bandwidth.
 

William

pr0
Verified Provider
Like this:

Code:
#!/bin/bash
# For each directory here (one per server, named after its host), pull a
# full copy via rsync, tar it into a dated archive, encrypt it with GPG,
# and delete the plaintext tarball.
for backupserver in */; do
    backupserver=${backupserver%/}   # strip trailing slash
    date
    echo "Backup $backupserver"
    cd "$backupserver" || continue
    month=$(date +%m)
    mkdir -p "$month"
    cd "$month"
    day=$(date +%d)
    mkdir -p "$day"
    cd "$day"
    mkdir temp
    cd temp
    # full pull over ssh, skipping pseudo-filesystems and the backup dir itself
    rsync --numeric-ids -4 -azR \
        --exclude=/proc --exclude=/sys --exclude=/dev \
        --exclude=/backup --exclude=/var/spool/squid \
        "root@$backupserver:/" . >/dev/null
    backupdate=$(date +%d.%m.%Y)
    tar -zcf "../$backupdate.tar.gz" .
    cd ..
    rm -r temp
    # encrypt to the backup key, then remove the unencrypted tarball
    gpg --trust-model always --encrypt --recipient A1BEA55C \
        -o "$backupdate.tar.gz.enc" "$backupdate.tar.gz"
    rm "$backupdate.tar.gz"
    date
    cd ../../..
done
 

wlanboy

Content Contributor
Every VPS has a cron job that creates daily/weekly backups.

They scp them to a central backup VPS.

That VPS rsyncs the backups to a second backup VPS.

The goal is to keep monthly backups on the second one and the current (weekly) ones on the first.

I am using rssh/scponly to limit access to the backup VPS.
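With rssh, that limiting might look like this sketch (the username is illustrative):

Code:
# /etc/rssh.conf -- permit scp only, no shell
allowscp

# give the backup user rssh as a login shell
chsh -s /usr/bin/rssh backupuser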
 