I religiously back up my VPSes. I've gone through several evolutions in thought regarding the best way to do this, balancing security, bandwidth, and efficiency.
How do you back up your VPSes? Below are my thoughts.
The Basic Requirements/Assumptions
- Backups need to be off-site, not at the same provider. I use dedicated backup VPSes/servers.
- I want to backup shared hosting accounts where I don't have root
- Traffic needs to be encrypted
- I'm not going to pay for R1Soft, etc.
- I also like reports (last successful backup, alerts when a backup fails, etc.), but that's common to all methods
- Assume backup prep work (creating mysql/pg dumps, etc.) happens first; I'm not discussing it here
- Assume firewall rules as appropriate to control access
rsync-over-ssh by Backup Server
If the backup server is compromised, it has ssh keys to all clients. Par-tay! There's unfortunately no way to allow rsync over ssh without also allowing a shell (i.e., via the authorized_keys file's command= parameter). If you allow rsync-over-ssh, you implicitly allow that same user to open a shell. Also, it's very tempting to have the backup user come in as root because you want to back up /home or something...
dump-over-ssh by Backup Server
This is reasonably secure. You set command= in authorized_keys and the backup server invokes dump over ssh. There's an example in the sshd man page. This also allows a full/incremental cycle.
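A sketch of the shape of this (hostnames, key material, and dump's flags are placeholders; dump's syntax varies by OS and filesystem):

```
# On the client, in ~/.ssh/authorized_keys: the key is pinned to a single
# dump invocation, much like the example in the sshd man page.
command="/sbin/dump -0uan -f - /home",no-pty,no-port-forwarding ssh-ed25519 AAAA... backup-key

# On the backup server: whatever command line we pass, sshd runs the forced
# one, and the dump stream arrives on stdout:
#   ssh -i /root/.ssh/backup_key root@client.example.com dump | gzip > client-home.0.gz
```

Incrementals would mean a different dump level (and therefore a different forced command, or a wrapper script that picks the level), which is part of the per-client plumbing this method requires.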
The problem is that you have to code different commands for different filesystems (thank you, Linux) and operating systems. And it doesn't work (or anyway isn't installed) on most shared hosts. Also, restores are a headache - no quickly scping back a DB backup, etc.
tar over ssh (with authorized_keys' command=) is possible but difficult, because command= hardcodes things... you can't say "let him run tar with his own args".
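Concretely, the forced command freezes tar's entire argument list (the paths here are invented), so changing what gets archived means editing every client's authorized_keys:

```
command="/usr/bin/tar -czf - /home /etc",no-pty,no-port-forwarding ssh-ed25519 AAAA... backup-key
```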
rsync by Backup Server
In this scenario, the client runs rsyncd and the backup server runs a regular rsync. You can lock down who can do this rsync - both with firewall rules and with rsyncd's conf (and you can also only allow one rsync at a time, etc.). If the backup server were compromised, the worst an attacker could do is delete backups, not cascade the compromise.
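A minimal rsyncd.conf along these lines on the client (module name, path, and the backup server's address are made up):

```
# /etc/rsyncd.conf on the client
uid = root
read only = yes                  # the backup server may pull, never push

[backup]
    path = /
    hosts allow = 203.0.113.10   # the backup server only; firewall it too
    hosts deny = *
    max connections = 1          # only one rsync at a time
    auth users = backupuser
    secrets file = /etc/rsyncd.secrets
```

The backup server would then pull with something like `rsync -a backupuser@client::backup/ /backups/client/`.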
The problem is that everything is transmitted in the clear. It's possible to tunnel this over stunnel or spiped, but of course you can't do any of this on shared hosting.
Better Methods
rsync-over-ssh from Client
For a while, I had my VPSes run a nightly job that rsync'd over ssh to a couple of different backup VPSes. The problem with this method is that if the VPS is ever compromised, the backups will likely be swiftly deleted.
Unfortunately, chflags (a BSDism) and its ilk don't allow "can create files in a directory, but can't delete/modify them after creation", which would fix this.
One possible fix: on the backup server, once the backup is done, duplicate it or move it out of the way (though if you want the benefit of rsync, you have to move it back). The tradeoff is more disk space/IO work on the backup server.
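One cheap way to do the duplicate-and-move-aside step without doubling disk usage is a hardlink snapshot, in the style of rsnapshot. A sketch (paths are hypothetical; cp -al is a GNUism):

```shell
#!/bin/sh
# snapshot_backup LIVE SNAP
# Hardlink-copy the live rsync target into a dated snapshot directory.
# No file data is duplicated, but because rsync (by default) replaces a
# changed file with a new inode, the snapshot keeps the old version even
# if the client later overwrites or deletes files in LIVE.
snapshot_backup() {
    cp -al "$1" "$2"
    # Note: permissions live on the shared inode, so this also marks the
    # files read-only in LIVE; rsync still updates them fine, since it
    # writes new inodes. In real use, also chown the snapshot to root.
    chmod -R a-w "$2"
}

# e.g., from cron after the nightly rsync finishes:
# snapshot_backup /backups/client/current /backups/client/$(date +%Y-%m-%d)
```
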
Unfortunately, you still have the shell problem. You can limit the damage with chroot.
sftp from Client
This method is less elegant and has some drawbacks. A script on the client does the following:
- reads the list of dirs to backup
- creates a compressed tarball of each, sftping each tarball to the backup server
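A client-side sketch of those two steps (the dir-list file, staging path, and remote host are all hypothetical; error handling omitted):

```shell
#!/bin/sh
# Read a list of directories (one per line), tar/gzip each into a staging
# area, then push the tarballs to the backup server in one sftp batch.
DIRLIST=${DIRLIST:-/etc/backup-dirs}        # e.g. /etc, /home/me, one per line
STAGING=${STAGING:-/var/tmp/backup-staging}
REMOTE=${REMOTE:-backupuser@backup.example.com}

make_tarballs() {
    mkdir -p "$STAGING"
    while IFS= read -r dir; do
        [ -d "$dir" ] || continue
        name=$(echo "$dir" | sed 's|^/||; s|/|_|g')   # /home/me -> home_me
        tar -czf "$STAGING/$name.tar.gz" -C / "${dir#/}"
    done < "$DIRLIST"
}

push_tarballs() {
    # The DONE flag is what the server-side cron job waits for.
    touch "$STAGING/DONE"
    sftp -b - "$REMOTE" <<EOF
put $STAGING/*.tar.gz
put $STAGING/DONE
EOF
}
```

Since the client's sftp user is chrooted on the server, the uploads can only land in that client's own backup area.
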
On the backup server:
- each backup client has a unique user
- the user's shell is sftp-server, and he's chrooted into his backup directory (with YYYY-MM-DD subdirs if you want)
- a cron job runs every 5 minutes, checking whether the client has placed a "my backups are done for the day" flag in his dir
- if so, the files are chown'd to root and protected so if the VPS is compromised, the backups cannot be modified. I could also move them somewhere else, but this way they are readily available if needed
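A sketch of that server-side cron job (the directory layout and the DONE flag name are invented; in real use it runs as root so the chown sticks):

```shell
#!/bin/sh
# Run from cron every 5 minutes on the backup server. For each client dir
# under BACKUP_ROOT that contains a DONE flag, move that day's tarballs
# into a dated subdirectory and strip write permission, so a later
# compromise of the client (whose sftp user owns the upload area) can't
# alter or delete them.
BACKUP_ROOT=${BACKUP_ROOT:-/backups}

lock_finished_backups() {
    for flag in "$BACKUP_ROOT"/*/DONE; do
        [ -e "$flag" ] || continue
        dir=$(dirname "$flag")
        today="$dir/$(date +%Y-%m-%d)"
        mkdir -p "$today"
        mv "$dir"/*.tar.gz "$today"/ 2>/dev/null
        rm -f "$flag"
        chmod -R a-w "$today"
        # In real use, also: chown -R root:root "$today" (needs root), so
        # the chrooted sftp user can't touch the files at all.
    done
}
```

The chroot itself is the standard sshd_config arrangement: a Match User block with ChrootDirectory and ForceCommand internal-sftp.
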
The disadvantages of this method are:
- it's a "default don't include" model, which I've always found dangerous. I'd rather default-include and specify excludes.
- you need staging space for the tarballs on the client, so it doesn't really scale well
- much more bandwidth is used than with rsync
- backing up a home system (i.e., in my house) is slightly more complicated, as I have to rsync from the backup server unless I do something silly like dynamic DNS, etc. This isn't a big deal: the backup server typically consumes a lot of incoming bandwidth (which is free from some providers), and the rsync to home consumes its outgoing bandwidth.