amuck-landowner

Looking for (urgent!) help on an archiving project for a few days

joepie91

New Member
We (ArchiveTeam) are currently archiving Blip.tv, as it's shutting down, but we've run into a bit of a capacity snafu. It's an all-volunteer (and very time-sensitive!) effort.

The archival is distributed amongst many systems, and the archived data is then uploaded to one of several 'rsync targets' - collection boxes, pretty much. These rsync servers need to be pretty beefy, as they potentially get several Gbps of data thrown at them constantly for the duration of an archival project.

Now our primary rsync target has filled up to the brim, and we can't get it emptied out in time - Blip is already in the process of shutting down, and the content will most likely disappear sometime today. The rsync servers we are currently using are having throughput issues. We're still trying to save the last bits, however, so we could really use any help.

What we're looking for: somebody to donate spare server space (or a VM, it doesn't matter) for a few days. Expect it to need a few TB of disk space, and inbound bandwidth to possibly hit a few Gbps, at least for the first day. After Blip shuts down (likely later today), we'll probably still need a few days to move everything back off.

All it will really need to run is an rsync daemon - shell/root access would be handy (to set up processing scripts), but is not strictly required. Expect it to use a lot of disk I/O - video files, so largely sequential writes.

If you can help out, then please either join the IRC channel (#blooper.tv on EFNet), or PM me on here.

Thanks!
 


willie

Active Member
I thought ArchiveTeam was associated with the Internet Archive, which has tons of storage and bandwidth... is there a reason you're not using archive.org machines?
 

joepie91

New Member
I thought ArchiveTeam was associated with the Internet Archive, which has tons of storage and bandwidth... is there a reason you're not using archive.org machines?

We use their infrastructure for long-term storage, and there's one person (Jason Scott) who's involved with both, but there's no formal association between the two. We're already offloading data to the Internet Archive, but unfortunately they have capacity limits, intake-wise... I believe that at last count, we couldn't get more than 500 Mbps worth of data into IA at the same time (our intake is currently several times that).

So yeah, eventually it will end up there; we're simply doing the actual data-grabbing part and acting as a buffer until it can be uploaded to IA.
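As a rough back-of-the-envelope sketch of why a buffer box fills up (the function name and the example rates are illustrative assumptions, not measured figures from the project): whatever comes in faster than it can be drained has to sit on disk.

```python
def buffer_growth_tb(inbound_gbps: float, outbound_gbps: float, hours: float) -> float:
    """Estimate TB accumulated on a buffer server after `hours`.

    Rates are sustained averages in gigabits per second. Decimal units
    are used throughout (1 TB = 1000 GB).
    """
    # Only the surplus of intake over drain piles up; never go negative.
    net_gbps = max(inbound_gbps - outbound_gbps, 0.0)
    gigabits = net_gbps * hours * 3600   # Gbps * seconds
    return gigabits / 8 / 1000           # gigabits -> gigabytes -> terabytes


if __name__ == "__main__":
    # Assumed example: 1 Gbps arriving, 0.5 Gbps draining, for 12 hours.
    print(f"{buffer_growth_tb(1.0, 0.5, 12):.1f} TB")
```

With those assumed numbers a single target accumulates a few TB in half a day, which is roughly the scale of disk being asked for above.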
 

willie

Active Member
Could you ask Jason if he could get the IA to assign you a few raw machines temporarily? That is, you'd get shell access and run rsync or whatever on them, instead of going through the IA file upload and cataloging pipeline, which is probably what's causing the speed restriction.

Of course, there are cheap Hetzner computers with tons of disk space here, though the network isn't the greatest:

https://robot.your-server.de/order/market/country/US?hdsize=3000
 

joepie91

New Member
Could you ask Jason if he could get the IA to assign you a few raw machines temporarily? That is, you'd get shell access and run rsync or whatever on them, instead of going through the IA file upload and cataloging pipeline, which is probably what's causing the speed restriction.

I suspect that if that were possible, it'd have been done already. Might be either technical or legal reasons, I don't know. It all hangs together with duct tape, but it's pretty effective duct tape :)
 

drmike

100% Tier-1 Gogent
Really liking this project, and glad to run into Jason Scott again. Knew of him a long, long time ago.

@joepie91 always has the most interesting projects.
 