amuck-landowner

Self-hosted Distributed Storage

HalfEatenPie

The Irrational One
Retired Staff
Alright, so let's talk about this.

Anyone know a good and easy way to create a self-hosted distributed storage system?

This could be either a local cluster (via LAN) or one over the interweb (i.e. hooking a bunch of cloud servers together into distributed storage).

I think it'd be awesome to back up all my work onto a storage "cluster" (per se) and have that 200 points of backup. Redundancy would be awesome (as in, if one server, or two, drops off the face of the earth, everything would still be fine).

Anyway, hit me with your best! Also, if it could be mounted or used through a Dropbox-style sync app, that'd be a major plus!
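To make "one server (or two) drops off and everything is still fine" concrete: if every file is stored on r distinct servers, losing any r - 1 of them still leaves at least one copy. A minimal sketch, assuming rendezvous (highest-random-weight) hashing to pick the replicas; the node names, file names and the replica count of 3 are hypothetical, and this is not how any particular filesystem mentioned in this thread places data.

```python
import hashlib
from itertools import combinations


def _score(node: str, key: str) -> int:
    # Deterministic pseudo-random weight for a (node, key) pair.
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: list[str], replicas: int) -> list[str]:
    """Pick `replicas` distinct nodes for `key` via rendezvous hashing."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    ranked = sorted(nodes, key=lambda n: _score(n, key), reverse=True)
    return ranked[:replicas]


def survives(placements: dict[str, list[str]], failed: set[str]) -> bool:
    """True if every file still has at least one copy on a live node."""
    return all(any(n not in failed for n in owners) for owners in placements.values())


if __name__ == "__main__":
    nodes = ["vps-a", "vps-b", "vps-c", "vps-d", "vps-e"]  # hypothetical servers
    files = [f"backup/part-{i:03d}.tar" for i in range(100)]
    placements = {f: place(f, nodes, replicas=3) for f in files}

    # With 3 replicas, every possible pair of failed servers is survivable...
    assert all(survives(placements, set(pair)) for pair in combinations(nodes, 2))
    # ...but losing exactly the 3 servers holding some file loses that file.
    print(any(not survives(placements, set(t)) for t in combinations(nodes, 3)))  # True
```

The trade-off is storage: 3 replicas means every byte is stored three times, which is what buys tolerance of two simultaneous failures.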
 

HalfEatenPie

The Irrational One
Retired Staff
Tahoe-LAFS? I don't know if it's the best, but it should be OK for what you want.
Also, I forgot to ask (I haven't done much research on Tahoe-LAFS): does it support public linking of files? Or can it easily be integrated with something that does?
 

perennate

New Member
Verified Provider
Huh.  I thought Tahoe-LAFS had problems operating non-locally.  Or am I wrong? 
How do you mean? Tahoe-LAFS is designed entirely to operate over the Internet; there are even large "volunteer storage grids" where you can store files for free.

A few distributed filesystems: GlusterFS, MooseFS/LizardFS, XtreemFS (XtreemFS is also specifically designed to handle the WAN use case, although it's highly experimental); all can be mounted via FUSE.

Edit: what do you mean by "public linking of files"? Most of these are designed for storage and the filesystem only, without extras like web frontend access. You could, for example, deploy ownCloud on top of any distributed filesystem if you wanted to.
 

drmike

100% Tier-1 Gogent

This is sort of why I am experimenting with BTSync, Syncthing, etc. The approach is to take generic low-resource machines, install said daemon, and then configure the replication pool [the directory of files to replicate and where to replicate them to].

This approach is easier and suits a single-user world, vs. the complexity of a true redundant-disk, OS-based replication solution ($$$$ + time + complexity).
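A "replication pool" in this sense boils down to "this directory, mirrored to these places". BTSync and Syncthing do that continuously, peer-to-peer and with conflict handling; the sketch below is only a hypothetical one-shot, one-way mirror between local paths (e.g. mounted disks), comparing SHA-256 hashes to decide what to copy. The `mirror` function and the paths are made up for illustration and are not how either tool works internally.

```python
import hashlib
import shutil
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def mirror(source: Path, replicas: list[Path]) -> dict[Path, list[Path]]:
    """Copy new or changed files from `source` into each replica directory.

    One-way only: files deleted from `source` are NOT removed from replicas,
    and there is no conflict handling. Returns the files copied per replica.
    """
    copied: dict[Path, list[Path]] = {}
    for replica in replicas:
        copied[replica] = []
        for src in sorted(p for p in source.rglob("*") if p.is_file()):
            rel = src.relative_to(source)
            dst = replica / rel
            if dst.exists() and file_hash(dst) == file_hash(src):
                continue  # already up to date
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied[replica].append(rel)
    return copied
```

Running it twice in a row copies nothing the second time, because every replica's hashes already match; only files changed in between get copied again.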

Public linking would be sharing a file outside of your baked solution via a URL to, say, anyone? Yeah, the filesystem solutions aren't going to handle that. That's another layer of solutions, like the recommended ownCloud. Anything Python or PHP with file exposure and delivery would suffice.

I'm up for conversation/ideas on the file exposure part, as I am not interested in ownCloud whatsoever. Something lighter, not PHP, ideally with its own embedded web server, is more my speed.
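As a rough illustration of the "something lighter, not PHP, with its own embedded web server" idea: Python's standard library ships a small HTTP server, so a public-link layer can be a few dozen lines sitting on top of whatever directory the sync tool or distributed filesystem keeps up to date. A minimal sketch, assuming unguessable random tokens serve as the "public link"; `ShareRegistry`, `make_server`, the `/s/<token>` URL scheme and the default port are hypothetical, and there is no authentication, HTTPS, persistence or rate limiting, so it is not production-ready.

```python
from __future__ import annotations

import secrets
import shutil
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class ShareRegistry:
    """Maps random, unguessable URL tokens to files under one root directory."""

    def __init__(self, root: Path):
        self.root = root.resolve()
        self._links: dict[str, Path] = {}

    def share(self, relative: str) -> str:
        path = (self.root / relative).resolve()
        # Refuse anything outside the root (e.g. "../../etc/passwd") or missing.
        if not path.is_relative_to(self.root) or not path.is_file():
            raise ValueError(f"not a shareable file: {relative}")
        token = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe
        self._links[token] = path
        return token

    def resolve(self, token: str) -> Path | None:
        return self._links.get(token)


def make_server(registry: ShareRegistry, host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Serve shared files at /s/<token>; everything else is a 404."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            prefix = "/s/"
            path = None
            if self.path.startswith(prefix):
                path = registry.resolve(self.path[len(prefix):])
            if path is None or not path.is_file():
                self.send_error(HTTPStatus.NOT_FOUND)
                return
            self.send_response(HTTPStatus.OK)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(path.stat().st_size))
            self.end_headers()
            with path.open("rb") as f:
                shutil.copyfileobj(f, self.wfile)  # stream rather than load into memory

    return ThreadingHTTPServer((host, port), Handler)
```

Calling `make_server(registry).serve_forever()` would then answer on `http://127.0.0.1:8000/s/<token>`; binding to localhost and putting a TLS-terminating reverse proxy in front is the cautious way to expose it.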
 

HalfEatenPie

The Irrational One
Retired Staff
Hit the nail right on the head. Thanks for that. Yeah, my writing could/should have been better.

Yeah. I think I'll (sooner or later) probably be using ownCloud or something of a similar nature for it.
 

drmike

100% Tier-1 Gogent
ownCloud could be all right, I suppose... I am apprehensive about the whole software stack needed to get it running and keep it maintained. Too many moving/breakable pieces for my patience.

Last go-round with ownCloud, I was rather underwhelmed with the actual finished user experience.

But this approach, a formatted disk on whatever machine mirrored to whatever else, with an access layer like ownCloud on top, is what we all need. Anyone here working on cobbling such a thing together would get a good bit of interest from the community, I'd think.
 

nunim

VPS Junkie
Another alternative to ownCloud that doesn't get enough love, and it's more privacy-focused:

https://spideroak.com/
Maybe I'm blind, but I couldn't find any links to a server download or anything about hosting it myself.

SparkleShare seems like it could work; it seems to be basically just Git, but you'd have to rig up your own access control:

http://sparkleshare.org/

I set up ownCloud recently; the desktop clients work fairly well, but the web UI is slowwwwww...
 

splitice

Just a little bit crazy...
Verified Provider
On LAN, I use GlusterFS.

I use it for storage of web data (with heavy stat-less caching in PHP to prevent overload) and other shared resources, both for redundancy and for general reliability (compared with NFS).

It's also in use purely for reliable shared access to the configuration files used in two clusters. The plan is to extend it to another cluster's configuration files once the ACLs are worked out (that cluster spans two datacenters!).

Haven't experienced any issues that have not been self-induced (strange things happen if you run out of disk on one node).
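Since running out of disk on a single node is where the strange behaviour showed up, a cheap safeguard is to alert before any brick gets close to full. A minimal sketch using only the standard library, assuming the brick directories are locally accessible paths on each node; the function name, the example paths and the 10% threshold are hypothetical, and it does not talk to GlusterFS at all, it only watches free space.

```python
import shutil


def low_space_bricks(bricks: list[str], min_free_fraction: float = 0.10) -> dict[str, float]:
    """Return {brick_path: free_fraction} for bricks below the threshold."""
    low: dict[str, float] = {}
    for brick in bricks:
        usage = shutil.disk_usage(brick)  # named tuple: total, used, free (bytes)
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low[brick] = free_fraction
    return low
```

Run from cron on each node with that node's brick paths (e.g. hypothetical `/data/brick1`), and mail or log anything it returns.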
 

drmike

100% Tier-1 Gogent
I am experimenting with Seafile and Syncthing. Seafile just bombed following a reboot; no clue what is up there. Single coder dude, and yeah, I avoid such projects, since inevitably it's too much for one lonesome guy to keep up with.

Syncthing is getting my attention currently, and it's nice, but it's purely replication, with no grand features or access layer. Pretty straightforward. Probably best compared to BTSync, I'd say.

Download is up on the nav @nunim:

https://spideroak.com/opendownload/
 

drmike

100% Tier-1 Gogent
This actually looks quite good; I'll give it a try later. Does anyone know if it includes bandwidth limiting?


EDIT: Found it in the config file https://discourse.syncthing.net/t/config-file-and-directory/204
Good luck with the bandwidth limiting. In my experience, it doesn't seem to work in the latest release. I'd be interested to hear if it actually does for you. It could be that it doesn't like my network setup, VPN, or other stuff somehow. That shouldn't have any bearing, but some things do break with such :)

Ummm, you realize Syncthing has a web-based config? It starts on 127.0.0.1:8080. You can adjust that value in the config file if you're in there... Once you're in the web interface, up top is a button for various config options... Be sure to lock down and secure your admin panel with a username and password :)
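For reference, in current Syncthing releases the rate limits live in the `<options>` section of `config.xml` as `maxSendKbps` and `maxRecvKbps` (in KiB/s, 0 meaning unlimited), and the GUI listen address is the `<address>` element under `<gui>` (the default port is now 8384; releases from this era may differ). A small sketch that sets the limits with the standard-library XML parser, assuming that layout; `set_rate_limits` is a hypothetical helper, and Syncthing should be stopped before the file is edited (or the web GUI used instead).

```python
import xml.etree.ElementTree as ET


def set_rate_limits(config_xml: str, send_kbps: int, recv_kbps: int) -> str:
    """Return a copy of a Syncthing config.xml string with new rate limits.

    Assumes the <configuration><options>...</options></configuration> layout.
    Note: ElementTree drops XML comments and the <?xml ...?> declaration.
    """
    root = ET.fromstring(config_xml)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    for tag, value in (("maxSendKbps", send_kbps), ("maxRecvKbps", recv_kbps)):
        elem = options.find(tag)
        if elem is None:
            elem = ET.SubElement(options, tag)
        elem.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Usage would be along the lines of `path.write_text(set_rate_limits(path.read_text(), 500, 0))`, then restarting Syncthing.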
 

trewq

Active Member
Verified Provider
Thanks for that; I only had a quick look, so I wasn't sure about the web interface's capabilities.


What OS are you using that the limiting isn't working on? OpenVPN?


I use tinc for the VPN; it's so easy.
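For the "hook up all the storage boxes over tinc" idea: in tinc 1.0 each node needs a `tinc.conf` (its `Name` and which peers to `ConnectTo`), a `hosts/<name>` file for every node (public `Address`, VPN `Subnet`, plus that node's public key), and a `tinc-up` script that brings the interface up. A hypothetical generator for a small full mesh, assuming tinc 1.0-style configuration, Linux `ip` for the interface and a made-up 10.10.0.0/24 VPN range; key generation and copying the files into `/etc/tinc/<netname>/` are left out.

```python
from ipaddress import IPv4Network


def tinc_mesh(public_ips: dict[str, str], vpn_net: str = "10.10.0.0/24") -> dict[str, dict[str, str]]:
    """Build tinc 1.0 config files for a full mesh, keyed by node name.

    Returns {node: {path relative to /etc/tinc/<netname>/: contents}}.
    Node names must be alphanumeric/underscore. Keys are NOT generated here:
    after writing the files, run `tincd -n <netname> -K` on each node and
    copy its updated hosts/<node> file (now with the public key) to all nodes.
    """
    net = IPv4Network(vpn_net)
    addresses = net.hosts()  # iterator over usable addresses: .1, .2, ...
    names = sorted(public_ips)
    vpn_ip = {name: next(addresses) for name in names}

    # Every node needs a hosts/ entry for every node, including itself.
    host_files = {
        f"hosts/{name}": f"Address = {public_ips[name]}\nSubnet = {vpn_ip[name]}/32\n"
        for name in names
    }
    config: dict[str, dict[str, str]] = {}
    for name in names:
        connect = "".join(f"ConnectTo = {other}\n" for other in names if other != name)
        config[name] = {
            "tinc.conf": f"Name = {name}\n{connect}",
            "tinc-up": (
                "#!/bin/sh\n"
                "ip link set $INTERFACE up\n"
                f"ip addr add {vpn_ip[name]}/{net.prefixlen} dev $INTERFACE\n"
            ),
            **host_files,
        }
    return config
```

A full mesh keeps any two storage nodes one hop apart; with many nodes, a few well-connected `ConnectTo` targets are enough, since tinc forwards traffic for the rest.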
 

HalfEatenPie

The Irrational One
Retired Staff
Ooh, tinc as a VPN to hook up all the storage systems does sound interesting.

Besides security, any benefit to doing this? (Simply curious.)
 

MartinD

Retired Staff
Verified Provider
Joepie wrote a really good tutorial for this very thing. I think I posted it on here when I was doing something similar, with his permission, of course!
 

HalfEatenPie

The Irrational One
Retired Staff
Hm.  I don't see it as far as I can tell.

Although I did find the LET topic on it.  

Also, it seems Shovenose posted about it a while back here:
 