# Self-hosted Distributed Storage



## HalfEatenPie (Sep 26, 2014)

Alright, so let's talk about this.

Anyone know a good and easy way to create a self-hosted distributed storage system?

This can either be a local cluster (via LAN), or one through the interweb (like hooking up a bunch of cloud servers to make a distributed storage).

I think it'd be awesome to backup all my work onto a storage "cluster" (per se) and have that 200 points of backup.  Redundancy would be awesome (as in one server drops off the face of the earth (or two) and everything would still be fine).  

Anyways, hit me with your best!  Also, if it could be mounted or had a dropbox-style sync app, that'd be a major plus!


----------



## rds100 (Sep 26, 2014)

Tahoe-LAFS? I don't know if it's the best, but it should be OK for what you want.


----------



## HalfEatenPie (Sep 26, 2014)

rds100 said:


> Tahoe-LAFS? I don't know if it's the best, but it should be OK for what you want.


Huh.  I thought Tahoe-LAFS had problems operating non-locally.  Or am I wrong?


----------



## HalfEatenPie (Sep 26, 2014)

rds100 said:


> Tahoe-LAFS? I don't know if it's the best, but it should be OK for what you want.


Also, I forgot to ask (haven't done much research on Tahoe-LAFS), does it support public-linking of files?  Or can it easily be integrated to work with something like that?


----------



## perennate (Sep 26, 2014)

HalfEatenPie said:


> Huh.  I thought Tahoe-LAFS had problems operating non-locally.  Or am I wrong?


How do you mean? Tahoe-LAFS is entirely designed to operate over the Internet; there are even large "volunteer storage grids" that you can store files in for free.

A few distributed filesystems: glusterfs, moosefs/lizardfs, xtreemfs (xtreemfs is also specifically designed to handle WAN use case, although highly experimental); all can be mounted via FUSE.

Edit: what do you mean by "public-linking of files"? Most of these are designed for storage and filesystem only, not with extra things like web frontend access. You could for example deploy ownCloud on top of any distributed filesystem if you wanted to.


----------



## drmike (Sep 26, 2014)

HalfEatenPie said:


> Anyone know a good and easy way to create a self-hosted distributed storage system?
> 
> This can either be a local cluster (via LAN), or one through the interweb (like hooking up a bunch of cloud servers to make a distributed storage).
> 
> ...



This is sort of why I am experimenting with BTSync, Syncthing, etc.  Approach is to take generic low resource machines and install said daemon and then configure the replication pool [directory of files to replicate and where to replicate them to].

This is an easier approach and suitable for single user world vs. complexity of true redundant disk and OS based replication solution = $$$$ + time + complexity.



HalfEatenPie said:


> Also, I forgot to ask (haven't done much research on Tahoe-LAFS), does it support public-linking of files?  Or can it easily be integrated to work with something like that?


Public linking would be sharing a file outside of your baked solution via a URL to say anyone?   Yeah the file system solutions aren't going to handle such.  That's another layer of solutions, like as recommended, OwnCloud.   Anything Python or PHP with file exposure and delivery would suffice.

Up for the conversation / ideas on the file exposure, as I am not interested in OwnCloud whatsoever.  Something lighter, not PHP, ideally with own embedded and baked in web server is more my speed.


----------



## HalfEatenPie (Sep 26, 2014)

perennate said:


> How do you mean? Tahoe-LAFS is entirely designed to operate over the Internet; there are even large "volunteer storage grids" that you can store files in for free.
> 
> A few distributed filesystems: glusterfs, moosefs/lizardfs, xtreemfs (xtreemfs is also specifically designed to handle WAN use case, although highly experimental); all can be mounted via FUSE.
> 
> Edit: what do you mean by "public-linking of files"? Most of these are designed for storage and filesystem only, not with extra things like web frontend access. You could for example deploy ownCloud on top of any distributed filesystem if you wanted to.


Hit the nail right on the head.  Thanks for that.  Yeah my writing could/should have been better.  



drmike said:


> This is sort of why I am experimenting with BTSync, Syncthing, etc.  Approach is to take generic low resource machines and install said daemon and then configure the replication pool [directory of files to replicate and where to replicate them to].
> 
> This is an easier approach and suitable for single user world vs. complexity of true redundant disk and OS based replication solution = $$$$ + time + complexity.
> 
> ...


Yeah.  I think I'll (sooner or later) end up using OwnCloud or something of that nature for it.


----------



## drmike (Sep 26, 2014)

Owncloud could be alright I suppose... I am apprehensive about the whole software stack to get it running and maintaining that.  Too many moving/breakable pieces for my patience.

Last go round with OwnCloud I was rather underwhelmed with the actual finished user experience.  

But this approach of formatted disk on whatever mirrored to whatever else with an access layer like OwnCloud, that's what we all need.   Anyone working on cobbling such here would get a good bit of interest from the community I'd think.


----------



## drmike (Sep 26, 2014)

Other alternative to OwnCloud that doesn't get enough love... and it's more privacy focused:

https://spideroak.com/


----------



## nunim (Sep 26, 2014)

drmike said:


> Other alternative to OwnCloud that doesn't get enough love... and it's more privacy focused:
> 
> https://spideroak.com/


Maybe I'm blind but I couldn't find any links to the server download/anything about hosting it myself.

SparkleShare seems like it could work, seems to be basically just git, but you'd have to rig up your own access control:

http://sparkleshare.org/

I set up OwnCloud recently; the desktop clients work fairly well but the WebUI is slowwwwww...


----------



## splitice (Sep 26, 2014)

On LAN, I use GlusterFS for storage of web data (heavy stat-less caching used on PHP to prevent overload) and other shared resources, both for redundancy and general reliability (when compared with NFS).

It's also in use purely for reliable shared access to configuration files used in two clusters. Plans are to extend it to another cluster's configuration files once the ACLs are worked out (that cluster spans two datacenters!)

Haven't experienced any issues that have not been self-induced (strange things happen if you run out of disk on one node).


----------



## HalfEatenPie (Sep 26, 2014)

drmike said:


> Other alternative to OwnCloud that doesn't get enough love... and it's more privacy focused:
> 
> https://spideroak.com/


Ya know, another alternative is Seafile.  My only worry though is that it's maintained by a single individual.


----------



## perennate (Sep 27, 2014)

This one's also interesting, it's like btsync, doesn't need central server -- http://syncthing.net/


----------



## drmike (Sep 27, 2014)

I am experimenting with Seafile and Syncthing.  Seafile just bombed following a reboot - no clue what is up there.  Single coder dude, yeah, I avoid such projects since inevitably such is too much for one lonesome guy to keep up with.

Syncthing is getting my attention currently and it's nice, but purely replication, no grand features or access.  Pretty straightforward.  Probably best compared to BTSync I'd say.



nunim said:


> Maybe I'm blind but I couldn't find any links to the server download/anything about hosting it myself.


Download is up on the nav @nunim:

https://spideroak.com/opendownload/


----------



## trewq (Sep 27, 2014)

perennate said:


> This one's also interesting, it's like btsync, doesn't need central server -- http://syncthing.net/


This actually looks quite good, I'll give it a try later. Does anyone know if this includes bandwidth limiting?

EDIT: Found it in the config file https://discourse.syncthing.net/t/config-file-and-directory/204


----------



## drmike (Sep 27, 2014)

trewq said:


> This actually looks quite good, I'll give it a try later. Does anyone know if this includes bandwidth limiting?
> 
> 
> EDIT: Found it in the config file https://discourse.syncthing.net/t/config-file-and-directory/204


Good luck with the bandwidth limiting.  In my experience, it doesn't seem to work in the latest release.   Interested if it actually does for you.   Could be it doesn't like my network setup, VPN, other stuff somehow.  Shouldn't have any bearing, but some things do break with such 

Ummm Syncthing has web based config you realize?  Starts on 127.0.0.1:8080.  You can adjust that value in the config file if in there... when you do the web interface, up top is a button for various config options... Be sure to lock down and secure your admin panel with a username and password
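
On the "lock down your admin panel" point, a quick sanity check is whether the GUI address is loopback-only to begin with. This is just a minimal sketch: `gui_is_loopback_only` is a made-up helper, not anything Syncthing ships, and it assumes you have already read the `host:port` listen address (like the 127.0.0.1:8080 above) out of your own config.

```python
import ipaddress


def gui_is_loopback_only(listen_address: str) -> bool:
    """Return True if a host:port GUI address is reachable only from this machine.

    Hypothetical helper for illustration; the address format (host:port)
    is an assumption based on the 127.0.0.1:8080 default mentioned above.
    """
    host, _, port = listen_address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {listen_address!r}")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]:8080
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; only treat the conventional name as loopback.
        return host == "localhost"


if __name__ == "__main__":
    for address in ("127.0.0.1:8080", "0.0.0.0:8080"):
        verdict = "loopback only" if gui_is_loopback_only(address) else "exposed, set a password"
        print(f"{address}: {verdict}")
```

If it reports the address as exposed, that is when the username and password really matter.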


----------



## trewq (Sep 27, 2014)

drmike said:


> Good luck with the bandwidth limiting. In my experience, it doesn't seem to work in the latest release. Interested if it actually does for you. Could be it doesn't like my network setup, VPN, other stuff somehow. Shouldn't have any bearing, but some things do break with such
> 
> 
> Ummm Syncthing has web based config you realize? Starts on 127.0.0.1:8080. You can adjust that value in the config file if in there... when you do the web interface, up top is a button for various config options... Be sure to lock down and secure your admin panel with a username and password


Thanks for that, only had a quick look so wasn't sure about the web interface capabilities.


What OS are you using that the limiting isn't working on? And is the VPN OpenVPN?


I use tinc for VPN, it's so easy.


----------



## HalfEatenPie (Sep 27, 2014)

Ooh tinc for a VPN network to hook up all the storage systems does sound interesting.

Besides security concerns, is there any other benefit from this? (simply curious)


----------



## MartinD (Sep 27, 2014)

Joepie wrote a really good tutorial for this very thing. I think I posted it on here when I was doing something similar, with his permission of course!


----------



## HalfEatenPie (Sep 27, 2014)

Hm.  I don't see it as far as I can tell.

Although I did find the LET topic on it.  

Also seems Shovenose posted about it a while back here:


----------



## MartinD (Sep 27, 2014)

Difficult to search on my phone, will look when I'm on a laptop later


----------



## Francisco (Sep 28, 2014)

Question for everyone tinkering with SyncThing - Does it sync files opened for writing?

I'm mostly curious if it syncs streaming data. Namely, you're WGET'ing a file to the FS, is it syncing all of that on the fly or does it wait for the file to be complete before blasting it off?

Francisco


----------



## drmike (Sep 28, 2014)

Francisco said:


> Question for everyone tinkering with SyncThing - Does it sync files opened for writing?
> 
> 
> I'm mostly curious if it syncs streaming data. Namely, you're WGET'ing a file to the FS,
> ...


Good question....   

I'll fire up a wget here in a few and see... big ISO maybe


----------



## drmike (Sep 28, 2014)

So @Francisco  SyncThing.....

Rescan Interval

30 s

Lowered that from 60s.

Ran a wget of the Debian DVD ISO, ~4GB.  Slow downloading (BW + VPN lag).

SyncThing on the remote end (this in datacenter where I am replicating to):

Global Repository 237 items, ~1.19 GiB  Local Repository 237 items, ~634 MiB  Out Of Sync

On the local SyncThing master (here on my desk):

Global Repository 237 items, ~1.24 GiB  Local Repository 237 items, ~1.24 GiB  Out Of Sync 0 items, 0 B

The local repository continues to grow in GiB size:

Global Repository 237 items, ~1.25 GiB  Local Repository 237 items, ~1.25 GiB  Out Of Sync 0 items, 0 B

On the local master repository here on my desk:

Download Rate 61 bps (261 KiB)  Upload Rate 208 kbps (54.4 MiB)  Address -:22000  Synchronization 74%

The upload rate = current throughput and total transfer to remote since this session was initiated.   This will at times drop and reconnect and thus may be subject to reset of values.

So it *appears* the remote replication is happening while I download the file locally with wget.

I need to figure out where SyncThing stashes partial file transfers like this to confirm where we are.


----------



## drmike (Sep 28, 2014)

The partial transfer file in SyncThing is given a period prefixed name:

.syncthing.debian-7.6.0-amd64-DVD-1.iso

= 70M

wget on the local server is up to about 750M currently....

Yes, my upload speeds are much slower than download speeds.

Will continue to watch and see how this does.
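
For anyone wanting to watch this without refreshing the web UI, here is a rough sketch of how the partial files could be tracked on the receiving end. It assumes the `.syncthing.` prefix seen above holds for your version (it may not), and `partial_transfers` and `percent_done` are names made up for this example, not part of SyncThing.

```python
from pathlib import Path

# Prefix observed in this thread; treat it as an assumption for other versions.
TEMP_PREFIX = ".syncthing."


def partial_transfers(folder):
    """Map each in-progress file name to the size of its temp file in bytes."""
    result = {}
    for path in Path(folder).iterdir():
        if path.is_file() and path.name.startswith(TEMP_PREFIX):
            final_name = path.name[len(TEMP_PREFIX):]
            result[final_name] = path.stat().st_size
    return result


def percent_done(partial_bytes, expected_bytes):
    """Share of the expected final size already present in the temp file."""
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    return 100.0 * partial_bytes / expected_bytes
```

Point `partial_transfers()` at the replicated directory on the remote end and feed the result into `percent_done()` along with the final size of the ISO. Keep in mind the source file is itself still growing while wget runs, so the percentage is only meaningful against the finished size.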


----------



## drmike (Sep 29, 2014)

That ISO is still chugging, slowly along on the replication.

Gets mega confusing if you have multiple web interfaces open for the end points in your SyncThing setup.  Cause on the master progress percentages aren't updated until the file is finished and complete on the other side.   In this instance I have like 1.6GB uploaded to remote and another 2GB+ to go on mirroring the ISO.

It appears to make some attempt to start/restart the files out of sync.   Waiting until done to see total bandwidth chewed up, but so far, looks right for file it is moving and not multiple retries or anything weird on full parts of files wasted/flushed...

I need to set up two ends on remote servers to pace this properly...  This week
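
Back-of-envelope on how long the rest will take, as a small sketch. Assumptions: the 208 kbps upload rate shown earlier means kilobits per second and holds steady (it clearly doesn't always), and "2GB to go" means 2 x 10^9 bytes. Function names are just for this example.

```python
def seconds_remaining(remaining_bytes, rate_bits_per_second):
    """Time left for a transfer at a constant rate, in seconds."""
    if rate_bits_per_second <= 0:
        raise ValueError("rate must be positive")
    return remaining_bytes * 8 / rate_bits_per_second


def format_duration(seconds):
    """Render a duration as hours and minutes, dropping leftover seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    return f"{hours}h {remainder // 60:02d}m"


if __name__ == "__main__":
    remaining = 2 * 10**9   # "another 2GB+ to go", taken as decimal gigabytes
    rate = 208 * 1000       # 208 kbps, assumed to mean kilobits per second
    print(format_duration(seconds_remaining(remaining, rate)))  # prints "21h 22m"
```

So the better part of a day at that rate, which fits with it still chugging along.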


----------

