item 697360

Ask HN: Distributed File System

28 points | ErrantX | 16 years ago

Hey peeps.

So I need some recommendations.

I've been building a distributed file system at work to store our hash tables in. These are 1 GB files (about 40 TB worth of them) that are write once, read many.

It needs to duplicate the data across servers and make the files available via HTTP. Oh, and it needs to scale quite well, because from next month we are potentially adding another TB per month.

So far I haven't been able to find a DFS that does all of the above, so I have been working on my own. But I am nervous - the files are mission critical. I am not too worried about losing data per se (there are alternative backup solutions that make sure we have multiple static copies safe); I'm more worried about not being able to cope with the load. My current implementation is in Python and simply uses a central MySQL server to track file locations.
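For context, the central-tracker design described here can be sketched in a few lines. This is a minimal illustration, not the poster's actual code: it uses the stdlib sqlite3 module in place of MySQL so it runs standalone, and the table and column names are assumptions.

```python
import sqlite3

# In-memory stand-in for the central MySQL tracker (illustrative schema).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE file_locations (
        file_hash TEXT NOT NULL,   -- identifier of the 1 GB hash-table file
        server    TEXT NOT NULL,   -- storage node holding a replica
        PRIMARY KEY (file_hash, server)
    )
""")

def register_replica(file_hash, server):
    """Record that `server` holds a copy of `file_hash`."""
    db.execute("INSERT OR IGNORE INTO file_locations VALUES (?, ?)",
               (file_hash, server))
    db.commit()

def locate(file_hash):
    """Return every server holding the file, for building its HTTP URLs."""
    rows = db.execute("SELECT server FROM file_locations WHERE file_hash = ?",
                      (file_hash,))
    return [r[0] for r in rows]

register_replica("abc123", "node-1")
register_replica("abc123", "node-2")
```

The lookup itself is cheap at this scale; the pressure point, as the thread goes on to discuss, is that every read and write funnels through the one tracker.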

So: can anyone recommend a DFS option I have missed that fulfils my requirements? Or, even better, can anyone offer technical ideas to help with the development of our code?

:)

34 comments

[+] comice|16 years ago|reply
GlusterFS is a FUSE based distributed filesystem. You can mirror the files on as many servers as you like, aggregate storage from different servers. It has no central metadata server either.

It doesn't serve via HTTP directly, but it's easy to point a web server at the filesystem (it has an Apache module to provide direct access without going through FUSE if you want higher performance).

http://www.gluster.org

[+] bjclark|16 years ago|reply
I highly recommend against GlusterFS. We used to use it at my current employer and had nothing but trouble serving large files from it. It was fine until we started putting large files in it; then healing would randomly bring the whole server to its knees.
[+] kierank|16 years ago|reply
MogileFS is what you're looking for. Native http support with a MySQL tracker.
[+] cperciva|16 years ago|reply
I'm not sure I understand what the problem is. You say that you've got approximately 40000 files; a database which keeps track of where those files are stored is a trivially small problem.
[+] ErrantX|16 years ago|reply
It needs to scale robustly to support at least a million files (with 3 replicas each - so 4 million entries minimum) and accept really high load (min. 1,000 simultaneous requests, ideally 5,000). The main concern is that MySQL is a single point of failure.
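One common way to remove that single point of failure entirely (an alternative, not something this thread settles on) is consistent hashing: replica placement is derived from the file name itself, so any node can compute where a file lives without asking a central database. A minimal sketch, with made-up node names:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Consistent-hash ring: a file maps to the next `replicas` distinct
    nodes clockwise from its hash position. Virtual nodes (`vnodes`)
    smooth out the distribution across physical nodes."""

    def __init__(self, nodes, vnodes=64):
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n)
            for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas_for(self, filename, replicas=3):
        """Pick `replicas` distinct nodes for this file, deterministically."""
        idx = bisect(self.keys, self._hash(filename)) % len(self.ring)
        chosen = []
        while len(chosen) < replicas:
            node = self.ring[idx % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
            idx += 1
        return chosen

ring = HashRing(["node-%d" % i for i in range(8)])
```

Adding a node only moves the fraction of files that hash near it, which matters when you are growing by a TB a month.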
[+] Tichy|16 years ago|reply
Not sure if something like Cassandra would help? http://en.wikipedia.org/wiki/Cassandra_%28database%29

I think there must be open source clones of the FS Google uses, but I don't know the names.

[+] ErrantX|16 years ago|reply
A GFS clone was the angle I was looking at.

Cassandra looks pretty fun - you're suggesting that as the database, right? I'm thinking a quick Python implementation for PUT (maybe DELETE) and meta operations, using Cassandra as a backend and lighttpd for GET (high performance), might work... cheers.
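The lighttpd-for-GET split usually works by having the Python tier answer a lookup with an HTTP redirect to a storage node, roughly as MogileFS trackers do. A rough sketch of just the redirect decision; the replica map, node URLs, and client id are all hypothetical (in the proposed design the map would come from the Cassandra-backed metadata store):

```python
import hashlib

# Hypothetical replica map, standing in for the metadata store.
REPLICAS = {
    "table-0001.bin": ["http://node-1", "http://node-2", "http://node-3"],
}

def redirect_for(filename, client_id):
    """Return the URL to send as a 302 target, or None for a 404.

    Hashing the client id picks one replica per client deterministically,
    which spreads read load across the copies."""
    servers = REPLICAS.get(filename)
    if not servers:
        return None
    pick = int(hashlib.sha1(client_id.encode()).hexdigest(), 16) % len(servers)
    return f"{servers[pick]}/{filename}"
```

Serving a redirect keeps the Python tier out of the data path: the client fetches the 1 GB file straight from the storage node's web server.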

[+] gamache|16 years ago|reply
I am surprised that no one has mentioned Amazon S3. You will definitely plunk down a few bucks to store 40TB there, but it's a ready-made solution to your problem.

Edit: OK, now I see that S3 was suggested but vetoed. Unwise decision, in my opinion. If you need data security, encrypt your data.

[+] ErrantX|16 years ago|reply
Bandwidth costs would probably kill us, TBH. Ultimately we will have 20,000 clients connecting at least 3 times a week and uploading once a fortnight. It adds up to about 500 TB of new data a year.

S3 is great for us atm, but the work to encrypt everything is too much given that the cost doesn't scale for us long term. Potentially within a couple of years we will be spending half a million on bandwidth and half a million on storage - and that will continue upwards :)

We're facing the Google model: commodity hardware on a cheap T1 connection :)

[+] simonw|16 years ago|reply
Sounds like a job for the Hadoop HDFS.
[+] tk999|16 years ago|reply
Agreed. You need to look into Hadoop HDFS (a GFS clone...).
[+] ajb|16 years ago|reply
Hmm, sounds like tahoe may do what you want: http://allmydata.org/trac/tahoe
[+] Top80sSons|16 years ago|reply
Tahoe looks very promising, but I've been unable to install it any time I've tried. Do you know any environment where it works out of the box? I tried Fedora 11, Ubuntu 9 and Windows XP myself.
[+] fh973|16 years ago|reply
I suggest you also check out XtreemFS (http://www.xtreemfs.org/). It supports striping/parallel IO, and your throughput will scale with the number of storage servers you deploy (we ran at GB/s, not Gbit/s, on a single file). It gives you full file system semantics (you mount it via FUSE) and is quite easy to set up.
[+] speek|16 years ago|reply
Ever thought about getting some super-duper SAN toys?
[+] ErrantX|16 years ago|reply
Teensy bit expensive :)

We have 4,000 clients atm but hope to increase that quickly (20,000 within a year). That's TBs of data a month. A SAN was considered, but it's too expensive to scale :( EDIT: well, not over-the-top expensive, but commodity servers with software are cheaper/better for us.

Also, at some point we need an HTTP interface to the outside world.

[+] gcv|16 years ago|reply
I'm about to look into somewhat similar requirements for an app I'm consulting on, and I plan to research GlusterFS first.