item 10413618

Show HN: Goofys – a faster s3fs written in Go

40 points | khc | 10 years ago | github.com | reply

23 comments

[+] rictic|10 years ago|reply
I appreciate any project that makes its strengths and weaknesses very clear and documents its consistency semantics.

Reading around[1], it looks like close-to-open consistency means that all writes must be flushed before the close() call returns, and all subsequent open() calls to the file must see the changes. IO may otherwise be buffered or cached by the FS.

[1] The best source I found was these lecture slides: http://www0.cs.ucl.ac.uk/staff/B.Karp/gz03/f2011/lectures/gz...

[+] notacoward|10 years ago|reply
I applaud the "Filey System" clarification. Confusion between things that are very close to 100% POSIX compliance and things that don't even have that as a goal but adopt similar interfaces or structures has been a real problem. It's nice to see someone being up front about the distinction, and it looks like a useful project too.
[+] khc|10 years ago|reply
Thank you! It's not a general-purpose filesystem, but I don't think it has to be one to be useful. I am lazy and I want to do as little POSIX as possible.
[+] khc|10 years ago|reply
I started learning Go a month ago and wrote goofys in it as an exercise. Looking for feedback on Go best practices as well as on the usefulness of reduced-POSIX filesystems.
[+] edutechnion|10 years ago|reply
riofs (https://github.com/skoobe/riofs) seems much faster than s3fs and deserves a spot in any benchmarks.
[+] henningpeters|10 years ago|reply
Not surprising that s3fs is slower; its implementation quality is not very high. Goofys looks much better at first sight. It would be nice if anybody could do a benchmark between goofys and riofs (without cache). But honestly, if you have proper request handling there is not that much to tune. The biggest performance gains come from a good cache implementation, which also makes such a system useful in a production environment; that's what riofs was written for.

Disclaimer: I initiated and supervised riofs.

[+] khc|10 years ago|reply
I've updated the benchmark to include riofs.
[+] Goopplesoft|10 years ago|reply
This is cool. Any reason you chose not to cache reads locally (detect md5/mtime changes for a new read or something similar)?
[+] khc|10 years ago|reply
Many reasons:

* I am not working at the moment and have some free time (shameless plug: resume at my profile), so I want to bound the amount of time I need to get something useful

* many archiving/backup workloads are WORN (write once, read never); those and many streaming workloads (data processing, media streaming) don't really benefit from a cache. (Unless your cache is as big as your data, but that's usually not why people use S3.)

* for the use cases where a cache helps, I think you can just layer another caching filesystem on top. I intend to write one if one doesn't exist already. I wonder if you can use cachefs with FUSE filesystems? Let me know your use cases and I will think about it some more.

[+] SergeyPopoff|10 years ago|reply
I want to see a benchmark comparison with the Rust version.
[+] khc|10 years ago|reply
I see what you are doing there :-P goroutines do make certain things easier, but there's nothing you can't do with good old pthreads. One of the motivations for this project was for me to better understand what this Go hype is about.