knl | 4 years ago

That’s all fine and dandy, but it only means that there is some good/great research in the topic. Winning best paper awards doesn’t say a thing about the implementation and handling of various edge cases.

Like many research projects, this one will also probably last as long as there is funding. Remember, the goal of PhD students is to publish papers, not develop and maintain software. Thus, without skin in the game, I couldn’t trust my data/workloads to such systems.


tytso|4 years ago

The goal of academic research is to explore ideas, which are judged by submitting papers to conferences for review, and to train the next generation of academics (i.e., graduate students) in coming up with ideas, proving them, and then writing up said ideas and the proof that they work well. It is not to create production-quality software, which requires an orthogonal set of goals and skills.

The key thing to remember is that THERE IS NOTHING WRONG WITH THIS ACADEMIC PROCESS. I go to the File and Storage Technologies (FAST) conference, where many of these BetrFS papers were published, to harvest ideas which I might use in my production systems, and of course, to see whether any of the graduate students who have decided that the academic life is not for them might come to work for my company[1]. I personally find the FAST conference incredibly useful on both of these fronts, and I think the BetrFS papers are super useful if you approach them from the perspective of being a proving ground for ideas, not as a production file system.

So it's unfortunate that people seem to be judging BetrFS on whether they should "trust my data/workloads to such systems", and complaining that the prototype is based on the 3.11 kernel. That's largely irrelevant from the perspective of proving such ideas. Now, I'm going to be much more harshly critical when someone proposes a new file system for inclusion in the upstream kernel, claims that it is ready for prime time, and then when I run gce-xfstests on it, we see it crashing right and left[2][3]. But that's a very different situation. You will notice that no one is trying to suggest that BetrFS is being submitted upstream.

A good example of how this works is the iJournaling paper[4], where the ideas were used as the basis for ext4 fast commits[5]. We did not take their implementation, and indeed, we simplified their design to address robustness and deployment concerns. This is an example of academic research creating real value, and shows the process working as intended. It did NOT involve taking the prototype code from the iJournaling research effort and slamming it into ext4; we reimplemented the key ideas from that paper from scratch. And that's as it should be.

[1] Obligatory aside: if you are interested in working on file systems and storage in the Linux kernel, reach out to me. We're hiring! My contact information should be very easily found if you do a Google search, since I'm the ext4 maintainer.

[2] https://lore.kernel.org/r/YQdlJM6ngxPoeq4U@mit.edu

[3] https://lore.kernel.org/all/YQgJrYPphDC4W4Q3@mit.edu/

[4] https://www.usenix.org/conference/atc17/technical-sessions/p...

[5] https://lwn.net/Articles/842385/

throwaway02201|4 years ago

I hope you are being downvoted for the harshness and not the content.

> Like many research projects, this one will also probably last as long as there is funding. Remember, the goal of PhD students is to publish papers, not develop and maintain software. Thus, without skin in the game, I couldn’t trust my data/workloads to such systems.

Sadly true. For-profit companies only care about $$$. Academia only cares about publishing to get funding.

Neither option is ideal for developing trusted and user-focused software in the long term. OpenSSL is a good example.

Non-profits really struggle to get funding. Government grants are a mess.

The world really needs a new approach to R&D.

globular-toast|4 years ago

> Academia only cares about publishing to get funding.

That's just not true. To do well in academia you have to be truly invested in your field. You can just about get by if you're only in it for the papers, but it's just like getting by in a job that you're only in for the money. At the end of the day, though, in a world where everyone is forced to be productive or be homeless, there are times when publishing becomes a necessity. This doesn't mean they only care about publishing, though.

gnufied|4 years ago

Looks like it only works with Linux kernel 3.11? https://github.com/oscarlab/betrfs/blob/master/README.md , so it definitely has not been updated for a while.

I am not even sure it wants to be production ready, but maybe it is a playground for ideas.

donporter|4 years ago

Indeed, we are behind on releases. We do anticipate a major release, including 4.19 kernel support, in the coming months.

Part of our challenge is that we are also exploring non-standard extensions to the VFS API - largely supported by kallsyms + copied code to avoid kernel modifications. This makes rolling forward more labor intensive, but we are working to pay down this technical debt over time, or possibly make a broader case for a VFS API change.
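For readers unfamiliar with the kallsyms approach mentioned above, here is a minimal sketch of the underlying idea: resolving a symbol's address by name from a symbol table in the /proc/kallsyms text format (address, type letter, name, optional module tag). This is not BetrFS code. The sample addresses, the `do_sync_helper` and `ftfs_example_hook` names, and the `[ftfs]` module tag are made up for illustration, and the sketch parses a string rather than a live kernel's table.

```python
from typing import Dict, Optional

# Hypothetical sample in the /proc/kallsyms text format:
#   <hex address> <type letter> <symbol name> [optional module tag]
# All addresses and the last two symbol names are invented for illustration.
SAMPLE_KALLSYMS = """\
ffffffff81000000 T _text
ffffffff812a4b10 T vfs_read
ffffffff812a4c80 t do_sync_helper
ffffffffc0a01000 t ftfs_example_hook\t[ftfs]
"""


def parse_kallsyms(text: str) -> Dict[str, int]:
    """Build a name -> address map from kallsyms-formatted text."""
    table: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = fields[0], fields[1], fields[2]
        # Keep the first occurrence if a name appears more than once.
        table.setdefault(name, int(addr, 16))
    return table


def lookup(table: Dict[str, int], name: str) -> Optional[int]:
    """Return the address for a symbol name, or None if it is absent."""
    return table.get(name)


if __name__ == "__main__":
    symbols = parse_kallsyms(SAMPLE_KALLSYMS)
    for wanted in ("vfs_read", "ftfs_example_hook", "not_a_symbol"):
        addr = lookup(symbols, wanted)
        print(wanted, hex(addr) if addr is not None else "not found")
```

Code that depends on name lookups like this is tied to whatever symbols a given kernel build happens to contain, which is one reason rolling forward across kernel versions is labor intensive.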

all2|4 years ago

Extricating something from specific kernel API calls won't be fun. Might be a good learning experience, tho. I may take a crack at this in my spare time (I'm not good at C. At all. So this will be more learning for me, and much less functional).

mirekrusin|4 years ago

Tokutek's fractal tree was quite well known when they built a MongoDB backend on it with record-breaking performance. From what I recall, it was patented, and that was the reason people didn't dive into it.

hhmc|4 years ago

Where did you get the impression that this is the product of PhD students?

lvh|4 years ago

Not OP, but the majority of people involved have .edu homepages (stints in industry, still research emphasis) and many of the alums appear to have become alums contemporaneously with the end of their academic career, most of them via Stony Brook, and finally there are a bunch of academic papers with authors clearly acting in their academic capacity (and typically prior to their stints in industry), so, IDK, seems like a reasonable assertion that this has a strong academic emphasis and a lot of the work was done by academic students. Whether it's actually unreliable is a different question, but it seems pretty reasonable to suggest that it's a research project and not a production filesystem.

knl|4 years ago

The sibling comment described it well. In addition, the majority of GitHub commits were made by the people listed in the alumni section, while they were PhD students. There aren’t many commits from people listed as current members, and the last significant commits are from last year.

klyrs|4 years ago

Nobody is asking you to deploy this in your production system. This is about an experimental filesystem which supports exactly one version of the Linux kernel. It's neat to see progress in this field -- maybe try and learn something new?

And, the way to get production-ready code is to write a kernel module, with hopes that others in the kernel community will pick it up. Linux certainly didn't start out mature, but you're probably using it now.

_jal|4 years ago

It is both kind of hilarious and kind of terrifying to see this sort of anti-academic, anti-expert nonsense bleeding into %$&#ing software development.

All your written-in-production, battle-hardened code with no effete book-larnin' algorithms isn't going to run very well without a functional electricity grid.

rackjack|4 years ago

What is going on? The grandparent comment is merely noting the novelty of a filesystem utilizing a recently invented data structure. The parent is weirdly mentioning how they wouldn't trust a research filesystem for real work (who would...?). Now THIS comment is claiming the parent comment is anti-academic and anti-expert when it's actually mainly raising common concerns about the disconnect between theory and practice (then this comment mentions the electricity grid, as if that's of any relevance??). Just a really strange series of disconnects between the arguments.

freedomben|4 years ago

I'm extremely pro-academic, but I think you're taking the least charitable interpretation of the parent. While I fully disagree with the parent on the value proposition here, they are quite correct that (at least most) PhDs aren't concerned with implementation problems like corner cases and long-term maintenance. There are of course exceptions, but having worked on quite a bit of academic code, I can say that anecdatally maintainability is not a high priority. It's very much like a typical PoC in a startup.

toast0|4 years ago

I understand some of the frustration though. I was trying to do some audio processing work once. Found the paper(s), which promised code available from websites that are no longer available. Dug through the internet archive to find the zip files with the matlab code; managed to tweak it to run with the matlab version I have; found it works as described with the sample inputs, but crashes horribly on my inputs.

wittycardio|4 years ago

You realize that most of the core software that we depend on was built by graduate students, right? Idk why the average programmer assumes that PhDs in freaking computer science can't code. Implementation and edge cases are the easy part; the hard part is design and algorithms. One just requires some focused work, the other requires real skill and intelligence.

knl|4 years ago

Sorry, but this is nonsense. Look at the Chubby implementation and the subsequent paper: implementation and edge cases were the hard part, and took a lot of skill to get right. The algorithm is important, but labeling either one as easy is far removed from real-world experience.

I never assumed that PhD students can’t code. They can and they are pretty good at that. My point is that their incentives are in writing papers and running experiments that support claims in their papers, not in producing reliable software. It might be reliable, but mostly it’s not. When we use tools built by PhD students, it’s usually when there are companies/startups built around them, and that is what I refer to as having skin in the game.

mbreese|4 years ago

> PhDs in freaking computer science can't code

I’ve known people with PhDs in computer science (from a top tier school) that couldn’t code. Their research was all done in Matlab for simulations, modeling a biological process. It was a very specific set of skills required. And at the time, this person couldn’t have written a web front end to a database to save their lives.

Just because one is good at the theory behind CS doesn’t mean they understand software engineering. Similarly, because one is good at the theory doesn’t mean they can’t code.

They are two related, but different, skill sets.

sfink|4 years ago

> average programmer assumes that PhDs in freaking computer science can't code.

Average programmer here. PhDs in computer science can't code.

Ok, it's an overgeneralization. And it's probably based on a flawed sample of job applicants that make it past HR screening to get to me. The base rate of applicants who can't code is disturbingly high, probably around 20%. (Not that high numerically, but given that they've passed pre-screening and have something impressive-sounding on their resume, it's too high.) The rate of applicants with a PhD in CS who can't code is way higher, probably around 60%.

Note that these tend to be fresh graduates. And it even makes sense -- most theses require just enough coding to prove a point or test a hypothesis. In fact, the people who like to code tend to get sucked into the coding and have trouble finishing the rest of their thesis work, which may start out interesting but soon gets way less fun than the coding part. Often such people bail out with an MS instead of a PhD.

(Source: personal experience, plus talking to people I've worked with, plus pulling stuff out of my butt.)

At the same time, many of the best coders I know have PhDs.

> Implementation and edge cases are the easy part, the hard part is design and algorithms.

Hahahaha. <Snarky comment suppressed, with difficulty.>

I agree that design and algorithms can be hard. (Though they usually aren't; the vast majority of things being built don't require a whole lot.) But the entire history of our field shows that even a working implementation Just Isn't Good Enough. Especially when what you're writing is exposed in a way that security vulnerabilities matter.

Though it's a bit of a false dichotomy. Handling the edge cases and the interaction with the rest of the system requires design, generally much more so than people give it credit for. Algorithms sometimes too, to avoid spending half your memory or code size on the 1% of edge cases.

gnufied|4 years ago

I have a feeling that software, along with hardware, has become a lot more complicated than it was 30-40 years ago.

Most production software (especially low-level stuff like kernels and filesystems) today is written and maintained by people who do that work as their job. I wish it were any other way. Also, what users expect from production software is very different from the situation 30-40 years ago. An operating system must work across different CPUs and GPUs. A bare-bones OS is basically a non-starter. I mean, look at Haiku OS or any of the other operating system projects: for the most part they have gone nowhere.

A filesystem is also a fairly complicated piece, and what we expect from a filesystem is different. Speed is good, but it is not the only criterion, and I am afraid it does take serious engineering effort (edge cases and all) to get it usable on today's hardware.

WastingMyTime89|4 years ago

I think it’s more that people believe (in my opinion rightfully) that good design is a skill which comes with experience. That’s why I expect great algorithms and small software from graduate students and awesome design from established teams working on large scale problems.

That doesn’t really apply here obviously. The BetrFS team has experienced members.