Jonhoo's comments

Jonhoo | 2 years ago | on: Readyset: A MySQL and Postgres wire-compatible caching layer

Just poking my head in to say that I technically never departed ReadySet — what happened was that I co-founded the company, but was so burnt-out when it came to databases after my PhD that I decided to leave the running of the company to others. Then, US visa regulations made it so that I couldn't really be involved _at all_ if I wasn't an actual employee, which meant I truly was "just a founder" with no real involvement in the company's execution if you will. Now that I'm back in Europe, that's changing a bit, and I have regular calls with the CEO and such!

Jonhoo | 2 years ago | on: Build your own BitTorrent

Hope you enjoy it! I have captions "enabled" (and primary language set) on all my videos, but my experience has been that YouTube is very hit-or-miss with whether it adds auto-captions to longer videos (somewhere around 2h seems to be the limit). Sometimes they appear later and it just takes a while; other times they just never show up. It's unfortunate, but as far as I can tell there's nothing I can do about it :'(

Jonhoo | 3 years ago | on: How PlanetScale Boost serves SQL queries faster

:wave: Author of the paper this work is based on here.

I'm so excited to see dynamic, partially-stateful data-flow for incremental materialized view maintenance becoming more wide-spread! I continue to think it's a _great_ idea, and the speed-ups (and complexity reduction) it can yield are pretty immense, so seeing more folks building on the idea makes me very happy.

The PlanetScale blog post references my original "Noria" OSDI paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but I'd actually recommend my PhD thesis instead (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it goes much deeper into some of the technical challenges and solutions involved. It also has a chapter (Appendix A) that covers how it all works by analogy, which the less-technical among the audience may appreciate :) A recording of my thesis defense on this, which may be more digestible than the thesis itself, is also online at https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter talk from a few years earlier at https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria research prototype (written in Rust) is on GitHub: https://github.com/mit-pdos/noria.

As others have already mentioned in the comments, I co-founded ReadySet (https://readyset.io/) shortly after graduating specifically to build off of Noria, and they're doing amazing work to provide these kinds of speed-ups for general-purpose relational databases. If you're using one of those, it's worth giving ReadySet a look to get these kinds of speedups there! It's also source-available @ https://github.com/readysettech/readyset if you're curious.

Jonhoo | 3 years ago | on: Introducing ReadySet

I'm pretty excited about it too! I remember when I initially started the research I was amazed that this didn't already exist.

Some context: https://twitter.com/jonhoo/status/1511401461669720068

Basically, I co-founded the company around the time I graduated, but had had my fill of database research after six years of PhD. So I joined AWS to work on Rust while Alana (the CEO) took on leading ReadySet.

Jonhoo | 6 years ago | on: Dashmap: Fast concurrent HashMap for Rust

Nope, Dashmap is all xacrimon, and came on the scene long before my port. We've been collaborating on writing a shared benchmarking suite over at https://github.com/jonhoo/bustle/ though. For the time being, it looks like Dashmap outperforms the port of ConcurrentHashMap (called "flurry"), often by a significant amount. It seems to be mainly due to the garbage collection scheme flurry uses, but we're still digging into it (maybe you want to come help?).

In any case, I'm glad you enjoy the videos!

Jonhoo | 6 years ago | on: Dashmap: Fast concurrent HashMap for Rust

I can't speak to the implementation differences between the two, but I know the author of dashmap is relatively active in responding online, so they may show up shortly to explain. In terms of performance comparisons, we're actually working on building a shared benchmarking tool for all of Rust's concurrent maps that you may find interesting: https://github.com/jonhoo/bustle.

Jonhoo | 6 years ago | on: The Missing Semester of Your CS Education

Over the years, we (@anishathalye, @jjgo, @jonhoo) have helped teach several classes at MIT, and over and over we have seen that many students have limited knowledge of the tools available to them. Computers were built to automate manual tasks, yet students often perform repetitive tasks by hand or fail to take full advantage of powerful tools such as version control and text editors. Common examples include holding the down arrow key for 30 seconds to scroll to the bottom of a large file in Vim, or using the nuclear approach to fix a Git repository (https://xkcd.com/1597/).

At least at MIT, these topics are not taught as part of the university curriculum: students are never shown how to use these tools, or at least not how to use them efficiently, and thus waste time and effort on tasks that should be simple. The standard CS curriculum is missing critical topics about the computing ecosystem that could make students’ lives significantly easier.

To help mitigate this, we ran a short lecture series during MIT’s Independent Activities Period (IAP) that covered all the topics we consider crucial to be an effective computer scientist and programmer. We’ve published lecture notes and videos in the hopes that people outside MIT find these resources useful.

To offer a bit of historical perspective on the class: we taught this class for the first time last year, when we called it “Hacker Tools” (there was some great discussion about last year’s class here: https://news.ycombinator.com/item?id=19078281). We found the feedback from here and elsewhere incredibly helpful. Taking that into account, we changed the lecture topics a bit, spent more lecture time on some of the core topics, wrote better exercises, and recorded high-quality lecture videos using a fancy lecture capture system (and this hacky DSL for editing multi-track lecture videos, which we thought some of you would find amusing: https://github.com/missing-semester/videos).

We’d love to hear any insights or feedback you may have, so that we can run an even better class next year!

-- Anish, Jose, and Jon

Jonhoo | 7 years ago | on: MIT Hacker Tools: a lecture series on programmer tools

So, the terms "hacker" and "hacking" have traditionally referred to someone who explores the limits of systems (whether computer-based or otherwise) in a playful manner, often finding "clever" ways to make the system behave in unintended ways. The term is arguably even broader than that; see the Wikipedia entry on Hacker Culture[1] for more. The use of "hacker" to mean someone who exploits computer systems is more recent; that activity was traditionally referred to as "cracking". The two terms do have a decent amount of overlap, but the latter is much more commonly associated with malicious intent, whereas hacking really just means a desire to push systems to their limits and see what kind of cleverness you can pull off!

[1]: https://en.wikipedia.org/wiki/Hacker_culture

Jonhoo | 7 years ago | on: MIT Hacker Tools: a lecture series on programmer tools

Hi all! We (@anishathalye, @jjgo, and @jonhoo) have long felt that while university CS classes are great at teaching specific topics, they often leave it to students to figure out a lot of the common knowledge about how to actually use your computer. And in particular, how to use it efficiently.

There’s just no class in the undergrad curriculum that teaches you how to become familiar with the system you’re working with! Students are expected to know about, or figure out, the shell, editors, remote access and file management, version control, debugging and profiling utilities, and all sorts of other useful tools on their own. Oftentimes, they won’t even know that many of these tools exist, and will instead do things in roundabout ways or simply be left frustrated with their development environment.

To help mitigate this, we decided to run this short lecture series at MIT during the January Independent Activities Period that we called “Hacker Tools” (in reference to “hacker culture”, not hacking computers). Our hope was that through this class, and the resulting lecture materials and videos, we might be able to bootstrap students’ knowledge about the tools that are available to them, which they can then put to use throughout their time at university, and beyond.

We’ve shared both the lecture notes and the recordings of the lectures in the hopes that people outside of MIT may also find these resources useful in making better use of their tools. If that turns out to be true, we’re also thinking of re-doing the videos in screen-cast style with live chat and a proper microphone when we get the time. If that sounds interesting to you, and if you have ideas about other things you’d like to see us cover, please leave a comment below; we’d love to hear from you!

We’re sure there are also plenty of cool tools that we didn’t get to cover in this series that you all know and love. Please share them below along with a short description so we can all learn something new!

Anish, Jose, and Jon

Jonhoo | 7 years ago | on: Rust at speed – building a fast concurrent database [video]

@NovaX it's worth noting that swapping out the underlying hash map that evmap uses is really easy, which is one of the advantages of the design as far as I'm concerned! For example, here's the diff for moving to someone else's custom hash map implementation: https://github.com/jonhoo/rust-evmap/compare/hashbrown. A better benchmarking harness is a good idea, though I'd like to see something that is somewhat disconnected from evmap! Pre-generating the randomness is something I've done in the past, but it does have the downside that you end up not actually exercising the distribution well (unless you generate vast amounts of keys)...

Jonhoo | 7 years ago | on: Rust at speed – building a fast concurrent database [video]

I did a quick test now, and with 16 cores + FNV hashing + disabling hyperthreads, I got ~41M ops/s total on those cores. That got me digging a little further, and I decided to use the same benchmark harness to benchmark just a std::collections::HashMap with FNV hashing. Running a single-threaded write-then-read benchmark yields a throughput of 5M reads/s, which is about 2x that of evmap (which seems like a reasonable overhead).
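For anyone who wants to try a similar baseline themselves, here is a minimal sketch of a single-threaded write-then-read run against std::collections::HashMap with FNV hashing. It is not the actual benchmark harness: the `Fnv1a` hasher, the `write_then_read` function, and the sequential key pattern are all assumptions made to keep the example dependency-free, and the numbers it prints will depend heavily on the machine and build flags.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};
use std::time::Instant;

/// Minimal 64-bit FNV-1a hasher, written out by hand so the sketch needs no crates.
/// (Hypothetical stand-in for whatever FNV implementation the real harness uses.)
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

type FnvMap = HashMap<u64, u64, BuildHasherDefault<Fnv1a>>;

/// Insert `n` sequential keys, then time reading each one back once.
/// Returns (number of successful lookups, reads per second).
fn write_then_read(n: u64) -> (u64, f64) {
    let mut map = FnvMap::default();
    for k in 0..n {
        map.insert(k, k);
    }

    let start = Instant::now();
    let mut hits = 0u64;
    for k in 0..n {
        if map.get(&k).is_some() {
            hits += 1;
        }
    }
    let secs = start.elapsed().as_secs_f64();
    (hits, n as f64 / secs)
}

fn main() {
    let (hits, rate) = write_then_read(1_000_000);
    println!("{hits} hits, {:.1}M reads/s", rate / 1e6);
}
```

Note that sequential keys are much friendlier to caches and branch predictors than a random access pattern, so this will likely flatter the map compared to a uniform or Zipf-distributed workload.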

Digging a little further, I realized that the benchmarker spends a bunch of time on generating random numbers. In particular, generating a Zipf-distributed number (which the benchmarker does even when you run uniform) takes about 100ns (https://github.com/jonhoo/rust-zipf), which sets an upper limit of 10M ops/s that the benchmarker can measure. With Zipf removed entirely, I can get the std HashMap up to 8M reads/s, but no higher, which makes me think that the map really is the bottleneck at that point (generating a uniformly random number takes ~5ns, so shouldn't be the bottleneck). Running evmap with uniform and the Zipf-generation removed gives 73M reads/s, so ~4.5M reads/s/thread, which is again about 1/2 of the standard library HashMap with no synchronization.
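The arithmetic behind those limits can be spelled out in a few lines. This is just a back-of-the-envelope sketch: the function names are made up for illustration, and the thread count of 16 is an assumption carried over from the 16-core run mentioned earlier.

```rust
/// Upper bound on measurable ops/s when every operation pays
/// `overhead_ns` nanoseconds inside the benchmarker itself.
fn max_ops_per_sec(overhead_ns: f64) -> f64 {
    1e9 / overhead_ns
}

/// Average per-thread throughput given a total and a thread count.
fn per_thread(total_ops_per_sec: f64, threads: u32) -> f64 {
    total_ops_per_sec / f64::from(threads)
}

fn main() {
    // ~100ns per Zipf sample caps the harness at 10M ops/s.
    println!("Zipf ceiling:    {:.1}M ops/s", max_ops_per_sec(100.0) / 1e6);
    // ~5ns per uniform sample caps it at 200M ops/s, far above the 8M observed.
    println!("uniform ceiling: {:.1}M ops/s", max_ops_per_sec(5.0) / 1e6);
    // 73M reads/s spread over an assumed 16 threads.
    println!("evmap per thread: {:.2}M reads/s", per_thread(73e6, 16) / 1e6);
}
```

The point is that a 100ns generator sits right around the throughput being measured, whereas a 5ns one leaves plenty of headroom, so once Zipf is gone the remaining cost is plausibly the map itself.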

So, all that said, I do not believe this is an error in evmap. Rather, if you believe 8M reads/s per core on a HashMap is slow, then it's the Rust HashMap implementation in the standard library that is slow. evmap's 2x overhead doesn't seem that bad to me.

I'm glad I dug through this though!
