Ask HN: What are the “best” codebases that you've encountered?

[+] tschwimmer|6 years ago|reply

Perhaps I'm jaded, but I notice that all the examples given here are developer tools or otherwise things with well scoped functional inputs and outputs (e.g. ffmpeg).

Anyone have an example of a consumer application that has a good codebase? Chromium, GitLab, OpenOffice, etc? I feel like such applications inherently have more spaghetti because the human problems they're aiming to solve are less concretely scoped. Even something as simple as "Take the data from this form and send it to the project manager" ends up being insanely complex and nitpicky. In what format should the data be sent? How do we know who the project manager is? Via what format should the data be sent? How should we notify the project manager? When should we send the report? Some of these decisions are inherently inelegant, so I feel like you get inelegant code.

[+] slondr|6 years ago|reply

NetHack's source code is absolutely beautiful. It's probably the most well-kept, well-designed, well-structured and well-implemented K&R C program in history. Although they are switching to ANSI syntax soon. It's even more amazing because NetHack is (1) a video game, where generally speaking code quality is sacrificed for efficiency, and (2) it has been developed by a constantly changing team of volunteers over the course of over thirty years, and there still isn't any spaghetti.

[+] westoncb|6 years ago|reply

It seems like that's how this question is always answered. I'd also be interested to see some good application code.

Also curious about some good, not too hugely sized, game code (preferably something not written in C/C++, maybe like an indie game from the past decade or so). Anyone know something?

[+] aidos|6 years ago|reply

I actually find Chromium to be a lot more approachable than I assumed it would be. Have only really checked out the layout / graphics part of Blink, but it’s laid out pretty intuitively.

But yeah, Postgres also gets my vote. I guess there’s a bit of a bias there because devs are likely to read the code of the tools they use; either to track down bugs or just to understand how it works.

[+] sharkjacobs|6 years ago|reply

The latest version of NetNewsWire is a nice example of a modern MacOS application codebase.

https://github.com/brentsimmons/NetNewsWire

[+] christophilus|6 years ago|reply

It’s just a demo app, but I find the ClojureScript re-frame implementation of the real world app to be the cleanest of the bunch: https://github.com/gothinkster/realworld/blob/master/README....

[+] bamboozled|6 years ago|reply

I think everything from Hashicorp is absolutely top-notch, working on their stuff taught me to be a much better Ruby and Go programmer.

[+] nstart|6 years ago|reply

Discourse is a wonderful code base to peruse. When I was just starting out building more complex web apps I repeatedly referred to it to learn common patterns such as how queues are used to offload tasks.

Just to drive the point home, I was developing in Python and with no knowledge of Ruby, I was able to go through code using just github and I got what I wanted every single time.

[+] wp381640|6 years ago|reply

It shouldn't be a surprise that developers are more likely to read and understand the source code of development tools

[+] laumars|6 years ago|reply

sqlite is the usual example of elegant code in a domain that often breeds inelegant code.

[+] blueprint|6 years ago|reply

Check out https://github.com/mymonero/mymonero-app-ios

[+] lewaldman|6 years ago|reply

Redis...

[+] hugofirth|6 years ago|reply

For a C codebase, Postgres[1] wins for me hands down. It's clean and suuuuuuper well commented, such that with a little context you can dive into something very complex and still get a feel for what is going on.

[1]: https://github.com/postgres/postgres

[+] jakewins|6 years ago|reply

Second this; Postgres codebase is what got me out of the "good code is self documenting" nonsense. For those of us in the database space it is an incredible resource - and overall a great example of good code.

sqlite is much less complex, but similarly approachable.

In more recent examples, I think you see a lot of this same reader-centric pragmatic ethos in many Go projects. The Kubernetes codebase comes to mind as a very large tome that remains approachable. And the Go stdlib, of course.

Java generally falls on the opposite side, but there are counterexamples. A lot of Martin Thompson's code eschews Java "best practices" in favor of good code. Seeing competent people in the Java space "break the rules" helps, though of course Java is forever hampered by having internalized illegible patterns as best practices in the first place.

It's a shame because at least the OpenJDK implementation of the standard library in Java is generally quite good, especially around the concurrency parts. Clean, easy to follow, reasonable comments. But of course that's Java written by C developers, mostly.

[+] unknown|6 years ago|reply

[deleted]

[+] rurban|6 years ago|reply

Back when I maintained postgresql on cygwin I did it because of its clean codebase. But eventually I got stuck when trying to fix a build system bug when creating importlibs. On all other build systems it was easy to fix the bug, but not so with postgresql. I eventually gave up after years, and I think it's still broken.

A special mingw tool to create importlibs is/was broken on 64bit. I think it was called dlltool. Normally you'll just need to add a flag to the linker to create that.

So no, postgresql not.

[+] ekidd|6 years ago|reply

Anything by burntsushi, but especially xsv and ripgrep:

xsv: https://github.com/BurntSushi/xsv

ripgrep: https://github.com/BurntSushi/ripgrep

His code typically has extensive tests, helpful comments, and logical structure. It was fun trying to imitate his style when writing a PR for xsv.

The Quake 2 engine was also pretty interesting: It was almost totally undocumented, and it had plenty of weird things going on. But I could count on the weird things being there for a reason, if only I thought about it long enough.

[+] nindalf|6 years ago|reply

Seconding burntsushi. I learnt the basic Rust idioms by doing the advent of code last year and comparing my solution to his.

[+] mehrdadn|6 years ago|reply

Note: this assumes you speak Rust...

[+] bitwize|6 years ago|reply

NetBSD, hands down. Beautiful, simple to understand, consistent. The documentation is also top notch -- I wrote a trivial character-device kernel driver using only the man pages as a reference. And you can too.

Also -- the source code to Doom. Read it, marvel at its clarity and efficiency -- and then laugh when you realize that the recent console ports were completely rewritten in fucking Unity. And the Switch version chugs, despite the original running well on 486-class hardware.

[+] Wowfunhappy|6 years ago|reply

> and then laugh when you realize that the recent console ports were completely rewritten in fucking Unity. And the Switch version chugs, despite the original running well on 486-class hardware.

I wonder why they didn't just write an emulator, then. Especially on the Switch if there are performance issues.

[+] cameronbrown|6 years ago|reply

Unity sucks but as a hobbyist game developer it is a godsend. The only other way I can support all the platforms I want to (Android, web, PC) is through Web APIs directly, and nobody likes more Electron.

[+] hermitdev|6 years ago|reply

I used NetBSD as a reference when I needed a cross platform strptime that behaved identically everywhere.

I found the source very approachable. Source was well laid out and fairly clear. Some of it was subjectively a bit ugly to just look at, but when you read it, it was very clear.

Couldn't use glibc as a reference because this is in a closed source commercial product and, well, GPL.

[+] ashafer|6 years ago|reply

I completely agree. I've gotten to work with both of these at my summer job and it's been an absolute pleasure.

[+] jihadjihad|6 years ago|reply

For canonical C code, without a doubt I would say Redis and Postgres. Redis is written and annotated in a way that even someone with a cursory knowledge of C can understand what's going on.

For Python, I really like how SQLAlchemy is written and designed.

For Rust, ripgrep stands out as a sterling example of how to write a powerful low-level utility like that.

[+] rsweeney21|6 years ago|reply

The Windows operating system.

Windows is quite an engineering achievement. We didn't prioritize readability or "clean code". All the variables used hungarian notation, so you had horrible names like lpszFileName (lpsz = long pointer to a zero terminated string) or hwndSaveButton (window handle). You also had super long if(SUCCEEDED(hr)) chains that looked like your code was spilling down a staircase. Oh yeah, and pidls (pronounced "piddles" and short for "pointer to an id list") used for file operations.

What made the code base beautiful was the extreme lengths we went to to be fast and keep 3rd party software working. WndProcs seem clunky, but they are elegant in their own way and blazingly fast. All throughout the code base you would find stuff like "If application = Corel Draw, don't actually free the memory for this window handle because Corel uses it after sending a WM_DESTROY message."

The fact that thousands of people worked on the code base was mind boggling.

[+] inlined|6 years ago|reply

I worked on Windows for 3.5 years and hated most of the code I touched:

1. I think I counted 5 string implementations in active use and code at the boundary had to convert between them all.

2. The SUCCEEDED macro is a mask against HRESULT but who the hell actually uses non-zero HRESULTs to communicate domain-specific success codes? And don’t forget that POSIX APIs return 0-for-non-error ints and COM APIs can use S_OK (0 to be a non-error) and S_FALSE (1) so you have to flip them for real bools. Or have if (bResult == S_OK)

3. Nobody wanted to touch old codebases. I fixed an assert in Trident layout code because a whole library used upper-left, lower-right input (and params called ul, lr) but one function (contrary to docs) used upper-left, width, & height. When I fixed the library and 2/3 call sites I was called arrogant and told to revert the changes in the library and change the last 1/3 to also have the inverse bug in its call-site.

4. Another Trident API (written by an intern) had a tree where fastInsert() could only be called after slowLookup() but nothing in the api enforces this

5. Every COM object decides whether it’s faster or thread-safe by whether the refcount uses atomic ops or just --/++

6. Saw parallel arrays in files where a struct held an object which might have suffered the slicing problem in insert. Another struct field held an index into the array of sliced parts. Users rehydrated. This wouldn’t happen with an object pointer, but indirection was unacceptable because the author didn’t trust the small allocation heap’s locality.

7. My codebase included a whole C++ runtime because my core-OS team didn’t trust msvcrt.dll because the shell team wrote it.
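
To make point 2 concrete, here is a minimal sketch (not code from Windows) of why a success-checking macro alone cannot give you a real bool. It assumes the documented convention that any non-negative HRESULT is a success, with 0 (S_OK in the Windows headers) meaning "yes" and 1 (S_FALSE) meaning "succeeded, but the answer is no". The `hresult_t` type, the `HR_` names and the `hr_to_bool` helper are hypothetical stand-ins so the snippet compiles without the Windows SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local stand-ins; on Windows the real definitions live in
   <winerror.h>. The numeric values follow the documented ones. */
typedef int32_t hresult_t;
#define HR_S_OK     ((hresult_t)0)
#define HR_S_FALSE  ((hresult_t)1)
#define HR_E_FAIL   ((hresult_t)0x80004005)  /* negative once narrowed to 32 bits */

/* Same idea as SUCCEEDED(): any non-negative value counts as success,
   so both HR_S_OK and HR_S_FALSE pass this check. */
#define HR_SUCCEEDED(hr) ((hresult_t)(hr) >= 0)

/* Splits the three outcomes apart.
   Returns false if the call failed (and leaves *out untouched);
   otherwise returns true and stores the boolean answer in *out. */
bool hr_to_bool(hresult_t hr, bool *out)
{
    if (!HR_SUCCEEDED(hr)) {
        return false;
    }
    *out = (hr == HR_S_OK);  /* note the flip: 0 means true, 1 means false */
    return true;
}
```

A caller that writes `if (HR_SUCCEEDED(hr))` treats the "no" answer the same as "yes", and a caller that writes `if (hr)` gets the truth value backwards, which is the flipping described above.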

[+] breck|6 years ago|reply

I once tried to add a feature to Windows (back in the sd days), and it was a nightmare. I was in the C&E org so it was a side project thing, and I eventually postponed it (but then left MS so never got it finished). I imagine it's gotten much better since then, in large part from the shift to git alone. It certainly was a super impressive feat for the sheer scale and longevity. I have a lot of respect for the folks who make that beast work. And there are bits of it that are brilliant. But it was an ugly beast.

[+] criddell|6 years ago|reply

The number of cycles that have been wasted checking for Corel Draw must be astounding. I wonder if an environmental cost could be calculated for something like that.

[+] tempguy9999|6 years ago|reply

I'm mega curious about this

> ...like lpszFileName (lpsz = long pointer to a zero terminated string)

I remember those.

AIUI hungarian gives you some kind of typing. The typing is done by humans using the names. The humans have to get it right; they are the typecheckers.

The first thing I'd do is offload the typechecking onto an automatic framework - the idea of letting people do a computer's job is madness. It would not have been too hard to do (relatively very cheap for a large codebase like an OS), I think, and would have allowed the hungarian prefixes to be dropped because they'd become redundant, and strengthened and speeded up typechecking. So where is the flaw in my thinking?

(aside: one of my first contract jobs was working in pascal (delphi actually). The company I worked for had coding standards cos you need standards, don't you. It was to prefix every integer with i_, every float with f_, every int array with ai_, et cetera. As pascal was strongly typed this was totally pointless).
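
A rough sketch of what "let the compiler be the typechecker" could look like in plain C, using the byte-count versus character-count distinction that the Hungarian prefixes cb and cch encode. The `byte_count` and `char_count` wrappers and the function name are hypothetical, made up for this illustration; nothing here is taken from the Windows codebase.

```c
#include <stddef.h>

/* Hypothetical wrapper types. Each is a distinct struct type, so the
   compiler rejects passing one where the other is expected. */
typedef struct { size_t value; } byte_count;  /* what a cb prefix would mark  */
typedef struct { size_t value; } char_count;  /* what a cch prefix would mark */

/* Converts a length in characters to a size in bytes.
   The parameter type documents and enforces the expected unit. */
byte_count bytes_for_chars(char_count n, size_t bytes_per_char)
{
    byte_count result = { n.value * bytes_per_char };
    return result;
}
```

With this in place, calling `bytes_for_chars` with a `byte_count` argument is a compile error instead of a naming mistake someone has to spot in review. The cost is the `.value` noise at every use, plus the fact that a plain `typedef size_t byte_count;` would not help, since C typedefs are only aliases.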

[+] cryptica|6 years ago|reply

The Windows API was very ugly... not to mention unnecessarily complicated. It does not belong in this thread.

[+] sea6ear|6 years ago|reply

I really liked the Go standard library (or at least from around 1.4-ish, it might have gotten more complicated now).

I liked that it was actually possible to read it and understand what was going on.

In a similar vein, P. J. Plauger's version of The Standard C Library is nice because even if it might not be especially optimized(?), you can actually read the code and understand the concepts that the standard library is based on.

Software Tools by Kernighan and Plauger would also be great except that you have to translate from the RatFor dialect of Fortran or Pascal to use the code examples.

Even so, I used its implementation of Ed, to create a partial clone in PowerShell that let me do remote file editing on Windows via Powershell when that was the only access that was available.

So even over 4 decades and various operating systems removed, there are still concepts in there that are useful.

Jonesforth is also a great and mind blowing code base although I'm not sure where the canonical repository is currently.

[+] quadcore|6 years ago|reply

it seems that in real life, a really high-quality codebase is hard to come by

I think a common misconception amongst mid-experienced programmers is that they confuse look with quality. Reading clean written code gives you a feeling of control and also the feeling that someone must have thought about that program. It's reassuring. You have in front of you a code that gives you trust.

When in fact, that code can be complete garbage.

The look of the code doesn't matter, what matters is the program. In the abstract meaning of the term. You don't judge a code by reading it, but by running it in your head. Granted you have to understand it in order to do that. Once you understand the code, you run it in your head and that's when quality enters the scene because running it in your head is what you do all day when you code. Some say that you spend most of your time reading code. That's simply not true, the effort is definitely not in reading but in running the code in your head. Basically what I'm describing is a 2 by 2 matrix where there is one column for looks bad, one for looks good, one row for runs badly in the head and one for runs smoothly in the head. Granted, the best may be when both the code looks right and runs right, but don't be mistaken, the real important and difficult part is whether or not it runs well in the head.

A poor quality program may look good, but doesn't run well in the head. It's too complex or too confusing (in terms of logic, not in terms of presentation) or convoluted or simply wrong in terms of what it's supposed to do. On the other hand good quality code is code that surprises you by the way it runs. It's beautiful in terms of simplicity, it delivers a lot, it's small so that it fits well in the coder's head. And it may look like garbage which is not so important.

You may wonder how to know very quickly the quality of a code base. Run part of it in your head. Contemplate the machinery. Try not to think too much about the language and how it's constructed in this language, try instead to contemplate it in an abstract manner. Be critical, and critique your own critiques.

[+] omarhaneef|6 years ago|reply

I wish there was a way to read the codebase where there is a tag that tells you what the folder does.

In github, rather than see what has changed, it would be interesting if there was a comment that told you what the folder contained.

edit: Relevant here because the best codebase for me is one where I can understand the folder structure, but that is a sort of 0th order effect that should be equalized with some tool.

[+] cryptica|6 years ago|reply

Unfortunately, I've found that almost all developers are incapable of objectively judging the quality of code until they actually have to start working with it and then after a few months they can start to appreciate or despise the code.

It takes a lot of investment from a developer before they can appreciate the beauty of the code... To make matters more confusing, a lot of developers tend to become extremely attached to even horrible code if they spend enough time working with it; it must be some kind of Stockholm syndrome.

I think the problem is partly caused by a lack of diversity in experience; if a developer hasn't worked on enough different kinds of companies and projects, their understanding of coding is limited to a very narrow spectrum. They cannot judge if code is good or bad because they don't have clear values or philosophy to draw from to make such judgements. If you can't even separate what is important from what is not important, then you are not qualified to judge code quality.

If you think that the quality of a project is determined mostly by the use of static vs dynamic types, the kind of programming paradigm (e.g. FP vs OOP), the amount of unit test coverage and code linting, then you are not qualified to judge code quality.

I think that the best metric for code/project quality is simply how much time and effort it takes for a newcomer to be able to start making quality contributions to the project. This metric also tends to correlate with robustness/reliability of the code and also test quality (e.g. the tests make sense and they help newcomers to quickly adapt to the project).

As developers, we are familiar with very few projects. If a developer says that they like React or VueJS or Angular, etc... they usually have such a limited view of the whole ecosystem that their opinion is essentially worthless; and that's why no one ever seems to agree about anything. We are all constantly dumbing down everything to the lowest common denominator and regurgitating hype. Hype defies all reason.

It's the same with developers; most developers (especially junior and mid-level) are incapable of telling who is actually a good developer until they've worked with them for about 6 months to a year.

If you are not a good developer, you will not be able to accurately judge/rank someone who is better than you at coding until several months or years of working with them. Sometimes it can take several years after you've left the company to fully realize just how good they were.

[+] oaxacaoaxaca|6 years ago|reply

Django! And django rest framework. To me, both codebases are so readable and so well put together that even if their documentation was bad (which it isn't), you could fully grasp their APIs and how to use their libraries by just reading through some of the code.

[+] bijection|6 years ago|reply

The Codemirror codebase [0] is simply written and richly commented, and using Codemirror itself in a project is a pleasure.

Tellingly, Marijn Haverbeke, Codemirror's creator, is also the author of the excellent 'Eloquent Javascript' [1].

[0] https://github.com/codemirror/codemirror

[1] http://eloquentjavascript.net/

[+] luminati|6 years ago|reply

269 comments