
Ask HN: Recommended resources to learn the Linux kernel and OS theory?

408 points| non-entity | 6 years ago

So recently I did a couple of minor patches on the FreeBSD and NetBSD kernels and played with the Linux kernel a bit. It was the first time in a few years that I've been excited about programming.

Unfortunately I'm still completely lost. It seems that there's so much to learn and every kernel update breaks whatever you just finished writing. I see terms I'm unfamiliar with and often find myself googling basic OS concepts.

Are there any recommended reading materials to get a better grasp on OS theory and / or the Linux / UNIX kernels and programming for them?

83 comments

[+] waddlesplash|6 years ago|reply
The OSDev wiki can probably teach you anything you need to know about "OS theory" (and practice, mostly on x86): https://wiki.osdev.org/Expanded_Main_Page

Depending on your style of programming, I'd recommend maybe not working on Linux if you are trying to learn kernel development and OS internals.

Have you looked at Haiku (https://www.haiku-os.org/)? We have a very well organized and commented modular-monolithic kernel, and a pretty active development team with a wide range of experience levels. I'm more than happy to help you (or anyone else) learn OS development!

Some example Haiku kernel code:

* "load_image_internal()", which is responsible for creating processes: https://github.com/haiku/haiku/blob/master/src/system/kernel...

* a recent change by me to replace a global lock with two local ones: https://github.com/haiku/haiku/commit/37eda488be1c9fee242e8e...

[+] xz0r|6 years ago|reply
> I'd recommend maybe not working on Linux if you are trying to learn kernel development and OS internals.

Could you also say why?

[+] non-entity|6 years ago|reply
I have heard of Haiku, but haven't looked at it in a while. However, I have heard great things about it and would certainly be interested in learning and contributing.
[+] no-dr-onboard|6 years ago|reply
> modular-monolithic kernel

This sounds like a bit of an oxymoron. What is a modular-monolithic kernel?

[+] okl|6 years ago|reply
Function with 9 arguments, 200 lines. Mix of abstraction levels. Basically untestable. Could use RAII.
[+] hatsubai|6 years ago|reply
One important part of learning how the Linux kernel works is understanding the details of the system interface the kernel provides. In my opinion, there is no better book out there than "The Linux Programming Interface" by Michael Kerrisk: http://man7.org/tlpi/

It provides extremely detailed information about everything going on in Linux, as well as example programs and exercises to help you further your knowledge. While it doesn't get deep into kernel theory like Tanenbaum's books tend to do, it will provide you with a greater understanding of how things work, IMO.

[+] whowhatwhy|6 years ago|reply
This book is great, so is Advanced Programming in the Unix Environment, which covers much of the same material but details the differences in standards and implementation. I'd also add `Linux Kernel Development` by Robert Love.
[+] sruffell|6 years ago|reply
I was going to suggest this book, but I see I'm late to the party, so I'll just have to add my voice to the chorus.

"Linux Programming Interface" is one of the best technical books I've ever read.

[+] aidos|6 years ago|reply
£50 on Kindle! Looks like a great book, but that’s punchy. (Maybe that’s standard for a textbook like this, not something I would normally buy)
[+] tbrock|6 years ago|reply
This book is great. Seconded.
[+] koolba|6 years ago|reply
That’s a fantastic book.
[+] filereaper|6 years ago|reply
Many folks here are recommending reading up on production grade operating systems which might be a very steep learning curve.

I'd recommend starting with an academic operating system to nail the fundamentals down. Once you have a solid foundation you can then use all the excellent links provided by everyone else and ramp up on production grade kernels.

I have linked to several course syllabi below from excellent institutions; pick whichever feels most comfortable.

Harvard with MIPS based OS/161 (I learned this personally): http://www.eecs.harvard.edu/~cs161/syllabus.html

Berkeley CS 162 (John Kubiatowicz is amazing and is behind RISC-V) https://inst.eecs.berkeley.edu/~cs162/sp19/

MIT OCW: https://ocw.mit.edu/courses/electrical-engineering-and-compu...

[+] pm215|6 years ago|reply
My somewhat off-beat recommendation is "Lions' Commentary on UNIX 6th Edition with Source Code". The sources to a 1976 version of Unix written in an archaic dialect of C and targeting a long-dead CPU architecture are obviously not of immediate relevance to the modern world, and may well not be to your taste unless you already have some sympathy with the idea of retrocomputing. But if you do: 6th Edition is small enough that you really can read through the whole set of sources and understand pretty much how the whole thing is put together; the basic bones of the design are not so far away from Linux, so it gives you a workable conceptual model of what the shape of the Linux sources is and what the various important parts are; the commentary is really good, explaining the inscrutable but nudging you to figure things out yourself too; and overall it is practice in reading a large volume of somebody else's code, which is something you're going to be doing all the time if you work on Linux or any modern kernel. You probably also want a more theoretical and general book too, of course.
[+] jiveturkey|6 years ago|reply
And you can witness the glory of swtch() and the original "You are not expected to understand this" comment.

Highly recommended.

[+] alfiedotwtf|6 years ago|reply
I've commented below... there's an old book called "Linux Core Kernel Commentary" which is based on the Lions' book. It's a great read.

There's a Xinu book which is similar, but without the accompanying source code.

[+] geofft|6 years ago|reply
Yes, there is just so much, and it is not thoroughly documented. "BPF maps" in particular are a) a special feature of a special feature that's Linux-specific, not anything that generalizes across OSes, and b) a feature for userspace anyway, not for kernel internals. But even longstanding kernel-internal features aren't well documented (e.g., the other day I was trying to figure out what struct file's f_version does, and I think there's genuinely no docs for it.) So honestly I think the answer there is to not feel bad about not knowing everything.

The basic trick of dealing with the kernel is becoming comfortable working in a large codebase most of which you don't understand, and figuring out how to find what you need. Honestly, git grep is one of the best tools here. Get some practice finding some specific thing and where it's implemented, e.g., find a syscall (git grep SYSCALL.*foo) and trace what it calls. Find the definition of a structure inside include/ and see who uses it. Get comfortable with the kernel's OO-ish system of operations structs, and get some practice tracing both "this function makes a generic call, here's a sample driver that implements it" and "this is an implementation of a generic function, here's the syscall that calls it."
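
To make the "operations struct" idea concrete, here is a minimal toy model. It is written in Python only to keep it short; the kernel does this in C with structs of function pointers (for example `struct file_operations`, reached through a file's `f_op` field). Every name below (`FileOperations`, `File`, `NULL_FOPS`, `RAM_FOPS`, `generic_read`, `generic_write`) is hypothetical and chosen for illustration, not kernel API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileOperations:
    """Toy stand-in for a kernel ops struct: a table of optional callbacks."""
    read: Optional[Callable[["File", int], bytes]] = None
    write: Optional[Callable[["File", bytes], int]] = None


@dataclass
class File:
    """Toy stand-in for an open file: it carries a reference to its ops table."""
    f_op: FileOperations
    data: bytes = b""


# "Driver" side: concrete implementations of the generic operations.
def null_read(file: File, count: int) -> bytes:
    return b""  # reading this toy null device always reports end-of-file


def null_write(file: File, buf: bytes) -> int:
    return len(buf)  # claim everything was written, then discard it


def ram_read(file: File, count: int) -> bytes:
    return file.data[:count]


NULL_FOPS = FileOperations(read=null_read, write=null_write)
RAM_FOPS = FileOperations(read=ram_read)  # no write callback: read-only


# Generic side: callers never name a specific driver.
def generic_read(file: File, count: int) -> bytes:
    if file.f_op.read is None:
        raise OSError("operation not supported")
    return file.f_op.read(file, count)


def generic_write(file: File, buf: bytes) -> int:
    if file.f_op.write is None:
        raise OSError("operation not supported")
    return file.f_op.write(file, buf)


if __name__ == "__main__":
    ram = File(f_op=RAM_FOPS, data=b"hello kernel")
    print(generic_read(ram, 5))
    print(generic_write(File(f_op=NULL_FOPS), b"discarded"))
```

Tracing in the first direction means starting at `generic_read` and asking which callbacks can sit behind `f_op.read`; tracing in the other direction means starting at `ram_read` and searching for where its ops table is installed.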

Beyond that, reading https://lwn.net 's articles is invaluable, partly for the clear prose coverage and partly for the breadth of what they talk about. (You don't need to pay unless you care strongly about this week's updates - you'll learn plenty from reading articles a week behind - but support them if you can, they're an important resource.) Again, you're not going to follow exactly why e.g. Google wants a new syscall for "restartable sequences" on your first read, but you'll get a sense of what is involved in adding a syscall, how various concurrency models work, what other kernel features are relevant, etc.

What sort of OS concepts are you finding yourself Googling? I will say that actually doing a college OS class is what made things like virtual memory management click in my head. It's an intensive approach but writing code in a much smaller kernel than Linux is a valuable way to understand concepts without being drowned in real-world optimizations and edge cases and portability.

[+] livueta|6 years ago|reply
> The basic trick of dealing with the kernel is becoming comfortable working in a large codebase most of which you don't understand, and figuring out how to find what you need. Honestly, git grep is one of the best tools here.

That's a very good point. If we're recommending tools, cscope (especially if you're a vim/emacs user) or opengrok (if you're not or prefer a web frontend for other reasons) are super helpful for navigating and comprehending large, mysterious codebases.

DTrace/eBPF are also valuable tools for grokking kernel functionality.

[+] non-entity|6 years ago|reply
Perhaps I should've said more. I do often end up googling basic OS stuff, especially related to memory (software interrupts still confuse me though; I understand the concepts, but never the implementation), but even more of a problem has been my ignorance of hardware-level concepts. For example, while trying to patch the code for a network card the other week, I learned what a watchdog timer is; I'd never heard of one before.
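
For anyone else meeting the term for the first time: a watchdog timer is a countdown that healthy code keeps resetting, and if it ever reaches zero, something assumes the device or software is stuck and triggers recovery. A minimal sketch of that idea, assuming an explicit tick counter instead of real hardware or any particular driver's API (the `Watchdog` class and its methods are hypothetical):

```python
class Watchdog:
    """Toy model of a watchdog timer driven by an explicit tick counter."""

    def __init__(self, timeout_ticks: int) -> None:
        if timeout_ticks <= 0:
            raise ValueError("timeout_ticks must be positive")
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.fired = False

    def kick(self) -> None:
        """Called by healthy code to show it is still making progress."""
        if not self.fired:
            self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance time by one tick; return True once the watchdog has fired."""
        if not self.fired:
            self.remaining -= 1
            if self.remaining <= 0:
                self.fired = True  # real hardware would reset or recover here
        return self.fired


if __name__ == "__main__":
    wd = Watchdog(timeout_ticks=3)
    for step in range(10):
        if step < 5:
            wd.kick()  # the simulated driver is healthy for the first five steps
        if wd.tick():
            print(f"watchdog fired at step {step}")
            break
```

In this model the timer only fires after three consecutive ticks with no `kick()`, which is the whole point: a stalled code path stops kicking, and the timeout is what notices.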
[+] whydoyoucare|6 years ago|reply
"Operating Systems: Three Easy Pieces" is a more modern book on OS concepts, explained with real source code as opposed to algorithms, and is regularly updated.

http://pages.cs.wisc.edu/~remzi/OSTEP/

(Online version above, you can also order a print copy).

[+] 0x262d|6 years ago|reply
I took this class (although not from him). It's a good textbook and was pretty much sufficient.
[+] snoocat|6 years ago|reply
This is what I learned from, along with some xv6 labs. Just wanted to say that both Remzi and Andrea were exceptional professors and their book is clear and concise.
[+] lorenzfx|6 years ago|reply
If you are still interested in FreeBSD, "The Design and Implementation of the FreeBSD Operating System" is a fantastic book. It's a great guide to how a UNIX system, and especially FreeBSD, works, and it is written by some of FreeBSD's most respected developers.
[+] mjb|6 years ago|reply
This is a great book, and while FreeBSD and Linux differ in a lot of ways, there are enough similarities to make it worth reading even if you're only interested in Linux.
[+] non-entity|6 years ago|reply
I actually did get a copy of that! Unfortunately, I only have the ebook at the moment, which I find difficult to read
[+] qntty|6 years ago|reply
If you want to understand Linux systems programming, I've never found anything better than the book The Linux Programming Interface. For Linux internals, Linux Kernel Development by Robert Love is good. For something more removed from current systems, The Design of the UNIX Operating System and Operating Systems: Design and Implementation are good.
[+] weinzierl|6 years ago|reply
Not very Linux-specific, but Tanenbaum's "Modern Operating Systems" is an excellent source for OS theory and very readable as well. It's an expensive book, but I found it worth the money.
[+] Noe2097|6 years ago|reply
Seconded. I would add "Structured Computer Organization", as a preamble book. I read both as if they were novels - they really are interesting and written in such a way that can hook you just as a good story
[+] jascii|6 years ago|reply
I agree! I took Andy's class that he wrote the book for and it was one of the most influential classes I have ever taken. It is also one of the few books I have kept over the years and throughout all my travels.
[+] bytematic|6 years ago|reply
OSTEP is easily the best book on operating systems. http://pages.cs.wisc.edu/~remzi/OSTEP/
[+] apacheCamel|6 years ago|reply
Not sure if it is the best book overall, but I used it for one of my college classes and I think it is a great book for learning much more about operating systems. And it is available for free online so my college student wallet loved it!
[+] yodsanklai|6 years ago|reply
Not Linux, but I'd recommend this lab from MIT.

https://pdos.csail.mit.edu/6.828/2014/overview.html

I completed this one, but they may have more recent versions. They give you some initial code that you need to complete, as well as tests to check your work before going to the next step. It covers a lot of material. It takes some patience and it's not for complete beginners but it's a great fun project. Not sure, but I'd say it took me 1 or 2 weeks full time.

[+] peterkelly|6 years ago|reply
I recommend starting with something smaller.

Here's something I put together some years ago - it's about 5,000 lines of code and supports a number of key Unix features. There's a PDF of lecture notes accompanying it.

http://adelaideos.sourceforge.net/

Once you've gone through this you'll be in a better position to tackle the Linux kernel and books on OS architecture. My personal favorite is "The Design of the UNIX Operating System" by Maurice J. Bach. It's an old book but explains things well, and I believe it was one of the references Linus Torvalds relied on when creating Linux.

[+] gamescodedogs|6 years ago|reply
Years ago in school, I decided to figure out how the Linux kernel works. So I downloaded the sources and opened them in Notepad++. After an hour I realized that I had no idea what I was reading, closed Notepad++, and never opened it again. But the sources are still waiting in some old directory on my home PC :)
[+] yourbandsucks|6 years ago|reply
The O'Reilly book Understanding The Linux Kernel is fantastic. Starts you off in the deep end with memory management/addressing and fans out from there. Notably excludes networking, as the book is long enough without it.
[+] DanAtC|6 years ago|reply
Any suggestions for Linux networking? I have Understanding Linux Network Internals but it’s a bit dated.
[+] kernyan|6 years ago|reply
https://github.com/s-matyukevich/raspberry-pi-os

Matyukevich's page on GitHub has lessons with accompanying code for developing an OS for the Raspberry Pi, using the Linux kernel as a reference. The author goes into kernel and processor initialization, interrupt handling, the scheduler, implementing syscalls, and virtual memory.

I like his approach for these reasons: 1. minimal workable code, 2. it points you to the entry functions in the Linux repo, 3. line-by-line commentary on those entry functions.

Note that it's also a somewhat short read as he focuses on the practical implementations instead of the theory.

I was hoping that the author would continue developing the chapters on file systems, drivers, and networking, but the project seems to have been on hiatus.

[+] throwawaypls|6 years ago|reply
Linux Kernel Development by Robert Love and Operating System Concepts by Silberschatz are great resources.
[+] alfiedotwtf|6 years ago|reply
For the Linux kernel, there is an excellent book called "Linux Core Kernel Commentary" (which is modeled on the Lions' Commentary book on UNIX 6th Edition), which has the source of the kernel (an old version) printed out and then a commentary on all the bits (code and data structures). It was an eye opener for me.

As for OS theory, "An Operating Systems Vade Mecum" and "Operating Systems: Design and Implementation" are my favorites.