Completely agree. But the tools don't make this very easy.
Back in college I was working on patches to OpenSSL, Chrome, Firefox, Apache, etc., to add support for TLS-SRP, and it was a huge pain to jump into these massive codebases and try to understand them. I was using Emacs and had all of the various language support modes configured, but go-to-definition and cross-references barely worked. Searching was slow, and if I wanted to discuss a piece of code with my CS lab partners, I couldn't just share a link.
A friend felt the same pain but then went to work at Google for a bit. At Google, they have some pretty amazing code reading/searching tools (see https://static.googleusercontent.com/media/research.google.c...), and these tools helped Google build a culture of thoroughly reading and reviewing code. The causality is bidirectional, but having good tools certainly played a role in Google's success.
That friend and I ended up building a product, Sourcegraph, initially for ourselves to make code reading easier. We've now built a successful business out of it with the help of an amazing team. Here it is pulling in the OpenBSD sources: https://sourcegraph.com/github.com/openbsd/src/-/blob/lib/li.... Sourcegraph has advanced features for several languages; see https://sourcegraph.com/github.com/mholt/caddy/-/blob/caddyh..., for example. If you love to read code (or want to), we hope you'll love our product. Email me if you have any feedback/requests.
Just some honest feedback on your pricing that's hopefully helpful: Your Enterprise plan is at least an order of magnitude more expensive than what many organizations would pay for something like this.
E.g. a Jira license for 2000 people costs $24,000 yearly[1]; licensing this for the same number of users would cost $1,200,000.
This is way more than organizations of that size tend to pay for top-tier support contracts for software that's critical to business continuity.
Pricing per-user without any advertised discounts is also a trap if you're selling to large organizations. For simplicity's sake, a lot of them want to just give everyone in the org access to a tool like this, even though only 5-10% of the workforce might be using it. With per-user pricing, that means there's no way it's going to be bought in the first place.
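To make that arithmetic concrete, here is a rough sketch that uses only the figures quoted in this comment ($24,000 and $1,200,000 per year for 2000 users, with 5-10% of them active). The function names are invented for illustration, and the prices are the commenter's numbers, not verified price lists.

```python
def per_user_cost(total_per_year: float, users: int) -> float:
    """Yearly cost per licensed seat."""
    return total_per_year / users


def cost_per_active_user(total_per_year: float, users: int, active_percent: int) -> float:
    """Yearly cost per person who actually uses the tool.

    active_percent is a whole-number percentage (e.g. 5 for 5%).
    """
    active_users = users * active_percent // 100
    return total_per_year / active_users


USERS = 2000
JIRA_TOTAL = 24_000          # figure quoted in the comment
QUOTED_TOTAL = 1_200_000     # figure quoted in the comment

if __name__ == "__main__":
    print("Jira per seat:", per_user_cost(JIRA_TOTAL, USERS))
    print("Quoted per seat:", per_user_cost(QUOTED_TOTAL, USERS))
    print("Ratio:", QUOTED_TOTAL / JIRA_TOTAL)
    for pct in (5, 10):
        print(f"Per active user at {pct}%:",
              cost_per_active_user(QUOTED_TOTAL, USERS, pct))
```

On these numbers the per-seat price is 50 times higher, and if only 5-10% of seats are used, the effective yearly price per active user lands between $6,000 and $12,000.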
> I was using Emacs and had all of the various language support modes configured, but go-to-definition and cross-references barely worked. Searching was slow, and if I wanted to discuss a piece of code with my CS lab partners, I couldn't just share a link.
When you say something vague yet absolute like you "had all of the various language support modes configured," that is a big indication that you did not have them configured. There are about four major modes for C/C++. Searching and cross-referencing are done with external tools. The only time I ever thought searching was slow was when using the grep that came with Mac OS X. There is absolutely no way that online tools can beat ag or rg for code searching, especially if you have an SSD. Exuberant ctags and GNU Global work for cross-referencing and support dozens of languages. And you have Magit and VC mode right there to track down source code history.
A lot of us learned this way from John Lions' Unix Commentary. There have been similar books for Linux and Apache. (They didn't have a concordance for the Linux Core Kernel Commentary, so I wrote one and sent it to the author, who mentioned it in the second printing.) But I think for OpenBSD something like Sourcegraph is more than enough.
I would recommend the Sourcegraph people post browsers for Linux, L4, Xen, LLVM, ... and other great open source infrastructure projects. You'll drive more interest in your product in a helpful way.
There is a note that shows up when viewing OpenBSD sources
"C/C++ is not yet supported (beyond basic code browsing and text search)"
In my experience, these two languages are the most difficult to find good tools for, to browse, jump, and manage large code bases. Yes, some exist, but I thought this was the point of "good tools matter"?
I never got around to really testing it, but still remember when OpenGrok was announced, way back when - apparently it's still active. I've been wondering if it would make a good front-end/source-browser component for a trac[t]-like product (with something else for vcs and bug tracking etc):
> I was using Emacs and had all of the various language support modes configured, but go-to-definition and cross-references barely worked. Searching was slow, and if I wanted to discuss a piece of code with my CS lab partners, I couldn't just share a link.
Meanwhile, most IDEs (QtCreator, KDevelop, VS, Eclipse...) have had this feature for years.
I love this idea in part because it's the very opposite of the way I tend to work, which is to drive very hard to get a surface understanding of a thing in order to make a very targeted change. I learn lots along the way with this approach, but don't often get the deep, wholistic understanding of existing systems that only comes with repeated exposure over a long time.
Some kinds of understanding involve a no shortcuts grind. That sort of a grind is a big commitment though.
Your process is very good at achieving results fast. But the problem is you always stay in the same "level" of achieving results, which of course in the beginning is a very low level.
What do I mean by "level"? Let's look at transportation in that regard. At first we just had walking/running. Then we learned how to use horses. Then we developed the wheel and could use horse wagons. Then we discovered the walking bike, etc.
If you are on a low level you may be the fastest on that level, but you may be orders of magnitude slower than people on higher levels. Think horse riding vs. car.
But that is not the biggest problem if you only use that approach. On a higher level you'll also be able to solve problems that you didn't even know were solvable. For instance, if everybody walks you won't even consider visiting other continents. But if you have airplanes you can get there in a few hours and it becomes something people do at least twice a year.
In programming, this "easily solving problems that you didn't even know were solvable" happens if you really learn software architecture from actual tools, APIs, how standards work, etc. The biggest wow for me was when I started to put in the additional 20-50% overhead to become standards-conformant for a standardized API. In the end, when I had a problem I didn't have to code anything, because the other tools were already working with the same API as my tool, and I could just connect them and be done. This way I solved 75% of a semester-long software project in one weekend, and I wouldn't consider myself especially intelligent. I just put in the hours to become standards-conformant, because from learning open source tools I found out that it's something people really do and that it is possible.
What you're hinting at is Minimalist Learning Theory.
You have a production bias to learn only what you need to immediately get your job done, rather than investing time to learn up front which could potentially be more efficient in the long run.
I have the same problem, and I'm striving to overcome it. I'm excellent at gaining surface-level knowledge quickly, but deep knowledge requires much more discipline, and I struggle with how to do it, as I don't have a good strategy for that kind of learning.
I've been using Typing.io as a platform for reading source code (working my way through GitLab now) and practicing typing with the right fingers. I have a few minor bad habits to correct, and I want to familiarize myself with the codebase, so it's a good way to warm up for the day.
The Go standard library is very well written and depending on which parts you read you can learn about lots of things like file operations, HTTP, crypto, etc.
It's easy to read it all on the web, the docs are here: https://golang.org/pkg/ and clicking on a function name shows the source.
I've been diving into OpenCV recently. It's a very popular library and generally pretty readable (although the code can get quite messy because they use vectorisation a lot). Since it's about as close as we have to a standard, fast, vision library, it's interesting to see how stuff gets implemented.
There are loads of little things that could be improved, so it's also a nice codebase for contributing to.
In general I think you should pick something that you use every day. There's no point reading OpenBSD for the sake of reading OpenBSD; you'll get far more value out of something familiar.
The D standard library's Algorithms and ranges are nice. Realistically, most production code for things people depend on is quite well done: e.g. LLVM is pretty easy to read.
Opposite of this: STL implementations come to mind.
I've fallen into doing something similar. I read the mailing lists regularly and try to look over the source for something that gets a proposed patch. Because OpenBSD boils down their software to the essentials and tries to make their APIs impossible to misuse, I find it pretty easy reading even though I'm not very experienced with C.
Here's the latest daily chat transcript, from Jun 9th: The topic was OpenBSD nc(1) and libtls, but it wandered over to pledge(2) and other code fixes from new participants eager to contribute.
Anyone have a good exemplar React application for this purpose? I'm building small experimental stuff and now I need to level up and see how real applications are being built.
I read the Emacs Lisp source code in site-lisp/ every day. The goal is to understand one file every week or so. It has made my Emacs and Lisp knowledge better, and made me aware of several nice Emacs features (align-regexp, for example).
This teaches you C, x86-64 arch, stuff like two's complement integers and floating point, how memory and CPUs work, how the C compiler works (linking stage, preprocessor), and how to write/understand signals and processes, file I/O, network code, etc.
After that you can just start reading the OpenBSD code and figure it out, or get an Andrew S. Tanenbaum book on operating systems.
Note, if you buy the Pearson Global Edition of CS:APP (it's only 10% of the regular price) there's a lot of errata you will have to check. I once got stuck reversing an assembly program into C that did an even/odd parity check, because a print error had it returning the final XOR'd value & 0 instead of & 1.
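For anyone wondering why that one-character misprint matters, here is a minimal sketch of an XOR-folding parity check. It is not the book's exercise (which isn't reproduced in this thread); the function name and the 32-bit width are assumptions made for illustration.

```python
def parity32(x: int) -> int:
    """Return 1 if the low 32 bits of x contain an odd number of 1 bits, else 0."""
    x &= 0xFFFFFFFF   # assume a 32-bit word, as a C unsigned int typically is
    x ^= x >> 16      # fold the upper half onto the lower half
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1       # bit 0 now holds the XOR of all 32 original bits
    return x & 1      # "& 0" here would return 0 for every input


if __name__ == "__main__":
    for value in (0, 1, 3, 7, 0x80000000, 0xFFFFFFFF):
        print(hex(value), parity32(value))
```

Masking with 1 keeps only the folded parity bit; masking with 0 throws it away, so the misprinted version can never report odd parity.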
For a good base, I recommend the book "The Design and Implementation of the FreeBSD Operating System" by Marshall Kirk McKusick. It's really in depth and I was able to follow it without previously taking an OS course.
The best start would be to read a book about operating systems, such as Tanenbaum's. Reading the source of an OS has a really low signal-to-noise ratio for gaining important knowledge about operating systems, due to implementation details and OS-specific peculiarities.
If you want to improve your code reading skills and/or C programming skills, then you can probably go ahead and start reading, for example, the OpenBSD source code, even though you don't have any operating systems knowledge.
How about using a static security source code analyzer and going through all findings? The very good ones, the commercial ones, are free to use for open source projects. That would be a real benefit to the project, I think.
That's missing the point entirely. Their primary goal is to become a better developer and to improve their C knowledge, hence the effort being mostly focused on reading and understanding a fair amount of code, including its context. Fixes and improvements are positive side effects of the original effort, not goals in and of themselves.
This is a smart thing to do! I might do it eventually... not right now though, because of <insert whatever reason you can think of here>, and stuff, you know.
[+] [-] sqs|8 years ago|reply
[+] [-] problems|8 years ago|reply
Someone has it running on the OpenBSD code here: http://bxr.su/OpenBSD/ and it should produce a much more useful representation of the code, see http://bxr.su/OpenBSD/lib/libutil/bcrypt_pbkdf.c#98
Mozilla also has one called DXR which is designed for their large, C++ heavy codebases: https://wiki.mozilla.org/DXR
[+] [-] avar|8 years ago|reply
1. https://www.atlassian.com/licensing/jira-software
[+] [-] sedachv|8 years ago|reply
[+] [-] CalChris|8 years ago|reply
[+] [-] eatbitseveryday|8 years ago|reply
[+] [-] e12e|8 years ago|reply
https://github.com/OpenGrok/OpenGrok
There's also a list of similar tools at Gnu.org:
https://www.gnu.org/software/global/links.html
Thought it might be of interest for others looking at "source browsing" tools.
[t] https://trac.edgewall.org/
[+] [-] jcelerier|8 years ago|reply
[+] [-] lars_francke|8 years ago|reply
[+] [-] peatmoss|8 years ago|reply
[+] [-] erikb|8 years ago|reply
[+] [-] azhenley|8 years ago|reply
[+] [-] RUG3Y|8 years ago|reply
[+] [-] s_kilk|8 years ago|reply
*holistic
[+] [-] kfrzcode|8 years ago|reply
[+] [-] VMG|8 years ago|reply
Some kind of curated genius.com for source code would be interesting.
[+] [-] thomas11|8 years ago|reply
[+] [-] Baeocystin|8 years ago|reply
http://fabiensanglard.net/doom3/index.php
[+] [-] joshvm|8 years ago|reply
[+] [-] wolfgke|8 years ago|reply
> https://pdos.csail.mit.edu/6.828/2016/xv6.html
[+] [-] noshbrinken|8 years ago|reply
https://genius.it/johnresig.com/files/jquery-original.html
[+] [-] mhh__|8 years ago|reply
[+] [-] fasquoika|8 years ago|reply
[+] [-] RUG3Y|8 years ago|reply
[+] [-] kruhft|8 years ago|reply
[1] http://emma.nfshost.com/v7x86/index.html
[+] [-] pikachuaintcool|8 years ago|reply
[+] [-] aomix|8 years ago|reply
[+] [-] brynet|8 years ago|reply
https://junk.tintagel.pl/openbsd-daily-nc.txt
[+] [-] andrestc|8 years ago|reply
[+] [-] vhhhggv|8 years ago|reply
[deleted]
[+] [-] topspin|8 years ago|reply
[+] [-] sn41|8 years ago|reply
[+] [-] ianai|8 years ago|reply
[+] [-] hackermailman|8 years ago|reply
Lectures for it are here: https://scs.hosted.panopto.com/Panopto/Pages/Sessions/List.a...
[+] [-] autoreleasepool|8 years ago|reply
[+] [-] masklinn|8 years ago|reply
[+] [-] irundebian|8 years ago|reply
[+] [-] doody12|8 years ago|reply
[+] [-] carlmungz|8 years ago|reply
[+] [-] jamie__k|8 years ago|reply
[+] [-] z3t4|8 years ago|reply
[+] [-] woranl|8 years ago|reply
[+] [-] err4nt|8 years ago|reply
For now, what's been fun is to load up the same file in both Chromium and Firefox source, and compare the two and how both browsers work.
Chromium source: https://cs.chromium.org/chromium/src/third_party/WebKit/Sour...
Firefox source: https://dxr.mozilla.org/mozilla-central/source/
[+] [-] FreeFull|8 years ago|reply
[+] [-] irundebian|8 years ago|reply
[+] [-] masklinn|8 years ago|reply
[+] [-] lbill|8 years ago|reply