Ask HN: What are the ways you go about getting comfortable with a new codebase?

[+] buro9|3 years ago|reply

Three ways...

1. I go straight to the entry point, the main(), and then follow how the initial configuration, flow of data, sanitisation, and routing is done.

2. I look for bugs. Fixing bugs reveals the complexity as you need to look for side effects of the fixes when you don't yet know the system. Writing tests for those fixes also helps understand the system.

3. I look for the least changed part. I find these are usually the oldest and most core part of how the program works, whereas more recent changes are business logic and feature addition.

But of these, the first yields the greatest initial understanding and allows me to change things with less fear.

[+] smugma|3 years ago|reply

How do you think of #1 in the context of a web app? Each page is essentially a different entry point into the code.

I guess the landing page or authentication page could be considered the equivalent, but I’m not sure those would hit your goals of understanding the flow of data, etc.?

[+] tedmiston|3 years ago|reply

These are great.

A 4th way I would add is: If you need to make a minor change or understand how one specific function is expected to work, search for its unit tests and start there.

[+] gompertz|3 years ago|reply

#1 is my method too. I also take it as a sign learning the code base is going to be difficult if I have to ripgrep to find main because they couldn't put it in a cleanly named file like main.c =)

[+] jarusll|3 years ago|reply

1st is exactly what I do while reminding myself that I don't need to understand everything. Somehow for me it's easy to think like that in OO code and not FP.

[+] cornel_io|3 years ago|reply

Try to profile the code and see where it's spending time.

Get a good flame graph up, and you'll have a really solid visual representation of what's going on.

Bonus: on almost any project, nobody has done a profiling pass in at least a few months, so you'll probably discover some extremely easy performance improvements and you'll look like a goddamn hero when you speed up e.g. the test suite by a factor of 3 in your first week on the job.
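For anyone who wants to try this, here is a minimal sketch using Python's built-in `cProfile` and `pstats`. The functions `handle_request` and `slow_sum` are made-up stand-ins for a real entry point and its hot spot, not anything from an actual project. It prints a text report sorted by cumulative time rather than a flame graph, but it answers the same question: where does the time go?

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately wasteful stand-in for a hot spot in the real codebase."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def handle_request(n):
    """Hypothetical entry point that calls into the hot spot."""
    return slow_sum(n)


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so callers that contain the hot spot come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


_, report = profile_call(handle_request, 20_000)
print(report)
```

In a real codebase you would point `profile_call` at the slowest test or request handler you can find and read the top few lines of the report.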

[+] rozularen|3 years ago|reply

Hey, I'd like to get a grasp of how to do what you're explaining, would you have any link or resource by chance? thanks!

[+] jarusll|3 years ago|reply

Definitely good advice. Profiling is something I haven't looked into, and I should, right after debugging.

[+] baby|3 years ago|reply

I did this and flamegraphs in Rust are not great due to rayon :(

[+] quickthrower2|3 years ago|reply

Ideally you have someone experienced in the codebase who can give a lay of the land.

I suggest:

1. Find a senior dev, ask them for pointers to good existing documentation to learn from on your own.

2. give that a go, make note of all the questions you have

3. then have a session with that dev for platform walk through. Take lots of notes and ask your questions.

4. offer to update docs where you found errata or missing steps or even complete topics not mentioned

5. suggest to the team anything about onboarding that can be improved.

[+] jarusll|3 years ago|reply

This is exactly what I would do ideally, except I couldn't. I can understand that when working in startups you overlook a lot of theory, and due to the hectic pace, low quality calls (code/architecture explanation) are not appreciated.

And that's why the first thing I ask during a technical interview is "Do you have internal documentation?".

[+] oxff|3 years ago|reply

(This might be more abstract than you wanted)

Hopefully there's an overview of the code base in an `ARCHITECTURE.md` file[1]; if so, read through it, along with the respective documentation and tests for the main modules mentioned in it.

If you assume their tests cover the important business logic / stuff "they want to keep" (ref. "Beyonce Rule"[2]), they should inform you about the most important stuff.

> [1] https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html

> [2] https://www.oreilly.com/library/view/software-engineering-at...

[+] jteppinette|3 years ago|reply

“Hopefully” … 0% chance

[+] robalni|3 years ago|reply

The best way is probably to make changes to it, because that forces you to really understand the code. If you just read it without making changes, it's too easy to pretend that you understand it. If there is one part of the code that you are trying to understand, I recommend writing a replacement for it; then you at least learn how it works, and you might even get a better solution.

[+] asp_hornet|3 years ago|reply

Fixing bugs.

Someone can explain a code path, what it should do, what the bug is and with that you can get familiar with a path through the application.

[+] drewcoo|3 years ago|reply

This!

Ask around about the big things everyone would like to change and where the scary code is that nobody wants to touch, and with those things in mind, fix some bugs.

The initial questions will tell you what to avoid initially. Longer term, if you can fix them you'll look like a rock star.

[+] nottorp|3 years ago|reply

Just mapping the whole codebase without a specific goal in mind seems counterproductive to me.

Instead, I get myself a couple specific tiny bugfixes/features to do first. Just finding out where those are, one by one, tells you a lot and may not be as simple as it sounds.

I was once hired to help with polishing a code base for imminent shipping. I fixed one bug. The fix was one line, but not trivial at all. Took me a whole week of reading code. The customer was ecstatic. There were like 12-15 years worth of layers of code to read.

[+] flibble|3 years ago|reply

If the codebase is ginormous and hard to decipher then you could use the magic of source control to go back in time to an early point in the codebase. It’s probably going to be easier to understand a codebase that is 3 months old vs 6 years old, so you could go and check out that version, understand it and then jump forward a few years. This also gives you the benefit of understanding the evolution of the code and understanding why it is the way it is, not just what it is.

[+] janee|3 years ago|reply

As with most things depends on the code base, but if documentation in whatever form is available, I'd start there rather than just jumping into code.

I'd start from the highest level abstraction of the code and work downwards until I reach a domain I'm either interested in or asked to work on and specialize on that vertical for a while (this can be anything from a few weeks to say 6 months). I then repeat the process on other verticals if needed/wanted.

So going from highest level of abstraction down to actual code:

1. read docs or converse with others around what the value proposition/s are of the product/service/app.

2. Understand the main use-cases or if not obvious, read product brochures or w/e you have in terms of "sales" material for end-users.

3. Try to map the main use-cases back to high-level architecture diagrams (if available).

4. After doing above steps if there are multiple domains I would pick one based on either personal interest or assigned work.

5. When starting with a business domain (meaning some high level grouping of code based on their business function), I tend to focus first on the design of the persistence layers as it's usually less dense and less sprawling than other parts of a code base and can give you some idea of state management.

6. From here I generally start up the service/s or apps related to the domain I'm studying and try to play around with it, trying to tie previous steps together in my mind with what I'm observing with my interactions.

7. At this point I would generally have documented my findings (whatever means / form it is done) and ask for a session with someone that's familiar with this domain and ask their opinion of my documentation, making corrections where needed.

8. After this it's generally best in my opinion to just jump into work.

9. Personally I find doing support work fixing bugs for about 6 months gives you a very good lay of the land and people.

Jumping straight into feature work is not optimal in my experience as it's less likely to provide as wide an array of exposure as support.

This obviously only fits certain scenarios, but for your garden variety product/s this is how I'd go about understanding the code base.

Oh, commit history is also a very very rich source of info if there's an established culture of good commit messages.

[+] jarusll|3 years ago|reply

This is very well written. 1-3 describes my experience of tinkering in Pharo except I haven't actually built anything with it.

[+] ToJans|3 years ago|reply

I scan everything that is closely or remotely related to the code base: the code, commit logs, diagrams, bug reports, change requests, user manuals, tests, technical logs, databases, other storage, cloud infra, ...

Usually at least one of them stands out, so I at least read this through (usually diagonally).

I might also pick different things based on my goals.

Once I think I have a grasp of the high level aspects, I start pairing or validate with tiny feedback loops.

Update: I also create my own (naive) helicopter view diagram of the context and validate it with people on different levels.

[+] smugma|3 years ago|reply

We have a tool that was built by a third party. They did an excellent job but for various reasons we needed to change vendors. I didn’t hire the old or new dev teams, so it wasn’t my role to tell them how to come up to speed. Early on they said they wanted to redo all the test cases, which seemed off to me (it’s too abstract, and why redo test cases for parts of the code that are unlikely to be modified). I said something but didn’t push it.

Someone on my team has been giving the dev team demos of the functionality and thinking behind the product a few days a week. My one request at the beginning was that they should learn enough about the product to be able to give a demo back to us. It took them about 2-3 weeks (maybe eight 45-minute overview sessions from my team, which owns the product requirements), but it showed that they know what it is the tool is supposed to do.

They spent another 3 weeks “getting comfortable”, and 6 weeks from the start they finally felt comfortable enough to start implementing small features and bug fixes. I’d have preferred that they start fixing bugs right away (it might take 2-3 weeks to fix the first bug because they need to figure out how to get access to systems, documentation, deployment, etc.) because it’s more tangible, but I know I’m impatient and let them do it this way. It seems to work ok so far but it will be another month or so before I can decide whether or not they are actually competent. I guess the good news for them is they (team of 10 in Eastern Europe) aren’t being bugged by the client, and if they actually are good, should be enjoying the freedom to do things their way and implement their own processes.

[+] jrumbut|3 years ago|reply

Besides all the great steps below, I like to browse the git repo to find the files that have been changed most recently and most often.

Projects, especially messy ones, often behave like lava flows where there is an active and ever expanding edge where changes are currently being made. Beneath this are layer upon layer of nearly impenetrable and often implicitly deprecated code from former developers.

This practice came from a time when I was brought in midway through a rewrite to get rid of unmaintainable code from some offshore contractors. I saw a repository where half the code lacked any organizing principles and had massive security issues. The second half was textbook (pedantic even) OOP, the kind taught in Java textbooks. It was beautifully executed except for using a few outdated tricks to do OOP in early versions of PHP (no longer needed in the version used for this project).

Because I didn't look at the dates, I assumed the neat OOP code was the result of the rewrite. I was wrong.
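A rough sketch of how the "most often changed" list could be pulled out of git, assuming Python and a local clone. The helper names (`count_changes`, `git_file_churn`) and the sample paths are made up for illustration; the git invocation is plain `git log --name-only` with an empty `--pretty=format:` so that only file paths are printed.

```python
import subprocess
from collections import Counter

# Hypothetical output of `git log --name-only --pretty=format:` for three commits.
SAMPLE_LOG = """src/billing.py
src/api.py

src/billing.py

README.md
src/billing.py
"""


def count_changes(log_text):
    """Count how many commits touched each path in name-only log output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def git_file_churn(repo_dir=".", since="1 year ago"):
    """Run git in repo_dir and tally changed paths (requires git on PATH)."""
    log_text = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return count_changes(log_text)


# Demonstrate on the canned sample so this runs without a repository.
for path, changes in count_changes(SAMPLE_LOG).most_common():
    print(f"{changes:3d}  {path}")
```

Calling `git_file_churn()` inside a real checkout gives the same tally for the last year; the tail of `most_common()` is the quiet, rarely touched code, and `git log -1 --format=%cd -- <path>` shows the date of the last change to a file, which is the part I skipped.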

[+] ahurmazda|3 years ago|reply

If you are lucky (working with a mature codebase), tests are my number one go to when getting started. I need to know input/output of things. I work in ml space so ymmv. This allows me to make small changes and check my assumptions as I gain more confidence around the codebase

[+] 4pkjai|3 years ago|reply

If there is one, I often like to look at the database. The stuff that is stored and the names of the tables should give you a good idea how the application works.
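As a small sketch of that first look, assuming the application happens to use SQLite (other databases expose the same information through their own catalog tables), Python's built-in `sqlite3` module can list every table and its columns. The `videos` and `comments` tables below are invented demo data, not from any real project.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column_name, ...]} for every user table."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    schema = {}
    for (name,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        schema[name] = [col[1] for col in columns]
    return schema


def make_demo_db():
    """Build a throwaway in-memory database with two made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, owner_id INTEGER);
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY,
            video_id INTEGER REFERENCES videos(id),
            body TEXT
        );
        """
    )
    return conn


for table, columns in describe_schema(make_demo_db()).items():
    print(table, columns)
```

Even this much output tells you the nouns of the application and how they point at each other.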

Sort of unrelated, but I've got a story about a project I was looking through that confused the hell out of me. It was a C# library that would allow you to render an element from a shockwave flash file (it was either .swf or .fla).

I spent ages digging through the code. The example worked really well, but I couldn't get it to work with one of my files.

Eventually I contacted the author and he told me the library used reflection to get the name of your variable and would look for that variable name in the flash document.

[+] funwie|3 years ago|reply

In addition to already mentioned steps,

0) Read the code base docs (or README).

1) Pair with someone with knowledge of the code base or ask them to walk you through the code base.

2) Identify the public interface to interact with the app/api. How do consumers use the software? Play around with the app or api to get a sense of how things link up.

3) Identify various tools used in the code base (db, messaging, external api, etc). Now you know each tool is set up somewhere and used in one or more places.

4) Identify the patterns and conventions used (CQRS, mediator, dependency injection, middleware, pipelines, logging, etc). Now map the flow of each public interface using this knowledge.

[+] travoltaj|3 years ago|reply

Two things - First - I learn how the data flows from the source to the end. That teaches me to navigate the codebase entirely. (User action to database, or source data to end data, etc.)

Second - I learn how different components are wired together.

[+] jarusll|3 years ago|reply

So indexing all the components and finding out all the interactions between them. This is exactly what class browsers do, they index all the classes and messages. The interactions could be described using an example/documentation.

This reminds me of Pharo which does all of the above, indexes classes, messages and has a rich documentation support.

[+] itsmemattchung|3 years ago|reply

Check out the repository and run the unit tests. If none exist, write the first one.
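A sketch of what that first test might look like, assuming a Python codebase and the standard `unittest` module. `slugify` is a hypothetical stand-in for whatever undocumented function you happen to be reading; the point is to pin down what the code does today, not what it should do.

```python
import unittest


def slugify(title):
    """Stand-in for an existing, undocumented function in the codebase."""
    return "-".join(title.lower().split())


class SlugifyCharacterizationTest(unittest.TestCase):
    """Record current behavior so later changes cannot alter it silently."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(slugify("  Hello   World "), "hello-world")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


def run_tests():
    """Load and run the test case above, returning the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        SlugifyCharacterizationTest
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the test class would live in its own file and be picked up by the project's test runner; `run_tests()` is only there so the sketch is runnable on its own.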

[+] GlennS|3 years ago|reply

I often write out callstacks/dependency chains on paper. I find that makes it stick.

Try actually using the program as an end user would.

Read error messages, read code, make predictions about what the code does, find out if your predictions are true.
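If paper is not at hand, a first draft of those call chains can be generated, at least for Python code. This is a rough sketch using the standard `ast` module; `call_map` and the sample source are made up for illustration, and it only sees direct calls by name, so anything dynamic (callbacks, reflection, dispatch tables) still needs to be traced by hand.

```python
import ast

# Hypothetical source file to analyze.
SAMPLE_SOURCE = '''
def main():
    config = load_config()
    serve(config)

def load_config():
    return parse_ini("app.ini")

def serve(config):
    router.dispatch(config)
'''


def call_map(source):
    """Map each top-level function to the sorted names it calls directly."""
    tree = ast.parse(source)
    calls = {}
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        names = set()
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call):
                func = inner.func
                if isinstance(func, ast.Name):
                    names.add(func.id)      # plain call: foo()
                elif isinstance(func, ast.Attribute):
                    names.add(func.attr)    # method call: obj.foo()
        calls[node.name] = sorted(names)
    return calls


for caller, callees in call_map(SAMPLE_SOURCE).items():
    print(f"{caller} -> {', '.join(callees)}")
```

Writing the result out by hand afterwards is probably still what makes it stick.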

[+] ricardolopes|3 years ago|reply

I like to focus on the main business element. If it's a SaaS for sharing videos with comments, for instance, I'd take a longer look at the video and comment models, their relations, and the call chain from API endpoint to model.

Another strategy I like is picking parts of the codebase and trying to refactor them. You don't even need to commit anything if you're not supposed to go around changing things: just spending some time moving things around, seeing what breaks and so on, will give you a better understanding of the code and what it does.

[+] mstipetic|3 years ago|reply

Not using things like go to definition and any fancy tools, just manually forcing myself to work through files to understand how things fit together and using basic tools like grep

[+] davidatbu|3 years ago|reply

Very surprising! IIUC, you consider "Go To Definition"/"Go To References" and other "LSP assists" unhelpful (or worse) when familiarizing yourself with a new codebase. I personally find them indispensable. Could you say more to help me understand your position?

[+] alain_gilbert|3 years ago|reply

I'd give the complete opposite advice.

I use "go to definition"/"find references" to go all around the code base and at the same time try to figure out how the files interact with each other.

67 comments