
Ask HN: How to transition from academic programming to software engineering?

220 points | fdsvnsmvas | 7 years ago

I taught myself how to code and ended up doing a PhD in a computational discipline. Programming has been a big part of my life for at least the last decade, during which I've written code almost every day, but always by myself. After graduating I joined a medium-sized company (~10^2 developers) as a machine learning engineer and realized how much I don't know about software engineering. I feel very comfortable with programming in the small, but programming in the large still feels mostly opaque to me. Issues like testing / mocking, code review, management of dev / stage / prod workflows and, most importantly, the judgment / taste required to make maintainable changes to a million LOC repository, are areas where I can tell I need to improve.

Former academics who moved into software engineering, which resources did you find most useful as you made the transition? Python-specific or language-agnostic books would be most helpful, but all advice would be welcome.

103 comments

ChuckMcM | 7 years ago
I have hired some PhDs in your situation and worked with others; I personally just went to work after I got my BSEE.

My observation is that you're halfway there when you realize that you need to improve. The folks I saw who did poorly did so because they didn't realize that you could be both the smartest person in the room and the least capable at the same time.

Right now, in your first job, even a kid who never went to college is better at programming than you are, because they've been through all the pitfalls that can happen and have built up knowledge from that which is perhaps more intuitive than formal but serves them well. What you have over that person is that you've trained yourself to consume, classify, organize, and distill massive amounts of information in a short amount of time.

Use that training to consume everything you can on the art of writing programs. Read "Test Driven Development", read "Code Complete", read "Design Patterns", read "The Design of the UNIX Operating System", read "TCP/IP Illustrated Volume 1", and a book or two on databases. With your training you should be able to skim through those to identify the various "tricky bits" and get a feel for what is and what is not important in this new field of yours.

Soak in as much as you can and ask a lot of questions. Pretty soon no one will know you haven't been doing this your whole life.

stedalus | 7 years ago
This is pretty good advice overall. One small change I suggest is to take it easy on Design Patterns and the like. I’ve seen people in OP's position (general smarts but limited production experience) turn into architecture astronauts and start overengineering everything. The book can be useful if you’re working on a legacy codebase and need to understand the jargon that can appear in [possibly overengineered] existing codebases.
rhizome31 | 7 years ago
> folks I saw who did poorly it was because they didn't realize that you could be both the smartest person in the room and the least capable at the same time

Any advice on dealing with this kind of person would be much appreciated. I had a hard time trying to convey basic software engineering practices such as not hard-coding file paths, using version control instead of filenames to manage versions, removing dead code, keeping the code at least moderately DRY, more or less sticking to a code convention, etc. They would use all their reasoning power to refute what they saw as meaningless pet peeves. It was tiring. I tried to provide pointers so they could find out for themselves, but they showed zero interest. At the moment I've given up.
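As one concrete illustration of "not hard-coding file paths", here is a minimal sketch, assuming the data directory can come from a command-line flag, then an environment variable, then a relative default. The names `resolve_data_dir`, `--data-dir`, `DATA_DIR`, and the `prog` name are hypothetical, not taken from the thread.

```python
import argparse
import os
from pathlib import Path


def resolve_data_dir(argv=None, environ=None):
    """Pick the data directory: CLI flag, then env var, then a relative default.

    All names here (flag, env var, default) are hypothetical placeholders.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="app")
    # The env var only supplies the default, so an explicit flag always wins.
    parser.add_argument("--data-dir", default=environ.get("DATA_DIR", "data"))
    args = parser.parse_args(argv)
    return Path(args.data_dir)
```

Passing `environ` explicitly keeps the function testable without touching the real process environment; production callers simply call `resolve_data_dir()`.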

The simple fact that the OP is asking this question here means she or he will almost certainly do well.

anonytrary | 7 years ago
I would also add that lots of new people waste more time doing TDD wrong than they would save doing TDD right. There are lots of things you only really understand after you've done them wrong.
samuell | 7 years ago
I want to second the recommendation of "Code Complete". IMO, it provides a much more practical and hands-on approach than most of the other books, and does so in great depth. If I could recommend only one book on good software engineering practice, it would probably be that one.

On a related note, this week I stumbled upon an essay which is probably the best piece of writing I've read on computer-aided engineering and working in a disciplined manner. I'd recommend reading it right now, as the very first thing you do: https://queue.acm.org/detail.cfm?id=3197520

byebyetech | 7 years ago
> read "The Design of the UNIX Operating System", read "TCP/IP Illustrated Volume 1"

What? Those are unnecessary details as far as software engineering is concerned.

crackerjackmack | 7 years ago
Here is some general advice I've given multiple junior developers over the years. You probably aren't a junior, but most of it is likely applicable to what you are seeking. It was passed down to me by other developers. Other HN folks will have links to literature, but hopefully my advice will serve as a primer.

* testing - write your functions small enough to be readable, but not so small their abstractions are meaningless (because you have to test them all)

* testing - don't reach into your code's modules and mock. Instead, use dependency injection with non-testing defaults

* code review - It shouldn't be personal. If it feels personal, are you reading it wrong, or are they attacking you personally?

* code review - when style complaints come up, ask for reference material. Don't get caught in a cyclic, pedantic style war between lead devs.

* code - your code should be environment-agnostic. If you have environment- or context-specific things to do, pass along an environment/configuration dict or make a global config singleton. As long as your code depends on that, you can write code more discretely.

* code - personal preference, but try not to nest your loops too deeply; use itertools when you can.

* code - if you can help it, try not to mutate dicts/objects in place while in a loop. It makes testing difficult.

* code - exit early if possible, test for failures instead of nesting your entire function inside a single `if`. Helps identify the bad inputs faster as well.
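A minimal sketch of the itertools and no-in-place-mutation bullets, assuming plain lists and lists of dicts. `grid_points` and `with_normalized_scores` are hypothetical helpers, and the second assumes a non-empty input whose maximum score is positive.

```python
import itertools


def grid_points(xs, ys, zs):
    """Return every (x, y, z) combination without three nested for-loops."""
    return list(itertools.product(xs, ys, zs))


def with_normalized_scores(records):
    """Return new dicts with 'score' divided by the max score; inputs are untouched.

    Assumes records is non-empty and the max score is positive (hypothetical rule).
    """
    top = max(r["score"] for r in records)
    # Build fresh dicts instead of mutating each record inside the loop,
    # so callers (and tests) can still inspect the original input.
    return [{**r, "score": r["score"] / top} for r in records]
```

Because neither function mutates its arguments, a test can assert on the return value and on the unchanged input in the same breath.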

Above all, remember code isn't perfect. It's a tool to get to an end goal. If you aren't solving for the end goal, you aren't solving the right problem. At the end of the day, you are employed to build a product, and that product needs to perform its job. (That isn't a pass to write super shitty code.)

edit: formatting

loup-vaillant | 7 years ago
> write your functions small enough to be readable, but not so small their abstractions are meaningless

More precisely, you want your functions to be deep. The interface-to-implementation ratio must be as low as is reasonable: you want to hide implementations behind interfaces that are much smaller than the implementations themselves.

This goes for functions (something with 5 arguments that takes 2 lines is too shallow; something with 2 arguments that spans 20 lines is deep), classes (something that is mostly getters and setters is too shallow; something with 3 methods full of business logic is deep), or anything else.

This is not just for testing, this is to make sure you can understand the program at all. Deep functions (and deep classes, and deep modules…), are also about good old decoupling: the less you have to understand about a piece of code to be able to use it, the better.
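To make "deep" concrete, here is a hypothetical `parse_duration` sketch: a one-argument interface (string in, seconds out) that hides a small parser, a unit table, and input validation. A shallow alternative would expose tokenizing, unit lookup, and validation as separate public functions that every caller must chain correctly.

```python
def parse_duration(text):
    """Convert strings like '1h30m' or '45s' to whole seconds.

    Hypothetical example of a deep function: tiny interface, more logic inside.
    """
    units = {"h": 3600, "m": 60, "s": 1}
    total, digits = 0, ""
    for ch in text.strip():
        if ch.isdigit():
            digits += ch  # accumulate a multi-digit number
        elif ch in units and digits:
            total += int(digits) * units[ch]
            digits = ""
        else:
            raise ValueError(f"bad duration: {text!r}")
    if digits:
        # A trailing number with no unit, e.g. "10", is rejected.
        raise ValueError(f"missing unit in: {text!r}")
    return total
```

Callers only need to know "give it a duration string, get seconds back"; the unit table can change without touching any call site.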

keshab | 7 years ago
I couldn't agree more with you about nesting loops. It seems clever at the moment when you're writing but when you have to come back after a while or worse, another developer has to, it becomes a nightmare.

I would go a bit further and include nested if statements. Sometimes nesting is really required, but other times it can be avoided. I try to avoid nesting as much as possible.
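A minimal sketch of the "exit early" and "avoid nested ifs" advice, using a hypothetical order-discount function written both ways. The two versions behave identically; the second rejects bad inputs first with guard clauses so the normal case sits at the top level.

```python
def nested_discount(order):
    # Hard to follow: the real logic is buried three levels deep.
    if order is not None:
        if order.get("items"):
            if order.get("total", 0) > 0:
                return order["total"] * 0.9
    return None


def early_exit_discount(order):
    # Same behaviour: guard clauses return early on bad input,
    # leaving the normal case flat and easy to read.
    if order is None:
        return None
    if not order.get("items"):
        return None
    if order.get("total", 0) <= 0:
        return None
    return order["total"] * 0.9
```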

fraudsyndrome | 7 years ago
> testing - don't reach into your code's modules and mock. Instead use dependency injection with non-testing defaults

Could you please go into more depth with this?
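A minimal sketch of the "dependency injection with non-testing defaults" idea quoted above, assuming the dependency is a clock. `SessionChecker`, `ttl_seconds`, and `clock` are hypothetical names: production code relies on the default `time.monotonic`, while a test passes a fake clock instead of patching the `time` module with `unittest.mock`.

```python
import time


class SessionChecker:
    """Reports whether a session has outlived its time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        # Injected dependency with a real default: callers never have to pass it.
        self._clock = clock
        self._started = clock()

    def expired(self):
        return self._clock() - self._started >= self.ttl_seconds
```

In a test, `SessionChecker(30, clock=lambda: fake_now)` lets you move time forward by changing a variable, with no reaching into module internals.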

fdsvnsmvas | 7 years ago
Thanks everyone, the comments are much appreciated. Here's a list of books and other media resources recommended so far in the thread:

Robert C. Martin, Clean Code: https://www.amazon.com/Clean-Code-Handbook-Software-Craftsma...

Vaughn Vernon, various: https://vaughnvernon.co/?page_id=168

Steve McConnell, Code Complete 2: https://www.amazon.com/Code-Complete-Practical-Handbook-Cons...

Clean Coder videos: https://cleancoders.com/

Hunt and Thomas, The Pragmatic Programmer: https://www.amazon.com/Pragmatic-Programmer-Journeyman-Maste...

Hitchhiker's Guide to Python: https://docs.python-guide.org/

Dustin Boswell and Trevor Foucher, The Art of Readable Code: https://www.amazon.com/Art-Readable-Code-Practical-Technique...

John Ousterhout, A Philosophy of Software Design: https://www.amazon.com/Philosophy-Software-Design-John-Ouste... This one looks particularly interesting, thanks AlexCoventry!

Kent Beck, Test Driven Development: https://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/...

Dan Bader, Python Tricks: The Book: https://dbader.org/

Ian Sommerville, Software Engineering: https://www.amazon.com/Software-Engineering-10th-Ian-Sommerv...

Svilen Dobrev, various: http://www.svilendobrev.com/rabota/

sanderjd | 7 years ago
There are a lot of good recommendations here, and I certainly relate to the instinct to go to books when you're looking to level up a skill set, but I really think what you need is not a bunch of books to read, but a few people to watch do the work. The only real way to do that is to get a job alongside them. You can read the books at the same time; you can ask your new coworkers which recommendations they agree with and read those ones first.
blub | 7 years ago
Skip anything by Robert Martin (the Clean Coder series) and read Ousterhout first and then McConnell instead.

Martin is well intentioned, but very dogmatic about some things like TDD, function size, personal responsibility, etc. You need to already have some decent engineering experience to be able to detect and ignore the harmful stuff in his books.

n4r9 | 7 years ago
I'd like to re-emphasise sanderjd's point not to focus too much on reading books. I myself went from doing a PhD and a lectureship in mathematics (with some coding here and there) to a decent software engineering job in a smallish company. I've learnt everything on the fly by reading code, searching stack overflow, trying stuff out and coding alongside others. The great thing coming out of a PhD is not just that you have to be pretty smart to have done it: you now know you can grasp almost any aspect of human knowledge with sufficient brain racking. This is a vastly underrated piece of self-awareness which enables one to stay humble and tenacious.
tensor | 7 years ago
In my experience, it's easy to just learn on the job. Some basic points though:

* Follow whatever formatting and style rules your workplace uses. It's religion and not worth getting into; as long as everyone uses the same style, it's a win.

* Dev/stage/prod is also workplace-specific. Just go with the flow and avoid time-wasting arguments on these topics; they're not usually worth it.

* Try to break your work into small commits. This is both easier to review and easier to estimate time on.

* Architect your code so that you can add unit tests. Make sure all your commits include them.

* Prefer longer, simpler code to clever code; you're optimizing for newcomers reading your code.

* When a one-line comment is enough to explain something to you, someone outside the field will probably need at least a paragraph to start understanding it.

* Think about how you'll respond to someone coming to you and saying "something something prod something something your code is buggy." How will you get enough information to determine if this is true, and to debug it when it is? Logging is one good tool here, so consider what you log carefully.
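A minimal sketch of the logging point, using Python's standard `logging` module. The logger name, `score_batch`, and the doubling "scoring rule" are hypothetical placeholders; the idea is that each log line carries the identifiers (batch id, row index, offending input) you would need to answer "is my code buggy?" from production logs alone.

```python
import logging

logger = logging.getLogger("scoring")  # hypothetical module-level logger name


def score_batch(batch_id, rows):
    """Score rows, logging enough context to debug a production report later."""
    logger.info("scoring batch_id=%s rows=%d", batch_id, len(rows))
    scores = []
    for i, row in enumerate(rows):
        try:
            scores.append(float(row["value"]) * 2)  # placeholder scoring rule
        except (KeyError, TypeError, ValueError):
            # Record which input failed and why (with traceback) instead of
            # silently skipping it; the row gets a None score.
            logger.exception("bad row batch_id=%s index=%d row=%r", batch_id, i, row)
            scores.append(None)
    return scores
```

Passing values as `%s`-style arguments rather than pre-formatting the string lets `logging` skip the formatting work when the level is disabled.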

Finally, don't be too surprised if you find people talk down to you. Unless you are in a FAANG company, which it sounds like you are not, developers can be very condescending towards academics (and people from other fields).

khitchdee | 7 years ago
That is a wonderful point. There is no replacement for an on-the-job experience as the understudy of a more experienced professional. I have experienced this first hand and can vouch for it.
_9hey | 7 years ago
- Most of your job is to make people happy. Communicate well. Coming from pure research, it might feel a little uncomfortable at first. Remember, you're there to consult, and that happens to involve writing code.

- Go to hackathons to learn to ship code fast and get used to building "skateboards". Learn how to make tradeoffs that optimize for development speed. It's not about writing crappy code, it's about optimizing for the right variables at the right time. There are now a lot of real world variables to consider.

- Practice Kanban. Divide and conquer your projects. Make small and focused pull requests. You will naturally start strategizing on how to do things right while you're doing things quickly.

- Using category theory and functional programming in your code, but being practical about it so others can read it, will really help when it comes to writing unit tests. Unguided polymorphism is from the devil.

pjmorris | 7 years ago
I'd suggest "The Pragmatic Programmer" by Hunt and Thomas. It's a compendium of advice on being an effective programmer, compiled from experience. Also, take a look at "The Practice of Programming" by Kernighan and Pike. It's a bit more narrowly focused, but Kernighan and Pike are models of clarity in programming and in writing.
navinsylvester | 7 years ago
It is critical for a company to have a concrete on-boarding process. If your present company doesn't have a good one take this as an opportunity to design one. You will learn a lot and also help others in the process.

Here are some of the guidelines:

  # Code style/guidelines
  # Git/version control workflow
  # Testing methodologies & tools used
  # Agile/project management tools used and best practices
  # Read the wiki about infra/services used in production/dev/staging and its workflow
  # Release guidelines & workflow
  # Mentoring process
  # Engineering style/culture
cnees | 7 years ago
Your coworkers are your best resource.

- Ask them to review your code and suggest changes

- Look for questions of taste and ask more. It may feel intuitive to them, but if you dig in you can often find a good reason/principle behind it.

- Read your coworkers' code

- Read the comments people leave on others' code

cnees | 7 years ago
Here are some principles that inform my taste in maintainable code:

- Each function should do just one thing.

- Don't reuse a variable if making a new variable with a new name would describe the value better.

- Give functions verb names that describe what they do. If that's hard, they may be doing too many things.

- A function should either change something or return a value, not both (command-query separation).

- Any data should have a single, canonical source of truth. https://en.wikipedia.org/wiki/Single_source_of_truth

- When deciding between making code DRY https://en.wikipedia.org/wiki/Don%27t_repeat_yourself or not, decide if future changes should affect both places at the same time (use DRY) or not (probably doesn't need DRY, maybe shouldn't have it.)

- Avoid spooky action at a distance https://en.wikipedia.org/wiki/Action_at_a_distance_(computer... and if you can't, refactor or at least add comments.

- Write modular functions that can be used without understanding much about the function beside what the name/arguments tell you.
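A minimal sketch of command-query separation from the list above, using a hypothetical `Inventory` class: `add` is a command (it mutates state and returns `None`) and `count` is a query (it returns a value and mutates nothing), rather than one method like `add_and_get_total` that does both.

```python
class Inventory:
    """Hypothetical example class illustrating command-query separation."""

    def __init__(self):
        self._counts = {}

    # Command: changes state, returns nothing.
    def add(self, item, qty):
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._counts[item] = self._counts.get(item, 0) + qty

    # Query: returns a value, changes nothing, so it is safe to call anywhere.
    def count(self, item):
        return self._counts.get(item, 0)
```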

One measure of your success in this area is how quickly someone who'd never seen your code could describe what it does.

mitchellst | 7 years ago
This is the best answer here. You've come to the conclusion that you're good at coding alone, but you don't know how to do it well in a team or company: your team and company. Other answers frame this as a technology problem (patterns and practices), but you'll hack it faster as an acculturation task. Get mentors. Plural. Grab one person in each department where you feel shaky: QA, solutions architects, operations, maybe product, etc. Tell them you're new at this and you want to ask them questions and work closely with them to get better. (This will not offend them and it will not make them look down on you. If it does, you don't want what they're selling anyway.) Two months into asking them for code reviews, taking them to lunch, and asking about things you know they care about in their areas of responsibility, you'll notice results in your own thinking and output. One year in, you'll be very, very good at this.
currymj | 7 years ago
I have made the jump from writing academic code to working on a product where actual software engineering was encouraged. Although I did jump back to academia pretty quickly.

Hitchhiker's Guide to Python is a very good book (freely available online, or get a copy from O'Reilly); some of it may be obvious but some might not be.

It is true IMO that making your code testable will also make it better designed. It might even be worthwhile to do completely dogmatic test-driven development (i.e. always write tests first, then stub out everything with NotImplementedError, then write actual code until all tests pass) for a while to get used to it, and force yourself to become familiar with tools for dependency injection/mocks/etc.
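A minimal sketch of the test-first loop described above, using the standard `unittest` module and a hypothetical `moving_average` function. The tests are written first, the function starts life as a stub that raises `NotImplementedError` (so every test fails), and the body is then filled in until the tests pass.

```python
import unittest


def moving_average(values, window):
    # Step 2 started as:  raise NotImplementedError
    # Step 3 replaced the stub with this implementation once the tests existed.
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


class MovingAverageTest(unittest.TestCase):
    # Step 1: these tests were written first, against the desired behaviour.
    def test_simple_window(self):
        self.assertEqual(moving_average([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

    def test_window_longer_than_input(self):
        self.assertEqual(moving_average([1, 2], 3), [])

    def test_rejects_bad_window(self):
        with self.assertRaises(ValueError):
            moving_average([1, 2], 0)
```

Run it with `python -m unittest` from the directory containing the file.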

This is complicated by the fact that unit-testing machine learning code can be unusually tricky; normal unit-testing practices and metrics (e.g. code coverage) may not be very effective.
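One common way to make numerical and ML code more testable, sketched under assumptions: fix random seeds so failures are reproducible, compare with tolerances (`math.isclose`) rather than exact equality, and assert properties (outputs sum to 1, invariance to shifting inputs) instead of hand-computed expected values. `softmax` and `check_softmax_properties` are stand-ins for whatever numerical routine you actually ship.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def check_softmax_properties(trials=100, seed=0):
    # Seeded RNG: a failing case can be reproduced exactly.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-50, 50) for _ in range(rng.randint(1, 10))]
        probs = softmax(xs)
        # Property checks with tolerances, not exact expected outputs.
        assert math.isclose(sum(probs), 1.0, rel_tol=1e-9)
        assert all(0.0 <= p <= 1.0 for p in probs)
        # Shifting every input by a constant must not change the output.
        shifted = softmax([x + 3.0 for x in xs])
        assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
                   for a, b in zip(probs, shifted))
    return True
```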

currymj | 7 years ago
Oh and I don't think you'll be as hopeless a coder as some other posters might think, because you did your PhD in a computational discipline and know Python and have heard of unit testing.

There are, say, physics PhDs who only write numerical Fortran or C++ routines (in one big file, sometimes even in one big function), who really might want to attend a boot camp or something but it doesn't sound like you're in that boat.

blt | 7 years ago
I went back for a PhD after a few years in industry.

As a PhD student, my code and habits do not meet the standard of industry. This is because I'm constantly changing the whole architecture to try new ideas, so I optimize for small and simple code at the expense of testability, modularization, robustness, etc.

It's important to recognize this. You will need to change your style.

I can't recommend any one book. I feel like I mostly learned these lessons through random articles, lecture videos, conversations, and personal experience.

IMO, some of the most important principles:

* Implement as much as possible with pure functions (but don't contort the code to achieve this).

* Make your commits as small as possible. Well structured version control history is valuable.

* Spend lots of time thinking about how data flows through your program, more than about how the code is organized.

* Strongly prefer DAG dependency structure. Write a set of libraries and then a top level program that uses them.
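A minimal sketch of the pure-functions and DAG points, assuming a JSON input file containing a list of records with a numeric `value` field (the file format and all names are hypothetical). The pure `summarize` can be unit-tested with plain data; the I/O-doing `main` sits at the top of the dependency graph and depends on it, never the other way round.

```python
import json
from pathlib import Path


def summarize(records):
    """Pure: the output depends only on the input; no I/O, no mutation."""
    values = [r["value"] for r in records]
    mean = sum(values) / len(values) if values else None
    return {"count": len(values), "mean": mean}


def main(in_path, out_path):
    """Thin impure shell: all file I/O lives here, at the top level."""
    records = json.loads(Path(in_path).read_text())
    Path(out_path).write_text(json.dumps(summarize(records)))
```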

svilen_dobrev | 7 years ago
Read "Software Engineering" by Ian Sommerville. Any edition (maybe from 6 onwards, though they are slightly different.. pick the latest u can get). Maybe skim/skip the (technical) parts u think u know, and read the rest. Most will not make sense initially.. does not matter, keep reading. u need to get all that "uploaded" into your brain in order to be able to grasp it one day.

It took me 10 years to be able to skip all the technicals. And another 10+ years to understand why u may ever need the rest..

for judgment etc... Maybe pick some big-enough open-source project in a domain u know well and follow it: how and when they change what. Don't worry, it does take years to really form your own judgment.

btw u will need some philosophy/methodology/human-side too.. there's not much of it in the above book.

For more, see the recommended readings on www.svilendobrev.com/rabota/

have fun

ttalviste | 7 years ago
First of all, SW engineering is a practice with a lot of responsibility. The main responsibility lies in writing code that is easy to understand. For example, if you think you write well-written code, try reading code that you wrote a couple of months ago. Usually, it's a very painful experience :D

So try to write code for an audience. This has been the trigger for me. Also I encourage code reviews and TDD.

The main learning resources for me have been the Clean Coder videos by Robert C. Martin, aka Uncle Bob. They are pure gold. They can feel awkward, but after a while they make sense.

Also, DDD (domain-driven design) is a key topic to tackle.

Books:
- Clean Code
- DDD by Vaughn Vernon

Videos:
- Clean Coder E1-E52

With these two books and videos you are on a good track! These worked for me.

Lyren | 7 years ago
I can vouch for Clean Coder. We watched them in our company. It's a small dev team so we took the time together. Afterwards we implemented a 4-line rule amongst other things.

We don't always hold ourselves to it, sometimes 5-6 line functions make sense, but we strive toward 4. Sometimes it's as easy as breaking code out into a new function, but sometimes you just simply have to create a class for it. That way a lot of complicated code suddenly becomes very easy without much effort.

grigjd3 | 7 years ago
Be patient with yourself. You have talents that are quite useful, but good code design and architecture are rarely thought about in academia. Realize that while you spent 4-6 years getting your PhD, your coworkers were becoming better engineers. That doesn't mean you can't do good work, but for a while you'll mostly be learning from others.
tensor | 7 years ago
The exception is if you are in the area of study whose entire existence is to understand what is good code design and architecture.
Arnie0426 | 7 years ago
Agree with a lot of the comments here. I went through this very issue a couple of years back and I did end up reading a few of the books suggested in that thread and while they were good reads, I think I learned the most from my colleagues’ harsh code reviews and developing a slightly thicker skin to those review comments, and not getting triggered at every single slight disagreement. I used to write a lot of grad student code at my current work and got rightly flamed for it (when appropriate)…

These days, I do try and think about the software engineering side of things first just so that I get quicker +1s from my team, and honestly, all the _quick and dirty prototypes_ I used to write (I still do, but a lot less) ended up requiring me to do a lot more debugging/redo-ing/thinking about scaling up etc later on anyway.

Books can get you a decent idea of what to do, but I found reading code (and especially my colleagues' code reviews of other people's code) much more useful. I think reading a few 800-page books to improve your software engineering skills is a very grad-student thing to do. :p I admit I did that way too much.

baq | 7 years ago
whenever you want or need to do something more than shuffling bits between buckets with different names, do some research. most likely someone already did it and published a library for it.

test third-party libraries. it's uncommon to find bugs, but it's not so rare that it happens only to others.
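a cheap way to act on "test third-party libraries" is a characterization test that pins the behavior your code relies on, so a dependency upgrade or a wrong assumption fails loudly in CI. the sketch below uses the standard library's `round` as a stand-in for a dependency, since its round-half-to-even behavior surprises many people; the test name is hypothetical.

```python
def test_round_uses_bankers_rounding():
    # Python 3's round() sends exact halves to the nearest even integer.
    # If our code assumed "round half up", this test documents that it doesn't.
    assert round(0.5) == 0
    assert round(1.5) == 2
    assert round(2.5) == 2
```

collected by pytest (or called directly), it turns a silent assumption into a checked one.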

don't forget to leave comments. a lot of my code review questions could have been answered (and hence never asked in the first place) by a well-placed comment.

sometimes people say that code is documentation or code should read like documentation. this is false. code can explain (usually poorly) what it does but it can't give a rationale why it does it the way it's been written, can't say what it doesn't do, etc. always write some documentation - comments and commit messages at least. this should be enforced in code review.

i'd say engineering is about not writing code unless absolutely necessary. code is an asset, but it's also a liability. you really don't want more than you need.

sevensor | 7 years ago
I made a very similar transition four years ago. Finished a Ph.D. Started a job in a related field writing lots of Python. My advice is to take advantage of your analytical and abstract reasoning skills. Your peers may have more concrete experience writing software, but you did a Ph.D., which means you have the patience and tenacity to follow all the threads until you figure out where they go. That means that where other people might give up, you can actually figure out how the system works and where a new feature fits in it. Or why it doesn't work the way anybody thinks it does. Think of reading other people's code like doing a lit review -- multiple authors, different schools of thought, arguments about how to do things right -- these have all played out in the code base and they're there for you to read. As a Ph.D., you have the ability to pull this all together into something that makes sense.
sanderjd | 7 years ago
My two cents: you don't need to read anything at this point, you need to apprentice. Go work somewhere where there are experienced developers. Spend your first weeks there sussing out who is highly respected among your coworkers and choose one or more of those people that you click with. Then just brazenly copy all their techniques and opinions for a while. Pretty soon you'll find yourself disagreeing with some of what they're doing or thinking. That's natural, but you should resist the urge for a while; some of that stuff comes from hard-won experience that is hard to explain. Eventually you'll start going your own way more and more. Sometimes that will blow up in your face, and that will give you your own hard-won experiences. Before you know it, you'll be one of the highly respected engineers that the newbies are cribbing from.
anonytrary | 7 years ago
Code review, dev/stage/prod workflows all vary on a team-by-team basis. If you already know what they are and why they exist, there isn't a better way to "prepare" for these than to just roll up your sleeves and look at how your current team implements these things.

Good testing practices:

1. Minimize mocking as much as you can -- as a rule of thumb, mocking is inversely proportional to test confidence.

2. Don't test implementation details, test public-facing APIs. This way, your implementation can change. Mocking makes this harder. Don't test how you get things done -- test that they are done.

3. Make sure your API is well defined before you start writing tests, or you will waste time.
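A minimal sketch of points 1 and 2, using a hypothetical `LRUCache`: the test drives only the public `put`/`get` methods with no mocks, so the `OrderedDict` inside is an implementation detail that could be replaced without touching the test.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # implementation detail; tests never touch it

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def test_evicts_least_recently_used():
    # Tests observable behaviour only: what goes in, what comes out.
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")       # "a" is now the most recently used
    cache.put("c", 3)    # capacity exceeded, so "b" should be evicted
    assert cache.get("b") is None
    assert cache.get("a") == 1
    assert cache.get("c") == 3
```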

You can find loads of Python testing guides on Google on the first two points. There will be times when you have to break some of those rules, but knowing when will come with experience.

AlexCoventry | 7 years ago
> the judgment / taste required to make maintainable changes to a million LOC repository

Try The Art of Readable Code (a pair of Google authors, IIRC), and Ousterhout's A Philosophy of Software Design.

JanisL | 7 years ago
Recently I've been involved in transitioning a piece of academic software to an open source library. One of the most noticeable things is the different priorities and emphasis on what drives value in these different environments. The people who wrote the code originally had priorities mostly to do with research: the main artifacts were papers and research, and the software itself was not the main artifact. Interestingly, they had good software and research skills, so it wasn't a matter of bad skills muddying the waters, and this shone a spotlight on how different people can have different priorities with code. When we turned it into a library that others could build on, there was a big shift in priorities, because the code became an artifact worth spending time and money on directly. You may find what we wrote about this process interesting, as it highlights the things, from a software engineering/open source perspective, that became important and had to be done to make the project a standalone library useful to other developers: https://www.customprogrammingsolutions.com/blog/2018-02-25/P...