I taught myself how to code and ended up doing a PhD in a computational discipline. Programming has been a big part of my life for at least the last decade, during which I've written code almost every day, but always by myself. After graduating I joined a medium-sized company (~10^2 developers) as a machine learning engineer and realized how much I don't know about software engineering. I feel very comfortable with programming in the small, but programming in the large still feels mostly opaque to me. Issues like testing/mocking, code review, management of dev/stage/prod workflows and, most importantly, the judgment and taste required to make maintainable changes to a million-LOC repository are areas where I can tell I need to improve.
Former academics who moved into software engineering: which resources did you find most useful as you made the transition? Python-specific or language-agnostic books would be most helpful, but all advice is welcome.
ChuckMcM | 7 years ago
My observation is that you're halfway there when you realize that you need to improve. Of the folks I saw who did poorly, it was because they didn't realize that you could be both the smartest person in the room and the least capable at the same time.
Right now, in your first job, even a kid who never went to college is better at programming than you are, because they've been experiencing all the pitfalls that can happen and have built up knowledge from that which is perhaps more intuitive than formal but serves them well. What you have over that person is that you've trained yourself to consume, classify, organize, and distill massive amounts of information in a short amount of time.
Use that training to consume everything you can on the art of writing programs. Read "Test Driven Development", read "Code Complete", read "Design Patterns", read "The Design of the UNIX Operating System", read "TCP/IP Illustrated Volume 1", and a book or two on databases. With your training you should be able to skim through those to identify the various "tricky bits" and get a feel for what is and is not important in this new field of yours.
Soak in as much as you can and ask a lot of questions. Pretty soon no one will know you haven't been doing this your whole life.
stedalus | 7 years ago
rhizome31 | 7 years ago
Any advice on dealing with people like this would be much appreciated. I had a hard time trying to convey basic software engineering practices such as not hard-coding file paths, using version control instead of filenames to manage versions, removing dead code, keeping the code at least moderately DRY, more or less sticking to a code convention, etc. They would use all their reasoning power to refute what they saw as meaningless pet peeves. It was tiring. I tried to provide pointers so they could find out for themselves, but they showed zero interest. At the moment I've given up.
The simple fact that the OP is asking this question here means she or he will almost certainly do well.
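A minimal sketch of rhizome31's first point, not hard-coding file paths: the path comes from a command-line flag, then an environment variable, then a relative default. The `DATA_DIR` variable, the `--data-dir` flag, and the `resolve_data_dir` helper are hypothetical names chosen for illustration; `env` is injectable so the lookup order can be tested without touching the real environment.

```python
import argparse
import os
from pathlib import Path


def resolve_data_dir(cli_value=None, env=None):
    """Pick the data directory: CLI flag first, then $DATA_DIR, then ./data.

    DATA_DIR and the "data" default are illustrative names, not a convention.
    """
    env = os.environ if env is None else env
    if cli_value:
        return Path(cli_value)
    if "DATA_DIR" in env:
        return Path(env["DATA_DIR"])
    return Path("data")  # relative default instead of "/home/alice/project/data"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Process files in a data directory.")
    parser.add_argument("--data-dir", help="overrides $DATA_DIR")
    args = parser.parse_args(argv)
    data_dir = resolve_data_dir(args.data_dir)
    print(f"reading inputs from {data_dir.resolve()}")


if __name__ == "__main__":
    main()
```

The same script then runs unchanged on a laptop, in CI, and in production; only the flag or environment differs.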
anonytrary | 7 years ago
samuell | 7 years ago
On a related note, just this week I stumbled upon a post/essay which was probably the best piece of writing I've read on computer-aided engineering and working in a disciplined manner. I'd recommend reading it right now, as the very first thing you do: https://queue.acm.org/detail.cfm?id=3197520
byebyetech | 7 years ago
What? Those are unnecessary details as far as software engineering is concerned.
crackerjackmack | 7 years ago
* testing - write your functions small enough to be readable, but not so small their abstractions are meaningless (because you have to test them all)
* testing - don't reach into your code's modules and mock. Instead use dependency injection with non-testing defaults
* code review - It shouldn't be personal. If it feels personal, ask yourself: are you reading it wrong, or are they attacking you personally?
* code review - when someone raises style complaints, ask for reference material. Don't get caught in a cyclic, pedantic style war between lead devs.
* code - your code should be environment agnostic. If you have environment/context-specific things to do, pass along an environment/configuration dict or make a global config singleton. As long as your code depends only on that, you can write it in more self-contained pieces.
* code - personal preference, but try not to nest your loops too deeply; when you can, use itertools.
* code - if you can help it, try not to mutate dicts/objects in place while in a loop. It makes testing difficult.
* code - exit early if possible, test for failures instead of nesting your entire function inside a single `if`. Helps identify the bad inputs faster as well.
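Here is one way the dependency-injection and environment-agnostic tips above might look in Python, as a sketch rather than a prescription. `load_user_count`, the `users_url` config key, and the endpoint are hypothetical; the production dependency uses the standard library's `urllib.request`, and the test passes a plain function instead of patching anything.

```python
import json
import urllib.request
from typing import Callable, Mapping


def fetch_json(url: str) -> dict:
    """Production dependency: performs a real HTTP GET against `url`."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def load_user_count(
    config: Mapping[str, str],
    fetch: Callable[[str], dict] = fetch_json,  # non-testing default
) -> int:
    """Count active users reported by the service named in `config`.

    The environment-specific part (the URL) comes from `config`, and the
    I/O comes from `fetch`, so neither is hard-wired into this function.
    """
    payload = fetch(config["users_url"])
    return sum(1 for user in payload["users"] if user.get("active"))


# In a test, pass a plain function instead of mocking urllib internals.
def fake_fetch(url: str) -> dict:
    assert url == "http://test.invalid/users"
    return {"users": [{"active": True}, {"active": False}, {"active": True}]}


print(load_user_count({"users_url": "http://test.invalid/users"}, fetch=fake_fetch))  # 2
```

Because the default is the real implementation, production callers never mention `fetch`; only tests do.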
Above all, remember code isn't perfect. It's a tool to get to an end goal. If you aren't solving for the end goal, you aren't solving the right problem. At the end of the day, you are employed to build a product, and that product needs to perform its job. (That isn't a pass to write super shitty code.)
edit: formatting
ahussain | 7 years ago
loup-vaillant | 7 years ago
More precisely, you want your functions to be deep. The interface-to-implementation ratio must be as low as is reasonable: you want to hide the implementation behind an interface that is much smaller than it.
This goes for functions (something with 5 arguments that takes 2 lines is too shallow; something with 2 arguments that spans 20 lines is deep), classes (something that is mostly getters and setters is too shallow; something with 3 methods full of business logic is deep), or anything else.
This is not just for testing; it is to make sure you can understand the program at all. Deep functions (and deep classes, and deep modules…) are also about good old decoupling: the less you have to understand about a piece of code to be able to use it, the better.
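To make the deep/shallow distinction concrete, here is a hypothetical contrast: `ShallowRow` has an interface as large as its implementation, while `column_mean` hides CSV parsing, blank-cell skipping, and float conversion behind two arguments. Both names are illustrative, not taken from any library.

```python
import csv
import io
from statistics import mean


class ShallowRow:
    """Shallow: callers learn nothing they couldn't do with a plain attribute."""

    def __init__(self):
        self._value = None

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def column_mean(csv_text: str, column: str) -> float:
    """Deep: a two-argument interface over parsing, cleaning, and conversion."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip blank or missing cells (DictReader fills short rows with None).
    values = [float(row[column]) for row in reader if (row.get(column) or "").strip()]
    if not values:
        raise ValueError(f"no numeric values in column {column!r}")
    return mean(values)


data = "name,score\na,1.5\nb,\nc,2.5\n"
print(column_mean(data, "score"))  # 2.0
```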
keshab | 7 years ago
I would go a bit further and include nested if statements. Sometimes nesting is really required, but other times it can be avoided. I try to avoid nesting as much as possible.
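keshab's point about nested `if`s pairs with crackerjackmack's "exit early" and itertools tips. Below, the same hypothetical validation is written nested and then with guard clauses, and `itertools.product` replaces a pair of nested loops. The function names and the 150-year cutoff are made up for illustration.

```python
from itertools import product


def parse_age_nested(raw):
    # Nested version: the normal case is buried three levels deep.
    if raw is not None:
        if raw.strip().isdecimal():
            age = int(raw)
            if age <= 150:
                return age
            else:
                raise ValueError("age out of range")
        else:
            raise ValueError("age must be a non-negative integer")
    else:
        raise ValueError("age is required")


def parse_age(raw):
    # Guard clauses: reject each bad input and exit early, then handle the normal case.
    if raw is None:
        raise ValueError("age is required")
    if not raw.strip().isdecimal():
        raise ValueError("age must be a non-negative integer")
    age = int(raw)
    if age > 150:
        raise ValueError("age out of range")
    return age


def grid_configs(learning_rates, batch_sizes):
    # One flat loop over the Cartesian product instead of two nested loops.
    return [{"lr": lr, "batch": bs} for lr, bs in product(learning_rates, batch_sizes)]
```

Both parsers behave identically; the flat one reads top to bottom as a list of preconditions followed by the real work.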
fraudsyndrome | 7 years ago
Could you please go into more depth with this?
fdsvnsmvas | 7 years ago
Robert C. Martin, Clean code: https://www.amazon.com/Clean-Code-Handbook-Software-Craftsma...
Vaughn Vernon, various: https://vaughnvernon.co/?page_id=168
Steve McConnell, Code Complete 2: https://www.amazon.com/Code-Complete-Practical-Handbook-Cons...
Clean Coders videos: https://cleancoders.com/
Hunt and Thomas, The Pragmatic Programmer: https://www.amazon.com/Pragmatic-Programmer-Journeyman-Maste...
Hitchhiker's Guide to Python: https://docs.python-guide.org/
Dustin Boswell, The Art of Readable Code: https://www.amazon.com/Art-Readable-Code-Practical-Technique...
John Ousterhout, A Philosophy of Software Design: https://www.amazon.com/Philosophy-Software-Design-John-Ouste... This one looks particularly interesting, thanks AlexCoventry!
Kent Beck, Test Driven Development: https://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/...
Dan Bader, Python Tricks: The Book: https://dbader.org/
Ian Sommerville, Software Engineering: https://www.amazon.com/Software-Engineering-10th-Ian-Sommerv...
Svilen Dobrev, various: http://www.svilendobrev.com/rabota/
sanderjd | 7 years ago
blub | 7 years ago
Martin is well intentioned, but very dogmatic about some things like TDD, function size, personal responsibility, etc. You need to already have some decent engineering experience to be able to detect and ignore the harmful stuff in his books.
n4r9 | 7 years ago
tensor | 7 years ago
* Follow whatever formatting and style rules your workplace uses. It's religion and not worth getting into; as long as everyone uses the same style, it's a win.
* Dev/stage/prod is also workplace specific. Just go with the flow and avoid time wasting arguments on these topics, it's not usually worth it.
* Try to break your work into small commits. This is both easier to review and easier to estimate time on.
* Architect your code so that you can add unit tests. Make sure all your commits include them.
* Prefer longer simpler code to clever code, you're optimizing for newcomers to your code reading it.
* When a one-line comment is enough to explain something to you, someone outside the field will probably need at least a paragraph to start understanding it.
* Think about how you'll respond to someone coming to you and saying "something something prod something something your code is buggy." How will you get enough information to determine if this is true, and to debug it when it is? Logging is one good tool here, so consider what you log carefully.
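A sketch of the logging advice using Python's standard `logging` module: log identifiers, input sizes, and a model version rather than raw payloads, so that a "prod is broken" report can be matched to a specific request. `predict`, `MODEL_VERSION`, the feature names, and the toy scoring rule are all hypothetical.

```python
import logging

logger = logging.getLogger(__name__)  # module-level logger; the app configures handlers
MODEL_VERSION = "2019-01-15"  # hypothetical version tag


def predict(request_id: str, features: dict) -> float:
    # Log identifiers and sizes, not raw (possibly sensitive) payloads.
    logger.info("predict start request_id=%s n_features=%d model=%s",
                request_id, len(features), MODEL_VERSION)
    missing = [k for k in ("age", "income") if k not in features]
    if missing:
        # Enough context to reproduce the rejection from the logs alone.
        logger.warning("predict rejected request_id=%s missing=%s", request_id, missing)
        raise ValueError(f"missing features: {missing}")
    # Toy scoring rule standing in for a real model.
    score = 0.01 * features["age"] + 0.00001 * features["income"]
    logger.debug("predict done request_id=%s score=%.4f", request_id, score)
    return score


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(name)s %(message)s")
    predict("req-123", {"age": 40, "income": 50000})
```

Using `%s`-style arguments instead of f-strings lets `logging` skip formatting for messages below the configured level.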
Finally, don't be too surprised if you find people talk down to you. Unless you are in a FAANG company, which it sounds like you are not, developers can be very condescending towards academics (and people from other fields).
khitchdee | 7 years ago
_9hey | 7 years ago
- Go to hackathons to learn to ship code fast and get used to building "skateboards". Learn how to make tradeoffs that optimize for development speed. It's not about writing crappy code, it's about optimizing for the right variables at the right time. There are now a lot of real world variables to consider.
- Practice Kanban. Divide and conquer your projects. Make small and focused pull requests. You will naturally start strategizing on how to do things right while you're doing things quickly.
- Using category theory and functional programming in your code, but being practical about it so others can read it, will really help when it comes to writing unit tests. Unguided polymorphism is from the devil.
pjmorris | 7 years ago
navinsylvester | 7 years ago
Here are some of the guidelines:
cnees | 7 years ago
- Ask them to review your code and suggest changes
- Look for questions of taste and ask more. It may feel intuitive to them, but if you dig in you can often find a good reason/principle behind it.
- Read your coworkers' code
- Read the comments people leave on others' code
cnees | 7 years ago
- Each function should do just one thing.
- Don't reuse a variable if making a new variable with a new name would describe the value better.
- Give functions verb names that describe what they do. If that's hard, they may be doing too many things.
- A function should either change something or return a value (command-query separation)
- Any data should have a single, canonical source of truth. https://en.wikipedia.org/wiki/Single_source_of_truth
- When deciding between making code DRY https://en.wikipedia.org/wiki/Don%27t_repeat_yourself or not, decide if future changes should affect both places at the same time (use DRY) or not (probably doesn't need DRY, maybe shouldn't have it.)
- Avoid spooky action at a distance https://en.wikipedia.org/wiki/Action_at_a_distance_(computer... and if you can't, refactor or at least add comments.
- Write modular functions that can be used without understanding much about the function besides what the name/arguments tell you.
One measure of your success in this area is how quickly someone who'd never seen your code could describe what it does.
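Several of cnees's rules (verb names, command-query separation, a single source of truth, fresh names instead of reused variables) fit into one small, hypothetical example. `Cart`, `add_item`, `compute_total`, and `apply_discount` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    # Single source of truth: item prices. The total is derived, never stored.
    prices: dict = field(default_factory=dict)

    def add_item(self, name: str, price: float) -> None:
        """Command: changes state and returns nothing."""
        self.prices[name] = price

    def compute_total(self) -> float:
        """Query: returns a value and changes nothing."""
        return sum(self.prices.values())


def apply_discount(total: float, percent: float) -> float:
    # A new, descriptive name for each intermediate value instead of reusing one variable.
    discount_amount = total * percent / 100
    discounted_total = total - discount_amount
    return round(discounted_total, 2)


cart = Cart()
cart.add_item("book", 30.0)
cart.add_item("pen", 2.5)
print(apply_discount(cart.compute_total(), 10))  # 29.25
```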
mitchellst | 7 years ago
currymj | 7 years ago
Hitchhiker's Guide to Python is a very good book (freely available online, or get a copy from O'Reilly); some of it may be obvious but some might not be.
It is true IMO that making your code testable will also make it better designed. It might even be worthwhile to do completely dogmatic test-driven development (i.e. always write tests first, then stub out everything with NotImplementedError, then write actual code until all tests pass) for a while to get used to it, and force yourself to become familiar with tools for dependency injection/mocks/etc.
This is complicated by the fact that unit-testing machine learning code can be unusually tricky; normal unit-testing practices and metrics (e.g. code coverage) may not be very effective.
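Here is a sketch of the test-first workflow applied to ML-flavored code, where exact outputs are awkward to pin down. The tests below assert properties (recovering known parameters from noiseless data, training reducing the loss) rather than hard-coded numbers. `fit_line` is a hypothetical plain-Python gradient-descent fit. In the "red" phase its body would just be `raise NotImplementedError`. Run the tests with `python -m unittest <module>`.

```python
import unittest


def fit_line(xs, ys, lr=0.01, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    Written after the tests below; before that its body was just
    `raise NotImplementedError`.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


class FitLineTest(unittest.TestCase):
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0 + 2.0 * x for x in xs]  # noiseless data from y = 2x + 1

    def test_recovers_known_parameters(self):
        w, b = fit_line(self.xs, self.ys)
        self.assertAlmostEqual(w, 2.0, places=2)
        self.assertAlmostEqual(b, 1.0, places=2)

    def test_training_reduces_loss(self):
        w, b = fit_line(self.xs, self.ys, steps=50)
        self.assertLess(mse(self.xs, self.ys, w, b), mse(self.xs, self.ys, 0.0, 0.0))
```

Property tests like these survive refactors such as switching the optimizer, which tests on exact floating-point outputs usually do not.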
currymj | 7 years ago
There are, say, physics PhDs who only write numerical Fortran or C++ routines (in one big file, sometimes even in one big function), who really might want to attend a boot camp or something but it doesn't sound like you're in that boat.
blt | 7 years ago
As a PhD student, my code and habits do not meet industry standards. This is because I'm constantly changing the whole architecture to try new ideas, so I optimize for small and simple code at the expense of testability, modularization, robustness, etc.
It's important to recognize this. You will need to change your style.
I can't recommend any one book. I feel like I mostly learned these lessons through random articles, lecture videos, conversations, and personal experience.
IMO, some of the most important principles:
* Implement as much as possible with pure functions (but don't contort the code to achieve this).
* Make your commits as small as possible. Well structured version control history is valuable.
* Spend lots of time thinking about how data flows through your program, more than about how the code is organized.
* Strongly prefer DAG dependency structure. Write a set of libraries and then a top level program that uses them.
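A minimal "pure core, I/O at the edges" sketch of blt's first and last points: `normalize` and `summarize` are pure functions that are trivial to test, and `run` is the thin top-level layer that reads and writes files. All three names, and the JSON file format, are hypothetical.

```python
import json
from pathlib import Path


# Pure core: output depends only on inputs; no I/O, no mutation of arguments.
def normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def summarize(values):
    scaled = normalize(values)
    return {"n": len(values), "scaled_mean": sum(scaled) / len(scaled)}


# Imperative shell: the only function that touches the file system.
def run(in_path: Path, out_path: Path) -> None:
    values = json.loads(in_path.read_text())
    out_path.write_text(json.dumps(summarize(values)))
```

Dependencies point one way (`run` uses `summarize`, which uses `normalize`), which keeps the structure a DAG.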
svilen_dobrev | 7 years ago
It took me 10 years to be able to skip all the technicals, and another 10+ years to understand why you may ever need the rest.
For judgment etc.: maybe pick some big-enough open-source project in a domain you know well and follow it, watching how and when they change what. Don't worry, it does take years to really form your own judgment.
btw, you will need some philosophy/methodology/human side too; there isn't much of that in the above book.
For more, see the recommended readings on www.svilendobrev.com/rabota/
have fun
ttalviste | 7 years ago
So try to write code for an audience. This has been the trigger for me. Also I encourage code reviews and TDD.
The main learning resources for me have been the Clean Coder videos by Robert C. Martin, aka Uncle Bob. They are pure gold. They can feel awkward, but after a while they make sense.
Also, DDD (domain-driven design) is a key topic to tackle.
Books:
- Clean Code
- DDD by Vaughn Vernon
Videos:
- Clean Coder E1-E52
With these two books and videos you are on a good track! These worked for me.
Lyren | 7 years ago
We don't always hold ourselves to it; sometimes 5-6 line functions make sense, but we strive toward 4. Sometimes it's as easy as breaking code out into a new function, but sometimes you simply have to create a class for it. That way, a lot of complicated code suddenly becomes very easy without much effort.
grigjd3 | 7 years ago
tensor | 7 years ago
Arnie0426 | 7 years ago
These days, I do try to think about the software engineering side of things first, just so that I get quicker +1s from my team. And honestly, all the _quick and dirty prototypes_ I used to write (I still do, but a lot less) ended up requiring a lot more debugging/redoing/thinking about scaling up etc. later on anyway.
Books can give you a decent idea of what to do, but I found reading code (and especially my colleagues' code reviews of other people's code) much more useful. I think reading a few 800-page books to improve your software engineering skills is a very grad student thing to do. :p I admit I did that way too much.
baq | 7 years ago
test third-party libraries. it's uncommon to find bugs, but it's not so rare that it happens only to others.
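One cheap way to act on "test third-party libraries" is to write a few learning tests that pin down the behavior your code relies on, so that an upgrade which changes it fails loudly. This sketch uses the standard library's `json` module as a stand-in for a third-party dependency. The export code it refers to is hypothetical. The "why" comments also illustrate the point about recording rationale.

```python
import json
import unittest


class JsonAssumptionsTest(unittest.TestCase):
    """Pins down library behavior that our (hypothetical) export code relies on."""

    def test_non_ascii_is_escaped_by_default(self):
        # Why: downstream consumers expect pure-ASCII files, and we rely on
        # json.dumps escaping non-ASCII unless ensure_ascii=False is passed.
        self.assertEqual(json.dumps("café"), '"caf\\u00e9"')

    def test_keys_keep_insertion_order(self):
        # Why: our output diffs are only stable if key order is preserved.
        self.assertEqual(json.dumps({"b": 1, "a": 2}), '{"b": 1, "a": 2}')

    def test_nan_is_emitted_unless_disallowed(self):
        # Why: NaN is not valid JSON, but the default output is "NaN",
        # so we must pass allow_nan=False where strict JSON is required.
        self.assertEqual(json.dumps(float("nan")), "NaN")
        with self.assertRaises(ValueError):
            json.dumps(float("nan"), allow_nan=False)
```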
don't forget to leave comments. a lot of my code review questions could be answered (and so would never need to be asked in the first place) by a well-placed comment.
sometimes people say that code is documentation or code should read like documentation. this is false. code can explain (usually poorly) what it does but it can't give a rationale why it does it the way it's been written, can't say what it doesn't do, etc. always write some documentation - comments and commit messages at least. this should be enforced in code review.
i'd say engineering is about not writing code unless absolutely necessary. code is an asset, but it's also a liability. you really don't want more than you need.
sevensor | 7 years ago
sanderjd | 7 years ago
anonytrary | 7 years ago
Good testing practices:
1. Minimize mocking as much as you can -- as a rule of thumb, mocking is inversely proportional to test confidence.
2. Don't test implementation details, test public-facing APIs. This way, your implementation can change. Mocking makes this harder. Don't test how you get things done -- test that they are done.
3. Make sure your API is well defined before you start writing tests, or you will waste time.
You can find loads of Python testing guides on Google on the first two points. There will be times when you have to break some of those rules, but knowing when will come with experience.
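A sketch of anonytrary's first two points: the test goes through `UserService`'s public methods and uses a small in-memory fake instead of mocks, so it would keep passing if the normalization logic were rewritten. `UserService` and `InMemoryUserRepo` are hypothetical names.

```python
class InMemoryUserRepo:
    """A tiny fake standing in for a database-backed repository."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, email):
        self._rows[user_id] = email

    def find_email(self, user_id):
        return self._rows.get(user_id)


class UserService:
    def __init__(self, repo):
        self._repo = repo

    def register(self, user_id, email):
        normalized = email.strip().lower()  # implementation detail, free to change
        if "@" not in normalized:
            raise ValueError("invalid email")
        self._repo.save(user_id, normalized)

    def email_for(self, user_id):
        return self._repo.find_email(user_id)


# Tests *what* happens via public methods, not *how*: no assertions that
# strip()/lower() were called or about the exact arguments passed to save().
def test_register_then_lookup():
    service = UserService(InMemoryUserRepo())
    service.register(1, "  Ada@Example.COM ")
    assert service.email_for(1) == "ada@example.com"


test_register_then_lookup()
```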
AlexCoventry | 7 years ago
Try The Art of Readable Code (a pair of Google authors, IIRC) and Ousterhout's A Philosophy of Software Design.
JanisL | 7 years ago