Torvalds' quote about good programmers

[+] antirez|13 years ago|reply

This is one of the few programming quotes that is not just abstract crap, but one thing you can use to improve your programming skills 10x IMHO.

About 10 years ago I was lucky enough to have someone explain this to me, just as I was starting to understand it myself, and programming suddenly changed for me. If you get the data structures right, the first effect is that the code becomes much simpler to write. It's not unusual to drop 50% of the code just because you created better data structures (here better means: more suitable to describe and operate on the problem). Or something that looked super hard suddenly becomes trivial because the representation is the right one.
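
A minimal sketch of that effect, assuming a made-up task (grouping words that are anagrams of each other). The function names and the sample task are illustrative assumptions, not something from the original discussion. The first version keeps a flat list of groups and has to search it; the second keys a dict by the sorted letters, and most of the logic disappears into the representation.

```python
from collections import defaultdict


def anagram_groups_pairwise(words):
    """Flat list of groups: each word is compared against every existing group."""
    groups = []
    for word in words:
        for group in groups:
            if sorted(group[0]) == sorted(word):
                group.append(word)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([word])
    return groups


def anagram_groups_keyed(words):
    """Dict keyed by sorted letters: the key itself does the matching."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    return list(groups.values())
```

Both functions return the same groups in the same order; the second simply has no search loop left to get wrong.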

[+] smoyer|13 years ago|reply

+1 ... and it's been known for a long time. Back in the '70s, Fred Brooks said "Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won't usually need your [code]; it'll be obvious."

[+] necrodome|13 years ago|reply

any concrete examples you can point to?

[+] robomartin|13 years ago|reply

Absolutely right. I was lucky enough to learn this in college. Although, I did not learn it from the CS professors but rather my physics prof. He was a champion for a language called APL and he actually cut a deal with the CS department to accept credits for taking an APL class he was teaching as a substitute for the FORTRAN class. APL was an amazing mind-opening experience.

Throughout the APL 101 and 102 courses he would repeat this mantra: "Work on your data representation first. Once you have fully understood how to represent your data start thinking about how to manage it with code."

He would throw this at us constantly. At the time it sounded like our Physics prof had lost his marbles (he was a very, shall we say, eccentric guy). It would take a few years after college for me to realize the value of that advice.

Put another way, our business is managing data of some sort. Whether you work on embedded systems or web applications, you are always dealing with data. You can make your programs far more complicated than necessary by neglecting to choose the right (or a good) representation of your problem space (data).

I equate it to designing an assembly line. Anyone who's watched a show like "How it's Made" cannot escape the realization that efficient manufacturing requires engineering an efficient assembly process. Sometimes far more engineering work goes into the design of the manufacturing process and equipment than into the part that is actually being made. The end result is that the plant runs more efficiently and with fewer defects than it would with alternative methods.

In programming, data representation can make the difference between a quality, stable, bug-free and easy to maintain application and an absolute mess that is hard to program, maintain and extend.

[+] sedachv|13 years ago|reply

Your prof was onto something that seems to be very much in the zeitgeist today. To have "understood how to represent your data" you have to understand what it is you're trying to represent. Eric Evans popularized this notion with Domain-Driven Design.

If you follow this line of thinking far enough, you realize that computer programming is just applied analytic philosophy. You have your metamodel (logic/programming language) and then you build your model (ontology/software).

I really like your assembly line metaphor. Knowing that a "customer" has a name and email address is of almost no importance compared to understanding how the "customer" information arrives, the actions around the "customer," and the end result of the actions. That's the assembly line.

[+] lifeisstillgood|13 years ago|reply

Thank you, and you have just reminded me why I like the explosion of NoSQL stores - for a very long time we have been storing all our data in one factory design, with one really flexible and powerful layout.

Being able to have a red-black factory is rather nice. Although it does mean we now need to think carefully about what factory we shall need before even starting, and accept occasionally moving the whole factory three blocks over during production.

[+] lifeisstillgood|13 years ago|reply

[deleted]

[+] mtkd|13 years ago|reply

You can normally fix bad code - fixing bad data structures is not usually easy or even possible.

It's why I've still not fully bought in to 'release early release often'.

I prefer to defer releasing for production use until really satisfied with the structures - this way you have no barrier to ripping the foundations up.

If not 100% comfortable with the model - prototype a bare metal improved one (schemaless DBs ftw) - if it feels better start pasting what logic/tests you can salvage from the earlier version and move on.

[+] hermannj314|13 years ago|reply

I'm in the position of maintaining a legacy codebase. I feel like I've shown up half-way through a game of Jenga and management still wants me to play the game with the same speed as the guy who played the 1st half.

Meanwhile, he's been promoted to start work on a brand-new Jenga tower since he's demonstrated such remarkable success in the past.

I just want everyone to stop playing Jenga.

[+] nahname|13 years ago|reply

The whole point of continuous delivery is that correcting things like data structures is no longer the big deal it once was because it happens frequently. Rather than letting months (or years!) of data migrations pile up, you have a few days worth (or weeks). In my opinion, it is best to ship something that works today and have a system in place that makes correcting it as painless as possible.

That's the real problem with the current model. The data structures will never be perfect and you cannot know how they will change. Yet they do. Then all the FUD from the last migration that scared everyone prevents the team from doing due diligence and correcting issues when they are discovered. The team waits until the problem comes to a head, management has to be involved, new FUD is created and people dream about perfect data structures to prevent this whole mess.

Remember,

Shipping > Shipping Shit > Not Shipping

[+] chernevik|13 years ago|reply

As a new programmer I know I should ship a lot faster than I do, but focussing on data structures makes me really slow. I can usually jump a hurdle with a hack on my extant structures, but this introduces code complexity and leaves me at square one when a similar hurdle appears elsewhere. I try to be disciplined about fixing stuff at a data structure level, but changes there set off change propagations throughout the code. Or I introduce hacks into the data structures, which then become convoluted and start acquiring code of their own.

I find unit testing does help with all this. It forces exposure of the data structures, essentially documenting them. And good coverage gives a list of breakages and sometimes helps find elegant repairs. But I also find myself wanting high-level tests, I guess essentially integration tests, that check not components but overall behavior, and I find writing and maintaining these becomes a real problem.

But I really, really wish I had better tools / procedures for thinking through the problems and designing a proper data model for solving them.

[+] joedoe55555|13 years ago|reply

Agree, fixing bad data structures is much more painful than fixing bad code. The reason is that the deployment of the refactoring has the complexity of a new deployment, or even higher.

However, given that at large organizations updates and deployments can easily become political issues, it's a good habit to deploy often. That makes your life easier when trying to deploy new changes because those who are watching or performing the deployments get used to it - and to the errors occurring during such deployments.

[+] jt2190|13 years ago|reply

You're just hung up on your definition of "release", which is "release for production use". "Release early release often" doesn't dictate how you release your application, just that you expose it in some form (private beta, public beta, pre-release) to the real world for vetting. Projects that fail to vet their assumptions are more prone to poor data structures and over-engineering.

[+] dustingetz|13 years ago|reply

release early release often gives you chances to fix your mistakes before it's too late. waterfall works great if you can manage to get your data structures perfect before production. it begins to fail when it becomes prohibitively expensive to fix mistakes after production, where not unexpectedly breaking old code takes precedence over deploying new.

[+] barrkel|13 years ago|reply

This is approximately the same reason as why I start out writing most of my programs by creating a bunch of types, and why I find dynamic programming languages uncomfortable to use.

I'm less and less a fan of the ceremony of object orientation, but I think there's a lot to be said for having a succinct formalized statement of your data structures up front. Once you understand the data structures, the code is usually easy to follow. The hardest times I've had comprehending code in my career, apart from disassembly, have been from undocumented C unions.

[+] tikhonj|13 years ago|reply

It sounds like you'd really like Haskell. It gives you a far more succinct way to represent your data. Since the overhead of creating a new type is very low, you also become far more likely to express more of your logic in the types.

I always start my Haskell projects by laying out the data types. The type declarations are very readable--you can just skim over them to see what's going on. This means that you can get a very good overview of what the project wants to do just by quickly looking over what types it declared, and then looking through functions' type signatures.

Then, after you have the data types defined, the resulting code is not only easier to read but reflects the data operations it is doing. I've found the type signatures above each function--which are optional but highly recommended--really help tie the code back to the data it operates on. Additionally, pattern-matching against the data makes the structure of most functions clearly reflect the exact data it is working on.

The low overhead of creating Haskell types also makes it very easy to add aliases to existing types. So perhaps you use a `Map Int String` in your code; you then give it a domain specific name:

    type IdMap = Map Int String

so your functions refer just to that. Then, when somebody comes along and tells you about `IntMap`, refactoring all your code to use `IntMap String` is far easier!

So if you really like a more type/data directed style of programming but are getting tired of OOP, you should definitely check out Haskell (or something similar like OCaml).

[+] antidoh|13 years ago|reply

"This is approximately the same reason as why I start out writing most of my programs by creating a bunch of types, and why I find dynamic programming languages uncomfortable to use."

Classes. There are your types. Python has them. Ruby has them. Javascript has them. . . .

[+] hcarvalhoalves|13 years ago|reply

Data structures != Types

Using a dynamic language is no excuse for making bad use of data structures; it changes nothing.

[+] hermannj314|13 years ago|reply

Next week on Hacker News: Bad Programmers worry about their code. Good programmers ship.

"Bad programmers [technique A on programming KPI metric N1]. Good programmers [technique B on programming KPI metric N1]."

Responses: Someone will ask, "What about metric N2?" And someone will say, "What about technique C?" Someone will post a personal anecdote showing that people really underestimate the value of A. Someone will respond to that by posting a hyperlink to an anecdote that shows technique B really is what matters.

[+] jrajav|13 years ago|reply

10 people learn about techniques A, B, and C who didn't before. 10 other people start thinking in terms of metrics N1 and N2 who weren't before. We learn and improve collectively. I think that is a pretty amazing thing about the internet and boards like this.

That's not to say that some things don't get passed around a lot, but that's generally because they're worthwhile enough to make sure that everyone gets a look.

[+] InclinedPlane|13 years ago|reply

To paraphrase, if I may, a novice imagines that the goal of programming is to create code which solves a problem. This is true, but limited. The real goal is to create an abstract model which can be used to solve a problem and then to implement that model in code.

[+] slurgfest|13 years ago|reply

I really hope that people reading this know to apply Ockham's Razor to their abstract models (lest they write a lot of AbstractSingletonProxyFactoryBeans).

[+] 6ren|13 years ago|reply

I see a program as a theory, a theory of the problem it solves. You can see how well it generalises, if it is needlessly complicated (Occam)... and in some magical moments, you'll find it predicting phenomena you hadn't explicitly anticipated.

So I think a program's conceptualisation of a problem is the most important thing - more important than data structures or code. Though, data structures are usually closer to it, by representing the concepts you model.

However, it's really hard to get these things right. Linus created both his great successes (Linux and Git) after experience with similar systems (Unix/Minix and BitKeeper). Being able to play with an implementation, experience its strengths and weaknesses, gives you a concrete base to use, reason with, push against, and come up with new insights - it's enormously helpful in seeing the problem.

But that's a grand vision - I wonder if Linus is also talking about programming in the small, each step of the way, as a low-level pragmatic guide. I don't like git's interface or docs much, but the concepts are great, it is implemented really well, very few bugs, and even on ancient hardware boy is it fast.

[+] jedbrown|13 years ago|reply

This is Normalization rearing its head. A properly normalized database can be extended without needing refactoring and does not have modification anomalies. There is a formal process to normalization. There is no such equivalent in code, but a poorly normalized data model virtually guarantees that any code wrapped around it will be messy. Conversely, mediocre code wrapped around a clean data model (less common in the wild) is much more amenable to incremental improvement.
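
A rough sketch of the modification anomaly mentioned above, using plain Python dicts in place of database tables. The field names, the `ada` customer, and the helper functions are all made up for illustration. In the flat layout the email is a repeated fact, so an update has to find every copy; in the normalized layout it is stored once and orders refer to it by key.

```python
# Denormalized: the customer's email is repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer": "ada", "email": "ada@example.com"},
    {"order_id": 2, "customer": "ada", "email": "ada@example.com"},
]


def change_email_flat(rows, customer, new_email):
    # Every copy must be updated; missing one leaves the data inconsistent.
    for row in rows:
        if row["customer"] == customer:
            row["email"] = new_email


# Normalized: each fact is stored once and orders point at the customer.
customers = {"ada": {"email": "ada@example.com"}}
orders = [
    {"order_id": 1, "customer": "ada"},
    {"order_id": 2, "customer": "ada"},
]


def change_email(customer_table, customer, new_email):
    # A single assignment; there is no second copy to forget.
    customer_table[customer]["email"] = new_email


def email_for_order(order, customer_table):
    # Look the email up through the customer key stored on the order.
    return customer_table[order["customer"]]["email"]
```

The code around the normalized layout is shorter, but the more important difference is that the inconsistent state cannot be represented at all.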

[+] joefarish|13 years ago|reply

"good data structures make the code very easy to design and maintain, whereas the best code can't make up for poor data structures."

Quite a nice summary, courtesy of http://programmers.stackexchange.com/a/163195/31774

[+] east2west|13 years ago|reply

This brings up a burning question that I have been pondering for a while. I still don't get how to properly design good APIs. I have been programming, but as a scientific researcher, not as a professional developer, and I have found I cannot remember how to use my code a day after writing it.

Take my current project as an example. I have some samples, each of which consists of observations along a sequence of non-overlapping segments. My objective is to extract observations over arbitrary intervals for all samples. So I have a segment defined as a pair of start and end positions plus its observation, a sample as a vector of segments, and all samples as a dictionary of samples. There are various utility functions to make segments, collect samples, and scan individual samples. The problem is I have to remember all three levels of data structures to use this code. I wonder whether it is better to define an interface for those data structures as well so I just need to remember the interface. My objection to formally defining interfaces is that everything is so simple and obvious that formal interfaces smack of overengineering.

I got to this point because in my previous projects I made every identifiable thing a class and ended up with too much coupling between classes and convoluted interfaces.

[+] joedoe55555|13 years ago|reply

Hi, I used to have a similar problem: whenever I designed architectures, they became difficult to maintain. Or difficult to talk about; to be honest, I don't understand your second paragraph :)

What really helped me is an API-driven approach to programming. Don't start with your algorithms but start with what kind of functions you probably need and what would be the easiest way to use them. (In fact this is not so far from this data-centric approach, as the most basic functions of APIs are usually functions to retrieve or modify data.)

Try to read some good code from one of your favorite open-source projects. At some point some code may catch your attention because it's so simple and elegant. Why is it so elegant? Often because the underlying structures are simple and based on common sense. Don't over-engineer stuff; the simpler solution is often superior to the full-featured solution. And often you should ask yourself: do I really need this feature right now to show some progress? Shouldn't I rather postpone it?

[+] sedachv|13 years ago|reply

How do you name things? From your text ("I have a segment defined as a pair of start and end position plus its observation, a sample as a vector of segments, and all samples as a dictionary of samples") it seems you have a lot of different names for different things. Decide what names are important and which ones aren't (the "vector" and "dictionary" probably aren't).

Eric Evans wrote a big book called Domain-Driven Design about these things, but his advice basically boils down to this.
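
One way the naming advice could play out for the segment/sample problem quoted above, as a hedged sketch only. The names `Segment`, `Sample`, `Cohort`, and `observations_between` are hypothetical, positions are assumed to be integers with half-open `[start, end)` ranges, and "extract observations over an interval" is assumed to mean "return the observation of every segment that overlaps it". The vector and the dictionary still exist, but only inside the classes.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int          # inclusive
    end: int            # exclusive
    observation: float


class Sample:
    """Non-overlapping segments of one sample, kept sorted by start."""

    def __init__(self, segments):
        self._segments = sorted(segments, key=lambda s: s.start)
        self._starts = [s.start for s in self._segments]

    def observations_between(self, start, end):
        """Observations of all segments overlapping [start, end)."""
        # The last segment starting at or before `start` is the first
        # one that can overlap the interval.
        i = max(bisect_right(self._starts, start) - 1, 0)
        found = []
        for seg in self._segments[i:]:
            if seg.start >= end:
                break
            if seg.end > start:
                found.append(seg.observation)
        return found


class Cohort:
    """All samples, addressed by name."""

    def __init__(self, samples):
        self._samples = dict(samples)

    def observations_between(self, start, end):
        return {
            name: sample.observations_between(start, end)
            for name, sample in self._samples.items()
        }
```

With this shape the only thing to remember a day later is one method name, `observations_between`, which means the same thing at the sample level and the cohort level.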

[+] kobolt|13 years ago|reply

This is similar to the rule of representation from the Unix philosophy, covered here: http://www.faqs.org/docs/artu/ch01s06.html

"Rule of Representation: Fold knowledge into data so program logic can be stupid and robust."
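
A small sketch of what "folding knowledge into data" can look like. The shipping regions and rates are invented for illustration. The first function encodes the knowledge as branches; the second moves it into a table, so the remaining logic is a lookup and one line of arithmetic.

```python
def shipping_cost_branches(region, weight_kg):
    """Knowledge lives in control flow: one branch per region."""
    if region == "domestic":
        return 5.0 + 1.0 * weight_kg
    elif region == "eu":
        return 9.0 + 1.5 * weight_kg
    elif region == "world":
        return 15.0 + 2.5 * weight_kg
    raise ValueError(f"unknown region: {region!r}")


# Knowledge lives in data: region -> (base fee, fee per kg).
RATES = {
    "domestic": (5.0, 1.0),
    "eu": (9.0, 1.5),
    "world": (15.0, 2.5),
}


def shipping_cost(region, weight_kg):
    """Stupid and robust: look the rates up, then apply one formula."""
    try:
        base, per_kg = RATES[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
    return base + per_kg * weight_kg
```

Adding a region to the second version is a one-line change to `RATES`, and the table can be validated, printed, or loaded from a file without touching the function.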

[+] maxwell|13 years ago|reply

This seems to apply to all kinds of "writing" (symbol sequence generation), from math to poetry, though the terms differ, e.g.:

  Bad novelists worry about the plot. Good novelists worry   
  about the characters and their relationships.

[+] warmfuzzykitten|13 years ago|reply

That's more a value judgement. As Samuel Johnson said, "No man but a blockhead ever wrote, except for money." "Bad" novelists who worry about the plot can make a lot of money.

[+] lttlrck|13 years ago|reply

Algorithms + Data Structures = Programs

A 1976 book written by Niklaus Wirth, designer of Pascal

http://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures_...

[+] qznc|13 years ago|reply

... and then there is user interface, debugging, support for various formats, documentation, and other mostly boring stuff.

[+] zxcdw|13 years ago|reply

Heh, exactly my answer on the question at programmers SE.

[+] Jarihd|13 years ago|reply

So you are at some "Source" and you want to get to the required "Destination" (goal). What do you do? You plan your journey well. Let your plan take into consideration all the possibilities, all the pros and cons.

Good programmers plan. They understand the requirements of the problem, analyze the problem space, consider all (or most) of the possible ways to reach their goal (destination), then make a design (create a plan) and decide on the path(s) to take, i.e. choose data structures, algorithms, programming language, and other factors. Having understood the pros and cons of their design, they begin to code. This process works most of the time, but there are times when you get midway and then change the design or consider an alternative (such as different data structures); this generally happens when you missed part of the problem space while planning. Nonetheless, planning well before you begin coding helps produce a good product and saves a lot of time, money and effort.

Bad programmers, on the other hand, know their destination (goal) but do not know how to get there; they simply jump into coding, hoping that they will someday reach it. This works too, but it takes more time, and when you realize you have made a mistake, it becomes very difficult to come up with a new plan to move forward from that point. The product loses quality. Often you end up starting again from square one.

[+] bitcracker|13 years ago|reply

IMHO: That's why good programmers love Lisp.

Lisp is all about data structures. Data can be expressed so easily no matter how complex it is. Lisp coding is merely writing minimum code to handle data structures. Even code is data. So it's no problem to extend Lisp with new commands. That's precisely coding around data.

In Java or C# however you have a lot of libraries to handle data but you don't have such freedom of data expression. You have to write a lot of code to express and handle complex data.

[+] zalew|13 years ago|reply

there's quite a similar quote on photography

"Amateurs worry about equipment, professionals worry about money, masters worry about light"

[+] sneak|13 years ago|reply

The ending changes the whole thing:

"... I just take pictures."

Programming, motherfucker.

[+] seanalltogether|13 years ago|reply

My only problem with this quote is that it equates "new" programmers with "bad" programmers. Yes, if you still have these problems after 10 years of professional work, then you're a bad programmer, but if you show these symptoms after 6 months it just means you're still learning. There's got to be a better way of stating this.

[+] dsymonds|13 years ago|reply

A new programmer is a bad programmer. There's no shame in that. In fact it's better for a new programmer to realise that; the worst programmers are those that don't even realise they are bad.

A new programmer may still be learning, but that doesn't mean they aren't bad. It just means they are bad now, and hopefully won't be bad later.

[+] nnq|13 years ago|reply

I always do it the other way around (when starting software from scratch):

0. write the simplest mock/pseudo-code I can think of for the business logic that needs to be implemented

1. extract from this ideal code the data structure that it needs in order to actually be so simple and write real code that implements these ideal data structures

2. write the real code that actually does the work

I think Linus means the same thing, but he doesn't get that in order to imagine those "perfect data structures" he has to start with some idea of the code that will be using them, otherwise they will not be "perfect" for his program. I'm sure he's just smart enough to go through my "0" step in his mind without actually writing things down.

It's an obvious case of very smart people omitting the steps that are obvious/implicit to them when expressing their ideas to "lesser minds"...

[+] jiggy2011|13 years ago|reply

Maybe this is a dumb question, but I don't get how you would write code without thinking about your data structures?

Most of the code I write is manipulating a data structure in some way. I have no idea how I would even know where to begin without at least some idea about which structure I should be using.

[+] mdonahoe|13 years ago|reply

The data structure you choose will have a profound impact on the code you write, so choose wisely.

[+] KevinMS|13 years ago|reply

I think this can be distilled down to: "bad programmers worry about how, good programmers worry about why"

104 comments