Data-oriented design or why you might shoot yourself in the foot with OOP (2009)

[+] ferdowsi|4 years ago|reply

In all my experience with OOP, it's always been inheritance that is the root of all evil. Rust and Go got this correct by having class-like objects with no inheritance, to achieve encapsulation without fragility.

Unfortunately, all the other languages that included inheritance in their design can't wish it away. Devs are going to keep reaching for inheritance as the closest, most comfortable abstraction.

[+] colllectorof|4 years ago|reply

>In all my experience with OOP, it's always been inheritance that is the root of all evil.

It's not, though, and the fact that people keep repeating this meme shows that most developers don't even bother thinking about issues they face beyond superficial blamesplaining.

The reason inheritance causes so many issues in languages like Java is because they are statically typed and also use classes as types[1]. Classes must be somewhere in the inheritance tree, hence you are forced into some place of that tree. To make things worse, Java has many keywords that restrict what an inheritor of a class can do (private, final, etc).

Inheritance is much less troublesome in, say, Smalltalk, since the language is dynamically typed. If someone expects you to implement Foo, you can (almost always) just implement its relevant methods without explicitly extending the class. Thus, a whole host of annoying scenarios simply does not occur.

--

[1] BTW, this breaks one of the fundamental commandments of classic OOP: you should not depend on implementation details of an object, only on its message protocol. Obviously, it's impossible to be independent of implementation details if some library forces you to use a particular class.

[+] Zababa|4 years ago|reply

> In all my experience with OOP, it's always been inheritance that is the root of all evil.

I have this "theory" in the back of my head that trees are usually the wrong thing to model things in life, but they're what comes to us naturally. For example, a blog with categories and sub-categories for articles (a tree, inheritance) can often describe the content better by using tags (a graph, composition). I think that's because trees are easy to deal with and understand, but graphs are more "open" with what you can do.

[+] lkrubner|4 years ago|reply

I've previously suggested that initiation is the "root of all evil." See my essay:

Object Oriented Programming Is An Expensive Disaster Which Must End:

http://www.smashcompany.com/technology/object-oriented-progr...

[+] meheleventyone|4 years ago|reply

The weirdest thing is that the ECS as a way of building a game is inherently object oriented. You take a set of components and compose an object called an entity. The components on the entity define not only its data but also its behavior by the set of systems that act on the corresponding components. And you can take these object definitions and inherit them to add additional behavior or change the existing behavior by adding more components to the new definition.

Then if you solve the entity communication conundrum with message passing and don't allow entities to directly access one another's data you basically have all the elements.

[+] lowbloodsugar|4 years ago|reply

Like anything, inheritance can be used poorly, just as anyone can write poorly encapsulated code in Rust or Go. You might be able to convince me that inheritance is too dangerous for idiots, but then so is a computer, and we'd be debating where to draw the line of how smart/experienced you have to be to use it safely.

This article from Noel, and the ones from Mike he links to, get under the hood and into "what is the compiler doing" and "what is the CPU doing". Down here, we're looking at how to use the features of whatever language we're using to get the results we want, rather than "how should i program oop gud".

[+] jayd16|4 years ago|reply

It's the dose that makes the poison. Of course we want nice typed collection libraries and interfaces. The kids go overboard though.

[+] simiones|4 years ago|reply

Go actually does implement inheritance, albeit in a roundabout sort of way: a struct can have one or more base members, and any method defined on the base members is accessible from the new struct implicitly, so they also implicitly implement any interface that was implemented by their base members.

Here's an example (in the playground, because it gets a bit long): https://play.golang.org/p/TblQypAbIL2
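For readers who cannot open the playground link, here is a minimal sketch of the mechanism described above. It is not the linked example; the names `Greeter`, `Base`, `Derived` and `Loud` are made up for illustration.

```go
package main

import "fmt"

// Greeter is a hypothetical interface used only for this illustration.
type Greeter interface {
	Greet() string
}

// Base implements Greeter directly.
type Base struct {
	Name string
}

func (b Base) Greet() string { return "hello from " + b.Name }

// Derived embeds Base. Base's methods are promoted to Derived, so Derived
// satisfies Greeter without declaring Greet itself.
type Derived struct {
	Base
	Extra int
}

// Loud also embeds Base, but declares its own Greet, which shadows the
// promoted one. It can still reach the embedded method explicitly.
type Loud struct {
	Base
}

func (l Loud) Greet() string { return l.Base.Greet() + "!" }

func main() {
	greeters := []Greeter{
		Base{Name: "base"},
		Derived{Base: Base{Name: "derived"}, Extra: 1},
		Loud{Base: Base{Name: "loud"}},
	}
	for _, g := range greeters {
		fmt.Println(g.Greet())
	}
}
```

One difference from classical inheritance: a method defined on `Base` that calls another `Base` method always gets the `Base` version, even when the outer struct shadows it, because the embedded value does not know what it is embedded in.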

[+] vlunkr|4 years ago|reply

I think that it can work. IMO ActiveRecord is a perfect use of inheritance. You get tons of useful functionality out of the box, you don't have to worry about what that code looks like, and it's easy to extend or modify it. But often when I see co-workers come up with their own hierarchies, it saves maybe a couple of lines of code and makes it 5x more difficult to read, since you're jumping between parent and child classes and trying to keep track of the order of execution.

[+] duped|4 years ago|reply

I don't agree with this, and I'm personally an anti-OOP militant.

Inheritance isn't the root of all evil, dynamic dispatch is. It's a remarkably powerful implementation detail but one with enormous cost, regardless of whether you're using an AoT/JIT compiled or interpreted language.

[+] axilmar|4 years ago|reply

There is some middle ground between data-oriented design and OOP: just organize your objects in such a way that:

a) objects of the same type occupy continuous blocks in memory,

b) messages are passed to objects of the same type, then to objects of another type etc.

In this way, you don't lose the advantages of encapsulation, inheritance, polymorphism etc but you also don't sacrifice cache coherence much.

OOP does not enforce a 'random' memory access order; you can very well organize your objects in such a way that speed is not sacrificed much.
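A rough sketch of points (a) and (b), assuming two made-up game types (`Bullet`, `Enemy`) and a made-up `World` container; the objects keep their methods, but they are stored and updated one type at a time.

```go
package main

import "fmt"

// Bullet and Enemy are hypothetical game types, invented for this sketch.
type Bullet struct {
	X, VX float64
}

type Enemy struct {
	X, VX float64
	HP    int
}

// Update is the "message" each object understands.
func (b *Bullet) Update(dt float64) { b.X += b.VX * dt }
func (e *Enemy) Update(dt float64)  { e.X += e.VX * dt }

// World keeps one slice of values (not pointers) per type, so all objects
// of a type sit in one contiguous backing array.
type World struct {
	Bullets []Bullet
	Enemies []Enemy
}

// Update sends the message to every Bullet, then to every Enemy,
// instead of walking a mixed list of separately allocated objects.
func (w *World) Update(dt float64) {
	for i := range w.Bullets {
		w.Bullets[i].Update(dt)
	}
	for i := range w.Enemies {
		w.Enemies[i].Update(dt)
	}
}

func main() {
	w := World{
		Bullets: []Bullet{{X: 0, VX: 2}},
		Enemies: []Enemy{{X: 1, VX: -1, HP: 3}},
	}
	w.Update(0.5)
	fmt.Println(w.Bullets[0].X, w.Enemies[0].X)
}
```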

[+] ajuc|4 years ago|reply

This is kinda what Entity Component Systems do - they implement an in-memory relational database for game objects, handle dependencies and allow your game logic code to run efficiently over them while still keeping the pretense of OOP :)

Why pretense? Because behaviors (Systems in ECS terms) are completely separated from data (Components) and data for different game objects (Entities) is kept together in regular or sparse arrays.

Encapsulation is nowhere to be seen, code is written to specify the components it depends on and run on these arrays.

ECS is very fashionable in gamedev lately as it allows for efficient multithreading, explicit dependencies for each subsystem, cache locality and trivial (de)serialization. Used together with handles (tagged indexes instead of direct pointers) it reduces the likelihood of dangling pointers and other memory management bugs.
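The "tagged index" idea is often done with a generation counter per slot. The following is a minimal sketch under that assumption; `Handle`, `Pool` and the `float64` payload are placeholders, not taken from any particular engine.

```go
package main

import "fmt"

// Handle is an index plus a generation tag. A stale handle keeps its old
// generation and is rejected once the slot has been reused.
type Handle struct {
	Index      uint32
	Generation uint32
}

type slot struct {
	value      float64 // placeholder payload
	generation uint32
	alive      bool
}

// Pool stores all slots in one slice and recycles freed indexes.
type Pool struct {
	slots []slot
	free  []uint32
}

// Alloc reuses a freed slot if one exists, otherwise appends a new one.
func (p *Pool) Alloc(v float64) Handle {
	if n := len(p.free); n > 0 {
		i := p.free[n-1]
		p.free = p.free[:n-1]
		s := &p.slots[i]
		s.value, s.alive = v, true
		return Handle{Index: i, Generation: s.generation}
	}
	p.slots = append(p.slots, slot{value: v, alive: true})
	return Handle{Index: uint32(len(p.slots) - 1)}
}

// Free invalidates the handle by bumping the slot's generation.
// Freeing a stale or already freed handle is a no-op.
func (p *Pool) Free(h Handle) {
	if _, ok := p.Get(h); !ok {
		return
	}
	s := &p.slots[h.Index]
	s.alive = false
	s.generation++
	p.free = append(p.free, h.Index)
}

// Get returns the value only if the handle still refers to a live slot
// of the same generation.
func (p *Pool) Get(h Handle) (float64, bool) {
	if int(h.Index) >= len(p.slots) {
		return 0, false
	}
	s := p.slots[h.Index]
	if !s.alive || s.generation != h.Generation {
		return 0, false
	}
	return s.value, true
}

func main() {
	var p Pool
	a := p.Alloc(1.5)
	p.Free(a)
	b := p.Alloc(2.5) // reuses the same slot with a bumped generation
	_, okA := p.Get(a)
	vB, okB := p.Get(b)
	fmt.Println(okA, vB, okB)
}
```

Where a dangling pointer would silently read whatever now lives at that address, a stale handle here fails the generation check and the caller gets an explicit "not found".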

[+] bruce343434|4 years ago|reply

> objects of the same type occupy continuous blocks in memory,

Depending on the language, a single object may have a lot of overhead that adds up in an array. What you often see is one ArrayObject with arrays of properties, kind of like a transposition.

A problem there is that in memory the arrays are of course laid out one after the other, which actually destroys cache locality if you need to access more than 1 property inside a loop (it will need to load back and forth to the different property arrays), so it's a somewhat dumb approach. But, at least it saves the overhead, so maybe not too bad. And in a high level interpreted language like php you likely weren't gonna get cache locality anyway.

The point is to group all properties you are going to be accessing in a hot loop together in a small-ish array.
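One way to read "group the hot properties together" is a hot/cold split: the fields touched every iteration go into one small struct, and everything else lives in a parallel array. A sketch, with made-up field names:

```go
package main

import "fmt"

// ParticleHot holds only the fields the per-frame loop touches.
// Four float32 values are 16 bytes, so several fit in one cache line.
type ParticleHot struct {
	X, Y, VX, VY float32
}

// ParticleCold holds rarely used data, kept out of the hot loop's way.
type ParticleCold struct {
	Name       string
	SpawnFrame int64
}

// ParticleSet keeps hot and cold data in parallel slices; index i in
// both slices refers to the same particle.
type ParticleSet struct {
	Hot  []ParticleHot
	Cold []ParticleCold
}

// Add appends a particle and returns its index.
func (s *ParticleSet) Add(name string, x, y, vx, vy float32) int {
	s.Hot = append(s.Hot, ParticleHot{X: x, Y: y, VX: vx, VY: vy})
	s.Cold = append(s.Cold, ParticleCold{Name: name})
	return len(s.Hot) - 1
}

// Step is the hot loop: it reads and writes only the Hot slice.
func (s *ParticleSet) Step(dt float32) {
	for i := range s.Hot {
		h := &s.Hot[i]
		h.X += h.VX * dt
		h.Y += h.VY * dt
	}
}

func main() {
	var s ParticleSet
	i := s.Add("spark", 0, 0, 2, 4)
	s.Step(0.5)
	fmt.Println(s.Cold[i].Name, s.Hot[i].X, s.Hot[i].Y)
}
```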

C has structs for this, 0 overhead "entities" (although they may be padded to multiples of 4 bytes, so keep that in mind). You have compiler specific keywords to forego padding ("struct packing"), or maybe you're lucky and the data just fits exactly right. Either way, in such cases an array of structs is imo the most sane way to go.

In fact, C++ offers classes and structs. In my opinion, struct should be used for entities like "weapon" or "car". CLASSES (or objects) should be unix-philosophy adhering miniprograms that do one task and do it well (oh hey, it's the single responsibility principle!).

The way most programmers write OOP is a pretty convoluted way to model actual entities anyway. car.drive()? Oh? The car drives itself? No. agent.drive(car) should be the actual method. Agent, mind you, can be a driving AI, or a human driver, or whatever. Maybe the agent is a part of the car? In that case, use composition, not inheritance. (oh hey, entity component system!)

[+] fennecfoxen|4 years ago|reply

You might be looking for "Entity - Component - System" design, common in video games. Entities are still virtual-world objects like you might expect, but none of them would dare keep track of something like their position or temperature or whatever. Instead, they register a component with the appropriate system, which keeps all the data colocated for efficient physics and the like.

[+] megameter|4 years ago|reply

If we are speaking of C code, it's not quite so bad as it looks to have somewhat fat structs across multiple arrays, since you can fit 64 bytes in a cache line on contemporary desktop CPUs, and that sets your real max-unit-size; the CPU is actively trying to keep the line hot and it does so (in the average case) by speculating that you're going to fetch the next index of the array. Since you have multiple cache lines, you can keep multiple arrays hot at the same time, it's just a matter of keeping it easy to predict fetching behavior by using simple loops that don't jump around...which leads to the pattern parent suggests, of cascading messages or buffers in groups of same type so that you get a few big iterations out of the way, and then a much smaller number of indirected accesses.

[+] physicsguy|4 years ago|reply

I find in simulation codes that lack of awareness of (a) is an absolute performance killer. Generally, it's better to use a pattern for an object that's a container for something - so don't have a 'Particle' object but a 'Particles' one that stores the properties of particles contiguously. In my old magnetics research area you have at least 8 and more frequently 10+ spatially varying parameters in double precision that you'd potentially need to store per particle/cell.
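A minimal sketch of the 'Particles' container pattern, assuming just three per-particle properties (a real simulation would have the 8 to 10+ mentioned above). The field and method names are illustrative.

```go
package main

import "fmt"

// Particles is a container object: each property is its own contiguous
// slice, and index i across the slices is one particle.
type Particles struct {
	X    []float64
	VX   []float64
	Mass []float64
}

// NewParticles allocates storage for n particles up front.
func NewParticles(n int) *Particles {
	return &Particles{
		X:    make([]float64, n),
		VX:   make([]float64, n),
		Mass: make([]float64, n),
	}
}

// Len reports the number of particles.
func (p *Particles) Len() int { return len(p.X) }

// Advance touches only X and VX; Mass is never loaded.
func (p *Particles) Advance(dt float64) {
	for i := range p.X {
		p.X[i] += p.VX[i] * dt
	}
}

// TotalMass streams through a single array.
func (p *Particles) TotalMass() float64 {
	sum := 0.0
	for _, m := range p.Mass {
		sum += m
	}
	return sum
}

func main() {
	p := NewParticles(3)
	p.Mass[0], p.Mass[1], p.Mass[2] = 1, 2, 3
	p.VX[0] = 4
	p.Advance(0.25)
	fmt.Println(p.Len(), p.X[0], p.TotalMass())
}
```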

[+] inopinatus|4 years ago|reply

Quite so. There’s a false equivalence in this article between data and encapsulated state, but if that were so then the flyweight pattern and its ilk couldn’t exist.

[+] corty|4 years ago|reply

Only in C++. Most other OOP languages do not allow controlling allocation that way.

Also, OOP only allows array-of-structs continuous data. Struct-of-arrays and hybrid forms are usually awkward or impossible. And with everything except maybe C++ and Rust, those "structs" in OOP-land do have quite an overhead compared to C structs.

[+] pulse7|4 years ago|reply

Overuse/misuse of inheritance has triggered hatred of OOP among many software developers...

[+] throwaway894345|4 years ago|reply

I’ll go out on a limb and posit that there are virtually no valid uses of (implementation) inheritance. Perhaps one valid use is getting rid of delegation boilerplate (e.g., normally you would compose one object inside another but you want the outer object methods to delegate to the inner object methods but you don’t want to have to write N function definitions that just call the same methods on the inner object so instead your outer object inherits from your inner object). This problem is better solved by something like Go’s struct embedding since it doesn’t do anything more than this kind of automatic delegation.

And if you get rid of inheritance, there is very little left to distinguish OOP from procedural programming like one would do in C or Go. And this is the semantic problem: no one really agrees on what OOP is and proponents will rebut any criticism with “that’s not true OOP”. Any definitions of OOP that aren’t easily assailable are also indistinguishable from other existing paradigms.

Downvoters: i’m very interested in your opinions about why I’m wrong and specifically when you think inheritance is appropriate. Everyone says “there’s a time and a place!” but no one articulates when/where beyond cat/dog/animal toy examples.

[+] brobdingnagians|4 years ago|reply

Yeah, I've loved classes and OOP for a long time, but lately I've been on a project that is going into more and more contortions and complexity to make everything fit a theoretical ideal. It's revived my interest in learning FP languages to avoid the arcane complexity that some people make out of OOP.

[+] AdieuToLogic|4 years ago|reply

> Overuse/misuse of inheritance has triggered hatred of OOP among many software developers...

"It is not the tools we use that make us good, but rather how we employ them."[0]

0 - https://en.wiktionary.org/wiki/a_bad_workman_always_blames_h...

[+] moth-fuzz|4 years ago|reply

Honestly I find ECS dogmatic and difficult. Data oriented design as a general practice is a good thing, but there’s layers to all things. OOP is a very broad category of practices and designs, and ECS usually refers to one specific architecture.

The specific architecture in question tends to be full of soft dependencies - most ECSes don’t allow you to simply store a piece of data without opting in to all systems matching the data type executing arbitrary code on it. So much for separation of data and behaviour. No, now they’re even More dependent than they are in the usual OOP sense and you might not even know it.

Furthermore, usually when you want to think about in-game entities, you want to look at a single class. Now all entities’ data is split into numerous components and all entities’ behaviour is split into numerous systems and you don’t know frame by frame how they’re gonna interact, provided you even know about all systems in the first place. It’s total spaghetti code.

I’m much more inclined lately towards a shallow actor system. If I want to know how player’s behaviour functions, I need only look at player.cpp and nothing else. Then, to reap the benefits of data oriented design, certain objects can use a certain allocation scheme that makes sense for the object in question. In the general sense, any Component<T> can just have a vector<T> or an unordered_map<T> of all components and the memory access is abstracted away without it being detrimental. That’s C++’s whole deal actually, zero cost abstractions.

I wouldn’t call an entire paradigm shift in which one rewrites everything from memory allocation all the way up to ‘use WASD to move’ zero-cost in any sense.

In C++ it is trivial to overload new, or derive from some class which does, or to write a custom allocator, and frankly there is zero need for the same person who’s writing ‘use WASD to move’ to know about memory management.

[+] setr|4 years ago|reply

The way I view it is that you look at components like a database -- a bunch of tables that don't really tell you much about the business logic; the main thing they do is offer data coherence.

And like a webserver, the business logic has been entirely moved out of the data storage -- you construct an "object" out of the raw parts from the database, and you operate on that. The main thing is that you can construct multiple objects from the same raw dataset -- different views (as in MVC).

I think, however, that it's a mistake for most games' designs -- where there is largely a very small number of entities and a small number of behaviors, and really the game design is about careful placement of these fairly rudimentary entities on the map. In that scenario, the flexibility of ECS does you no good -- the game design is itself inflexible, so making your logic flexible is largely a premature optimization. If there's only one reasonable "View" of the entity, being able to construct an infinite set of alternative views is pointless.

ECS appeals to me more when you start talking about simulation-style games, where the game is far less hard-coded. Dwarf Fortress is ridiculously flexible in its game design (at runtime), and ECS would be a natural fit for that (entities in DF are literally defined by tags, and groups of tags, and those tags get modified at runtime[0]). It's not spaghetti code then -- it's really the only reasonable way to approach the problem.

Defining each entity uniquely makes a lot of sense when your entities are largely unique (perhaps with a common base, e.g. for physics). ECS makes more sense when your entities share a lot of logic, but random subsets of it, and especially so when the game itself treats that random subset as dynamic.

[0] http://www.dfwk.ru/Creature_standard_1.txt

[+] lamontcg|4 years ago|reply

After studying ECS for all of a week, I was left wondering if there wasn't a way to reintroduce strong typing to an ECS system (without reintroducing all the problems of inheritance). So you have a player_entity factory that ensures that a player_entity only wraps entities that are actually players. Then you can pass that around to strongly typed functions, but keep the overall design reasonably inheritance-free so it is more like rust/go's strong typing systems.
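A rough sketch of what such a factory could look like, assuming a toy map-based world. `World`, `PlayerEntity` and `AsPlayer` are invented names, not from an existing ECS library.

```go
package main

import "fmt"

// Entity is a plain ID, as in most ECS designs.
type Entity uint32

type Vec2 struct{ X, Y float64 }

// World is a toy component store, invented for this sketch.
type World struct {
	next     Entity
	position map[Entity]Vec2
	isPlayer map[Entity]bool // stands in for a "player" tag component
}

func NewWorld() *World {
	return &World{position: map[Entity]Vec2{}, isPlayer: map[Entity]bool{}}
}

// Spawn creates an entity with a position and, optionally, the player tag.
func (w *World) Spawn(player bool) Entity {
	e := w.next
	w.next++
	w.position[e] = Vec2{}
	if player {
		w.isPlayer[e] = true
	}
	return e
}

// PlayerEntity wraps an Entity that has been checked to carry the player
// tag. The field is unexported, so code in other packages could only
// obtain one with a checked ID through AsPlayer.
type PlayerEntity struct{ id Entity }

// AsPlayer is the factory: it refuses entities that are not players.
func (w *World) AsPlayer(e Entity) (PlayerEntity, bool) {
	if !w.isPlayer[e] {
		return PlayerEntity{}, false
	}
	return PlayerEntity{id: e}, true
}

// MovePlayer accepts only the checked wrapper, not a raw Entity.
func (w *World) MovePlayer(p PlayerEntity, dx, dy float64) {
	pos := w.position[p.id]
	pos.X += dx
	pos.Y += dy
	w.position[p.id] = pos
}

func main() {
	w := NewWorld()
	hero := w.Spawn(true)
	rock := w.Spawn(false)
	if p, ok := w.AsPlayer(hero); ok {
		w.MovePlayer(p, 1, 2)
	}
	_, rockIsPlayer := w.AsPlayer(rock)
	fmt.Println(w.position[hero], rockIsPlayer)
}
```

The check happens once at the boundary; after that the compiler rejects passing an arbitrary `Entity` where a `PlayerEntity` is required. The wrapper can still go stale if the tag is removed later, which this sketch does not handle.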

[+] bob1029|4 years ago|reply

You can get some pretty unbelievable performance gains out of a single writer and arrays of structs.

Bonus points if you figure out a way to have an array per type of struct pre-allocated with more elements than you will ever need. Even if you use a GC language you can almost eliminate collections with this approach.
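A minimal sketch of the pre-allocation idea, assuming a hypothetical `Event` struct and a pool that is reset once per frame or batch. Nothing is allocated after construction, so the garbage collector has no new objects to track.

```go
package main

import "fmt"

// Event is a hypothetical fixed-size struct with no pointers in it.
type Event struct {
	Kind    int
	Payload [4]int64
}

// EventPool owns one slice allocated up front and hands out slots from it.
type EventPool struct {
	items []Event
	used  int
}

// NewEventPool allocates capacity slots once.
func NewEventPool(capacity int) *EventPool {
	return &EventPool{items: make([]Event, capacity)}
}

// Acquire returns a zeroed slot, or nil when the pool is exhausted.
// It never allocates.
func (p *EventPool) Acquire() *Event {
	if p.used == len(p.items) {
		return nil
	}
	e := &p.items[p.used]
	*e = Event{}
	p.used++
	return e
}

// Reset makes every slot available again, e.g. at the end of a frame.
// Pointers handed out before Reset must not be used afterwards.
func (p *EventPool) Reset() { p.used = 0 }

func main() {
	pool := NewEventPool(2)
	a := pool.Acquire()
	a.Kind = 7
	b := pool.Acquire()
	c := pool.Acquire() // pool is full here
	fmt.Println(a.Kind, b.Kind, c == nil)
	pool.Reset()
}
```

With a single writer, as the comment suggests, `Acquire` and `Reset` need no locking; this sketch is not safe for concurrent writers.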

[+] PicassoCTs|4 years ago|reply

Even the array of structs is a non-ideal approach, as structs are usually viewed as a static collection of data.

But if you look at the hot loop, it usually boils down to a funnel - not unlike a furnace. Lots of space-hungry raw materials are gathered and passed through, to be condensed into relatively small output.

So the ideal structure is a sort of union-struct, that compresses the results down each step of the algo, keeping it all in cache, while keeping it slim..

[+] urthor|4 years ago|reply

People forget what the intent of OOP originally was.

OOP was envisioned as a way to manage software projects with many contributors at a time when we didn't have half the tools for hiding context that we do now.

Micro-services and micro-kernels are far far far more prevalent these days.

Garbage collection was also far less of a thing in that era, as all programmers were squeezing every last iota out of the hardware.

Hence rogue pointers were far more of a risk.

Multi-core? Haha.

I know this is not particularly relevant to the original article, but if you don't know the history and the intent behind why something exists, you are reasonably likely to misapply it.

Most of the mistakes of OOP are from a lack of understanding of why things got invented in the first place.

[+] AdieuToLogic|4 years ago|reply

> People forget what the intent of OOP originally was.

Not really.

> OOP was envisioned as a way to manage software projects with many contributors at a time when we didn't have half the tools for hiding context that we do now.

No, the purpose of "OOP" is specifically for "hiding context" by encapsulating implementation logic exposed via a collaboration contract.

> Micro-services and micro-kernels are far far far more prevalent these days.

Non-sequitur.

> Garbage collection was also far less of a thing in that era, as all programmers were squeezing every last iota out of the hardware.

This literally has nothing to do with a programming paradigm.

> Hence rogue pointers were far more of a risk.

> Multi-core? Haha

Again, this literally has nothing to do with a programming paradigm.

> I know this is not particularly relevant to the original article, but if you don't know the history and the intent behind why something exists, you are reasonably likely to misapply it.

This is a wise statement, one which I hope you say aloud whilst reading this reply.

> Most of the mistakes of OOP are from a lack of understanding of why things got invented in the first place.

A programming paradigm is not the source of mistakes. Its practitioners certainly can be however.

[+] BoiledCabbage|4 years ago|reply

Also, the article does a really poor job of describing any drawbacks of Data Oriented Design. It's a real pet-peeve of mine.

> Drawbacks of Data-Oriented Design

> Data-oriented design is not the silver bullet to all the problems in game development.

Ok, they don't view it as a silver bullet. This seems promising for an evenhanded discussion. I'm curious what the author thinks the drawbacks are.

> The main problem with data-oriented design is that it’s different from what most programmers are used to or learned in school.

So the first drawback is that nobody knows your silver-bullet? That's a cop out.

> Also, because it’s a different approach, it can be challenging to interface with existing code, written in a more OOP or procedural way.

And the second drawback is that code was written without using your silver-bullet? Seriously?

If the only two things you believe are drawbacks about your tech are that not enough people know it, and not enough people are using it then it's not an even handed discussion of your tech.

Discuss the actual trade-offs you've learned from using it. Not nonsense like nobody knows how wonderful it is, nor is using it.

And that's coming from someone who agrees that OOP has huge flaws and with the most common applications of inheritance creates many flawed program architectures.

[+] beders|4 years ago|reply

The original intent had nothing to do with "many contributors".

The main ideas: encapsulation, message passing and late binding i.e. dynamic binding.

[+] Zababa|4 years ago|reply

> OOP was envisioned as a way to manage software projects with many contributors at a time when we didn't have half the tools for hiding context that we do now.

> Micro-services and micro-kernels are far far far more prevalent these days.

I think that's a good analysis. If OOP was a solution to an organization problem, then microservices are the "new" way to do it. Microservices respect late binding, message passing, encapsulation. I don't really know how inheritance would fit into the equation, as I don't know exactly how companies with hundreds of microservices do it. And since we don't care about what's inside the objects (services), we're now free to write them in Java, C++, Smalltalk, Erlang, Haskell or Pascal.

[+] davedx|4 years ago|reply

I was expecting to see some example code, or some actual performance metrics to show why data-oriented design is better.

I actually have written a game that was pure functional style with a single giant state object for the game data and it worked well for me. But I'd want to see some evidence for this approach before changing the entire architecture of my game.

[+] meheleventyone|4 years ago|reply

This graph made the rounds on Twitter last week and I think encapsulates the answer really well: https://twitter.com/eric81766/status/1407393532562841607

Most games won’t benefit. Most AAA games won’t benefit for their gameplay code and are otherwise very data-oriented already.

[+] lmilcin|4 years ago|reply

Sigh... This again.

I am going to repeat this for what seems like a ten thousandth time.

"OOP is a tool to solve a particular type of problem. It is your responsibility to know the tool, to understand its strengths and weaknesses and when it is applicable and when it is not. If the tool does not work it is not the tool that is faulty, it is you who are the problem -- by using the tool in a wrong situation or incorrectly."

In particular I detest "we are OOP shop" type of approach. This immediately advertises they have absolutely no idea how to use stuff -- by saying you know only one tool and sure you are going to use it to solve every kind of problem.

Those languages that were supposed to be "everything is an object" like Java? Now are learning that maybe that is not the most sound approach and trying to evolve to allow other paradigms under one roof.

[+] throwaway894345|4 years ago|reply

This is boilerplate that can be used to defend any idea. First of all, the article is trying to explain a domain for which OOP is not well-suited. Secondly, it’s unhelpful to write-off this article with “there are places for which OOP is well-suited” without any specifics about when it is the better approach, especially how it compares and contrasts with other approaches.

[+] radiospiel|4 years ago|reply

This article was "originally printed in the September 2009 issue of Game Developer." Any news since then?

[+] shadowgovt|4 years ago|reply

Something I really enjoy about golang is the design decision to keep structured data separate from function, but then provide a simple mechanism for defining functions that work in the context of structured data. It feels like a good split between the two desired uses for structured data... On the one hand, it's easy to encapsulate the data for common use cases, and on the other hand I can generally trust that if I need to bit-bash a data structure (i.e. use only a piece of it, or introspect it, or serialize it), I can do that with a minimum of care as to any object-like metadata it might be carrying.

[+] stephc_int13|4 years ago|reply

In my opinion, the most important aspect of data-oriented design is to always consider collections instead of so-called "objects".

It is then logical to optimize for access pattern instead of the processing of a single entity.

[+] cryptica|4 years ago|reply

Can someone please point me to an example of a well-designed (modular, maintainable) FP project (e.g. on GitHub)?

I've only had negative experiences so far and I can't imagine how to use FP in a modular way (high cohesion, loose coupling) so I'd like some examples to look at. A simple game would be nice.

[+] dang|4 years ago|reply

Discussed at the time:

Data-Oriented Design (Why You Might Be Shooting Yourself in The Foot With OOP) - https://news.ycombinator.com/item?id=1004569 - Dec 2009 (28 comments)

[+] 0xbkt|4 years ago|reply

What is a “flat” codebase, if you don't mind? Is it the opposite of hadouken-style nested code?

[+] tabtab|4 years ago|reply

Somewhat related: it's time to toss out file trees as our primary module management technique:

https://news.ycombinator.com/item?id=25347043

We've outgrown file trees.

[+] dgb23|4 years ago|reply

One thing that is interesting about this, is that people sometimes end up building an (incomplete) implementation of relational algebra to achieve this, where any given system in the game logic pipeline might join over multiple components.
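As an illustration of such a join, here is a small sketch in which a movement system runs over the entities that have both a position and a velocity. The map-based tables and the names `JoinPosVel` and `Integrate` are assumptions made for brevity; real ECS implementations tend to use denser storage.

```go
package main

import (
	"fmt"
	"sort"
)

type Entity uint32

type Position struct{ X, Y float64 }
type Velocity struct{ DX, DY float64 }

// JoinPosVel is an inner join of the two component tables on the entity
// ID: it returns every entity present in both, sorted for determinism.
func JoinPosVel(pos map[Entity]Position, vel map[Entity]Velocity) []Entity {
	out := make([]Entity, 0, len(vel))
	for e := range vel {
		if _, ok := pos[e]; ok {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// Integrate is a "system": it only sees entities that have both components.
func Integrate(pos map[Entity]Position, vel map[Entity]Velocity, dt float64) {
	for _, e := range JoinPosVel(pos, vel) {
		p, v := pos[e], vel[e]
		pos[e] = Position{X: p.X + v.DX*dt, Y: p.Y + v.DY*dt}
	}
}

func main() {
	pos := map[Entity]Position{1: {}, 2: {}, 3: {}}
	vel := map[Entity]Velocity{2: {DX: 1}, 3: {DY: 2}, 4: {DX: 9}}
	Integrate(pos, vel, 1)
	fmt.Println(JoinPosVel(pos, vel), pos[1], pos[2], pos[3])
}
```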

[+] FpUser|4 years ago|reply

I've been "shooting myself in the foot" for 40 years already too, thank you.

OOP, FP, Data Oriented, insert your favorite here are just friggin' tools in the arsenal and all are fine when used appropriately. One does not negate the other.

You just do not use the approach used when writing firmware for low power microcontroller for writing Solidworks. And it is ok to mix and match if type of software to be written benefits.

There are no silver bullets in this world and one must learn what to use and when. Trying to convince people to stick to just one is a great disservice and looks more like religious propaganda: if I do it this way the others should do the same regardless.

Oh and btw OOP can be cache friendly as well. Nobody forces one to organize internal data representation in any particular way.

[+] talolard|4 years ago|reply

Isn’t the middle ground writing custom allocators ? Allocators allocate blocks continuously and developers keep developing with standard OOP?

[+] ratww|4 years ago|reply

How you allocate data is only half of the solution. The other half is also organising access patterns.

In games, for example, DOD requires you to break up a hypothetical Update method into multiple methods that get called at different times (first do all the collision for all objects, then do all movement for all objects, then all rendering, etc). If you skip this step, you get the neatly organised memory but the same random access as before.

Changing the data structure to the way presented in the article without changing the way you access it might even degrade your performance compared to what it was with normal OOP memory organisation.
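A small sketch of the phase split described above, assuming a made-up `Body` type and a single wall to collide with. Each phase runs over all bodies before the next phase starts, rather than each body running its whole update in turn.

```go
package main

import "fmt"

// Body is a hypothetical game object reduced to one dimension.
type Body struct {
	X, VX float64
	Hit   bool
}

// collide is phase 1: it only decides which bodies touch the wall.
func collide(bodies []Body, wall float64) {
	for i := range bodies {
		bodies[i].Hit = bodies[i].X >= wall
	}
}

// move is phase 2: it resolves hits and integrates positions.
func move(bodies []Body, dt float64) {
	for i := range bodies {
		b := &bodies[i]
		if b.Hit {
			b.VX = -b.VX
		}
		b.X += b.VX * dt
	}
}

// render is phase 3: it builds a draw list of positions.
func render(bodies []Body) []float64 {
	out := make([]float64, len(bodies))
	for i := range bodies {
		out[i] = bodies[i].X
	}
	return out
}

// Frame replaces a per-object Update() with three passes over the slice.
func Frame(bodies []Body, wall, dt float64) []float64 {
	collide(bodies, wall)
	move(bodies, dt)
	return render(bodies)
}

func main() {
	bodies := []Body{{X: 0, VX: 1}, {X: 10, VX: 1}}
	fmt.Println(Frame(bodies, 10, 1))
}
```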

[+] mamcx|4 years ago|reply

What I wish to know is how to apply this in a normal CRUD-like scenario against a db.

Will it truly help for, like, a shopping cart app?

[+] freddealmeida|4 years ago|reply

I always hated OOP. thank god for this.

[+] menotyou|4 years ago|reply

Me too. Lots of superfluous concepts called "abstractions" to make simple things complex.

Interestingly all discussions about other paradigms on HN end very fast in "Oh, I can solve this somehow with [put in your favorite design pattern in here]" (Big deal, both paradigms are obviously Turing complete). "Design Pattern" was for me always short for "Complex workaround for a problem you would not have if you didn't use the object-oriented paradigm".

But more generally, OOP is for developers whose mental model of the world is categorizing things into object hierarchies. For them it is the most intuitive approach to model the world.

For me this is as counter-intuitive as it can be. My mental model of the world just does not work like that.

359 comments