I believe the easiest way to think about it is to get away from your programming tools and to start modeling your problem domain as tables in Excel.
Once you have a relational schema that the business can look at and understand, then you go implement it with whatever tools and techniques you see fit.
This is what “data-oriented” programming means to me. It’s not some elegant code abstraction. It’s mostly just a process involving people and business.
Even for non-serious business, these techniques can wrangle complexity that would otherwise be insurmountable.
I still think the central piece of magic is embracing a relational model. This allows for things like circular dependencies to be modeled exactly as they are in reality.
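A minimal sketch of what that relational shape can look like in code (the tables, names, and values here are hypothetical, not from the comment): two "tables" keyed by id, with the employee-to-department-to-manager cycle expressed purely as ids rather than object references, the way it would be in a spreadsheet or a database.

```java
import java.util.Map;

// Hypothetical sketch: rows are plain records, relationships are ids,
// so the circular employee <-> department dependency is just data.
public class Relational {
    record Employee(int id, String name, int departmentId) {}
    record Department(int id, String name, int managerId) {}

    // "Tables" keyed by primary key, like the Excel sheets described above.
    static final Map<Integer, Employee> EMPLOYEES = Map.of(
        1, new Employee(1, "Ada", 10),
        2, new Employee(2, "Grace", 10));
    static final Map<Integer, Department> DEPARTMENTS = Map.of(
        10, new Department(10, "Engineering", 1)); // managed by employee 1

    // Follow the cycle: an employee's department's manager.
    static String managerOf(int employeeId) {
        Employee e = EMPLOYEES.get(employeeId);
        Department d = DEPARTMENTS.get(e.departmentId());
        return EMPLOYEES.get(d.managerId()).name();
    }

    public static void main(String[] args) {
        System.out.println(managerOf(2));
    }
}
```

Neither record holds a reference to the other; the cycle lives in the data, which is exactly what an object graph struggles to express cleanly.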
This confused me, as when I hear "data-oriented" I think of data structures optimised around minimising CPU cache misses: better alignment, using enums where possible, not storing the results of simple calculations, etc. There is a popular book and there are popular talks on the subject. It probably confuses other people as well, I'd imagine.
I'm having a hard time thinking of a way code can ever be fully decoupled from data. When we decide it's better to have a name field rather than firstName and lastName, does that mean we simplify NameCalculation.fullName to just return data.name? This seems to suggest we still have code coupled to data (the data structure being an object); it's just a coupled function now, but you have decoupled it enough to use NameCalculation in different contexts. Single-responsibility classes are already recommended for reuse like this in OO.
Also when it comes to data validation, OO performs all of this validation too and in a much more compact and code-oriented, extensible way. Why would I write a separate schema when the object itself knows what it will accept, what is optional, and what range the values should be in? I'd imagine the schema and code could become incoherent.
Just imagine you're sending the data across a network, instead of between local functions. If you have a web service that spits out JSON, then you have data that is decoupled from code. That's not to say that the JSON data isn't then read and manipulated by code; just that no specific code is associated with the data.
As for why you'd want to do this, well, one reason is that it makes it easier to bounce data between different services. You don't need to perform any sort of conversion if you're operating directly on the data you're receiving and sending.
The second argument for this style is perhaps more ideological. In the Clojure community in particular, complexity is seen as arising from coupled components. The more things you can decouple, the less complex your codebase. The less complex your codebase, the more reliable and extensible it is.
Edit: another potential advantage is that it's easier to use generic functions to interrogate and manipulate data that isn't encapsulated in specific types or objects.
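As a hypothetical illustration of that last point, one generic helper can interrogate any map-shaped data, whatever domain it came from (the field names below are invented for the example):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: because the data is a plain map rather than a class,
// the same generic helper works on authors, orders, or anything else.
public class Generic {
    // Keep only the named keys; a tiny, reusable "query" over any map.
    static Map<String, Object> selectKeys(Map<String, Object> m, String... keys) {
        Map<String, Object> out = new HashMap<>();
        for (String k : keys) {
            if (m.containsKey(k)) out.put(k, m.get(k));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> author = Map.of(
            "firstName", "Isaac", "lastName", "Asimov", "books", 500);
        Map<String, Object> order = Map.of(
            "id", 7, "total", 19.99, "currency", "EUR");
        // The same function interrogates both shapes of data.
        System.out.println(selectKeys(author, "lastName"));
        System.out.println(selectKeys(order, "total", "currency"));
    }
}
```

With class-per-shape encapsulation, each of those two calls would need its own accessor code.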
Because the problem with OO is that if you have any kind of cross-cutting concern, it collapses totally.
For example, I have a Wizard object. My wizard has a wand, so we store the wand object on our Wizard object. Simple. But then my wizard casts a spell and does damage to a goblin. Do we put the cast method on the wand or the wizard? There is no real reason to pick one over the other (this problem comes up a lot in game development, which is why this pattern is more common there... Spring is another example; aspect-oriented programming/dependency injection works from similar principles). It is far easier to separate that out totally and have a pure, reusable cast function that takes the wizard, the goblin, and the wand.
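A rough sketch of that wizard example as plain data plus a free function (the names and the damage rule are invented for illustration, not taken from the thread):

```java
// Hypothetical sketch: rather than deciding whether cast() lives on Wizard
// or on Wand, it is a free function over plain data records.
public class Spells {
    record Wizard(String name, int power) {}
    record Wand(int bonus) {}
    record Goblin(String name, int hp) {}

    // Pure and reusable: takes everything it touches as arguments and
    // returns a new Goblin instead of mutating one.
    static Goblin cast(Wizard wizard, Wand wand, Goblin target) {
        int damage = wizard.power() + wand.bonus();
        return new Goblin(target.name(), target.hp() - damage);
    }

    public static void main(String[] args) {
        Goblin hit = cast(new Wizard("Tim", 5), new Wand(3), new Goblin("Grub", 20));
        System.out.println(hit.hp()); // 12
    }
}
```

Because cast() owns no state, it tests trivially and can be reused for any caster/weapon/target combination without touching a class hierarchy.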
Another aspect of this problem (which Rust, as an example, makes clear) is that you introduce runtime bugs or hurt performance when you start carrying around a lot of references everywhere. Once you start to think about what actually needs a reference to another object (in Rust, this is limited by the borrow checker) then you realise why OOP doesn't work in some cases.
OO doesn't perform validation; your code performs validation on the data. You can write a separate schema, or a single schema, but the problem is that OOP tries to fit a round peg into a square hole with some applications.
Very generally, it is harder to make mistakes if you use something like the data-oriented approach. If you have a lot of code with calculations or interactions, it is very pure, easy to test, and fits well with how people think about those elements (one area where I have found this is financial applications; I actually worked this out myself and then found out data-oriented programming existed, while building financial-related stuff). In these cases, introducing OOP means state changing in unpredictable ways (and then someone comes into the project, doesn't understand the abstraction, calls a method that is named erroneously, and it all goes wrong).
To use Rich Hickey’s definitions, data is an observation or measurement at a point in time: [:fullname “John Doe” t1] [:first “John” t1]. No code is needed to see the denormalization, which came from the symbolic model that was chosen. The code only exists to translate between incompatible data models of a program’s inputs and outputs.
> Why would I write a separate schema when the object itself knows what it will accept, what is optional, and what range the values should be in?
Because data is just data, and its meaning is given at the time of application. If you want to couple validation to the data itself, how do you decide which of the N meanings to validate against?
OO schemas are very strict and in many situations difficult to extend. For example, let's say you have a class named Name that contains firstName and lastName. Let's say that you have a function that consumes lists of Names. Let's say you have yet another class called OtherName that contains firstName and lastName. That class will not be compatible with the function. Usual OOP suggests you solve this via inheritance, but if you don't own Name or OtherName that won't help you. OOP's tools for polymorphism are very limited, especially if you don't own all the code you're trying to use (third party libraries). If the "schema" enforced by the type system didn't include the name of the object that opens up a lot of possibilities.
For validation, this approach would have you write a set of functions to validate the properties of the data.
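One hedged sketch of what such a set of validation functions might look like: the schema is itself data, a map from field name to predicate (the author fields echo the article's AuthorData example; the rules themselves are invented here):

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: validation as data (field -> predicate) applied to
// plain maps, instead of being baked into a class's setters.
public class Validation {
    static final Map<String, Predicate<Object>> AUTHOR_SCHEMA = Map.of(
        "firstName", v -> v instanceof String s && !s.isEmpty(),
        "lastName",  v -> v instanceof String s && !s.isEmpty(),
        "books",     v -> v instanceof Integer n && n >= 0);

    // A missing key yields null, which fails every instanceof check.
    static boolean valid(Map<String, Object> data, Map<String, Predicate<Object>> schema) {
        return schema.entrySet().stream()
            .allMatch(e -> e.getValue().test(data.get(e.getKey())));
    }

    public static void main(String[] args) {
        Map<String, Object> ok  = Map.of("firstName", "Isaac", "lastName", "Asimov", "books", 500);
        Map<String, Object> bad = Map.of("firstName", "", "lastName", "Asimov", "books", -1);
        System.out.println(valid(ok, AUTHOR_SCHEMA));  // true
        System.out.println(valid(bad, AUTHOR_SCHEMA)); // false
    }
}
```

Because the schema is a value, the same data can be checked against several schemas, which is one answer to the "which of the N meanings" question above.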
Nothing forbids a function that applies validation to inputs before returning a data object? Extensibility can be done through functional means (e.g. higher-order functions, function composition, lenses) or OOP (the strategy pattern and equivalents, object composition and inheritance, ...).
Not sure what you mean by more compact and code-oriented?
Code is always coupled to an interface, implicitly or explicitly. In the case of OOP, code is coupled to the class, which can represent something specific with very concrete semantics (e.g. employee, author) or something generic that is meant to be subclassed (e.g. person).
These examples spring to mind: 1) high performance computing (vector processing / SIMD), 2) deep neural nets, 3) graphics. Each of these computation models process a small number of large blocks of data whose efficient movement is just as important as their efficient number crunching. OOP doesn't serve those emphases as well as DOP does.
Thank you for all of the different viewpoints, it's starting to make sense now. I've used JSON schema before in one of my previous projects. I'll keep it in mind for next time.
Code is itself data, so a full decoupling is logically impossible.
Data is going to have an implicit schema regardless because that is just how data works. And once there is a schema, it may as well be expressed explicitly independently of the code because then you get the whole basket of standard schema operations for free (validate the data against a schema, provide a schema to an external consumer when moving data around, talking/operating more generally on schema to manipulate data, generating glue code or APIs programmatically).
Your description sounds like you are using your objects as schema references which is fine, but if there is a 1:1 correspondence with schema then you are already doing data oriented programming, and if there isn't then you can't have 3rd party libraries that support schema-based operations. And losing those schema-based operations hasn't gained anything because the data still has a schema, it just isn't well organised.
TL;DR: Data-oriented programming isn't essential. But if you plan on passing data around between systems, a schema should be mandatory; and if you pass data around within a system, a schema is recommended.
> Why would I write a separate schema when the object itself knows what it will accept, what is optional, and what range the values should be in?
In practice, I have seen a fair number of complex objects where that information is obscure. If there isn't an explicit schema, there is a chance of bugs where the object doesn't understand the data it is ingesting, and it won't share its knowledge that there is a problem until it fails in some obscure way at runtime. A lot of time is wasted fixing those bugs, because the easiest way to clean that up is to tease out an explicit schema and start thoroughly validating inputs.
If you have a hard time thinking this way, then you really need to try other paradigms, because OOP is not the only way, and it is getting less and less popular.
For example, C is not OOP. Linus Torvalds hates OOP, so Linux is written in C. Go was created by Rob Pike, who's also subtly against it, and it shows in the language. Additionally, Rust pretty much gets rid of objects as well. React is also moving away from class-based representation of components.
These are just modern languages that are moving away from OOP. In addition to this... behind the modern languages there's a whole universe and history of other styles of programming.
Not against OOP... But I'm saying it's not a good sign if OOP is the only perspective you're capable of seeing.
Reading the discussion here, I can't help thinking that people are defending their own philosophies: OOP vs FP vs DOP vs etc. I wish the author had killer applications or killer examples in different categories, like: can I code an operating system easier, can I code a database easier, can I create a complex streaming job easier, can I write a library as complex as Apache Beam easier, can I write a compiler easier, can I create a web framework easier, can I write a JSON parser easier, you get the idea. Or maybe examples that contrast existing solutions: how do I use DOP to write a better RxJava? How do I use DOP to write a better SQLite? How do I use DOP to write a better graph library? How do I use DOP to write a better tensor library? How do I use DOP to write a better Time/Date library? You know, something that's so compelling and so obvious.
I have to agree here - there is total disconnect from context in these discussions.
I am writing line-of-business applications - I don't have much need for "generic" functions like those outlined in the article. My framework/language provides, for example, a generic .Sum() I could use if I implement a specific interface.
But usually I have to make a specific sum and put it in the database or in the interface.
Like I need to sum ages, or sum prices, or sum the amount of items in inventory - and I have to show these in the interface. I think it is quite BS to say there can be "generic" data structures and "generic" functions in the context of a line-of-business application.
Other stuff I was doing was a warehouse automation system. While I had X,Y,Z coordinates in a generic data structure named Coordinates, any function that was going to do anything with coordinates had to be implemented in the context of a machine. For example, a lift should never operate on the X coordinate. I could calculate distances - but then there was never a use case to calculate the distance between machines, because these had static access points and one would calculate distances to those access points only.
You'll have to define "OOP" first. Everyone thinks it's defined, but even among OOP proponents there isn't consensus ("it's about message passing", "it's about encapsulation", "it's about inheritance", "it's about dot-method syntax", etc).
>> Take, for example, AuthorData, a class that represents an author entity made of three fields: firstName, lastName, and books. Suppose that you want to add a field called fullName with the full name of the author. If we fail to adhere to Principle #2, a new class AuthorDataWithFullName must be defined
Wait, what? Just add another field/property to the existing class. It's a silly example anyway as normally you would just add a function to concatenate the two strings.
The stated advantage is to be able to add the new property "on the fly". I suppose this means without changing the code. It does beg the question "what can existing code possibly do with this?" (other than display it in a generic way or count the number of fields). Furthermore, adding something new is rarely much of a problem, as it is a non-breaking change. A more difficult example would be removing the "firstName" field. Assessing the impact of such a change in a large code base would be extremely difficult. Get good at grep and hope that the test suite is comprehensive.
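For concreteness, here is roughly what adding the derived field looks like when the author is a plain map; no AuthorDataWithFullName class is needed (this is a sketch based on the article's example, not the book's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the AuthorData example: with data as a map, the
// derived field is added "on the fly" with no new class definition.
public class FullName {
    static Map<String, Object> withFullName(Map<String, Object> author) {
        Map<String, Object> out = new HashMap<>(author); // copy, don't mutate
        out.put("fullName", author.get("firstName") + " " + author.get("lastName"));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> author = Map.of("firstName", "Isaac", "lastName", "Asimov");
        System.out.println(withFullName(author).get("fullName")); // Isaac Asimov
    }
}
```

The trade-off raised above still stands: nothing here tells you which callers assume "firstName" exists, which a static type would.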
Truly misguided article, like most things by this author, unfortunately.
I was so excited about the book "Data-oriented programming" when it was first being released...it was so heavily publicized as well that it was constantly in my face which likely pushed me over the edge to give it a shot and buy it.
Unfortunately, not all that glitters is gold. It feels extremely beginner-oriented, only touches basic concepts taught at uni level, and it shows a huge disconnect between the theory and the real-world work of a developer leveraging data in any way, shape or form.
Awful stuff. These principles could only ever make sense in a dynamic language since it's mostly manually enforcing some of the basic functionality of a type system, but the fact that he also tries to argue this style could be used in a language like C# throws that defense out the window.
It's not impossible to type check heterogeneous maps at compile time, but most static type systems don't support this. I think you'd certainly see much more friction trying to program like this in C# than you would in Clojure.
Kinda funny to say seeing as C# actually has a dynamic type.
Even if you don't use that, you could certainly orient your data as "structs of arrays instead" of "arrays of structs" (so to speak). It's fairly common in games.
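A small sketch of the two layouts (the particle fields are invented for illustration): the struct-of-arrays version updates positions by scanning two dense arrays instead of chasing one heap object per particle.

```java
// Hypothetical sketch of "struct of arrays" vs "array of structs".
public class Soa {
    // Array of structs: one object per particle (shown for contrast).
    record Particle(float x, float vx) {}

    // Struct of arrays: one array per field, cache-friendly to scan.
    static class Particles {
        final float[] x;
        final float[] vx;
        Particles(int n) { x = new float[n]; vx = new float[n]; }

        // The hot loop walks contiguous memory, one field at a time.
        void step(float dt) {
            for (int i = 0; i < x.length; i++) x[i] += vx[i] * dt;
        }
    }

    public static void main(String[] args) {
        Particles ps = new Particles(3);
        ps.vx[0] = 2.0f;
        ps.vx[1] = 4.0f;
        ps.step(0.5f);
        System.out.println(ps.x[0] + " " + ps.x[1] + " " + ps.x[2]);
    }
}
```

This is the layout sense of "data-oriented" (DOD) that several comments here distinguish from the map-and-schema sense the article describes.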
There's something called "Data-Oriented Programming" and something else called "Data-Oriented Design". I can never remember which is which. This post changes nothing.
I think the abbreviated forms of the paradigm are often more "stable" than the full names, because people keep hand-waving the names and thus mixing them up. For me, "DOD" is the thing where you are very performance-oriented and you have flat, cache friendly arrays of data and affinity to ECS (entity-component-system) stuff etc. This is clearly not it, but eyeballing it, the ideas seem somewhat compatible with it. (Except the immutability part.)
One is ridiculous and uses immutable data. The other is made famous by a guy in a Hawaiian shirt and performs well. "Designer on vacation" might help you remember which is which.
Could anything be more confusing with a large code base? Also, lots of nice key not found and invalid cast exception errors to debug with this approach. Sometimes boxing makes a material difference to performance as well.
I think it very much depends on what problems you're trying to solve and whether or not proper data types have been defined.
If your primary data structure is
Map<Integer, List<String>>
we have a huge problem.
On the other hand, if your primary data structure is
Map<CustomerId, List<Purchase>>
Then I'd rather see that than IPurchaseMappingByCustomerIdAbstractFactory or whatever other abomination OO priests will conjure. Generally speaking, generic structures are simpler and they allow for easier transformations.
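A hypothetical sketch of that second shape: with domain types as the map's parameters, a stock stream operation does the transformation (CustomerId and Purchase are minimal stand-ins invented here):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: generic structures parameterised by domain types
// stay easy to transform with ordinary collection operations.
public class Purchases {
    record CustomerId(int id) {}
    record Purchase(String item, double amount) {}

    static double totalFor(Map<CustomerId, List<Purchase>> byCustomer, CustomerId c) {
        return byCustomer.getOrDefault(c, List.of()).stream()
            .mapToDouble(Purchase::amount)
            .sum();
    }

    public static void main(String[] args) {
        Map<CustomerId, List<Purchase>> byCustomer = Map.of(
            new CustomerId(1),
            List.of(new Purchase("book", 12.0), new Purchase("pen", 3.0)));
        System.out.println(totalFor(byCustomer, new CustomerId(1))); // 15.0
    }
}
```

The map stays a plain map; the domain knowledge lives in the small value types, not in a bespoke container class.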
I am not entirely sure that most data-oriented programs do go this way. You can split functionality out of the data but the data can still be represented with an object or whatever. I would agree though, you might as well be using Python at that point.
An example of what I mean is Spring. Obviously, from what I recall, that goes to other extreme with lots of XML configuration. But there is no need for vague types that can cause all sorts of mischief at runtime. The key idea is splitting code from data, not necessarily the representation of the data (although that can come into it if you have performance-sensitive apps).
Two concepts that seem entirely missing are "ownership" and "transformation". The root of ownership is the "system of record", with intermediate systems combining data and doing transformation. "Getting data" becomes a question of where you connect to, in a differentiating tree that has its root in the SoR, and the trade-offs that implies.
This post (and the book it points to) is perhaps teaching a new generation what has been known for a long time: the "body" of your business is the data, not the code. E.g. if there's a datacenter fire and you only have room on a thumbdrive to keep one thing, your database or your codebase, you keep the database.
Rich Hickey's talks "Simple Made Easy" [1] and "Effective Programs" [2] provide a better explanation of these ideas IMO. The specific definition of "simple" is pretty crucial.
I thought this is talking about data-oriented design, which focuses on the data layout to make programs more efficient, e.g. structure of array that can be more cache friendly in some cases.
> Principle #2: Representing data with generic data structures.
Nothing about keeping values in functions is ‘non-functional’. That’s like saying that hard-coding the quadratic formula inside some function instead of using a lambda as an input is ‘non-functional’. His language in that statement is poorly chosen. I am almost certain he is not implying any action-at-a-distance ‘state’; he is trying to talk about including context for some data inside functions that operate on the data. It would be like hardcoding an ISBN-to-title list inside a function that takes a list of authors and their books as input for processing. I think he’s saying the ISBN-to-title list should be part of the data structure, and storing it inside the functions breaks these rules he has invented.
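To illustrate, a sketch of that ISBN example with the lookup table passed in as data rather than hardcoded inside the function (the ISBN, titles, and function names are invented):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the lookup table is an argument, i.e. part of the
// data the function operates on, not context buried inside the function.
public class Titles {
    static List<String> titlesFor(List<String> isbns, Map<String, String> isbnToTitle) {
        return isbns.stream()
            .map(i -> isbnToTitle.getOrDefault(i, "unknown"))
            .toList();
    }

    public static void main(String[] args) {
        Map<String, String> isbnToTitle = Map.of("978-0553293357", "Foundation");
        System.out.println(titlesFor(List.of("978-0553293357", "000"), isbnToTitle));
    }
}
```

The function itself stays pure either way; the difference is whether the table can be swapped, inspected, or serialized like any other piece of data.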
"Anything"-oriented programming is dumb. Every big problem is a collection of smaller, different problems. Different problems call for different approaches. Sometimes the best approach has data-oriented features, sometimes object, sometimes functional, sometimes piped. For big problems you want a language good at all of them.
It is why C++ continues to grow. Some complain about that, but every single feature got there over fierce opposition by making some common programming problem more tractable.
I wonder if Alan Kay would agree or argue with this sentiment, but it seems to me that the choice of 'oriented' was intentional and that we've consistently fucked it up ever since then.
Orientation should have been 'a preference for' not 'a dogmatic adherence to'. A hot-dog based diet still contains bread, ketchup, mustard and pickles, possibly some sort of cheese. A hot dog diet is just hot dogs, which is much, much less interesting, and there is no question that it is unhealthy, whereas the former might have some plausible deniability (especially if you add beans).
It's a good thing that these paradigms exist, so you know their value and eventually understand when they're appropriate.
You're right that there may not be a one-size-fits-all.
ajuc|3 years ago
As for why, see for example Command Query Separation (the data-oriented way) vs Tell Don't Ask (the encapsulate-everything way).
brunooliv|3 years ago
Don't buy the book, it's so not worth it.
bo0O0od|3 years ago
https://blog.klipse.tech/databook/2022/06/22/generic-data-st...
The examples also contradict his other principles, i.e. immutability.
frogulis|3 years ago
[1] https://youtu.be/LKtk3HCgTa8 [2] https://youtu.be/2V1FtfBDsLU
xixixao|3 years ago
This is not OOP, this is a way to do functional programming in a class-based language that lacks top-level function declarations / modules.
While this might seem like a nitpick, it makes me sceptical about the rest of the content.
revskill|3 years ago
One code example is worth 1000 images, and 1 image is worth 1000 words.
Always use a code block to illustrate your point, as it helps the reader understand it better.
Writing a book is more about getting the reader into your way of thinking than about making them think.
pca006132|3 years ago
> Principle #2: Representing data with generic data structures.
OK probably this is not what I expected.
andreareina|3 years ago
If that's happening a lot that's not really FP anymore, is it?
jokoon|3 years ago
But when you use a framework that enforces OOP, it's quickly difficult to use DOP.