
Arguments against JSON-driven development

257 points | afroisalreadyin | 9 years ago | okigiveup.net

294 comments

[+] wtbob|9 years ago|reply
> The fundamental advice on Unicode is decode and encode on system boundaries. That is, you should never be working on non-unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation.

This right here is the correct approach. Serialisation formats should be serialisation formats, whether they be JSON, S-expressions, protobufs, XML, Thrift or what-have-you; application data should be application data. There are cases where it makes sense to operate on the serialised data directly, for performance or because it makes sense in context, but in the general case operate on typed application values.
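To make that boundary discipline concrete, here's a minimal Python sketch (the `Book` type and `book_from_json` function are illustrative names, not from the article): decode at the edge, reject invalid payloads there, and let the business logic see only typed values.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Book:
    """Typed application value; construction is the validation boundary."""
    title: str
    isbn: str

def book_from_json(raw: str) -> Book:
    # Decode at the system boundary, rejecting invalid data here so the
    # rest of the code never touches raw dicts.
    data = json.loads(raw)
    if not isinstance(data.get("title"), str) or not isinstance(data.get("isbn"), str):
        raise ValueError("invalid book payload")
    return Book(title=data["title"], isbn=data["isbn"])

book = book_from_json('{"title": "SICP", "isbn": "0262510871"}')
```

Past this point, key errors and membership lookups simply can't happen: either you hold a `Book`, or the boundary already rejected the input.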

[+] qwertyuiop924|9 years ago|reply
Part of the problem is that JSON and sexprs AREN'T really serialization formats. They've been pressed into service as such, but they are actually notations for data structures: in Python it may not be idiomatic to crawl dicts like this, but in JS those aren't dicts, they're objects. If they've been de-serialized to some degree, they may even have their own methods.

By the same token, in Lisp, Sexprs aren't a serialization format. They're a notation for the linked cons cells that Lisp data is made of. In Lisp, that Sexpr will be crawled for data, or maybe even executed.

So while in Python, both may seem to be serialization formats, they aren't.

Either way, if the application programmer has any sense, they'll abstract away the format of their data. In a lisp app, you won't be cdring down a sexpr, you'll be calling a function to grab the necessary data for you, usually from a set of functions that abstract away the underlying sexpr implementation, and treat whatever it is as a separate datatype.

Of course, the sexpr might have been fed to an object constructor. Heck, it might be an object constructor, or a struct constructor. All of those types typically provide O(1) access, and autogenerated access functions, so it's the same story.

[+] Cthulhu_|9 years ago|reply
When using parsers like Jackson or Gson for Java, this process is completely transparent and does not require any active thought from the developer - well, except maybe when there are very specific formats that don't map 1:1 with the class that should be instantiated or generated from the JSON object.

It's a bit more tricky in JS, both client-side and in Node. You don't work with the JSON string there; after parsing, you work directly with the resulting object. They're not really OOP languages. I wouldn't want to work with too much untyped / unstructured JSON in back-end land myself, to be fair.

[+] VikingCoder|9 years ago|reply
...unless your application is doing an in-place edit.

For instance, if your image compression application throws out my EXIF data that it doesn't understand, I'm going to be pissed. (Unless you give me an option to preserve it.)

[+] Goladus|9 years ago|reply
> This right here is the correct approach. Serialisation formats should be serialisation formats, ... application data should be application data.

True, although the OP seems to be advocating having your app pretty much ignore serialization altogether in favor of object-oriented design. In particular the author objects to use of dictionaries and lists instead of objects.

It is true that if you're designing an application with a json api in mind, you're likely to stick with the data structures that are easiest to serialize.

Personally, I started writing programs that way before json became so common. I did it simply to take full advantage of the native data structures and to avoid prematurely confining myself into an object hierarchy that wasn't a good fit for the problem domain. It also winds up making code more generic and easier to rewrite in a different language if necessary (for example, moving server-side code to client javascript).

[+] sidlls|9 years ago|reply
I'm not entirely in agreement.

Use of object-oriented programming paradigms here would merely distribute the logic that is necessary to achieve the desired mapping over multiple points in the code.

The example function presented is only marginally too complicated. I'd split it in two: one to obtain the book list given the same arguments as the example function, and one taking the result as its only argument to build the mapping.
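The original example function from the article isn't reproduced in this thread, so the split being proposed can only be sketched hypothetically. Assuming the function both fetches a book list and builds a mapping from it, the two-function version might look like this (all names and the data shape are made up for illustration):

```python
def get_book_list(inventory, shop_label):
    # First function: obtain the book list, given the same arguments
    # as the original example function.
    return [b for b in inventory if b["shop_label"] == shop_label]

def build_book_mapping(book_list):
    # Second function: take that result as its only argument and
    # build the mapping.
    return {b["book_id"]: b["count"] for b in book_list}

inventory = [
    {"shop_label": "A", "book_id": 7, "count": 2},
    {"shop_label": "B", "book_id": 9, "count": 1},
]
mapping = build_book_mapping(get_book_list(inventory, "A"))
```

Each function now operates on data explicitly and can be tested or reused on its own.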

I find myself shying away from rigorous adherence to encapsulation more and more these days. I prefer small functions that operate on data explicitly.

Edit: and I'm a bit confused how the example has anything to do with "JSON-driven development", other than the coincidence that a hash/dictionary is the core data structure being manipulated here. This example function could exist and be (mostly) reasonable had JSON never existed. I'd expect to see an argument that the JSON serialization schemes that abound are problematic, given the title.

[+] milesvp|9 years ago|reply
This. I've been programming this way for over a decade, long before JSON was a thing. I find I rarely need anything more than a list or a dict for most of the data manipulation I do. Being on the web has only strengthened this tendency, since everything ends up being stringly typed anyway. Nearly every function/API I write is: get some data from somewhere (hopefully serialized), manipulate the data, return data (very possibly serialized). Nearly every time I've seen coworkers try to improve things with classes, it complicates the code, and often adds little encapsulation given how much of what we do relies on external data sources.

Every once in a while I think how nice it would be to be able to use typed data and smart setters to avoid much of the bounds checking I have to do, but I find there's never enough code between the boundaries of serialization to make it worth the added complexity that this introduces (also my problem domain involves mostly copy so most things are basically strings, ints, or datetimes anyways).

[+] mattnewton|9 years ago|reply
With the added benefit that you can reuse these functions for new data shapes more often than I expected when switching to this style.
[+] afroisalreadyin|9 years ago|reply
I think I failed to represent the scale in the example. That's a heavily modified function from a code base I'm working on, and the inventory, book, cell etc. dictionaries and the lists containing them are all over the place, along with similar looping logic (e.g. find the item with a given label) and combinations of it. Adding objects to the above function would of course complicate it in the sense that it would get longer, but it would improve the actual example considerably. I will try to come up with better sample code that represents my worries better.
[+] mikekchar|9 years ago|reply
I'm also confused about this example. I've often seen similar code in C using structs. What's really missing for me is some context about why he wants these data structures and what he's going to do with them. Essentially he's doing a join on 2 tables and you are left with the thought, "Why do you need to do that join?"

I think what he's really trying to get at is that he dislikes the style of programming espoused by one of the child posters: make everything a dict/hash and write filters that manipulate those dicts/hashes. I think the reason he dislikes it is exactly the reason I'm confused about his example: you can lose track of why you need the types in the first place.

One thing you often see in Javascript (and I presume Python, although I don't have much experience in that ecosystem) is the idea that types don't matter. You have an object (essentially a hash) and you can transform it any way you want. If it is slightly more convenient to access your data in a different way, then transform, transform, transform.

Now all your functions have different signatures: "No, in this function we use the store inventory, which is exactly the same as a book list, but grouped by store". And then you have 25 different functions all doing slightly different versions of the same thing to keep track of all the weird mutations of types along the way.

Again, this isn't new stuff. We've been writing crappy code like this for decades. One of the nice things about languages like C++ is that it's such a PITA to define arbitrary data structures that you avoid doing it, but you still see variations of that theme even there.

As for OO or not OO, I think it's a red herring. If I have functions: make_foo(bar, baz), print_foo(foo), manipulate_foo(foo), or if I have a class called Foo with a constructor(bar, baz) and 2 methods called print() and manipulate(), it's exactly the same thing. Even if you write the equivalent code functionally, mostly all you are doing is moving the context (bar and baz) out of the heap and putting it on the stack (yeah... I know... lack of mutability is a pretty important bit too ;-) ).

This is almost as long as the original rant, but I'll jam one more thing in. Serialization, I think, has little to do with the problem, except that people don't know how to separate their concerns at layer boundaries. The main bad idea that perpetuates is that I should have the same data structure in my database as in my business logic as in my UI views as in my UI widgets as in my communication protocols. Back in my day, we even thought it was a good idea to serialize entire objects (with executable code!) from one end to the other, so I guess it's getting slightly better ;-)

To sum up: you can't ignore types even when it is easy to morph types in your language. At your layer boundaries you also need to transform your data from one set of types to the other set of types (and you should never expect that a 1:1 mapping is automatically going to be a good idea). Within your layers you should never mutate your types and you should write functions with clear signatures. OO helps you do this. Non-mutating state is also a really good idea and functional helps you do this.

[+] Cthulhu_|9 years ago|reply
I disagree with the anemic object argument. If an object is just there to store data and no behaviour, then that's fine - don't add behaviour if it doesn't need it. A large portion of back-end services are CRUD and data wrangling operations anyway - as in, convert data format A to data format B (which I guess could be a constructor or factory method if you're comfortable with having the conversion logic in a data class).
[+] tantalor|9 years ago|reply
Especially true if your business objects are generated code, e.g., protocol buffers.

Combining business logic with business objects is a mistake. That's a textbook example of tight coupling.

[+] caconym_|9 years ago|reply
I agree, but the problem is that usually the representation of the data comes before the logic you need around it, which can accumulate over a period of months or years. Depending on the application, depending on the programmer(s), that logic can turn into a real mess since there's no obvious place for it to live. This reduces code reuse, which leads to bugs.

It's not always appropriate, but building some language-idiomatic encapsulation around data from the very start makes it much less likely that the inevitable addition of hundreds or thousands of lines of logic will descend into incomprehensible spaghetti hell. This doesn't have to be OOP; it could just as easily be e.g. a module in a purely functional language.

[+] afroisalreadyin|9 years ago|reply
Very good point. I would say that if your object is doing e.g. validation, or if/then/else'ing on field values to normalize them somehow, it's already far from anemic. But the key point is that you should not put data in objects, and then put the business logic, as in the small code sample, into some routine that simply accesses fields. That's the anti-pattern.
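A hypothetical example of that distinction (the `Temperature` class is invented here, not from the article): an object that normalizes and validates on construction is already far from anemic, even though it carries no other behavior.

```python
class Temperature:
    """Normalizes units and validates on construction."""
    def __init__(self, value, unit="C"):
        if unit == "F":
            # Normalize Fahrenheit input to Celsius.
            value = (value - 32) * 5 / 9
        elif unit != "C":
            raise ValueError("unknown unit: %s" % unit)
        if value < -273.15:
            raise ValueError("below absolute zero")
        self.celsius = value

t = Temperature(212, unit="F")
```

The anti-pattern, by contrast, would be storing a raw number and putting the unit-handling if/else into every routine that reads the field.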
[+] clifanatic|9 years ago|reply
> If an object is just there to store data and no behavior

Then why do you have it at all?

[+] lmm|9 years ago|reply
The main reason this happens in Python is that creating actual datatypes is incredibly clunky (by Python standards) because of the tedious "def __init__(self, x): self.x = x". The solution here is to have a very lightweight syntax for more specific types, e.g. Scala's "case class".
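For comparison, here is the clunky version next to what later Python offers. Note that `dataclasses` landed in the stdlib in Python 3.7, after this thread, so at the time the complaint was entirely fair; this is a sketch of how the gap has since narrowed.

```python
from dataclasses import dataclass

# The tedious hand-rolled way the comment complains about:
class PointManual:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# dataclasses generate __init__, __repr__ and __eq__ from the field
# declarations, much like Scala's case classes:
@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
```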

I'd also argue for using thrift, protobuf or even WS-* to put a little more strong typing into what goes over the network. Such schemata won't catch everything (they have to have a lowest-common-denominator notion of type) but distributed bugs are the hardest bugs to track down; anything that helps you spot a bad network request earlier is well worth having.

[+] aeruder|9 years ago|reply
An article about the "attrs" library was posted here a couple weeks ago. Really highlighted the tedium of Python objects while offering a neat solution.

https://glyph.twistedmatrix.com/2016/08/attrs.html

Regarding protobuf, I'm a bit disappointed with the direction of version 3. Fields can no longer be marked as required - everything is optional; i.e. almost every protobuf needs to be wrapped with some sort of validator to ensure that necessary fields are present. I understand the arguments, but I did enjoy letting protobuf do the bulk of the work making sure fields were present.
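Such a wrapper can be very thin. This sketch uses a plain dict to stand in for a parsed proto3 message (the field names are made up for illustration); the point is that "required" now has to be re-implemented by hand:

```python
def validate_required(msg, required):
    # proto3 treats every field as optional, so presence of the fields
    # the application actually needs must be checked explicitly.
    missing = [f for f in required if msg.get(f) is None]
    if missing:
        raise ValueError("missing required fields: %s" % missing)
    return msg

msg = validate_required({"id": 1, "name": "book"}, required=("id", "name"))
```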

[+] amyjess|9 years ago|reply
At one company I worked at, we used Avro to transfer data over the network. It's strongly typed with schemas, and it has both a compact binary form for transfer over the network and a text-based form for storage on disk that looks like JSON except field order matters (the schema and data are stored in separate files).
[+] afroisalreadyin|9 years ago|reply
aeruder already posted the awesome glyphobet post on attrs; I agree with everything in there. The Python object protocol is great, but difficult to use for small classes. If you are not doing some kind of schema validation on REST endpoints, you're doing it wrong, I would say. But JSONSchema is also really sucky; writing more JSON to validate JSON is not my idea of simplicity. I will have to look at the alternatives at some point.
[+] catnaroek|9 years ago|reply
> The main reason this happens in Python is that creating actual datatypes is incredibly clunky

It's not clunky, it's outright impossible. Datatypes are inhabited by compound values (data constructors applied to arguments), but Python simply doesn't have compound values. All it has is object identities, which are primitive and indecomposable values no matter how compound the object is.

Sadly, the same is true in Scala.

[+] mhd|9 years ago|reply
This basically repeats the ORM arguments/counter-arguments, but now it's a slightly more complex data structure instead of the DB-row-as-hash/array you get there. "row-driven" in this context often leads to barely wrapped DAO Objects.

On the other hand, sometimes (surprisingly often) a hash is good enough and the effort spent in modeling the database (...) doesn't need to be replicated.

And as with ORMs/SQL generators/DAOs/etc., there's a whole spectrum of solutions and you really have to look at the task to see what's appropriate...

[+] mythz|9 years ago|reply
This isn't JSON-driven development, it's just choosing to apply logic over loosely-typed data structures instead of named constructs. It's more awkward in Python because it doesn't have the syntactic sugar for dotted access into a dict the way JavaScript objects allow.

But using clean built-in data structures instead of named types has its benefits, especially if you need to serialize for persistence or communication, as it doesn't require any additional knowledge of types in order to access the serialized data. You can happily consume the data structures in separate processes without the additional dependency of an external type system that's coupled to your data and needs to be carried along with it.

This is why Redux uses vanilla data structures in its store, and why JSON has become popular for data interchange: any valid JSON can be converted into a JavaScript object with just `JSON.parse()`, which saves a tonne of ceremony and manual effort compared to the old-school way of having to extract data from formats with a poor programmatic fit, like an XML document, into concrete types.

If your data objects don't need to be serialized or accessed outside of the process boundary, then there's little benefit to using loosely-typed data structures, in which case my preference would be using classes in a static type system to benefit from the static analysis feedback that types provide.

[+] Nullabillity|9 years ago|reply
> as it doesn't require any additional knowledge of Types in order to access serialized data

You still need to know the shape of the data you're working with, or you won't get anything useful done. So you can't skip defining types or a format, you're just skipping the tools that help you follow said format.

[+] codedokode|9 years ago|reply
Maybe they use untyped hashes and arrays just because there are no other data structures in JS?
[+] mcms|9 years ago|reply
Whether anemic objects are harmful or harmless has been debated in software engineering for a long time.

I find over-relying on encapsulation more harmful than useful nowadays, especially if you are writing scalable software that is inherently distributed. For example, hiding database access behind a simple getter function makes another programmer ignore the performance implications and other issues that may arise.

[+] qwertyuiop924|9 years ago|reply
Yes, but OTOH, it lessens the likelihood of errors, and means you'll have to rewrite a minimal amount of code when you, say, switch from MySQL to Postgres.

Abstraction always lessens awareness of that which is abstracted. Decide where to draw the line for your app.

[+] Millennium|9 years ago|reply
It sounds to me like these arguments aren't so much against JSON, per se. They're against using JSON.parse() (or json.loads() in Python, json_decode() in PHP, or whatever) as your entire data-import process.

Instead, the argument goes, one should load the JSON, walk the resulting structure, and use it to build your native data structures/objects/whatever. Similarly, when the time comes to save, you crawl through your native structure to build a dict/array/primitive structure, then call JSON.stringify() (or the analogous function) to serialize that.
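In Python that round trip might look like this (the `Book` class and field names are illustrative):

```python
import json

class Book:
    def __init__(self, title, count):
        self.title = title
        self.count = count

def load_books(raw):
    # json.loads is only the transport step...
    records = json.loads(raw)
    # ...walking the result to build native objects is the import step.
    return [Book(r["title"], r["count"]) for r in records]

def dump_books(books):
    # Saving is the mirror image: crawl the objects back into
    # primitives, then serialize.
    return json.dumps([{"title": b.title, "count": b.count} for b in books])

books = load_books('[{"title": "SICP", "count": 3}]')
```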

Uncoupling your data structure from the serialization format, though, is really just basic good software design anyway, is it not? Does anyone argue in favor of what this article calls "JSON-driven development" as a design principle? Or is it just a shortcut that developers (and I am no less guilty of this than anyone else) sometimes take in the interest of getting a quick-and-dirty solution out the door?

Yes, working directly on the output of JSON.parse() is a code smell. But I'm not sure that claiming there's a rising trend of "JSON-driven development" is entirely founded. It's just people taking shortcuts.

[+] ramblenode|9 years ago|reply
This. "${PRACTICE}-driven development" suggests a practice that someone actively pursues because of perceived merit rather than a shortcut taken because of time/resource constraints.
[+] micimize|9 years ago|reply
While I see the point Ulaş is getting at, I wouldn't call this JSON-driven development. I think JSON-driven development would use abstraction layers that are based on JSON, like JSON Schema, and perhaps an OOP library that leverages it.

What I'd actually call this problem is a lack of abstraction. In functional programming, simple data structures are often preferred, and composable functions are used to manage complexity. A functional programmer might declare a function `to_structured_dict(enumerable, path)` and call it with `to_structured_dict(book_list, path=('shop_label', 'cell_label', 'book_id', 'count'))`.
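One hedged sketch of what such a `to_structured_dict` might do (the comment only names the function, so this implementation and the sample data are assumptions): nest a flat list of dicts by successive keys, with the last key in `path` becoming the leaf value.

```python
def to_structured_dict(enumerable, path):
    # All keys except the last are grouping levels; the last names the leaf.
    *group_keys, leaf_key = path
    result = {}
    for record in enumerable:
        node = result
        for key in group_keys[:-1]:
            node = node.setdefault(record[key], {})
        node[record[group_keys[-1]]] = record[leaf_key]
    return result

book_list = [
    {"shop_label": "A", "cell_label": "c1", "book_id": 7, "count": 2},
    {"shop_label": "A", "cell_label": "c2", "book_id": 9, "count": 1},
]
nested = to_structured_dict(
    book_list, path=("shop_label", "cell_label", "book_id", "count"))
```

The looping logic lives in one reusable function instead of being repeated wherever a different grouping is needed.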

[+] falcolas|9 years ago|reply
If you're in Python, and are afraid of "anemic" objects, I would recommend checking out collections.namedtuple. It's a fantastic lightweight and performant object-like data structure.

You also get a few additional features: in-order iteration, fields that are fixed when the class is created, and a method for turning an instance into an ordered dictionary (which is serializable in, wait for it, JSON).
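A quick demonstration of those features (the `Book` fields are invented for the example):

```python
import json
from collections import namedtuple

# Fields are fixed when the class is created, yet instances stay lightweight.
Book = namedtuple("Book", ["title", "isbn", "count"])

b = Book(title="SICP", isbn="0262510871", count=3)

# In-order iteration comes for free:
fields = list(b)

# And _asdict() yields a dict that serializes straight to JSON:
payload = json.dumps(b._asdict())
```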

[+] Singletoned|9 years ago|reply
> Once you go dict, you won't go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them.

How is this an argument against using dictionaries?

After 10 years of Python development, I do find myself using dictionaries rather than objects, in just the way that the author proscribes, but I'm finding it to be a genuine pleasure.

[+] Roboprog|9 years ago|reply
REPENT!!! :-)

Yeah, get shit done, YAGNI, and all that.

If it needs a wrapper, make one for repeated access patterns. If it needs one or more bona fide objects for passing around, updating, and general good behavior, then make them as needed. Otherwise, there are 15 other little jobs that need coding, and we gotta move on.

[+] st3v3r|9 years ago|reply
And what happens when one of those keys changes?
[+] beat|9 years ago|reply
I think this article is somewhat off-base. The problem isn't JSON, it's lack of respect for separation of duties. JSON is just a data exchange format.

Want to program in an OO way using JSON? Easy. Just build a factory to generate objects from JSON input. Put your validation and error handling right there. Now you can get a known valid object from the JSON, a class instance with all the encapsulation and business logic your heart desires. Need to share it with the outside world? Provide a JSON output method.
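A minimal sketch of that factory pattern (the `Order` class and its fields are invented for illustration):

```python
import json

class Order:
    def __init__(self, order_id, quantity):
        # Validation lives with construction, so a successfully built
        # Order is known to be valid everywhere downstream.
        if quantity < 1:
            raise ValueError("quantity must be positive")
        self.order_id = order_id
        self.quantity = quantity

    @classmethod
    def from_json(cls, raw):
        # The factory: JSON in, validated class instance out.
        data = json.loads(raw)
        return cls(order_id=data["order_id"], quantity=data["quantity"])

    def to_json(self):
        # Share with the outside world via a JSON output method.
        return json.dumps({"order_id": self.order_id, "quantity": self.quantity})

order = Order.from_json('{"order_id": 42, "quantity": 3}')
```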

Translating data formats is at the heart of day-to-day programming. It ain't rocket surgery. Fix the problem, not the blame.

(And if you think JSON sucks, believe me, you never dealt with data file formats from the pre-XML days!)

[+] wvenable|9 years ago|reply
The author isn't saying the problem is JSON.

The same programming style exists quite a bit in older PHP code as well. This is because one of its primary data types is a list/hashtable hybrid. And JSON is similar -- it promotes those same structural types: the list and the hashtable (array and object, respectively). So programmers are using them not just for building structures for data interchange, but for actual programming logic.

The fix for the problem is just education.

[+] nbevans|9 years ago|reply
There is so much wrong with this blog post that I don't even know where to begin. He appears to have included Python-specific details in his list of why he hates lists and dictionaries. Apparently Python throws exceptions if a key doesn't exist and seemingly has no Maybe/Option alternative? I don't know if that is true or not.

He claims using lists and dictionaries means you lose encapsulation - does it? A smarter programmer would realise that actually it entirely depends on the _types_ you are storing in those data structures.

[+] digisth|9 years ago|reply
The rule of thumb I've always used for when to use OO is "will there be more than one extant object at once or not?" If yes, and especially if these objects need real behavior, then use OO.

If you're essentially going through one object at a time, then discarding it, you may just be doing conduit data processing, and so there's little advantage to using objects. I think what's missing in this (well-written) analysis is that distinction; if you're slurping data from one place, making a few changes (or especially if you're not making any), then sticking it into a DB or vice versa, OO may be the wrong choice.

Ask yourself while writing the code: "are these active, behavior-driven objects that need encapsulation and relatively sophisticated behaviors, or is this just data I'm doing some relatively simple processing on?"

[+] corysama|9 years ago|reply
The author has lots of good points. Because I write most of my Python in the style he is advising against, I recognize that style has issues. The main issue for me is that a dict of dicts of dicts is not an interface. It doesn't have any constraints. It doesn't communicate expectations for use for the actual intent of the code. The best you can do is a comment explaining what to expect and a lot of error checking.

That said, almost all of the python I write these days is in the form of functional transforms on built-in data structures. And I love it!

There was a great Pycon2012 talk titled "Stop Writing Classes". You can find it linked and discussed here https://news.ycombinator.com/item?id=3717715

[+] agentgt|9 years ago|reply
On the one hand I agree with the OP that directly interacting with JSON is not really a good idea, but on the other hand I completely disagree that behavior should be shoved into data objects. Also, I think part of the problem is that Python doesn't have much typing (I know they recently added optional typing in Python, but I don't think many use it).

As more of an FP guy, I'm a firm believer in the separation of behavior and data. Clojure's Hickey sort of has a valid point... it's freaking data... stop making it complicated to access it.

[+] thesmallestcat|9 years ago|reply
It's called a hash. Or a dict. Or a map. Not JavaScript Object Notation, FFS.
[+] spullara|9 years ago|reply
At this point everyone should be using an evolvable (thrift, protocol buffers, avro, etc) schema format when they are storing or transmitting their data if they want to run an always on service - there is no downtime for migrations in the real world. Trying to do this ad-hoc with JSON is a lost cause and will eventually lead you to failure at runtime or worse, data loss situations.
[+] crucini|9 years ago|reply
JSON isn't un-evolvable. In fact, thrift can serialize to JSON.

What makes thrift evolvable in practice is that we don't remove fields and don't add mandatory fields. The same discipline can be applied to JSON definitions.

Well thrift also tags all fields with integers, so a consumer with an older schema can parse a record with a newer schema, skipping the new fields. Of course JSON trivially has this property.
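That skip-unknown-fields property is easy to see in a sketch (the record shape here is made up): a consumer built against an older schema keeps working when a producer with a newer schema adds a field.

```python
import json

# A producer with a *newer* schema adds an "edition" field:
new_record = json.dumps({"id": 1, "title": "SICP", "edition": 2})

# A consumer built against the *older* schema simply never looks at keys
# it has never heard of -- JSON's equivalent of skipping unknown fields:
def parse_old(raw):
    data = json.loads(raw)
    return {"id": data["id"], "title": data["title"]}

old_view = parse_old(new_record)
```

The discipline that has to be enforced by convention is the other direction: never remove a field the old consumers read, and never add a new mandatory one.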

Maybe the key here is "ad-hoc"; something like JSON-schema is needed.

[+] mixmastamyk|9 years ago|reply
Anyone have a good blog post handy on this?
[+] jjzieve|9 years ago|reply
At least lists and dictionaries map relatively well to a tabular (SQL) format. Objects don't map well at all! Anyone who's spent enough time with "mature" ORMs knows this. Especially when there's a deadline and you have to write "native" SQL just to get whatever the hell you needed in the first place. "Well maybe you should have read everything and understood the ORM in its most minute detail..." NO! That's the whole point of abstraction! If I understood everything about that code, I'd be better off re-writing it to better suit MY specific problem. Look, I don't want to be another OO basher. OO definitely has a place in complex systems like game development, where the lives of the objects are longer than a page refresh. But in web dev, it's becoming increasingly obvious to me that the OO paradigm is a huge time suck. /rant
[+] okreallywtf|9 years ago|reply
I feel like we have this discussion at work daily involving nhibernate. It is an abstraction that makes 80% of work quicker and easier, but what it makes easier and cleaner would have been trivial anyways.