
Arguments against JSON-driven development

257 points | afroisalreadyin | 9 years ago | okigiveup.net

294 comments

[+] wtbob|9 years ago|reply
> The fundamental advice on Unicode is decode and encode on system boundaries. That is, you should never be working on non-unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation.

This right here is the correct approach. Serialisation formats should be serialisation formats, whether they be JSON, S-expressions, protobufs, XML, Thrift or what-have-you; application data should be application data. There are cases where it makes sense to operate on the serialised data directly, for performance or because it makes sense in context, but in the general case operate on typed application values.
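To make that boundary discipline concrete, here's a minimal Python sketch (the `Book` type and `book_from_json` function are illustrative names, not from the article): decode at the edge, reject invalid payloads there, and let the business logic see only typed values.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Book:
    """Typed application value; construction is the validation boundary."""
    title: str
    isbn: str

def book_from_json(raw: str) -> Book:
    # Decode at the system boundary, rejecting invalid data here so the
    # rest of the code never touches raw dicts.
    data = json.loads(raw)
    if not isinstance(data.get("title"), str) or not isinstance(data.get("isbn"), str):
        raise ValueError("invalid book payload")
    return Book(title=data["title"], isbn=data["isbn"])

book = book_from_json('{"title": "SICP", "isbn": "0262510871"}')
```

Past this point, key errors and membership lookups simply can't happen: either you hold a `Book`, or the boundary already rejected the input.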

[+] qwertyuiop924|9 years ago|reply
Part of the problem is that JSON and sexprs AREN'T really serialization formats. They've been pressed into service as such, but they are actually notations for data structures: in Python it may not be idiomatic to crawl dicts like this, but in JS those aren't dicts, they're objects. If they've been de-serialized to some degree, they may even have their own methods.

By the same token, in Lisp, Sexprs aren't a serialization format. They're a notation for the linked cons cells that Lisp data is made of. In Lisp, that Sexpr will be crawled for data, or maybe even executed.

So while in Python, both may seem to be serialization formats, they aren't.

Either way, if the application programmer has any sense, they'll abstract away the format of their data. In a lisp app, you won't be cdring down a sexpr, you'll be calling a function to grab the necessary data for you, usually from a set of functions that abstract away the underlying sexpr implementation, and treat whatever it is as a separate datatype.

Of course, the sexpr might have been fed to an object constructor. Heck, it might be an object constructor, or a struct constructor. All of those types typically provide O(1) access, and autogenerated access functions, so it's the same story.

[+] Cthulhu_|9 years ago|reply
When using parsers like Jackson or Gson for Java, this process is completely transparent and does not require any active thought from the developer - well, except maybe when there are very specific formats that don't map 1:1 with the class that should be instantiated or generated from the JSON object.

It's a bit more tricky in JS, both client-side and in Node. You don't work with the JSON string there; after parsing, you work directly with the resulting object. They're not really OOP languages. I wouldn't want to work with too much untyped / unstructured JSON in back-end land myself, to be fair.

[+] VikingCoder|9 years ago|reply
...unless your application is doing an in-place edit.

For instance, if your image compression application throws out my EXIF data that it doesn't understand, I'm going to be pissed. (Unless you give me an option to preserve it.)

[+] Goladus|9 years ago|reply
> This right here is the correct approach. Serialisation formats should be serialisation formats, ... application data should be application data.

True, although the OP seems to be advocating having your app pretty much ignore serialization altogether in favor of object-oriented design. In particular the author objects to use of dictionaries and lists instead of objects.

It is true that if you're designing an application with a json api in mind, you're likely to stick with the data structures that are easiest to serialize.

Personally, I started writing programs that way before json became so common. I did it simply to take full advantage of the native data structures and to avoid prematurely confining myself into an object hierarchy that wasn't a good fit for the problem domain. It also winds up making code more generic and easier to rewrite in a different language if necessary (for example, moving server-side code to client javascript).

[+] sidlls|9 years ago|reply
I'm not entirely in agreement.

Use of object-oriented programming paradigms here would merely distribute the logic that is necessary to achieve the desired mapping over multiple points in the code.

The example function presented is only marginally too complicated. I'd split it in two: one to obtain the book list given the same arguments as the example function, and one taking the result as its only argument to build the mapping.
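The original example function from the article isn't reproduced in this thread, so the split being proposed can only be sketched hypothetically. Assuming the function both fetches a book list and builds a mapping from it, the two-function version might look like this (all names and the data shape are made up for illustration):

```python
def get_book_list(inventory, shop_label):
    # First function: obtain the book list, given the same arguments
    # as the original example function.
    return [b for b in inventory if b["shop_label"] == shop_label]

def build_book_mapping(book_list):
    # Second function: take that result as its only argument and
    # build the mapping.
    return {b["book_id"]: b["count"] for b in book_list}

inventory = [
    {"shop_label": "A", "book_id": 7, "count": 2},
    {"shop_label": "B", "book_id": 9, "count": 1},
]
mapping = build_book_mapping(get_book_list(inventory, "A"))
```

Each function now operates on data explicitly and can be tested or reused on its own.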

I find myself shying away from rigorous adherence to encapsulation more and more these days. I prefer small functions that operate on data explicitly.

Edit: and I'm a bit confused how the example has anything to do with "JSON-driven development", other than the coincidence that a hash/dictionary is the core data structure being manipulated here. This example function could exist and be (mostly) reasonable had JSON never existed. I'd expect to see an argument that the JSON serialization schemes that abound are problematic, given the title.

[+] milesvp|9 years ago|reply
This. I've been programming this way for over a decade, long before JSON was a thing. I find I rarely need anything more than a list or a dict for most of the data manipulation I do. Being on the web has only strengthened this tendency, since everything ends up being stringly typed anyway. Nearly every function/API I write is: get some data from somewhere (hopefully serialized), manipulate the data, return data (very possibly serialized). Nearly every time I've seen coworkers try to improve things with classes, it complicates the code, and often adds little encapsulation given how much of what we do relies on external data sources.

Every once in a while I think how nice it would be to be able to use typed data and smart setters to avoid much of the bounds checking I have to do, but I find there's never enough code between the boundaries of serialization to make it worth the added complexity that this introduces (also my problem domain involves mostly copy so most things are basically strings, ints, or datetimes anyways).

[+] mattnewton|9 years ago|reply
With the added benefit that you can reuse these functions for new data shapes more often than I expected when switching to this style.
[+] afroisalreadyin|9 years ago|reply
I think I failed to represent the scale in the example. That's a heavily modified function from a code base I'm working on, and the inventory, book, cell etc. dictionaries and the lists containing them are all over the place, along with similar looping logic (e.g. find the item with a given label) and combinations of it. Adding objects to the above function would of course complicate it in the sense that it would get longer, but it would improve the actual example considerably. I will try to come up with better sample code that represents my worries better.
[+] mikekchar|9 years ago|reply
I'm also confused about this example. I've often seen similar code in C using structs. What's really missing for me is some context about why he wants these data structures and what he's going to do with them. Essentially he's doing a join on 2 tables and you are left with the thought, "Why do you need to do that join?"

I think what he's really trying to get at is that he dislikes the style of programming espoused by one of the child posters: make everything a dict/hash and write filters that manipulate those dicts/hashes. I think the reason he dislikes it is exactly the reason I'm confused about his example: you can lose track of why you need the types in the first place.

One thing you often see in Javascript (and I presume Python, although I don't have much experience in that ecosystem) is the idea that types don't matter. You have an object (essentially a hash) and you can transform it any way you want. If it is slightly more convenient to access your data in a different way, then transform, transform, transform.

Now all your functions have different signatures: "No, in this function we use the store inventory, which is exactly the same as a book list, but grouped by store". And then you have 25 different functions all doing slightly different versions of the same thing to keep track of all the weird mutations of types along the way.

Again, this isn't new stuff. We've been writing crappy code like this for decades. One of the nice things about languages like C++ is that it's such a PITA to define arbitrary data structures that you avoid doing it, but you still see variations of that theme even there.

As for OO or not OO, I think it's a red herring. If I have functions: make_foo(bar, baz), print_foo(foo), manipulate_foo(foo), or if I have a class called Foo with a constructor(bar, baz) and 2 methods called print() and manipulate(), it's exactly the same thing. Even if you write the equivalent code functionally, mostly all you are doing is moving the context (bar and baz) out of the heap and putting it on the stack (yeah... I know... lack of mutability is a pretty important bit too ;-) ).

This is almost as long as the original rant, but I'll jam one more thing in. Serialization, I think, has little to do with the problem, except that people don't know how to separate their concerns at layer boundaries. The main bad idea that perpetuates is that I should have the same data structure in my database as in my business logic as in my UI views as in my UI widgets as in my communication protocols. Back in my day, we even thought it was a good idea to serialize entire objects (with executable code!) from one end to the other, so I guess it's getting slightly better ;-)

To sum up: you can't ignore types even when it is easy to morph types in your language. At your layer boundaries you also need to transform your data from one set of types to the other set of types (and you should never expect that a 1:1 mapping is automatically going to be a good idea). Within your layers you should never mutate your types and you should write functions with clear signatures. OO helps you do this. Non-mutating state is also a really good idea and functional helps you do this.

[+] Cthulhu_|9 years ago|reply
I disagree with the anemic object argument. If an object is just there to store data and no behaviour, then that's fine - don't add behaviour if it doesn't need it. A large portion of back-end services are CRUD and data wrangling operations anyway - as in, convert data format A to data format B (which I guess could be a constructor or factory method if you're comfortable with having the conversion logic in a data class).
[+] tantalor|9 years ago|reply
Especially true if your business objects are generated code, e.g., protocol buffers.

Combining business logic with business objects is a mistake. That's a textbook example of tight coupling.

[+] caconym_|9 years ago|reply
I agree, but the problem is that usually the representation of the data comes before the logic you need around it, which can accumulate over a period of months or years. Depending on the application, depending on the programmer(s), that logic can turn into a real mess since there's no obvious place for it to live. This reduces code reuse, which leads to bugs.

It's not always appropriate, but building some language-idiomatic encapsulation around data from the very start makes it much less likely that the inevitable addition of hundreds or thousands of lines of logic will descend into incomprehensible spaghetti hell. This doesn't have to be OOP; it could just as easily be e.g. a module in a purely functional language.

[+] afroisalreadyin|9 years ago|reply
Very good point. I would say that if your object is doing e.g. validation, or if/then/else'ing on field values to normalize them somehow, it's already far from anemic. But the key point is that you should not put data in objects, and then put the business logic, as in the small code sample, into some routine that simply accesses fields. That's the anti-pattern.
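A hypothetical example of that distinction (the `Temperature` class is invented here, not from the article): an object that normalizes and validates on construction is already far from anemic, even though it carries no other behavior.

```python
class Temperature:
    """Normalizes units and validates on construction."""
    def __init__(self, value, unit="C"):
        if unit == "F":
            # Normalize Fahrenheit input to Celsius.
            value = (value - 32) * 5 / 9
        elif unit != "C":
            raise ValueError("unknown unit: %s" % unit)
        if value < -273.15:
            raise ValueError("below absolute zero")
        self.celsius = value

t = Temperature(212, unit="F")
```

The anti-pattern, by contrast, would be storing a raw number and putting the unit-handling if/else into every routine that reads the field.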
[+] clifanatic|9 years ago|reply
> If an object is just there to store data and no behavior

Then why do you have it at all?

[+] lmm|9 years ago|reply
The main reason this happens in Python is that creating actual datatypes is incredibly clunky (by Python standards) because of the tedious "def __init__(self, x): self.x = x". The solution here is to have a very lightweight syntax for more specific types, e.g. Scala's "case class".
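For comparison, here is the clunky version next to what later Python offers. Note that `dataclasses` landed in the stdlib in Python 3.7, after this thread, so at the time the complaint was entirely fair; this is a sketch of how the gap has since narrowed.

```python
from dataclasses import dataclass

# The tedious hand-rolled way the comment complains about:
class PointManual:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# dataclasses generate __init__, __repr__ and __eq__ from the field
# declarations, much like Scala's case classes:
@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
```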

I'd also argue for using thrift, protobuf or even WS-* to put a little more strong typing into what goes over the network. Such schemata won't catch everything (they have to have a lowest-common-denominator notion of type) but distributed bugs are the hardest bugs to track down; anything that helps you spot a bad network request earlier is well worth having.

[+] aeruder|9 years ago|reply
An article about the "attrs" library was posted here a couple weeks ago. Really highlighted the tedium of Python objects while offering a neat solution.

https://glyph.twistedmatrix.com/2016/08/attrs.html

Regarding protobuf, I'm a bit disappointed with the direction of version 3. Fields can no longer be marked as required - everything is optional; i.e. almost every protobuf needs to be wrapped with some sort of validator to ensure that necessary fields are present. I understand the arguments, but I did enjoy letting protobuf do the bulk of the work making sure fields were present.
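Such a wrapper can be very thin. This sketch uses a plain dict to stand in for a parsed proto3 message (the field names are made up for illustration); the point is that "required" now has to be re-implemented by hand:

```python
def validate_required(msg, required):
    # proto3 treats every field as optional, so presence of the fields
    # the application actually needs must be checked explicitly.
    missing = [f for f in required if msg.get(f) is None]
    if missing:
        raise ValueError("missing required fields: %s" % missing)
    return msg

msg = validate_required({"id": 1, "name": "book"}, required=("id", "name"))
```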

[+] amyjess|9 years ago|reply
At one company I worked at, we used Avro to transfer data over the network. It's strongly typed with schemas, and it has both a compact binary form for transfer over the network and a text-based form for storage on disk that looks like JSON except field order matters (the schema and data are stored in separate files).
[+] afroisalreadyin|9 years ago|reply
aeruder already posted the awesome glyphobet post on attrs; I agree with everything in there. The Python object protocol is great, but difficult to use for small classes. If you are not doing some kind of schema validation on REST endpoints, you're doing it wrong, I would say. But JSONSchema is also really sucky; writing more JSON to validate JSON is not my idea of simplicity. I will have to look at the alternatives at some point.
[+] catnaroek|9 years ago|reply
> The main reason this happens in Python is that creating actual datatypes is incredibly clunky

It's not clunky, it's outright impossible. Datatypes are inhabited by compound values (data constructors applied to arguments), but Python simply doesn't have compound values. All it has is object identities, which are primitive and indecomposable values no matter how compound the object is.

Sadly, the same is true in Scala.

[+] mhd|9 years ago|reply
This basically repeats the ORM arguments/counter-arguments, but now it's a slightly more complex data structure instead of the DB-row-as-hash/array you get there. "row-driven" in this context often leads to barely wrapped DAO Objects.

On the other hand, sometimes (surprisingly often) a hash is good enough and the effort spent in modeling the database (...) doesn't need to be replicated.

And as with ORMs/SQL generators/DAOs/etc., there's a whole spectrum of solutions and you really have to look at the task to see what's appropriate...

[+] mythz|9 years ago|reply
This isn't JSON-driven development, it's just choosing to apply logic over loosely-typed data structures instead of named constructs. It's more awkward in Python because it doesn't have the syntactic sugar for dotted access into a dict the way JavaScript objects allow.

But using clean built-in data structures instead of named types has its benefits, especially if you need to serialize for persistence or communication, as it doesn't require any additional knowledge of types in order to access the serialized data. You can happily consume the data structures in separate processes without the additional dependency of an external type system that's coupled to your data and needs to be carried along with it.

This is why Redux uses vanilla data structures in its store, and why JSON has become popular for data interchange: any valid JSON can be converted into a JavaScript object with just `JSON.parse()`, which saves a tonne of ceremony and manual effort compared to the old-school way of having to extract data from formats with a poor programmatic fit, like an XML document, into concrete types.

If your data objects don't need to be serialized or accessed outside of the process boundary, then there's little benefit to using loosely-typed data structures, in which case my preference would be using classes in a static type system to benefit from the static analysis feedback that types provide.

[+] Nullabillity|9 years ago|reply
> as it doesn't require any additional knowledge of Types in order to access serialized data

You still need to know the shape of the data you're working with, or you won't get anything useful done. So you can't skip defining types or a format, you're just skipping the tools that help you follow said format.

[+] codedokode|9 years ago|reply
Maybe they use untyped hashes and arrays just because there are no other data structures in JS?
[+] mcms|9 years ago|reply
Whether anemic objects are harmful or harmless has been debated in software engineering for a long time.

I find over-relying on encapsulation more harmful than useful nowadays, especially if you are writing scalable software that is inherently distributed. For example, hiding database access behind a simple getter function makes another programmer ignore the performance implications and other issues that may arise.

[+] qwertyuiop924|9 years ago|reply
Yes, but OTOH, it lessens the likelihood of errors, and means you'll have to rewrite a minimal amount of code when you, say, switch from MySQL to Postgres.

Abstraction always lessens awareness of that which is abstracted. Decide where to draw the line for your app.

[+] Millennium|9 years ago|reply
It sounds to me like these arguments aren't so much against JSON, per se. They're against using JSON.parse() (or json.loads() in Python, json_decode() in PHP, or whatever) as your entire data-import process.

Instead, the argument goes, one should load the JSON, walk the resulting structure, and use it to build your native data structures/objects/whatever. Similarly, when the time comes to save, you crawl through your native structure to build a dict/array/primitive structure, then call JSON.stringify() (or the analogous function) to serialize that.
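In Python that round trip might look like this (the `Book` class and field names are illustrative):

```python
import json

class Book:
    def __init__(self, title, count):
        self.title = title
        self.count = count

def load_books(raw):
    # json.loads is only the transport step...
    records = json.loads(raw)
    # ...walking the result to build native objects is the import step.
    return [Book(r["title"], r["count"]) for r in records]

def dump_books(books):
    # Saving is the mirror image: crawl the objects back into
    # primitives, then serialize.
    return json.dumps([{"title": b.title, "count": b.count} for b in books])

books = load_books('[{"title": "SICP", "count": 3}]')
```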

Uncoupling your data structure from the serialization format, though, is really just basic good software design anyway, is it not? Does anyone argue in favor of what this article calls "JSON-driven development" as a design principle? Or is it just a shortcut that developers (and I am no less guilty of this than anyone else) sometimes take in the interest of getting a quick-and-dirty solution out the door?

Yes, working directly on the output of JSON.parse() is a code smell. But I'm not sure that claiming there's a rising trend of "JSON-driven development" is entirely founded. It's just people taking shortcuts.

[+] ramblenode|9 years ago|reply
This. "${PRACTICE}-driven development" suggests a practice that someone actively pursues because of perceived merit rather than a shortcut taken because of time/resource constraints.
[+] micimize|9 years ago|reply
While I see the point Ulaş is getting at, I wouldn't call this JSON-driven development. I think JSON-driven development would use abstraction layers that are based on JSON, like JSON Schema, and perhaps an OOP library that leverages it.

What I'd actually call this problem is a lack of abstraction. In functional programming, simple data structures are often preferred, and composable functions are used to manage complexity. A functional programmer might declare a function `to_structured_dict(enumerable, path)` and call it with `to_structured_dict(book_list, path=('shop_label', 'cell_label', 'book_id', 'count'))`.
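One hedged sketch of what such a `to_structured_dict` might do (the comment only names the function, so this implementation and the sample data are assumptions): nest a flat list of dicts by successive keys, with the last key in `path` becoming the leaf value.

```python
def to_structured_dict(enumerable, path):
    # All keys except the last are grouping levels; the last names the leaf.
    *group_keys, leaf_key = path
    result = {}
    for record in enumerable:
        node = result
        for key in group_keys[:-1]:
            node = node.setdefault(record[key], {})
        node[record[group_keys[-1]]] = record[leaf_key]
    return result

book_list = [
    {"shop_label": "A", "cell_label": "c1", "book_id": 7, "count": 2},
    {"shop_label": "A", "cell_label": "c2", "book_id": 9, "count": 1},
]
nested = to_structured_dict(
    book_list, path=("shop_label", "cell_label", "book_id", "count"))
```

The looping logic lives in one reusable function instead of being repeated wherever a different grouping is needed.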

[+] falcolas|9 years ago|reply
If you're in Python, and are afraid of "anemic" objects, I would recommend checking out collections.namedtuple. It's a fantastic lightweight and performant object-like data structure.

You also get a few additional features: in-order iteration, fields that are fixed when the class is created, and a method for turning an instance into an ordered dictionary (which is serializable in, wait for it, JSON).
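A quick demonstration of those features (the `Book` fields are invented for the example):

```python
import json
from collections import namedtuple

# Fields are fixed when the class is created, yet instances stay lightweight.
Book = namedtuple("Book", ["title", "isbn", "count"])

b = Book(title="SICP", isbn="0262510871", count=3)

# In-order iteration comes for free:
fields = list(b)

# And _asdict() yields a dict that serializes straight to JSON:
payload = json.dumps(b._asdict())
```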

[+] Singletoned|9 years ago|reply
> Once you go dict, you won't go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them.

How is this an argument against using dictionaries?

After 10 years of Python development, I do find myself using dictionaries rather than objects, in just the way that the author proscribes, but I'm finding it to be a genuine pleasure.

[+] Roboprog|9 years ago|reply
REPENT!!! :-)

Yeah, get shit done, YAGNI, and all that.

If it needs a wrapper, make one for repeated access patterns. If it needs one or more bona fide objects for passing around, updating, and general good behavior, then make them as needed. Otherwise, there are 15 other little jobs that need coding, and we gotta move on.

[+] st3v3r|9 years ago|reply
And what happens when one of those keys changes?
[+] beat|9 years ago|reply
I think this article is somewhat off-base. The problem isn't JSON, it's lack of respect for separation of duties. JSON is just a data exchange format.

Want to program in an OO way using JSON? Easy. Just build a factory to generate objects from JSON input. Put your validation and error handling right there. Now you can get a known valid object from the JSON, a class instance with all the encapsulation and business logic your heart desires. Need to share it with the outside world? Provide a JSON output method.
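A minimal sketch of that factory pattern (the `Order` class and its fields are invented for illustration):

```python
import json

class Order:
    def __init__(self, order_id, quantity):
        # Validation lives with construction, so a successfully built
        # Order is known to be valid everywhere downstream.
        if quantity < 1:
            raise ValueError("quantity must be positive")
        self.order_id = order_id
        self.quantity = quantity

    @classmethod
    def from_json(cls, raw):
        # The factory: JSON in, validated class instance out.
        data = json.loads(raw)
        return cls(order_id=data["order_id"], quantity=data["quantity"])

    def to_json(self):
        # Share with the outside world via a JSON output method.
        return json.dumps({"order_id": self.order_id, "quantity": self.quantity})

order = Order.from_json('{"order_id": 42, "quantity": 3}')
```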

Translating data formats is at the heart of day-to-day programming. It ain't rocket surgery. Fix the problem, not the blame.

(And if you think JSON sucks, believe me, you never dealt with data file formats from the pre-XML days!)

[+] wvenable|9 years ago|reply
The author isn't saying the problem is JSON.

The same programming style exists quite a bit in older PHP code as well. This is because one of its primary data types is a list/hashtable hybrid. And JSON is similar -- it promotes those same structural types: the list and the hashtable (array and object, respectively). So programmers are using them not just for building structures for data interchange, but for actual programming logic.

The fix for the problem is just education.

[+] nbevans|9 years ago|reply
There is so much wrong with this blog post that I don't even know where to begin. He appears to have included Python-specific details in his list of why he hates lists and dictionaries. Apparently Python throws exceptions if a key doesn't exist and seemingly has no Maybe/Option alternative? I don't know if that is true or not.

He claims using lists and dictionaries means you lose encapsulation - does it? A smarter programmer would realise that actually it entirely depends on the _types_ you are storing in those data structures.

[+] digisth|9 years ago|reply
The rule of thumb I've always used for when to use OO is "will there be more than one extant object at once or not?" If yes, and especially if these objects need real behavior, then use OO.

If you're essentially going through one object at a time, then discarding it, you may just be doing conduit data processing, and so there's little advantage to using objects. I think what's missing in this (well-written) analysis is that distinction; if you're slurping data from one place, making a few changes (or especially if you're not making any), then sticking it into a DB or vice versa, OO may be the wrong choice.

Ask yourself while writing the code: "are these active, behavior-driven objects that need encapsulation and relatively sophisticated behaviors, or is this just data I'm doing some relatively simple processing on?"

[+] corysama|9 years ago|reply
The author has lots of good points. Because I write most of my Python in the style he is advising against, I recognize that style has issues. The main issue for me is that a dict of dicts of dicts is not an interface. It doesn't have any constraints. It doesn't communicate expectations for use for the actual intent of the code. The best you can do is a comment explaining what to expect and a lot of error checking.

That said, almost all of the python I write these days is in the form of functional transforms on built-in data structures. And I love it!

There was a great Pycon2012 talk titled "Stop Writing Classes". You can find it linked and discussed here https://news.ycombinator.com/item?id=3717715

[+] agentgt|9 years ago|reply
On the one hand I agree with the OP that directly interacting with JSON is not really a good idea, but on the other hand I completely disagree that behavior should be shoved into data objects. Also, I think part of the problem is that Python doesn't have much typing (I know they recently added optional typing in Python, but I don't think many use it).

As more of an FP guy, I'm a firm believer in the separation of behavior and data. Clojure's Hickey sort of has a valid point... it's freaking data... stop making it complicated to access it.

[+] thesmallestcat|9 years ago|reply
It's called a hash. Or a dict. Or a map. Not JavaScript Object Notation, FFS.
[+] spullara|9 years ago|reply
At this point everyone should be using an evolvable (thrift, protocol buffers, avro, etc) schema format when they are storing or transmitting their data if they want to run an always on service - there is no downtime for migrations in the real world. Trying to do this ad-hoc with JSON is a lost cause and will eventually lead you to failure at runtime or worse, data loss situations.
[+] crucini|9 years ago|reply
JSON isn't un-evolvable. In fact, thrift can serialize to JSON.

What makes thrift evolvable in practice is that we don't remove fields and don't add mandatory fields. The same discipline can be applied to JSON definitions.

Well thrift also tags all fields with integers, so a consumer with an older schema can parse a record with a newer schema, skipping the new fields. Of course JSON trivially has this property.
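That skip-unknown-fields property is easy to see in a sketch (the record shape here is made up): a consumer built against an older schema keeps working when a producer with a newer schema adds a field.

```python
import json

# A producer with a *newer* schema adds an "edition" field:
new_record = json.dumps({"id": 1, "title": "SICP", "edition": 2})

# A consumer built against the *older* schema simply never looks at keys
# it has never heard of -- JSON's equivalent of skipping unknown fields:
def parse_old(raw):
    data = json.loads(raw)
    return {"id": data["id"], "title": data["title"]}

old_view = parse_old(new_record)
```

The discipline that has to be enforced by convention is the other direction: never remove a field the old consumers read, and never add a new mandatory one.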

Maybe the key here is "ad-hoc"; something like JSON-schema is needed.

[+] mixmastamyk|9 years ago|reply
Anyone have a good blog post handy on this?
[+] jjzieve|9 years ago|reply
At least lists and dictionaries map relatively well to a tabular (SQL) format. Objects don't map well at all! Anyone who's spent enough time with "mature" ORMs knows this. Especially when there's a deadline and you have to write "native" SQL just to get whatever the hell you needed in the first place. "Well maybe you should have read everything and understood the ORM in its most minute detail..." NO! That's the whole point of abstraction! If I understood everything about that code, I'd be better off re-writing it to better suit MY specific problem. Look, I don't want to be another OO basher. OO definitely has a place in complex systems like game development, where the lives of the objects are longer than a page refresh. But in web dev, it's becoming increasingly obvious to me that the OO paradigm is a huge time suck. /rant
[+] okreallywtf|9 years ago|reply
I feel like we have this discussion at work daily involving nhibernate. It is an abstraction that makes 80% of work quicker and easier, but what it makes easier and cleaner would have been trivial anyways.