top | item 46244858

(no title)

ellisv | 2 months ago

I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.

It's much harder to setup proper indexes, enforce constraints, and adds overhead every time you actually want to use the data.

discuss

order

nh2|2 months ago

JSON columns shine when

* The data does not map well to database tables, e.g. when it's tree structures (of course that could be represented as many table rows too, but it's complicated and may be slower when you always need to operate on the whole tree anyway)

* your programming language has better types and programming facilities than SQL offers; for example in our Haskell+TypeScript code base, we can conveniently serialise large nested data structures with 100s of types into JSON, without having to think about how to represent those trees as tables.

cies|2 months ago

You do need some fancy in-house way to migrate old JSONs to new JSON in case you want to evolve the (implicit) JSON schema.

I find this one of the hardest part of using JSON, and the main reason why I rather put it in proper columns. Once I go JSON I needs a fair bit of code to deal with migrartions (either doing them during migrations; or some way to do them at read/write time).

tracker1|2 months ago

I find that JSON(B) works best when you have a collection of data with different or variant concrete types of data that aren't 1:1 matches. Ex: the actual transaction result if you have different payment processors (paypal, amazon, google, apple-pay, etc)... you don't necessarily want/care about having N different tables for a clean mapping (along with the overhead of a join) to pull the transaction details in the original format(s).

Another example is a classifieds website, where your extra details for a Dress are going to be quite a bit different than the details for a Car or Watch. But, again, you don't necessarily want to inflate the table structure for a fully normalized flow.

If you're using a concretely typed service language it can help. C# does a decent job here. But even then, mixing in Zod with Hono and OpenAPI isn't exactly difficult on the JS/TS front.

dzonga|2 months ago

Yeah document formats (jsonb) are excellent for apps etc that interface with the messy real world. ecommerce, gvt systems etc, anything involving forms, payments etc

tryna map everything in a relational way etc - you're in a world of pain

crazygringo|2 months ago

For very simple JSON data whose schema never changes, I agree.

But the more complex it is, the more complex the relational representation becomes. JSON responses from some API's could easily require 8 new tables to store the data in, with lots of arbitrary new primary keys and lots of foreign key constraints, your queries will be full of JOIN's that need proper indexing set up...

Oftentimes it's just not worth it, especially if your queries are relatively simple, but you still need to store the full JSON in case you need the data in the future.

Obviously storing JSON in a relational database feels a bit like a Frankenstein monster. But at the end of the day, it's really just about what's simplest to maintain and provides the necessary performance.

And the whole point of the article is how easy it is to set up indexes on JSON.

jasonthorsness|2 months ago

When a data tree is tightly coupled (like a complex sample of nested data with some arrays from a sensor) and the entire tree is treated like a single thing by writes, the JSON column just keeps things easier. Reads can be accelerated with indexes as demonstrated here.

whizzter|2 months ago

I fully agree that's wrong (can't imagine the overhead of some larger tables I have if that had happened), that said, often people want weird customizations in medium-sized tables that would set one on a path to having annoying 100 column tables if we couldn't express customizations in a "simple" JSON column (that is more or less polymorphic).

Typical example is a price-setting product I work on.. there's price ranges that are universal (and DB columns reflect that part) but they all have weird custom requests for pricing like rebates on the 3rd weekend after X-mas (but only if the customer is related to Uncle Rudolph who picks his nose).

fauigerzigerk|2 months ago

But if you have to model those custom pricing structures anyway, the question what you gain by not reflecting them in the database schema.

There's no reason to put all those extra fields in the same table that contains the universal pricing information.

konart|2 months ago

Normalisation brings its own overhead though.