top | item 30295681

(no title)

mdellavo | 4 years ago

EAV is a pattern you can easily roll out - https://docs.sqlalchemy.org/en/14/orm/examples.html#module-e...

Another commenter illustrated some of the issues you will run into with this pattern at scale.

It would probably help to benchmark some complex queries on a sizable data set. And compare against mongo, postgres jsonb, vanilla eav, clickhouse, etc. Without much information to go on, it's hard to know what this is.

discuss

etirk|4 years ago

I agree with getting some benchmarks up there. Until that's done, I can share what some preliminary tests revealed. Just to give you more of feel for where this tool might fit.

We imported the complete Clinical Trials dataset (380k docs ~ 200 attributes each) into SFSQL, Mongo and a Postgres JSON column.

Import speed with raw documents: Mongo and Postgres win hands down. Why? Very little processing to be done on their parts, while SFSQL unravels the structures and stores everything with indexes already in place. Excluded use cases for SFSQL? high-volume logging, data-sinks, etc.

Query speed against raw document elements: Mongo and Postgres very fast, SFSQL respectable. Why? Mongo is optimized for querying raw documents, Postgres obviously did their work as well. Excluded use cases for SFSQL? storage of raw, unprocessed json documents.

Then we extracted all unique instances of a particular attribute from that data and put them into their own collection in mongo and it's own table in Postgres. The number of distinct objects extracted to external collection/table was just 11. Then we modified the queries so that they JOINed to the external collection/table. Result: SFSQL still respectable (nothing changed internally or speed wise). Postgres and mongo displayed a huge slowdown (and this was just a single join). Included use cases for SFSQL? complex relational/referential data.

mdellavo|4 years ago

what about applying a GIN index on the JSONB?