top | item 30295423

(no title)

etirk | 4 years ago

Yes, an EAV pattern is being used. The end result in query speed with the particular table designs and indexes is essentially just like what index intersection gives you but without the setup. And of course EAV is not a pattern that you can easily roll out by hand when you need it.

Please try this demo which is a sample of some queries against Clinical Trials data. https://schemafreesql.com/demo.html#clinicalTrial Although this demo data set is a limited size, the same queries when run against the entire clinicalTrial dataset performed very respectably.

discuss

mdellavo|4 years ago

EAV is a pattern you can easily roll out - https://docs.sqlalchemy.org/en/14/orm/examples.html#module-e...

Another commenter illustrated some of the issues you will run into with this pattern at scale.

It would probably help to benchmark some complex queries on a sizable data set. And compare against mongo, postgres jsonb, vanilla eav, clickhouse, etc. Without much information to go on, it's hard to know what this is.

etirk|4 years ago

I agree with getting some benchmarks up there. Until that's done, I can share what some preliminary tests revealed. Just to give you more of feel for where this tool might fit.

We imported the complete Clinical Trials dataset (380k docs ~ 200 attributes each) into SFSQL, Mongo and a Postgres JSON column.

Import speed with raw documents: Mongo and Postgres win hands down. Why? Very little processing to be done on their parts, while SFSQL unravels the structures and stores everything with indexes already in place. Excluded use cases for SFSQL? high-volume logging, data-sinks, etc.

Query speed against raw document elements: Mongo and Postgres very fast, SFSQL respectable. Why? Mongo is optimized for querying raw documents, Postgres obviously did their work as well. Excluded use cases for SFSQL? storage of raw, unprocessed json documents.

Then we extracted all unique instances of a particular attribute from that data and put them into their own collection in mongo and it's own table in Postgres. The number of distinct objects extracted to external collection/table was just 11. Then we modified the queries so that they JOINed to the external collection/table. Result: SFSQL still respectable (nothing changed internally or speed wise). Postgres and mongo displayed a huge slowdown (and this was just a single join). Included use cases for SFSQL? complex relational/referential data.