(no title)
dandermotj | 8 years ago
I'd recommend the paper "What Goes Around Comes Around"[1], the first paper in Readings in Database Systems[2].
[1] https://scholar.google.com/scholar?cluster=73661829057771494...
[2] redbook.io
wvenable|8 years ago
I still can't imagine what sparse heterogeneous data exists in the world that makes sense to store. Any type of querying or processing requires some kind of structure (even if implicit in the code) which you can just put in different table structures.
You have to make sense of data to process it and that kind of implies a structure, doesn't it? Am I missing some obvious example of heterogeneous data?
threeseed|8 years ago
One customer column, tens of thousands of attribute columns.
If you need everything about a customer, it is a single O(1) fetch, which makes it perfect for driving chat bots, call centres, websites, operational decisioning engines, dashboards, etc. Almost every large company will have one of these.
You can't really do it properly in relational systems because (a) you hit the column limit, (b) it is often sparse, i.e. lots of NULLs everywhere, and (c) you need the system to be distributed, since it often gets a lot of load.
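To make the access pattern above concrete, here is a minimal sketch in plain Python, not any particular database. The customer IDs, attribute names, and the `fetch_customer` helper are all made up for illustration. Each customer holds only the attributes that have values, and fetching everything about one customer is a single key lookup.

```python
from typing import Any

# Hypothetical sparse "wide row" store: one entry per customer,
# holding only the attributes that actually have a value.
# Absent attributes are simply not stored, so there are no NULLs to pad.
customers: dict[str, dict[str, Any]] = {
    "cust-1": {"name": "Ada", "plan": "pro", "last_ticket_id": 9041},
    "cust-2": {"name": "Lin", "newsletter_opt_in": True},
}


def fetch_customer(customer_id: str) -> dict[str, Any]:
    """Return every known attribute for one customer in a single lookup.

    A dict lookup is O(1) on average; an unknown ID yields an empty dict.
    """
    return customers.get(customer_id, {})


profile = fetch_customer("cust-1")
```

A real deployment would put this behind a distributed key-value or document store; the dict only stands in for the "one key, all attributes" shape.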
growse|8 years ago
bpicolo|8 years ago
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...
threeseed|8 years ago
EAVT is great as an intermediate format, but it is absolutely useless to query, since most of the time you are trying to find a set of attributes for a given entity, i.e. a full table scan.
What you want is a "wide table". One entity column and all the attribute columns to the right. Often with most of the values set to null.
This is the dream use case for MongoDB, since you can ignore sparse values, yet when you query it via their drivers it will appear as a wide table. You can't do this at all in PostgreSQL since you will hit a column limit.
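A rough sketch of the difference between the two layouts, in plain Python rather than MongoDB or PostgreSQL. The sample rows and the helper names (`attributes_from_eav`, `pivot_to_documents`, `as_wide_rows`) are hypothetical, and the EAV lookup assumes no index on the entity column.

```python
from collections import defaultdict
from typing import Any

# Hypothetical EAV rows: (entity, attribute, value).
eav_rows: list[tuple[str, str, Any]] = [
    ("cust-1", "name", "Ada"),
    ("cust-1", "plan", "pro"),
    ("cust-2", "name", "Lin"),
    ("cust-2", "newsletter_opt_in", True),
]


def attributes_from_eav(rows, entity: str) -> dict[str, Any]:
    """Collect one entity's attributes by scanning every row (no index assumed)."""
    return {attr: value for ent, attr, value in rows if ent == entity}


def pivot_to_documents(rows) -> dict[str, dict[str, Any]]:
    """Group EAV rows into one sparse document per entity."""
    docs: dict[str, dict[str, Any]] = defaultdict(dict)
    for ent, attr, value in rows:
        docs[ent][attr] = value
    return dict(docs)


def as_wide_rows(docs):
    """Present sparse documents as a wide table, filling gaps with None."""
    columns = sorted({attr for doc in docs.values() for attr in doc})
    table = {ent: [doc.get(col) for col in columns] for ent, doc in docs.items()}
    return columns, table


docs = pivot_to_documents(eav_rows)
columns, wide = as_wide_rows(docs)
```

The documents store only what exists, while `as_wide_rows` shows the "one entity column plus all attribute columns" view with the missing values filled in as `None`.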