top | item 26912378

hpeinar | 4 years ago

Very interesting topic!

I feel like this would be the right time to ask for advice on doing something similar for user search results with PostgreSQL (v11).

E.g. a user "subscribes" to product searches for "Women - Nike - Size M" and the system sends her a daily notification or email if there are new results within her filter.

How would one solve this kind of subscription logic? So far what I've come up with is to save and re-run the user queries from the API side, storing the last result's primary ID in a separate table so I know which results are to be considered new. (That way I can add a primary-key filter to the query to reduce how many products the DB has to go through, e.g.: "SELECT id FROM products WHERE [user filters] AND id > 1234".)
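For what it's worth, that approach can be sketched roughly like this (using Python's sqlite3 as a stand-in for Postgres; the `saved_searches` table, its columns, and the sample filters are all made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, brand TEXT, size TEXT);
    -- One row per subscription: the saved filter plus the last id already seen.
    CREATE TABLE saved_searches (
        user_id INTEGER, brand TEXT, size TEXT, last_seen_id INTEGER
    );
    INSERT INTO products VALUES (1, 'Nike', 'M'), (2, 'Adidas', 'S'), (3, 'Nike', 'M');
    INSERT INTO saved_searches VALUES (42, 'Nike', 'M', 1);
""")

def new_matches(search):
    user_id, brand, size, last_seen = search
    # Re-run the saved filter, restricted to ids newer than the last run.
    rows = conn.execute(
        "SELECT id FROM products WHERE brand = ? AND size = ? AND id > ? ORDER BY id",
        (brand, size, last_seen),
    ).fetchall()
    if rows:
        # Advance the high-water mark so the next run only sees newer rows.
        conn.execute(
            "UPDATE saved_searches SET last_seen_id = ? WHERE user_id = ?",
            (rows[-1][0], user_id),
        )
    return [r[0] for r in rows]

for search in conn.execute("SELECT * FROM saved_searches").fetchall():
    print(new_matches(search))  # only product 3 is new for user 42
```

One caveat with a plain `id > last_seen_id` watermark: it assumes rows only ever become visible in primary-key order, which long-running transactions can violate; a created-at timestamp with a small overlap window is a common alternative.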

But it doesn't feel right to bombard the DB with 10k queries on a daily basis to achieve this; then again, maybe I'm overthinking it and this is viable.

I looked through some of the comments mentioning NOTIFY / LISTEN, but I don't think I could viably use it, as I can't apply per-user filters to it at the DB level.

Note: I'm looking for general ideas / things I should consider. I'm not expecting anyone to do my work.

vaughan | 4 years ago

It's probably the simplest solution if you are running a batch job once per day. Just ensure you are running one query per filter set and have your indexes set up correctly.

Maybe you could speed up your queries by using a materialized view for `FROM products WHERE id > 1234`. You could then derive further materialized views for other attributes depending on the clustering of your watched queries, e.g. gender might split the dataset in half. But this is not guaranteed to improve perf, and indexes might actually be faster; plus there are space concerns with additional materialized views and indexes.

If you want to run fewer queries you could combine queries like [Hasura does][1].

To avoid running queries with no new results you would need to watch DB writes and map DML statements to query subscriptions... but that's pointless in a batch setting.

[1]: https://github.com/hasura/graphql-engine/blob/master/archite...

eloff | 4 years ago

Because this can be done in a background batch job during off-peak hours, there's no harm in throwing 10K queries at the database. You should be able to run 100x that on commodity cloud hardware without noticeably impacting performance, provided you take care not to throw queries at the DB at the max rate it can handle.
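That pacing point can be sketched in a few lines: run the saved queries in chunks with a pause between chunks so foreground traffic keeps priority. The chunk size and sleep interval below are arbitrary placeholders, and `execute` is whatever function actually runs one saved query against your DB:

```python
import time

def run_batch(queries, execute, per_chunk=100, pause_s=0.05):
    """Run saved queries in fixed-size chunks, sleeping between chunks
    so the batch job never saturates the database."""
    results = []
    for i in range(0, len(queries), per_chunk):
        for q in queries[i:i + per_chunk]:
            results.append(execute(q))
        time.sleep(pause_s)  # yield to foreground traffic between chunks
    return results

# Usage: with 10k saved searches this spreads the load out over time
# instead of firing everything at once.
executed = run_batch(list(range(250)), lambda q: q, per_chunk=100, pause_s=0.0)
```

A connection-pool limit on the batch worker achieves much the same thing and is often simpler in practice.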

The simple approach is best here IMHO.

nicoburns | 4 years ago

You could probably find some way to combine every subscription into a single query (by storing the query parameters themselves in the database).
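A minimal sketch of that idea, assuming the filters are simple equality predicates stored in a hypothetical `saved_searches` table (sqlite3 standing in for Postgres): one join answers every subscription at once, instead of one query per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, brand TEXT, size TEXT);
    CREATE TABLE saved_searches (
        user_id INTEGER, brand TEXT, size TEXT, last_seen_id INTEGER
    );
    INSERT INTO products VALUES (1, 'Nike', 'M'), (2, 'Adidas', 'S'), (3, 'Nike', 'M');
    INSERT INTO saved_searches VALUES (42, 'Nike', 'M', 1), (7, 'Adidas', 'S', 0);
""")

# Join each saved filter against the products that match it and are
# newer than that subscription's watermark -- a single query for all users.
rows = conn.execute("""
    SELECT s.user_id, p.id
    FROM saved_searches s
    JOIN products p
      ON p.brand = s.brand
     AND p.size  = s.size
     AND p.id    > s.last_seen_id
    ORDER BY s.user_id, p.id
""").fetchall()
print(rows)  # [(7, 2), (42, 3)]
```

This only works cleanly while the filters stay homogeneous (same columns, equality-only); once users can mix arbitrary predicates, you're back to per-filter-set queries or something Hasura-like that batches compatible subscriptions.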