retakeming | 10 months ago

I'm one of the pg_search maintainers. Hello! A few thoughts.

First, both strategies (the one outlined by the Neon/ParadeDB article and the one used here) are presented as viable alternatives by the Postgres docs: https://www.postgresql.org/docs/current/textsearch-tables.ht....
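For readers who have not seen that docs page, here is a minimal sketch of the two approaches it describes, as I understand them: indexing the `to_tsvector()` expression directly, versus storing the tsvector in its own column and indexing that. The Python helpers only build the DDL strings. The table name `logs`, the generated index and column names, and the helper functions themselves are hypothetical, and the generated-column form assumes Postgres 12 or later.

```python
# Hypothetical table and column names, used only for illustration.
TABLE = "logs"
COLUMN = "message"


def expression_index_ddl(table, column, config="english"):
    """Strategy A: index the to_tsvector() expression; no extra column is stored."""
    return [
        f"CREATE INDEX {table}_{column}_fts_idx ON {table} "
        f"USING GIN (to_tsvector('{config}', {column}));"
    ]


def stored_column_ddl(table, column, config="english"):
    """Strategy B: materialize the tsvector in a generated column, then index it."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {column}_tsv tsvector "
        f"GENERATED ALWAYS AS (to_tsvector('{config}', coalesce({column}, ''))) STORED;",
        f"CREATE INDEX {table}_{column}_tsv_idx ON {table} USING GIN ({column}_tsv);",
    ]


if __name__ == "__main__":
    # Print both variants so they can be compared side by side.
    for stmt in expression_index_ddl(TABLE, COLUMN) + stored_column_ddl(TABLE, COLUMN):
        print(stmt)
```

Strategy B duplicates the text as a tsvector on disk, while strategy A stores nothing extra but requires queries to repeat the exact indexed expression.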

Second, as the article correctly demonstrates, the problem with Postgres FTS isn't "how can I pick and optimize a single pre-defined query"; it's "how do I bring Postgres to Elastic-level performance across a wide range of real-world boolean, fuzzy, faceted, relevance-ranked, etc. queries?"

`pg_search` is designed to solve the latter problem, and the benchmarks were made to reflect that. You can always cherry-pick a query and optimize it at the expense of data duplication and complexity. The Neon/ParadeDB benchmarks contained 12 queries in total, and the benchmarks could have:

- Created composite b-tree indexes for each of the queries with boolean predicates

- Extracted all the text fields from JSONBs, then stored and indexed them as separate columns for queries against JSONB

But that's not realistic for many real-world use cases. `pg_search` doesn't require that - it's a simple index definition that works for a variety of "Elastic style" queries and Postgres types and doesn't ask the user to duplicate every text column.
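To make the two per-query optimizations listed above concrete, here is a rough sketch of the DDL they would involve. Everything named here is an assumption for illustration: the table `logs`, the helper functions, and the generated index and column names. The column names `severity`, `country`, and `metadata` are borrowed from the benchmark description quoted elsewhere in this thread.

```python
# Hypothetical table name; the helpers below only build DDL strings.
TABLE = "logs"


def composite_btree_ddl(table, columns):
    """One composite b-tree index covering the columns of a boolean predicate."""
    name = f"{table}_{'_'.join(columns)}_idx"
    return f"CREATE INDEX {name} ON {table} ({', '.join(columns)});"


def jsonb_text_column_ddl(table, jsonb_column, key):
    """Copy one JSONB text field into its own stored column, then index it for FTS."""
    new_col = f"{jsonb_column}_{key}"
    return [
        f"ALTER TABLE {table} ADD COLUMN {new_col} text "
        f"GENERATED ALWAYS AS ({jsonb_column} ->> '{key}') STORED;",
        f"CREATE INDEX {table}_{new_col}_fts_idx ON {table} "
        f"USING GIN (to_tsvector('english', {new_col}));",
    ]


if __name__ == "__main__":
    print(composite_btree_ddl(TABLE, ["severity", "country"]))
    for stmt in jsonb_text_column_ddl(TABLE, "metadata", "value"):
        print(stmt)
```

Each additional query shape or JSONB field would need its own statement of this kind, which is the duplication and complexity cost being described.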

lostb1t|10 months ago

just an fyi: The blog link in your readme does not work.

stuhood|10 months ago

Thanks for reporting this! I'm having trouble finding the link you are referring to though. Would you mind sharing a link to the file/page containing the dead link?

cryptonector|10 months ago

From the blog about pg_search linked by TFA:

  This is what we did: 

    DB with pg_search: We created a single BM25 index 
    DB without pg_search: We created all these indexes
        GIN index on message (for full-text search)
        GIN index on country (for text-based filtering)
        B-tree indexes on severity, timestamp, and metadata->>'value' (to speed up filtering, ordering, and aggregations)

See the problem? You didn't create an index on the vector in the without-pg_search case. You didn't compare apples to apples. TFA is all about that.

Perhaps you can argue that creating a fastupdate=on index would have been the right comparison, but you didn't do that in that blog.

> You can always cherry-pick a query and optimize it at the expense of data duplication and complexity. The Neon/ParadeDB benchmarks contained 12 queries in total, and the benchmarks could have:

TFA isn't cherry-picking to show you that one query could have gone faster. TFA is showing that you didn't compare apples to apples. Looking at those 12 queries, nothing screams at me that TFA's approach of storing the computed tsvector wouldn't work for those too.

Perhaps pg_search scales better and doesn't require trading off update for search performance, and that would be a great selling point, but why not just make that point?

supermatt|10 months ago

Why so angry?

> "You didn't ..."

No, they didn't. They aren't Neon and didn't do the benchmarks in the linked article. They are a pg_search maintainer.

If you actually read their comment instead of raging, you will see that they are saying that pg_search is a simple generic index definition that makes a _variety_ of queries work with little effort, and that you can still add the additional optimisations (which are already documented, and which they linked to) where needed.

wodenokoto|10 months ago

Are you sure parent is the author of that blog post?

Maybe I’m reading the whole thread wrong, but it looks like you are screaming at a maintainer of pg_search because someone else did a poor benchmark.

moogleii|10 months ago

There is an art to communicating that I think people learn around their college years...