top | item 40505638

andrscyv | 1 year ago

So a super complicated workaround instead of just doing SQL queries or using a query builder???

tedunangst|1 year ago

Some of us just aren't smart enough for sql. I'm perpetually running into the situation where I want to join one table with another that has multiple rows. Like a blog post with tags. Exactly like this: https://stackoverflow.com/questions/8201462/join-with-anothe...

For which the answer is oh, just use GROUP_CONCAT, which isn't even SQL. And I've still got to fix it up by running split when I get it back. Nor does it work particularly well if you're joining against something that isn't a single string column.

So I just bang the rocks together like a caveman and write a second query to go back and fetch all the tags, then loop around assign them to the matching posts.

superice|1 year ago

This seems fundamentally unfixable with current ORMs. You have to pick between lazy loading per entity, or eager loading with a Cartesian product of everything, which breaks badly if you are dealing with multiple one-to-many relations.

Our solution was to write our own ORM-like system that “remembers” which entities you’ve loaded within the context of the transaction, and then will fetch the relation for all of them. So if you access a1.b then it will also fetch a2.b and cache it if you loaded a2 within that same transaction. The call to a2.b will then resolve instantly. So instead of n queries with n being the number of entities loaded in lazy loading, you’re doing n queries with n being the number of relations touched.

The one bad failure case is if you update entities one-by-one while also accessing their relations in a loop, since you invalidate the cache for that entity type on every write.
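A minimal sketch of that idea, with hypothetical names (a toy `Session` standing in for the per-transaction context, and a list of rows standing in for the database): accessing one entity's relation batch-fetches it for every entity loaded in the same transaction, and any write drops the cache.

```python
from collections import defaultdict

class Session:
    """Toy per-transaction context with batch relation loading."""

    def __init__(self, tag_rows):
        self.loaded_posts = []    # every post loaded in this "transaction"
        self.tag_rows = tag_rows  # stand-in for the database table
        self.tag_cache = None     # post_id -> [tags], filled on first access

    def load_posts(self, posts):
        self.loaded_posts.extend(posts)
        return posts

    def tags_for(self, post_id):
        if self.tag_cache is None:
            # One "query" fetches the relation for *all* loaded posts at once,
            # so n queries scales with relations touched, not entities loaded.
            ids = {p["id"] for p in self.loaded_posts}
            self.tag_cache = defaultdict(list)
            for pid, tag in self.tag_rows:
                if pid in ids:
                    self.tag_cache[pid].append(tag)
        return self.tag_cache[post_id]

    def update_post(self, post_id, **fields):
        # The failure case from above: any write invalidates the cache,
        # so update-then-access in a loop refetches every iteration.
        self.tag_cache = None
```

The first `tags_for` call pays for one batch fetch; every later call in the same transaction resolves from the cache.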

Mavvie|1 year ago

I think that's exactly correct. You either do split queries (with more latency) or you do a join (and risk Cartesian explosion). Most ORMs should do this for you.

tristor|1 year ago

SQL is easy to understand from the perspective of syntax, but depending on the data model it quickly becomes challenging to work out how to actually get what you want out of a query. I have spent many years working in SQL across different server products (PostgreSQL, MySQL, Microsoft SQL Server, and others), and I still sometimes find myself struggling to build the most efficient query for a task. Most of the time, it's not SQL that's the problem, it's a broken data model that preceded me (often created by an ORM) that I am forced to work with.

All that said, while it may be painful sometimes, learning it can at least let you poke your ORM more towards the right direction, or give you an opportunity to identify the hot spots where it's worth investing the effort to write a more efficient query directly, much like where sometimes it makes sense to write some subset of an application in assembly for performance, while most of the time you're better off just letting the compiler optimize for you.

For the problem you're talking about above, you're trying to avoid a Cartesian explosion, which is probably best handled through query splitting as you mentioned, but it depends on the database engine how best to approach that. For all its warts, Microsoft SQL Server is pretty good about things like the ability to use query splitting within the context of a single transaction to make it serializable and ensure data consistency (although that can come with its own challenges for performance).

The example you provided is nearly exactly the case where I would expect an ORM to produce a bad query, whereas a directly built query would be significantly more performant, because you can incorporate a clear understanding of the data model and use limits and ordering to minimize the size of the result set you need to deal with.
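One concrete way the hand-built query stays cheap: limit and order in a subquery *before* joining, so the join only ever sees the rows you actually want. A sketch with sqlite3 and hypothetical tables (an ORM, by contrast, will often join first and trim afterwards):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT);
    CREATE TABLE tags (post_id INTEGER, name TEXT);
    INSERT INTO posts VALUES
        (1, 'old', '2023-01-01'), (2, 'mid', '2023-06-01'), (3, 'new', '2024-01-01');
    INSERT INTO tags VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# The subquery applies ORDER BY / LIMIT first, so the join works on
# 2 posts instead of the whole table.
rows = conn.execute("""
    SELECT p.id, p.title, t.name
    FROM (SELECT id, title FROM posts ORDER BY created_at DESC LIMIT 2) AS p
    JOIN tags t ON t.post_id = p.id
""").fetchall()
```

The same shape works in PostgreSQL and MySQL; the point is that knowing the data model lets you choose where the limit applies.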

nurettin|1 year ago

> Some of us just aren't smart enough for sql.

I've heard this all my career. And I went through several phases with SQL including the banging rocks phase.

No, some of us just haven't had time to improve. I wrote my first SQL in the early 2000s. I wasn't aware of window functions and lateral joins until 10 years later. Took me another couple of years to start using CTEs.

I don't even write SQL that much. I know enough SQL to make ORMs just do what I want.
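For anyone who hasn't met them yet, here's a small sketch of the two features mentioned, via Python's sqlite3 with a made-up `sales` table (window functions need SQLite 3.25+, which ships with modern Python builds): a CTE names an intermediate result, and the window function ranks rows within each group without collapsing them the way GROUP BY would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 10), ('east', 30), ('west', 20), ('west', 5);
""")

# CTE ("WITH ranked AS ...") + window function (ROW_NUMBER OVER ...):
# find the single biggest sale per region.
rows = conn.execute("""
    WITH ranked AS (
        SELECT region, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sales
    )
    SELECT region, amount FROM ranked WHERE rn = 1 ORDER BY region
""").fetchall()
```

"Top row per group" is exactly the kind of query that is painful with plain GROUP BY and trivial with a window function.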

lainga|1 year ago

Or else, if the post is talking about a "public-facing API resource", can someone tell me why the API wouldn't implement querying for multiple of the same record type at once? It just seems to me that choosing between getting 1 owner, and "get ALL owners" (as TFA puts it), is like a law of the excluded middle

metadat|1 year ago

It's tricky because in some cases you might be able to batch all the queries up front, but in others you will only know the IDs you need to fetch after you get one or more intermediate results back and apply business logic to arrive at a decision.

As of today there's no silver bullet beyond having a solid and principles-first understanding of your database and related infrastructure.

Ozzie_osman|1 year ago

This might be true for some APIs, but if you're letting the caller control what they fetch, you don't really have that luxury. An extreme example is GraphQL-based APIs, but even REST-based APIs might allow some flexibility in what callers fetch.

sgarland|1 year ago

That was my first thought as well. Devs will do anything to avoid learning SQL.

You’re spending 20x the code and probably 1000x the time coming up with this nonsense, and for what – so you can say there isn’t any raw SQL in your code base? It’s as ridiculous as denouncing someone for occasionally dropping into Assembly for a specific purpose, or writing a bit of C to call from Python, etc.

stephen|1 year ago

I can assure you that brandur knows SQL. :-)

(I don't know him personally, but have been following his blog for years.)

What these "just write SQL" rants are missing is encapsulation. Let's say you've got a piece of business logic, like "if the account is disabled, render the account name as 'Foo (disabled)'".

You want to write this logic in your preferred backend language, Go/TS/C/etc.

This works fine, in the /account/X endpoint (just load the account, and apply the logic).

But now what about the /accounts/client:Y endpoint (load all of a client's accounts, apply the logic to each of them)?

As time goes by, you end up having 10-20 endpoints that all "return some part of account" as their payload, and you want the same "Foo (disabled)" business logic.

Your options are:

1. Build a single, giant SQL statement that strings together every snippet of business logic applicable to this endpoint (bulk friendly b/c the db is doing all the cross-entity work w/joins)

2. Push the business logic down into the db layer (simple for this case, just string concatenation with an if, but what about anything that needs a loop? doable in SQL but tedious)

3. Use your language's abstractions, like functions, to organize the business logic (what Brandur is attempting to do).

Nearly everyone wants to do 3, because 1 doesn't scale as your business logic becomes more & more sophisticated (copy/pasting it around every endpoint's single giant SQL statement, or pushing it down into the db as views/stored procedures).
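Option 3 in miniature (hypothetical names, plain Python): the "(disabled)" rule lives in exactly one function, and every endpoint that returns some slice of account data calls it, whether it loaded one account or a hundred.

```python
def display_name(account: dict) -> str:
    # The one place the "(disabled)" business rule lives.
    name = account["name"]
    return f"{name} (disabled)" if account["disabled"] else name

# /account/X: one account, apply the rule once.
def render_account(account: dict) -> dict:
    return {"id": account["id"], "name": display_name(account)}

# /accounts/client:Y: many accounts, same function, no logic copy-pasted
# into each endpoint's SQL.
def render_accounts(accounts: list[dict]) -> list[dict]:
    return [render_account(a) for a in accounts]
```

When the rule changes, it changes in `display_name` once, instead of in 10-20 endpoints' SQL statements.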

ec109685|1 year ago

Without two phases (figure out what you need for the resource, then execute the query), how can you avoid the N+1 problem, raw SQL or not?

unrealhoang|1 year ago

TFA is using raw SQL (via sqlc), if you read carefully. And no, raw SQL doesn't get rid of N+1 issues.