top | item 30852439

(no title)

Brakenshire | 3 years ago

How do you build your intuition about creating queries in a way to avoid this?

Is it a matter of having a conceptual model of relational algebra and the way the different db engines work, or is it more an accumulation of heuristics over time for what probably will cause problems, and an iterative process of using EXPLAIN, adjusting the query and seeing what happens?

discuss

order

tluyben2|3 years ago

I am old :) But experience is a big thing; I can sniff by just skimming the table definitions where probably something is very wrong. In uni in the 90s I studied both relational theory and formal methods and I had to spend a lot of time figuring out and fixing complexity; if you take a university level book on big O complexity and work through it, you will have a good feeling what software can and cannot do and in what way. That has not really changed; we have more efficient and more cache, we have improved algorithms, but things that cannot be looked up in O(1) are still dangerous and possibly can incur enormous IO even with only a few million records. Naive developers see that things are blazingly fast locally on their laptops and that’s it. I have met, especially in the last few years (In my bubble this is getting worse, quite fast), quite a lot of lead devs that actually do not know what an index is for and so I see entire dbs without any or only on the id field. I know people (for some reasons) do not like ORMs that create tables and indexes, but it would prevent many rookie mistakes if they did.

jiggawatts|3 years ago

I can’t remember the last time I came across a non-CotS database schema that has secondary indexes in a significant number. Like more than half a dozen for a hundred tables or more.

I’ve never seen a database use “advanced” features like clustered columnstore or even just page compression.

I just have an email in my inbox from this morning from a small vendor that “doesn’t recommend” columnstore for a database containing 10 TB of numeric metrics in one table.

That would compress to a few gigabytes and query times would go from minutes to milliseconds.

But they “don’t support it”.

Which I now translate to: “we haven’t even flipped through the manual and when we googled it in a panic we didn’t understand it.”

This is how your data is being managed at huge enterprises and government agencies around the world.

pfarrell|3 years ago

First realize that SQL is not a procedural language, you are only describing the result set. The data store will then create an execution plan which is the actual code that gets run. Learn to read the explain in your data store of choice (very few swe do this). If you have access to a database administrator in your company, befriend and learn from them. Read about how databases store and retrieve data: from sql to data pages. Measure measure measure. Learn about different types of indexes and their trade offs. Remember that “it depends” is the answer to almost every db question and that you should be thinking through all you codes interactions. That is the path to mastery of dbs.

zasdffaa|3 years ago

IME just run the DB, pick up and look at the query plans for the most common/time consuming, then add indexes. That's 80% of it. So...

> Is it a matter of having a conceptual model of relational algebra and the way the different db engines work

...no....

> or is it more an accumulation of heuristics over time for what probably will cause problems

...no....

> an iterative process of using EXPLAIN, adjusting the query and seeing what happens?

...that's more like it!

Once you understand the underlying data structures, all the magic goes away. As it should.

tluyben2|3 years ago

Yeah, that works. My process is very different, but your advice is better as it hangs more on actionable tactics than learning intuition.

> Once you understand the underlying data structures, all the magic goes away. As it should.

Once you actually understand them, I feel you don’t need explain in most cases; you will simply ‘see’ why certain queries or definitions or structures are bad.