top | item 39800607


htag | 1 year ago

I learned SQL before I learned set theory. While learning set theory I remember thinking "oh this notation is just SQL backwards." Afterwards I began to find SQL much harder, because I realized there are many mathematically equivalent ways to ask for the same data, but the SQL server will compute the result differently for each one, often with very different performance. This is a minor deal if you're just doing small transactions on the database: if you are dealing with pages of 100 objects, it's trivial to hit good-enough performance benchmarks, even with a few joins.
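To make the "same data, different computation" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers/orders tables and the three queries are hypothetical, made up for illustration. All three ask for the same set of names, and EXPLAIN QUERY PLAN shows how the engine intends to compute each one. Whether the plans actually differ depends on the engine and its version, so treat the printed plans as something to inspect, not a guaranteed result.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 3, 5.0);
""")

# Three ways of asking for "customers that have at least one order".
QUERIES = {
    "in_subquery": (
        "SELECT name FROM customers "
        "WHERE id IN (SELECT customer_id FROM orders) ORDER BY name"
    ),
    "exists": (
        "SELECT name FROM customers c "
        "WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
        "ORDER BY name"
    ),
    "join_distinct": (
        "SELECT DISTINCT c.name FROM customers c "
        "JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
    ),
}


def run(sql):
    """Return the result rows as a flat list of names."""
    return [row[0] for row in conn.execute(sql)]


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


results = {label: run(sql) for label, sql in QUERIES.items()}
plans = {label: plan(sql) for label, sql in QUERIES.items()}

for label in QUERIES:
    print(label, results[label])
    for step in plans[label]:
        print("   ", step)
```

On a toy table like this the difference is academic; the same comparison against production-sized tables is where it starts to matter.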

I was first introduced to the need for hyper-optimized SQL in ETL-type tasks, dealing with very large relational databases. The company switched to a non-relational database shortly after I left, and it was the first time I professionally witnessed someone make the switch and agreed that it was obviously required for them. We were dealing with very large batch operations every night, and our Fortune 500 customers expected to have the newest data and to be able to do Business Intelligence operations on the data every morning. After acquiring bigger and bigger customers, and collecting longer and longer histories of data, our DBA team had exhausted every trick to get maximum performance from SQL. I was writing BI SQL scripts against this large pool of SQL data to white-glove some high-value customers, and constantly had to ask people for help optimizing the SQL. I did this for a year at the beginning of my career, before deciding to move cities for better opportunities.

Lately, I've begun seeing the need for high-performance SQL again with the wave of microservice architectures. The internal dependency chain, even of what would have been a mid-size monolith project a decade ago, can be huge. If your upstream sets a KPI for response time, it's likely you'll be asked to reduce your response time if your microservice takes up more than a few percentage points of the total end-to-end time. Often, if you are using a relational database with an ORM, you can find performance gains in your slowest queries by hand-writing the SQL. Many ORMs expose a really good library for generating SQL queries, but almost all ORMs will also allow you to write a direct SQL query or call a stored procedure. The trick to getting performance gains is to capture the SQL your ORM is generating and show it to the best SQL expert who will agree to help you. If they can write better SQL than the ORM generated, then incorporate it into your app and have the SQL expert and a security expert on the PR. You might also need to do a SQL migration to modify indexes.
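As a rough sketch of the "capture the SQL, then look at indexes" workflow, here is what it can look like with Python's built-in sqlite3 module. No real ORM is involved: the plain execute() calls stand in for whatever an ORM would issue, and the users table and idx_users_email index are hypothetical names. Most ORMs have their own statement logging, so check your ORM's docs for the equivalent; set_trace_callback is just the sqlite3 way of seeing every statement that goes through a connection.

```python
import sqlite3

captured = []  # every SQL statement sent through the connection lands here

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(captured.append)

# Stand-ins for statements an ORM would generate (hypothetical schema).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
LOOKUP = "SELECT id, email FROM users WHERE email = ?"
rows = conn.execute(LOOKUP, ("a@example.com",)).fetchall()

# Stop capturing before we start poking at query plans.
conn.set_trace_callback(None)


def plan(sql, params):
    """Return the detail text of each step in SQLite's query plan."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in cursor)


# Plan for the captured lookup before and after an index "migration".
plan_before = plan(LOOKUP, ("a@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(LOOKUP, ("a@example.com",))

print("captured statements:")
for statement in captured:
    print("   ", statement)
print("before index:", plan_before)
print("after index: ", plan_after)
```

The captured statements are what you would hand to the SQL expert; the before/after plans are the kind of evidence worth attaching to the PR. The hand-written lookup stays parameterized, which is part of why a security reviewer belongs on that PR.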

So in summary, I think your experience with SQL depends heavily on your mathematical background and your professional experience. It's important to look at SQL as computational steps to reach your required data, and not simply as a way to describe the data you would like the SQL server to give you.


fifilura|1 year ago

Was this before BigQuery/Presto/Trino? To me it seems like those technologies would have been a good fit.

They don't really work with indexes but instead with regular files stored in partitions (where date is typically one of the partition keys).

This means that they only have to worry about the data (e.g. dates) that you are actually querying. And they scale up to the number of CPUs that particular calculation needs. They rarely choke on big query sizes. And big tables are not really an issue as long as you query only the partitions you need.
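A toy sketch of that partition-pruning idea in plain Python, to be clear not how BigQuery, Presto or Trino are actually implemented. It assumes a Hive-style directory layout (date=YYYY-MM-DD/part-0.csv), and the write_partition and sum_amount helpers are made up for illustration. The point is only that a query over a date range never opens the files outside that range.

```python
import csv
import tempfile
from pathlib import Path


def write_partition(root, date, rows):
    """Write rows of (customer, amount) into the directory for one date."""
    part_dir = root / f"date={date}"
    part_dir.mkdir(parents=True, exist_ok=True)
    with open(part_dir / "part-0.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)


def sum_amount(root, start, end):
    """Sum the amount column for dates in [start, end], skipping other partitions."""
    total = 0.0
    files_read = 0
    for part_dir in sorted(root.glob("date=*")):
        date = part_dir.name.split("=", 1)[1]
        # ISO dates compare correctly as strings, so no parsing is needed.
        if not (start <= date <= end):
            continue  # pruned: this partition's files are never opened
        for path in sorted(part_dir.glob("*.csv")):
            files_read += 1
            with open(path, newline="") as f:
                for _customer, amount in csv.reader(f):
                    total += float(amount)
    return total, files_read


root = Path(tempfile.mkdtemp())
write_partition(root, "2024-01-01", [("acme", "10.0"), ("globex", "5.0")])
write_partition(root, "2024-01-02", [("acme", "7.5")])
write_partition(root, "2024-01-03", [("initech", "2.5")])

# Only the last two partitions are touched by this query.
total, files_read = sum_amount(root, "2024-01-02", "2024-01-03")
print(total, files_read)
```

In a real engine each surviving partition would also be split across workers, which is where the scaling with CPU count comes from.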

htag|1 year ago

Those technologies were brand new at the time; the discussions about the problem started in 2013. The company (I had zero input) chose a more established vendor with an older product. Given the time and the institutional customers that were trusting us with their data, I suspect any cloud-based offerings were a nonstarter, and open source felt like a liability.

Of course, with 20/20 hindsight that decision is easy to criticize. I suspect their primary concerns were to minimize risk and costs while meeting our customers' requirements. Even today, making a brand-new Google product or a Facebook-backed open source project a hard dependency would be too much risk for an established business.