IMO there's a big difference between university level SQL and the ability to competently write SQL in a professional setting. In university, at least at my school, you are getting maybe at most 3 tables overall, with 3-5 columns each. Professionally, you're dealing with hundreds, maybe thousands of tables, each with between 10-1000 columns, and usually not even in 1NF either. It really takes a lot of practice writing queries professionally to get to a level of competence here.
I think it's bc of how popular ORM's are. You can get pretty far with an ORM without knowing any SQL, but you can inadvertently do some really dumb things if you don't understand how SQL works.
SQL was the first language I ever learned, so perhaps my perspective is skewed, but it seems like it would be pretty hard to use an ORM without gaining an intuition into how SQL works.
What is a trap I see often among ORM users is them running into the n+1 problem. However, that's actually a consequence of database implementations not being theoretically pure. If you had an ideal SQL database then their approach would be technically correct. They aren't wrong to think that way.
It's just that we don't have ideal SQL databases, so we have to resort to hacks to make things work in the real world. Why we need to break from the theoretical model and use those hacks can be difficult to understand if you aren't familiar with the implementation at a lower level.
I don't think it's the syntax of SQL that causes problems (and which ORMs try to replace)... it's the complex underlying logic necessary to accurately join multiple tables, set where and having conditions, and aggregating records, and knowing to which dataset each should apply.
Countless "replacements" for SQL have come around to make data accessible by non-expert, but regardless if your using a BI tool, reporting platform, ORM, whatever, pretty soon you're going to need to understand and express the logic, and that's when you really appreciate SQL's directness and terseness.
Every single org I've found that says "we don't use ORMs because they're slow" ends up constructing their own bespoke, wheel-reinventing "orm" or sticks all their application logic in sprocs, which are nightmares for imperative logic.
My memory could be a little hazy, but I don't remember any required course that dealt with SQL when I was in the CS program at a pretty highly-regarded university 25 years ago.
I took a course in which I learned quite a lot about SQL and in retrospect it was an extremely useful course to have taken.
I remember taking a databases class (either junior or senior year) that covered database design and also SQL with Oracle. We even got into Pro*C, which was some crazy Oracle-specific C pre-processor. It definitely wasn't required. My roommate took it and failed.
Mine ~15 years ago in the US had a required course that did some relational algebra and database normalization without ever touching a database (as a small part of a broader course, IIRC), and an elective where we did really simple stuff with a database (I came out of it not really understanding the difference between INNER JOIN and LEFT JOIN, for example).
At a Dutch CS study we had an SQL course that started with first-order logic, relational algebra and went to on to project that into SQL. It also taught 3NF/4NF and BCNF, indices, r-trees, query planning and optimisation.
Database education in CS programs in North America seems minimal. Most people I meet haven't taken any, or maybe just took a practical "how to query" course. In North America, it seems like only CMU has an active (excellent) program around this topic.
There's a woeful ignorance about what the relational data model is, how the industry arrived here, and how this is implemented. Problems or archaisms with SQL specifically become synonymous in people's heads with the relational model generally, and for a while that led down the quite problematic NoSQL road. Then slingshotted back to SQL -- but from my perspective SQL itself (not the relational model) is a problem. It doesn't compose well. It doesn't handle recursive relations well. It has an awkward syntax. It conflates concepts. It has an archaic datatype model. None of this is intrinsic to the relational model, but SQL becomes a limiting factor.
Disclaimer: I work for a DB company doing awesome stuff with the relational data model, but not SQL (RelationalAI) so am ... biased. Though I have always had those biases. :-)
I'm a faculty member at the Rochester Institute of Technology. We have multiple courses for both our undergrads and grad students in databases. I primarily teach Introduction to Big Data for our graduate students. We happen to cover all the things that you mention in your second paragraph :)
I've worked with many data scientists whose typical SQL usage was to load entire rows (or with simple 'with' filtering) into Pandas / R dataframes and then do all their work there.
I think it's a combination of Pandas and R having much simpler APIs and a ton of documentation on Stack Overflow, and modern hardware just being so good that you can load a big enough chunk of your dataset in-memory on a laptop.
I mostly use SAS, I tend to prefer using plain sql queries where I typically depart SQL and jump into code is doing what SAS calls "By Group processing" (example https://support.sas.com/kb/26/013.html#)
I am not as familiar with R. Last time I worked in R (some years ago) equivalent R code was something like this caution I'm no expert in writing R so might be a better /more intuitive way...
> Have they stopped teaching this in University or something?
It wasn't part of the mandatory coursework when I went to school. I really think it should be. It's not as though my CS program was 'pure theory', they taught lots of 'vocational' stuff but not SQL.
As for why people put off learning it.. I put off learning it longer than I care to admit because I thought I could simply do without it (ORMs, BerkeleyDB, etc), and fake it if put on the spot (I always understood the very basics, insofar as a SQL query can be understood if read as declarative English.) Ever since I bit the bullet and actually learned it properly, I've been kicking myself for not learning it upfront.
I think one of the main uncertainties in higher education the last couple decades is what balance, if any, they should strike between job training for software developers, coding as an engineering discipline, and preparation for an academic career in math.
Most programs include parts of all three, but SQL per se (rather than like, relational algrebra) is pretty firmly in the professional training for software developers set, and schools that reject that aspect may not teach it.
In my experience recent code school/boot camp grads have as much or more practical sql as recent CS grads; probably because those schools are nearly totally focused on professionally applicable skills.
That may be likely, but then schools which emphasize the math/theory part are then missing the opportunity to really teach the theoretical part of the relational algebraic stuff; which is really set theory and first order logic. There's a lot of math-y stuff to dig into there.
It should be mandatory along with understanding O notation and datastructures and algorithms, because really... unless you're doing a career which is like... 100% embedded development... you're going to be encountering databases and datamodeling as part of your career.
I'd be much happier if CS programs would graduate people who did a whole semester of first order logic, Date&Codd's foundational papers and why network&hierarchical databases are problematic, relational algebra / calculus, Datalog & friends, and then just toss in a "and this is how SQL does some of this but also mangles all of this..." at the end.
+ as a bonus, a DB implementation/internals course so people can understand what goes into query execution etc.
Because those people would then have some proper context before going off and butchering the world with ORMs and microservices...
The issue isn't usually competency, can vs. cannot. It is usually a matter of should or should not. I have seen databases thrown at problems where they decidedly should not have, and have seen the product development cycle suffer where it should not have. While some engineers and analysts relish CTE spaghetti -- solving complex logic utilizing enormous wads of SQL, others are more wary and rightly look to tackle complexity with more accountable and powerful tools when possible. I admit that tools like DBT can help address some of the shortcomings of SQL, but it still does not have the generality and conciseness of modern languages.
Anecdata point here, personally my SQL skills have lapsed because I've been incentivized to know NoSQL more. Same for frontend stuff like JS/React btw.
throwaway23597|3 years ago
mritchie712|3 years ago
randomdata|3 years ago
What is a trap I see often among ORM users is them running into the n+1 problem. However, that's actually a consequence of database implementations not being theoretically pure. If you had an ideal SQL database then their approach would be technically correct. They aren't wrong to think that way.
It's just that we don't have ideal SQL databases, so we have to resort to hacks to make things work in the real world. Why we need to break from the theoretical model and use those hacks can be difficult to understand if you aren't familiar with the implementation at a lower level.
cnity|3 years ago
listenallyall|3 years ago
Countless "replacements" for SQL have come around to make data accessible by non-expert, but regardless if your using a BI tool, reporting platform, ORM, whatever, pretty soon you're going to need to understand and express the logic, and that's when you really appreciate SQL's directness and terseness.
whoopdeepoo|3 years ago
Kon-Peki|3 years ago
I took a course in which I learned quite a lot about SQL and in retrospect it was an extremely useful course to have taken.
icedchai|3 years ago
Izkata|3 years ago
edejong|3 years ago
SCdF|3 years ago
robertlagrant|3 years ago
cmrdporcupine|3 years ago
There's a woeful ignorance about what the relational data model is, how the industry arrived here, and how this is implemented. Problems or archaisms with SQL specifically become synonymous in people's heads with the relational model generally, and for a while that led down the quite problematic NoSQL road. Then slingshotted back to SQL -- but from my perspective SQL itself (not the relational model) is a problem. It doesn't compose well. It doesn't handle recursive relations well. It has an awkward syntax. It conflates concepts. It has an archaic datatype model. None of this is intrinsic to the relational model, but SQL becomes a limiting factor.
Disclaimer: I work for a DB company doing awesome stuff with the relational data model, but not SQL (RelationalAI) so am ... biased. Though I have always had those biases. :-)
michaelmior|3 years ago
alephxyz|3 years ago
bigger_cheese|3 years ago
I am not as familiar with R. Last time I worked in R (some years ago) equivalent R code was something like this caution I'm no expert in writing R so might be a better /more intuitive way...
Output_data <-merge(x=T1, y=T2, by="Date", all.x="True") %>% mutate(My_var = NAME) %>% fill(My_var)
In SQL the equivalent would need to use Over (Partition by) which is less intuitive for me to write.
o_nate|3 years ago
VeninVidiaVicii|3 years ago
https://cran.r-project.org/web/packages/data.table/vignettes...
LarryMullins|3 years ago
It wasn't part of the mandatory coursework when I went to school. I really think it should be. It's not as though my CS program was 'pure theory', they taught lots of 'vocational' stuff but not SQL.
As for why people put off learning it.. I put off learning it longer than I care to admit because I thought I could simply do without it (ORMs, BerkeleyDB, etc), and fake it if put on the spot (I always understood the very basics, insofar as a SQL query can be understood if read as declarative English.) Ever since I bit the bullet and actually learned it properly, I've been kicking myself for not learning it upfront.
giraffe_lady|3 years ago
Most programs include parts of all three, but SQL per se (rather than like, relational algrebra) is pretty firmly in the professional training for software developers set, and schools that reject that aspect may not teach it.
In my experience recent code school/boot camp grads have as much or more practical sql as recent CS grads; probably because those schools are nearly totally focused on professionally applicable skills.
cmrdporcupine|3 years ago
It should be mandatory along with understanding O notation and datastructures and algorithms, because really... unless you're doing a career which is like... 100% embedded development... you're going to be encountering databases and datamodeling as part of your career.
I'd be much happier if CS programs would graduate people who did a whole semester of first order logic, Date&Codd's foundational papers and why network&hierarchical databases are problematic, relational algebra / calculus, Datalog & friends, and then just toss in a "and this is how SQL does some of this but also mangles all of this..." at the end.
+ as a bonus, a DB implementation/internals course so people can understand what goes into query execution etc.
Because those people would then have some proper context before going off and butchering the world with ORMs and microservices...
spaniard89277|3 years ago
But AFAIK so does my local university.
pc86|3 years ago
waffletower|3 years ago
maerF0x0|3 years ago
cess11|3 years ago