Not sure I get the point of this? Perhaps I’m being dense? The hard part about SQL isn’t picking out the keywords in the select/insert statement, it’s understanding the DB relations, its structure and constraints.
If you want to make SQL easier to understand then take a look at a 15 year old stored procedure that’s been hacked at by a dozen devs is over 1,000 lines long, has sketchy rollback error protection, uses two CTEs, a pivot, no temp tables and does some xml shenanigans in the middle (I’m being hyperbolic obviously)
This feels like trying to loose weight by trimming your toe nails, you’re technically lighter but not so as it matters.
I understand the wonder, this is not some kind of magic that will help us solve all SQL issues. That's way too hard. This is a development tool that helps write and read SQL code. So simply saves development time by automating the easy stuff, leaving the hard and important stuff for us to deal with. It is different from other formatters that it lints code inside Python strings. For some context, I am a data engineer and I write a lot of SQL code inside Python code (in the future it should support other programming languages as well).
However, the tool might suggest improvements on SQL code in the future
Leaving aside the modest achievements of the code so far, the missing prerequisite to "automatically" formatting SQL is... just formatting SQL.
Because SQL (broadly) was designed to be "human" readable in the first place, it's grammatically very inconsistent and with a lot of keywords. Much more than other languages in use today such as C.
I've yet to find a pattern of indentation, brackets etc. that satisfies my OCD.
Coming up with an example of a nicely formatted SQL statement is not difficult, but turning that into consistent 'rules' and immediately you find counterexamples using other parts of SQL.
I make an effort to indent on most keywords, using a Python-like structure to indent related lines to the same level, one more than the keyword operating on them all, with the heuristic that if I can comment one or more lines to debug, I have a readable query.
I tend to classify SQL statements into two kinds, those that when wrapped in a calling function fit in one screen, and those other longer ones that I'm inclined to write in an imperative language.
Edit: For the author of the repository, the list of reserved words gets longer and more complex when you support different implementations of SQL, and regex may be insufficient once you consider such parsing questions as whether the keyword is within quotation marks or part of a user-defined name.
You could take this idea further by fully parsing queries according to the grammar of the SQL language, rather than using simple pattern matching.
In fact it is a result of theoretical computer science that you _cannot_ correctly parse languages like SQL, HTML, Python, etc. with regular expressions: Any attempt to do anything non-trivial will have cases where it misunderstands the code.
So you would want to find a SQL grammar (an outdated example in [1]) and a module[2] that can use this to parse queries into a data structure to which you can apply transformations (e.g., changing case of keyword tokens) and then write back out as a string.
SQLite's documentation has some nice diagrams[3] to get an idea for how it parses a query string. The table of links at the top lets you dive into, e.g., all the optional parts of a SELECT statement.
Thank you for the feedback, it will be useful. This is something I had in mind and I believe this would be even more powerful but I started with this minimalistic approach. It might grow into something like this later
Was expecting something like DESCRIBE but for less technical audience.
As a heavy SQL user, I don't see much benefit in this as of yet. It is very misleading in saying it is the "most human readable" format, when the example shows the "format" to be identical to the original. Just by upper-casing keywords doesn't make it any more readable to be honest.
Anyways a good attempt, hope you're not offended by critical feedbacks and hope they are useful for some ideas to improve the tool.
Thank you for the feedback. The example in Github has code highlighting. I wonder if I should remove that and if that would make more difference. I would be interested to know how this tool could help you with your SQL usage
Adding line breaks before FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT/OFFSET (and usually before some of the ANDs) is the first thing I do when analyzing some SQL. Capitalization is rarely needed after that and useful for longer term readability.
I also normalize by removing optional/redundant keyword noise, I'm looking at you INNER/OUTER.
For now it only upper-cases but I would like it to "prettify" the code as well. So it is an upcoming feature. Hopefully it will even suggest SQL code improvements in the future. Contributions are welcome
simonbarker87|5 years ago
If you want to make SQL easier to understand then take a look at a 15 year old stored procedure that’s been hacked at by a dozen devs is over 1,000 lines long, has sketchy rollback error protection, uses two CTEs, a pivot, no temp tables and does some xml shenanigans in the middle (I’m being hyperbolic obviously)
This feels like trying to loose weight by trimming your toe nails, you’re technically lighter but not so as it matters.
azisk1|5 years ago
However, the tool might suggest improvements on SQL code in the future
zbuf|5 years ago
Because SQL (broadly) was designed to be "human" readable in the first place, it's grammatically very inconsistent and with a lot of keywords. Much more than other languages in use today such as C.
I've yet to find a pattern of indentation, brackets etc. that satisfies my OCD.
Coming up with an example of a nicely formatted SQL statement is not difficult, but turning that into consistent 'rules' and immediately you find counterexamples using other parts of SQL.
techbio|5 years ago
I tend to classify SQL statements into two kinds, those that when wrapped in a calling function fit in one screen, and those other longer ones that I'm inclined to write in an imperative language.
Edit: For the author of the repository, the list of reserved words gets longer and more complex when you support different implementations of SQL, and regex may be insufficient once you consider such parsing questions as whether the keyword is within quotation marks or part of a user-defined name.
https://www.drupal.org/docs/develop/coding-standards/list-of...
https://github.com/AzisK/readsql/blob/master/readsql/regexes...
rgovostes|5 years ago
In fact it is a result of theoretical computer science that you _cannot_ correctly parse languages like SQL, HTML, Python, etc. with regular expressions: Any attempt to do anything non-trivial will have cases where it misunderstands the code.
So you would want to find a SQL grammar (an outdated example in [1]) and a module[2] that can use this to parse queries into a data structure to which you can apply transformations (e.g., changing case of keyword tokens) and then write back out as a string.
SQLite's documentation has some nice diagrams[3] to get an idea for how it parses a query string. The table of links at the top lets you dive into, e.g., all the optional parts of a SELECT statement.
1: https://ronsavage.github.io/SQL/sql-92.bnf.html
2: https://tomassetti.me/parsing-in-python/
3: https://sqlite.org/lang.html
azisk1|5 years ago
kaizoku111|5 years ago
As a heavy SQL user, I don't see much benefit in this as of yet. It is very misleading in saying it is the "most human readable" format, when the example shows the "format" to be identical to the original. Just by upper-casing keywords doesn't make it any more readable to be honest.
Anyways a good attempt, hope you're not offended by critical feedbacks and hope they are useful for some ideas to improve the tool.
azisk1|5 years ago
timwis|5 years ago
azisk1|5 years ago
Noumenon72|5 years ago
karmakaze|5 years ago
I also normalize by removing optional/redundant keyword noise, I'm looking at you INNER/OUTER.
azisk1|5 years ago
wnoise|5 years ago
ingvul|5 years ago
unknown|5 years ago
[deleted]
phpisatrash|5 years ago
newintellectual|5 years ago
[deleted]