top | item 25550280

Show HN: Readsql – convert SQL to most human readable format

43 points| azisk1 | 5 years ago |github.com

29 comments

order

simonbarker87|5 years ago

Not sure I get the point of this? Perhaps I’m being dense? The hard part about SQL isn’t picking out the keywords in the select/insert statement, it’s understanding the DB relations, its structure and constraints.

If you want to make SQL easier to understand then take a look at a 15 year old stored procedure that’s been hacked at by a dozen devs is over 1,000 lines long, has sketchy rollback error protection, uses two CTEs, a pivot, no temp tables and does some xml shenanigans in the middle (I’m being hyperbolic obviously)

This feels like trying to loose weight by trimming your toe nails, you’re technically lighter but not so as it matters.

azisk1|5 years ago

I understand the wonder, this is not some kind of magic that will help us solve all SQL issues. That's way too hard. This is a development tool that helps write and read SQL code. So simply saves development time by automating the easy stuff, leaving the hard and important stuff for us to deal with. It is different from other formatters that it lints code inside Python strings. For some context, I am a data engineer and I write a lot of SQL code inside Python code (in the future it should support other programming languages as well).

However, the tool might suggest improvements on SQL code in the future

zbuf|5 years ago

Leaving aside the modest achievements of the code so far, the missing prerequisite to "automatically" formatting SQL is... just formatting SQL.

Because SQL (broadly) was designed to be "human" readable in the first place, it's grammatically very inconsistent and with a lot of keywords. Much more than other languages in use today such as C.

I've yet to find a pattern of indentation, brackets etc. that satisfies my OCD.

Coming up with an example of a nicely formatted SQL statement is not difficult, but turning that into consistent 'rules' and immediately you find counterexamples using other parts of SQL.

techbio|5 years ago

I make an effort to indent on most keywords, using a Python-like structure to indent related lines to the same level, one more than the keyword operating on them all, with the heuristic that if I can comment one or more lines to debug, I have a readable query.

I tend to classify SQL statements into two kinds, those that when wrapped in a calling function fit in one screen, and those other longer ones that I'm inclined to write in an imperative language.

Edit: For the author of the repository, the list of reserved words gets longer and more complex when you support different implementations of SQL, and regex may be insufficient once you consider such parsing questions as whether the keyword is within quotation marks or part of a user-defined name.

https://www.drupal.org/docs/develop/coding-standards/list-of...

https://github.com/AzisK/readsql/blob/master/readsql/regexes...

rgovostes|5 years ago

You could take this idea further by fully parsing queries according to the grammar of the SQL language, rather than using simple pattern matching.

In fact it is a result of theoretical computer science that you _cannot_ correctly parse languages like SQL, HTML, Python, etc. with regular expressions: Any attempt to do anything non-trivial will have cases where it misunderstands the code.

So you would want to find a SQL grammar (an outdated example in [1]) and a module[2] that can use this to parse queries into a data structure to which you can apply transformations (e.g., changing case of keyword tokens) and then write back out as a string.

SQLite's documentation has some nice diagrams[3] to get an idea for how it parses a query string. The table of links at the top lets you dive into, e.g., all the optional parts of a SELECT statement.

1: https://ronsavage.github.io/SQL/sql-92.bnf.html

2: https://tomassetti.me/parsing-in-python/

3: https://sqlite.org/lang.html

azisk1|5 years ago

Thank you for the feedback, it will be useful. This is something I had in mind and I believe this would be even more powerful but I started with this minimalistic approach. It might grow into something like this later

kaizoku111|5 years ago

Was expecting something like DESCRIBE but for less technical audience.

As a heavy SQL user, I don't see much benefit in this as of yet. It is very misleading in saying it is the "most human readable" format, when the example shows the "format" to be identical to the original. Just by upper-casing keywords doesn't make it any more readable to be honest.

Anyways a good attempt, hope you're not offended by critical feedbacks and hope they are useful for some ideas to improve the tool.

azisk1|5 years ago

Thank you for the feedback. The example in Github has code highlighting. I wonder if I should remove that and if that would make more difference. I would be interested to know how this tool could help you with your SQL usage

timwis|5 years ago

Ah, I was hoping it would do multi-line and indentation formatting. I find that helps with readability a lot in sql.

azisk1|5 years ago

That is coming soon!

Noumenon72|5 years ago

Does it do anything but uppercase keywords? Is it going to add line breaks or indentation at some point?

karmakaze|5 years ago

Adding line breaks before FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT/OFFSET (and usually before some of the ANDs) is the first thing I do when analyzing some SQL. Capitalization is rarely needed after that and useful for longer term readability.

I also normalize by removing optional/redundant keyword noise, I'm looking at you INNER/OUTER.

azisk1|5 years ago

For now it only upper-cases but I would like it to "prettify" the code as well. So it is an upcoming feature. Hopefully it will even suggest SQL code improvements in the future. Contributions are welcome

ingvul|5 years ago

It's a SQL beautifier.