top | item 28232180

(no title)

ofiryanai | 4 years ago

Hi all, I have a bit off topic question but seems related.

I'm trying to write sort of a SQL compiler. The current goal is to analyze queries and find similarities, later maybe to translate between sql dialects. I found Uber's QueryParser[1] but it's in haskell, so I started wrapping the python sqlparse[2] library and implement a Visitor to traverse their weird AST. 1. How close is it to implementing a compiler? 2. Is there theory you can suggest further reading for that matter? 3. Would you use a different language/library then I picked?

Thanks :)

[1] https://github.com/uber/queryparser [2] https://github.com/andialbrecht/sqlparse

discuss

order

eatonphil|4 years ago

In Go you've got: go-mysql-server [0], vitesse's parser [1], and one I wrote [2].

In JavaScript you've got: Alasql's [3] (this is a large file) and here's another SQLite parser [4].

In Java there's JSQLParser [5] and I think you can access Presto's parser too [6].

[0] https://github.com/dolthub/go-mysql-server

[1] https://github.com/blastrain/vitess-sqlparser

[2] https://github.com/eatonphil/gosql

[3] https://github.com/agershun/alasql/blob/develop/src/alasqlpa...

[4] https://github.com/codeschool/sqlite-parser

[5] https://github.com/JSQLParser/JSqlParser

[6] https://github.com/prestodb/presto

codr7|4 years ago

Depends on the complexity of your queries, but if you have a narrow subset that you're interested in, implementing a recursive descent parser for just those parts of the syntax that fits your problem like a glove could be a better solution.

https://github.com/codr7/swifties

ofiryanai|4 years ago

I'm aiming to analyze all the BI queries in my organization, some of them are quite complicated and most of them are hundreds lines of sql. Thanks for the direction, I'll dive into it

brinox|4 years ago

ZetaSQL[1] seems like it could be a fit for your use case. I've worked with Apache Calcite in the past and found it to be very complex to work with. I found ZetaSQL to be a little easier to use.

[1] https://github.com/google/zetasql

setheron|4 years ago

Lookup Apache Calcite that does most of what you describe.

You can contribute to the project.

dotcommand|4 years ago

> The current goal is to analyze queries

Then you should focus on the optimization phase. Where indexes, table statistics, etc are involved. The parsing and even translating sql from one dialect to another wouldn't give much for you to analyze.

I'm assuming you want to learn about query performance and execution plans?

Frankly, if you want to analyze queries, just install the rdbms, set up a data warehouse and use the tools available to analyze query plans/etc. There are so much that affects queries beyond the actual sql since the hardware and data load affects the queries. You could run the same sql query and end up with wildly different query/execution plans depending on how the data/tables/indexes are changed on your system.