(no title)
ofiryanai | 4 years ago
I'm trying to write sort of a SQL compiler. The current goal is to analyze queries and find similarities, later maybe to translate between sql dialects. I found Uber's QueryParser[1] but it's in haskell, so I started wrapping the python sqlparse[2] library and implement a Visitor to traverse their weird AST. 1. How close is it to implementing a compiler? 2. Is there theory you can suggest further reading for that matter? 3. Would you use a different language/library then I picked?
Thanks :)
[1] https://github.com/uber/queryparser [2] https://github.com/andialbrecht/sqlparse
refneb|4 years ago
[1] https://www.antlr.org/
[2] https://github.com/antlr/grammars-v4
eatonphil|4 years ago
In JavaScript you've got: Alasql's [3] (this is a large file) and here's another SQLite parser [4].
In Java there's JSQLParser [5] and I think you can access Presto's parser too [6].
[0] https://github.com/dolthub/go-mysql-server
[1] https://github.com/blastrain/vitess-sqlparser
[2] https://github.com/eatonphil/gosql
[3] https://github.com/agershun/alasql/blob/develop/src/alasqlpa...
[4] https://github.com/codeschool/sqlite-parser
[5] https://github.com/JSQLParser/JSqlParser
[6] https://github.com/prestodb/presto
codr7|4 years ago
https://github.com/codr7/swifties
ofiryanai|4 years ago
brinox|4 years ago
[1] https://github.com/google/zetasql
setheron|4 years ago
You can contribute to the project.
unknown|4 years ago
[deleted]
dotcommand|4 years ago
Then you should focus on the optimization phase. Where indexes, table statistics, etc are involved. The parsing and even translating sql from one dialect to another wouldn't give much for you to analyze.
I'm assuming you want to learn about query performance and execution plans?
Frankly, if you want to analyze queries, just install the rdbms, set up a data warehouse and use the tools available to analyze query plans/etc. There are so much that affects queries beyond the actual sql since the hardware and data load affects the queries. You could run the same sql query and end up with wildly different query/execution plans depending on how the data/tables/indexes are changed on your system.