- The user has a fixed relational database
- The user wants to estimate the execution time / cardinality / cost of a query
- The SQL2Circuits framework estimates these metrics with quantum circuits using the following pipeline:
  - The SQL query is parsed into an abstract syntax tree, which is represented as a context-free grammar diagram
  - The context-free grammar diagram is mapped functorially to a pregroup grammar diagram
  - The pregroup grammar diagram is mapped functorially to a parametrized quantum circuit
  - Classical optimization methods tune the parameters of the quantum circuit so that the measurement result of the circuit corresponds to a class that estimates the desired database metric
- The user and the database can use the estimated metric to optimize the query further
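
The classification step in the pipeline above can be pictured with a small example. This is a minimal sketch, not the framework's code: the function names `metric_to_class` and `counts_to_class`, the thresholds, and the measurement counts are all hypothetical. It only illustrates how a continuous metric could be bucketed into classes and how a circuit's measurement counts could be read as a predicted class.

```python
from bisect import bisect_right


def metric_to_class(value, thresholds):
    """Return the index of the bucket that `value` falls into.

    `thresholds` must be sorted ascending; n thresholds define n + 1 classes.
    """
    return bisect_right(thresholds, value)


def counts_to_class(counts):
    """Read a predicted class from measurement counts.

    `counts` maps measured bitstrings (e.g. "01") to how often they occurred.
    The most frequent bitstring is interpreted as a binary class index.
    """
    best_bitstring = max(counts, key=counts.get)
    return int(best_bitstring, 2)


# Hypothetical execution-time thresholds in milliseconds, giving 4 classes.
thresholds_ms = [10.0, 100.0, 1000.0]
true_class = metric_to_class(250.0, thresholds_ms)

# Hypothetical measurement counts from a 2-qubit circuit.
measured = {"00": 12, "01": 30, "10": 410, "11": 60}
predicted_class = counts_to_class(measured)
```

With two measured qubits there are four possible bitstrings, which is why the sketch uses three thresholds; a prediction is correct when `predicted_class` equals `true_class`.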
The core idea of this implementation is influenced by quantum natural language processing. The implementation roughly follows the pipeline described in the lambeq documentation. There is also a substantial body of further related work on quantum natural language processing.
- SQL parser (based on the SQLite syntax, since it was the easiest to get working in ANTLR4)
- Mapping the abstract syntax trees into context-free grammar diagrams which are represented as string diagrams (contribution of this work)
- Mapping the CFG diagrams functorially to pregroup grammar diagrams (DisCoPy) (the mapping is a contribution of this work and is defined in the pregroup_functor_data.json file)
- Functorially rewriting pregroup diagrams to remove the cups and thus reduce the required number of qubits and post-processing in the final circuit (Snake removal example in DisCoPy)
- Translating cupless pregroup diagrams into quantum circuits using lambeq and IQPAnsatz
- Optimizing the circuits with the SPSA and Adam optimizers to make predictions about SQL queries
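
To make the last component more concrete, the following is a minimal sketch of the SPSA (simultaneous perturbation stochastic approximation) update rule in plain Python. It is a generic textbook version applied to a toy quadratic loss, not the optimizer configuration used in this repository: `spsa_minimize`, `toy_loss`, and the gain values are hypothetical. In the real pipeline the loss would instead compare circuit measurement results with the query labels.

```python
import random


def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize `loss` with SPSA, using two loss evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        # Decaying gain sequences with the commonly used exponents.
        a_k = a / (k + 1) ** 0.602
        c_k = c / (k + 1) ** 0.101
        # Perturb all parameters at once with a random +/-1 vector.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = (loss(plus) - loss(minus)) / (2.0 * c_k)
        # Gradient estimate for component i is diff / delta[i].
        theta = [t - a_k * diff / d for t, d in zip(theta, delta)]
    return theta


def toy_loss(params):
    """Squared distance to a fixed target vector (stand-in for a circuit loss)."""
    target = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


start = [0.0, 0.0, 0.0]
tuned = spsa_minimize(toy_loss, start)
```

SPSA needs only two loss evaluations per step regardless of the number of parameters, which is one reason it is a common choice when every loss evaluation requires running quantum circuits.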
- Download and create the IMDB database as described in the Join Order Benchmark.
- Clone this repository and install the requirements. Note the required versions of the packages.
- Run the `main.py` file with the desired parameters. The possible parameter values are described in the `sql2circuits_config.json` file.
- Depending on the selected parameters, the following quantum machine learning training pipeline will be executed:
  - The training, validation and test queries are generated based on the provided query seed file.
  - The queries are executed on a PostgreSQL database and, depending on the initial configuration, either the execution time, cardinality or cost is measured.
  - The SQL queries are parsed into abstract syntax trees and mapped into parametrized quantum circuits.
  - The circuits are optimized with the selected classical algorithm. The optimization is performed iteratively: a first batch of circuits is optimized, and more circuits are then added according to the initially defined parameters.
  - The results are saved in the `results` folder.
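
The iterative optimization described in the steps above can be sketched as a schedule of growing training sets. This is an illustration under assumptions, not the repository's implementation: the helper `incremental_batches` and its parameters `initial_size` and `step` are hypothetical names, and the sketch shows only which circuits would be included in each round, not the optimization itself.

```python
def incremental_batches(circuits, initial_size, step):
    """Yield growing prefixes of `circuits` until all of them are included."""
    if initial_size < 1 or step < 1:
        raise ValueError("initial_size and step must be positive")
    size = min(initial_size, len(circuits))
    while True:
        yield circuits[:size]
        if size >= len(circuits):
            break
        size = min(size + step, len(circuits))


# Hypothetical circuit identifiers standing in for real circuit objects.
circuit_ids = [f"circuit_{i}" for i in range(10)]
schedule = [
    len(batch)
    for batch in incremental_batches(circuit_ids, initial_size=4, step=3)
]
```

Here each round would optimize on 4, then 7, then all 10 circuits.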