Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deparse #66

Draft
wants to merge 2 commits into
base: 15-latest
Choose a base branch
from
Draft

Deparse #66

wants to merge 2 commits into from

Conversation

gregnr
Copy link
Contributor

@gregnr gregnr commented May 28, 2024

Adds ability to deparse AST back into a SQL string.

We might not need this PR

After starting work on this, I realized there was this PR that already implements this. But since I took a different approach which may reduce bundle size, I thought I'd still open a PR. Might be worth picking some pieces from this PR if there's value.

Background

Under the hood libpg_query supports both parsing and deparsing. It supports parsing SQL into JSON or protobuf, but it only supports deparsing from protobuf, not JSON. libpg-query-node uses the JSON approach for parsing, so this makes deparsing tricky. We need a way to first convert the AST to a protobuf, then pass into libpg_query's deparse function.

What this approach does differently

This approach does the JSON -> protobuf conversion in C++ vs JS. It uses the protobuf C++ lib to convert the JSON AST to a profobuf directly in C++ before passing to deparse.

libpg_query actually comes with a protobuf generated C++ file, so no pre-processing step was required to compile the .proto file to another language.

Affect on output size

queryparser.node went from ~2.8Mb to ~5.9Mb 7.7Mb (static linking, macOS). This is more than double the size which is not ideal, but maybe still smaller than using a protobuf generated JS file. For my own use, the bigger concern would be the WASM file size increase, which I haven't got building yet (see below).

If we want the ability to deparse though, I'm not sure there is any way around bringing in a big protobuf generated file, apart from manually implementing a JSON deparsing function and upstreaming to libpg_query (probably a lot of work?).

Recommendation

Offer 2 variants - one with deparse included and one without so that those who don't care about deparsing can benefit from a smaller bundle size. This probably only applies to WASM. Something like:

// Import full package that includes deparse
import { parseQuery, deparseQuery } from 'libpg-query';

// Import lite package that does not include deparse and uses a smaller WASM file
import { parseQuery } from 'libpg-query/lite';

Any downsides?

The biggest challenge is the libprotobuf C++ dependency. People will need this lib available on their machines in order for these bindings to compile. We could clone the protobuf repo and build it ourselves via a script (like we do with libpg_query), but this is probably more complicated since it uses build tools like bazel. We could try to rely on prebuilt versions of libprotobuf targeting different platforms, but I haven't found a pre-built emscripten version of protobuf yet, so that's a blocker for WASM support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant