BigQuery raw string support has regressed #1691

Closed
pmsanford opened this issue May 25, 2023 · 3 comments
pmsanford (Contributor) commented May 25, 2023

This issue is present on the current release and on main.

For example:

```python
>>> import sqlglot
>>> q = r"SELECT r'\s'"
>>> print(sqlglot.parse_one(q, "bigquery").sql("bigquery"))
SELECT '\s'
```

SELECT r'\s' is not the same as SELECT '\s': the r prefix marks a "raw string", in which a backslash is an ordinary character rather than the start of an escape sequence. Raw strings are mostly used with REGEXP_REPLACE/REGEXP_EXTRACT (the docs say they are also referred to as "regex strings").
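
To make the difference concrete, here is a minimal Python sketch (not sqlglot code) of how a literal body is read with and without the r prefix. The function literal_value and the SIMPLE_ESCAPES table are hypothetical; the table covers only the single-character escapes listed in the BigQuery lexical reference, and octal, hex and Unicode escapes are deliberately left out.

```python
# Hypothetical illustration: how the body of a quoted literal is interpreted
# with and without BigQuery's r/R (raw string) prefix.

# Subset of BigQuery's single-character escape sequences (assumption: octal,
# hex and Unicode escapes are intentionally omitted from this sketch).
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "\\": "\\", "?": "?", '"': '"', "'": "'", "`": "`",
}


def literal_value(body: str, raw: bool) -> str:
    """Return the value denoted by a literal body (quotes already stripped)."""
    if raw:
        # In a raw string every character, including backslash, is literal.
        return body
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # A backslash starts an escape sequence; inspect the next character.
        if i + 1 == len(body):
            raise ValueError("literal ends with a lone backslash")
        nxt = body[i + 1]
        if nxt not in SIMPLE_ESCAPES:
            raise ValueError(f"unsupported escape sequence: \\{nxt}")
        out.append(SIMPLE_ESCAPES[nxt])
        i += 2
    return "".join(out)


print(len(literal_value(r"\s", raw=True)))  # 2: a backslash and an "s"
print(literal_value(r"\\s", raw=False))     # \s
```

With raw=True the body \s denotes two characters, a backslash and an s. Read as an ordinary literal, the same body is rejected by this sketch, and the non-raw spelling of that two-character value is \\s.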

It looks like this was addressed in #218 but appears to have regressed in commit d2377e0.

The following test in test_bigquery.py is now incorrect:

```python
        self.validate_all(
            r'R"""/\*.*\*/"""',
            write={
                "bigquery": r"'/\*.*\*/'",
                "duckdb": r"'/\*.*\*/'",
                "presto": r"'/\*.*\*/'",
                "hive": r"'/\*.*\*/'",
                "spark": r"'/\*.*\*/'",
            },
        )
```

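The query string R"""/\*.*\*/""" starts with R, so it is a raw string and its backslashes are literal characters. If the generator drops the leading R, it has to escape (double) every backslash in the generated literal to preserve the value, e.g. '/\\*.*\\*/' for BigQuery.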
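
The requirement above boils down to re-escaping the raw body before wrapping it in ordinary quotes. Below is a minimal Python sketch of that transformation for a single-quoted BigQuery literal. The name raw_to_quoted is hypothetical, and this is not how sqlglot's generator is actually structured. It escapes only backslashes and the quote character, so a multi-line body from a triple-quoted raw string would additionally need its newlines escaped.

```python
def raw_to_quoted(body: str, quote: str = "'") -> str:
    """Rewrite the body of a raw string as an ordinary (non-raw) quoted literal.

    Hypothetical helper: only backslashes and the quote character are escaped,
    so bodies containing newlines or other control characters are not handled.
    """
    # Escape backslashes first; otherwise the backslashes added in front of
    # quotes below would themselves get doubled.
    escaped = body.replace("\\", "\\\\")
    escaped = escaped.replace(quote, "\\" + quote)
    return f"{quote}{escaped}{quote}"


# Body of R"""/\*.*\*/""" from the test above.
print(raw_to_quoted(r"/\*.*\*/"))  # '/\\*.*\\*/'
print(raw_to_quoted(r"\s"))        # '\\s'
```

Following the reasoning above, '/\\*.*\\*/' is what the bigquery entry in that test would need to expect.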

I'm still getting my head around the tokenizer to see whether I can fix this myself, but I wanted to open an issue in the meantime in case I get pulled away before I can put together a PR.

Fully reproducible code snippet
```python
import sqlglot

assert sqlglot.parse_one(r"SELECT r'\s'", "bigquery").sql("bigquery") == r"SELECT '\\s'"
```

Official Documentation
https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#quoted_literals

georgesittas (Collaborator) commented

I can take a look shortly, thanks for the report.

tobymao self-assigned this May 25, 2023

pmsanford (Contributor, issue author) commented

Here's a possible fix:
pmsanford@653d6ab

It feels a little clunky to me, though.

tobymao (Owner) commented May 25, 2023

I'm working on something. @pmsanford, appreciate the attempt.
