Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for regex flags for BigQuery #253

Conversation

lookslikeitsnot
Copy link
Contributor

What does this PR do?

Add support for regex flags for BigQuery (#252)

Change description

Inline flags in non-capturing block

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Where has this been tested?

  • OS: Manjaro Linux 22.0.2
  • Python: 3.10.9
  • dbt: 1.4.5
  • dbt-expectations: 0.8.3
  • dbt-bigquery: 1.4.3

@clausherther clausherther added enhancement New feature or request and removed enhancement New feature or request labels Mar 29, 2023
@clausherther
Copy link
Contributor

Thanks! Will take a closer look tomorrow or Thursday.

@lookslikeitsnot
Copy link
Contributor Author

lookslikeitsnot commented Mar 29, 2023

Test failed for Snowflake

Database Error in test dbt_expectations_expect_column_values_to_match_regex_data_text_email_address__i___i_A_Z_ (models/schema_tests/schema.yml)
02:05:13    100048 (2201B): Invalid regular expression: '(?i)[A-Z]', no argument for repetition operator: ?

I should've read the Snowflake docs more carefully, sorry. Snowflake, since it's using POSIX ERE, does not support inline flags.
I'll make the test BigQuery only since its purpose is to check the in-lining of flag parameters for the BigQuery adapter.

@@ -18,7 +18,14 @@ models:
regex: "[A-Z]"
flags: i
config:
enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift' ] }}"
enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift', 'bigquery' ] }}"
Copy link
Contributor

@clausherther clausherther Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers all of our supported platforms, I'm wondering if we should just remove the enabled config for this test then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now removed all occurrences of that enabled config

@@ -59,7 +66,7 @@ models:
regex_list: ["[A-G]", "[H-Z]"]
flags: i
config:
enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift' ] }}"
enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift', 'bigquery' ] }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here re: removed config

{% if not is_match %}
{# Using raise_compiler_error causes disabled tests with invalid flags to fail compilation #}
{{ exceptions.warn(
"flags " ~ flags ~ " isn't a valid re2 flag pattern"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a user passes in an unsupported flag, like c on BigQuery, this will generate a compiler warning, but the test will ultimately fail (e.g. "Cannot parse regular expression: invalid perl operator: (?c"). Is that the intent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue if we raise a compiler error is that the tests don't compile anymore.
As you can see in the pipeline test, even though the test for the "c" flag is not enabled for BigQuery (since it isn't a valid re2 flag), we have a warning in the output.

02:04:16  flags c isn't a valid re2 flag pattern
02:04:16  flags c isn't a valid re2 flag pattern
02:04:16  flags c isn't a valid re2 flag pattern

Here's the output with raise_compiler_error instead of warn:

Encountered an error:
Compilation Error in test dbt_expectations_expect_column_values_to_match_regex_data_text_email_address__c___A_Z_ (models/schema_tests/schema.yml)
  flags c isn't a valid re2 flag pattern

Having a warning is nice since the test will compile properly but it can be forced to error with the --warn-error flag.
i.e.:
running dbt --warn-error build will fail build for invalid flags in test (even if the invalid-flag tests are disabled).
running dbt build will build for invalid flags in test (but display a warning message and ultimately fail)

However, I get that this might be considered a regression since it changes the default behavior to be more lax.
This gives us 2 options:

  1. Remove the "c" flag test
  2. Add support for the "c" flag in BigQuery by translating it to "-i". This improves adapter flexibility but might be unintuitive for the user since "c" is not supposed be a valid re2 flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response. I think it's to leave as is, including the c flag tests with the defined targets. If we find that this creates usability issues for folks, we can address in a later PR (and maybe implement option 2). Thanks for the thorough explanation!

@clausherther clausherther self-requested a review April 2, 2023 15:24
@clausherther clausherther merged commit 73f2d00 into calogica:main Apr 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants