[ADAP-383] [Feature] Support Dynamic Data Masking in CTAS Statements #85
Comments
This has also been brought up in the slack community a few times, adding links for reference: |
@jtcohen6 I'm sorry if we already discussed this... but is that something on your radar for contracts? |
Thanks @jdoldis for the write-up, you make a great case for it :) |
Thanks @Fleid , I'm writing a custom materialization to support the syntax as described above. Let me know if you'd like those changes contributed here. Otherwise, I look forward to seeing it at some point down the road 🙂 |
I have the exact same requirement, but for row access policies! I think this can be implemented in one go, as it shares the same CTAS syntax. CTAS:
Config with masking and row access policies could look like this:
Currently the only option is via post_hook. Since one first has to check the information_schema to see whether that relation (view or table) already has said RAP applied, because otherwise the ALTER command fails, this whole process can take up to 10 seconds [Edit: that was mainly due to a |
@jdoldis Would you release your custom materialization as a package, until this gets implemented into dbt-snowflake? I'd be very interested as well! |
Hey @ingolevin, the materialisation I wrote is essentially a copy paste of the standard v1.5 table materialisation. The difference is I have modified the table_columns_and_constraints macro to support masking policies. In this modified macro I build the ddl by looping through the model columns and outputting the name/datatype, and then adding |
Since I raised this it seems the ddl logic has moved around a bit, the relevant code that could be modified in the Snowflake adapter to support masking policies would now be this function I think. |
Regarding creating a package, it would be good to hear back from @Fleid first. Ideally we could implement here, but if that's not possible I would be open to it 🙂 |
I'm overdue responding here! It's true that, starting in v1.5, for models with enforced contracts, dbt will be able to template out That's the prerequisite to defining row-level access policies & column-level masking policies while the table is being created, rather than via an I hadn't had row-level & column-level access/masking policies in scope for For the moment, it would be possible to stand this up via some macro overrides. (Maybe these policies could even be a constraint of |
Sounds great @jtcohen6 , let me know if I can help! |
I'm following up here after a good chat with @graciegoheen @dbeatty10 @dataders, given the prompt to support similar functionality in
I think a good implementation of this functionality would look like:
*These map to DWH objects, so they are members of the DAG, and models / other functions could call (= depend on) them. I don't think these are models because they aren't (1) is a bigger lift than (2), and it's not something we have the capacity to prioritize right now — but in the meantime, I've asked @dataders to do a bit more thinking about what a good UX might look like :) |
Databricks allows data masking in CTAS as well!
|
Moving this feature request to the dbt-adapters repo for further refinement since the underlying functionality is supported on many cloud data warehouses now (Redshift, Snowflake (Enterprise only), BigQuery, Databricks, Azure, etc.) |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
I still think this would be a great security improvement for dbt; it would be good to keep the issue open. |
Today with contracts and constraints, we actually support tag-based masking policies with the following syntax
This requires defining all the columns and their types though. Would it work for your use case? |
That's great news, hadn't realised that, thanks @b-per ! I don't currently have all column types documented, but I can move to that 👍 |
@b-per This seems promising but is there any documentation on custom constraints? In particular docs on the expression "tag (my_tag = 'my_value')" |
Okay I figured it out. The expression is simply the SQL to represent the constraint on the column. |
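For illustration, here is a minimal Python sketch of what "the expression is simply the SQL" means in practice: the custom constraint expression is appended verbatim to the column definition. This is not dbt's actual rendering code; the function name and the column name `email` are hypothetical, and only the expression `tag (my_tag = 'my_value')` comes from the thread.

```python
def render_column_with_custom_constraint(name: str, data_type: str, expression: str) -> str:
    """Return a column definition with a custom constraint expression appended as-is.

    Hypothetical helper for illustration only; dbt's real logic lives in its macros.
    """
    # The expression is not parsed or validated here: it is passed through
    # as raw SQL, so the warehouse decides whether it is valid.
    return f"{name} {data_type} {expression}"


if __name__ == "__main__":
    # 'email' and 'varchar' are made-up example values.
    print(render_column_with_custom_constraint("email", "varchar", "tag (my_tag = 'my_value')"))
```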
@b-per That's interesting, I didn't know that was possible! I'll have to give it a shot. If only I had a good way to add row access policies to relations in Snowflake, too. @jdoldis You might find something like dbt-osmosis useful for quickly generating model docs. |
The reason we need a contract with all columns is that it is the only way where we can send a To generate a YAML with all the columns you could also use As there seems to be some interest from folks about the custom constraint "trick", I will reach out to the Docs team to see if we could add it to the docs. |
Also, I have not tried it, but I think that you could directly add a masking policy instead of using tags
|
Is this your first time submitting a feature request?
Describe the feature
Currently you cannot specify column masking policies in Snowflake CTAS statements with dbt-snowflake. For example, CREATE TABLE <table_name>(<col_name> <col_type> WITH MASKING POLICY <policy_name>) AS SELECT <query>.
As a workaround, masking policies can be applied to columns in a dbt post hook using an ALTER TABLE statement. The issue with doing this is that the CTAS and ALTER TABLE statements cannot be issued in the same transaction, as per the Snowflake documentation: "Each DDL statement executes as a separate transaction". As a result, there is a small window of time between the CTAS and ALTER TABLE <table_name> MODIFY COLUMN <column_name> SET MASKING POLICY <policy_name> statements where the data is not masked, and if the ALTER TABLE statement fails it would remain that way.
Supporting masking policy specification in the CTAS statement would fix this. As per the Snowflake documentation, "Executing a CREATE TABLE … AS SELECT (CTAS) statement applies any masking policies on columns included in the statement before the data is populated in the new table".
It also may not be difficult to support this given the recent work on model contracts, which provides the CREATE TABLE <table_name>(<col_name> <col_type>) AS SELECT <query> syntax. All that would need to be added is the WITH MASKING POLICY <policy_name> part of the statement.
One way to provide the config would be something like:
The masking policy could then be applied in get_columns_spec_ddl.
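As a rough sketch of the idea, the column spec can be built by looping over the columns and appending the WITH MASKING POLICY clause when a policy is configured. This is plain Python rather than the adapter's Jinja macro, and it does not reproduce get_columns_spec_ddl; the `Column` class, the `masking_policy` field, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Column:
    name: str
    data_type: str
    # Hypothetical config key; the real key name would be decided by the adapter.
    masking_policy: Optional[str] = None


def render_columns_spec(columns: Sequence[Column]) -> str:
    """Render "(<col> <type> [WITH MASKING POLICY <policy>], ...)"."""
    parts = []
    for col in columns:
        ddl = f"{col.name} {col.data_type}"
        if col.masking_policy:
            # Only columns with a configured policy get the extra clause.
            ddl += f" WITH MASKING POLICY {col.masking_policy}"
        parts.append(ddl)
    return "(" + ", ".join(parts) + ")"


def render_ctas(table: str, columns: Sequence[Column], query: str) -> str:
    """Render a single CTAS statement so the policy is set at creation time."""
    return f"CREATE TABLE {table} {render_columns_spec(columns)} AS {query}"


if __name__ == "__main__":
    # Table, column, and policy names below are made-up examples.
    cols = [Column("id", "integer"), Column("email", "varchar", "pii_mask")]
    print(render_ctas("users", cols, "SELECT id, email FROM raw_users"))
```

Because the policy is part of the one CREATE TABLE statement, no separate ALTER TABLE is needed, which is what removes the unmasked window described above.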
Describe alternatives you've considered
Using an ALTER TABLE statement in a post hook to apply the masking policy. As described above, because Snowflake DDL statements are always executed in separate transactions, this leaves the possibility of unmasked data.
Who will this benefit?
Anyone that wants to take advantage of dynamic data masking in Snowflake using dbt.
Are you interested in contributing this feature?
Yes
Anything else?
No response