
Lift + shift for cross-db macros #5298

Merged (19 commits) Jun 16, 2022
Conversation

Contributor

@dbeatty10 dbeatty10 commented May 25, 2022

A step on the way toward resolving #4813. Continues #5265.

Description

  • Move dispatched + default__ macros into global project, macros/utils
  • Move Postgres-specific implementations into dbt-postgres
  • Move test cases into dbt-tests-adapter, with options to override the macro_namespace. This allows us to test for backwards compatibility when people call the dbt_utils-namespaced versions.
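The backwards-compatibility goal above means a call through the dbt_utils namespace should resolve to the same implementation as a call through the dbt namespace. A hedged sketch (model and column names are hypothetical):

```sql
-- Hypothetical model: once the lift-and-shift lands and dbt-utils shims back
-- to core, both of these calls should compile to the same SQL
select
    {{ dbt.concat(['first_name', "' '", 'last_name']) }} as full_name_core,
    {{ dbt_utils.concat(['first_name', "' '", 'last_name']) }} as full_name_utils
from {{ ref('customers') }}
```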

Conflict for current_timestamp()

Context

  • Both dbt-utils and dbt-core contained definitions for current_timestamp()
  • The text of the definitions is not identical (see here)
  • However the SQL result of the definitions appears to be equal (see here)
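For context, both packages use the standard dispatch pattern for this macro; a simplified sketch (the macro bodies here are illustrative, not the exact competing definitions):

```sql
{% macro current_timestamp() -%}
    {{ adapter.dispatch('current_timestamp', 'dbt')() }}
{%- endmacro %}

{% macro default__current_timestamp() -%}
    {# default: rely on the SQL-standard keyword #}
    current_timestamp
{%- endmacro %}

{% macro postgres__current_timestamp() -%}
    {# adapter-specific override lives in dbt-postgres #}
    now()
{%- endmacro %}
```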

Options

We have some options for how to handle this:

  1. Keep the existing definition within each database adapter and discard the dbt-utils definition
  2. Overwrite the existing definition within each database adapter in favor of the dbt-utils definition
  3. Keep the existing definition within each database adapter and re-institute the dbt-utils definition back into the macros/cross_db_utils folder.

Initial decision

I went with the following decision since dbt-core is more foundational:

  • Option 1: Keep the original definitions from dbt-core and discard the definitions from dbt-utils

But all else being equal, I probably would have preferred Option 2. Risk aversion tilted the scales to Option 1.

Further options (per adapter)

If desired, any or all of the following could be done without any known issues:

  • Move definition of current_timestamp() from dbt/include/global_project/macros/adapters/freshness.sql to dbt/include/global_project/macros/utils/current_timestamp.sql (or equivalent location per adapter)
  • Replace the original definition in dbt-core with the original definition from dbt-utils

To do before merge

  • Review and affirm the default definition of current_timestamp() here
  • Review initial decision for handling current_timestamp() and change tack if desired
  • Determine how to handle tests decorated with @pytest.mark.skip within tests/adapter/dbt/tests/adapter/utils/

@cla-bot cla-bot bot added the cla:yes label May 25, 2022
@dbeatty10
Contributor Author

@jtcohen6 could you take a look at the "to do before merge" items above?

Everything has been moved into dbt-core, and all tests are passing. I also affirmed that all the adapters work when using this branch.

I made an initial decision on how to handle the conflict with current_timestamp(), but we still have full optionality to change that decision. The tests in dbt-utils didn't see any functional differences between the competing definitions. My main worry is that there is some edge case our tests aren't exposing (like some particular database setting for handling timestamps).

Do you have a preference between options 1, 2, and 3 for handling the current_timestamp() conflict?

@dbeatty10 dbeatty10 marked this pull request as ready for review May 31, 2022 15:33
@dbeatty10 dbeatty10 requested a review from a team May 31, 2022 15:33
@dbeatty10 dbeatty10 requested review from a team as code owners May 31, 2022 15:33
Contributor

@jtcohen6 jtcohen6 left a comment


@dbeatty10 Great work here!!

As discussed the other day, let's kick out the current_timestamp + type_* macros/tests for now. We have other issues to track that work: https://github.com/dbt-labs/dbt-utils/issues/599, dbt-labs/dbt-utils#598

The rest of this looks good to move forward. This is the first PR we'll need to merge, and then we can move forward with the rest.

.changes/unreleased/Features-20220518-114604.yaml (outdated)
@jtcohen6 jtcohen6 added the Team:Adapters Issues designated for the adapter area of the code label Jun 2, 2022
Contributor

@ChenyuLInx ChenyuLInx left a comment


There seem to be a lot of tests that didn't get implemented; do we have anything tracking those?

Otherwise this is mainly moving things from dbt-utils to core, right?

@jtcohen6
Contributor

jtcohen6 commented Jun 3, 2022

There seem to be a lot of tests that didn't get implemented; do we have anything tracking those?

Yes! Let's kick those type_* macros + tests out of this PR. They'll be added in dbt-labs/dbt-utils#586 instead.

Same for current_timestamp macros + tests. We're tracking the work there in #5521.

Otherwise this is mainly moving things from dbt-utils to core, right?

Contributor

@jtcohen6 jtcohen6 left a comment


Let's ship it! And then, let's ship the rest of it!

Comment on lines +1 to +7
{% macro cast_bool_to_text(field) %}
{{ adapter.dispatch('cast_bool_to_text', 'dbt') (field) }}
{% endmacro %}

{% macro default__cast_bool_to_text(field) %}
cast({{ field }} as {{ api.Column.translate_type('string') }})
{% endmacro %}
Contributor

@jtcohen6 jtcohen6 Jun 13, 2022


Clever work in 1001e42! That brought this macro to my attention, along with a few of the others.

This one feels pretty close to the type_* macro behavior, and like it really just exists to handle a wonky behavior around cast() on Redshift. But I don't think we need to split hairs, beyond the extent to which I already have :)

Contributor Author


Yep, I think I just took the type_* macro behavior based on api.Column.translate_type() from your dbt-utils branch.

If/when we push a version of the type_* macro into core, we have the option of replacing these references with that macro instead.

@@ -0,0 +1,30 @@
{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none) -%}
Contributor

@jtcohen6 jtcohen6 Jun 15, 2022


In my work on dbt-labs/dbt-spark#359, I've realized that order_by_clause is really not supported on the SparkSQL analogue for listagg. I don't think that's a reason to exclude it from the standard / default implementation, since so many other SQL dialects support it so handily, but:

  • We might be able to support it, via some array sorting, if it were expressed as order_by=<column_name>, instead of expecting a SQL order by column_name expression
  • I'm curious what you think about the approach I took here — basically, raise a warning if the argument is passed, then ignore it

Contributor Author


The approach you took with ignoring the order_by_clause parameter and raising a warning is completely reasonable.

Definitely worth a look at whether we can replace a user-supplied expression like order_by_clause="order by column_name" with order_by=<column_name> instead. I'll take a peek and report back.

Contributor Author


Overall, I think we should defer this work to a separate PR (if we do in fact take it on).

Before taking it on, here are the main things for us to answer:

  1. Do all databases operate correctly with an expression like order_by_clause="order by column_name"?
  2. Would it be a breaking change? If so, is there any path forward that is non-breaking?

Findings for question 1

Summary: Yes, it looks like they all support order_by_clause="order by column_name"

Details:

  • Here are current implementations across databases.
    • Looks like each of our supported adapters needs a different implementation, possibly because listagg was a relative latecomer, first appearing in SQL:2016. (Fascinating discussion on listagg here)
  • The pytests here include expressions like "order by order_col" (exactly as you expected)
  • The pytests use positional arguments exclusively and don't currently cover named keyword arguments.
    • 🟢 Covering the positional arguments is most crucial, since messing with the order of those can affect consumers relying on a particular order.
    • 🟡 We might want to add keyword testing for good measure though.
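For reference, the two calling styles look like this side by side (model and column names are hypothetical):

```sql
select
    -- positional arguments, as the pytests currently exercise them
    {{ listagg('name_col', "', '", "order by order_col", 3) }} as top_three,
    -- keyword arguments, currently untested
    {{ listagg(measure='name_col', order_by_clause='order by order_col') }} as all_names
from {{ ref('my_model') }}
```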

Findings for question 2

It would be a breaking change if we drop order_by_clause="order by column_name" and replace it with something like order_by="column_name".

Options:

  1. Accept the breaking change
  2. Reject the breaking change
  3. Introduce the new parameter and deprecate the old parameter in a non-breaking fashion.

The last option would look something like this:

  • New signature: {% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none, order_by=none) -%}
    • Doing it this way prevents unexpected errors for any consumers relying upon positional arguments
  • Raise an error (or warn) if both order_by_clause and order_by are provided
  • When only order_by_clause is provided:
    • This is the "base case", and all the current code is relevant
    • Provide a deprecation warning that this parameter will be removed in a future release.
  • When only order_by is provided:
    • For non-spark adapters, convert order_by into the relevant order_by_clause value. Proceed to the code in the base case above.
    • For spark, use order_by according to your plans 😈

When doing the final deprecation in the future, we'd probably want to leave order_by_clause=none as a non-functional vestigial parameter to preserve the positions of each keyword parameter.
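As a sketch of option 3 (hedged: names and the exact warning mechanism are placeholders), the dispatching macro could normalize the two parameters before handing off to the existing per-adapter implementations:

```sql
{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none, order_by=none) -%}
    {% if order_by_clause is not none and order_by is not none %}
        {{ exceptions.raise_compiler_error("listagg: pass either order_by_clause or order_by, not both") }}
    {% elif order_by is not none %}
        {# new-style parameter: convert to the clause the existing code expects #}
        {% set order_by_clause = 'order by ' ~ order_by %}
    {% elif order_by_clause is not none %}
        {# old-style parameter: emit a deprecation warning (mechanism is a placeholder) #}
        {% do exceptions.warn("listagg: order_by_clause is deprecated; use order_by instead") %}
    {% endif %}
    {{ return(adapter.dispatch('listagg', 'dbt')(measure, delimiter_text, order_by_clause, limit_num)) }}
{%- endmacro %}
```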

Contributor


@dbeatty10 Thanks for going deep on this—much deeper than I did!

I did some more poking, and the amount of struct + array_sort nonsense that would be required even just to get order_by=column_name working is nontrivial. More effort than it's worth until we have a proven need. It's an optional argument, and feels like a nice-to-have.

I'm quite happy to leave this in place, and document / warn that order_by_clause is not supported on SparkSQL. It does seem to be a (recent) standard, and most other databases support it in exactly this form (order_by_clause). Perhaps we could convince the fine folks developing Apache Spark to add support for it, someday? :)

@jtcohen6 jtcohen6 merged commit a02db03 into main Jun 16, 2022
@jtcohen6 jtcohen6 deleted the dbeatty/utils-lift-shift branch June 16, 2022 18:07
@jtcohen6
Contributor

🎉 🎉 🎉
