Lift + shift for cross-db macros #5298
Conversation
@jtcohen6 could you take a look at the "to do before merge" items above? Everything has been moved into dbt-core, and all tests are passing. I've also confirmed that all the adapters work when using this branch. I made an initial decision on how to handle the conflict with `current_timestamp()`. Do you have a preference between options 1, 2, and 3 for handling it?
@dbeatty10 Great work here!!
As discussed the other day, let's kick out the `current_timestamp` + `type_*` macros/tests for now. We have other issues to track that work: https://github.com/dbt-labs/dbt-utils/issues/599, dbt-labs/dbt-utils#598
The rest of this looks good to move forward. This is the first PR we'll need to merge, and then we can move forward with the rest.
There seem to be a lot of tests that didn't get implemented. Do we have anything tracking those?
Otherwise, this is mainly moving things from dbt-utils to core, right?
Yes! Let's kick those out. Same for ✅
Let's ship it! And then, let's ship the rest of it!
```sql
{% macro cast_bool_to_text(field) %}
    {{ adapter.dispatch('cast_bool_to_text', 'dbt') (field) }}
{% endmacro %}

{% macro default__cast_bool_to_text(field) %}
    cast({{ field }} as {{ api.Column.translate_type('string') }})
{% endmacro %}
```
Clever work in 1001e42! That brought this macro to my attention, along with a few of the others.
This one feels pretty close to the `type_*` macro behavior, and like it really just exists to handle a wonky behavior around `cast()` on Redshift. But I don't think we need to split hairs, beyond the extent to which I already have :)
Yep, I think I just took the `type_*` macro behavior based on `api.Column.translate_type()` from your dbt-utils branch.
If/when we push a version of the `type_*` macro into core, we have the option of replacing these references with that macro instead.
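For illustration, here's a sketch of what the default implementation renders to. The column name `is_active` is hypothetical, and the exact text type depends on how the adapter's `api.Column.translate_type('string')` resolves (e.g. `TEXT` on Postgres):

```sql
-- {{ cast_bool_to_text('is_active') }} would compile, roughly, to:
cast(is_active as TEXT)
```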
```sql
{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none) -%}
```
In my work on dbt-labs/dbt-spark#359, I've realized that `order_by_clause` is really not supported on the SparkSQL analogue for `listagg`. I don't think that's a reason to exclude it from the standard / default implementation, since so many other SQL dialects support it so handily, but:
- We might be able to support it, via some array sorting, if it were expressed as `order_by=<column_name>`, instead of expecting a SQL `order by column_name` expression
- I'm curious what you think about the approach I took here: basically, raise a warning if the argument is passed, then ignore it
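For context, a call using the current signature looks like this (table and column names hypothetical). On the default implementation, when no `limit_num` is given, it compiles to a `within group` expression, roughly:

```sql
-- {{ listagg('user_name', "', '", 'order by created_at') }}
-- compiles, roughly, to:
listagg(user_name, ', ') within group (order by created_at)
```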
The approach you took with ignoring the `order_by_clause` parameter and raising a warning is completely reasonable.
Definitely worth a look if we can replace a user-supplied expression like `order by column_name` with `order_by=<column_name>` instead. I'll take a peek and report back.
Overall, I think we should defer this work to a separate PR (if we do in fact take it on).
Before taking it on, here are the main things for us to answer:
- Do all databases operate correctly with an expression like `order_by_clause="order by column_name"`?
- Would it be a breaking change? If so, is there any path forward that is non-breaking?
Findings for question 1
Summary: Yes, it looks like they all support `order_by_clause="order by column_name"`
Details:
- Here are current implementations across databases.
  - Looks like our supported adapters each need a different implementation, possibly because `listagg` was a relative late-comer, first appearing in SQL:2016. (Fascinating discussion on `listagg` here)
- The pytests here include expressions like `"order by order_col"` (exactly as you expected)
- The pytests use positional arguments exclusively and don't currently cover named keyword arguments.
  - 🟢 Covering the positional arguments is most crucial, since messing with the order of those can affect consumers relying on a particular order.
  - 🟡 We might want to add keyword testing for good measure though.
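To illustrate the testing gap (argument values are hypothetical), the same call can be made positionally or with keywords; only the first form is currently exercised:

```sql
-- positional arguments (covered by the existing pytests):
{{ listagg('user_name', "', '", 'order by created_at', 3) }}

-- named keyword arguments (not currently covered):
{{ listagg(measure='user_name', delimiter_text="', '", order_by_clause='order by created_at', limit_num=3) }}
```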
Findings for question 2
It would be a breaking change if we drop `order_by_clause="order by column_name"` and replace it with something like `order_by="column_name"`.
Options:
- Accept the breaking change
- Reject the breaking change
- Introduce the new parameter and deprecate the old parameter in a non-breaking fashion.
The last option would look something like this:
- New signature: `{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none, order_by=none) -%}`
  - Doing it this way prevents unexpected errors for any consumers relying upon positional arguments
- Raise an error (or warn) if both `order_by_clause` and `order_by` are provided
- When only `order_by_clause` is provided:
  - This is the "base case", and all the current code is relevant
  - Provide a deprecation warning that this parameter will be removed in a future release.
- When only `order_by` is provided:
  - For non-Spark adapters, convert `order_by` into the relevant `order_by_clause` value. Proceed to the code in the base case above.
  - For Spark, use `order_by` according to your plans 😈
When doing the final deprecation in the future, we'd probably want to leave `order_by_clause=none` as a non-functional vestigial parameter to preserve the positions of each keyword parameter.
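A minimal sketch of what that dispatch wrapper could look like, assuming the usual dbt Jinja context (`exceptions`, `adapter.dispatch`); the warning and error wording is hypothetical:

```sql
{%- macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none, order_by=none) -%}
    {%- if order_by_clause is not none and order_by is not none -%}
        {{ exceptions.raise_compiler_error("listagg: provide either order_by_clause or order_by, not both") }}
    {%- elif order_by is not none -%}
        {#- translate the new-style argument into the legacy clause, then fall through to the base case -#}
        {%- set order_by_clause = "order by " ~ order_by -%}
    {%- elif order_by_clause is not none -%}
        {#- deprecation path: warn, but keep working -#}
        {% do exceptions.warn("listagg: order_by_clause is deprecated; use order_by instead") %}
    {%- endif -%}
    {{ return(adapter.dispatch('listagg', 'dbt')(measure, delimiter_text, order_by_clause, limit_num)) }}
{%- endmacro -%}
```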
@dbeatty10 Thanks for going deep on this—much deeper than I did!
I did some more poking, and the amount of `struct` + `array_sort` nonsense that would be required even just to get `order_by=column_name` working is nontrivial. More effort than it's worth until we have a proven need. It's an optional argument, and feels like a nice-to-have.
I'm quite happy to leave this in place, and document / warn that `order_by_clause` is not supported on SparkSQL. It does seem to be a (recent) standard, and most other databases support it in exactly this form (`order_by_clause`). Perhaps we could convince the fine folks developing Apache Spark to add support for it, someday? :)
🎉 🎉 🎉
A step on the way toward resolving #4813. Continues #5265.
Description
- `default__` macros into global project, `macros/utils`
- `dbt-postgres`
- `dbt-tests-adapter`, with options to override the `macro_namespace`. This allows us to test for backwards compatibility when people call the `dbt_utils`-namespaced versions.

Conflict for `current_timestamp()`
Context
- `current_timestamp()`

Options
We have some options for how to handle this:

Initial decision
I went with the following decision since dbt-core is more foundational:
But all else being equal, I probably would have preferred Option 2. Risk aversion tilted the scales to Option 1.

Further options (per adapter)
If desired, any or all of the following could be done without any known issues:
- Move `current_timestamp()` from `dbt/include/global_project/macros/adapters/freshness.sql` to `dbt/include/global_project/macros/utils/current_timestamp.sql` (or equivalent location per adapter)

To do before merge
- `current_timestamp()` here
- `current_timestamp()` and change tack if desired
- `@pytest.mark.skip` within `tests/adapter/dbt/tests/adapter/utils/`