Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lift + shift for cross-db macros #5298

Merged
merged 19 commits into from
Jun 16, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .changes/unreleased/Features-20220518-114604.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
kind: Features
body: Move cross-db macros from dbt-utils into dbt-core global project
time: 2022-05-18T11:46:04.557104+02:00
custom:
Author: jtcohen6 dbeatty10
Issue: "4813"
PR: "5265"
9 changes: 9 additions & 0 deletions core/dbt/include/global_project/macros/utils/any_value.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{% macro any_value(expression) -%}
{{ return(adapter.dispatch('any_value', 'dbt') (expression)) }}
{% endmacro %}

{% macro default__any_value(expression) -%}

any_value({{ expression }})

{%- endmacro %}
9 changes: 9 additions & 0 deletions core/dbt/include/global_project/macros/utils/bool_or.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{% macro bool_or(expression) -%}
{{ return(adapter.dispatch('bool_or', 'dbt') (expression)) }}
{% endmacro %}

{% macro default__bool_or(expression) -%}

bool_or({{ expression }})

{%- endmacro %}
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{% macro cast_bool_to_text(field) %}
{{ adapter.dispatch('cast_bool_to_text', 'dbt') (field) }}
{% endmacro %}

{% macro default__cast_bool_to_text(field) %}
cast({{ field }} as {{ api.Column.translate_type('string') }})
{% endmacro %}
Comment on lines +1 to +7
Copy link
Contributor

@jtcohen6 jtcohen6 Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever work in 1001e42! That brought this macro to my attention, along with a few of the others.

This one feels pretty close to the type_* macro behavior, and like it really just exists to handle a wonky behavior around cast() on Redshift. But I don't think we need to split hairs, beyond the extent to which I already have :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think I just took the type_* macro behavior based on api.Column.translate_type() from your dbt-utils branch.

If/when we push a version of the the type_* macro into core, we have the option of replacing these references with that macro instead.

7 changes: 7 additions & 0 deletions core/dbt/include/global_project/macros/utils/concat.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{% macro concat(fields) -%}
{{ return(adapter.dispatch('concat', 'dbt')(fields)) }}
{%- endmacro %}

{% macro default__concat(fields) -%}
{{ fields|join(' || ') }}
{%- endmacro %}
7 changes: 7 additions & 0 deletions core/dbt/include/global_project/macros/utils/date_trunc.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{% macro date_trunc(datepart, date) -%}
{{ return(adapter.dispatch('date_trunc', 'dbt') (datepart, date)) }}
{%- endmacro %}

{% macro default__date_trunc(datepart, date) -%}
date_trunc('{{datepart}}', {{date}})
{%- endmacro %}
14 changes: 14 additions & 0 deletions core/dbt/include/global_project/macros/utils/dateadd.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{% macro dateadd(datepart, interval, from_date_or_timestamp) %}
{{ return(adapter.dispatch('dateadd', 'dbt')(datepart, interval, from_date_or_timestamp)) }}
{% endmacro %}


{% macro default__dateadd(datepart, interval, from_date_or_timestamp) %}

dateadd(
{{ datepart }},
{{ interval }},
{{ from_date_or_timestamp }}
)

{% endmacro %}
14 changes: 14 additions & 0 deletions core/dbt/include/global_project/macros/utils/datediff.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{% macro datediff(first_date, second_date, datepart) %}
{{ return(adapter.dispatch('datediff', 'dbt')(first_date, second_date, datepart)) }}
{% endmacro %}


{% macro default__datediff(first_date, second_date, datepart) -%}

datediff(
{{ datepart }},
{{ first_date }},
{{ second_date }}
)

{%- endmacro %}
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{% macro escape_single_quotes(expression) %}
{{ return(adapter.dispatch('escape_single_quotes', 'dbt') (expression)) }}
{% endmacro %}

{# /*Default to replacing a single apostrophe with two apostrophes: they're -> they''re*/ #}
{% macro default__escape_single_quotes(expression) -%}
{{ expression | replace("'","''") }}
{%- endmacro %}
9 changes: 9 additions & 0 deletions core/dbt/include/global_project/macros/utils/except.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{% macro except() %}
{{ return(adapter.dispatch('except', 'dbt')()) }}
{% endmacro %}

{% macro default__except() %}

except

{% endmacro %}
7 changes: 7 additions & 0 deletions core/dbt/include/global_project/macros/utils/hash.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{% macro hash(field) -%}
{{ return(adapter.dispatch('hash', 'dbt') (field)) }}
{%- endmacro %}

{% macro default__hash(field) -%}
md5(cast({{ field }} as {{ api.Column.translate_type('string') }}))
{%- endmacro %}
9 changes: 9 additions & 0 deletions core/dbt/include/global_project/macros/utils/intersect.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{% macro intersect() %}
{{ return(adapter.dispatch('intersect', 'dbt')()) }}
{% endmacro %}

{% macro default__intersect() %}

intersect

{% endmacro %}
15 changes: 15 additions & 0 deletions core/dbt/include/global_project/macros/utils/last_day.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{% macro last_day(date, datepart) %}
{{ return(adapter.dispatch('last_day', 'dbt') (date, datepart)) }}
{% endmacro %}

{%- macro default_last_day(date, datepart) -%}
cast(
{{dbt.dateadd('day', '-1',
dbt.dateadd(datepart, '1', dbt.date_trunc(datepart, date))
)}}
as date)
{%- endmacro -%}

{% macro default__last_day(date, datepart) -%}
{{dbt.default_last_day(date, datepart)}}
{%- endmacro %}
11 changes: 11 additions & 0 deletions core/dbt/include/global_project/macros/utils/length.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{% macro length(expression) -%}
{{ return(adapter.dispatch('length', 'dbt') (expression)) }}
{% endmacro %}

{% macro default__length(expression) %}

length(
{{ expression }}
)

{%- endmacro -%}
30 changes: 30 additions & 0 deletions core/dbt/include/global_project/macros/utils/listagg.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none) -%}
Copy link
Contributor

@jtcohen6 jtcohen6 Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my work on dbt-labs/dbt-spark#359, I've realized that order_by_clause is really not supported on the SparkSQL analogue for listagg. I don't think that's a reason to exclude it from the standard / default implementation, since so many other SQL dialects support it so handily, but:

  • We might be able to support it, via some array sorting, if it were expressed as order_by=<column_name>, instead of expecting a SQL order by column_name expression
  • I'm curious what you think about the approach I took here — basically, raise a warning if the argument is passed, then ignore it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach you took with ignoring the order_by_clause parameter and raising a warning is completely reasonable.

Definitely worth a look if we can replace a user-supplied value like order by column_name expression with order_by=<column_name> instead. I'll take a peek and report back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think we should defer this work to a separate PR (if we do in fact take it on).

Before taking it on, here are the main things for us to answer:

  1. Do all databases operate correctly with an expression like order_by_clause="order by column_name"?
  2. Would it be a breaking change? If so, is there any path forward that is non-breaking?

Findings for question 1

Summary: Yes, it looks like they all support order_by_clause="order by column_name"

Details:

  • Here are current implementations across databases.
    • Looks like our supported adapters each needs a different implementation, possibly because it was a relative late-comer first appearing in SQL:2016. (Fascinating discussion on listagg here)
  • The pytests here include expressions like "order by order_col" (exactly as you expected)
  • The pytests use positional arguments exclusively and don't currently cover named keyword arguments.
    • 🟢 Covering the positional arguments is most crucial since messing with order of those can affect consumers relying on a particular order.
    • 🟡 We might want to add keyword testing for good measure though.

Findings for question 2

It would be a breaking change if we drop order_by_clause="order by column_name" and replace it with something like order_by="column_name".

Options:

  1. Accept the breaking change
  2. Reject the breaking change
  3. Introduce the new parameter and deprecate the old parameter in a non-breaking fashion.

The last option would look something like this:

  • New signature:{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none, order_by=none) -%}
    • Doing it this way prevents unexpected errors for any consumers relying upon positional arguments
  • Raise an error (or warn) if both order_by_clause and order_by are provided
  • When only order_by_clause is provided:
    • This is the "base case", and all the current code is relevant
    • Provide a deprecation warning that this parameter will be removed in a future release.
  • When only order_by_clause is provided:
    • For non-spark adapters, convert order_by into the relevant order_by_clause value. Proceed to the code in the base case above.
    • For spark, use order_by according to your plans 😈

When doing the final deprecation in the future, we'd probably want to leave order_by_clause=none as a non-functional vestigial parameter to preserve the positions of each keyword parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 Thanks for going deep on this—much deeper than I did!

I did some more poking, and the amount of struct + array_sort nonsense that would be required even just to get order_by=column_name working is nontrivial. More effort than it's worth until we have a proven need. It's an optional argument, and feels like a nice-to-have.

I'm quite happy to leave this in place, and document / warn that order_by_clause is not supported on SparkSQL. It does seem to be a (recent) standard, and most other databases support it in exactly this form (order_by_clause). Perhaps we could convince the fine folks developing Apache Spark to add support for it, someday? :)

{{ return(adapter.dispatch('listagg', 'dbt') (measure, delimiter_text, order_by_clause, limit_num)) }}
{%- endmacro %}

{% macro default__listagg(measure, delimiter_text, order_by_clause, limit_num) -%}

{% if limit_num -%}
array_to_string(
array_slice(
array_agg(
{{ measure }}
){% if order_by_clause -%}
within group ({{ order_by_clause }})
{%- endif %}
,0
,{{ limit_num }}
),
{{ delimiter_text }}
)
{%- else %}
listagg(
{{ measure }},
{{ delimiter_text }}
)
{% if order_by_clause -%}
within group ({{ order_by_clause }})
{%- endif %}
{%- endif %}

{%- endmacro %}
7 changes: 7 additions & 0 deletions core/dbt/include/global_project/macros/utils/literal.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{%- macro string_literal(value) -%}
{{ return(adapter.dispatch('string_literal', 'dbt') (value)) }}
{%- endmacro -%}

{% macro default__string_literal(value) -%}
'{{ value }}'
{%- endmacro %}
11 changes: 11 additions & 0 deletions core/dbt/include/global_project/macros/utils/position.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{% macro position(substring_text, string_text) -%}
{{ return(adapter.dispatch('position', 'dbt') (substring_text, string_text)) }}
{% endmacro %}

{% macro default__position(substring_text, string_text) %}

position(
{{ substring_text }} in {{ string_text }}
)

{%- endmacro -%}
14 changes: 14 additions & 0 deletions core/dbt/include/global_project/macros/utils/replace.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{% macro replace(field, old_chars, new_chars) -%}
{{ return(adapter.dispatch('replace', 'dbt') (field, old_chars, new_chars)) }}
{% endmacro %}

{% macro default__replace(field, old_chars, new_chars) %}

replace(
{{ field }},
{{ old_chars }},
{{ new_chars }}
)


{% endmacro %}
12 changes: 12 additions & 0 deletions core/dbt/include/global_project/macros/utils/right.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
{% macro right(string_text, length_expression) -%}
{{ return(adapter.dispatch('right', 'dbt') (string_text, length_expression)) }}
{% endmacro %}

{% macro default__right(string_text, length_expression) %}

right(
{{ string_text }},
{{ length_expression }}
)

{%- endmacro -%}
9 changes: 9 additions & 0 deletions core/dbt/include/global_project/macros/utils/safe_cast.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{% macro safe_cast(field, type) %}
{{ return(adapter.dispatch('safe_cast', 'dbt') (field, type)) }}
{% endmacro %}

{% macro default__safe_cast(field, type) %}
{# most databases don't support this function yet
so we just need to use cast #}
cast({{field}} as {{type}})
{% endmacro %}
26 changes: 26 additions & 0 deletions core/dbt/include/global_project/macros/utils/split_part.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
{% macro split_part(string_text, delimiter_text, part_number) %}
{{ return(adapter.dispatch('split_part', 'dbt') (string_text, delimiter_text, part_number)) }}
{% endmacro %}

{% macro default__split_part(string_text, delimiter_text, part_number) %}

split_part(
{{ string_text }},
{{ delimiter_text }},
{{ part_number }}
)

{% endmacro %}

{% macro _split_part_negative(string_text, delimiter_text, part_number) %}

split_part(
{{ string_text }},
{{ delimiter_text }},
length({{ string_text }})
- length(
replace({{ string_text }}, {{ delimiter_text }}, '')
) + 2 {{ part_number }}
)

{% endmacro %}
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{#- /*Postgres doesn't support any_value, so we're using min() to get the same result*/ -#}

{% macro postgres__any_value(expression) -%}

min({{ expression }})

{%- endmacro %}
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{% macro postgres__dateadd(datepart, interval, from_date_or_timestamp) %}

{{ from_date_or_timestamp }} + ((interval '1 {{ datepart }}') * ({{ interval }}))

{% endmacro %}
32 changes: 32 additions & 0 deletions plugins/postgres/dbt/include/postgres/macros/utils/datediff.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
{% macro postgres__datediff(first_date, second_date, datepart) -%}

{% if datepart == 'year' %}
(date_part('year', ({{second_date}})::date) - date_part('year', ({{first_date}})::date))
{% elif datepart == 'quarter' %}
({{ datediff(first_date, second_date, 'year') }} * 4 + date_part('quarter', ({{second_date}})::date) - date_part('quarter', ({{first_date}})::date))
{% elif datepart == 'month' %}
({{ datediff(first_date, second_date, 'year') }} * 12 + date_part('month', ({{second_date}})::date) - date_part('month', ({{first_date}})::date))
{% elif datepart == 'day' %}
(({{second_date}})::date - ({{first_date}})::date)
{% elif datepart == 'week' %}
({{ datediff(first_date, second_date, 'day') }} / 7 + case
when date_part('dow', ({{first_date}})::timestamp) <= date_part('dow', ({{second_date}})::timestamp) then
case when {{first_date}} <= {{second_date}} then 0 else -1 end
else
case when {{first_date}} <= {{second_date}} then 1 else 0 end
end)
{% elif datepart == 'hour' %}
({{ datediff(first_date, second_date, 'day') }} * 24 + date_part('hour', ({{second_date}})::timestamp) - date_part('hour', ({{first_date}})::timestamp))
{% elif datepart == 'minute' %}
({{ datediff(first_date, second_date, 'hour') }} * 60 + date_part('minute', ({{second_date}})::timestamp) - date_part('minute', ({{first_date}})::timestamp))
{% elif datepart == 'second' %}
({{ datediff(first_date, second_date, 'minute') }} * 60 + floor(date_part('second', ({{second_date}})::timestamp)) - floor(date_part('second', ({{first_date}})::timestamp)))
{% elif datepart == 'millisecond' %}
({{ datediff(first_date, second_date, 'minute') }} * 60000 + floor(date_part('millisecond', ({{second_date}})::timestamp)) - floor(date_part('millisecond', ({{first_date}})::timestamp)))
{% elif datepart == 'microsecond' %}
({{ datediff(first_date, second_date, 'minute') }} * 60000000 + floor(date_part('microsecond', ({{second_date}})::timestamp)) - floor(date_part('microsecond', ({{first_date}})::timestamp)))
{% else %}
{{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in postgres: {!r}".format(datepart)) }}
{% endif %}

{%- endmacro %}
14 changes: 14 additions & 0 deletions plugins/postgres/dbt/include/postgres/macros/utils/last_day.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{% macro postgres__last_day(date, datepart) -%}

{%- if datepart == 'quarter' -%}
-- postgres dateadd does not support quarter interval.
cast(
{{dbt.dateadd('day', '-1',
dbt.dateadd('month', '3', dbt.date_trunc(datepart, date))
)}}
as date)
{%- else -%}
{{dbt.default_last_day(date, datepart)}}
{%- endif -%}

{%- endmacro %}
23 changes: 23 additions & 0 deletions plugins/postgres/dbt/include/postgres/macros/utils/listagg.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{% macro postgres__listagg(measure, delimiter_text, order_by_clause, limit_num) -%}

{% if limit_num -%}
array_to_string(
(array_agg(
{{ measure }}
{% if order_by_clause -%}
{{ order_by_clause }}
{%- endif %}
))[1:{{ limit_num }}],
{{ delimiter_text }}
)
{%- else %}
string_agg(
{{ measure }},
{{ delimiter_text }}
{% if order_by_clause -%}
{{ order_by_clause }}
{%- endif %}
)
{%- endif %}

{%- endmacro %}
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{% macro postgres__split_part(string_text, delimiter_text, part_number) %}

{% if part_number >= 0 %}
{{ dbt.default__split_part(string_text, delimiter_text, part_number) }}
{% else %}
{{ dbt._split_part_negative(string_text, delimiter_text, part_number) }}
{% endif %}

{% endmacro %}
33 changes: 33 additions & 0 deletions tests/adapter/dbt/tests/adapter/utils/base_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
import pytest
from dbt.tests.util import run_dbt

macros__test_assert_equal_sql = """
{% test assert_equal(model, actual, expected) %}
select * from {{ model }} where {{ actual }} != {{ expected }}

{% endtest %}
"""


class BaseUtils:
# setup
@pytest.fixture(scope="class")
def macros(self):
return {"test_assert_equal.sql": macros__test_assert_equal_sql}

# make it possible to dynamically update the macro call with a namespace
# (e.g.) 'dateadd', 'dbt.dateadd', 'dbt_utils.dateadd'
def macro_namespace(self):
return ""

def interpolate_macro_namespace(self, model_sql, macro_name):
macro_namespace = self.macro_namespace()
return (
model_sql.replace(f"{macro_name}(", f"{macro_namespace}.{macro_name}(")
if macro_namespace
else model_sql
)

# actual test sequence
def test_build_assert_equal(self, project):
run_dbt(["build"]) # seed, model, test
Loading