
Experiment with functional testing #577

Merged
merged 34 commits
May 12, 2022

Conversation

@dbeatty10 (Contributor) commented May 9, 2022

Continuing the work from #575.

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery* (caveat: dateadd test is skipped currently)
    • Postgres
    • Redshift
    • Snowflake
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

Description & motivation

Followed the provided pattern for functional testing for all cross-database macros except as noted below.

Questions

  1. Is there a way to export the dbt projects that are created by these tests (to enable hands-on troubleshooting)?
  2. Is there a way to inspect the target folder created by each test?

Details

Was mostly able to follow the assert_equal actual/expected pattern for test comparisons via YAML tests. Exceptions noted below.
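
For reference, here's a rough sketch of that pattern, with hypothetical fixture names and values (shown for concat; the real fixtures differ in their details):

seeds__data_concat_csv = """input_1,input_2,output
a,b,ab
"""

models__test_concat_sql = """
with data as (
    select * from {{ ref('data_concat') }}
)
select
    {{ dbt_utils.concat(['input_1', 'input_2']) }} as actual,
    output as expected
from data
"""

models__test_concat_yml = """
version: 2
models:
  - name: test_concat
    tests:
      - assert_equal:
          actual: actual
          expected: expected
"""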

Created @pytest.mark.only_profile to support different implementations for escape_single_quotes. Is there a better way to enable different implementations for each adapter?
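
As a point of reference, here's one way such a marker could be implemented in conftest.py, sketched under the assumption of the suite's existing --profile option (the actual implementation may differ):

import pytest

def pytest_collection_modifyitems(config, items):
    profile = config.getoption("--profile")
    for item in items:
        only = item.get_closest_marker("only_profile")
        if only and profile not in only.args:
            # skip any test that hasn't opted into the current profile
            item.add_marker(pytest.mark.skip(reason=f"only runs on {only.args}"))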

Tests that are not yet implemented are marked as @pytest.mark.skip_profile.

Other tests not yet implemented contain the text TODO somewhere within the code (so it can be grep'ed easily).

Included

  • any_value
  • bool_or
  • cast_bool_to_text
  • concat
  • date_trunc
  • dateadd
  • datediff
  • escape_single_quotes
  • except
  • hash
  • intersect
  • last_day
  • length
  • listagg
  • position
  • replace
  • right
  • safe_cast
  • split_part
  • string_literal

These were a little funky, complicated, or different:

  • cast_bool_to_text
  • string_literal
  • escape_single_quotes
  • except
  • intersect

Not included

We'll need to decide how we want to approach testing for these macros:

  • type_bigint
  • type_float
  • type_int
  • type_numeric
  • type_string
  • type_boolean
  • type_timestamp
  • _is_ephemeral
  • _is_relation
  • current_timestamp_in_utc
  • current_timestamp

These either are not going to be moved, will be deprecated, or simply haven't been fully discussed yet:

  • identifier
  • width_bucket
  • get_table_types_sql

To do

  • Decide how to test type_*, _is_*, and current_timestamp* macros
  • Get dateadd test to work for BigQuery
  • Clean up implementations as-needed

Reflections

  • Postgres, Snowflake, BigQuery, and Redshift all allow selecting attributes without a from clause, but other databases do require one (most notably Oracle, via its DUAL table).
    • We could consider creating a cross-database from_dual macro that returns an empty string in the default case (see the sketch after this list). This would let us create datasets on the fly in a cross-database manner, using CTEs, for the purposes of testing.
  • It looks like the implementation of check_relations_equal might rely on the existence of EXCEPT within the database. Once we have the except macro moved from dbt-utils to dbt-core, we might want to consider dispatching EXCEPT instead (i.e., to use MINUS when applicable for databases like Oracle and MySQL).
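
A hypothetical sketch of that from_dual idea (all names assumed; not part of this PR), written as a macro fixture string:

macros__from_dual_sql = """
{% macro from_dual() %}
  {{ return(adapter.dispatch('from_dual', 'dbt_utils')()) }}
{% endmacro %}

{# default: databases that allow select without a from clause #}
{% macro default__from_dual() %}{% endmacro %}

{# Oracle requires a from clause, so select from DUAL #}
{% macro oracle__from_dual() %} from dual {% endmacro %}
"""

A test fixture could then say select 1 as one {{ from_dual() }} and run unchanged on either family of databases.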

Approaches for testing cross-db macros

It seems there are 3 types of uses for cross-db macros:

  1. Produce a value for an attribute. This is most common. Examples: concat and date_trunc.
  2. Produce a relation. This is relatively rare. Examples: intersect and except.
  3. Produce a string literal. This is most rare. Examples: string_literal and cast_bool_to_text.

A way to test each:

  1. Standard assert_equal actual/expected pattern for test comparisons via YAML tests.
  2. Override the main test case (e.g. test_build_assert_equal) and utilize check_relations_equal to compare the actual/expected relations (see the sketch after this list).
  3. Create the actual + expected dataset using a CTE and union all to separate each case. See discussion regarding from_dual. Use assert_equal like the standard case.
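
For approach 2, a minimal sketch (class and relation names are hypothetical), assuming the run_dbt and check_relations_equal utilities from dbt.tests.util:

from dbt.tests.util import run_dbt, check_relations_equal

class BaseExcept:
    # seed and model fixtures elided

    def test_build_assert_equal(self, project):
        run_dbt(["build"])  # build the actual and expected relations
        # compare the two relations row-for-row, instead of via a YAML test
        check_relations_equal(project.adapter, ["actual_except", "expected_except"])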

@dbeatty10 dbeatty10 requested a review from jtcohen6 May 9, 2022 14:38
@gshank commented May 9, 2022

If you run tests with -s you should get a line that lists the temporary project directory, like: === Test project_root: /private/var/folders/qt/vw8wqdgx4w381wh14b9y25m40000gn/T/pytest-of-gerda/pytest-487/project0. You can use that to look at the project directory. We were also thinking about a pytest custom option to copy that somewhere, but so far haven't needed it.

We have a number of tests that look at things in the target directory. If you grep for "get_artifact" and "get_manifest" you'll find some of them.
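
For example, a quick sketch of that kind of inspection (see dbt.tests.util for the real signatures):

from dbt.tests.util import run_dbt, get_artifact, get_manifest

def test_inspect_target(project):
    run_dbt(["run"])
    # read a JSON artifact out of the target directory
    run_results = get_artifact(project.project_root, "target", "run_results.json")
    assert run_results["results"]
    # load the manifest that dbt wrote while parsing the project
    manifest = get_manifest(project.project_root)
    assert manifest is not None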

@jtcohen6 (Contributor) left a comment

Thanks for the diligent work here! And thanks especially for calling out what's odd & left out. I think you managed to get something workable in just about all cases.


The three types of cross-database macros you've called out as the most difficult to test are also the ones that might want to be adapter methods (Python), IMO:

  • _is_relation, _is_ephemeral: These should absolutely become Python context methods—not even adapter methods, just good old standard base/provider context stuff. They'd be able to perform a real Python isinstance check, instead of the nonsense they have to do right now, checking side-effect class attributes of the available object (see the sketch after this list).
  • type_* macros: These feel duplicative with the Column object methods (overridable by adapter) and the adapter convert_*_type macros, which are used during type inference while loading seeds. Those feel like the right places to define them. If we can find a reasonable way to use them as one-offs, I bet we can plumb this logic back into dbt_utils (for backwards compatibility) from the Column object or those adapter methods. They're not perfectly tested today, but I think it's better to test them "closer to the metal," by hooking into the adapter's Python client (if available/supported). E.g. we use Column.convert_type here to translate between BigQuery's schema object and dbt's Column objects.
  • current_timestamp: I'm on the fence about this one, and about how best to test it. There's already a macro for this (snapshot_get_time) and an (unused, untested) adapter method date_function. Let's consolidate them? current_timestamp_in_utc is a bit trickier, and I definitely see the value. I'm actually surprised we don't have a convert_timezone macro here — the snowplow package does.
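
For the first bullet, a minimal sketch of what the isinstance version could look like (function name and wiring assumed):

from dbt.adapters.base.relation import BaseRelation

def is_relation(obj) -> bool:
    # a real isinstance check, instead of probing side-effect class attributes
    return isinstance(obj, BaseRelation)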

The trade-offs with all of these: If these are defined as Python methods, users can still use them, but they can't override them, and the source code is less accessible / trickier to find. That said, the whole initiative here with cross-database macros is recognizing there exists some unopinionated low-level functionality that is best left to the adapter plugin maintainer, not the package / project maintainer.


Not mentioned in our issues / PRs so far, but worth calling out in a similar vein:

There's tremendous overlap between:

  • the equality, cardinality_equality, and equal_rowcount custom generic tests in dbt-utils
  • the theurgical query we use to power the check_relations_equal utility in the New Core Testing Framework

This code does more visible work, so I think it's good to have it visible, rather than hidden away in adapter code. But I also think it's tremendously valuable to end users (unit testing SQL transformations!), and we should aim to consolidate the logic that's living in both places right now. Right now, all the testing framework logic is Python-only methods and f-stringified SQL, inaccessible to user-space code. See: recent Slack thread with Marius from Trino/Starburst



@pytest.mark.skip_profile("bigquery", reason="TODO - need to figure out timestamp vs. datetime behavior!")
jtcohen6 (Contributor):

👍

dbeatty10 (Contributor Author):

Done.

@@ -0,0 +1,26 @@

# escape_single_quotes
jtcohen6 (Contributor):

I like the way you've factored this one. Ideally, we'd have a single fixture that works across the board. The fact that it's hard to do justifies the existence of the cross-db macro :)

I think it might make sense to define two "variants" of this test. Those could be two different test cases:

  • models__test_escape_single_quotes_quote → BaseEscapeSingleQuotesQuote
  • models__test_escape_single_quotes_backslash → BaseEscapeSingleQuotesBackslash

Or a single test case with a new fixture, which subclasses can override:

    @pytest.fixture(scope="class")
    def escape_character(self):
        return "quote"  # or "backslash"

    @pytest.fixture(scope="class")
    def models(self, escape_character):
        if escape_character == "quote":
            return ...  # model fixtures that escape by doubling the quote
        elif escape_character == "backslash":
            return ...  # model fixtures that escape with a backslash
        else:
            raise ValueError(f"unknown escape character: {escape_character}")

Then, each adapter (including Postgres/Redshift/Snowflake/BigQuery) can opt into one of the two standard variants with no changes:

@pytest.mark.only_profile("postgres")
class TestEscapeSingleQuotesPostgres(BaseEscapeSingleQuotesQuote):
    pass

@pytest.mark.only_profile("snowflake")
class TestEscapeSingleQuotesSnowflake(BaseEscapeSingleQuotesBackslash):
    pass

Or:

@pytest.mark.only_profile("postgres")
class TestEscapeSingleQuotesPostgres(BaseEscapeSingleQuotes):
    pass

@pytest.mark.only_profile("snowflake")
class TestEscapeSingleQuotesSnowflake(BaseEscapeSingleQuotes):
    @pytest.fixture(scope="class")
    def escape_character(self):
        return "backslash"

Of course, if neither of those standard variants works, an adapter can define its own fixture and override the test case from the ground up.

dbeatty10 (Contributor Author):

Decision

I chose the "two different test cases" option. The implementation came out very straightforward.

This allowed me to bypass reasoning about the case of an unknown escape_character value. 😅

dbeatty10 (Contributor Author):

The alternative

However, the single test case with the escape_character definition was really interesting. I wonder if it would be worth adding the escape_character definition within each adapter itself?

An idea

Create a collection of low-level definitions within each adapter:

  • literal_quote - the symbol used to demarcate a (string) literal value within the database.
    • ANSI standard is single quote.
    • We'd then use this within the default implementation of the string_literal macro.
  • standard_escape_character - The symbol used to signal that the following character should be taken literally rather than interpreted.
    • The ANSI standard is the backslash character.
  • invalid_for_standard_escape - List of symbols that can't be escaped by the standard_escape_character.
    • The ANSI standard includes both single and double quotes.
    • We'd default this to the literal_quote value (and maybe double quote too).
  • special_escape_character - The symbol used to escape invalid_for_standard_escape characters.
    • The ANSI standard is a single quote (same as literal_quote).
    • Use within default implementation of the escape_single_quotes macro.

These low-level configurations would then inform the intermediate-level macros like the following:

  • string_literal
  • escape_single_quotes (maybe rename this to something more abstract)
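
Under those assumed definitions, the default implementations might look roughly like this (every name here is hypothetical):

macros__defaults_sql = """
{% macro default__string_literal(value) -%}
    {{ literal_quote() }}{{ value }}{{ literal_quote() }}
{%- endmacro %}

{% macro default__escape_single_quotes(value) -%}
    {{ value | replace(literal_quote(), special_escape_character() ~ literal_quote()) }}
{%- endmacro %}
"""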

References

  1. https://www.ibm.com/docs/en/informix-servers/12.10?topic=statements-quotation-marks-escape-characters
  2. https://4js.com/online_documentation/fjs-fgl-3.00.05-manual-html/c_fgl_sql_programming_080.html

dbeatty10 (Contributor Author):

Sidenote

Some macros assume single quotes to demarcate string literals rather than utilizing string_literal. This is one example. Not sure how common this is.
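
For contrast, an illustrative-only snippet of hardcoding vs. delegating:

models__quote_styles_sql = """
select
    'some_value' as hardcoded,  -- assumes single-quote string literals
    {{ dbt_utils.string_literal('some_value') }} as portable  -- delegates quoting
"""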

dbeatty10 (Contributor Author):

Done.

Comment on lines 42 to 43
class TestExcept(BaseExcept):
    def test_build_assert_equal(self, project):
jtcohen6 (Contributor):

This test is cleverly written!

Any reason to have test_build_assert_equal defined on TestExcept, rather than on BaseExcept and then

class TestExcept(BaseExcept):
    pass

dbeatty10 (Contributor Author):

Good idea. Done.



class TestIntersect(BaseIntersect):
jtcohen6 (Contributor):

Same comment here as for TestExcept — let's move test_build_assert_equal into BaseIntersect, I think

dbeatty10 (Contributor Author):

Done.

Comment on lines +4 to +9
# TODO how can we test this better?
models__test_current_timestamp_sql = """
select
    {{ dbt_utils.current_timestamp() }} as actual,
    {{ dbt_utils.current_timestamp() }} as expected
"""
jtcohen6 (Contributor):

Hm. Juice might not be worth the squeeze on this one.

We could use Python/Jinja {{ datetime.datetime.now() }}, and perform a comparison that shaves off seconds? To be clear, I think this is a bad idea:

models__test_current_timestamp_sql = """
select
    left(cast({{ dbt_utils.current_timestamp() }} as text), 16) as actual,
    '{{ modules.datetime.datetime.now().strftime("%Y-%m-%d %H:%M") }}' as expected
"""

Worth calling out that dbt_utils.current_timestamp is actually duplicative with:

  • The adapter classmethod date_function, which is required for all adapters but actually unused / untested (lol): (docs, abstractmethod, Postgres implementation)
  • snapshot_get_time, which is indeed a macro (not a method) — not tested directly, but well tested insofar as snapshots are well tested, including their "right now" behavior (the tests for which can be flaky)

@dbeatty10 (Contributor Author) commented, quoting @gshank:

If you run tests with -s you should get a line that lists the temporary project directory, like: === Test project_root: /private/var/folders/qt/vw8wqdgx4w381wh14b9y25m40000gn/T/pytest-of-gerda/pytest-487/project0. You can use that to look at the project directory. We were also thinking about a pytest custom option to copy that somewhere, but so far haven't needed it.

We have a number of tests that look at things in the target directory. If you grep for "get_artifact" and "get_manifest" you'll find some of them.

This was very helpful -- exactly what I needed. Thank you, @gshank! 🏆

Was able to see that directory as you described when I ran a single test module locally:

python3 -m pytest tests/functional/cross_db_utils/test_dateadd.py --profile bigquery -s

@dbeatty10 dbeatty10 requested a review from jtcohen6 May 11, 2022 17:22
@dbeatty10 (Contributor Author) commented:

@jtcohen6 I think I've responded to all your crucial feedback, so I've requested another review.

Do you see any remaining barriers we should overcome before merging this?

Feedback not incorporated here

You surfaced a few things that we should split out into new issues for either dbt-utils or dbt-core, namely:

@jtcohen6 (Contributor) commented May 12, 2022

No blockers from me! Let's move forward!

I see you managed to get the dateadd test passing for BigQuery! I've gone back and forth on the actual logic in the bigquery__dateadd macro. By converting to datetime, it loses time zone information—but that's actually consistent with the behavior of timestamp (a.k.a. timestamp_ntz a.k.a. timestamp without time zone) on other databases.

So, on the whole — it sounds like we didn't actually have to make any breaking changes to these macros, which means that we don't first need a dbt-utils minor version.

Next steps

I think next up is lift & shift. I think this is the order of operations:

  1. Move dispatched macros, default__ implementations, and test cases into the dbt-core repo. The test cases should land in the "adapter zone," a.k.a. dbt-tests-adapter.
  2. In this repo, open a PR that installs dbt-core + dbt-tests-adapter from git+.../dbt-core@branchname. That PR can:
    • Inherit the new functional test cases from dbt-tests-adapter.
    • Keep macros as lightweight wrappers that just return dbt.this_macro_name() (see the sketch after this list). (Or, to maintain backwards compatibility with older versions of dbt-core: use dbt_version to check the installed version.)
    • By setting the dispatch config in the integration_tests project, we should be able to keep all tests passing while the base/default macros move into dbt-core, and the adapter-specific versions still live here.
  3. Move the adapter-specific versions into each adapter plugin, and inherit the "adapter zone". For backwards compatibility, we should probably leave each adapter-specific macro here, too—in case someone's using an older version of dbt, and in case someone has been in the habit of calling dbt_utils.redshift__dateadd directly.
  4. With help from @dataders: Start doing the same for spark-utils → dbt-spark, tsql-utils → dbt-sqlserver, etc. The test cases will be in the "adapter zone," so they should be much easier to inherit + run than the current submodule nonsense.
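
For step 2, the lightweight wrapper could look roughly like this (sketched for concat, omitting the dbt_version guard):

wrapper_macro_sql = """
{% macro concat(fields) -%}
    {# delegate to the macro that now lives in dbt-core #}
    {{ return(dbt.concat(fields)) }}
{%- endmacro %}
"""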

Related work items

Thanks for catching all those other items! Let's open up some issues to keep track of this work. These issues are interrelated, so I'm taking best guesses at where they should live, and which units of work can be pursued independently.

To be clear: Not saying you need to go create all these issues. Also not saying you can't, if you feel inspired!

New dbt-utils issues

New dbt-core issues

New docs.getdbt.com issue
