Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2057] [Regression] Flaky generic tests on multiple threads in dbt 1.4.1 #6885

Closed
2 tasks done
gastlich opened this issue Feb 7, 2023 · 4 comments · Fixed by #7023
Closed
2 tasks done

[CT-2057] [Regression] Flaky generic tests on multiple threads in dbt 1.4.1 #6885

gastlich opened this issue Feb 7, 2023 · 4 comments · Fixed by #7023
Assignees
Labels
bug Something isn't working regression
Milestone

Comments

@gastlich
Copy link

gastlich commented Feb 7, 2023

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

Hello,
Our project faced failures in CI test jobs after upgrading to the latest DBT version (1.4.1). The failures were due to flaky generic tests. After investigating the issue I was able to end up having 2 generic tests running on a single model, which causes the failure.

We are still running the older version 1.3.2 on main and everything is working fine and reliably.

The issue appears to be related to a concurrency/race condition problem. This was confirmed when the following was discovered:

  1. Setting the number of threads to 1 caused the failing tests to pass.
  2. Testing each of the two generic tests individually (by commenting out the other one) showed that they both succeeded.
  3. Sometimes the mentioned tests succeed altogether, but it's rare.

The issues is related to:
https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/compilation.py#L243

token = sqlparse.sql.Token(sqlparse.tokens.Keyword, ", ".join(c.sql for c in ctes))
parsed.insert_after(with_stmt, token)

Some of the CTEs don't have their sql attribute set:

[
    InjectedCTE(id='model.onb.int_loan_monitor__investment_derived_kpis_actuals', sql=None),
    InjectedCTE(id='model.onb.int_loan_monitor__investment_derived_kpis_scenarios', sql=None),
    ...
]

I'm not familiar with the algorithm how the entities of InjectedCTE are created in the concurrent mode, but it would be good to hear from you if you have any suggestions/workarounds for this error.

Expected/Previous Behavior

We expect all the tests to succeed in multi-threaded environment with DBT version 1.4.1

Steps To Reproduce

I haven't tried yet setting a new project and running tests on a clean setup, but based on our findings, we should focus on:

  1. Set more than 1 thread in your profiles.yml
  2. Create at least 2 generics tests for different fields on a single model (in our case not_null)
  3. Run all generics tests for a single model:
pipenv run dbt --debug test --select model_name

Relevant log output

12:56:39.551807 [debug] [Thread-1 (]: Traceback (most recent call last):
  File ".local/share/virtualenvs/data-platform-s3J5s7em/lib/python3.10/site-packages/dbt/task/base.py", line 367, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File ".local/share/virtualenvs/data-platform-s3J5s7em/lib/python3.10/site-packages/dbt/task/base.py", line 304, in compile_and_execute
    ctx.node = self.compile(manifest)
  File ".local/share/virtualenvs/data-platform-s3J5s7em/lib/python3.10/site-packages/dbt/task/compile.py", line 37, in compile
    return compiler.compile_node(self.node, manifest, {})
  File ".local/share/virtualenvs/data-platform-s3J5s7em/lib/python3.10/site-packages/dbt/compilation.py", line 521, in compile_node
    node, _ = self._recursively_prepend_ctes(node, manifest, extra_context)
  File ".local/share/virtualenvs/data-platform-s3J5s7em/lib/python3.10/site-packages/dbt/compilation.py", line 319, in _recursively_prepend_ctes
    injected_sql = self._inject_ctes_into_sql(
  File ".local/share/virtualenvs/data-platform-s3J5s7em/lib/python3.10/site-packages/dbt/compilation.py", line 242, in _inject_ctes_into_sql
    token = sqlparse.sql.Token(sqlparse.tokens.Keyword, ", ".join(c.sql for c in ctes))
TypeError: sequence item 0: expected str instance, NoneType found

Environment

- OS: macOS 12.6.1 / Ubuntu 22.04.1
- Python: 3.10.8 / 3.10.9
- dbt (working version): 1.3.2
- dbt (regression version): 1.4.1

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@gastlich gastlich added bug Something isn't working regression triage labels Feb 7, 2023
@github-actions github-actions bot changed the title [Regression] Flaky generic tests on multiple threads in dbt 1.4.1 [CT-2057] [Regression] Flaky generic tests on multiple threads in dbt 1.4.1 Feb 7, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 7, 2023

@gastlich Thanks for opening! Are the two generic tests both defined on an ephemeral model? Or a model that depends on an ephemeral model?

The code you linked is related to how ephemeral models are compiled into downstream nodes (as CTEs). We made a few changes to that code in v1.4, which were behind-the-scenes refactoring and cleanup of our node types (#6384, #6427). While those chnages shouldn't have had any functional effect, it's of course possible that this is an unintended consequence.

A clear & reliable reproduction case would help us get to the bottom of this.

I've tried reproducing with:

-- models/my_model.sql
{{ config(materialized = 'ephemeral') }}
select 1 as id
# models/config.yml
version: 2
models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique
          - not_null

But I haven't been able to reproduce the error. I'm able to run both tests with multiple threads successfully.

@jtcohen6 jtcohen6 added this to the v1.4.x milestone Feb 7, 2023
@gastlich
Copy link
Author

gastlich commented Feb 7, 2023

hello @jtcohen6 . Thanks for coming back to me so quickly :)

I've managed to recreate the issue on a clean project:

Setup

# schema.yml
version: 2

models:
  - name: int_eph_first
    columns:
      - name: first_column
        tests:
          - not_null
      - name: second_column
        tests:
          - not_null

  - name: fct_eph_first
    columns:
      - name: first_column
        tests:
          - not_null
      - name: second_column
        tests:
          - not_null
-- int_eph_first.sql
{{ config(materialized='ephemeral') }}

select
    1 as first_column,
    2 as second_column,
-- fct_eph_first.sql
{{ config(materialized='ephemeral') }}

with int_eph_first as(
    select * from {{ ref('int_eph_first') }}
)

select * from int_eph_first

Screenshot 2023-02-07 at 20 39 59

Single-threaded outcome

20:40:24  Running with dbt=1.4.1
20:40:24  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 334 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
20:40:24
20:40:24  Concurrency: 1 threads (target='dev')
20:40:24
20:40:24  1 of 4 START test not_null_fct_eph_first_first_column .......................... [RUN]
20:40:25  1 of 4 PASS not_null_fct_eph_first_first_column ................................ [PASS in 1.54s]
20:40:25  2 of 4 START test not_null_fct_eph_first_second_column ......................... [RUN]
20:40:26  2 of 4 PASS not_null_fct_eph_first_second_column ............................... [PASS in 0.82s]
20:40:26  3 of 4 START test not_null_int_eph_first_first_column .......................... [RUN]
20:40:27  3 of 4 PASS not_null_int_eph_first_first_column ................................ [PASS in 0.78s]
20:40:27  4 of 4 START test not_null_int_eph_first_second_column ......................... [RUN]
20:40:28  4 of 4 PASS not_null_int_eph_first_second_column ............................... [PASS in 1.03s]
20:40:28
20:40:28  Finished running 4 tests in 0 hours 0 minutes and 4.18 seconds (4.18s).
20:40:28
20:40:28  Completed successfully
20:40:28
20:40:28  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

Multi-threaded outcome

20:41:51  Running with dbt=1.4.1
20:41:51  Unable to do partial parsing because profile has changed
20:41:52  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 334 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
20:41:52
20:41:52  Concurrency: 10 threads (target='dev')
20:41:52
20:41:52  1 of 4 START test not_null_fct_eph_first_first_column .......................... [RUN]
20:41:52  2 of 4 START test not_null_fct_eph_first_second_column ......................... [RUN]
20:41:52  3 of 4 START test not_null_int_eph_first_first_column .......................... [RUN]
20:41:52  4 of 4 START test not_null_int_eph_first_second_column ......................... [RUN]
20:41:52  Unhandled error while executing test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e
sequence item 0: expected str instance, NoneType found
20:41:52  2 of 4 ERROR not_null_fct_eph_first_second_column .............................. [ERROR in 0.03s]
20:41:53  3 of 4 PASS not_null_int_eph_first_first_column ................................ [PASS in 1.30s]
20:41:53  1 of 4 PASS not_null_fct_eph_first_first_column ................................ [PASS in 1.32s]
20:41:53  4 of 4 PASS not_null_int_eph_first_second_column ............................... [PASS in 1.60s]
20:41:53
20:41:53  Finished running 4 tests in 0 hours 0 minutes and 1.61 seconds (1.61s).
20:41:53
20:41:53  Completed with 1 error and 0 warnings:
20:41:53
20:41:53  sequence item 0: expected str instance, NoneType found
20:41:53
20:41:53  Done. PASS=3 WARN=0 ERROR=1 SKIP=0 TOTAL=4
20:43:33.491653 [debug] [Thread-1 (]: Began running node test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26
20:43:33.491759 [debug] [Thread-2 (]: Began running node test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e
20:43:33.491907 [debug] [Thread-3 (]: Began running node test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772
20:43:33.492013 [debug] [Thread-4 (]: Began running node test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3
20:43:33.492243 [info ] [Thread-1 (]: 1 of 4 START test not_null_fct_eph_first_first_column .......................... [RUN]
20:43:33.492427 [info ] [Thread-2 (]: 2 of 4 START test not_null_fct_eph_first_second_column ......................... [RUN]
20:43:33.492551 [info ] [Thread-3 (]: 3 of 4 START test not_null_int_eph_first_first_column .......................... [RUN]
20:43:33.492669 [info ] [Thread-4 (]: 4 of 4 START test not_null_int_eph_first_second_column ......................... [RUN]
20:43:33.492995 [debug] [Thread-1 (]: Acquiring new bigquery connection 'test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26'
20:43:33.493316 [debug] [Thread-2 (]: Acquiring new bigquery connection 'test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e'
20:43:33.493584 [debug] [Thread-3 (]: Acquiring new bigquery connection 'test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772'
20:43:33.493831 [debug] [Thread-4 (]: Acquiring new bigquery connection 'test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3'
20:43:33.493952 [debug] [Thread-1 (]: Began compiling node test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26
20:43:33.494067 [debug] [Thread-2 (]: Began compiling node test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e
20:43:33.494175 [debug] [Thread-3 (]: Began compiling node test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772
20:43:33.494293 [debug] [Thread-4 (]: Began compiling node test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3
20:43:33.502155 [debug] [Thread-1 (]: Writing injected SQL for node "test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26"
20:43:33.504563 [debug] [Thread-2 (]: Timing info for test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e (compile): 2023-02-07 20:43:33.502304 => 2023-02-07 20:43:33.504528
20:43:33.509717 [debug] [Thread-3 (]: Writing injected SQL for node "test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772"
20:43:33.512284 [debug] [Thread-4 (]: Writing injected SQL for node "test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3"
20:43:33.512788 [error] [Thread-2 (]: Unhandled error while executing test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e
sequence item 0: expected str instance, NoneType found
20:43:33.516319 [debug] [Thread-1 (]: Writing injected SQL for node "test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26"
20:43:33.516659 [debug] [Thread-3 (]: Timing info for test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772 (compile): 2023-02-07 20:43:33.504692 => 2023-02-07 20:43:33.516634
20:43:33.516912 [debug] [Thread-4 (]: Timing info for test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3 (compile): 2023-02-07 20:43:33.509885 => 2023-02-07 20:43:33.516889
20:43:33.517100 [debug] [Thread-3 (]: Began executing node test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772
20:43:33.517922 [debug] [Thread-1 (]: Writing injected SQL for node "test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26"
20:43:33.518077 [debug] [Thread-4 (]: Began executing node test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3
20:43:33.526652 [debug] [Thread-3 (]: Writing runtime sql for node "test.jaffle_shop.not_null_int_eph_first_first_column.7fd5ab6772"
20:43:33.527066 [debug] [Thread-2 (]: Traceback (most recent call last):
  File ".local/share/virtualenvs/dbt-test-gTAqrA9S/lib/python3.11/site-packages/dbt/task/base.py", line 367, in safe_run
    result = self.compile_and_execute(manifest, ctx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".local/share/virtualenvs/dbt-test-gTAqrA9S/lib/python3.11/site-packages/dbt/task/base.py", line 304, in compile_and_execute
    ctx.node = self.compile(manifest)
               ^^^^^^^^^^^^^^^^^^^^^^
  File ".local/share/virtualenvs/dbt-test-gTAqrA9S/lib/python3.11/site-packages/dbt/task/compile.py", line 37, in compile
    return compiler.compile_node(self.node, manifest, {})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".local/share/virtualenvs/dbt-test-gTAqrA9S/lib/python3.11/site-packages/dbt/compilation.py", line 519, in compile_node
    node, _ = self._recursively_prepend_ctes(node, manifest, extra_context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".local/share/virtualenvs/dbt-test-gTAqrA9S/lib/python3.11/site-packages/dbt/compilation.py", line 317, in _recursively_prepend_ctes
    injected_sql = self._inject_ctes_into_sql(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".local/share/virtualenvs/dbt-test-gTAqrA9S/lib/python3.11/site-packages/dbt/compilation.py", line 241, in _inject_ctes_into_sql
    token = sqlparse.sql.Token(sqlparse.tokens.Keyword, ", ".join(c.sql for c in ctes))
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: sequence item 0: expected str instance, NoneType found

20:43:33.528679 [debug] [Thread-4 (]: Writing runtime sql for node "test.jaffle_shop.not_null_int_eph_first_second_column.f4df3457a3"
20:43:33.528863 [debug] [Thread-1 (]: Timing info for test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26 (compile): 2023-02-07 20:43:33.494398 => 2023-02-07 20:43:33.528834
20:43:33.528994 [error] [Thread-2 (]: 2 of 4 ERROR not_null_fct_eph_first_second_column .............................. [ERROR in 0.04s]
20:43:33.529244 [debug] [Thread-1 (]: Began executing node test.jaffle_shop.not_null_fct_eph_first_first_column.47390b8f26
20:43:33.529416 [debug] [Thread-3 (]: Opening a new connection, currently in state init
20:43:33.530175 [debug] [Thread-2 (]: Finished running node test.jaffle_shop.not_null_fct_eph_first_second_column.19482e284e

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 7, 2023

@gastlich Thanks for the full reproduction case! I was able to reliably produce the error while running locally with 1.4.latest. I also saw the error after removing the tests on fct_eph_first, and defining tests on fct_eph_first only. I don't see the error while running with v1.3.

It sounds like the needed ingredients are:

  • two ephemeral models, one referencing the other
  • two generic tests defined on the second model
  • running dbt test with multiple threads

We'll need to dig in some more to understand what's actually going on behind the scenes. It sounds like a thread safety issue with _inject_ctes_into_sql / _recursively_prepend_ctes.

@fathom-parth
Copy link

@gshank is this scheduled to be in a 1.4 release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants