Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Microbatch: Using backfill results in inefficient delete statements (Snowflake adapter) #364

Closed
2 tasks done
brian-franklin opened this issue Nov 25, 2024 · 8 comments · May be fixed by #151
Closed
2 tasks done
Labels
feature:incremental Issues related to incremental materializations feature:microbatch Issues related to the microbatch incremental strategy triage:awaiting-response Awaiting a response from the reporter type:bug Something isn't working as documented

Comments

@brian-franklin
Copy link

brian-franklin commented Nov 25, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running a model configured with the new microbatch strategy to backfill older dates with some previously loaded data, dbt produces a delete statement to remove records for the periods being backfilled. The delete statement produces a SQL statement for the Snowflake adapter that results in a cartesian product causing the statement to run for much longer than is necessary.

Expected Behavior

Existing records should be deleted efficiently without a cartesian join.

SQL to delete existing records should not be generated with a "using" clause with the temp table. (see steps to reproduce)

Steps To Reproduce

  1. using dbt-snowflake adapter version 1.9.0-b1 and dbt-core version 1.9.0-b4
  2. Create a model using the microbatch incremental strategy (
  3. Run the model to load some data.
    dbt run -s <model>
  4. Run the model again as a backfill where the time range overlaps with data that was prevsiously loaded
    dbt run -s <model> --event-time-start "<start timestamp>" --event-time-end "<end timestamp>"
  5. After creating a temp table for the first batch, a SQL statement to delete existing records is run to remove existing records from the target before inserting the new records in the temp table. This SQL is written such that a cartesian join is created resulting in a long-running query.

Example model config:

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='content_recognition_start_timestamp',
    begin='2024-01-01',
    batch_size='day',
    )
}}

Example dbt commands:

dbt run -s content
dbt run -s content --event-time-start "2024-11-20" --event-time-end "2024-11-25"

dbt generated SQL to delete existing records:

delete from CURATED_CLONE.inscape.content DBT_INTERNAL_TARGET
    using CURATED_CLONE.inscape.content__dbt_tmp
    where (
    DBT_INTERNAL_TARGET.content_recognition_start_timestamp >= TIMESTAMP '2023-11-24 00:00:00+00:00'
    and DBT_INTERNAL_TARGET.content_recognition_start_timestamp < TIMESTAMP '2023-11-25 00:00:00+00:00'    
    );

The using with the temp table is unnecessary and causes a cartesian join.

Relevant log output

============================== 11:18:47.298457 | cdd2a509-9a0f-463d-b3e1-54879743f040 ==============================
�[0m11:18:47.298457 [info ] [MainThread]: Running with dbt=1.9.0-b4
�[0m11:18:47.299457 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'False', 'log_path': 'C:\\repos\\hmi_dbt_lib\\logs', 'profiles_dir': 'C:\\Users\\BFranklin\\.dbt', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'empty': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'log_format': 'default', 'invocation_command': 'dbt run -s content --event-time-start 2023-11-24 --event-time-end 2023-11-25 --target sso', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
�[0m11:18:47.628432 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233BE7DF8F0>]}
�[0m11:18:47.673431 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233BF4FCA40>]}
�[0m11:18:47.673431 [info ] [MainThread]: Registered adapter: snowflake=1.9.0-b1
�[0m11:18:47.883198 [debug] [MainThread]: checksum: 0a22b8766ab724bfaebfe4fc1d2ad384caab58113fa005bc2edb50f306de7ed8, vars: {}, profile: , target: sso, version: 1.9.0b4
�[0m11:18:47.982669 [info ] [MainThread]: Unable to do partial parsing because of a version mismatch
�[0m11:18:47.982669 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233BE7AB7A0>]}
�[0m11:18:51.581484 [warn ] [MainThread]: [�[33mWARNING�[0m]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.hmi_dbt_lib.processed
�[0m11:18:51.591511 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233C0FC7020>]}
�[0m11:18:51.710445 [debug] [MainThread]: Wrote artifact WritableManifest to C:\repos\hmi_dbt_lib\target\manifest.json
�[0m11:18:51.714445 [debug] [MainThread]: Wrote artifact SemanticManifest to C:\repos\hmi_dbt_lib\target\semantic_manifest.json
�[0m11:18:51.745931 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233C0E0E7B0>]}
�[0m11:18:51.746931 [info ] [MainThread]: Found 20 models, 40 data tests, 24 sources, 1351 macros
�[0m11:18:51.747932 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233C1042210>]}
�[0m11:18:51.749940 [info ] [MainThread]: 
�[0m11:18:51.749940 [info ] [MainThread]: Concurrency: 2 threads (target='sso')
�[0m11:18:51.751089 [info ] [MainThread]: 
�[0m11:18:51.752127 [debug] [MainThread]: Acquiring new snowflake connection 'master'
�[0m11:18:51.753175 [debug] [ThreadPool]: Acquiring new snowflake connection 'list_HMI_CURATED_CLONE_PROD_NS1_656'
�[0m11:18:51.821249 [debug] [ThreadPool]: Using snowflake connection "list_HMI_CURATED_CLONE_PROD_NS1_656"
�[0m11:18:51.822331 [debug] [ThreadPool]: On list_HMI_CURATED_CLONE_PROD_NS1_656: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "connection_name": "list_HMI_CURATED_CLONE_PROD_NS1_656"} */
show terse schemas in database HMI_CURATED_CLONE_PROD_NS1_656
    limit 10000
�[0m11:18:51.822331 [debug] [ThreadPool]: Opening a new connection, currently in state init
�[0m11:18:55.494947 [debug] [ThreadPool]: SQL status: SUCCESS 21 in 3.673 seconds
�[0m11:18:55.499987 [debug] [ThreadPool]: Acquiring new snowflake connection 'list_HMI_CURATED_CLONE_PROD_NS1_656_orkin'
�[0m11:18:55.507128 [debug] [ThreadPool]: Using snowflake connection "list_HMI_CURATED_CLONE_PROD_NS1_656_orkin"
�[0m11:18:55.508119 [debug] [ThreadPool]: Acquiring new snowflake connection 'list_HMI_LANDED_CLONE_PROD_NS_725_orkin'
�[0m11:18:55.508119 [debug] [ThreadPool]: On list_HMI_CURATED_CLONE_PROD_NS1_656_orkin: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "connection_name": "list_HMI_CURATED_CLONE_PROD_NS1_656_orkin"} */
show objects in HMI_CURATED_CLONE_PROD_NS1_656.orkin limit 10000;
�[0m11:18:55.511017 [debug] [ThreadPool]: Using snowflake connection "list_HMI_LANDED_CLONE_PROD_NS_725_orkin"
�[0m11:18:55.512039 [debug] [ThreadPool]: Opening a new connection, currently in state init
�[0m11:18:55.512039 [debug] [ThreadPool]: On list_HMI_LANDED_CLONE_PROD_NS_725_orkin: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "connection_name": "list_HMI_LANDED_CLONE_PROD_NS_725_orkin"} */
show objects in HMI_LANDED_CLONE_PROD_NS_725.orkin limit 10000;
�[0m11:18:55.512039 [debug] [ThreadPool]: Opening a new connection, currently in state init
�[0m11:18:59.077076 [debug] [ThreadPool]: SQL status: SUCCESS 3 in 3.565 seconds
�[0m11:18:59.079075 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly list_HMI_LANDED_CLONE_PROD_NS_725_orkin, now list_hmi_dev_sandbox)
�[0m11:18:59.081656 [debug] [ThreadPool]: Using snowflake connection "list_hmi_dev_sandbox"
�[0m11:18:59.082661 [debug] [ThreadPool]: On list_hmi_dev_sandbox: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "connection_name": "list_hmi_dev_sandbox"} */
show objects in hmi_dev.sandbox limit 10000;
�[0m11:18:59.219623 [debug] [ThreadPool]: SQL status: SUCCESS 8 in 0.136 seconds
�[0m11:18:59.220703 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly list_hmi_dev_sandbox, now list_HMI_CURATED_CLONE_PROD_NS1_656_inscape)
�[0m11:18:59.222631 [debug] [ThreadPool]: Using snowflake connection "list_HMI_CURATED_CLONE_PROD_NS1_656_inscape"
�[0m11:18:59.223692 [debug] [ThreadPool]: On list_HMI_CURATED_CLONE_PROD_NS1_656_inscape: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "connection_name": "list_HMI_CURATED_CLONE_PROD_NS1_656_inscape"} */
show objects in HMI_CURATED_CLONE_PROD_NS1_656.inscape limit 10000;
�[0m11:18:59.296905 [debug] [ThreadPool]: SQL status: SUCCESS 3 in 3.785 seconds
�[0m11:18:59.338385 [debug] [ThreadPool]: SQL status: SUCCESS 13 in 0.115 seconds
�[0m11:18:59.342383 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'cdd2a509-9a0f-463d-b3e1-54879743f040', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x00000233C1678CE0>]}
�[0m11:18:59.346079 [debug] [Thread-1 (]: Began running node model.hmi_dbt_lib.content
�[0m11:18:59.347151 [info ] [Thread-1 (]: 1 of 1 START sql microbatch model HMI_CURATED_CLONE_PROD_NS1_656.inscape.content  [RUN]
�[0m11:18:59.347151 [debug] [Thread-1 (]: Acquiring new snowflake connection 'model.hmi_dbt_lib.content'
�[0m11:18:59.348133 [debug] [Thread-1 (]: Began compiling node model.hmi_dbt_lib.content
�[0m11:18:59.354156 [debug] [Thread-1 (]: Writing injected SQL for node "model.hmi_dbt_lib.content"
�[0m11:18:59.355189 [debug] [Thread-1 (]: Began executing node model.hmi_dbt_lib.content
�[0m11:18:59.411111 [info ] [Thread-1 (]: 1 of 1 START batch 2023-11-24 of HMI_CURATED_CLONE_PROD_NS1_656.inscape.content  [RUN]
�[0m11:18:59.416974 [debug] [Thread-1 (]: Writing injected SQL for node "model.hmi_dbt_lib.content"
�[0m11:18:59.451838 [debug] [Thread-1 (]: Using snowflake connection "model.hmi_dbt_lib.content"
�[0m11:18:59.452831 [debug] [Thread-1 (]: On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "node_id": "model.hmi_dbt_lib.content"} */
create or replace temporary table HMI_CURATED_CLONE_PROD_NS1_656.inscape.content__dbt_tmp
         as
        (
with source as (
  select * from (select * from HMI_CURATED_CLONE_PROD_NS1_656.inscape.vw_content where content_recognition_start_timestamp >= '2023-11-24 00:00:00+00:00' and content_recognition_start_timestamp < '2023-11-25 00:00:00+00:00')
  
  limit 1000000
  
)
select
  hmi_uuid
  ,hashed_unique_tv_id
  ,zip_code
  ,dma
  ,content_tms_id as content_id
  ,content_title
  ,scheduled_content_start_time
  ,station_callsign
  ,content_start_media_time
  ,content_recognition_start_timestamp
  ,content_recognition_end_timestamp
  ,network
  ,live
  ,sha2(HMI_CURATED_CLONE_PROD_NS1_656.common.regexp_replace2(ip_address, '(\\b0+)(?!\\b)', ''), 256) as ip_address_hash
  ,input_category
  ,input_device
  ,app_service
  ,landed_timestamp
  ,created_timestamp
  ,modified_timestamp
  ,partition_date
from source
        );
�[0m11:18:59.452831 [debug] [Thread-1 (]: Opening a new connection, currently in state init
�[0m11:19:07.724147 [debug] [Thread-1 (]: SQL status: SUCCESS 1 in 8.272 seconds
�[0m11:19:07.731156 [debug] [Thread-1 (]: Using snowflake connection "model.hmi_dbt_lib.content"
�[0m11:19:07.732163 [debug] [Thread-1 (]: On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "node_id": "model.hmi_dbt_lib.content"} */
describe table HMI_CURATED_CLONE_PROD_NS1_656.inscape.content__dbt_tmp
�[0m11:19:07.812949 [debug] [Thread-1 (]: SQL status: SUCCESS 21 in 0.080 seconds
�[0m11:19:07.817650 [debug] [Thread-1 (]: Using snowflake connection "model.hmi_dbt_lib.content"
�[0m11:19:07.817650 [debug] [Thread-1 (]: On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "node_id": "model.hmi_dbt_lib.content"} */
describe table HMI_CURATED_CLONE_PROD_NS1_656.inscape.content
�[0m11:19:07.884428 [debug] [Thread-1 (]: SQL status: SUCCESS 21 in 0.066 seconds
�[0m11:19:07.895469 [debug] [Thread-1 (]: Using snowflake connection "model.hmi_dbt_lib.content"
�[0m11:19:07.895469 [debug] [Thread-1 (]: On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "node_id": "model.hmi_dbt_lib.content"} */
describe table HMI_CURATED_CLONE_PROD_NS1_656.inscape.content__dbt_tmp
�[0m11:19:07.965229 [debug] [Thread-1 (]: SQL status: SUCCESS 21 in 0.069 seconds
�[0m11:19:07.968238 [debug] [Thread-1 (]: Using snowflake connection "model.hmi_dbt_lib.content"
�[0m11:19:07.969238 [debug] [Thread-1 (]: On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "node_id": "model.hmi_dbt_lib.content"} */
describe table "HMI_CURATED_CLONE_PROD_NS1_656"."INSCAPE"."CONTENT"
�[0m11:19:08.035148 [debug] [Thread-1 (]: SQL status: SUCCESS 21 in 0.066 seconds
�[0m11:19:08.043139 [debug] [Thread-1 (]: 
    In "HMI_CURATED_CLONE_PROD_NS1_656"."INSCAPE"."CONTENT":
        Schema changed: False
        Source columns not in target: []
        Target columns not in source: []
        New column types: []
  
�[0m11:19:08.050146 [debug] [Thread-1 (]: Writing runtime sql for node "model.hmi_dbt_lib.content"
�[0m11:19:08.051146 [debug] [Thread-1 (]: Using snowflake connection "model.hmi_dbt_lib.content"
�[0m11:19:08.052139 [debug] [Thread-1 (]: On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "hmi_dbt_lib", "target_name": "sso", "node_id": "model.hmi_dbt_lib.content"} */
delete from HMI_CURATED_CLONE_PROD_NS1_656.inscape.content DBT_INTERNAL_TARGET
    using HMI_CURATED_CLONE_PROD_NS1_656.inscape.content__dbt_tmp
    where (
    DBT_INTERNAL_TARGET.content_recognition_start_timestamp >= TIMESTAMP '2023-11-24 00:00:00+00:00'
    and DBT_INTERNAL_TARGET.content_recognition_start_timestamp < TIMESTAMP '2023-11-25 00:00:00+00:00'
    
    );
�[0m11:19:40.732795 [debug] [Thread-1 (]: Snowflake adapter: Snowflake query id: 01b89c53-0712-4153-001a-6a831bd4980a
�[0m11:19:40.732795 [debug] [Thread-1 (]: Snowflake adapter: Snowflake error: 000604 (57014): SQL execution canceled
�[0m11:19:40.733801 [error] [Thread-1 (]: 1 of 1 ERROR creating batch 2023-11-24 of HMI_CURATED_CLONE_PROD_NS1_656.inscape.content  [�[31mERROR�[0m in 41.32s]
�[0m11:19:40.734799 [error] [Thread-1 (]: Encountered an error while running operation: Database Error
  000604 (57014): SQL execution canceled

Environment

- OS: Windows 10 Enterprise
- Python: 3.12.7
- dbt: 1.9.0-b4
- dbt-snowflake: 1.9.0-b1

Which database adapter are you using with dbt?

snowflake

Additional Context

dbt cartesian join
@brian-franklin brian-franklin added type:bug Something isn't working as documented triage:product In Product's queue labels Nov 25, 2024
@dbeatty10 dbeatty10 added the feature:incremental Issues related to incremental materializations label Nov 26, 2024
@StEybert
Copy link

FYI: I have the same issue using dbt Cloud CLI (versionless). Here's the code generated by the dbt run ... --eventime... command (screenshot from Snowflake):
image
As you can see, the two tables are never joined.

@dbeatty10
Copy link
Contributor

Thanks for reporting this @brian-franklin !

Based on the generated SQL that you shared, this looks like it is coming from here, so I'm going to transfer this to the dbt-adapters repo.

This looks to me like the same thing reported in #150. If I remember correctly, we'd like to review and merge #110 prior to #151.

@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Nov 26, 2024
@dbeatty10 dbeatty10 added feature:microbatch Issues related to the microbatch incremental strategy and removed triage:product In Product's queue labels Nov 26, 2024
@brian-franklin
Copy link
Author

brian-franklin commented Nov 27, 2024

dbt debug log.txt
Hi @dbeatty10 I have some additional information on this issue. I have noticed that the delete statement constructed is not working as intended. Here is the delete sql generated by dbt for deleting records for the lookback period:

delete from HMI_CURATED.inscape.commercial DBT_INTERNAL_TARGET
    using HMI_CURATED.inscape.commercial__dbt_tmp
    where (
    DBT_INTERNAL_TARGET.landed_timestamp >= TIMESTAMP '2024-11-26 00:00:00+00:00'
    and DBT_INTERNAL_TARGET.landed_timestamp < TIMESTAMP '2024-11-27 00:00:00+00:00'
    
    );

However, despite there being records during this timeframe that were inserted the previous day, the result of this delete is 0 records updated. From the subsequent log:
2024-11-27 10:02:27.490873 (MainThread): 10:02:27 SQL status: SUCCESS 0 in 0.364 seconds.

This is resulting in the records being inserted again duplicating the data.

I noticed that if I remove the "TIMESTAMP" (before the string values) from the delete statement and cast the strings, then the query deletes records as expected. Here is the revised SQL I tested:

delete from HMI_CURATED.inscape.commercial DBT_INTERNAL_TARGET
    using HMI_CURATED.inscape.commercial__dbt_tmp
    where (
    DBT_INTERNAL_TARGET.landed_timestamp >=  '2024-11-26 00:00:00+00:00'
    and DBT_INTERNAL_TARGET.landed_timestamp <  '2024-11-27 00:00:00+00:00'
    
    );

I believe the timezone is affecting how the values are being selected by the delete, but I cannot confirm.

@StEybert
Copy link

@brian-franklin @dbeatty10
Shouldn't the query simply exclude the second row in the query? I would assume that microbatching would

  1. Delete the microbatch partition
  2. Append the new microbatch partition

So that the query above would look like

delete from HMI_CURATED.inscape.commercial DBT_INTERNAL_TARGET
    where (
    DBT_INTERNAL_TARGET.landed_timestamp >=  cast('2024-11-26 00:00:00+00:00'as timestamp_tz)
    and DBT_INTERNAL_TARGET.landed_timestamp <  cast('2024-11-27 00:00:00+00:00' as timestamp_tz)
    );

@brian-franklin
Copy link
Author

@dbeatty10 Yes, however looking at the dbt logs I can see that he code generated by dbt for the delete to handle the backfill is handling the timestamps differently than when it is generating the select for new records to insert. The proposed solution is to remove the TIMESTAMP from the backfill delete so that the 2 statements operate using the same timestamp logic.

See below:

Here is the logic to create the insert. Notice there is no special transformation of the landed_timestamp selection in the where clause in the source CTE.

 2024-11-28 10:01:45.232391 (MainThread): 10:01:45  On model.hmi_dbt_lib.content: /* {"app": "dbt", "dbt_version": "2024.11.25+e9c6e22", "profile_name": "user", "target_name": "prod", "node_id": "model.hmi_dbt_lib.content"} */
create or replace temporary table HMI_CURATED.inscape.content__dbt_tmp
         as
        (
with source as (
  select * from (select * from HMI_CURATED.inscape.vw_content where landed_timestamp >= '2024-11-26 00:00:00+00:00' and landed_timestamp < '2024-11-27 00:00:00+00:00')
  
)
select
  hmi_uuid
  ,hashed_unique_tv_id
  ,zip_code
  ,dma
  ,content_tms_id as content_id
  ,content_title
  ,scheduled_content_start_time
  ,station_callsign
  ,content_start_media_time
  ,content_recognition_start_timestamp
  ,content_recognition_end_timestamp
  ,network
  ,live
  ,sha2(HMI_CURATED.common.regexp_replace2(ip_address, '(\\b0+)(?!\\b)', ''), 256) as ip_address_hash
  ,input_category
  ,input_device
  ,app_service
  ,landed_timestamp
  ,created_timestamp
  ,modified_timestamp
  ,partition_date
from source
        );

Here is the logic generated for the delete to handle the specified backfill configured as 1 day in my case. The use of TIMESTAMP in front of the timestamp values is resulting in a selection of different records due to the timezone. In my table, the landed_timestamp is of type TIMESTAMP_TZ.

delete from HMI_CURATED.inscape.content DBT_INTERNAL_TARGET
    using HMI_CURATED.inscape.content__dbt_tmp
    where (
    DBT_INTERNAL_TARGET.landed_timestamp >= TIMESTAMP '2024-11-26 00:00:00+00:00'
    and DBT_INTERNAL_TARGET.landed_timestamp < TIMESTAMP '2024-11-27 00:00:00+00:00'
    
    );

@dbeatty10
Copy link
Contributor

@brian-franklin thank you for mentioning the issue with using the TIMESTAMP data type prefix to cast the string.

That looks like the issue reported in dbt-labs/dbt-snowflake#1256 and fixed in dbt-labs/dbt-snowflake#1257.

Wanna try with the latest release of dbt v1.9 and see if it resolves that particular piece?

In the meantime, #151 is in code review to solve #150.

@dbeatty10 dbeatty10 added the triage:awaiting-response Awaiting a response from the reporter label Dec 12, 2024
@brian-franklin
Copy link
Author

brian-franklin commented Dec 16, 2024

Hey @dbeatty10. I am happy to report that after installing the latest versions of core and snowflake adapter that this issue is resolved. Thank you for the assist.

@boxysean
Copy link

A team I'm working with also confirmed their load times dropped from 16 minutes to under 1 minute with this change 👍 Thanks team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:incremental Issues related to incremental materializations feature:microbatch Issues related to the microbatch incremental strategy triage:awaiting-response Awaiting a response from the reporter type:bug Something isn't working as documented
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants