[Bug] truncate() partition transformation does not work when it includes more than 100 partitions #429

Open
alex-antonison opened this issue Mar 11, 2024 · 1 comment
Labels: pkg:dbt-athena (Issue affects dbt-athena), type:bug (Something isn't working as documented)


Is this a new bug in dbt-athena?

  • I believe this is a new bug in dbt-athena
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

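When a truncate() partition transformation on a column results in more than 100 partitions, the batch partitioning functionality kicks in so that the model can still be written. This is needed because Athena limits a single CTAS or INSERT INTO query to 100 partitions. For example, with this model config: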

```sql
{{
    config(
        materialized = 'table',
        table_type = 'iceberg',
        force_batch = true,
        partitioned_by = ['truncate(string_partition,2)', 'month(date_partition)']
    )
}}
```

However, the query the adapter sends to Athena to fetch the distinct partition values passes truncate() through unchanged, and Athena cannot use truncate() to extract part of a string:

```sql
select distinct truncate(string_partition,2), date_trunc('month', date_partition)
from "awsdatacatalog"."data_lake"."table__ha__tmp_not_partitioned"
order by truncate(string_partition,2), date_trunc('month', date_partition)
```

Instead, it could use something like substring() to return the distinct partial values:

```sql
select distinct substring(string_partition,1,2), date_trunc('month', date_partition)
from "awsdatacatalog"."data_lake"."table__ha__tmp_not_partitioned"
order by substring(string_partition,1,2), date_trunc('month', date_partition)
```
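A minimal sketch of how the partition keys could be translated before building that query. The helper names `to_athena_expression` and `build_distinct_partitions_query` are hypothetical and are not dbt-athena's actual code. The sketch assumes the truncated column is a string: the Iceberg spec truncates strings to the first W code points, which a 1-based substring() reproduces. For integer columns the spec defines truncate arithmetically (rounding down to a multiple of W), so substring() would be the wrong translation there. Transforms such as bucket() are not handled.

```python
import re

# Hypothetical helpers (NOT dbt-athena's actual code). They translate entries of
# `partitioned_by` into expressions Athena can evaluate in the
# `select distinct ...` query used to discover partition batches.
_DATE_TRANSFORM = re.compile(r"^(hour|day|month|year)\((.+)\)$", re.IGNORECASE)
_TRUNCATE = re.compile(r"^truncate\(\s*([^,]+?)\s*,\s*(\d+)\s*\)$", re.IGNORECASE)


def to_athena_expression(partition_key: str) -> str:
    key = partition_key.strip()

    date_match = _DATE_TRANSFORM.match(key)
    if date_match:
        # month(date_partition) -> date_trunc('month', date_partition)
        unit, column = date_match.group(1).lower(), date_match.group(2).strip()
        return f"date_trunc('{unit}', {column})"

    truncate_match = _TRUNCATE.match(key)
    if truncate_match:
        # ASSUMPTION: the column is a string. Iceberg truncates strings to the
        # first `width` code points, which substring(col, 1, width) matches.
        column, width = truncate_match.group(1), int(truncate_match.group(2))
        return f"substring({column}, 1, {width})"

    # Anything else is passed through unchanged as an identity partition column;
    # other transforms such as bucket() are out of scope for this sketch.
    return key


def build_distinct_partitions_query(relation: str, partitioned_by: list[str]) -> str:
    # Use the same translated expressions in both the select and order by lists.
    exprs = ", ".join(to_athena_expression(key) for key in partitioned_by)
    return f"select distinct {exprs}\nfrom {relation}\norder by {exprs}"


if __name__ == "__main__":
    print(
        build_distinct_partitions_query(
            '"awsdatacatalog"."data_lake"."table__ha__tmp_not_partitioned"',
            ["truncate(string_partition,2)", "month(date_partition)"],
        )
    )
```

Run against the config from this issue, it prints a query equivalent to the substring() version above, differing only in whitespace.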

Expected Behavior

A model that applies an Iceberg truncate() partition transformation to a column can be built even when it produces more than 100 partitions.

Steps To Reproduce

Create a model that applies a truncate() partition transformation to a column whose values result in more than 100 partitions.
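A sketch for generating input data that crosses the 100-partition threshold with the config above. The row layout, the `-value` suffix, and loading the CSV as a dbt seed (with `date_partition` cast to a date in the model) are assumptions for illustration. Eleven distinct two-character prefixes across twelve months give 11 × 12 = 132 partitions.

```python
import csv
import datetime
import io

# Hypothetical reproduction data; column names follow the config in this issue.
# Prefixes "00".."10" (11 values) x 12 months = 132 partitions, i.e. more than 100.


def make_rows(year: int = 2024) -> list[dict[str, str]]:
    rows = []
    for i in range(11):
        for month in range(1, 13):
            rows.append(
                {
                    "string_partition": f"{i:02d}-value",
                    "date_partition": datetime.date(year, month, 15).isoformat(),
                }
            )
    return rows


def count_partitions(rows: list[dict[str, str]], width: int = 2) -> int:
    # Mirrors truncate(string_partition, width) and month(date_partition):
    # keep the first `width` characters and the "YYYY-MM" part of the date.
    return len({(r["string_partition"][:width], r["date_partition"][:7]) for r in rows})


def to_csv(rows: list[dict[str, str]]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["string_partition", "date_partition"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = make_rows()
    print(count_partitions(rows))  # 132
    # Save this output as e.g. seeds/partition_repro.csv (hypothetical path).
    print(to_csv(rows), end="")
```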

Environment

- OS: macOS
- Python: 3.11
- dbt: 1.7.7
- dbt-athena-community: 1.7.1

Additional Context

This came out of a Slack conversation: https://getdbt.slack.com/archives/C013MLFR7BQ/p1709755667814619

This method was identified as the place where the change would need to be made: https://github.com/dbt-athena/dbt-athena/blob/289be4f4f44f3d5a6cf575d8fe218209c4a41171/dbt/adapters/athena/impl.py#L1279

Apache Iceberg Truncate Partition documentation: https://iceberg.apache.org/spec/#truncate-transform-details

alex-antonison added the type:bug label on Mar 11, 2024

nicor88 (Contributor) commented Mar 11, 2024

@svdimchenko FYI

mikealfare added the pkg:dbt-athena label on Jan 10, 2025
mikealfare transferred this issue from dbt-labs/dbt-athena on Jan 13, 2025