Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Number of TABLE_VERSION resources exceeds the account limit #267

Closed
VDFaller opened this issue Apr 28, 2023 · 9 comments
Closed

Number of TABLE_VERSION resources exceeds the account limit #267

VDFaller opened this issue Apr 28, 2023 · 9 comments

Comments

@VDFaller
Copy link

Problem

When I try to run I occasionally get this.
Number of TABLE_VERSION resources exceeds the account limit (Service: AmazonDataCatalog; Status Code: 400; Error Code: ResourceNumberLimitExceededException;

Now I'm told that this is because glue is keeping revisions every time we make a table or view (it doesn't seem to affect seeds). My athena person said I should drop and create instead of what dbt is doing which is
create or replace table or view.

@Jrmyy
Copy link
Contributor

Jrmyy commented May 2, 2023

Hello 👋🏻
Can you provide a basic model which will help us reproduce the issue ? Especially, I would be interested in the materialization you are using and how frequently and how, in terms of process, do you rebuild your model ?

@nicor88
Copy link
Contributor

nicor88 commented May 2, 2023

As far as I know there are only 2 type of materializations that can lead to this issue:

  • iceberg
  • table_hive_ha -> that should have a default table version expiration, means we keep only 4 table version only. As we catch the exception means that there is no guarantee that previous versions are really expired.

If you are using iceberg is pretty normal that every transactions is a creating a new snapshot id, means a new table version.

You can consider to build a custom macro that calls this function: https://github.com/dbt-athena/dbt-athena/blob/main/dbt/adapters/athena/impl.py#LL625C9-L625C35 and call it as a post hook in your models.
something like that: https://github.com/dbt-athena/dbt-athena/blob/84a92e52d7328b64ba4c1197376e951d3851de6d/dbt/include/athena/macros/materializations/models/table/table_hive_ha.sql#LL61C1-L61C1

Looking at here: https://docs.aws.amazon.com/general/latest/gr/glue.html seems that limits can be increased.

@VDFaller
Copy link
Author

VDFaller commented May 2, 2023

Yeah our limit is 1M and we're hitting it. The aws guy here is looking into it. We don't really have much in the way of separate environments so our dev is the thing that's doing the polluting. We run some many times a day. All views or tables. The entire 1M isn't just the DBT project but we're not being good neighbors basically.

I like the expiry macro. I'll try it out.

Edit It is a hive metastore. I don't know why it it's not expiring. Is there anything I can do in order to help debug that?

@nicor88
Copy link
Contributor

nicor88 commented May 2, 2023

Glue catalog is a managed hive metastore, that is also used by iceberg tables (managed) to point to the latest committed snapshots.

Do you use iceberg? or do you use table_hive_ha in some of your models? If so, I would start with the tables produced by those materializations.

@VDFaller
Copy link
Author

VDFaller commented May 2, 2023

Oh sorry, all of our tables are unmanaged and I'm pretty sure we don't use iceberg anywhere (at least that I know of).

Right now I've got this and it SEEMS to be working. But I need to upgrade and I'm running into permissions issues from #262 .

{% macro expire_glue_tables() %}
-- this just expires all glue tables in the project that are not ephemeral more the 3 versions old
{% if execute %}
   {% if selected_resources %}
      {% set resources = selected_resources | selectattr("resource_type", "equalto", "model") %}
   {% else %}
      {% set resources = graph.nodes.values() | selectattr("resource_type", "equalto", "model") %}
   {% endif %}
   {% for model in resources  %}
      {% if model.config.materialized != 'ephemeral' and model.package_name == project_name %}
         {{ log("removing old versions of "+model.schema+"."+model.name, info=True) }}
         {% set result_table_version_expiration = adapter.expire_glue_table_versions(model.schema, model.name, 3, False) %}
      {% endif %}
   {% endfor %} 
{% endif %}
{% endmacro %}

@VDFaller
Copy link
Author

VDFaller commented May 3, 2023

Oh, it's not working automatically because I lack the glue:DeleteTableVersion permission.

@nicor88
Copy link
Contributor

nicor88 commented May 3, 2023

currently expire_glue_table_versions catch the error and it a silent failing in case you lack permissions, we might consider to remove that eventually.

@mattiamatrix
Copy link
Contributor

@VDFaller we recently released this page that lists all the IAM permissions required.

@VDFaller VDFaller closed this as completed May 9, 2023
@VDFaller
Copy link
Author

VDFaller commented May 9, 2023

As this seems to be a permissions issue. i'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants