Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Athena Partition Projection with Dynamic Id partitioning #312

Open
edniescior opened this issue May 17, 2022 · 2 comments
Open

Using Athena Partition Projection with Dynamic Id partitioning #312

edniescior opened this issue May 17, 2022 · 2 comments

Comments

@edniescior
Copy link

I have a table with a large number of partitions in S3. I use partition projection with dynamic id partitioning to query the dataset using Athena, so there are no logical partitions created for the table in the Glue Data Catalog. The first two of the three partition values in the projection are 'injected'. Running a S3F2 deletion job against this table runs successfully, but always returns a count of 0 deletes. I'm assuming that this is because the data catalog is returning an empty list of partitions each time. I want to confirm whether partition projection is supported or not.

@matteofigus
Copy link
Member

matteofigus commented May 18, 2022

Hi, thanks for opening an issue. Partition Projection is a relatively new feature for Athena, and I don't think we've done any extensive testing on that. What I know, is that partitions are fetched during query generation, and if none are returned a query should be made to fetch all results and Athena should hopefully sort out the mechanism to project them as it would do with any other query similarly for how it would do it with an unpartitioned query - so it should work.

I can do a quick test on my side, in the meanwhile, can you provide some details on your side:

  1. Which version are you using of the S3 Find & Forget solution?
  2. Is this dataset parquet?
  3. You mentioned three partitions are injected - can you provide a sample of that? If you could share a sample of your glue table metadata (so, how you configured the partitions) and their values, I can try to see if I can reproduce on my side with the same schema
  4. When you inspect your job, you should see in the UI/API the number of queries executed. Can you confirm you see a query for that particular dataset? To dig into it, you can check the Job Events using UI/API and search for a QuerySucceeded event
  5. You mention that you see 0 deletions - have you checked with a manual query whether that is a false negative? If you run a query with athena for instance, such as select * from table where customer_id in ('value', 'value1') do you actually see any rows? I guess what I'm asking here is, have you checked your table's dynamic partitioning configuration is correctly setup?

@edniescior
Copy link
Author

Thanks for the prompt response.

  1. (version:v0.47)
  2. Yes. Parquet.
  3. Glue metadata [attached.]
  4. I need to check that.
  5. Yes. I confirmed with a manual query in Athena before and after running the delete jobs. The same rows were returned.
    table_schema.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants