
Make partitions loading for iceberg table lazy to avoid unnecessary loading #23645

Open · wants to merge 2 commits into base: master
Conversation

hantangwangd
Member

@hantangwangd hantangwangd commented Sep 13, 2024

Description

Currently, when querying Iceberg tables, we always eagerly load all of the table's partition values in PickTableLayout or IcebergFilterPushdown during the optimization phase. Because these eagerly loaded partition values are only used by metadata-based optimization rules, in many cases the information is never used at all. This wastes resources and hurts performance in the following cases:

  • For queries that cannot be optimized based on metadata, these partition values are never used, so we should not load them eagerly.
  • For tables with a huge number of partitions, which are not suitable for metadata optimization, we should limit the maximum number of partitions that can be loaded during the loading phase, rather than loading all of them first and only then determining whether they exceed the threshold.

This PR makes partition loading lazy and supports setting a threshold for the maximum number of partitions that can be loaded during the loading phase. This avoids a lot of unnecessary loading in many scenarios, along with the resulting resource consumption and performance loss.
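The core idea can be sketched as follows. This is a minimal, hypothetical illustration of lazy, memoized partition loading with a load-time cap; the class and method names are invented for this sketch and are not the PR's actual classes, and the PR enforces the cap during the file scan itself rather than after loading completes.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: names are illustrative, not the PR's actual classes.
// Partitions are only materialized on first access, and the result is memoized,
// so queries that never consult partition values pay no loading cost.
class LazilyLoadedPartitions
{
    private final Supplier<Optional<List<String>>> loader;
    private Optional<List<String>> partitions;
    private boolean loaded;

    LazilyLoadedPartitions(Supplier<List<String>> rawLoader, int maxPartitions)
    {
        this.loader = () -> {
            List<String> values = rawLoader.get();
            // Enforce the threshold at load time; an empty Optional stands in for
            // "too many partitions, treat the table as unpartitioned"
            return values.size() > maxPartitions ? Optional.empty() : Optional.of(values);
        };
    }

    synchronized Optional<List<String>> get()
    {
        if (!loaded) {
            partitions = loader.get();
            loaded = true;
        }
        return partitions;
    }
}
```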

The benchmark results also support this conclusion. We executed regular queries, queries applicable for metadata optimization, and queries applicable for further reducible optimization on tables with two different partition counts. A record count of 300 * 3 produces 900 partitions (below the default threshold of 1000), while 400 * 4 produces 1600 partitions (exceeding the default threshold of 1000). The benchmark code can be viewed here: https://github.com/hantangwangd/presto/blob/benchmark_for_lazy_load/presto-iceberg/src/test/java/com/facebook/presto/iceberg/BenchmarkIcebergQuery.java

The benchmark test result before this change is as follows:


----Before this change, always load partitions eagerly----

Benchmark                                  (recordCount)  Mode  Cnt     Score     Error  Units
BenchmarkIcebergQuery.testFurtherOptimize        300 * 3  avgt   10   154.888 ±   2.762  ms/op
BenchmarkIcebergQuery.testFurtherOptimize        400 * 4  avgt   10   201.214 ±   5.199  ms/op
BenchmarkIcebergQuery.testNormalQuery            300 * 3  avgt   10   673.455 ±  26.239  ms/op
BenchmarkIcebergQuery.testNormalQuery            400 * 4  avgt   10   907.111 ± 103.223  ms/op
BenchmarkIcebergQuery.testOptimize               300 * 3  avgt   10   252.138 ±   6.873  ms/op
BenchmarkIcebergQuery.testOptimize               400 * 4  avgt   10  1025.587 ±  78.809  ms/op

The benchmark test result after this change is as follows:


----After this change, lazy load partitions and check the max threshold in loading phase----

Benchmark                                  (recordCount)  Mode  Cnt    Score    Error  Units
BenchmarkIcebergQuery.testFurtherOptimize        300 * 3  avgt   10  164.981 ±  3.552  ms/op
BenchmarkIcebergQuery.testFurtherOptimize        400 * 4  avgt   10  211.434 ±  4.327  ms/op
BenchmarkIcebergQuery.testNormalQuery            300 * 3  avgt   10  437.028 ± 49.367  ms/op
BenchmarkIcebergQuery.testNormalQuery            400 * 4  avgt   10  634.008 ± 54.919  ms/op
BenchmarkIcebergQuery.testOptimize               300 * 3  avgt   10  257.394 ±  7.223  ms/op
BenchmarkIcebergQuery.testOptimize               400 * 4  avgt   10  808.018 ± 36.952  ms/op

As the results show, this change significantly improves the performance of queries that are not suitable for metadata optimization, while for statements that can be optimized based on metadata, performance also improves once the number of partitions exceeds the threshold. This is in line with expectations, and we can anticipate that the performance improvement will grow as the number of partitions increases.

Motivation and Context

Make partition loading for Iceberg tables lazy to avoid unnecessary loading.

Impact

N/A

Test Plan

  • Make sure the change does not affect existing tests
  • Newly added test cases in TestIcebergLogicalPlanner show the behavior under different max partition thresholds
  • Local benchmark tests confirm the performance improvement

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==


Iceberg Connector Changes
* Improve partitions loading by making them lazy for Iceberg tables to avoid unnecessary loading.

@hantangwangd hantangwangd force-pushed the make_partition_loading_lazy branch from 9028384 to e390db7 on September 14, 2024 12:11
@hantangwangd hantangwangd marked this pull request as ready for review September 15, 2024 03:31
@hantangwangd hantangwangd requested review from ZacBlanco and a team as code owners September 15, 2024 03:31
@hantangwangd hantangwangd requested review from presto-oss, imjalpreet, aaneja and tdcmeehan and removed request for presto-oss September 15, 2024 03:31
@hantangwangd hantangwangd changed the title [WIP]Make partitions loading for iceberg table lazy to avoid unnecessary loading Make partitions loading for iceberg table lazy to avoid unnecessary loading Sep 15, 2024
Contributor

@ZacBlanco ZacBlanco left a comment


I understand the motivation for this, but I am a little concerned about the other parts that implement the limiting of partition loading. It feels a little convoluted, especially the need to check the instance type of the iterable in order to set the maximum number of partitions to iterate. I feel like it could be simplified by creating an iterable implementation that accepts the limit in the first place, rather than updating it after the fact. I may be missing something in my understanding if that isn't a feasible solution.

The current implementation of partition loading in IcebergPartitionLoader looks good though.


import static java.util.Objects.requireNonNull;

public class LazyLoadedPartitions
Contributor


I feel like this class itself should just implement Iterable. Is there a reason we need both LazyLoadedPartitions and LazyPartitionsIterable?

Member Author


Good suggestion; after rechecking the code, I think it's feasible and reasonable. Fixed!

*/
package com.facebook.presto.spi;

public interface LazyIterable<T>
Contributor


I feel like the interface makes the implementation more complex. Is there a reason we need the max iteration count to be mutable? Having a mutable iteration count feels like a bit of an anti-pattern. What if we instead use something like limit from the Stream API to prevent loading too many items? e.g.

Stream.of(new Iterable<>(...){}).limit(<N>)...
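The snippet above is shorthand rather than compilable Java; an Iterable is usually bridged to a Stream with StreamSupport. A minimal sketch of the limiting idea the comment suggests (the class and helper name here are invented for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Illustrative helper: truncate any Iterable to at most n elements via the
// Stream API, with no mutable max-count field on the iterable itself
class LimitIterableSketch
{
    static <T> List<T> firstN(Iterable<T> iterable, long n)
    {
        return StreamSupport.stream(iterable.spliterator(), false)
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

As the author explains later in the thread, this approach turned out not to fit here, because encountering a delete file must abort the whole scan rather than merely truncate the results.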

Contributor


Another option could be adding a new field inside the DiscretePredicates class with the maximum number of predicates to iterate over. Then we don't have to rely on the casting + lazy iterable.

Another alternative could be putting a limit on the iterator we pass to the DiscretePredicates constructor, rather than having to modify the DiscretePredicates class at all.

Member Author


Thanks for the review and feedback. My full thinking is described below; please let me know if anything seems unreasonable.

There are several points to consider when implementing a max threshold for lazy partition loading:

  1. If any delete files are found during the file scanning process, the scanning and loading of partition values cannot continue; instead, a single value representing an unpartitioned state should be returned. So we cannot just scan a subset of the files to obtain a given number of valid partition values, and it seems we cannot set a max threshold and terminate the scan early through methods such as Stream.of(new Iterable<>(...){}).limit(<N>)....

  2. The max threshold can be passed into IcebergPartitionLoader and used there during loading to terminate the entire scanning process. Even if no delete files are encountered, once the number of loaded partitions exceeds the max threshold we can terminate directly and return a value representing an unpartitioned state. So we need a way to pass the max threshold into IcebergPartitionLoader.

  3. The table layout, that is, the lazy partitions and the DiscretePredicates information, is built earlier than the actual loading, while the value of the max threshold can only be determined when loading is actually required. For example, in IcebergMetadataOptimizer, if further reducible optimization is available we do not limit the number of partitions loaded; otherwise, the maximum is set to 1000 by default. So we need to support this delayed setting of the maximum threshold on the partition loader.

  4. LazyLoadedPartitions and PartitionLoader are both defined in presto-hive-common, while DiscretePredicates is defined in presto-spi, which does not depend on presto-hive-common. As mentioned above, we need a way to set the max threshold on the PartitionLoader through DiscretePredicates before the actual loading. So we define an interface in presto-spi and let LazyLoadedPartitions in presto-hive-common implement it; this way, the max threshold can ultimately be passed into the PartitionLoader through LazyLoadedPartitions.
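Points 2–4 can be sketched roughly as below. The interface name follows the discussion (LazyIterable), but the signature and the toy list-backed implementation are illustrative guesses; in the PR, the threshold feeds into IcebergPartitionLoader, and exceeding it yields a single unpartitioned value rather than a truncated list.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of the cross-module pattern: the interface lives in presto-spi so
// code that only sees DiscretePredicates can set a threshold without
// depending on presto-hive-common, where the implementation lives.
interface LazyIterable<T> extends Iterable<T>
{
    // Called later by the optimizer, once it knows whether a limit applies
    void setMaxIterableCount(int maxIterableCount);
}

// Toy implementation backed by a list; the real LazyLoadedPartitions defers
// to a partition loader instead
class LazyPartitionsSketch implements LazyIterable<String>
{
    private final List<String> source;
    private int max = Integer.MAX_VALUE;

    LazyPartitionsSketch(List<String> source)
    {
        this.source = source;
    }

    @Override
    public void setMaxIterableCount(int maxIterableCount)
    {
        this.max = maxIterableCount;
    }

    @Override
    public Iterator<String> iterator()
    {
        // The toy simply truncates; the PR instead falls back to a single
        // value representing an unpartitioned table when the cap is exceeded
        return source.subList(0, Math.min(max, source.size())).iterator();
    }
}
```

A consumer holding only the presto-spi type can then check `instanceof LazyIterable` before calling `setMaxIterableCount(1000)`, which is the instance-type check discussed in the review above.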

@hantangwangd hantangwangd force-pushed the make_partition_loading_lazy branch from e390db7 to 827feb5 on October 1, 2024 02:38
@hantangwangd hantangwangd force-pushed the make_partition_loading_lazy branch from 827feb5 to 819c643 on November 3, 2024 12:42
@steveburnett
Contributor

Thanks for the release note entry! Minor suggestion to help follow the Order of changes in the Release Note Guidelines.

== RELEASE NOTES ==

Iceberg Connector Changes
* Improve partitions loading by making them lazy for Iceberg tables to avoid unnecessary loading. :pr:`23645`

@tdcmeehan tdcmeehan self-assigned this Nov 4, 2024
@hantangwangd
Member Author

@steveburnett Thanks for your suggestion, fixed! Please take a look when convenient.

@steveburnett
Contributor

> @steveburnett Thanks for your suggestion, fixed! Please take a look when convenient.

@hantangwangd LGTM, thanks!

@hantangwangd hantangwangd force-pushed the make_partition_loading_lazy branch from 819c643 to ce6b43f on February 15, 2025 00:40
@steveburnett
Contributor

Hi, new release note guidelines. Please remove the manual PR link in the following format from the release note entries for this PR.


:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@hantangwangd
Member Author

> Hi, new release note guidelines. Please remove the manual PR link in the following format from the release note entries for this PR.

Oh sorry forgot that. Thanks for your reminder @steveburnett, fixed!
