[Iceberg] Add manifest file caching for HMS-based deployments #24481
Sorry, I may not have the full context on this, but is it possible to add some tests too?
@@ -67,7 +67,7 @@ public class IcebergConfig

     private EnumSet<ColumnStatisticType> hiveStatisticsMergeFlags = EnumSet.noneOf(ColumnStatisticType.class);
     private String fileIOImpl = HadoopFileIO.class.getName();
-    private boolean manifestCachingEnabled;
+    private boolean manifestCachingEnabled = true;
Is this intended?
Yes, this is intentional. Performance is significantly worse with it disabled, and I don't think there are any known downsides to enabling it by default other than an increased memory footprint.
    public ManifestFileCache createManifestFileCache(IcebergConfig config, MBeanExporter exporter)
    {
        Cache<ManifestFileCacheKey, ManifestFileCachedContent> delegate = CacheBuilder.newBuilder()
                .maximumWeight(config.getManifestCachingEnabled() ? config.getMaxManifestCacheSize() : 0)
If caching is disabled, I think we should not create any cache at all, rather than disabling it through a maximum weight of 0.
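One way to read this suggestion, sketched with plain `java.util` collections rather than the Guava `CacheBuilder` used in the PR: choose a pass-through implementation when caching is disabled, instead of building a cache whose maximum weight is 0. `ContentCache`, `NoOpCache`, `LruByteCache`, and `create` are hypothetical names for illustration only; they are not part of this PR, Presto, or Iceberg.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ManifestCacheSketch {
    /** Hypothetical cache abstraction (assumption; not the PR's ManifestFileCache). */
    public interface ContentCache {
        byte[] get(String path, Function<String, byte[]> loader);
        int entryCount();
    }

    /** Used when caching is disabled: always calls the loader and stores nothing. */
    static final class NoOpCache implements ContentCache {
        @Override
        public byte[] get(String path, Function<String, byte[]> loader) {
            return loader.apply(path);
        }

        @Override
        public int entryCount() {
            return 0;
        }
    }

    /** Least-recently-used cache bounded by the total byte size of its values. */
    static final class LruByteCache implements ContentCache {
        private final long maxBytes;
        private long currentBytes;
        // accessOrder=true keeps the least recently used entry first in iteration order
        private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

        LruByteCache(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        @Override
        public synchronized byte[] get(String path, Function<String, byte[]> loader) {
            byte[] cached = entries.get(path);
            if (cached != null) {
                return cached;
            }
            byte[] loaded = loader.apply(path);
            if (loaded.length > maxBytes) {
                return loaded; // larger than the whole budget, so do not cache it
            }
            entries.put(path, loaded);
            currentBytes += loaded.length;
            // Evict from the least recently used end until the budget is respected
            Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
            while (currentBytes > maxBytes && it.hasNext()) {
                currentBytes -= it.next().getValue().length;
                it.remove();
            }
            return loaded;
        }

        @Override
        public synchronized int entryCount() {
            return entries.size();
        }
    }

    /** The enabled flag selects the implementation; no zero-sized cache is ever built. */
    public static ContentCache create(boolean enabled, long maxBytes) {
        return enabled ? new LruByteCache(maxBytes) : new NoOpCache();
    }
}
```

With this shape, callers always go through `ContentCache.get` and the disabled path never allocates or touches cache state. The trade-off is a second implementation to maintain, whereas the zero-weight approach keeps a single code path.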
Description
Adds manifest file caching to the Iceberg connector for HMS-based deployments.
Motivation and Context
To optimize and plan Iceberg queries, we call the planFiles() API multiple times throughout the query optimization lifecycle. Each call requires reading and parsing metadata files, which usually live on an external filesystem such as S3. A large table can have hundreds of these files, typically ranging from a few kilobytes to a few megabytes each. When they are not cached in memory within Presto, this can significantly degrade end-to-end query latency.
Impact
TBD
Test Plan
TBD
Contributor checklist
Release Notes