Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve latency for HBO optimizer #21297

Merged
merged 1 commit into from
Nov 10, 2023

Conversation

feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Nov 2, 2023

Description

Part of #20355

This PR optimizes the latency for HBO optimizer from the following aspects:

  • Check timeout for plan canonicalization, plan hash and return empty when timeout
    Previously it only checks timeout for meta data access, and do not return immediately when timeout. However, we found that plan canonicalization and hash can also be time consuming. And we need to check for these two, and also return immediately when timeout.
  • Add a flag to specify whether to fetch HBO related data from cache only. It's false when register the query, and true when getting statistics
    In the design of HBO, all HBO related stats are fetched and cached during query registration in HistoricalStatisticsEquivalentPlanMarkingOptimizer optimizer, and following fetch of stats will access from cache. This PR adds a new parameter cacheOnly to corresponding methods to enforce this behavior.
  • Add verbose run time stats for plan hash, plan canonicalization and meta data read
    This is only enabled when verbose runtime flag is set to true, and will help to debug HBO timeout.

Motivation and Context

This is to reduce the latency of HBO optimizer.

Impact

Tested with a few production queries, see significant drop of latency for HBO optimizer.

Test Plan

Add unit tests. Also tested locally end to end to verify that HBO optimizer latency decreased. In tests, queries which takes minutes to have HBO optimizer timeout now timeouts within specified timeout limit, i.e. 20seconds.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve latency of history based optimizer

Sorry, something went wrong.

@feilong-liu feilong-liu requested review from shrinidhijoshi and a team as code owners November 2, 2023 04:56
@feilong-liu feilong-liu marked this pull request as draft November 2, 2023 04:56
@feilong-liu feilong-liu force-pushed the debug_hbo_timeout_log branch from 947f845 to ed5640c Compare November 2, 2023 18:46
if (enableVerboseRuntimeStats) {
profileStartTime = System.nanoTime();
}
planNodesWithHash.addAll(getPlanNodeHashes(plan, session, false).values());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache only is set to false, as this is during registration of plan, hence we are supposed to have nothing in cache and populate the cache for future use.

Optional<String> hash = planCanonicalInfoProvider.hash(session, statsEquivalentPlanNode, strategy);
allHashesBuilder.put(strategy, new PlanNodeWithHash(statsEquivalentPlanNode, hash));
Optional<String> hash = planCanonicalInfoProvider.hash(session, statsEquivalentPlanNode, strategy, cacheOnly);
if (hash.isPresent()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only return hash when it's present, the returned hash can be empty if the "planCanonicalInfoProvider.hash" timeout.

@@ -147,7 +165,7 @@ private PlanNodeStatsEstimate getStatistics(PlanNode planNode, Session session,
}

PlanNode plan = resolveGroupReferences(planNode, lookup);
Map<PlanCanonicalizationStrategy, PlanNodeWithHash> allHashes = getPlanNodeHashes(plan, session);
Map<PlanCanonicalizationStrategy, PlanNodeWithHash> allHashes = getPlanNodeHashes(plan, session, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is within "getStatistics" function which is called when reading estimates from HBO, and we set cacheOnly to true here, as we are supposed to already have cache here, and we do not want to calculate on the fly if cache miss during cost based planning.

@@ -163,7 +181,7 @@ private PlanNodeStatsEstimate getStatistics(PlanNode planNode, Session session,
for (PlanCanonicalizationStrategy strategy : historyBasedPlanCanonicalizationStrategyList()) {
for (Map.Entry<PlanNodeWithHash, HistoricalPlanStatistics> entry : statistics.entrySet()) {
if (allHashes.containsKey(strategy) && entry.getKey().getHash().isPresent() && allHashes.get(strategy).equals(entry.getKey())) {
Optional<List<PlanStatistics>> inputTableStatistics = getPlanNodeInputTableStatistics(plan, session);
Optional<List<PlanStatistics>> inputTableStatistics = getPlanNodeInputTableStatistics(plan, session, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly set to cacheOnly to true as it's within getStatistics function.

Map<CacheKey, PlanNodeCanonicalInfo> cache = historyBasedStatisticsCacheManager.getCanonicalInfoCache(session.getQueryId());
PlanNodeCanonicalInfo result = cache.get(key);
if (result != null) {
return Optional.of(result);
if (result != null || cacheOnly) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cacheOnly set to true, return whatever got from cache. This field will be false during registration call in HBO optimizer, so that all data is calculated and cached during registration, and later usage will only read form cache.

profileTime("CanonicalPlanGenerator", profileStartTime, session);
}
if (loadValueTimeout(startTimeInNano, timeoutInMilliseconds)) {
return Optional.empty();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return early if timeout

Optional<String> hash = planCanonicalInfoProvider.hash(session, statsEquivalentPlanNode, strategy);
Optional<List<PlanStatistics>> inputTableStatistics = planCanonicalInfoProvider.getInputTableStatistics(session, statsEquivalentPlanNode);
Optional<String> hash = planCanonicalInfoProvider.hash(session, statsEquivalentPlanNode, strategy, true);
Optional<List<PlanStatistics>> inputTableStatistics = planCanonicalInfoProvider.getInputTableStatistics(session, statsEquivalentPlanNode, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called at the end to collect the HBO information, hence cacheOnly set to true.

@feilong-liu feilong-liu marked this pull request as ready for review November 2, 2023 19:20
Copy link
Contributor

@mlyublena mlyublena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finds!

@tdcmeehan
Copy link
Contributor

tdcmeehan commented Nov 3, 2023

Please improve the release note message (see guidelines). Suggestion:

Improve latency of history based optimizer

@feilong-liu feilong-liu changed the title Fix latency for HBO optimizer Improve latency for HBO optimizer Nov 3, 2023
Copy link
Contributor

@ajaygeorge ajaygeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % a nit

@feilong-liu feilong-liu force-pushed the debug_hbo_timeout_log branch from ed5640c to 797b59e Compare November 9, 2023 23:53
1. Check timeout for plan canonicalization, plan hash and return empty when timeout
2. Add a flag to specify whether to fetch HBO related data from cache only. It's false when register the query, and
true when getting statistics
3. Add verbose run time stats for plan hash, plan canonicalization and meta data read
@feilong-liu feilong-liu force-pushed the debug_hbo_timeout_log branch from 797b59e to 6e30e08 Compare November 10, 2023 00:24
@feilong-liu feilong-liu merged commit 8815076 into prestodb:master Nov 10, 2023
@feilong-liu feilong-liu deleted the debug_hbo_timeout_log branch November 10, 2023 06:10
@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants