[MINOR] Fix usages of orElse #10435

Merged: 7 commits, Jan 10, 2024

@the-other-tim-brown the-other-tim-brown commented Jan 1, 2024

Change Logs

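Several places in the codebase call orElse where orElseGet is the better choice. The argument to orElse is always evaluated, so the fallback computation runs, or the fallback object is created, even when the Option already holds a value. orElseGet takes a supplier that is invoked only when the Option is empty.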
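
The difference between the two calls can be sketched with java.util.Optional, which is assumed here to behave like Hudi's Option for this purpose. The class name, the counter, and the expensiveDefault method are hypothetical and exist only to make the evaluation visible:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class OrElseDemo {
    // Counts how many times the fallback was actually computed.
    static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical stand-in for an expensive computation or object creation.
    static String expensiveDefault() {
        CALLS.incrementAndGet();
        return "fallback";
    }

    // The argument is evaluated before orElse is called, whether or not a value is present.
    static String eager(Optional<String> value) {
        return value.orElse(expensiveDefault());
    }

    // The supplier is invoked only when the Optional is empty.
    static String lazy(Optional<String> value) {
        return value.orElseGet(OrElseDemo::expensiveDefault);
    }

    public static void main(String[] args) {
        Optional<String> present = Optional.of("value");

        System.out.println(eager(present) + " calls=" + CALLS.get()); // value calls=1
        System.out.println(lazy(present) + " calls=" + CALLS.get());  // value calls=1 (unchanged)
    }
}
```

Both calls return the present value; only the orElse variant pays for the fallback.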

Impact

Reduces the number of objects created and the amount of computation performed.

Risk level (write none, low, medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@the-other-tim-brown the-other-tim-brown changed the title fix usage of orElse to use orElseGet throughout the codebase when nec… [MINOR] Fix usage of orElse Jan 1, 2024
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review January 1, 2024 17:15
@the-other-tim-brown the-other-tim-brown changed the title [MINOR] Fix usage of orElse [MINOR] Fix usages of orElse Jan 1, 2024
@@ -1016,7 +1016,7 @@ private List<String> getInstantsToRollbackForLazyCleanPolicy(HoodieTableMetaClie
   @Deprecated
   public boolean rollback(final String commitInstantTime, Option<HoodiePendingRollbackInfo> pendingRollbackInfo, boolean skipLocking) throws HoodieRollbackException {
     final String rollbackInstantTime = pendingRollbackInfo.map(entry -> entry.getRollbackInstant().getTimestamp())
-        .orElse(createNewInstantTime(!skipLocking));
+        .orElseGet(() -> createNewInstantTime(!skipLocking));
     return rollback(commitInstantTime, pendingRollbackInfo, rollbackInstantTime, skipLocking);
Contributor

Okay, so #orElseGet is always preferable to #orElse.

Contributor Author

In general, you do not want to execute methods or create objects you will not use. Therefore, orElse is fine when the fallback is a constant, but otherwise you should avoid it and use orElseGet.
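
As a rough illustration of that rule of thumb, the sketch below uses java.util.Optional rather than Hudi's Option. DEFAULT_NAME, SEQUENCE, and newId are hypothetical names; newId merely stands in for a fallback with a cost or side effect and is not Hudi's createNewInstantTime:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

public class FallbackChoice {
    static final String DEFAULT_NAME = "unknown";
    static final AtomicLong SEQUENCE = new AtomicLong();

    // Hypothetical fallback with a side effect: each call consumes a sequence number.
    static String newId() {
        return "id-" + SEQUENCE.incrementAndGet();
    }

    // A constant fallback costs nothing to evaluate, so orElse is fine.
    static String nameOrDefault(Optional<String> name) {
        return name.orElse(DEFAULT_NAME);
    }

    // A computed fallback should be deferred with orElseGet.
    static String idOrNew(Optional<String> pending) {
        return pending.orElseGet(FallbackChoice::newId);
    }

    public static void main(String[] args) {
        System.out.println(nameOrDefault(Optional.empty()));     // unknown
        System.out.println(idOrNew(Optional.of("id-existing"))); // id-existing
        System.out.println(SEQUENCE.get());                      // 0, no id was generated
        System.out.println(idOrNew(Optional.empty()));           // id-1
    }
}
```

Had idOrNew used orElse(newId()), the sequence would advance on every call, including calls where the pending id is present.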

Contributor

Good catch!

@@ -107,23 +107,19 @@ object HoodieSparkUtils extends SparkAdapterSupport with SparkVersionsSupport wi
// injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
// [[SQLConf]] is required by [[AvroSerializer]]
injectSQLConf(df.queryExecution.toRdd.mapPartitions { rows =>
if (rows.isEmpty) {
Iterator.empty
Contributor

Does removal of this provide any benefit?

Contributor Author

Yes. The rows.isEmpty check triggers the DAG to determine whether it is in fact returning an empty set of rows. You can see this in the Spark UI when running your jobs.

Contributor

Got it.

InputBatch inputBatch = readFromSource(instantTime, metaClient);
LOG.error("Time to read from source : " + (System.currentTimeMillis() - startInput));
Contributor

Use HoodieTimer to track execution time?

Contributor Author

This was from my own debugging of slow tests. I will revert it.

@@ -448,7 +450,9 @@ public Pair<Option<String>, JavaRDD<WriteStatus>> syncOnce() throws IOException
}
}

long startWrite = System.currentTimeMillis();
Contributor

Same here and below: consider using HoodieTimer.

Contributor

@yihua yihua left a comment

LGTM

@yihua yihua merged commit 57a0846 into apache:master Jan 10, 2024
25 of 31 checks passed
@the-other-tim-brown the-other-tim-brown deleted the fix-or-else-usage branch January 10, 2024 22:32
VitoMakarevich pushed a commit to VitoMakarevich/hudi that referenced this pull request Jan 13, 2024
yihua pushed a commit that referenced this pull request Feb 27, 2024
4 participants