Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(core): concat strings before computing hash #2773

Merged
merged 1 commit into from
Apr 23, 2021

Conversation

merceyz
Copy link
Member

@merceyz merceyz commented Apr 18, 2021

What's the problem this PR addresses?

hashUtils.makeHash calls Hash.update for every argument

How did you fix it?

Concat all string arguments in a row before calling update

Results

Testing on the repro provided in #973 (comment)

- 712ms (29%)     resolvePeerDependenciesImpl
+ 525ms (23.07%)  resolvePeerDependenciesImpl

Checklist

  • I have read the Contributing Guide.
  • I have set the packages that need to be released for my changes to be effective.
  • I will check that all automated PR checks pass before the PR gets reviewed.

@merceyz merceyz requested a review from arcanis as a code owner April 18, 2021 21:38
@andreialecu
Copy link
Contributor

Just wondering, but could makeHash instead be updated to concat the arrays to a string within it?

export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
const hash = createHash(`sha512`);
for (const arg of args)
hash.update(arg ? arg : ``);
return hash.digest(`hex`) as T;
}

Seems like it would go at the root of the problem.

@merceyz
Copy link
Member Author

merceyz commented Apr 19, 2021

I tested that before (and again now) but it's actually slower

@andreialecu
Copy link
Contributor

That's weird, seems it would be equivalent to the changes in the PR. What could be the difference? How did the implementation look?

@merceyz
Copy link
Member Author

merceyz commented Apr 19, 2021

Well the changes in this PR gets rid of two iterations (the spread and the loop in makeHash), applying the attached diff on top of this PR makes the time spent in makeHash go from 752ms to 953ms

diff --git a/packages/yarnpkg-core/sources/hashUtils.ts b/packages/yarnpkg-core/sources/hashUtils.ts
index 247d826fc..989bdc39d 100644
--- a/packages/yarnpkg-core/sources/hashUtils.ts
+++ b/packages/yarnpkg-core/sources/hashUtils.ts
@@ -5,8 +5,18 @@ import globby                     from 'globby';
 export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
   const hash = createHash(`sha512`);
 
-  for (const arg of args)
-    hash.update(arg ? arg : ``);
+  const acc: Array<Buffer> = [];
+  let totalLength = 0;
+  for (const arg of args) {
+    if (arg) {
+      // @ts-expect-error
+      const buffer = Buffer.from(arg);
+      totalLength += buffer.length;
+      acc.push(buffer);
+    }
+  }
+
+  hash.update(Buffer.concat(acc, totalLength));
 
   return hash.digest(`hex`) as T;
 }

@andreialecu
Copy link
Contributor

I think that's possibly due to the Buffer.from() overhead.

BinaryLike is actually string | Buffer so you can avoid the Buffer.from call and simply do args.filter(p=>p).join('') and update just once with that (assuming the args are string[] and not Buffer[])

@arcanis
Copy link
Member

arcanis commented Apr 22, 2021

I'm not a fan of "leaking" this optimization in the caller (because then it would suggest we should do that everywhere makeHash is called, and I'm worry it'd decrease readability; additionally, the semantic of separate arguments is lost, which may matter should we need to safeguard against hash generation attacks later on). What about optimizing inside makeHash for the specific string use case? Something like this:

export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
  const hash = createHash(`sha512`);

  for (let t = 0, T = args.length; t < T; ++t) {
    let acc = args[t];
    while (typeof args[t + 1] === `string`)
      acc += args[++t];
    hash.update(acc);
  }

  return hash.digest(`hex`) as T;
}

@merceyz merceyz changed the title perf(core): reduce resolutions before calculating virtual hash perf(core): concat strings before computing hash Apr 22, 2021
@merceyz merceyz force-pushed the merceyz/perf/resolve-peers branch from ea1a3ee to 248d353 Compare April 22, 2021 22:28
@merceyz
Copy link
Member Author

merceyz commented Apr 22, 2021

Moved the logic to makeHash, it's about 15ms slower than the previous diff but I'm fine with that

@arcanis arcanis merged commit 5b8c0e2 into master Apr 23, 2021
@arcanis arcanis deleted the merceyz/perf/resolve-peers branch April 23, 2021 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants