perf(core): concat strings before computing hash #2773

merceyz · 2021-04-18T21:38:54Z

What's the problem this PR addresses?

hashUtils.makeHash calls Hash.update separately for every argument, which is costly on hot paths such as resolvePeerDependenciesImpl.

How did you fix it?

Concatenate consecutive string arguments before calling update, so each run of strings is hashed with a single call.
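A minimal sketch of why this is safe, assuming Node's crypto module: a SHA-512 Hash is a streaming digest, so calling update() once per string produces the same digest as calling it once on their concatenation. The helper names and sample inputs below are illustrative and not taken from the yarn codebase.

```typescript
import {createHash} from 'crypto';

// Hashes each argument with its own update() call, like the original makeHash loop.
function hashPerArgument(args: Array<string | null>): string {
  const hash = createHash(`sha512`);
  for (const arg of args)
    hash.update(arg ? arg : ``);
  return hash.digest(`hex`);
}

// Joins the arguments first and calls update() only once.
function hashConcatenated(args: Array<string | null>): string {
  const hash = createHash(`sha512`);
  hash.update(args.map(arg => arg ?? ``).join(``));
  return hash.digest(`hex`);
}

// Hypothetical inputs; any strings behave the same way.
const args = [`lodash@npm:4.17.21`, null, `react@npm:17.0.2`];
console.log(hashPerArgument(args) === hashConcatenated(args)); // true
```

The saving comes from fewer update() calls, each of which crosses from JavaScript into the native hashing code; the digest itself does not change.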

Results

Tested on the repro provided in #973 (comment):

- 712ms (29%)     resolvePeerDependenciesImpl
+ 525ms (23.07%)  resolvePeerDependenciesImpl

Checklist

I have read the Contributing Guide.
I have set the packages that need to be released for my changes to be effective.
I will check that all automated PR checks pass before the PR gets reviewed.

andreialecu · 2021-04-19T08:19:58Z

Just wondering, but could makeHash instead be updated to concat the arrays to a string within it?

berry/packages/yarnpkg-core/sources/hashUtils.ts

Lines 5 to 12 in 9ff0615

    
export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
  const hash = createHash(`sha512`);

  for (const arg of args)
    hash.update(arg ? arg : ``);

  return hash.digest(`hex`) as T;
}

Seems like it would go at the root of the problem.

merceyz · 2021-04-19T09:41:56Z

I tested that before (and again now), but it's actually slower.

andreialecu · 2021-04-19T09:48:46Z

That's weird; it seems like it would be equivalent to the changes in the PR. What could be the difference? What did the implementation look like?

merceyz · 2021-04-19T10:12:36Z

Well, the changes in this PR get rid of two iterations (the spread and the loop in makeHash). Applying the attached diff on top of this PR makes the time spent in makeHash go from 752ms to 953ms.

diff --git a/packages/yarnpkg-core/sources/hashUtils.ts b/packages/yarnpkg-core/sources/hashUtils.ts
index 247d826fc..989bdc39d 100644
--- a/packages/yarnpkg-core/sources/hashUtils.ts
+++ b/packages/yarnpkg-core/sources/hashUtils.ts
@@ -5,8 +5,18 @@ import globby                     from 'globby';
 export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
   const hash = createHash(`sha512`);
 
-  for (const arg of args)
-    hash.update(arg ? arg : ``);
+  const acc: Array<Buffer> = [];
+  let totalLength = 0;
+  for (const arg of args) {
+    if (arg) {
+      // @ts-expect-error
+      const buffer = Buffer.from(arg);
+      totalLength += buffer.length;
+      acc.push(buffer);
+    }
+  }
+
+  hash.update(Buffer.concat(acc, totalLength));
 
   return hash.digest(`hex`) as T;
 }

andreialecu · 2021-04-19T10:14:47Z

I think that's possibly due to the Buffer.from() overhead.

BinaryLike is essentially string | Buffer, so you can avoid the Buffer.from call and simply do args.filter(p => p).join('') and call update just once with the result (assuming the args are string[] and not Buffer[]).
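A sketch of that suggestion as a standalone function, assuming every non-null argument is a string. The name makeHashJoined is hypothetical. Array.prototype.join calls toString() on each element, so a Buffer argument would be hashed as its UTF-8 decoding rather than its raw bytes; that is why the string-only assumption matters.

```typescript
import {createHash} from 'crypto';

// Hypothetical variant of makeHash following the filter/join suggestion.
// Assumes every non-null argument is a string.
export function makeHashJoined<T extends string = string>(...args: Array<string | null>): T {
  const hash = createHash(`sha512`);
  // Drop null (and empty) arguments, then hash everything with a single update().
  hash.update(args.filter(arg => arg).join(``));
  return hash.digest(`hex`) as T;
}
```

For example, makeHashJoined(`a`, null, `b`) yields the same digest as hashing the single string `ab`.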

arcanis · 2021-04-22T11:58:58Z

I'm not a fan of "leaking" this optimization into the caller: it would suggest we should do the same everywhere makeHash is called, and I worry it would decrease readability. Additionally, the semantics of separate arguments are lost, which may matter should we need to safeguard against hash generation attacks later on. What about optimizing inside makeHash for the specific string use case? Something like this:

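import {createHash, BinaryLike} from 'crypto';

export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
  const hash = createHash(`sha512`);

  for (let t = 0, T = args.length; t < T; ++t) {
    // Treat null like the original loop does: as an empty string
    let acc = args[t] ?? ``;
    // Merge a run of consecutive string arguments into a single update() call
    if (typeof acc === `string`)
      while (typeof args[t + 1] === `string`)
        acc += args[++t];
    hash.update(acc);
  }

  return hash.digest(`hex`) as T;
}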
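To illustrate the concern about argument boundaries: neither per-argument update() calls nor concatenation distinguish ("ab", "c") from ("a", "bc"), because both feed the same bytes to the hash. One common mitigation is to length-prefix each argument. The sketch below is hypothetical (the function names are illustrative, and adopting such a scheme in yarn would change every existing hash); it only shows why keeping arguments separate at the API level leaves room for such a change inside makeHash.

```typescript
import {createHash} from 'crypto';

// Plain streaming: argument boundaries do not affect the digest.
function hashConcat(...args: Array<string>): string {
  const hash = createHash(`sha512`);
  for (const arg of args)
    hash.update(arg);
  return hash.digest(`hex`);
}

// Hypothetical length-prefixed variant: each argument is preceded by its
// UTF-8 byte length as a 4-byte big-endian integer, so the boundaries
// become part of the hashed bytes.
function hashLengthPrefixed(...args: Array<string>): string {
  const hash = createHash(`sha512`);
  for (const arg of args) {
    const bytes = Buffer.from(arg, `utf8`);
    const prefix = Buffer.alloc(4);
    prefix.writeUInt32BE(bytes.length, 0);
    hash.update(prefix);
    hash.update(bytes);
  }
  return hash.digest(`hex`);
}

console.log(hashConcat(`ab`, `c`) === hashConcat(`a`, `bc`)); // true
console.log(hashLengthPrefixed(`ab`, `c`) === hashLengthPrefixed(`a`, `bc`)); // false
```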

merceyz · 2021-04-22T22:30:21Z

Moved the logic into makeHash. It's about 15ms slower than the previous diff, but I'm fine with that.
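One possible shape for string merging inside makeHash, sketched here as an assumption rather than a copy of the merged commit: strings accumulate in an accumulator string that is flushed whenever a non-string argument (such as a Buffer) arrives, so Buffers are still hashed as raw bytes and the overall byte order is unchanged.

```typescript
import {createHash, BinaryLike} from 'crypto';

// Hypothetical sketch: merge runs of consecutive string arguments into one
// update() call, but pass non-string arguments (e.g. Buffers) through untouched.
export function makeHash<T extends string = string>(...args: Array<BinaryLike | null>): T {
  const hash = createHash(`sha512`);

  let acc = ``;
  for (const arg of args) {
    if (typeof arg === `string`) {
      acc += arg;
    } else if (arg) {
      // Flush the pending strings first so the byte order is unchanged.
      if (acc) {
        hash.update(acc);
        acc = ``;
      }
      hash.update(arg);
    }
    // null arguments contribute nothing, like update(``) in the original loop.
  }

  if (acc)
    hash.update(acc);

  return hash.digest(`hex`) as T;
}
```

Compared with converting everything to Buffers up front, this avoids the Buffer.from and Buffer.concat allocations for the common all-string case.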

merceyz requested a review from arcanis as a code owner April 18, 2021 21:38

perf(core): concat strings before computing hash (248d353)

merceyz changed the title ~~perf(core): reduce resolutions before calculating virtual hash~~ perf(core): concat strings before computing hash Apr 22, 2021

merceyz force-pushed the merceyz/perf/resolve-peers branch from ea1a3ee to 248d353 on April 22, 2021 22:28

arcanis merged commit 5b8c0e2 into master Apr 23, 2021

arcanis deleted the merceyz/perf/resolve-peers branch April 23, 2021 07:41
