-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: refactor format method. #12407
Conversation
Method format refactored to make it more maintenable, replacing the switch by a function factory, that returns the appropiated function given the character (d, i , f, j, s). Also, performance when formatting an string that contains several consecutive % symbols is improved. The test: `const numSamples = 10000000; const strPercents = '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%s%%%%%%%%%%%%%%%%%i%%%%%%%%%%%%%%%%%%%%%%%%%%'; var s; console.time('Percents'); for (let i = 0; i < numSamples; i++) { s = util.format(strPercents, 'test', 12); } console.timeEnd('Percents');` Original time: 28399.708ms After refactor: 23763.788ms Improved: 16%
Thank you for the contribution! |
lib/util.js
Outdated
break; | ||
if (lastPos < i) | ||
str += f.slice(lastPos, i); | ||
str += tryStringify(arguments[a++]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this function was used only here (linter reports 'tryStringify' is defined but never used
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I commited the changes. Also passed all tests.
lib/util.js
Outdated
lastPos = i = i + 2; | ||
continue; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter: Trailing spaces not allowed
|
Thanks for your pull request. I think you should write a test before refactoring. |
These changes as-is introduce performance regressions:
I didn't bother running with I believe similar changes had been suggested in the past (although with a lookup table instead of a function with a switch) and there were performance regressions back then as well. I'm not sure if (or how much) inlining the helper function would make a positive difference. |
I updated the PR with these changes:
About performance asked by @mscdex: Previous master:
This PR
Also I checked with my own script to compare between current master and this PR. This is the script that I used:
And these are the results with my script: Previous master
This PR:
After 10 different runs, I always get similar times for the Short, Normal, and Less Parameters, but for the Percents (use of '%%%%%%%%%%' inside the string) I'm getting always best performance with the PR code. |
test/parallel/test-util-format.js
Outdated
assert.strictEqual(util.format('o: %j, a: %j', {}, []), 'o: {}, a: []'); | ||
assert.strictEqual(util.format('o: %j, a: %j', {}), 'o: {}, a: %j'); | ||
assert.strictEqual(util.format('o: %j, a: %j'), 'o: %j, a: %j'); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unneccesary newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if perf is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, I'm fine with a ~5% regression in perf of util.format since it's not used in "hot paths" of node apps anyway and I think the new code is a lot more readable.
@benjamingr I wouldn't say it's not in a "hot path." I would bet at least logging modules and such make use of it. I would also bet that having consecutive %'s like that is uncommon. |
@mscdex You're right, consecutive %'s is uncommon. But the objective of this PR is not to improve the performance, as you can read in the description the objective is to make it more maintenable. I checked the perf in a Linux, a Windows and a docker, and I'm not able to reproduce the -5%. If you provide me a way to reproduce it I will put my hands on it! :) Also, the final target of this PR is, perhaps in the future, add more usable formats (%u, %o, %x, %X, %e and %E) without having duplicated code go total nightmare. |
I was under the impression logging modules are never in hot paths anyway - but I'm ready to be educated :) |
@benjamingr Obviously it depends on the application. It could easily be a bigger issue when you enable extensive debugging in your application in production (even temporarily). You could make a similar argument for other node core modules/functions that may/may not be directly used by core itself (e.g. @jseijas Here's what I used:
Where |
Improved performance
@mscdex Changed the way of doing the refactor. With the new way:
I don't understand why for %j the performance is decreased. But also while refactoring I found some odd behaviours with the performance, example: |
@jseijas It's hard to say why offhand without running the benchmark with I would expect it to be somewhat slower (but not necessarily a regression) because the object type case is farther down in the switch and V8 matches each in order. |
lib/util.js
Outdated
str += '%'; | ||
lastPos = i = i + 2; | ||
continue; | ||
if (f.charCodeAt(i+1) === 37 || a < argLen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter: Infix operators must be spaced
Fix lint error: Infix operators must be spaced
I don't know - I just want to see a case where the performance is meaningful in util.format. Even a single telemetry tool that logs requests and uses
I might be seeing this the wrong way - but
Of course, a faster implementation would parse the format string once and then compile it to a function with |
@benjamingr Well, I just was doing several changes and learning how they affect to the performance. I'm with you, I never considered util.format as critical, I consider it more like an utility, and I think that could be interesting to evolve it to take into account the different formats of "printf". I think that can be interesting to have at least %e and %x provided by the core, without installing any other package. Even more interesting to have things like "%8.2f", but in this case a deeper modification of the function is needed. But if we think "ok, it's critical, we are using it for the logs of the app", then we can see that we got it improved a 15% for numbers and strings (the usual case of logs). For those that wants to stringify objects inside the messages, we are talking of a -5% performance, in my laptop it means 0.0001ms per message... then I have to store this log, so I don't think that this time is really relevant if we consider the I/O operation for storing. We can consider it relevant if our system is storing huge amounts of logs per second, but in this case I think there are a bunch of problems at architecture level of the app before even looking to this 5%. But well, is my point of view ^^ |
I would prefer we not knowingly allow a performance regression to util.format(). It is used in things like |
Ok, I changed the code to focus on performance. Is still more readable than the original one, but now the performance is:
So we have more readable code, and with better performance. Yay! Even with that, I'm not happy. I want to say:
|
One thing that I don't understand: There is a real problem with the no-replace performance, and is due to the fact that the behaviour of util.format(1,2) is totally different from util.format('', 1, 2), even returning the same result. If we take a look into the code the first is resolved with const objects = new Array(arguments.length);
for (var index = 0; index < arguments.length; index++) {
objects[index] = inspect(arguments[index]);
}
return objects.join(' '); and the second one with: while (a < argLen) {
const x = arguments[a++];
if (x === null || (typeof x !== 'object' && typeof x !== 'symbol')) {
str += ' ' + x;
} else {
str += ' ' + inspect(x);
}
} Where we see that the inspect is only called when the object is not null, and is an object or symbol, so no inspect will be called for strings or numbers. That's the reason why no-replace benchmark takes so much time compared with the other options. Replacing the first part of the util.format with this: var a = 0;
var str = '';
if (typeof f !== 'string') {
while (a < arguments.length) {
if (a > 0)
str += ' ';
const x = arguments[a++];
if (x === null || (typeof x !== 'object' && typeof x !== 'symbol')) {
str += x;
} else {
str += inspect(x);
}
}
return str;
} then we get this benchmark result for no-replace:
@mscdex Is this fix good, or it something that I'm not taking into account and there is a good reason for doing inspect for all the parameters only at the no-replace part? I think that the differences should be the stylize, the "-0" case, and the replacements done in strings. But if we are defending that |
[Silly comment] how about adding |
@jseijas I think backwards compatibility should be preferred since then more branches can receive at least some amount of increased performance, but if backwards compatibility can't be met and you feel the performance improvement is great enough and is worth marking as semver-major, then so be it. @refack I don't think getting into that kind of habit with the node API is a good goal. If anything it's usually a sign that the changes should just go into a userland module instead. |
@mscdex Ok, then I'll make this change ina new PR, if this one is merged, because currently this one has not backwards impact. And also are different topics. To give more information about how critical is the impact that this have, running |
LGTM but I'd like @mscdex to sign off before it lands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fresh CI: https://ci.nodejs.org/job/node-test-commit/9516/
I’m good with landing this as it is once CI comes back good.
Hi @mscdex! Do you need more changes to be made, or you are ok with this? Or any other suggestion. |
lib/util.js
Outdated
|
||
var str = ''; | ||
var a = 1; | ||
var lastPos = 0; | ||
for (var i = 0; i < f.length;) { | ||
if (f.charCodeAt(i) === 37/*'%'*/ && i + 1 < f.length) { | ||
if (f.charCodeAt(i) === 37/*'%'*/ && f.length > i + 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the "static" portion on the right-hand side is more traditional.
lib/util.js
Outdated
|
||
var str = ''; | ||
var a = 1; | ||
var lastPos = 0; | ||
for (var i = 0; i < f.length;) { | ||
if (f.charCodeAt(i) === 37/*'%'*/ && i + 1 < f.length) { | ||
if (f.charCodeAt(i) === 37/*'%'*/ && f.length > i + 1) { | ||
if (f.charCodeAt(i + 1) !== 37 && a >= arguments.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The /*'%'*/
is missing here for consistency.
lib/util.js
Outdated
@@ -143,7 +131,6 @@ exports.format = function(f) { | |||
return str; | |||
}; | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintended whitespace change?
lib/util.js
Outdated
switch (f.charCodeAt(i + 1)) { | ||
case 100: // 'd' | ||
if (a >= argLen) | ||
break; | ||
if (lastPos < i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this style because lastPos
will always be either less than or equal to i
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't understand this point. You meain only remain lastPos < i
instead of i > lastPos
? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I prefer lastPos < i
- Correct whitespace - Static parts of if before unstatic.
@mscdex Changes done! |
After rebase:
|
LGTM |
Thanks! Landed in 4191962. |
Method format refactored to make it more maintenable, replacing the switch by a function factory, that returns the appropiated function given the character (d, i , f, j, s). Also, performance when formatting an string that contains several consecutive % symbols is improved. The test: `const numSamples = 10000000; const strPercents = '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%s%%%%%%%%%%%%%%%%%i%%%%%%%%%%%%%%%%%%%%%%%%%%'; var s; console.time('Percents'); for (let i = 0; i < numSamples; i++) { s = util.format(strPercents, 'test', 12); } console.timeEnd('Percents');` Original time: 28399.708ms After refactor: 23763.788ms Improved: 16% PR-URL: #12407 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Roman Reiss <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Brian White <[email protected]>
Method format refactored to make it more maintenable, replacing the switch by a function factory, that returns the appropiated function given the character (d, i , f, j, s). Also, performance when formatting an string that contains several consecutive % symbols is improved. The test: `const numSamples = 10000000; const strPercents = '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%s%%%%%%%%%%%%%%%%%i%%%%%%%%%%%%%%%%%%%%%%%%%%'; var s; console.time('Percents'); for (let i = 0; i < numSamples; i++) { s = util.format(strPercents, 'test', 12); } console.timeEnd('Percents');` Original time: 28399.708ms After refactor: 23763.788ms Improved: 16% PR-URL: #12407 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Roman Reiss <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Brian White <[email protected]>
FWIW it looks like this could be causing issues due to the change in behavior (see #13665), so I've marked this as semver-major now. |
Method format refactored to make it more maintenable, replacing the
switch by a function factory, that returns the appropiated function
given the character (d, i , f, j, s).
Also, performance when formatting an string that contains several
consecutive % symbols is improved. The test:
Original time: 28399.708ms
After refactor: 23763.788ms
Improved: 16%
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)