-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf_hooks: emit user timing marks, measures and timerify to trace events #18789
Conversation
Didn’t review closely but tests are missing. |
src/node_perf.cc
Outdated
// TODO(jasnell): Once Tracing API is fully implemented, this should | ||
// record a trace event also. | ||
TRACE_EVENT_COPY_MARK_WITH_TIMESTAMP( | ||
"node.perf,node.perf.usertiming", *name, now / 1e3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this is doing a floating point divide which is more expensive than an integer divide. 1000
would be an integral constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah, good point :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed. Floating point divide + 2 conversions between integer and floating point.
#include <stdint.h>
uint64_t foo(uint64_t nanos) {
return nanos / 1e3;
}
// clang -O3 -mllvm --x86-asm-syntax=intel -S foo.c
movq xmm0, rdi
punpckldq xmm0, xmmword ptr [rip + LCPI0_0] ## xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
subpd xmm0, xmmword ptr [rip + LCPI0_1]
haddpd xmm0, xmm0
divsd xmm0, qword ptr [rip + LCPI0_2]
movsd xmm1, qword ptr [rip + LCPI0_3] ## xmm1 = mem[0],zero
movapd xmm2, xmm0
subsd xmm2, xmm1
cvttsd2si rax, xmm2
movabs rcx, -9223372036854775808
xor rcx, rax
cvttsd2si rax, xmm0
ucomisd xmm0, xmm1
cmovae rax, rcx
pop rbp
ret
@ofrobots ... nit fixed! |
src/node_perf.cc
Outdated
env->performance_entry_callback(), | ||
1, &object); | ||
1, &object, {0, 0}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: what would the the correct context here? To me it seems that an AsyncResource
or async_context
be created when the observer is constructed, and that should be used on the callback. Do you agree?
Aside: Do our compilers support async_context{0,0}
syntax yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still thinking about that. PerformanceObserver
currently is not an AsyncResource
and only actually exists on the JS side. Making it an AsyncResource
would be a minor divergence from the spec but is certainly not out of the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I don't know about the compilers question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't want to inherit from AsyncResource
, you can attache an async_context
to the PerformanceObserver
instances for the same effect.
I'm okay if you want to do this in a follow-on, as the issue existed before this PR, and is orthogonal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added example to #18810 in case it is useful. If the CI passes there, then that would answer the compiler question :).
564a74f
to
c6ecad5
Compare
@ofrobots ... just went ahead and made |
c6ecad5
to
4108481
Compare
4108481
to
ac17e74
Compare
CI is looking good with one unrelated failure. |
#define INTERNAL_TRACE_EVENT_ADD_WITH_ID_TID_AND_TIMESTAMP( \ | ||
phase, category_group, name, id, thread_id, timestamp, flags, ...) \ | ||
UNIMPLEMENTED() | ||
#define INTERNAL_TRACE_EVENT_ADD_WITH_ID_TID_AND_TIMESTAMP( \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a sync with upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In sync with upstream? Nope, not yet. But that reminds me...
@ofrobots ... ^^ To implement this I needed this macro implemented also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasnell created https://chromium-review.googlesource.com/c/v8/v8/+/924507 that adds the same macro upstream.
Note that for the trace-event macros, a sync with upstream is no longer a necessity. Our copy of trace_event.h
is derived from upstream, but has a slightly different implementation. It is okay for them to pick up features at a different pace. Please do continue to ping if we add more macros locally. There is no need to block on upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a nit
doc/api/tracing.md
Outdated
|
||
The set of categories for which traces are recorded can be specified using the | ||
`--trace-event-categories` flag followed by a list of comma separated category names. | ||
`--trace-event-categories` flag followed by a list of comma separated category |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag should be named --trace-events-categories
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag already exists and is not introduced by this PR
ac17e74
to
f67e78d
Compare
Adds the `node.perf.usertiming` trace events category for recording usertiming marks and measures (e.g. `perf_hooks.performance.mark()`) in the trace events timeline. Adds the `node.perf.function` trace events category for recording `perf_hooks.performance.timerify()` durations in the trace events timeline.
f67e78d
to
02adcf7
Compare
PR-URL: #18789 Reviewed-By: Ali Ijaz Sheikh <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
Adds the `node.perf.usertiming` trace events category for recording usertiming marks and measures (e.g. `perf_hooks.performance.mark()`) in the trace events timeline. Adds the `node.perf.function` trace events category for recording `perf_hooks.performance.timerify()` durations in the trace events timeline. PR-URL: #18789 Reviewed-By: Ali Ijaz Sheikh <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
PR-URL: #18789 Reviewed-By: Ali Ijaz Sheikh <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
Should this be backported to |
This depends on #17640 which can't go in v9.x. |
PR-URL: nodejs#18789 Reviewed-By: Ali Ijaz Sheikh <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
Adds the `node.perf.usertiming` trace events category for recording usertiming marks and measures (e.g. `perf_hooks.performance.mark()`) in the trace events timeline. Adds the `node.perf.function` trace events category for recording `perf_hooks.performance.timerify()` durations in the trace events timeline. PR-URL: nodejs#18789 Reviewed-By: Ali Ijaz Sheikh <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
PR-URL: nodejs#18789 Reviewed-By: Ali Ijaz Sheikh <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This reverts commit 009e418. AFAIU the discussion at [1], PerformanceObserver had been made to inherit from AsyncResource more or less as a band-aid in lack of a better async_context candidate to invoke it in. In order to enable access to AsyncLocalStores from PerformanceObservers invoked synchronously through e.g. measure() or mark(), the current async_context, if any, should be retained. Note that this is a breaking change, but - as has been commented at [1], PerformanceObserver being derived from AsyncResource is a "minor divergence from the spec" anyway, - to my knowledge this is an internal implementation detail which has never been documented and - I can't think of a good reason why existing PerformanceObserver implementations would possibly rely on it. OTOH, it's probably worthwhile to not potentially invoke before() and after() async_hooks for each and every PerformanceObserver notification. [1] nodejs#18789 Co-Authored-By: ZauberNerd <[email protected]> PR-URL: nodejs#36343 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Rich Trott <[email protected]>
This reverts commit 009e418. AFAIU the discussion at [1], PerformanceObserver had been made to inherit from AsyncResource more or less as a band-aid in lack of a better async_context candidate to invoke it in. In order to enable access to AsyncLocalStores from PerformanceObservers invoked synchronously through e.g. measure() or mark(), the current async_context, if any, should be retained. Note that this is a breaking change, but - as has been commented at [1], PerformanceObserver being derived from AsyncResource is a "minor divergence from the spec" anyway, - to my knowledge this is an internal implementation detail which has never been documented and - I can't think of a good reason why existing PerformanceObserver implementations would possibly rely on it. OTOH, it's probably worthwhile to not potentially invoke before() and after() async_hooks for each and every PerformanceObserver notification. [1] #18789 Co-Authored-By: ZauberNerd <[email protected]> PR-URL: #36343 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Rich Trott <[email protected]>
Emit user timing marks/measures and performance.timerify() measures to trace events.
/cc @nodejs/diagnostics @mcollina
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
perf_hooks, trace_events