-
Notifications
You must be signed in to change notification settings - Fork 66
Conversation
* `parentId` {Number} | ||
|
||
Called when a class is constructed that has the possibility to trigger an | ||
asynchronous event. This does mean the instance _will_ trigger a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not mean
LGTM, except the mentioned nits. |
@trevnorris any chance to you have a branch somewhere with this API implemented that I could checkout and experiment with ? |
@mhdawson Not completely. While writing this a few tweaks were added for API consistency. Most of it is implemented in |
|
||
Since its initial introduction with the `AsyncListener` API, `AsyncWrap` has | ||
slowly evolved to ensure a generalized API that would serve as a solid base for | ||
module authors who wished to add events to the life cycle of the event loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be worthwhile to describe these life-cycle events at a high level. That is, describe the user model for this API. A lot of what I read below is tied up with implementation details around the native handles. I think most users are concerned about when a callback was enqueued, when that callback started execution, and when that callback completed execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the high level of node's event loop is already covered in detail in the-event-loop-timers-and-nexttick.md
. i can reference that in the EP. also remember this isn't the actual docs entry. it's meant to be a high level overview of the change.
A lot of what I read below is tied up with implementation details around the native handles.
async wrap has always been partially tied to implementation details. that's the reason for its existence, and attempting to abstract that would remove the utility from the user. for example I was doing testing between 4 and 6 and realized that the order of events fires differently for certain operations. this is exactly the type of thing I wanted to know. it helps me debug my code. if we wanted all this abstracted away then we could simply wrap all the js calls. that's not the purpose for async wrap.
I think most users are concerned about when a callback was enqueued
It doesn't work like this. An oncomplete
or similar is assigned to the newly created request upon construction. Hence why I chose to notify on construction of the instance
I think what you want is when the I/O request itself was created, which I believe the actual request instance to perform I/O is always created when requested by the user. So init()
will handle that. the implementation is also simpler than injecting at all I/O calls. e.g. uv_write()
, and since they're run at approximately the same time any counter measurements should be sufficiently correct.
Then pre()
will be called when the async operation is complete, and post()
will be called when the callback has completed execution. Covering your final points, and both of these are documented in the Hook Callbacks section.
As far as measuring fire-to-completion, it's impossible to do reliably in node. Best we can give is when node knows the request completed, but since all completed requests are queued by the kernel until node asks for them it's impossible to get perfect measurements. For example if 100 requests complete at the same time and each callback runs for 10 ms, we wouldn't know the last request completed until 1 second after it actually did. This is fundamentally part of node and not something we can do much about. Users will be able to track these details, and referring back to exposing implementation details of node this is exactly why it's important and not something that can be abstracted in a way to be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Trevor for the link to the doc on the event loop - that's really helpful for me. If the goal of the API is to "add events to the life cycle of the event loop", then why isn't the event loop's phase exposed through API calls? Now, I'm not saying that it should be part of the API, but given stated goal, it isn't clear to why this is omitted.
Which is one of the things I'm driving at: Clarification/simplification around the goals API's goals will help get everyone on the same page around the utility of the API & how it is to be used. Ideally, this would include some definitions, goals/non-goals and some canonical use cases/examples.
async wrap has always been partially tied to implementation details. that's the reason for its existence, and attempting to abstract that would remove the utility from the user.
Can you expand on your example here? I'm not following how the description of the API impacts the ability to understand event ordering.
if we wanted all this abstracted away then we could simply wrap all the js calls. that's not the purpose for async wrap.
Again, precisely what I'm driving for - what is the purpose of the API? My understanding is the goal is to expose lifecycle events around async code execution. If not, then that's fine. If the goal is to provide lifecycle events around handles, then great, but it begs the question of "what is a handle and why does the user care?" (per @jasnell's comment).
It doesn't work like this. An oncomplete or similar is assigned to the newly created request upon construction. Hence why I chose to notify on construction of the instance
Sorry, I'm not following this.
I think what you want is...
I'm not following. We may be talking past each other with terminology. What I need is a notification model around async code execution in node. That is, I want to know precisely the following:
- when is a callback c "made available" for invocation, and what is the parent invocation "making it available"?
- when does a callback c begin invocation?
- when does a callback c end invocation?
- when is a callback c no longer available for invocation (arguably optional, but let's stick with it for now as it cleanly maps to proposed
destroy()
call).
As far as measuring fire-to-completion, it's impossible to do reliably in node.
I think this is OK, but again, it will help to explicitly list the goals & non-goals of the API. At a minimum it will get everyone reading this on the same page.
I'm happy to take a stab at writing up what I think is a user model for the API, and something that fits cleanly into current async wrap implementation. Likely not 100% correct, but at a minimum useful to highlight differences in understanding. Let me know if people think this is a useful effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it begs the question of "what is a handle and why does the user care?" (per @jasnell's comment).
I think this has already been answered but if you don't know it may be more helpful to dig though node core while we're working out a potential API for it. We shouldn't expect 100% of things to be documented for end users in an EP. (see the other EPs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been through the code. A clear definition of a "handle" and how it relates to the goals of the API aren't in the EP. Again, I'm happy to write this up and iterate on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is probably out of scope. This exposes existing mechanics in a more usable way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think telling someone to just go dig through node core while we're working out a potential API for it
is really all that helpful (or friendly). @mike-kaufman appears to be making the point that while this new API appears useful it's not clear given the description in this eps exactly how it would be used and what benefit it brings. Several examples and a description of the lifecycle of a hook would make that much clearer. No one is asking for 100% of things to be documented for end users. What is being asked for is a bit more clarity... and I think a bit more patience would likely be a good thing also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think defining exactly what a handle is, is out of scope here.
My biggest question after reading this and doing some tests with process.binding('async_wrap'); is: don't we have to define what you can/cannot assume in terms of the object passed as 'this' ? I'm thinking there needs to be a mapping between the providers, their callbacks and what object you can expect 'this' to be in each case. If that's true then I start to wonder about how much of the internals we'll be exposing and how that might constrain what we change in the future. Key would be to document what will or won't change across releases in terms of what you get for 'this' and the shape of those objects. As a concrete example for: crypto.randomBytes(): this is -> InternalFieldObject { ondone: { [Function] [length]: 0, [name]: '', [arguments]: null, [caller]: null, [prototype]: { [constructor]: [Circular] } } } crypto.pbkdf2() this is -> InternalFieldObject { ondone: { [Function] [length]: 2, [name]: '' } } and its not clear how I figure out from what's being passed to the hooks how you figure out which specific hook triggered the callback. Maybe more than is currently encoded in the private api is going to be encoded into type in the init call, currently it just looks like the provider. If the type will help to identify the specific callback per provider then defining what will be in type and the values would help. |
Per @mhdawson's comments, are there specific reasons why the actual handle object is being passed to the hooks? I am also concerned about the level of internal details being exposed here. Would be nice to understand the use cases for this. |
I'm fine with the idea that each constructor receives its own unique provider id. Not sure what you mean by "their callbacks", and the expected
All of this is already exposed via As for what we can change, there's no guarantee what fields will be available. Part of the initial concept of this API was users who wanted to know what node was doing. Not have just another abstracted API, that could be done easily enough in another way. I'm sure there'll be disagreement about what we should be able to rely on once a branch has reached stable, but that was never part of the initial design plan. It's basically "accessing Re:
Some users want to store information directly on the handle. Despite the id each has, it's the easiest way to propagate information and allow the GC to clean it up automatically. Ideally in the future there could be a basic set of calls that could be standardized (e.g. 'use strict';
const async_wrap = process.binding('async_wrap');
const print = process._rawDebug;
var handle;
async_wrap.setupHooks({ init() { handle = this } });
async_wrap.enable();
var server = require('net').createServer().listen(8080);
print(server._handle === handle);
server.close();
// output: true I use it for debugging as well. With the understanding that things change, that's part of its utility. By addition of the id, explicitly passing the provider, etc. we're not forcing use of the handle on anyone. Simply making it available in a way that makes sense for the context of the call, and in a way that users like APMs will find very useful. |
Providing storage for the async context is different than exposing the handle though.
IMO, I think providing a context object which has a consistent shape & properties like provider type and handle is a cleaner API than passing the handle directly. It still meets the criteria of providing arbitrary storage associated with the "handle", it provides a place to define a common interface across handles, and it can evolve independently of the underlying handle.
I'm still not following how APMs will utilize the handle. Is there specfiic data on the handle that is useful? If so, what is this? |
Creating and tracking a new async context for every handle, and tracking it, is expensive. By attaching properties directly to the handle instance GC will take care of it all automatically, and at the least expense.
This can be, or at least should be, construct-able by the user. Creating all these new objects filled with properties is expensive, and you're missing that printing the actual contents of the handle is useful. And I don't share the concern about possibly needing to standardize properties in the handle and making it difficult for node to move forward. I've been aiming for a more standardized lower-level API, and "hiding" properties on an object in a significant way has become easier with ES6. But this is a separate topic.
Here's a really basic example script that should explain how useful it is to be able to see the handles themselves while debugging: 'use strict';
const async_wrap = process.binding('async_wrap');
const print = process._rawDebug;
const ctx_array = [];
async_wrap.setupHooks({
init() { /*print(this)*/ },
pre() {
if (ctx_array.indexOf(this) === -1) {
ctx_array.push(this);
print(this);
}
},
});
async_wrap.enable();
process.on('exit', () => print(ctx_array.length));
require('net').createServer(function(c) {
require('fs').readFile(__filename, () => {
c.end(new Buffer(1024 * 1024 * 100).fill('a'));
this.close();
});
}).listen(8080);
require('net').connect(8080, function() { this.resume() }); In there you'll see a I hope this demonstrates the utility for being able to analyse each handle. All of the things mentioned in the previous paragraph cannot be obtained any other way. Removing the ability to see the handle would be a blow to the API, and basically be one step towards moving it to nothing more than a continuation-local-storage API. |
@trevnorris what I was referring to in respect to "their callbacks" was that for the CRYPTO provider there are multiple cases were callbacks are wrapped such that they pre, post methods are invoked (as in my example). I think your comment about making each of these have their own provider id addresses that question. I terms of the discussion about visibility of the handles, from what you describe we should document both in this eps what it's ok/not ok to use the handlers for and what expectations are. For example:
If we believe that documenting a list like this is enough protection from being boxed in later when users of the API are broken by later Node.js releases and complain, then passing the handles would be fine. If we were concerned that despite the warnings we'd still be trying to avoid breakage passing some other field from wrapper could make sense. |
@trevnorris a couple more clarifying questions ... Let's say I create a hook and some dependency module I'm using creates a hook... when those are called, are they passed the same id and handle, different id's same handle, same id's different handle or different ids and different handles? (and by handle here I mean the js object that wraps the actual handle). The main reason I ask is that if I'm attaching additional context to the handle, it would be helpful to also know that other hooks could be attaching their own context to the same handle. I'm still wondering about the potential cost of creating too many of these which is why I think describing the specific lifecycle from when a hook is created to when it is destroyed would be very helpful. While I understand that you've designed and implemented this to be as low impact on performance as possible, there is a non-zero cost to calling these hooks. Have you had the opportunity yet to benchmark an upper limit to the number of hooks that can be created without having a serious impact on performance? My key concern with this is that an app developer may not have any idea that dependency modules they may be using could be going out and creating hooks. Depending on how many such dependencies they have, they could end up seeing degraded performance without any clear indication as to why since installing the hook appears to be a completely transparent operation (that is, there's no indication that a new hook was created). |
9 votes in favor: @indutny @Trott @trevnorris @rvagg @evanlucas @ChALkeR @Fishrock123 @mscdex @cjihrig 0 votes against: 3 abstentions: @misterdjules @addaleax @thealphanerd 6 members who did not cast a vote and did not indicate they were abstaining: @chrisdickinson @shigeki @bnoordhuis @mhdawson @ofrobots @jasnell |
@trevnorris Thanks for the deep explanation. My hope with the create/queue event split was that CLS could link In its current state, async_wrap only solves part of the problem of async transaction tracing. You can get the code path the internals took, but you can't really get the more contextually useful path of how it got through the users code. |
@trevnorris By the way, thank you for all your work in Node and async_wrap.
Is this how the current unofficial async_wrap works? I'm sorry I'm having trouble following the changes in behavior b/w this EPS and the current released async_wrap. The real hope for myself and I think many others using CLS, is if this EPS version eliminates or help reduces the monkey-patching needed in order to have reliable CLS in node? I might be off-base, but it sounds like it won't b/c of its impact on performance? I'd give up a 💩 ton of performance to have reliable CLS. Can we make it opt-in? For the non-embedded folks, hardware and scaling is only getting cheaper. |
The current implementation is a hybrid. There were a lot of requests to allow multiple hook instances. Instead of having a single global set. This required propagating the state of hooks for each execution stack.
When it comes to patching
I put in many hours in an attempt to make this functionality opt-in, but the problem is a certain amount of state still needs to be maintained and propagated because a hook may be enabled at any time. In my PR nodejs/node#8531 I purposely left the commits intact so anyone could view the effort to keep the old functionality in. |
@Qard I think @trevnorris is going to be making |
Maybe. I'll have to dig into it at some point to see how it works. I don't work in APM anymore, so I haven't been paying super close attention to this stuff lately. 😕 |
@@ -286,9 +308,9 @@ const async_hooks = require('async_hooks'); | |||
const net = require('net'); | |||
|
|||
asyn_hooks.createHook({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo here, should be async_hooks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-P Thanks. On it.
reasonable use case. This API is meant to err on the side of requiring explicit | ||
instructions from the user. | ||
|
||
To help simplify this `enable()` returns the instance of itself so the calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should say "createHook()
returns the instance of itself..."
// used in other emit calls. | ||
async_hooks.emitInit(id, handle, type[, triggerId]); | ||
|
||
// Call the before() callbacks. The reason for requiring both arguments is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line now redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot to add a parameter to emitBefore()
.
|
||
The following is a simple demonstration of this: | ||
|
||
```cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-```cpp
+```js
7c07d82
to
7c6199a
Compare
52d7c97
to
30b9268
Compare
After much investigation and communication this is the API for AsyncHooks that has evolved. Meant to be minimal, not impose any performance penalty to core when not being used, and minimal impact when it is used, this should serve public needs that have been expressed over the last two years. PR-URL: #18 Reviewed-By: Fedor Indutny <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]>
30b9268
to
523e9aa
Compare
After much investigation and communication this is the API that has
surfaced. Meant to be minimal, not impose any performance penalty to
core when not being used, and minimal impact when it is used, this
should serve public needs that have been expressed over the last two
years.
@nodejs/ctc I'd like the initial review explicitly from the CTC before this is opened for too much external debate. Because experience has shown me that there will be suggestions/changes for those who want specific features and/or additions to suit their specific use case. Usually not taking the time to realize that this API is enough. They just need to write the additional code for the hooks.