extending workunits for declarative messaging in noninteractive v2 @rules #7071
A few thoughts...
This is interesting and I hadn't considered this. I was considering the opposite direction, of requesting a workunit, and then having a process execution request type which tees stdout/stderr to it (although this wasn't clear, because the first snippet did exactly what you describe in (4)). Something like:

```python
workunit = yield Get(Workunit, WorkunitRequest('rsc'))
# too many `req`s here
rsc_req = LoggingExecuteProcessRequest(req.req, workunit, [WorkUnitLabel.COMPILER])
# this tees stdout/stderr to the workunit in `rsc_req`
# would need a new intrinsic for LoggingExecuteProcessRequest -> ExecuteProcessResponse
res = yield Get(ExecuteProcessResponse, LoggingExecuteProcessRequest, rsc_req)
```
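For concreteness, a minimal sketch of what the request types used above might look like. All of these names (`WorkunitRequest`, `Workunit`, `LoggingExecuteProcessRequest`) are hypothetical types from this thread, not existing pants APIs, and plain namedtuples stand in for whatever datatype machinery the engine would actually use:

```python
from collections import namedtuple

# Hypothetical: ask the engine for a workunit handle by name.
WorkunitRequest = namedtuple('WorkunitRequest', ['name'])

# Hypothetical: the handle the engine hands back; the engine stays in control
# of how (and when) anything attached to it is rendered.
Workunit = namedtuple('Workunit', ['name', 'labels'])

# Hypothetical: wraps an ExecuteProcessRequest so that its stdout/stderr are
# teed into the given workunit, tagged with the given labels. Turning this
# into an ExecuteProcessResponse would require the new intrinsic mentioned
# in the snippet above.
LoggingExecuteProcessRequest = namedtuple(
    'LoggingExecuteProcessRequest',
    ['execute_process_request', 'workunit', 'labels'])
```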
I agree! This was hidden in comments in the second snippet, but I was thinking that we can then add a non-hierarchical (also: non-synchronous) tag (separate from the workunit's own `WorkUnitLabel`s):

```python
_ = yield Get(WorkunitOutputCompleted,
              WorkunitOutputRequest(workunit, 'compiling x:y [3/4]',
                                    [WorkUnitMessageType.MESSAGE]))
```

This wasn't clear because I kind of just thought of it as I was writing the above, but I think this sideways axis of "message type" vs "workunit context" would be the link to v2 output.
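As a sketch of that "sideways axis" -- hypothetical names only, mirroring the snippet above rather than any existing pants API -- the per-message tags could be a small set kept separate from `WorkUnitLabel`:

```python
from collections import namedtuple

# Hypothetical: per-message tags, separate from WorkUnitLabel (which describes
# whole workunits), so individual messages within a workunit can be classified,
# e.g. a one-off status line vs. teed process output.
class WorkUnitMessageType(object):
  MESSAGE = 'MESSAGE'
  PROCESS_OUTPUT = 'PROCESS_OUTPUT'

# Hypothetical request/result pair from the snippet above: declare a message
# against a workunit and get back an acknowledgement once it has been emitted.
WorkunitOutputRequest = namedtuple(
    'WorkunitOutputRequest', ['workunit', 'message', 'message_types'])
WorkunitOutputCompleted = namedtuple('WorkunitOutputCompleted', [])
```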
This is extremely interesting!!! I think this dovetails very nicely with the idea of introducing the …
Two thoughts from a message in #engine:
A thought I had on top of (2) was that the rule itself could identify a parent workunit, and a process execution a child workunit (using the …
It might also be really nice to use this work as an excuse to get a better handle (get it?) on workunit output in general so that it can be hygienically consumed by external processes. Currently, if a tool wants to read pants's output (e.g. to find compile failures in CI), it has to literally parse what we print to stdout, on top of whatever it actually wants to parse out of each tool. We can use regex (:heart:) to hackily accomplish some of this (aka avoid parsing), but that sidesteps the fact that we already have a very well-defined structure for output (which is currently hierarchical). I would love to be able to use some option (say: …), which would take all output sent to stdout from the … And … We would also then have a …

The extension to stdin seems extremely clear and very interesting, and could potentially make …

I would want to keep all of this in mind (the user interface on the cli) when considering how to extend workunit logging to v2 rules (the rule developer interface in python). I think we don't need to worry about synchronization for file writing in the v1 case, by design. In the v2 case this becomes a concern (an easily addressable one, however). When developing the interface for declaring workunit output in v2 rules as described above, we should keep in mind, and probably at the same time develop, the interface for mapping them to cli options (do we have an automatic …
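As an illustration of that "hygienic consumption" (the concrete option name is elided above, so everything here is hypothetical): if pants emitted its already-well-defined output structure as, say, one JSON object per line, a CI tool could find compile failures without regex-parsing the human-readable stdout. The field names below are invented for the sketch:

```python
import json
import sys

# Hypothetical consumer of a machine-readable workunit stream. Assumes pants
# (under some future option, not an existing flag) emits one JSON object per
# line with fields like 'workunit', 'labels', 'message', and 'exit_code'.
def find_compile_failures(lines):
  failures = []
  for line in lines:
    record = json.loads(line)
    if 'COMPILER' in record.get('labels', []) and record.get('exit_code', 0) != 0:
      failures.append(record)
  return failures

if __name__ == '__main__':
  for failure in find_compile_failures(sys.stdin):
    print('{}: {}'.format(failure.get('workunit'), failure.get('message', '').strip()))
```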
Also, thinking about zipkin tracing of pants runs (very relevant: #5903) is what made me think of this ticket in the first place, and it would be extremely wonderful if we could take whatever infrastructure we build to support the …
FINAL THOUGHT: We could also consider making …

This may be easier to implement, as it maps directly to … (in python). This also means we can use …
Some more followup from …
There is a lot in this thread, some of which invalidates previous points. I think that getting this into a small design doc would be a better way to identify open questions, discuss it, and then move forward on implementation. Having said that: some responses.

Re: …

So, while logging is technically something that should always be async (since it will touch a disk), it is also fairly straightforward to make logging asynchronous behind a synchronous API by just putting something in a queue and flushing that in the background. I think that I have been assuming that we would preserve a synchronous logging/workunit API. So then the only thing that would actually be async would be the creation of the …

My largest open question is where the hierarchy of workunits will come from in v2, or whether we're saying that in v2 we will use workunits more like zipkin spans, with no hierarchy, and just start/end times. But it does seem promising that the dapper/zipkin model "always" seems to result in a hierarchically renderable dataset. I'm not 100% sure how that happens, and it would likely be good to look at the data model a bit more deeply to see whether child zipkin/dapper …
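A minimal, generic-Python sketch (not pants code) of "synchronous API, asynchronous logging": callers see a plain blocking-looking `log()`, while the actual disk write happens on a background thread draining a queue.

```python
import threading
from queue import Queue

class AsyncWorkunitLog(object):
  """Synchronous-looking logging API whose I/O happens in the background."""

  _SENTINEL = object()

  def __init__(self, path):
    self._queue = Queue()
    self._file = open(path, 'a')
    self._thread = threading.Thread(target=self._drain, daemon=True)
    self._thread.start()

  def log(self, message):
    # Synchronous from the caller's perspective: just enqueue and return.
    self._queue.put(message)

  def _drain(self):
    # Background thread: flush queued messages to disk as they arrive.
    while True:
      item = self._queue.get()
      if item is self._SENTINEL:
        break
      self._file.write(item + '\n')
      self._file.flush()

  def close(self):
    self._queue.put(self._SENTINEL)
    self._thread.join()
    self._file.close()
```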
Yeah, zipkin spans have a parent pointer :)
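A tiny illustration (plain Python, not pants or zipkin code) of why a flat set of spans with parent pointers is still hierarchically renderable: the hierarchy can be rebuilt after the fact by grouping on the parent id. Field names are illustrative only.

```python
from collections import namedtuple, defaultdict

# Minimal stand-in for a zipkin/dapper-style span: flat records, each with an
# optional parent pointer, plus start/end times.
Span = namedtuple('Span', ['span_id', 'parent_id', 'name', 'start', 'end'])

def render_tree(spans, root_parent_id=None):
  """Rebuild and print the hierarchy implied by the parent pointers."""
  children = defaultdict(list)
  for span in spans:
    children[span.parent_id].append(span)
  def _render(parent_id, depth):
    for span in sorted(children[parent_id], key=lambda s: s.start):
      print('{}{} ({:.1f}s)'.format('  ' * depth, span.name, span.end - span.start))
      _render(span.span_id, depth + 1)
  _render(root_parent_id, 0)

spans = [
    Span('1', None, 'compile', 0.0, 5.0),
    Span('2', '1', 'rsc x:y', 0.5, 2.0),
    Span('3', '1', 'zinc x:y', 2.0, 4.5),
]
render_tree(spans)  # prints 'compile' with its two children indented beneath it
```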
Closing given the recent work on workunits in V2.
These are a few thoughts on how to expose console logging to non-interactive tasks executing in v2 `@rule`s, such as compilations, in a way which can support the complex and intricate logging we have for e.g. zinc compiles by strictly extending the current workunit concept, while maintaining declarativity (and therefore giving the engine more control over how to output the messages while the python code can just describe what is being said). Note that the solution(s) described in this issue (as elaborated at the bottom) are backwards-compatible with v1 workunit logging.

See this slack message in #engine, which this issue semi-quotes.

This was the first kind of solution I was thinking of:

I then spent a little more time thinking about this after making the comment about formulating messaging as a `yield Get(...)` above, and arrived at the following. I believe this would strictly extend the v1 workunit logging by simply allowing individual workunit messages to have more `WorkUnitLabel`s in addition to the ones inherited from the workunit itself. The idea is that labels attached to individual messages would enable a dynamic view of executing workunits, in a way that is recently enabled by pantsd and the v2 engine's control over execution -- see #6004 (and this comment linking to the v2 frontend high-level design doc).

Using `WorkUnitLabel`s both to describe workunits as well as their individual messages (e.g. MESSAGE vs process output (which would be logged with the intrinsic mentioned above) in the above example) might be shoving competing concerns into `WorkUnitLabel`, so we could consider extending that or using a separate set of tags to differentiate messages within the workunit. Note that this would be backwards compatible with v1 logging -- replacing `workunit.output('stdout').write('asdf')` with a version which has such a per-message tag -- this is akin to the difference between the two code snippets in this issue.

In the far future, workunit labels could maybe also be used for dynamically scheduling which v2 nodes to run first, but that's not at all relevant right now.
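To make that "per-message tag" difference concrete, here is a toy stub (not real pants code) contrasting the v1-style untagged `workunit.output('stdout').write(...)` call with a hypothetical tagged variant; only the shape of the API is the point.

```python
import io

class StubWorkunit(object):
  """Illustrative stand-in for a workunit, to show the proposed API shape."""

  def __init__(self):
    self._streams = {}

  # v1-style: untagged output stream, as in workunit.output('stdout').write('asdf').
  def output(self, name):
    return self._streams.setdefault(name, io.StringIO())

  # Hypothetical tagged variant: same write() call, but the stream carries
  # per-message tags that the engine could use to render or route the output.
  def tagged_output(self, name, message_types):
    return self._streams.setdefault((name, tuple(message_types)), io.StringIO())

workunit = StubWorkunit()
workunit.output('stdout').write('asdf')                      # v1 today
workunit.tagged_output('stdout', ['MESSAGE']).write('asdf')  # proposed shape
```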