extending workunits for declarative messaging in noninteractive v2 @rules #7071

Closed
cosmicexplorer opened this issue Jan 13, 2019 · 10 comments

@cosmicexplorer
Contributor

These are a few thoughts on how to expose console logging to non-interactive tasks executing in v2 @rules, such as compilations, in a way that can support the complex and intricate logging we have for e.g. zinc compiles. The idea is to strictly extend the current workunit concept while staying declarative: the python code describes what is being said, and the engine keeps control over how the messages are output. Note that the solution(s) described in this issue (as elaborated at the bottom) are backwards-compatible with v1 workunit logging.

See this slack message in #engine which this issue semi-quotes.

we might want the v2 equivalent of workunit output to be ~tied to process executions (like in v1), and instead of choosing either end of the spectrum from "i want a Console to do with as i please" to "only process executions can log to the console", we might be able to extend or wrap ExecuteProcessRequest ("similar" to how we extend it to FallibleExecuteProcessRequest) to cover logging in a way that's more declarative than "write what you want to the console" but less than "you can only write tagged raw strings".

This was the first kind of solution I was thinking of:

class LoggingExecuteProcessRequest(datatype([
  ('req', ExecuteProcessRequest),
  'run_info', # arbitrary string?
])):
  pass

# The GIL is in play so writing to the `Workunit` here is safe.
# Could consider @logging_rule or some other --
# would prefer to keep the number of rule decorators low.
@rule(ExecuteProcessResponse, [Select(LoggingExecuteProcessRequest), Select(Workunit)])
def execute_with_workunit(req, workunit):
  # this should absolutely be a yield Get(...), hm....
  workunit.output('stdout').write('running my process: {}'.format(req.run_info))
  res = yield Get(ExecuteProcessResponse, ExecuteProcessRequest, req.req)
  workunit.output('stdout').write('completed!')
  yield res

I then spent a little more time thinking about this after making the comment above about formulating messaging as a yield Get(...), and arrived at the following. I believe this would strictly extend v1 workunit logging by simply allowing individual workunit messages to have additional WorkUnitLabels on top of the ones inherited from the workunit itself. The idea is that labels attached to individual messages would enable a dynamic view of executing workunits, something recently made possible by pantsd and the v2 engine's control over execution -- see #6004 (and this comment linking to the v2 frontend high-level design doc).

class LoggingExecuteProcessRequest(datatype([
  ('req', ExecuteProcessRequest),
  ('workunit', Workunit),
  # hints for the v2 engine dynamic ui (corresponding to `WorkUnitLabel`)
  # this is a non-hierarchical description (as opposed to workunit output)
  ('labels', tuple),
])): pass
class WorkunitRequest(datatype([('name', text_type)])): pass
# workunit output coming from python code, not the output of a process -- 
# i'm thinking this makes sense for mirroring e.g.
# the zinc compile output we have now, see below
class WorkunitOutputRequest(datatype([
  ('workunit', Workunit),
  ('string', text_type), # the string to write
  ('labels', tuple), # same as 'labels' just above
])): pass

class RscInvocationRequest(datatype([('req', ExecuteProcessRequest)])): pass
# not being too imaginative here
class RscInvocationResponse(datatype([('res', ExecuteProcessResponse)])): pass

@rule(RscInvocationResponse, [Select(RscInvocationRequest)])
def execute_rsc_with_workunit(req):
  # Would it be useful to allow an abbreviated `yield Get(Workunit, WorkunitRequest('rsc'))` here?
  # This could be done with not too much parsing, but not that important right now -- using it for brevity here
  workunit = yield Get(Workunit, WorkunitRequest('rsc'))
  # I don't like `_ = yield Get(...)` if possible, but I suppose we might have the haskellers on board now.
  # I would also love to know if there is a useful response datatype to give here as the result of the yield.
  # I'm aware this is different than the current "request a workunit, which
  # itself has the labels attached" -- I think this non-hierarchical organization
  # could be a really neat way to /strictly/ extend the existing logic.
  # (if it was "strictly" extending it, we would probably have workunit labels
  # propagate from the Workunit (which would then still have its own labels) to
  # `WorkunitOutputRequest`s which use it (which may have their own labels))
  _ = yield Get(WorkunitOutputCompleted, WorkunitOutputRequest(workunit, 'compiling x:y [3/4]', [WorkUnitLabel.MESSAGE]))
  # too many `req`s here
  rsc_req = LoggingExecuteProcessRequest(req.req, workunit, [WorkUnitLabel.COMPILER])
  # would obviously need a new intrinsic for LoggingExecuteProcessRequest -> ExecuteProcessResponse
  res = yield Get(ExecuteProcessResponse, LoggingExecuteProcessRequest, rsc_req)
  _ = yield Get(WorkunitOutputCompleted, WorkunitOutputRequest(workunit, 'x:y complete!', [WorkUnitLabel.MESSAGE]))
  yield RscInvocationResponse(res)

Using WorkUnitLabels both to describe workunits and their individual messages (e.g. MESSAGE vs. process output, which would be logged with the intrinsic mentioned above) might be shoving competing concerns into WorkUnitLabel, so we could consider extending it or using a separate set of tags to differentiate messages within a workunit. Note that this would be backwards compatible with v1 logging -- replacing workunit.output('stdout').write('asdf') with a version that carries such a per-message tag is akin to the difference between the two code snippets in this issue.
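
As a concrete illustration of the "separate set of tags" idea, here is a minimal sketch. WorkUnitMessageType is a hypothetical name (it gets picked up in the comments below), not an existing pants API:

class WorkUnitMessageType(object):
  # Per-message tags, kept separate from the WorkUnitLabels of the containing
  # workunit. The names here are illustrative only.
  MESSAGE = 'MESSAGE'  # progress/status text written from python code
  OUTPUT = 'OUTPUT'    # raw stdout/stderr captured from a process execution
  ERROR = 'ERROR'      # error text a UI might surface more prominently

# A per-message request would then carry the workunit's inherited labels plus
# the message-level tag, e.g.:
#   WorkunitOutputRequest(workunit, 'compiling x:y [3/4]',
#                         (WorkUnitLabel.COMPILER, WorkUnitMessageType.MESSAGE))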

In the far future, workunit labels could maybe also be used for dynamically scheduling which v2 nodes to run first, but that's not at all relevant right now.

@illicitonion
Contributor

A few thoughts...

  1. I believe ExecuteProcessResult has enough information to implicitly be a WorkUnit (it has a failure state, stdout/stderr, and a description); the tricky part is passing an optional parent context - if we somehow did that, implicitly creating WorkUnits for every process execution in a v1 run would (I think) be pretty trivial...
  2. It may be interesting to introduce a new read-only buffer type, alongside FileBackedRWBuf which we currently use for workunit output, which points at a Digest in the Store, to avoid passing full output strings around everywhere...
  3. There's a little bit of a tension between the synchronous model of WorkUnits and the asynchronous sampling model of heavy_hitters... We should probably try to solve the whole problem of "How do WorkUnits work in v2", when we address this...
  4. As well as interactive process executions, I suspect that a non-trivial number of non-interactive process executions want to tee stdout/stderr to the console at some point, and always having to write resp = yield Get(...) ; if resp.error: console.stdout.write(resp.stdout) ; console.stderr.write(resp.stderr) will be annoying...
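
Spelling (4) out as a full snippet makes the repetition obvious; this is only a sketch of the pattern described above, not an existing API:

  res = yield Get(FallibleExecuteProcessResult, ExecuteProcessRequest, req)
  # Every rule that wants to surface failing output would have to repeat this.
  if res.exit_code != 0:
    console.stdout.write(res.stdout)
    console.stderr.write(res.stderr)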

@cosmicexplorer
Contributor Author

cosmicexplorer commented Jan 15, 2019

  1. I believe ExecuteProcessResult has enough information to implicitly be a WorkUnit
  4. ...I suspect that a non-trivial number of non-interactive process executions want to tee stdout/stderr to the console at some point

This is interesting, and I hadn't considered it. I was considering the opposite direction: requesting a workunit, and then having a process execution request type which tees stdout/stderr to it (although this wasn't clear, because the first snippet did exactly what you describe in (4)). Something like:

  workunit = yield Get(Workunit, WorkunitRequest('rsc'))
  # too many `req`s here
  rsc_req = LoggingExecuteProcessRequest(req.req, workunit, [WorkUnitLabel.COMPILER])
  # this tees stdout/stderr to the workunit in `rsc_req`
  # would need a new intrinsic for LoggingExecuteProcessRequest -> ExecuteProcessResponse
  res = yield Get(ExecuteProcessResponse, LoggingExecuteProcessRequest, rsc_req)
  3. There's a little bit of a tension between the synchronous model of WorkUnits and the asynchronous sampling model of heavy_hitters

I agree! This was hidden in comments in the second snippet, but I was thinking that we could then add a non-hierarchical (also: non-synchronous) tag (separate from WorkUnitLabel) to workunit outputs, so a LoggingExecuteProcessRequest might have the tag WorkUnitMessageType.OUTPUT when it writes to output there, while a progress message (indicating which compilation is running) might look like:

  _ = yield Get(WorkunitOutputCompleted, WorkunitOutputRequest(workunit, 'compiling x:y [3/4]', [WorkUnitMessageType.MESSAGE]))

This wasn't clear because I kind of just thought of it as I was writing the above, but I think this sideways axis of "message type" vs "workunit context" would be the link to v2 output.

  2. It may be interesting to introduce a new read-only buffer type, alongside FileBackedRWBuf, which we currently use for workunit output, which points at a Digest in the Store, to avoid passing full output strings around everywhere...

This is extremely interesting!!! I think this dovetails very nicely with the idea of introducing the WorkUnitMessageType which demarcates message type from the workunit context (e.g. WorkUnitMessageType.OUTPUT for a process execution might be fed into a StoreBackedROBuf by default, and have to be retrieved from that by heavy_hitters?).
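
For concreteness, a rough sketch of what such a Digest-backed read-only buffer could look like; StoreBackedROBuf and the byte-loading call are stand-ins, not existing pants APIs:

class StoreBackedROBuf(object):
  """A read-only buffer that points at a Digest rather than holding the bytes."""

  def __init__(self, store, digest):
    self._store = store    # the content-addressable Store
    self._digest = digest  # (fingerprint, serialized_bytes_length)

  def read(self):
    # Bytes are only materialized when a consumer (e.g. heavy_hitters, or
    # something rendering WorkUnitMessageType.OUTPUT) actually asks for them.
    # `load_bytes` is a placeholder for whatever the real Store lookup is.
    return self._store.load_bytes(self._digest)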

@cosmicexplorer
Contributor Author

cosmicexplorer commented Jan 16, 2019

Two thoughts from a message in #engine by @illicitonion:

  1. ExecuteProcessRequests already have a description which can be used as the workunit name!
  2. Teeing the stdout/stderr of a process can be made much more implicit than the example above with LoggingExecuteProcessRequest.

A thought I had on top of (2) was that the rule itself could identify a parent workunit, and a process execution a child workunit (using the description field as mentioned in (1)). This would get us closer to the implicitness described in (2).

@cosmicexplorer
Contributor Author

cosmicexplorer commented Jan 22, 2019

It might also be really nice to use this work as an excuse to get a better handle (get it?) on workunit output in general so that it can be hygienically consumed by external processes. Currently, if a tool wants to read pants's output (e.g. to find compile failures in CI), it has to literally parse what we print to stdout, on top of whatever it actually wants to parse out of each tool. We can use regex (:heart:) to hackily accomplish some of this (aka avoid real parsing), but that ignores the fact that we already have a very well-defined structure for output (which is currently hierarchical).

I would love to be able to use a pair of options (say: --workunit-stdout and --workunit-stderr) automatically attached to every task (I think what would make sense is an option or set of options attached to Task, like we do with the --cache-* options) to do something like:

> ./pants --compile-zinc-workunit-stdout="+{'compile-output.txt': ['COMPILE']}" test ::

This would take all stdout output from the zinc task in the compile goal that is written to workunits with the label WorkUnitLabel.COMPILE, and tee it to the file compile-output.txt in the current directory.

Here --compile-zinc-workunit-stdout is a dict-valued option mapping output files (to tee to) to the workunit labels whose output should be teed there. It defaults to {'-': ['COMPILE', 'TOOL', 'MULTITOOL', ...]}, where - means stdout (or whatever matches the current behavior). Workunit output would continue to go to the same backing log file that it currently does (I'm hazy on the details, but I know it involves a FileBackedRWBuf).

We would also then have a --workunit-stderr dict-valued option defaulting to {'-': ['COMPILE', ...]} (the same as for --workunit-stdout), but - in this case would refer to stderr.
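
For illustration, the pair of options might be registered on a Task subclass roughly like this -- a sketch that assumes pants's dict-valued option support; MyTask and the exact defaults are placeholders:

class MyTask(Task):
  @classmethod
  def register_options(cls, register):
    super(MyTask, cls).register_options(register)
    # '-' maps to stdout/stderr respectively; any other key is a file path.
    register('--workunit-stdout', type=dict,
             default={'-': ['COMPILE', 'TOOL', 'MULTITOOL']},
             help='Map destination files (or "-" for stdout) to the workunit '
                  'labels whose stdout should be teed there.')
    register('--workunit-stderr', type=dict,
             default={'-': ['COMPILE', 'TOOL', 'MULTITOOL']},
             help='As for --workunit-stdout, but teeing stderr.')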

The extension to stdin seems extremely clear and very interesting and could potentially make repl tasks significantly easier to test (I am aware that ttys are weird).

I would want to keep all of this (the user interface on the cli) in mind when considering how to extend workunit logging to v2 rules (the rule developer interface in python). I don't think we need to worry about synchronization for file writing in the v1 case, by design. In the v2 case this becomes a concern (an easily addressable one, however). When developing the interface for declaring workunit output in v2 rules as described above, we should keep in mind, and probably develop at the same time, the interface for mapping them to cli options (do we have an automatic @optionable_rule generated for each one? do we use @optionable_rule directly to implement v2 workunit output logging?).

@cosmicexplorer
Contributor Author

Also, thinking about zipkin tracing of pants runs (very relevant: #5903) is what made me think of this ticket in the first place, and it would be extremely wonderful if we could take whatever infrastructure we build to support the --workunit-std{out,err} options, and use it to create a zipkin trace viewer which allows jumping into the workunit output for each stage in the trace (which, if we start profiling every run as well, sounds like an absolutely incredible environment for debugging performance and other regressions!!! ok I'm done this is all very cool though).

@cosmicexplorer
Contributor Author

cosmicexplorer commented Jan 22, 2019

FINAL THOUGHT: We could also consider making --workunit-output a 2-level dict mapping workunit output names to the above-described contents of --workunit-std{out,err}, e.g.:

> ./pants --compile-zinc-workunit-output="+{'stdout': {'compile-output.txt': ['COMPILE']}}" test ::

This may be easier to implement, as it maps directly to workunit.output('stdout') and workunit.output('stderr') in python (both of which are accessed by providing a string). We can very easily expose more output types to tasks than just stdout and stderr now (solving a hard problem nobody has solved yet?)!!!
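
As a rough illustration of how the 2-level dict could be consumed (not pants code -- the buffer-reading call and the label-matching rule are assumptions):

import sys

def tee_workunit_output(workunit, workunit_output_option):
  # workunit_output_option is e.g. {'stdout': {'compile-output.txt': ['COMPILE']}}.
  for output_name, file_map in workunit_output_option.items():
    # output_name corresponds to workunit.output('stdout'), 'stderr', etc.
    text = workunit.output(output_name).read_from(0)
    for path, labels in file_map.items():
      # Assumes the workunit exposes its labels; the matching rule is illustrative.
      if any(label in workunit.labels for label in labels):
        dest = sys.stdout if path == '-' else open(path, 'a')
        dest.write(text)
        if dest is not sys.stdout:
          dest.close()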

This also means we can use stdout and stderr as the WorkUnitMessageType described above, which unifies everything in v2 as well. This "unifies everything" because WorkUnitMessageType is a non-hierarchical message descriptor separate from the labels of the containing workunit, so the burgeoning v2 UI can make use of that separate information to decide how to display it, and that how maps directly back to this command-line option!!!! And all of this works with v1 seamlessly as well!!! QED

@cosmicexplorer
Contributor Author

Some more followup from #engine:

@cosmicexplorer [Yesterday at 00:24]
had an EVEN BETTER IDEA in #7071 (comment) which can be summarized as "tee the stdout from zinc compiles to a file named output.txt":
> ./pants --compile-zinc-workunit-output="+{'stdout': {'output.txt': ['COMPILE']}}" test ::
and the stdout corresponds to workunit.output('stdout'), but we can extend that to other output types to allow for non-hierarchical output labelling for an asynchronous v2 UI! (edited)
6 replies
@blorente [1 day ago]
I really like the idea of teeing specific workunits’ outputs to specific files/streams(/sockets?), but I’m a bit lost as to how we would extend it to fit the V2 async model. I agree that file write synchronization is a relatively easy problem to fix, but I find it hard to imagine the extensions of that command line; could you post an example?
cosmicexplorer [10 hours ago]
well it would be analogous to the WorkUnitMessageType i proposed in comments in the linked issue (edited)
cosmicexplorer [10 hours ago]
> ./pants --compile-zinc-workunit-output="+{'MESSAGE': {'message-output.txt': ['COMPILE']}, 'stdout': {'process-output.txt': ['COMPILE']}, 'stderr': {'error-log.txt': ['COMPILE']}, 'ERROR': {'error-log.txt': ['COMPILE']}}" test ::
cosmicexplorer [10 hours ago]
MESSAGE vs ERROR (WorkUnitMessageTypes) can be an extended form of log message types like debug/warn/info/exception (edited)
blorente [9 hours ago]
Goootcha! Thanks for the explanation, I like the idea!
cosmicexplorer [1 hour ago]
i can totally dive in more and write code as necessary! am very excited about this but my opinions are highly malleable!

@stuhood
Member

stuhood commented Feb 18, 2019

There is a lot in this thread, some of which invalidates previous points. I think that getting it in a small design doc would be a better way to identify open questions, discuss it, and then move forward on implementation.

Having said that: some responses.


Re:

  # I don't like `_ = yield Get(...)` if possible, but I suppose we might have the haskellers on board now.
  # I would also love to know if there is a useful response datatype to give here as the result of the yield.
  # I'm aware this is different than the current "request a workunit, which
  # itself has the labels attached" -- I think this non-hierarchical organization
  # could be a really neat way to /strictly/ extend the existing logic.
  # (if it was "strictly" extending it, we would probably have workunit labels
  # propagate from the Workunit (which would then still have its own labels) to
  # `WorkunitOutputRequest`s which use it (which may have their own labels))
  _ = yield Get(WorkunitOutputCompleted, WorkunitOutputRequest(workunit, 'compiling x:y [3/4]', [WorkUnitLabel.MESSAGE]))

So, while logging is technically something that should always be async (since it will touch a disk), it is also fairly straightforward to make logging asynchronous with a synchronous API by just putting something in a queue and flushing that in the background.

I think that I have been assuming that we would preserve a synchronous logging/workunit API. So then the only thing that would actually be async would be the creation of the StrawmanWorkunitLogger object, which you would then use synchronously.
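
A minimal sketch of that "synchronous API, asynchronous flush" shape -- all names, including StrawmanWorkunitLogger, are illustrative only:

import threading
try:
  import Queue as queue  # python 2
except ImportError:
  import queue

class StrawmanWorkunitLogger(object):
  """Callers write synchronously; a background thread drains the queue to the sink."""

  def __init__(self, sink):
    self._queue = queue.Queue()
    self._sink = sink  # e.g. a file-backed buffer
    flusher = threading.Thread(target=self._flush_loop)
    flusher.daemon = True
    flusher.start()

  def write(self, message):
    # Synchronous from the caller's point of view: enqueue and return.
    self._queue.put(message)

  def _flush_loop(self):
    while True:
      self._sink.write(self._queue.get())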


My largest open question is where the hierarchy of workunits will come from in v2, or whether we're saying that in v2 we will use workunits more like zipkin spans, with no hierarchy, and just start/end times. But it does seem promising that the dapper/zipkin model "always" seems to result in a hierarchically renderable dataset. I'm not 100% sure how that happens, and it would likely be good to look at the datamodel a bit more deeply to see whether child zipkin/dapper Spans refer to their parents at all.

@illicitonion
Contributor

My largest open question is where the hierarchy of workunits will come from in v2, or whether we're saying that in v2 we will use workunits more like zipkin spans, with no hierarchy, and just start/end times. But it does seem promising that the dapper/zipkin model "always" seems to result in a hierarchically renderable dataset. I'm not 100% sure how that happens, and it would likely be good to look at the datamodel a bit more deeply to see whether child zipkin/dapper Spans refer to their parents at all.

Yeah, zipkin spans have a parent pointer :)
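
For reference, a tiny illustration of why a parent pointer is enough to render spans hierarchically; the span shape here is a simplification of the zipkin datamodel:

def render(spans, parent_id=None, depth=0):
  # Walk spans whose parent_id matches, indenting children beneath their parent.
  for span in spans:
    if span['parent_id'] == parent_id:
      print('  ' * depth + span['name'])
      render(spans, parent_id=span['id'], depth=depth + 1)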

@Eric-Arellano
Contributor

Closing given the recent work on workunits in V2.
