create WorkunitOutput subsystem which redirects workunit output #8765

cosmicexplorer · 2019-12-06T07:08:54Z

TODO: test cases!!!

Problem

We would like to be able to get some parts of pants's output in different places, for multiple reasons:

We might want to see the output/logging from pants when running a ./pants run, but direct the process's output somewhere else (instead of amidst all the pants output).
We might want to parse compiler error messages from outside of pants (this is a real use case at Twitter), without having to parse the output of pants itself to know where the compiler errors are.
We might want to see helpful stdout/stderr messages from some tool like coursier, which pants currently just hides all output from, unless it fails.

For some background on how this particular interface came to be, see this comment which tangentially introduced the idea of redirecting output: #7071 (comment), although it was buried amidst a larger discussion of output in v2 rules.

Solution

(also see: ./pants help workunit-output)

Create the WorkunitOutput subsystem and attach it to the RunTracker.
- Parse --redirections as a dict mapping file names to "redirection specs", which are just nested dicts. This looks something like:

> ./pants --workunit-output-redirections="\
{'out.txt': [{
    'workunit_scope': '.*',
    'labels': ['TOOL', 'TEST'],
    'output_names': ['stdout', 'stderr'],
  }]}" test tests/python/pants_test/util: -- -vs

The above command line would tee all output sent to workunits with path matching regex .* (everything), which declare any of the TOOL or TEST labels, on the outputs named stdout and stderr (exact match). In this precise command line, that would mean that all pytest output would be appended to a file named out.txt.
Note that the output file name /dev/stdout is treated specially, and results in the tee subprocess inheriting the pants stdout. This means that workunit output which would normally be hidden (such as coursier) can be selectively toggled on by the user.
All of the above is accomplished via introducing OutputTeeingFileBackedRWBuf, which uses a tee subprocess to spread output across multiple files.

Result

Pants can redirect some of its output elsewhere with a command-line option without affecting the rest of the pants run!

accomplish this via introducing OutputTeeingFileBackedRWBuf, which uses a 'tee' subprocess to spread output across multiple files.

illicitonion · 2019-12-06T16:46:51Z

I'm not sure we want to be exposing this when the future of workunits in v2 is kind of up in the air, as it's presumably something we'd need to maintain compatibility for, or deprecate, which could constrain v2 direction noticeably... Is there a reason this is currently urgent?

Assuming we do do it, though, the interface of concatenating (and maybe interleaving? I didn't look at synchronisation) all of the outputs together into one file seems weird to me. How would you feel about the "output path" actually being a directory into which we'll put structure (either named files per workunit, or even a directory hierarchy, or something, given they have hierarchy)? If the user then wants to concat them, great, they can.

ShaneDelmore · 2019-12-06T16:59:02Z

This would be useful to me for multiple scenarios. Will a work-unit support saving to the build-cache and allow me to retrieve it for cached artifacts? For example, if a build completed with 3 compiler warnings, and later I rebuilt, it would be useful to still be able to retrieve the warnings from the cached artifact.

illicitonion · 2019-12-06T17:01:07Z

This would be useful to me for multiple scenarios. Will a work-unit support saving to the build-cache and allow me to retrieve it for cached artifacts? For example, if a build completed with 3 compiler warnings, and later I rebuilt, it would be useful to still be able to retrieve the warnings from the cached artifact.

That would not come from this change. We'd need to include stdout and stderr in those cache bundles (which personally I think we should do), but that would be orthogonal. This would only output this information onto a file on your local disk, only if you didn't hit a cache.

ShaneDelmore · 2019-12-06T17:06:15Z

Got it, I think I slightly misunderstood the change then. Although it wouldn't have to be std-in for my purposes as long as I could declare something as a cacheable artifact. For example, the diff-changed goal outputs a git patch to stdout, but I only did that as I am scraping stdout to retrieve it. I would love to be able to write it to a file instead, that I could easily retrieve later from the build cache.

illicitonion · 2019-12-06T17:11:56Z

Got it, I think I slightly misunderstood the change then. Although it wouldn't have to be std-in for my purposes as long as I could declare something as a cacheable artifact. For example, the diff-changed goal outputs a git patch to stdout, but I only did that as I am scraping stdout to retrieve it. I would love to be able to write it to a file instead, that I could easily retrieve later from the build cache.

That, you actually can do, as long as you're willing to modify the pants task. Anything you write to the vt.results_dir will be included in the cache bundle, e.g. as is done here:

pants/src/python/pants/backend/python/tasks/python_binary_create.py

Lines 92 to 101 in 0c39921

    
           with self.invalidated(binaries, invalidate_dependents=True) as invalidation_check: 
        
             python_deployable_archive = self.context.products.get('deployable_archives') 
        
             python_pex_product = self.context.products.get('pex_archives') 
        
             for vt in invalidation_check.all_vts: 
        
               pex_path = os.path.join(vt.results_dir, f'{vt.target.name}.pex') 
        
               if not vt.valid: 
        
                 self.context.log.debug(f'cache for {vt.target} is invalid, rebuilding') 
        
                 self._create_binary(vt.target, vt.results_dir) 
        
               else: 
        
                 self.context.log.debug(f'using cache for {vt.target}')

cosmicexplorer · 2019-12-06T20:16:52Z

How would you feel about the "output path" actually being a directory into which we'll put structure (either named files per workunit, or even a directory hierarchy, or something, given they have hierarchy)?

Especially after talking with @ShaneDelmore further about what he might want from this kind of option, I'm thinking of a two-fold plan to unify these:

(in this PR) Create cache entries for workunit output (so throwing away most of the current impl), and figure out some hacky way to print out the on-disk location of workunit outputs to a user (this is my interpretation of your suggestion -- let me know if I'm off).
(in a followup PR) make the way to print out the on-disk location of cache entries less hacky.

For (2), @ShaneDelmore suggested something like:

> ./pants fetch lspIndexes dataproducts/project/::

Translating that into some idea of pants options could be:

> ./pants fetch --task=compile.lsp-indexes dataproducts/project/::

which would be interpreted as "fetch the cached results_dir from a remote cache if necessary, then print out the local paths to the output of the task named lsp-indexes in the compile goal".

I'm going to spend 5 minutes making this into a google doc.

cosmicexplorer · 2019-12-06T20:28:55Z

https://docs.google.com/document/d/1F-sqqYX77m5iB6116bXxYugOiPbOGPGRwd75wSZYwBY/edit?usp=sharing for the above!

illicitonion · 2019-12-09T14:52:39Z

How would you feel about the "output path" actually being a directory into which we'll put structure (either named files per workunit, or even a directory hierarchy, or something, given they have hierarchy)?

Especially after talking with @ShaneDelmore further about what he might want from this kind of option, I'm thinking of a two-fold plan to unify these:

(in this PR) Create cache entries for workunit output (so throwing away most of the current impl), and figure out some hacky way to print out the on-disk location of workunit outputs to a user (this is my interpretation of your suggestion -- let me know if I'm off).

(in a followup PR) make the way to print out the on-disk location of cache entries less hacky.

Honestly, I'd probably rather just port scalafix to v2, which gets stdout and stderr caching and replay for free, rather than building an entire new caching mechanism for stdout and stderr specifically for v1, given this is already a solved problem in v2.

For (2), @ShaneDelmore suggested something like:
> ./pants fetch lspIndexes dataproducts/project/::
Translating that into some idea of pants options could be:
> ./pants fetch --task=compile.lsp-indexes dataproducts/project/::
which would be interpreted as "fetch the cached results_dir from a remote cache if necessary, then print out the local paths to the output of the task named lsp-indexes in the compile goal".

I'm going to spend 5 minutes making this into a google doc.

Commented on the doc.

Eric-Arellano · 2020-05-18T03:04:35Z

Closing due to being stale. Workunits have seen some substantial changes the past few months, from what I can tell.

create WorkunitOutput subsystem which redirects workunit output

038ef08

accomplish this via introducing OutputTeeingFileBackedRWBuf, which uses a 'tee' subprocess to spread output across multiple files.

cosmicexplorer requested review from stuhood, illicitonion, pierrechevalier83, blorente, Eric-Arellano and gshuflin December 6, 2019 07:08

cosmicexplorer mentioned this pull request Dec 19, 2019

option to output all command lines for all subprocesses #8838

Closed

Eric-Arellano closed this May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

create WorkunitOutput subsystem which redirects workunit output #8765

create WorkunitOutput subsystem which redirects workunit output #8765

cosmicexplorer commented Dec 6, 2019 •

edited

Loading

illicitonion commented Dec 6, 2019

ShaneDelmore commented Dec 6, 2019

illicitonion commented Dec 6, 2019 •

edited

Loading

ShaneDelmore commented Dec 6, 2019

illicitonion commented Dec 6, 2019

cosmicexplorer commented Dec 6, 2019

cosmicexplorer commented Dec 6, 2019

illicitonion commented Dec 9, 2019

Eric-Arellano commented May 18, 2020

create WorkunitOutput subsystem which redirects workunit output #8765

create WorkunitOutput subsystem which redirects workunit output #8765

Conversation

cosmicexplorer commented Dec 6, 2019 • edited Loading

Problem

Solution

Result

illicitonion commented Dec 6, 2019

ShaneDelmore commented Dec 6, 2019

illicitonion commented Dec 6, 2019 • edited Loading

ShaneDelmore commented Dec 6, 2019

illicitonion commented Dec 6, 2019

cosmicexplorer commented Dec 6, 2019

cosmicexplorer commented Dec 6, 2019

illicitonion commented Dec 9, 2019

Eric-Arellano commented May 18, 2020

cosmicexplorer commented Dec 6, 2019 •

edited

Loading

illicitonion commented Dec 6, 2019 •

edited

Loading