
Local caching for ExecuteProcess #6898

Closed

stuhood opened this issue Dec 11, 2018 · 9 comments

@stuhood (Member) commented Dec 11, 2018

Currently the only cache for ExecuteProcess exists in the remoting API, and we have so far biased toward assuming that pantsd will be up in order to avoid re-invoking local processes in situations where remoting is not available.

@illicitonion (Contributor) commented Dec 11, 2018

I'd recommend we do this by:

1. Adding a new "Database" (not a ShardedLmdb, just an lmdb::Database) to the fs::local::InnerStore; this should write to root.join("process_execution") or similar:

   ```rust
   struct InnerStore {
       pool: Arc<ResettablePool>,
       // Store directories separately from files because:
       // 1. They may have different lifetimes.
       // 2. It's nice to know whether we should be able to parse something as a proto.
       file_dbs: Result<Arc<ShardedLmdb>, String>,
       directory_dbs: Result<Arc<ShardedLmdb>, String>,
   }
   ```

2. Exposing an interface on fs::Store, analogous to record_directory and load_directory, called record_process_result and load_process_result. Exactly what type this should take and return is an open question. An easy option would be to populate a bazel_protos::remote_execution::Action as we do in process_execution::remote::make_execute_request (extracting the common code somewhere useful), index entries by the sha256 of that proto serialized as bytes (as make_execute_request already does to populate the bazel_protos::remote_execution::ExecuteRequest), and serialize the FallibleExecuteProcessResult as a bazel_protos::remote_execution::ActionResult whose bytes we can store (the mapping is trivial). We could also use some kind of custom serialization if it could be justified. A rough sketch of this interface follows this list.

3. Modifying engine::ExecuteProcess::run to check the cache (accessible via context.core.store()) before running the process here:

   ```rust
   fn run(self, context: Context) -> NodeFuture<ProcessResult> {
       let request = self.0;
       context
           .core
           .command_runner()
           .run(request)
           .map(ProcessResult)
           .map_err(|e| throw(&format!("Failed to execute process: {}", e)))
           .to_boxed()
   }
   ```
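
For concreteness, here is a minimal, self-contained sketch of the record_process_result / load_process_result interface from step 2. It uses an in-memory map in place of the real lmdb database under root.join("process_execution"); the `ProcessResultCache` type, the `sha2` crate dependency, and the raw-bytes signatures are illustrative assumptions, not the actual pants API:

```rust
use std::collections::HashMap;

// External crate assumption: sha2 = "0.10".
use sha2::{Digest, Sha256};

/// Cache key: the sha256 of the serialized Action proto, mirroring what
/// make_execute_request already computes to populate the ExecuteRequest.
type Fingerprint = [u8; 32];

/// Stands in for the lmdb::Database that step 1 would add to InnerStore.
#[derive(Default)]
struct ProcessResultCache {
    entries: HashMap<Fingerprint, Vec<u8>>,
}

impl ProcessResultCache {
    fn fingerprint(serialized_action: &[u8]) -> Fingerprint {
        let digest = Sha256::digest(serialized_action);
        let mut key = [0u8; 32];
        key.copy_from_slice(&digest);
        key
    }

    /// Record the serialized ActionResult bytes for an executed process.
    fn record_process_result(&mut self, serialized_action: &[u8], serialized_result: Vec<u8>) {
        self.entries
            .insert(Self::fingerprint(serialized_action), serialized_result);
    }

    /// Look up a previously recorded ActionResult, if any.
    fn load_process_result(&self, serialized_action: &[u8]) -> Option<&[u8]> {
        self.entries
            .get(&Self::fingerprint(serialized_action))
            .map(Vec::as_slice)
    }
}
```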

@stuhood (Member, Author) commented Dec 11, 2018

Two thoughts:

1. The jdk_home facility will be a bit of a monkey wrench. When computing a local cache key, we'd want to include the destination of the symlink in the key. One way to do that would be to have an explicit method outside of process_execution::remote::make_execute_request that "amends" the resulting gRPC struct with local-only information in order to affect the cache key, and to then have that method add an env var with the JDK info.
2. Another possible factoring of the cache would be to make it an impl CommandRunner, similar to BoundedCommandRunner, that wraps a cache around some other CommandRunner; a sketch follows this list. It's unclear whether this level of abstraction is actually useful, or just "obvious".
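
To make (2) concrete, here is a rough, self-contained sketch of such a wrapper, with simplified synchronous stand-ins for the real request/result types (the actual process_execution::CommandRunner is futures-based); it also illustrates the check-the-cache-before-running flow from step 3 above:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the real ExecuteProcessRequest and
// FallibleExecuteProcessResult types.
#[derive(Clone, Hash, PartialEq, Eq)]
struct ExecuteProcessRequest {
    argv: Vec<String>,
}

#[derive(Clone)]
struct FallibleExecuteProcessResult {
    exit_code: i32,
    stdout: Vec<u8>,
}

trait CommandRunner: Send + Sync {
    fn run(&self, req: ExecuteProcessRequest) -> FallibleExecuteProcessResult;
}

/// Decorates any other CommandRunner with a cache keyed on the request,
/// in the same spirit as BoundedCommandRunner wrapping a delegate.
struct CachingCommandRunner {
    inner: Arc<dyn CommandRunner>,
    cache: Mutex<HashMap<ExecuteProcessRequest, FallibleExecuteProcessResult>>,
}

impl CommandRunner for CachingCommandRunner {
    fn run(&self, req: ExecuteProcessRequest) -> FallibleExecuteProcessResult {
        // Check the cache before running the process.
        if let Some(hit) = self.cache.lock().unwrap().get(&req) {
            return hit.clone();
        }
        let result = self.inner.run(req.clone());
        // Cache only successful runs, so that failures are re-attempted.
        if result.exit_code == 0 {
            self.cache.lock().unwrap().insert(req, result.clone());
        }
        result
    }
}
```

Callers would construct this around the local (or remote) runner and remain unaware of whether any given run was a cache hit.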

@cosmicexplorer (Contributor) commented:

I'm thinking of replacing any uses of PythonToolPrepBase with a synchronous call to the engine as a case study for this ticket. The idea is to use a very specific file or directory on the local filesystem to serve as the "local caching", and that this would also be the link to v1 tasks. v1 tasks also have the Context that is required for context._scheduler.capture_snapshots (and hence for being able to use the output of something in get_pants_cachedir() in the v2 engine), and I don't know how to get an instance of that directly in a v2 task (but I can make that happen).

@illicitonion (Contributor) commented Jan 25, 2019

That sounds like kind of the opposite of what I described above?

What I was imagining is that we'd internally cache process executions inside the lmdb store (so that code which runs processes doesn't know whether it hit a cache or not; it's opaque), and then v1 rules could call context._scheduler.materialize_directories. That way process execution caching is a black box, and the v1 rule is in control of the destination of files. It also makes it much harder for things to corrupt the cache by editing files (the files live in an opaque cache, rather than on disk where they may be edited).

@cosmicexplorer (Contributor) commented:

That makes sense, and I appreciate you taking the time to describe it. I wasn't thinking super clearly about why we would need things on disk in the first place (I was still thinking in v1 mode). I will look into implementing the above. Thanks again.

@cosmicexplorer (Contributor) commented Jan 26, 2019

I'm warming up to the impl CommandRunner idea, since the work of separating out the common code for serializing an Action seems difficult without putting a lot of the logic in process_execution anyway. (EDIT: this doesn't contradict the above in any way.)

@Eric-Arellano Eric-Arellano self-assigned this May 31, 2019
Eric-Arellano added a commit that referenced this issue Jun 3, 2019
When using V2, unit tests take too long to run, so we have to use `travis_wait`. We set the wait time too short and have hit timeouts a few times.

Note that a better solution would be to figure out why V2 is slower in the first place via #7795, followed by implementing local caching via #6898.
@Eric-Arellano (Contributor) commented:

> and we have so far biased toward assuming that pantsd will be up in order to avoid re-invoking local processes in situations where remoting is not available.

I'm a bit unclear, at a high level, on when we would use pantsd vs. when we would use this new local caching. For example, what is the use case for running without --enable-pantsd in the first place? Would we ever train pantsd to leverage this caching mechanism described here?

@Eric-Arellano Eric-Arellano removed their assignment Jun 4, 2019
@stuhood (Member, Author) commented Jun 4, 2019

> For example, what is the use case for running without --enable-pantsd in the first place?

In (most) CI environments, we don't expect that pantsd is able to stay up across multiple runs, whereas on a laptop it will be able to.

> Would we ever train pantsd to leverage this caching mechanism described here?

All runs would take advantage of this cache, with or without pantsd.

@stuhood (Member, Author) commented Jul 27, 2019

Fixed by @illicitonion in #8040 and #7911. Hooray!
