Cross Crate Analysis 2: An alternative approach. (#163)
## What Changed?

Adds the capability to load bodies and borrowcheck facts from non-local crates to facilitate analysis across the boundaries of the local crate. The design of this implementation is based on a similar mechanism in a purity analyzer by @artemagvanian.

Adds the `--include` command line argument to specify names of crates that should be loadable in addition to the analysis target.

The basic mechanism is that, in every selected crate (either the main target or an `--include` crate), we run a visitor after expansion that uses `get_body_with_borrowck_facts`, sanitizes that body by removing crate-local data, and writes it plus the relevant lifetime information to disk using `Encodable`. Downstream crates can locate the written data and load it back in using `Decodable`. A nicety of this approach is that we always do this, even for the main target crate, which means there is only one code path instead of separate local and remote ones.

In the process of implementing this I realized that our points-to analysis doesn't actually need the whole `BodyWithBorrowckFacts` but only `input_facts.subset_base` (and the body itself). I parameterized the points-to analysis so that it can accept a reduced input, and we only store the relevant data.

Also fully supports marker annotations in foreign crates.

The implicit body cache `MIR_BODIES` we used before is now replaced by an explicit `BodyCache`. A large part of the changes is just changing `LocalDefId` to `DefId` in call strings and the like, as well as ensuring that all places that used to call `rustc_utils`' `get_body_with_borrowck_facts` now load from the `BodyCache` instead.

### Caveats

The statistics about how many functions were seen etc. are disabled for now (always 0).

## Why Does It Need To?

Analyzing only the local crate is a severe limitation of the tool, which this PR lifts. There is a concurrent attempt (#153) to address the same issue. This is a simpler, but potentially less scalable approach.

In addition, this approach has full support for "both directions" of cross-crate analysis. The "forward" direction extends the PDG from the local crate (main analysis target) such that it also models functions in dependencies. The "backward" direction ensures that, when a function in a dependency is parameterized by a trait whose impl is in the local crate, that impl composes with the function in the dependency.

## Checklist

- [x] Above description has been filled out so that upon squash merge we have a good record of what changed.
- [x] New functions, methods, types are documented. Old documentation is updated if necessary.
- [ ] Documentation in Notion has been updated
- [x] Tests for new behaviors are provided
- [x] New test suites (if any) have been added to the CI tests (in `.github/workflows/rust.yml`) either as compiler test or integration test. *Or* justification for their omission from CI has been provided in this PR description.
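The dump-and-load mechanism described under "What Changed?" can be illustrated with a small, standard-library-only sketch. Everything in it is a hypothetical stand-in: `Artifact` plays the role of the sanitized body plus `subset_base`, the toy text encoding replaces the `Encodable`/`Decodable` machinery, and `ArtifactCache` only mimics the idea of an explicit cache that searches a list of directories. It is not the code in this commit.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::rc::Rc;

/// Hypothetical stand-in for a sanitized body plus its `subset_base` facts.
#[derive(Debug, Clone, PartialEq)]
pub struct Artifact {
    pub body: String,
    pub subset_base: Vec<(u32, u32, u32)>,
}

impl Artifact {
    /// Toy text encoding: the body on line one, then one fact triple per line.
    /// Assumes `body` contains no newline.
    fn encode(&self) -> String {
        let mut out = format!("{}\n", self.body);
        for (a, b, p) in &self.subset_base {
            out.push_str(&format!("{a} {b} {p}\n"));
        }
        out
    }

    /// Inverse of `encode`; returns `None` on malformed input.
    fn decode(text: &str) -> Option<Self> {
        let mut lines = text.lines();
        let body = lines.next()?.to_string();
        let mut subset_base = Vec::new();
        for line in lines {
            let mut nums = line.split(' ').map(|n| n.parse::<u32>().ok());
            subset_base.push((nums.next()??, nums.next()??, nums.next()??));
        }
        Some(Artifact { body, subset_base })
    }
}

/// Producer side: every selected crate writes one file per function.
pub fn dump(dir: &Path, key: &str, artifact: &Artifact) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::write(dir.join(key), artifact.encode())
}

/// Consumer side: an explicit cache that reads each artifact at most once.
pub struct ArtifactCache {
    search_dirs: Vec<PathBuf>,
    loaded: RefCell<HashMap<String, Rc<Artifact>>>,
}

impl ArtifactCache {
    pub fn new(search_dirs: Vec<PathBuf>) -> Self {
        Self {
            search_dirs,
            loaded: RefCell::new(HashMap::new()),
        }
    }

    /// Serve from memory if possible, otherwise try each directory in order.
    pub fn get(&self, key: &str) -> Option<Rc<Artifact>> {
        let cached = self.loaded.borrow().get(key).cloned();
        if let Some(hit) = cached {
            return Some(hit);
        }
        let found = self.search_dirs.iter().find_map(|dir| {
            let text = fs::read_to_string(dir.join(key)).ok()?;
            Artifact::decode(&text)
        })?;
        let found = Rc::new(found);
        self.loaded
            .borrow_mut()
            .insert(key.to_string(), Rc::clone(&found));
        Some(found)
    }
}
```

Because the producer writes artifacts for the main target as well, the consumer needs only this single lookup path. The sketch returns `Option` for a missing artifact, which is a choice made for the example and not a claim about how the commit handles that case.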
1 parent d53df7e, commit 54bf07d

Showing 44 changed files with 1,729 additions and 727 deletions.
@@ -0,0 +1,246 @@
use std::path::PathBuf;

use flowistry::mir::FlowistryInput;

use polonius_engine::FactTypes;
use rustc_borrowck::consumers::{ConsumerOptions, RustcFacts};
use rustc_hir::{
    def_id::{CrateNum, DefId, LocalDefId, LOCAL_CRATE},
    intravisit::{self},
};
use rustc_macros::{Decodable, Encodable, TyDecodable, TyEncodable};
use rustc_middle::{
    hir::nested_filter::OnlyBodies,
    mir::{Body, ClearCrossCrate, StatementKind},
    ty::TyCtxt,
};

use rustc_utils::cache::Cache;

use crate::encoder::{decode_from_file, encode_to_file};

/// A MIR [`Body`] and all the additional borrow checking facts that our
/// points-to analysis needs.
#[derive(TyDecodable, TyEncodable, Debug)]
pub struct CachedBody<'tcx> {
    body: Body<'tcx>,
    input_facts: FlowistryFacts,
}

impl<'tcx> CachedBody<'tcx> {
    /// Retrieve a body and the necessary facts for a local item.
    ///
    /// Ensure this is called early enough in the compiler
    /// (like `after_expansion`) so that the body has not been stolen yet.
    fn retrieve(tcx: TyCtxt<'tcx>, local_def_id: LocalDefId) -> Self {
        let mut body_with_facts = rustc_borrowck::consumers::get_body_with_borrowck_facts(
            tcx,
            local_def_id,
            ConsumerOptions::PoloniusInputFacts,
        );

        clean_undecodable_data_from_body(&mut body_with_facts.body);

        Self {
            body: body_with_facts.body,
            input_facts: FlowistryFacts {
                subset_base: body_with_facts.input_facts.unwrap().subset_base,
            },
        }
    }
}

impl<'tcx> FlowistryInput<'tcx> for &'tcx CachedBody<'tcx> {
    fn body(self) -> &'tcx Body<'tcx> {
        &self.body
    }

    fn input_facts_subset_base(
        self,
    ) -> &'tcx [(
        <RustcFacts as FactTypes>::Origin,
        <RustcFacts as FactTypes>::Origin,
        <RustcFacts as FactTypes>::Point,
    )] {
        &self.input_facts.subset_base
    }
}

/// The subset of borrowcheck facts that the points-to analysis (flowistry)
/// needs.
#[derive(Debug, Encodable, Decodable)]
pub struct FlowistryFacts {
    pub subset_base: Vec<(
        <RustcFacts as FactTypes>::Origin,
        <RustcFacts as FactTypes>::Origin,
        <RustcFacts as FactTypes>::Point,
    )>,
}

pub type LocationIndex = <RustcFacts as FactTypes>::Point;

/// Allows loading bodies from previously written artifacts.
///
/// Ensure this cache outlives any flowistry analysis that is performed on the
/// bodies it returns or risk UB.
pub struct BodyCache<'tcx> {
    tcx: TyCtxt<'tcx>,
    cache: Cache<DefId, CachedBody<'tcx>>,
}

impl<'tcx> BodyCache<'tcx> {
    pub fn new(tcx: TyCtxt<'tcx>) -> Self {
        Self {
            tcx,
            cache: Default::default(),
        }
    }

    /// Serve the body from the cache or read it from the disk.
    ///
    /// Returns `None` if the policy forbids loading from this crate.
    pub fn get(&self, key: DefId) -> Option<&'tcx CachedBody<'tcx>> {
        let cbody = self.cache.get(key, |_| load_body_and_facts(self.tcx, key));
        // SAFETY: Theoretically this struct may not outlive the body, but
        // to simplify lifetimes flowistry uses 'tcx everywhere. But if we
        // actually try to provide that we're risking race conditions
        // (because it needs global variables like MIR_BODIES).
        //
        // So until we fix flowistry's lifetimes this is good enough.
        unsafe { std::mem::transmute(cbody) }
    }
}

/// A visitor to collect all bodies in the crate and write them to disk.
struct DumpingVisitor<'tcx> {
    tcx: TyCtxt<'tcx>,
    target_dir: PathBuf,
}

/// Some data in a [Body] is not cross-crate compatible. Usually because it
/// involves storing a [LocalDefId]. This function makes sure to sanitize those
/// out.
fn clean_undecodable_data_from_body(body: &mut Body) {
    for scope in body.source_scopes.iter_mut() {
        scope.local_data = ClearCrossCrate::Clear;
    }

    for stmt in body
        .basic_blocks_mut()
        .iter_mut()
        .flat_map(|bb| bb.statements.iter_mut())
    {
        if matches!(stmt.kind, StatementKind::FakeRead(_)) {
            stmt.make_nop()
        }
    }
}

impl<'tcx> intravisit::Visitor<'tcx> for DumpingVisitor<'tcx> {
    type NestedFilter = OnlyBodies;
    fn nested_visit_map(&mut self) -> Self::Map {
        self.tcx.hir()
    }

    fn visit_fn(
        &mut self,
        function_kind: intravisit::FnKind<'tcx>,
        function_declaration: &'tcx rustc_hir::FnDecl<'tcx>,
        body_id: rustc_hir::BodyId,
        _: rustc_span::Span,
        local_def_id: rustc_hir::def_id::LocalDefId,
    ) {
        let to_write = CachedBody::retrieve(self.tcx, local_def_id);

        let dir = &self.target_dir;
        let path = dir.join(
            self.tcx
                .def_path(local_def_id.to_def_id())
                .to_filename_friendly_no_crate(),
        );

        if !dir.exists() {
            std::fs::create_dir(dir).unwrap();
        }

        encode_to_file(self.tcx, path, &to_write);

        intravisit::walk_fn(
            self,
            function_kind,
            function_declaration,
            body_id,
            local_def_id,
        )
    }
}

/// A complete visit over the local crate items, collecting all bodies and
/// calculating the necessary borrowcheck facts to store for later points-to
/// analysis.
///
/// Ensure this gets called early in the compiler before the unoptimized MIR
/// bodies are stolen.
pub fn dump_mir_and_borrowck_facts(tcx: TyCtxt) {
    let mut vis = DumpingVisitor {
        tcx,
        target_dir: intermediate_out_dir(tcx, INTERMEDIATE_ARTIFACT_EXT),
    };
    tcx.hir().visit_all_item_likes_in_crate(&mut vis);
}

const INTERMEDIATE_ARTIFACT_EXT: &str = "bwbf";

/// Get the path where artifacts from this crate would be stored. Unlike
/// [`TyCtxt::crate_extern_paths`] this function does not crash when supplied
/// with [`LOCAL_CRATE`].
pub fn local_or_remote_paths(krate: CrateNum, tcx: TyCtxt, ext: &str) -> Vec<PathBuf> {
    if krate == LOCAL_CRATE {
        vec![intermediate_out_dir(tcx, ext)]
    } else {
        tcx.crate_extern_paths(krate)
            .iter()
            .map(|p| p.with_extension(ext))
            .collect()
    }
}

/// Try to load a [`CachedBody`] for this id.
fn load_body_and_facts(tcx: TyCtxt<'_>, def_id: DefId) -> CachedBody<'_> {
    let paths = local_or_remote_paths(def_id.krate, tcx, INTERMEDIATE_ARTIFACT_EXT);
    for path in &paths {
        let path = path.join(tcx.def_path(def_id).to_filename_friendly_no_crate());
        if let Ok(data) = decode_from_file(tcx, path) {
            return data;
        };
    }

    panic!("No facts for {def_id:?} found at any path tried: {paths:?}");
}

/// Create the name of the file in which to store intermediate artifacts.
///
/// HACK(Justus): `TyCtxt::output_filenames` returns a file stem of
/// `lib<crate_name>-<hash>`, whereas `OutputFiles::with_extension` returns a file
/// stem of `<crate_name>-<hash>`. I haven't found a clean way to get the same
/// name in both places, so I just assume that these two will always have this
/// relation and prepend the `"lib"` here.
pub fn intermediate_out_dir(tcx: TyCtxt, ext: &str) -> PathBuf {
    let rustc_out_file = tcx.output_filenames(()).with_extension(ext);
    let dir = rustc_out_file
        .parent()
        .unwrap_or_else(|| panic!("{} has no parent", rustc_out_file.display()));
    let file = rustc_out_file
        .file_name()
        .unwrap_or_else(|| panic!("has no file name"))
        .to_str()
        .unwrap_or_else(|| panic!("not utf8"));

    let file = if file.starts_with("lib") {
        std::borrow::Cow::Borrowed(file)
    } else {
        format!("lib{file}").into()
    };

    dir.join(file.as_ref())
}
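The naming convention that `intermediate_out_dir` and `local_or_remote_paths` rely on can be tried out without a compiler session. The following is a minimal sketch on plain paths; `with_lib_prefix` and `artifact_dir_for` are hypothetical helper names introduced here, and the example file names are made up for illustration.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

/// Producer side: normalize an output file name so that it carries the
/// `lib` prefix, mirroring the prefix check in `intermediate_out_dir`.
/// Returns `None` where the original would panic.
pub fn with_lib_prefix(out_file: &Path) -> Option<PathBuf> {
    let dir = out_file.parent()?;
    let file = out_file.file_name()?.to_str()?;
    let file: Cow<str> = if file.starts_with("lib") {
        Cow::Borrowed(file)
    } else {
        Cow::Owned(format!("lib{file}"))
    };
    Some(dir.join(file.as_ref()))
}

/// Consumer side: derive the artifact directory from the path of a
/// dependency's compiled library by swapping the extension.
pub fn artifact_dir_for(extern_path: &Path, ext: &str) -> PathBuf {
    extern_path.with_extension(ext)
}
```

With the made-up inputs `deps/foo-1a2b.bwbf` on the producer side and `deps/libfoo-1a2b.rlib` on the consumer side, both helpers yield `deps/libfoo-1a2b.bwbf`, which is the agreement the HACK comment assumes. One thing the sketch makes visible: a file name that already starts with `lib` is left unchanged, so a crate whose own name begins with `lib` might not line up with its library file name. Whether this case occurs in practice is not confirmed by the source.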