Skip to content

Commit

Permalink
Cross Crate Analysis 2: An alternative approach. (#163)
Browse files Browse the repository at this point in the history
## What Changed?

Adds the capability to load bodies and borrowcheck facts from non-local
crates to facilitate analysis across the boundaries of the local crate.
The design of this implementation is based on a similar mechanism in a
purity analyzer by @artemagvanian.

Adds the `--include` command line argument to specify names of crates
that should be loadable in addition to the analysis target.

The basic mechanism is that, in every selected crate (either main target
or `--include`) we run a visitor after expansion that uses
`get_body_with_borrowck_facts`, sanitizes that body by removing
crate-local data and writes it plus relevant lifetime information to
disk using `Encodable`. Downstream crates can locate the written data
and load it back in using `Decodable`. A nicety of this approach is that
we always do this, even or the main target crate, which means there is
only one code path instead of a separate local and remote one.

I the process of implementing this I realize that our points-to analysis
actually doesn't need the whole `BodyWithBorrockFacts` but only
`input_facts.subset_base` (and the body itself). I parameterized the
points-to analysis, such that it can accept a reduced input, and we only
store the relevant data.

Also fully supports marker annotations in foreign crates.

The implicit body cache `MIR_BODIES` we used before is now replaced by
an explicit `BodyCache`.

A large part of the changes is just changing `LocalDefId` to `DefId` in
call strings and the like as well as ensuring that all places that used
to call `rustc_utils`' `get_body_with_borrowck_facts` is replaced with
loading from the `BodyCache`.

### Caveats

The statistics about how many functions were seen etc are disabled for
now (always 0).

## Why Does It Need To?

Analyzing only the local crate is a severe limitation of the tool which
would be lifted by this PR.

There is a concurrent attempt (#153) to address the same issue. This is
a simpler, but potentially less scalable approach. In addition this
approach has full support for "both directions" of cross crate. The
"forward direction" extends the PDG from the local crate (main analysis
target) such that it also models functions in dependencies. The
"backwards" direction ensures that function in the dependency, which are
parameterized by traits where the impl is in the local trait, that impl
composes with the function in the dependency.

## Checklist

- [x] Above description has been filled out so that upon quash merge we
have a
  good record of what changed.
- [x] New functions, methods, types are documented. Old documentation is
updated
  if necessary
- [ ] Documentation in Notion has been updated
- [x] Tests for new behaviors are provided
  - [x] New test suites (if any) ave been added to the CI tests (in
`.github/workflows/rust.yml`) either as compiler test or integration
test.
*Or* justification for their omission from CI has been provided in this
PR
    description.
  • Loading branch information
JustusAdam authored Jul 29, 2024
1 parent d53df7e commit 54bf07d
Show file tree
Hide file tree
Showing 44 changed files with 1,729 additions and 727 deletions.
11 changes: 6 additions & 5 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 9 additions & 3 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,13 @@ rustc_utils = { version = "=0.7.4-nightly-2023-08-25", features = [
] }
rustc_plugin = "=0.7.4-nightly-2023-08-25"
#flowistry = { path = "../flowistry/crates/flowistry", default-features = false }
flowistry = { git = "https://github.com/brownsys/flowistry", rev = "b9210041eb846ce88644a4a19569ed2afb26141c", default-features = false }

[workspace.dependencies.flowistry]
# path = "../flowistry/crates/flowistry"
git = "https://github.com/brownsys/flowistry"
rev = "08c4ad9587b3251a8f7c64aa60be31404e6e04c0"
default-features = false


[profile.release]
debug = true
Expand All @@ -29,9 +35,9 @@ debug = true
[replace."rustc_utils:0.7.4-nightly-2023-08-25"]
# path = "../rustc_plugin/crates/rustc_utils"
git = "https://github.com/JustusAdam/rustc_plugin"
rev = "6dcee5d8758f8ac456209e95dfa25e03142295f1"
rev = "beca0fdda699591052445a1da840c072fffa1f21"

[replace."rustc_plugin:0.7.4-nightly-2023-08-25"]
# path = "../rustc_plugin/crates/rustc_plugin"
git = "https://github.com/JustusAdam/rustc_plugin"
rev = "6dcee5d8758f8ac456209e95dfa25e03142295f1"
rev = "beca0fdda699591052445a1da840c072fffa1f21"
2 changes: 1 addition & 1 deletion Makefile.toml
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ dependencies = ["format-all", "clippy-all"]

[tasks.cargo-tests]
command = "cargo"
args = ["test"]
args = ["test", "--no-fail-fast"]

[tasks.flowistry-tests]
command = "cargo"
Expand Down
4 changes: 2 additions & 2 deletions crates/flowistry_pdg/src/pdg.rs
Original file line number Diff line number Diff line change
Expand Up @@ -76,8 +76,8 @@ impl From<Location> for RichLocation {
#[derive(PartialEq, Eq, Hash, Clone, Copy, Debug, Serialize, Deserialize)]
pub struct GlobalLocation {
/// The function containing the location.
#[cfg_attr(feature = "rustc", serde(with = "rustc_proxies::LocalDefId"))]
pub function: LocalDefId,
#[cfg_attr(feature = "rustc", serde(with = "rustc_proxies::DefId"))]
pub function: DefId,

/// The location of an instruction in the function, or the function's start.
pub location: RichLocation,
Expand Down
2 changes: 1 addition & 1 deletion crates/flowistry_pdg/src/rustc_impls.rs
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ impl From<DefIndex> for def_id::DefIndex {
impl fmt::Display for GlobalLocation {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
tls::with_opt(|opt_tcx| match opt_tcx {
Some(tcx) => match tcx.opt_item_name(self.function.to_def_id()) {
Some(tcx) => match tcx.opt_item_name(self.function) {
Some(name) => name.fmt(f),
None => write!(f, "<closure>"),
},
Expand Down
4 changes: 2 additions & 2 deletions crates/flowistry_pdg_construction/src/async_support.rs
Original file line number Diff line number Diff line change
Expand Up @@ -173,14 +173,14 @@ fn get_async_generator<'tcx>(body: &Body<'tcx>) -> (LocalDefId, GenericArgsRef<'
/// which in this case is guaranteed to satisfy [`Asyncness::is_async`].
pub fn determine_async<'tcx>(
tcx: TyCtxt<'tcx>,
def_id: LocalDefId,
def_id: DefId,
body: &Body<'tcx>,
) -> Option<(Instance<'tcx>, Location, AsyncType)> {
let ((generator_def_id, args, loc), asyncness) = if tcx.asyncness(def_id).is_async() {
(get_async_generator(body), AsyncType::Fn)
} else {
(
try_as_async_trait_function(tcx, def_id.to_def_id(), body)?,
try_as_async_trait_function(tcx, def_id, body)?,
AsyncType::Trait,
)
};
Expand Down
246 changes: 246 additions & 0 deletions crates/flowistry_pdg_construction/src/body_cache.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,246 @@
use std::path::PathBuf;

use flowistry::mir::FlowistryInput;

use polonius_engine::FactTypes;
use rustc_borrowck::consumers::{ConsumerOptions, RustcFacts};
use rustc_hir::{
def_id::{CrateNum, DefId, LocalDefId, LOCAL_CRATE},
intravisit::{self},
};
use rustc_macros::{Decodable, Encodable, TyDecodable, TyEncodable};
use rustc_middle::{
hir::nested_filter::OnlyBodies,
mir::{Body, ClearCrossCrate, StatementKind},
ty::TyCtxt,
};

use rustc_utils::cache::Cache;

use crate::encoder::{decode_from_file, encode_to_file};

/// A mir [`Body`] and all the additional borrow checking facts that our
/// points-to analysis needs.
#[derive(TyDecodable, TyEncodable, Debug)]
pub struct CachedBody<'tcx> {
body: Body<'tcx>,
input_facts: FlowistryFacts,
}

impl<'tcx> CachedBody<'tcx> {
/// Retrieve a body and the necessary facts for a local item.
///
/// Ensure this is called early enough in the compiler
/// (like `after_expansion`) so that the body has not been stolen yet.
fn retrieve(tcx: TyCtxt<'tcx>, local_def_id: LocalDefId) -> Self {
let mut body_with_facts = rustc_borrowck::consumers::get_body_with_borrowck_facts(
tcx,
local_def_id,
ConsumerOptions::PoloniusInputFacts,
);

clean_undecodable_data_from_body(&mut body_with_facts.body);

Self {
body: body_with_facts.body,
input_facts: FlowistryFacts {
subset_base: body_with_facts.input_facts.unwrap().subset_base,
},
}
}
}

impl<'tcx> FlowistryInput<'tcx> for &'tcx CachedBody<'tcx> {
fn body(self) -> &'tcx Body<'tcx> {
&self.body
}

fn input_facts_subset_base(
self,
) -> &'tcx [(
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Point,
)] {
&self.input_facts.subset_base
}
}

/// The subset of borrowcheck facts that the points-to analysis (flowistry)
/// needs.
#[derive(Debug, Encodable, Decodable)]
pub struct FlowistryFacts {
pub subset_base: Vec<(
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Point,
)>,
}

pub type LocationIndex = <RustcFacts as FactTypes>::Point;

/// Allows loading bodies from previosly written artifacts.
///
/// Ensure this cache outlives any flowistry analysis that is performed on the
/// bodies it returns or risk UB.
pub struct BodyCache<'tcx> {
tcx: TyCtxt<'tcx>,
cache: Cache<DefId, CachedBody<'tcx>>,
}

impl<'tcx> BodyCache<'tcx> {
pub fn new(tcx: TyCtxt<'tcx>) -> Self {
Self {
tcx,
cache: Default::default(),
}
}

/// Serve the body from the cache or read it from the disk.
///
/// Returns `None` if the policy forbids loading from this crate.
pub fn get(&self, key: DefId) -> Option<&'tcx CachedBody<'tcx>> {
let cbody = self.cache.get(key, |_| load_body_and_facts(self.tcx, key));
// SAFETY: Theoretically this struct may not outlive the body, but
// to simplify lifetimes flowistry uses 'tcx anywhere. But if we
// actually try to provide that we're risking race conditions
// (because it needs global variables like MIR_BODIES).
//
// So until we fix flowistry's lifetimes this is good enough.
unsafe { std::mem::transmute(cbody) }
}
}

/// A visitor to collect all bodies in the crate and write them to disk.
struct DumpingVisitor<'tcx> {
tcx: TyCtxt<'tcx>,
target_dir: PathBuf,
}

/// Some data in a [Body] is not cross-crate compatible. Usually because it
/// involves storing a [LocalDefId]. This function makes sure to sanitize those
/// out.
fn clean_undecodable_data_from_body(body: &mut Body) {
for scope in body.source_scopes.iter_mut() {
scope.local_data = ClearCrossCrate::Clear;
}

for stmt in body
.basic_blocks_mut()
.iter_mut()
.flat_map(|bb| bb.statements.iter_mut())
{
if matches!(stmt.kind, StatementKind::FakeRead(_)) {
stmt.make_nop()
}
}
}

impl<'tcx> intravisit::Visitor<'tcx> for DumpingVisitor<'tcx> {
type NestedFilter = OnlyBodies;
fn nested_visit_map(&mut self) -> Self::Map {
self.tcx.hir()
}

fn visit_fn(
&mut self,
function_kind: intravisit::FnKind<'tcx>,
function_declaration: &'tcx rustc_hir::FnDecl<'tcx>,
body_id: rustc_hir::BodyId,
_: rustc_span::Span,
local_def_id: rustc_hir::def_id::LocalDefId,
) {
let to_write = CachedBody::retrieve(self.tcx, local_def_id);

let dir = &self.target_dir;
let path = dir.join(
self.tcx
.def_path(local_def_id.to_def_id())
.to_filename_friendly_no_crate(),
);

if !dir.exists() {
std::fs::create_dir(dir).unwrap();
}

encode_to_file(self.tcx, path, &to_write);

intravisit::walk_fn(
self,
function_kind,
function_declaration,
body_id,
local_def_id,
)
}
}

/// A complete visit over the local crate items, collecting all bodies and
/// calculating the necessary borrowcheck facts to store for later points-to
/// analysis.
///
/// Ensure this gets called early in the compiler before the unoptimmized mir
/// bodies are stolen.
pub fn dump_mir_and_borrowck_facts(tcx: TyCtxt) {
let mut vis = DumpingVisitor {
tcx,
target_dir: intermediate_out_dir(tcx, INTERMEDIATE_ARTIFACT_EXT),
};
tcx.hir().visit_all_item_likes_in_crate(&mut vis);
}

const INTERMEDIATE_ARTIFACT_EXT: &str = "bwbf";

/// Get the path where artifacts from this crate would be stored. Unlike
/// [`TyCtxt::crate_extern_paths`] this function does not crash when supplied
/// with [`LOCAL_CRATE`].
pub fn local_or_remote_paths(krate: CrateNum, tcx: TyCtxt, ext: &str) -> Vec<PathBuf> {
if krate == LOCAL_CRATE {
vec![intermediate_out_dir(tcx, ext)]
} else {
tcx.crate_extern_paths(krate)
.iter()
.map(|p| p.with_extension(ext))
.collect()
}
}

/// Try to load a [`CachedBody`] for this id.
fn load_body_and_facts(tcx: TyCtxt<'_>, def_id: DefId) -> CachedBody<'_> {
let paths = local_or_remote_paths(def_id.krate, tcx, INTERMEDIATE_ARTIFACT_EXT);
for path in &paths {
let path = path.join(tcx.def_path(def_id).to_filename_friendly_no_crate());
if let Ok(data) = decode_from_file(tcx, path) {
return data;
};
}

panic!("No facts for {def_id:?} found at any path tried: {paths:?}");
}

/// Create the name of the file in which to store intermediate artifacts.
///
/// HACK(Justus): `TyCtxt::output_filenames` returns a file stem of
/// `lib<crate_name>-<hash>`, whereas `OutputFiles::with_extension` returns a file
/// stem of `<crate_name>-<hash>`. I haven't found a clean way to get the same
/// name in both places, so i just assume that these two will always have this
/// relation and prepend the `"lib"` here.
pub fn intermediate_out_dir(tcx: TyCtxt, ext: &str) -> PathBuf {
let rustc_out_file = tcx.output_filenames(()).with_extension(ext);
let dir = rustc_out_file
.parent()
.unwrap_or_else(|| panic!("{} has no parent", rustc_out_file.display()));
let file = rustc_out_file
.file_name()
.unwrap_or_else(|| panic!("has no file name"))
.to_str()
.unwrap_or_else(|| panic!("not utf8"));

let file = if file.starts_with("lib") {
std::borrow::Cow::Borrowed(file)
} else {
format!("lib{file}").into()
};

dir.join(file.as_ref())
}
Loading

0 comments on commit 54bf07d

Please sign in to comment.