Deduplicate crates in crate graph #14476
Comments
@Veykril what would be the best approach on this? I'm thinking of two ways of doing it.

1- Something like this:

```rust
pub fn extend(&mut self, other: CrateGraph, proc_macros: &mut ProcMacroPaths) -> u32 {
    let start = self.arena.len() as u32;
    // Keep only the incoming crates whose root file is not already present in this graph.
    let other: Vec<_> = other
        .arena
        .into_iter()
        .filter(|(_, data)| {
            !self.arena.iter().any(|(_, cd)| cd.root_file_id == data.root_file_id)
        })
        // Shift the dependency ids of the incoming crates by the current arena length.
        .map(|(_, mut data)| {
            for dep in &mut data.dependencies {
                let new_crate_id = RawIdx::from(u32::from(dep.crate_id.into_raw()) + start);
                dep.crate_id = CrateId::from_raw(new_crate_id);
            }
            data
        })
        .collect();
    self.arena.extend(other);
    // Shift the proc macro crate ids by the same offset.
    *proc_macros = mem::take(proc_macros)
        .into_iter()
        .map(|(id, macros)| {
            (CrateId::from_raw(RawIdx::from(u32::from(id.into_raw()) + start)), macros)
        })
        .collect();
    start
}
```

This applies deduplication only to the `extend` method; I'm not sure if we should worry about other points of insertion.

2- Making some sort of an `ArenaSet`, using an `IndexSet` instead of a `Vec`. This would imply that every `T` has to implement `Hash`, and I'm not sure whether the hashing would make this worse performance-wise. A rough sketch of this idea follows below.
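For option 2, here is a minimal sketch of what a deduplicating arena could look like, assuming it is backed by the `indexmap` crate's `IndexSet`; the `ArenaSet` name and its API are hypothetical, not rust-analyzer's actual `la_arena` types:

```rust
use indexmap::IndexSet;
use std::hash::Hash;

/// Hypothetical deduplicating arena: equal values share a single index.
struct ArenaSet<T: Hash + Eq> {
    set: IndexSet<T>,
}

impl<T: Hash + Eq> ArenaSet<T> {
    fn new() -> Self {
        ArenaSet { set: IndexSet::new() }
    }

    /// Inserts `value` and returns its index. If an equal value is already
    /// present, the existing index is returned and nothing is added.
    fn alloc(&mut self, value: T) -> u32 {
        let (idx, _newly_inserted) = self.set.insert_full(value);
        idx as u32
    }

    fn get(&self, idx: u32) -> Option<&T> {
        self.set.get_index(idx as usize)
    }
}
```

The trade-off is the one mentioned above: `T` has to be `Hash + Eq` and every allocation pays for a hash lookup, while index-based access stays O(1).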
Isn't the `root_file_id` always the same if it points to the same path? And if it points to the same path, doesn't that mean it points to the same crate?
That aside, the dependency crate id fix-up in your code is wrong though: once we start deduplicating, naively shifting the ids won't work. We'd need to keep a mapping from duplicate ids to non-duplicate ids, I believe, which in turn might make the order we process these in relevant.
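A rough sketch of that remapping idea, with simplified stand-in types (`CrateData` here is not the real base-db struct): build an explicit old-id → merged-id map while merging, then rewrite the dependency edges of the crates that were actually appended through that map.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real base-db types; `root_file_id` is used as
// the identity for deduplication, matching the discussion above.
struct CrateData {
    root_file_id: u32,
    dependencies: Vec<usize>, // crate ids in the owning graph's id space
}

/// Merge `incoming` into `existing`, deduplicating by `root_file_id`, and
/// return a map from old (incoming) ids to ids in the merged graph.
fn merge(existing: &mut Vec<CrateData>, incoming: Vec<CrateData>) -> HashMap<usize, usize> {
    let mut id_map = HashMap::new();
    let mut appended = Vec::new();

    // First pass: decide which merged id every incoming crate ends up with.
    for (old_id, data) in incoming.into_iter().enumerate() {
        let new_id = match existing.iter().position(|c| c.root_file_id == data.root_file_id) {
            Some(existing_id) => existing_id, // duplicate: reuse the existing crate
            None => {
                existing.push(data);
                appended.push(existing.len() - 1);
                existing.len() - 1
            }
        };
        id_map.insert(old_id, new_id);
    }

    // Second pass: only the crates that were actually appended still carry
    // dependency ids from the old id space, so rewrite just those.
    for &idx in &appended {
        for dep in &mut existing[idx].dependencies {
            *dep = id_map[&*dep];
        }
    }
    id_map
}
```

Because duplicates resolve to whichever equal crate is already present, the result can depend on the order the graphs are merged in, as noted above.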
Deduplicate crates when extending crate graphs

This is quadratic in runtime per deduplication attempt, but I don't think that'll be a problem for the workload here. Don't be scared of the diff: the actual diff is +42 -22, the rest is tests and test data. Fixes #14476
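Continuing the hypothetical `merge` sketch above (not the actual PR code): each incoming crate is checked against the existing graph with a linear scan, which is where the quadratic cost per deduplication attempt comes from, and equal crates collapse onto a single node.

```rust
fn main() {
    let mut graph = vec![
        CrateData { root_file_id: 0, dependencies: vec![1] },  // workspace A root
        CrateData { root_file_id: 100, dependencies: vec![] }, // sysroot
    ];
    let other = vec![
        CrateData { root_file_id: 1, dependencies: vec![1] },  // workspace B root, depends on its sysroot
        CrateData { root_file_id: 100, dependencies: vec![] }, // the same sysroot again
    ];
    let id_map = merge(&mut graph, other);
    assert_eq!(graph.len(), 3);  // the duplicate sysroot crate was not appended
    assert_eq!(id_map[&1], 1);   // incoming sysroot (old id 1) maps to the existing node
    assert_eq!(graph[2].dependencies, vec![1]); // workspace B's edge now points at the shared sysroot
}
```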
It's incredible that we aren't actually doing this already. When we load a project that contains multiple workspaces, we do not deduplicate equal crates in the crate graph! That means that the sysroot alone is loaded multiple times into memory, and queries executed on the sysroot of one workspace are not reused for another. It wouldn't surprise me if some of the high memory usage reports we get come from projects that contain a lot of cargo workspaces.
In the code referenced below, all we do is append crates without deduplicating them, nor do we deduplicate after having built the final crate graph.
rust-analyzer/crates/base-db/src/input.rs, lines 495 to 515 at commit 2365762