Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CLI] Tree shaking for package dependencies #21211

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

stefan-mysten
Copy link
Contributor

Description

This PR introduces tree shaking for package dependencies. The package dependencies that are not referenced in code (directly or via trans deps) are removed.

Test plan

Added tests in sui/tests/data/tree_shaking

cargo test -- test_tree_shaking

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • gRPC:
  • JSON-RPC:
  • GraphQL:
  • CLI: sui move build and sui client publish/upgrade will now by default remove dependencies that are not referenced in source code from being published on-chain.
  • Rust SDK:

@stefan-mysten stefan-mysten requested a review from a team February 13, 2025 22:49
Copy link

vercel bot commented Feb 13, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 14, 2025 8:15pm
2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Feb 14, 2025 8:15pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Feb 14, 2025 8:15pm

@stefan-mysten stefan-mysten temporarily deployed to sui-typescript-aws-kms-test-env February 13, 2025 22:49 — with GitHub Actions Inactive
@@ -102,6 +104,8 @@ pub struct BuildConfig {
/// The chain ID that compilation is with respect to (e.g., required to resolve
/// published dependency IDs from the `Move.lock`).
pub chain_id: Option<String>,
/// Include unpublished dependencies in the build
pub with_unpublished_dependencies: bool,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cleaner to have it in the BuildConfig rather than in the build function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate? Without more context this choice seems strange, because this flag should not affect the outcome of a build at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having seen the rest of the PR, I don't think this is the right course of action. Probably a less invasive move would be to make it so that building does not consume the ResolvedGraph, but returns it along with the CompiledPackage instead. Then you will have enough information as part of publish/upgrade to do the tree shaking there.

Copy link
Contributor Author

@stefan-mysten stefan-mysten Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build is called in a lot more places than creating the BuildConfig struct directly (usually it's via default), so thought it's easier to keep it there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the ideal thing would be to put the ResolvedGraph in the CompiledPackage (your point about the graph being stored in BuildPlan is also nudging me in that direction).

self.dependency_ids.published.values().cloned().collect()
}

/// Tree-shake the package's dependencies to remove any that are not referenced in source code.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main tree shaking part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc comment needs a bit more detail:

  • Distinguishing how tree shaking behaves with immediate dependencies vs transitive dependencies.
  • Distinguishing between module dependencies and package dependencies (we eliminate immediate package dependencies when there is no root module with an immediate module dependency to it, but from there onwards we follow edges in the package graph).

Comment on lines +904 to +922
pub fn get_transitive_dependencies<N, E>(graph: &DiGraphMap<N, E>, start: &N) -> BTreeSet<N>
where
N: Ord + Copy + Eq + std::hash::Hash,
{
let mut visited = BTreeSet::new();
let mut dfs = Dfs::new(graph, *start);

// Skip the start node itself
dfs.next(graph);

// Visit all reachable nodes
while let Some(n) = dfs.next(graph) {
visited.insert(n);
}

visited
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if we want this in external-crates on the DependencyGraph type or we should keep it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would be useful on DependencyGraph -- yes! It also means you don't have to add the dependency on petgraph here.

Comment on lines +957 to +964
let (upgrade_policy, compiled_package) =
upgrade_result.map_err(|e| anyhow!("{e}"))?;

let compiled_modules =
compiled_package.get_package_bytes(with_unpublished_dependencies);
let package_id = compiled_package.published_at.clone()?;
let package_digest =
compiled_package.get_package_digest(with_unpublished_dependencies);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a refactoring of the compile_package function which used to return a 4-tuple and now returns a CompiledPackage, which holds the same information.

@amnn
Copy link
Contributor

amnn commented Feb 14, 2025

@stefan-mysten, what's the relationship between this PR and #21168 ? From the descriptions they seem like they would be doing the same thing.

@stefan-mysten
Copy link
Contributor Author

@stefan-mysten, what's the relationship between this PR and #21168 ? From the descriptions they seem like they would be doing the same thing.

This implements the tree shaking without the need to go on chain and fetch linkage tables. It supersedes the previous one.

Copy link
Contributor

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core tree shaking change looks good to me (praise be, thank you for sticking with it). I'm not a fan of adding with_unpublished_dependencies to BuildConfig though, because it has nothing to do with the build (I think that's clear from all the places you now have to add a value for that field that should have no business caring about unpublished dependencies). I've left some suggestions to address that:

  • Don't consume the ResolvedGraph during building, that way you avoid cloning it to do the tree shaking. Consider adding it or the DependencyGraph into the CompiledPackage so that you have access to it later to do the tree-shaking.
  • Instead of making tree shaking modify the published dependency ids just after you build the CompiledPackage, expose a function that calculates the list of published dependencies, post-tree-shaking. This is great because you avoid having to manage a side-effectful operation, and you no longer need to thread with_unpublished_dependencies into the build process (whatever cares about building the linkage table for publish/upgrade will just call this function to get the list of packages, and at that point with_unpublished_dependencies should be readily at hand).

@@ -102,6 +104,8 @@ pub struct BuildConfig {
/// The chain ID that compilation is with respect to (e.g., required to resolve
/// published dependency IDs from the `Move.lock`).
pub chain_id: Option<String>,
/// Include unpublished dependencies in the build
pub with_unpublished_dependencies: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate? Without more context this choice seems strange, because this flag should not affect the outcome of a build at all.

@@ -292,9 +310,9 @@ pub fn build_from_resolution_graph(
}

let result = if print_diags_to_stderr {
BuildConfig::compile_package(resolution_graph, &mut std::io::stderr())
BuildConfig::compile_package(resolution_graph.clone(), &mut std::io::stderr())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolution graph is not a small structure -- can we take it by reference here for the build instead of cloning it?

Copy link
Contributor Author

@stefan-mysten stefan-mysten Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's passed all the way down to BuildPlan type (see BuildPlan::create function), so probably not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach would be to have the BuildPlan be consumed on build so that you can take the ResolvedGraph out of it and put it in the CompiledPackage at that point?

Comment on lines +656 to +669
let root_modules = if with_unpublished_deps {
&self
.package
.all_compiled_units_with_source()
.filter(|m| m.unit.address.into_inner() == AccountAddress::ZERO)
.map(|x| x.unit.clone())
.collect::<Vec<_>>()
} else {
&self
.package
.root_modules()
.map(|x| x.unit.clone())
.collect::<Vec<_>>()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what I'm going to say...

Suggested change
let root_modules = if with_unpublished_deps {
&self
.package
.all_compiled_units_with_source()
.filter(|m| m.unit.address.into_inner() == AccountAddress::ZERO)
.map(|x| x.unit.clone())
.collect::<Vec<_>>()
} else {
&self
.package
.root_modules()
.map(|x| x.unit.clone())
.collect::<Vec<_>>()
};
let root_modules: Vec<_> = if with_unpublished_deps {
&self
.package
.all_compiled_units_with_source()
.filter(|m| m.unit.address.into_inner() == AccountAddress::ZERO)
.map(|x| x.unit.clone())
.collect()
} else {
&self
.package
.root_modules()
.map(|x| x.unit.clone())
.collect()
};

/// Return the transitive dependencies of a node in a directed graph
pub fn get_transitive_dependencies<N, E>(graph: &DiGraphMap<N, E>, start: &N) -> BTreeSet<N>
where
N: Ord + Copy + Eq + std::hash::Hash,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where does the Hash constraint come from?

self.dependency_ids.published.values().cloned().collect()
}

/// Tree-shake the package's dependencies to remove any that are not referenced in source code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc comment needs a bit more detail:

  • Distinguishing how tree shaking behaves with immediate dependencies vs transitive dependencies.
  • Distinguishing between module dependencies and package dependencies (we eliminate immediate package dependencies when there is no root module with an immediate module dependency to it, but from there onwards we follow edges in the package graph).

Comment on lines +904 to +922
pub fn get_transitive_dependencies<N, E>(graph: &DiGraphMap<N, E>, start: &N) -> BTreeSet<N>
where
N: Ord + Copy + Eq + std::hash::Hash,
{
let mut visited = BTreeSet::new();
let mut dfs = Dfs::new(graph, *start);

// Skip the start node itself
dfs.next(graph);

// Visit all reachable nodes
while let Some(n) = dfs.next(graph) {
visited.insert(n);
}

visited
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would be useful on DependencyGraph -- yes! It also means you don't have to add the dependency on petgraph here.

compiled_package.get_package_digest(with_unpublished_dependencies);

// filter out dependencies that are not referenced in the source code.
// find_pkg_dependencies(&client, &mut compiled_package).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove?

.path()
.join("tree_shaking")
.join("F_depends_on_A_as_bytecode_dep");
// let (package_f_id, _) = publish_package(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test for a case where, e.g. B_depends_on_A_no_code_references was published without tree-shaking with a dependency on A at v1, and then C was published depending on that B, and A at v2. Its linkage table should still contain a mapping for A at v2.

Also if you can, it would be good to split up checks across multiple tests, otherwise if there are multiple failures, you will only see one at a time.

pkgs_to_keep.extend(self.bytecode_deps.iter().map(|(name, _)| *name));

// #5
self.dependency_ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having tree-shaking be a mutating operation, expose a function that returns the list of published dependencies which does the tree shaking and then returns the appropriate list.

@stefan-mysten
Copy link
Contributor Author

Thanks for the review @amnn. I'll have a stab at it later today!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants