Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node-removal methods linear in node degree #1083

Merged
merged 6 commits into from
Apr 5, 2024

Conversation

jakelishman
Copy link
Member

The existing method, PyDiGraph.remove_node_retain_edges is quadratic in node degree, because the condition function takes in pairs of edges. This poses a problem for extreme-degree nodes (for example massive barriers in Qiskit).

This commit adds two methods based on making edge-retention decisions by hashable keys, making it linear in the degree of the node (at least; the MIMO broadcasting can make it quadratic again if all edges have the same key, but that's fundamental to the output, rather than the algorithm).

The ideal situation (for performance) is that the edges can be disambiguated by Python object identity, which doesn't require Python-space calls to retrieve or hash, so can be in pure Rust. This is remove_node_retain_edges_by_id.

The more general situation is that the user wants to supply a Python key function, which naturally returns a Python object that we need to use Python hashing and equality semantics for. This means using Python collections to do the tracking, which impacts the performance (very casual benchmarking using the implicit identity function as the key shows it's about 2x slower than using the identity). This method is remove_node_retain_edges_by_key.


Using a small vec for the edge data (as opposed to Vec) kept the performance of the case where in- and out-edges are close to a bijection sufficiently similar in my casual benchmarking to assuming a bijection and not allowing broadacsting at all. Using a Vec (and consequently always having pointer indirection to retrieve the nodes) came with a performance penalty (although I stupidly didn't actually write down the numbers on that).

The existing method, `PyDiGraph.remove_node_retain_edges` is quadratic
in node degree, because the ``condition`` function takes in pairs of
edges.  This poses a problem for extreme-degree nodes (for example
massive barriers in Qiskit).

This commit adds two methods based on making edge-retention decisions by
hashable keys, making it linear in the degree of the node (at least;
the MIMO broadcasting can make it quadratic again if all edges have the
same key, but that's fundamental to the output, rather than the
algorithm).

The ideal situation (for performance) is that the edges can be
disambiguated by Python object identity, which doesn't require
Python-space calls to retrieve or hash, so can be in pure Rust.  This is
`remove_node_retain_edges_by_id`.

The more general situation is that the user wants to supply a Python key
function, which naturally returns a Python object that we need to use
Python hashing and equality semantics for.  This means using Python
collections to do the tracking, which impacts the performance (very
casual benchmarking using the implicit identity function as the key
shows it's about 2x slower than using the identity).  This method is
`remove_node_retain_edges_by_key`.
@coveralls
Copy link

coveralls commented Feb 8, 2024

Pull Request Test Coverage Report for Build 8576435630

Details

  • 113 of 115 (98.26%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 96.52%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/digraph.rs 113 115 98.26%
Totals Coverage Status
Change from base Build 8571532908: 0.01%
Covered Lines: 17305
Relevant Lines: 17929

💛 - Coveralls

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM. The only real comment I have is about the the struct used for the key function. I think there is a way to avoid some of the python overhead there.

Comment on lines +1082 to +1083
const INLINE_SIZE: usize =
2 * ::std::mem::size_of::<usize>() / ::std::mem::size_of::<NodeIndex>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is future proof and supports 32bit platforms well. For the foreseeable future NodeIndex is a u32. We looked at making it a usize a while ago but the extra memory overhead (on 64bit platforms) to support > 4 billion nodes or edges wasn't worth it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly I just don't like magic numbers and I don't trust myself to remember what they mean haha

src/digraph.rs Outdated
Comment on lines 1201 to 1203
if let Some(edge_data) = out_edges.get_item(key_value.as_ref(py))? {
let edge_data = edge_data.extract::<RemoveNodeEdgeValue>()?;
edge_data.nodes.as_ref(py).append(edge.target().index())?
Copy link
Member

@mtreinish mtreinish Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused why do we need to use a PyList here? We're just storing a list of node indices right? Can't that just store that in a rust with a Vec<NodeIndex>? I get you're using a PyDict so we can have the key be return from the key func, but nothing precludes storing Vec<NodeIndex> in RemoveNodeEdgeValue right. It just needs to be a pyclass so it can go in the pydict but the contents don't have to be useful for Python. This would let us avoid the python overhead for manipulating the nodes list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveNodeEdgeValue already is a pyclass, but it looks like I've just borked the conversion - I clearly meant downcast and not extract. While it's extract, the Rust struct is cloned in, so the list needed to be stored only by references so the mutations were shared.

Fixed (along with other PyO3-0.21 stuff) in ab20d3c.

#[pyclass]
struct RemoveNodeEdgeValue {
weight: Py<PyAny>,
nodes: Vec<NodeIndex>,
Copy link
Member

@mtreinish mtreinish Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You used a smallvec in the identity function does the same optimization hold for this? Or because we end up storing this in Python it doesn't matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess that in this case any locality effects are overshadowed by the requirement to reach through a few Python pointers to get here in the first place (and to call the Python-space key function), but I can also change it over if you prefer - I don't imagine it hurts for sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, it's fine that's a good reason to not bother. It wouldn't make a big difference if there was any in this case.

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the quick updates.

#[pyclass]
struct RemoveNodeEdgeValue {
weight: Py<PyAny>,
nodes: Vec<NodeIndex>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, it's fine that's a good reason to not bother. It wouldn't make a big difference if there was any in this case.

@mtreinish mtreinish added the automerge Queue a approved PR for merging label Apr 5, 2024
@mergify mergify bot merged commit 783f30c into Qiskit:main Apr 5, 2024
30 checks passed
@jakelishman jakelishman deleted the remove-node-key branch April 5, 2024 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Queue a approved PR for merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants