Add node-removal methods linear in node degree #1083

jakelishman · 2024-02-08T15:17:53Z

The existing method, `PyDiGraph.remove_node_retain_edges`, is quadratic in node degree, because the `condition` function takes in pairs of edges. This poses a problem for extreme-degree nodes (for example, massive barriers in Qiskit).
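To make the quadratic cost concrete, here is a minimal Rust sketch of a pairwise-condition API. It is not the rustworkx implementation: the function `retain_pairwise` and the modelling of edges as `(neighbour index, weight)` tuples are hypothetical. The only point is that the condition is invoked `in_degree * out_degree` times, however few edges end up being kept.

```rust
/// Pairwise retention sketch (hypothetical, not rustworkx code): every
/// (in-edge, out-edge) pair is offered to `condition`, so the number of
/// calls is in_degree * out_degree. Edges are (neighbour index, weight).
fn retain_pairwise<W, F>(
    in_edges: &[(usize, W)],
    out_edges: &[(usize, W)],
    mut condition: F,
) -> (Vec<(usize, usize)>, usize)
where
    F: FnMut(&W, &W) -> bool,
{
    let mut new_edges = Vec::new();
    let mut calls = 0;
    for (src, w_in) in in_edges {
        for (dst, w_out) in out_edges {
            calls += 1;
            if condition(w_in, w_out) {
                new_edges.push((*src, *dst));
            }
        }
    }
    (new_edges, calls)
}

fn main() {
    // A barrier-like node with 1000 wires passing through it.
    let n = 1000;
    let in_edges: Vec<(usize, u32)> = (0..n).map(|i| (i, i as u32)).collect();
    let out_edges: Vec<(usize, u32)> = (0..n).map(|i| (n + i, i as u32)).collect();
    let (kept, calls) = retain_pairwise(&in_edges, &out_edges, |a, b| a == b);
    println!("kept {} edges after {} condition calls", kept.len(), calls);
}
```

For the 1000-wire example, that is a million condition calls to keep 1000 edges.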

This commit adds two methods that make edge-retention decisions by hashable keys, which makes removal linear in the degree of the node in the typical case. The MIMO broadcasting can make it quadratic again if all edges share the same key, but that cost is fundamental to the size of the output rather than to the algorithm.
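A minimal sketch of the keyed idea in Rust, using the same hypothetical `(neighbour index, weight)` edge model. `retain_by_key` is an illustrative name, not the rustworkx API, and the sketch ignores which edge weight the new edge should carry. Each key is computed and hashed once per edge, and only keys present on both sides produce output:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Keyed retention sketch (hypothetical): grouping costs
/// O(in_degree + out_degree) key computations. Every in-edge is then joined
/// to every out-edge with an equal key ("broadcasting"), so the only
/// super-linear cost is the size of the output itself.
fn retain_by_key<W, K, F>(
    in_edges: &[(usize, W)],
    out_edges: &[(usize, W)],
    mut key: F,
) -> Vec<(usize, usize)>
where
    K: Hash + Eq,
    F: FnMut(&W) -> K,
{
    // key -> targets of the out-edges carrying that key
    let mut targets: HashMap<K, Vec<usize>> = HashMap::new();
    for (dst, w) in out_edges {
        targets.entry(key(w)).or_default().push(*dst);
    }
    let mut new_edges = Vec::new();
    for (src, w) in in_edges {
        if let Some(dsts) = targets.get(&key(w)) {
            for &dst in dsts {
                new_edges.push((*src, dst));
            }
        }
    }
    new_edges
}

fn main() {
    // One-to-one: each wire's in-edge pairs only with that wire's out-edge.
    let ins: Vec<(usize, u32)> = vec![(0, 7), (1, 8)];
    let outs: Vec<(usize, u32)> = vec![(10, 7), (11, 8)];
    println!("{:?}", retain_by_key(&ins, &outs, |w| *w)); // [(0, 10), (1, 11)]

    // All edges share one key: broadcasting yields in_degree * out_degree edges.
    let ins: Vec<(usize, u32)> = vec![(0, 5), (1, 5)];
    let outs: Vec<(usize, u32)> = vec![(10, 5), (11, 5)];
    println!("{:?}", retain_by_key(&ins, &outs, |w| *w)); // [(0, 10), (0, 11), (1, 10), (1, 11)]
}
```

The second call is the broadcasting case: with a single shared key, the output alone has `in_degree * out_degree` edges, which no algorithm can avoid.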

The ideal situation (for performance) is that the edges can be disambiguated by Python object identity, which needs no Python-space calls to retrieve or hash the key, so the whole process can run in pure Rust. This is `remove_node_retain_edges_by_id`.
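The pure-Rust analogue of keying on Python object identity is keying on the address of a shared allocation. In this sketch, `identity_key` and `retain_by_identity` are hypothetical names and `Rc` stands in for a Python object reference; computing a key never reads, hashes, or compares the weight itself:

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Identity key: the address of the shared allocation, an analogue of
/// Python's `id()`. It never touches the weight's contents.
fn identity_key<T>(weight: &Rc<T>) -> usize {
    Rc::as_ptr(weight) as usize
}

/// Same grouping as keyed retention, but keyed on identity only.
fn retain_by_identity<T>(
    in_edges: &[(usize, Rc<T>)],
    out_edges: &[(usize, Rc<T>)],
) -> Vec<(usize, usize)> {
    let mut targets: HashMap<usize, Vec<usize>> = HashMap::new();
    for (dst, w) in out_edges {
        targets.entry(identity_key(w)).or_default().push(*dst);
    }
    let mut new_edges = Vec::new();
    for (src, w) in in_edges {
        if let Some(dsts) = targets.get(&identity_key(w)) {
            for &dst in dsts {
                new_edges.push((*src, dst));
            }
        }
    }
    new_edges
}

fn main() {
    // Two distinct objects with equal contents: identity keeps them apart.
    let a = Rc::new(String::from("wire"));
    let b = Rc::new(String::from("wire"));
    let ins: Vec<(usize, Rc<String>)> = vec![(0, Rc::clone(&a)), (1, Rc::clone(&b))];
    let outs: Vec<(usize, Rc<String>)> = vec![(10, Rc::clone(&a)), (11, Rc::clone(&b))];
    println!("{:?}", retain_by_identity(&ins, &outs)); // [(0, 10), (1, 11)]
}
```

Keying these edges by string contents instead would merge both wires under one key and broadcast to four edges; identity keeps them distinct.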

The more general situation is that the user wants to supply a Python key function, which naturally returns a Python object, so we need to use Python hashing and equality semantics. This means using Python collections to do the tracking, which costs performance: very casual benchmarking, using the implicit identity function as the key (so the edge weight itself is the key), shows it is about 2x slower than `remove_node_retain_edges_by_id`. This method is `remove_node_retain_edges_by_key`.

Storing the edge data in a `SmallVec` (as opposed to a `Vec`) kept performance, in the case where in- and out-edges are close to a bijection, similar enough in my casual benchmarking to a version that assumed a bijection and did not allow broadcasting at all. Using a `Vec` (and consequently always needing a pointer indirection to retrieve the nodes) came with a performance penalty, although I stupidly didn't actually write down the numbers on that.

coveralls · 2024-02-08T15:45:47Z

Pull Request Test Coverage Report for Build 8576435630

113 of 115 (98.26%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.01%) to 96.52%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/digraph.rs	113	115	98.26%

Totals
Change from base Build 8571532908:	0.01%
Covered Lines:	17305
Relevant Lines:	17929

💛 - Coveralls

mtreinish

Overall this LGTM. The only real comment I have is about the struct used for the key function. I think there is a way to avoid some of the Python overhead there.

mtreinish · 2024-04-04T23:26:02Z

src/digraph.rs

+        const INLINE_SIZE: usize =
+            2 * ::std::mem::size_of::<usize>() / ::std::mem::size_of::<NodeIndex>();


I like that this is future-proof and supports 32-bit platforms well. For the foreseeable future `NodeIndex` is a `u32`. We looked at making it a `usize` a while ago, but the extra memory overhead (on 64-bit platforms) to support more than 4 billion nodes or edges wasn't worth it.

Mostly I just don't like magic numbers, and I don't trust myself to remember what they mean, haha.
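For reference, here is what the `INLINE_SIZE` formula works out to, as a small sketch. `NodeIndex` is aliased to `u32` as a stand-in for petgraph's type (which the review says is a `u32` today). The rationale in the comments, that the inline buffer should take no more space than the pointer/length pair a spilled `SmallVec` already holds, is an assumption about intent rather than something stated in the PR.

```rust
use std::mem::size_of;

/// Stand-in for petgraph's `NodeIndex` (a `u32` per the review); hypothetical alias.
type NodeIndex = u32;

/// How many `NodeIndex` values fit in the space of two `usize`s, i.e. a
/// (pointer, length) pair that a small vector needs anyway once it spills.
const INLINE_SIZE: usize = 2 * size_of::<usize>() / size_of::<NodeIndex>();

fn main() {
    // 64-bit: 2 * 8 / 4 = 4; 32-bit: 2 * 4 / 4 = 2.
    println!("{}-bit target: INLINE_SIZE = {}", 8 * size_of::<usize>(), INLINE_SIZE);
    // The inline array is exactly as large as a (pointer, length) pair.
    assert_eq!(
        size_of::<[NodeIndex; INLINE_SIZE]>(),
        size_of::<(*const NodeIndex, usize)>()
    );
}
```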

mtreinish · 2024-04-04T23:39:38Z

src/digraph.rs

+                if let Some(edge_data) = out_edges.get_item(key_value.as_ref(py))? {
+                    let edge_data = edge_data.extract::<RemoveNodeEdgeValue>()?;
+                    edge_data.nodes.as_ref(py).append(edge.target().index())?


I'm a bit confused: why do we need to use a PyList here? We're just storing a list of node indices, right? Can't we just store that in Rust as a `Vec<NodeIndex>`? I get that you're using a PyDict so the key can be the return value of the key func, but nothing precludes storing a `Vec<NodeIndex>` in `RemoveNodeEdgeValue`, right? It just needs to be a pyclass so it can go in the PyDict; its contents don't have to be useful to Python. This would let us avoid the Python overhead of manipulating the nodes list.

`RemoveNodeEdgeValue` already is a pyclass, but it looks like I've just borked the conversion: I clearly meant `downcast` and not `extract`. With `extract`, the Rust struct is cloned out of the Python object, so the node list had to be held by reference (as a Python list) for the mutations to be shared.

Fixed (along with other PyO3 0.21 changes) in ab20d3c.

This is partly a PyO3 0.21 upgrade, partly fixing a dodgy `extract` in favour of `downcast`.
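The `extract`/`downcast` difference can be illustrated with a pure-Rust analogy (not PyO3 code). `EdgeValue`, `append_via_clone` and `append_in_place` are hypothetical: `EdgeValue` stands in for `RemoveNodeEdgeValue`, cloning plays the role of `extract`, and `get_mut` plays the role of `downcast` followed by a mutable borrow. Only the second path changes the stored node list:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `RemoveNodeEdgeValue`.
#[derive(Clone, Debug)]
struct EdgeValue {
    weight: String,
    nodes: Vec<u32>,
}

/// Like `extract`: we mutate an owned clone, which is then dropped,
/// so the map is unchanged.
fn append_via_clone(map: &HashMap<String, EdgeValue>, key: &str, node: u32) {
    if let Some(value) = map.get(key) {
        let mut copy = value.clone();
        copy.nodes.push(node); // only the copy changes
    }
}

/// Like `downcast` plus a mutable borrow: we mutate the stored value itself.
fn append_in_place(map: &mut HashMap<String, EdgeValue>, key: &str, node: u32) {
    if let Some(value) = map.get_mut(key) {
        value.nodes.push(node);
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("q0".to_string(), EdgeValue { weight: "q0".to_string(), nodes: vec![10] });
    append_via_clone(&map, "q0", 11);
    append_in_place(&mut map, "q0", 12);
    let value = &map["q0"];
    println!("{} -> {:?}", value.weight, value.nodes); // q0 -> [10, 12]
}
```

Because the stored value is mutated in place, the node list can be a plain `Vec` rather than a shared Python list, which is the point of the review comment.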

mtreinish · 2024-04-05T12:18:26Z

src/digraph.rs

+#[pyclass]
+struct RemoveNodeEdgeValue {
+    weight: Py<PyAny>,
+    nodes: Vec<NodeIndex>,


You used a `SmallVec` in the identity-based method; does the same optimization hold for this? Or, because we end up storing this in Python, does it not matter?

I'd guess that in this case any locality effects are overshadowed by having to reach through a few Python pointers to get here in the first place (and to call the Python-space key function), but I can change it over if you prefer; I can't imagine it hurts.

Nah, it's fine; that's a good reason not to bother. Any difference would be small in this case.

mtreinish · 2024-04-05T22:27:28Z

LGTM, thanks for the quick updates.

jakelishman mentioned this pull request Feb 8, 2024

Fix overhead of DAGCircuit.remove_op_node Qiskit/qiskit#11677

Closed

Format

f86e8b7

jakelishman mentioned this pull request Feb 8, 2024

Optimize BarrierBeforeFinalMeasurement pass Qiskit/qiskit#11739

Closed

mtreinish added this to the 0.15.0 milestone Feb 21, 2024

mtreinish reviewed Apr 4, 2024

jakelishman added 3 commits April 5, 2024 11:54

Merge remote-tracking branch 'ibm/main' into remove-node-key

e49fb56

Update conversion actions to be by Bound reference

ab20d3c

Update out-of-date comment

9f3a0a7

mtreinish reviewed Apr 5, 2024

mtreinish approved these changes Apr 5, 2024

mtreinish added the automerge Queue a approved PR for merging label Apr 5, 2024

Merge branch 'main' into remove-node-key

bdb8164

mergify bot merged commit 783f30c into Qiskit:main Apr 5, 2024
30 checks passed

jakelishman deleted the remove-node-key branch April 5, 2024 23:04
