-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add node-removal methods linear in node degree #1083
Conversation
The existing method, `PyDiGraph.remove_node_retain_edges` is quadratic in node degree, because the ``condition`` function takes in pairs of edges. This poses a problem for extreme-degree nodes (for example massive barriers in Qiskit). This commit adds two methods based on making edge-retention decisions by hashable keys, making it linear in the degree of the node (at least; the MIMO broadcasting can make it quadratic again if all edges have the same key, but that's fundamental to the output, rather than the algorithm). The ideal situation (for performance) is that the edges can be disambiguated by Python object identity, which doesn't require Python-space calls to retrieve or hash, so can be in pure Rust. This is `remove_node_retain_edges_by_id`. The more general situation is that the user wants to supply a Python key function, which naturally returns a Python object that we need to use Python hashing and equality semantics for. This means using Python collections to do the tracking, which impacts the performance (very casual benchmarking using the implicit identity function as the key shows it's about 2x slower than using the identity). This method is `remove_node_retain_edges_by_key`.
Pull Request Test Coverage Report for Build 8576435630Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this LGTM. The only real comment I have is about the the struct used for the key function. I think there is a way to avoid some of the python overhead there.
const INLINE_SIZE: usize = | ||
2 * ::std::mem::size_of::<usize>() / ::std::mem::size_of::<NodeIndex>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that this is future proof and supports 32bit platforms well. For the foreseeable future NodeIndex
is a u32
. We looked at making it a usize
a while ago but the extra memory overhead (on 64bit platforms) to support > 4 billion nodes or edges wasn't worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly I just don't like magic numbers and I don't trust myself to remember what they mean haha
src/digraph.rs
Outdated
if let Some(edge_data) = out_edges.get_item(key_value.as_ref(py))? { | ||
let edge_data = edge_data.extract::<RemoveNodeEdgeValue>()?; | ||
edge_data.nodes.as_ref(py).append(edge.target().index())? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused why do we need to use a PyList
here? We're just storing a list of node indices right? Can't that just store that in a rust with a Vec<NodeIndex>
? I get you're using a PyDict
so we can have the key be return from the key func, but nothing precludes storing Vec<NodeIndex>
in RemoveNodeEdgeValue
right. It just needs to be a pyclass so it can go in the pydict but the contents don't have to be useful for Python. This would let us avoid the python overhead for manipulating the nodes list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RemoveNodeEdgeValue
already is a pyclass
, but it looks like I've just borked the conversion - I clearly meant downcast
and not extract
. While it's extract
, the Rust struct is cloned in, so the list needed to be stored only by references so the mutations were shared.
Fixed (along with other PyO3-0.21 stuff) in ab20d3c.
This is partly a PyO3 0.21 upgrade, partly fixing a dodgy `extract` in favour of `downcast`.
#[pyclass] | ||
struct RemoveNodeEdgeValue { | ||
weight: Py<PyAny>, | ||
nodes: Vec<NodeIndex>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You used a smallvec in the identity function does the same optimization hold for this? Or because we end up storing this in Python it doesn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd guess that in this case any locality effects are overshadowed by the requirement to reach through a few Python pointers to get here in the first place (and to call the Python-space key function), but I can also change it over if you prefer - I don't imagine it hurts for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, it's fine that's a good reason to not bother. It wouldn't make a big difference if there was any in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the quick updates.
#[pyclass] | ||
struct RemoveNodeEdgeValue { | ||
weight: Py<PyAny>, | ||
nodes: Vec<NodeIndex>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, it's fine that's a good reason to not bother. It wouldn't make a big difference if there was any in this case.
The existing method,
PyDiGraph.remove_node_retain_edges
is quadratic in node degree, because thecondition
function takes in pairs of edges. This poses a problem for extreme-degree nodes (for example massive barriers in Qiskit).This commit adds two methods based on making edge-retention decisions by hashable keys, making it linear in the degree of the node (at least; the MIMO broadcasting can make it quadratic again if all edges have the same key, but that's fundamental to the output, rather than the algorithm).
The ideal situation (for performance) is that the edges can be disambiguated by Python object identity, which doesn't require Python-space calls to retrieve or hash, so can be in pure Rust. This is
remove_node_retain_edges_by_id
.The more general situation is that the user wants to supply a Python key function, which naturally returns a Python object that we need to use Python hashing and equality semantics for. This means using Python collections to do the tracking, which impacts the performance (very casual benchmarking using the implicit identity function as the key shows it's about 2x slower than using the identity). This method is
remove_node_retain_edges_by_key
.Using a small vec for the edge data (as opposed to
Vec
) kept the performance of the case where in- and out-edges are close to a bijection sufficiently similar in my casual benchmarking to assuming a bijection and not allowing broadacsting at all. Using aVec
(and consequently always having pointer indirection to retrieve the nodes) came with a performance penalty (although I stupidly didn't actually write down the numbers on that).