Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nx-cugraph: support should_run that was added in NetworkX 3.3 #4348

Merged
merged 8 commits into from
Apr 29, 2024

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Apr 16, 2024

Also, update complete_bipartite_graph for nx dev

edit: This does not yet add per-algorithm should_run, but it makes doing so possible.

Add should_run for triangles, clustering, is_isolate

@eriknw eriknw requested a review from a team as a code owner April 16, 2024 15:43
@eriknw
Copy link
Contributor Author

eriknw commented Apr 17, 2024

@rlratzel, do you think we should add should_run elsewhere? Maybe is_negatively_weighted or number_of_selfloops?

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking forward to having some should_run()s in place - I think that will result in a decent performance boost for no-code-change cases involving smaller graphs. (well, I should say a performance boost compared to not having should_run()s when nx-cugraph is installed)

Comment on lines 51 to 54
if nx.__version__[:3] <= "3.2":
name = f"complete_bipartite_graph({orig_n1}, {orig_n2})"
else:
name = f"complete_bipartite_graph({n1}, {n2})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what name is used for and why it has to be different for newer versions of NX. From looking at from_coo(), I think name is an attr that is applied to the new graph object, but I didn't notice how it's used from there. Would a brief comment be out-of-place here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is context for posterity: networkx/networkx#7399

def _(G, nodes=None):
if nodes is None or nodes not in G:
return True
return "Fast algorithm when computing for a single node; not worth converting."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and other should_runs: does should_run get called if the user passes in a backend graph (which does not need converting)? I remember we honor backend= regardless of can/should_run, but is passing a backend graph treated as if a user set backend= precedence-wise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should_run is only used when the graph needs to be converted. Hence, should_convert_and_run would be a more accurate name, but I as recall you liked the shorter name should_run. The backend algorithm will always be used if a backend graph if passed or backend= argument is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknw and I talked about this offline. The use case I was envisioning was one where should_run implementations can choose whether or not a backend algorithm is not ideally applied to a particular set of inputs, regardless of if the graph has been converted. For example, if a backend is optimized for very large graphs and the graph is already converted but very small, that backend may choose to let other backends run by returning False for should_run.

@eriknw pointed out that a backend graph, in that situation, would have to be converted to another backend graph type (backend-to-backend) or to a NX graph (backend-to-nx) in order to continue, and neither of those are currently supported. So for now, should_run only applies to calls that require a conversion.

@rlratzel
Copy link
Contributor

@rlratzel, do you think we should add should_run elsewhere? Maybe is_negatively_weighted or number_of_selfloops?

Yes, those might be good but degree_centrality and descendants_at_distance would definitely benefit:

image
image

@eriknw
Copy link
Contributor Author

eriknw commented Apr 26, 2024

Thanks @rlratzel. I added should_run to degree centralities and number_of_selfloop.

To try to follow DRY principle, I'm considering an enhancement to add a module that has a collection of should_run or string constants of the reasons. We have a lot of copy/paste right now.

I also wonder about when an input graph already has a pre-converted backend graph. If it does, then should should_run even be run? Should networkx be smart enough to know that no conversion is necessary and not call should_run? Or, should it be the responsibility of backends to look at the cache and decide whether it should be run? I think I know my preference, but I'm curious to hear your thoughts.

@rlratzel
Copy link
Contributor

rlratzel commented Apr 26, 2024

I also wonder about when an input graph already has a pre-converted backend graph. If it does, then should should_run even be run? Should networkx be smart enough to know that no conversion is necessary and not call should_run? Or, should it be the responsibility of backends to look at the cache and decide whether it should be run? I think I know my preference, but I'm curious to hear your thoughts.

I personally like giving more autonomy to backends to make decisions that are best for them. Similar to my initial thought of having the dispatcher call should_run even if the input graph is a backend graph type - e.g. "maybe this particular algo on this particular graph is still not a good choice for this backend despite already being converted, even though it's technically possible", but as discussed above, I understand why we're not currently doing that. So for the question of "if a backend graph conversion is cached, should should_run still be called", my vote is 'yes'.

@rlratzel rlratzel added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 26, 2024
@rlratzel
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 226e7de into rapidsai:branch-24.06 Apr 29, 2024
141 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants