nx-cugraph: add weakly connected components #4071

Merged
merged 8 commits into from
Jan 17, 2024

eriknw
Contributor

@eriknw eriknw commented Dec 28, 2023

This doesn't currently work, because plc.weakly_connected_components only works on symmetric graphs (so it's not actually performing wcc now is it?):

RuntimeError: non-success value returned from cugraph_weakly_connected_components: CUGRAPH_UNKNOWN_ERROR cuGraph failure at file=[...]/cugraph/cpp/src/components/weakly_connected_components_impl.cuh line=283: Invalid input argument: input graph should be symmetric for weakly connected components.

These are high-priority algorithms for nx-cugraph, because they are widely used by networkx dependents.

@eriknw eriknw added DO NOT MERGE Hold off on merging; see PR for details python labels Dec 28, 2023
@eriknw eriknw requested a review from a team as a code owner December 28, 2023 11:19
@eriknw eriknw added the improvement Improvement / enhancement to an existing function label Dec 28, 2023
@ChuckHastings
Collaborator

So, I suppose this goes to your definition of weakly connected components. Networkx defines this as you describe:

  • components is what we have implemented and finds the connected components of an undirected graph
  • weakly_connected_components operates only on a directed graph but computes as if the graph were undirected
  • strongly_connected_components operates only on a directed graph and actively considers direction

A quick scan of the literature suggests that, while this is perhaps a good labeling (it matches Knuth, always a plus in my mind), it is hardly universal.

Our implementation at the C++ level computes components as defined by networkx. To get weakly_connected_components as defined by networkx, you would need to symmetrize the edge list, which gives exactly the correct answer. The networkx implementation simply traverses both the incoming and outgoing edges instead of just one direction; by symmetrizing the graph we get all of the edges going in both directions, so we can do it in one pass.

We don't really have an efficient way of traversing the edges backwards from whatever orientation that we have. Our CSR/DCSR representation only efficiently reflects one direction.
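To illustrate the point about symmetrizing, here is a minimal pure-Python sketch. It is not cuGraph or nx-cugraph code; the helper names `symmetrize` and `components_out_edges_only` are hypothetical. The traversal only follows out-edges, loosely mimicking a one-direction adjacency structure, so it finds the weakly connected components only once the edge list has been symmetrized.

```python
from collections import defaultdict, deque


def symmetrize(edges):
    """Return the edge list with every edge present in both directions."""
    forward = set(edges)
    backward = {(v, u) for u, v in forward}
    return sorted(forward | backward)


def components_out_edges_only(nodes, edges):
    """Group nodes by BFS that follows out-edges only (one-direction adjacency)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components


nodes = [0, 1, 2, 3]
directed_edges = [(1, 0), (1, 2)]  # 0 <- 1 -> 2, node 3 is isolated

# Traversing the directed edges as-is splits node 0 from nodes 1 and 2.
naive = components_out_edges_only(nodes, directed_edges)

# After symmetrizing, the same one-direction traversal finds the weak components.
wcc = components_out_edges_only(nodes, symmetrize(directed_edges))
```

In this toy graph the unsymmetrized traversal never reaches node 0 from node 1's component in the reverse direction, which is the behavior the symmetrization step is meant to fix.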

We have several options we can pursue:

  1. You could symmetrize the input edge list prior to creating the graph (less work for me, but probably not a great long term solution)
  2. We have a C++ function that will do this efficiently. We could expose this function via the C API
  3. We could add a parameter to the graph construction to let the graph construction call know that we should symmetrize the input prior to constructing the graph.
  4. We could add logic to the C API that would symmetrize the graph locally for just the WCC call. If you were to call our WCC algorithm with a directed graph we would execute this logic. The downside of this is that we would double the memory used, since we would presumably have to maintain the unsymmetrized graph as well as the symmetrized graph. This seems like a bad option.

I'm leaning toward option 3, but am open to other options.

@eriknw eriknw removed the DO NOT MERGE Hold off on merging; see PR for details label Jan 5, 2024
@eriknw
Contributor Author

eriknw commented Jan 5, 2024

Thanks for the thoughtful and helpful reply @ChuckHastings!

I went ahead with option 1 where we symmetrize in Python before creating the PLC graph. This is probably good enough for a while. Nevertheless, I did it in a way that will let us easily switch to options 2 or 3 if/when available.

I chose the keyword argument symmetrize="union" (used by WCC) and symmetrize="intersection" (will probably use for another algorithm soon).
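As a rough illustration of what the two keyword values could mean, here is a small sketch. The interpretation is an assumption on my part: "union" keeps an edge if it exists in either direction, and "intersection" keeps it only if it exists in both. The function `symmetrize_edges` is hypothetical and is not the nx-cugraph implementation.

```python
def symmetrize_edges(edges, how="union"):
    """Hypothetical helper: symmetrize a directed edge list.

    Assumed semantics (not confirmed by the PR discussion):
      "union"        -> keep (u, v) and (v, u) if either direction exists
      "intersection" -> keep (u, v) and (v, u) only if both directions exist
    """
    forward = set(edges)
    backward = {(v, u) for u, v in forward}
    if how == "union":
        result = forward | backward
    elif how == "intersection":
        result = forward & backward
    else:
        raise ValueError(f"unknown symmetrize option: {how!r}")
    return sorted(result)


edges = [(0, 1), (1, 0), (1, 2)]  # 0 <-> 1 is reciprocated, 1 -> 2 is not
union_edges = symmetrize_edges(edges, "union")
intersection_edges = symmetrize_edges(edges, "intersection")
```

Under these assumed semantics, the "union" result is what WCC needs, since any edge in either direction should connect its endpoints.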

Related topic: is SCC on your radar to do any time soon? I'm less familiar with SCC algorithms and literature, but maybe https://doi.org/10.14778/2733085.2733089

@eriknw eriknw changed the title from "nx-cugraph: add weakly connected components (PLC needs updated!)" to "nx-cugraph: add weakly connected components" Jan 5, 2024
@ChuckHastings
Collaborator

> Related topic: is SCC on your radar to do any time soon? I'm less familiar with SCC algorithms and literature, but maybe https://doi.org/10.14778/2733085.2733089

We have a low priority activity exploring SCC. Constructing a good algorithm for the GPU is complicated as the best serial algorithms use depth-first search which doesn't parallelize well. I wouldn't expect a good implementation this year unless we find some customer that urgently wants it.
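For context on why depth-first search is the sticking point, here is a sketch of Kosaraju's algorithm, one classic DFS-based serial approach to SCC. It is chosen purely for illustration and is not what cuGraph implements or plans to implement. Both passes depend on visiting nodes in a strict depth-first order, which is the part that is hard to spread across many GPU threads.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm with iterative DFS (serial, illustration only)."""
    adj = defaultdict(list)   # forward adjacency
    radj = defaultdict(list)  # reversed adjacency
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: DFS on the original graph, recording nodes in finish order.
    visited = set()
    order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                # All neighbors explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, in decreasing finish order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in radj[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    comp.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # cycle 0 -> 1 -> 2 -> 0, plus 2 -> 3
sccs = strongly_connected_components(nodes, edges)
```

The finish order produced by the first pass is inherently sequential, so the second pass cannot start until the first has completed.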

Contributor

@rlratzel rlratzel left a comment

LGTM. I did point out something this PR might need from this PR that might make it worth waiting for and updating.

@rlratzel rlratzel added the non-breaking Non-breaking change label Jan 8, 2024
Contributor

@rlratzel rlratzel left a comment

LGTM. I had one question below which need not hold up approval.

def number_strongly_connected_components(G):
    G = _to_directed_graph(G)
    if G.src_indices.size == 0:
        return len(G)
Contributor

just curious, no action needed: is this expected to always return 0 in this case? If so, is there a reason calling len() is preferred over just returning 0?

Contributor Author

Maybe I should use G.number_of_edges() instead of G.src_indices.size (but for some reason the latter is easier for me to remember and reason about). Anyway, if the number of edges is zero, then the number of components is the number of nodes, hence we can't simply return 0.

Contributor Author

I may update to use number_of_edges lots of places for clarity in a different PR. I agree this shouldn't hold up this PR.

Contributor Author

Oh I see, number_of_edges actually does a lot more work. If we want to know if there are exactly 0 edges, G.src_indices.size works great.
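The zero-edge shortcut discussed in this thread can be sketched in plain Python. This is only an illustration with a hypothetical `number_connected_components` function over an undirected edge list, not the nx-cugraph code; the same reasoning applies to the strongly connected case, since with no edges every node is its own component.

```python
def number_connected_components(num_nodes, edges):
    """Count components of an undirected graph given as an edge list."""
    # With no edges, every node is its own component, so the answer is the
    # node count (which is 0 only when the graph has no nodes at all).
    if len(edges) == 0:
        return num_nodes

    # Otherwise merge endpoints with a simple union-find.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    count = num_nodes
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            count -= 1
    return count


empty_graph = number_connected_components(0, [])
isolated_nodes = number_connected_components(3, [])
one_edge = number_connected_components(3, [(0, 1)])
```

Returning the node count rather than a literal 0 keeps the early exit correct for graphs that have nodes but no edges.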

@rlratzel
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 8672534 into rapidsai:branch-24.02 Jan 17, 2024
98 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jan 19, 2024
NetworkX tests are somewhat underspecified regarding how to handle self-loops for these algorithms. Also, I'm not sure if transitivity is supposed to work on directed graphs.

Once #4071 is merged, it should be easy to add `is_bipartite` function (and maybe others?).

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4093
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
3 participants