-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nx-cugraph: add k_truss and degree centralities #3945
Conversation
New algorithms: - `degree_centrality` - `in_degree_centrality` - `k_truss` - `number_of_selfloops` - `out_degree_centrality Also, rename `row_indices, col_indices` to `src_indices, dst_indices`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Erik, LGTM, I just had two requests/questions.
Also, are we getting good coverage via the NX test suite? We were bit by this for edge BC so I want to make sure we're covered.
"Consider using G.remove_edges_from(nx.selfloop_edges(G))." | ||
) | ||
# TODO: create renumbering helper function(s) | ||
if k < 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment as an overview of what's being done differently here vs. the else block and why? (eg. PLC isn't even being called, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I did leave a comment: we drop nodes with zero degree. No need to call into pylibcugraph just for that. What comment would you find helpful here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I meant a comment describing the overall goal of the entire k < 3
block vs. the else
block. I admittedly don't know much about k_truss (maybe that's the problem?), but I was hoping to get at least a rough idea why they're so different. For instance, I'm curious why plc.k_truss_subgraph()
couldn't just be used unconditionally even when k < 3, or why we wouldn't remove zero degree nodes in both cases. However, if this would require an explanation of the k_truss algo itself, then feel free to ignore this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Take a quick look at k_truss documentation:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.core.k_truss.html
See: "[...] where every edge is incident to at least k-2
triangles.". So, k < 3
is an edge case.
I'm curious why plc.k_truss_subgraph() couldn't just be used unconditionally even when k < 3
It can--we can delete this entire branch if we wanted. PLC behaves the same as networkx for 0 <= k < 3
. We do the branch b/c it is faster, and it lets us check the "stop early" condition if there are no nodes of degree zero. I like fast, and I don't think this branch is excessively complicated.
why we wouldn't remove zero degree nodes in both cases
k_truss always removes zero-degree nodes. PLC's k_truss does this for us when we call it.
I suppose I can add a couple comments to the code like I just explained here ;) --thanks for the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More detailed code comment added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that explains everything and is really helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I remember the primary reason I made this code branch in the first place and why I think it should exist for now: I was exploring renumbering and wanted to try multiple approaches. I'm sure we'll be doing more renumbering soon.
Great question. Yeah, it's pretty good. I think lines 29, 31, 43, 79 aren't covered, which doesn't concern me atm. After #3954, I think it will be a lot easier to do comparison testing with a variety of input graphs without too much effort. fwiw, I do like to have solid coverage, but our shared tooling today isn't great. It would help if I could quickly find "nx-cugraph" tests in CI. As is, I test for major coverage during development and occasionally look at coverage reports locally. |
I added |
I suppose that helps somewhat. Browsing and searching CI is pretty slow and cumbersome, so my preferred UX would to have an exandable/collapsable |
ah, that would be nice. I'll keep that in mind. I think it would require editing the yaml in the .github folder, which I'm not that familiar with, but I like that idea in general. |
/merge |
New algorithms:
degree_centrality
in_degree_centrality
k_truss
number_of_selfloops
out_degree_centrality
Also, rename
row_indices, col_indices
tosrc_indices, dst_indices