-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add many graph generators to nx-cugraph #3954
Conversation
Also, better handle dtypes for edge values passed to pylibcugraph, which only takes float32 and float64 atm.
Currently, node values aren't used for any values, the only thing they are used for is converting to and from networkx, which we do just fine.
I just updated this PR to allow node values to be numpy arrays (not just cupy arrays) so we can handle str and object dtypes. The only thing we do with node values is convert to/from networkx, which we do just fine. |
Switched to Draft, b/c some work is needed to pass networkx tests. This is 95% finished and can be probably be (mostly) reviewed. |
This is now passing CI :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Work on this PR also resulted in the following NetworkX PRs:
- Fix names of small graphs networkx/networkx#7055
- Fix error message for
nx.mycielski_graph(0)
networkx/networkx#7056 - Disallow negative number of nodes in
complete_multipartite_graph
networkx/networkx#7057
and to make our tests pass using dev version of NetworkX, we need one of these PRs:
# from . import convert_matrix | ||
# from .convert_matrix import * | ||
from . import convert_matrix | ||
from .convert_matrix import * | ||
|
||
# from . import generators | ||
# from .generators import * | ||
from . import generators | ||
from .generators import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Told you this would be coming soon ;)
#3848 (comment)
node_values: dict[AttrKey, cp.ndarray[NodeValue]] | ||
node_masks: dict[AttrKey, cp.ndarray[bool]] | ||
node_values: dict[AttrKey, any_ndarray[NodeValue]] | ||
node_masks: dict[AttrKey, any_ndarray[bool]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how disruptive it would be to allow numpy arrays for edge values too (in a new PR). It may not be bad at all, and we can raise if it has an incompatible dtype.
It could certainly be handy, but supporting this more broadly raises many other questions--how do we let users control whether data is numpy or cupy (for example, when converting from networkx)?
def add_nodes_from(self, nodes_for_adding: Iterable[NodeKey], **attr) -> None: | ||
if self._N != 0: | ||
raise NotImplementedError( | ||
"add_nodes_from is not implemented for graph that already has nodes." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was added to be compatible with a networkx decorator, which is why it's only partially implemented.
... | ||
# Should we warn? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we warn if integer values are being (safely, if no operations are done) converted to float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought would be "no", but I don't have a strong opinion. I'm guessing this is possibly a warning many users would see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can warn if we ever have any algorithms that could result in different results. Right now we don't, so let's not warn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a code comment based on my previous comment.
@@ -541,12 +625,60 @@ def _get_plc_graph( | |||
do_expensive_check=False, | |||
) | |||
|
|||
def _sort_edge_indices(self, primary="src"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came in handy twice when working on this PR:
- to generate the small and classic graph datasets
- when testing generators in networkx
It's optionally used by nxcg.to_networkx
.
# We need to renumber indices--np.searchsorted to the rescue! | ||
kwargs["id_to_key"] = nodes.tolist() | ||
src_indices = cp.array(np.searchsorted(nodes, src_array), index_dtype) | ||
dst_indices = cp.array(np.searchsorted(nodes, dst_array), index_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.searchsorted
is magical here to do renumbering for us (after doing np.unique
)!
} | ||
kwargs["edge_values"] = edge_values | ||
|
||
if graph_class.is_multigraph() and edge_key is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the multigraph tests in networkx all seem to use string edge values, which we don't support, so this isn't well covered (yet).
_IS_NX32_OR_LESS = nx.__version__[:3] <= "3.2" and ( | ||
len(nx.__version__) <= 3 or not nx.__version__[3].isdigit() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't pretty... I wonder whether we should just depend on packaging
(already a test dependency) to make this easier if we expect to do different things w.r.t. networkx versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards just keeping this in place over adding a dependency.
def _ensure_int(n): | ||
"""Ensure n is integral.""" | ||
return op.index(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this trick elsewhere in the code; I'll probably add a better error message here and use this more places, which captures intent better than op.index(n)
(I always add a comment to capture intent).
else: | ||
graph_class = G.__class__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure if it's a good idea for us to support instances of nxcg.Graph
as create_using=
with networkx semantics... but they are networkx semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in reviewing. Looks good, I just had a few comments/questions. I think the theme of my review is mainly related to testing.
... | ||
# Should we warn? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought would be "no", but I don't have a strong opinion. I'm guessing this is possibly a warning many users would see?
edge_dtype = np.dtype(edge_dtype) | ||
if edge_array.dtype != edge_dtype: | ||
edge_array = edge_array.astype(edge_dtype) | ||
# PLC doesn't handle int edge weights right now, so cast int to float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any tests for these new conditions too? It might be nice to see the size limits verified with test code that looks like what a user would try to use for creating a graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No direct tests yet, but iirc this might get exercised by networkx tests. Wouldn't it be nice to have coverage reports that could answer easily this question 😉 ?
In general, we are largely blazing a path ahead, implementing fast, and trying to write good code, and tests are slow to catch up. You're welcome to write some if you'd like ;) . I expect proactive testing will catch up more next year, since maybe we rely on networkx tests too heavily. For now, "best effort" + networkx tests + add regression tests for regressions is my strategy. Convenient coverage reports would be handy to reveal completely untested code.
_IS_NX32_OR_LESS = nx.__version__[:3] <= "3.2" and ( | ||
len(nx.__version__) <= 3 or not nx.__version__[3].isdigit() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards just keeping this in place over adding a dependency.
create_using=None, | ||
edge_key=None, | ||
): | ||
"""cudf.DataFrame inputs also supported.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this since I think this would be good coverage, especially for cudf and "pam"
Ooh, networkx recently added |
/merge |
Also, better handle dtypes for edge values passed to pylibcugraph, which only takes float32 and float64 atm.
I also defined
index_dtype
(currently int32) to globally control the dtype of indices.