Forward-merge branch-24.12 to branch-25.02 #4785

Merged
merged 8 commits into from
Nov 26, 2024

@bdice bdice commented Nov 26, 2024

Manual forward merge from 24.12 to 25.02. This PR should not be squashed.

Closes #4782.

msarahan and others added 8 commits November 22, 2024 22:36
Enables telemetry during cugraph's build process. This parses GitHub job metadata to obtain timing information. It should have very little impact on overall build time and should not interfere with any build tools.

This implements emitting OpenTelemetry traces and spans, as described in rapidsai/build-infra#139
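
As a rough illustration of how timing information could be derived from job metadata, here is a minimal Python sketch. It is not the code added by this commit. The record shape (`name`, `started_at`, `completed_at`, `steps`), the timestamp format, and the helper names `parse_ts`, `job_to_spans`, and `duration_seconds` are all assumptions made for the example, and the spans are plain dictionaries rather than real OpenTelemetry objects.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    # Assumed timestamp format: ISO 8601 in UTC with a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def job_to_spans(job):
    """Turn one (hypothetical) job record into a parent span plus one child span per step."""
    spans = [{
        "name": job["name"],
        "parent": None,
        "start": parse_ts(job["started_at"]),
        "end": parse_ts(job["completed_at"]),
    }]
    for step in job.get("steps", []):
        spans.append({
            "name": step["name"],
            "parent": job["name"],  # child spans point back at the job span
            "start": parse_ts(step["started_at"]),
            "end": parse_ts(step["completed_at"]),
        })
    return spans


def duration_seconds(span):
    # Wall-clock duration covered by a span.
    return (span["end"] - span["start"]).total_seconds()
```

Because the spans are computed from metadata after the fact, a scheme like this adds no instrumentation to the build steps themselves, which is consistent with the low-overhead claim above.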

Authors:
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#4740
Increase max_iterations in MG HITS tests (with edge masking).

With masking, we may end up with different graphs with different numbers of GPUs; this results in higher iteration counts for convergence for certain GPU counts. Increase the maximum iteration count to account for this.
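
To show why the iteration cap matters, here is a small pure-Python HITS power iteration. It is only a sketch: it is not cuGraph's implementation, and the function name `hits`, its return shape, and the L1 normalization are assumptions chosen for the example. The number of iterations needed to reach `tol` depends on the graph, so a cap that suffices for one graph can be too low for another.

```python
def hits(edges, max_iterations=100, tol=1e-6):
    """Return (hubs, authorities, iterations_used, converged) for a directed edge list."""
    vertices = sorted({v for edge in edges for v in edge})
    if not vertices:
        return {}, {}, 0, True
    hubs = {v: 1.0 / len(vertices) for v in vertices}
    auths = {v: 0.0 for v in vertices}
    for iteration in range(1, max_iterations + 1):
        # Authority score: sum of hub scores of vertices pointing at v.
        raw_auths = {v: 0.0 for v in vertices}
        for src, dst in edges:
            raw_auths[dst] += hubs[src]
        # Hub score: sum of authority scores of vertices v points at.
        raw_hubs = {v: 0.0 for v in vertices}
        for src, dst in edges:
            raw_hubs[src] += raw_auths[dst]
        # L1-normalize both vectors so scores stay bounded.
        auth_total = sum(raw_auths.values()) or 1.0
        hub_total = sum(raw_hubs.values()) or 1.0
        auths = {v: x / auth_total for v, x in raw_auths.items()}
        new_hubs = {v: x / hub_total for v, x in raw_hubs.items()}
        diff = sum(abs(new_hubs[v] - hubs[v]) for v in vertices)
        hubs = new_hubs
        if diff < tol:
            return hubs, auths, iteration, True
    # The cap was reached before the hub vector stopped changing.
    return hubs, auths, max_iterations, False
```

A test that asserts convergence will fail whenever the cap is reached first, which is the failure mode that raising `max_iterations` avoids.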

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: rapidsai#4783
This PR includes multiple updates to cut peak memory usage in graph creation and improve performance of BFS on scale-free graphs.

* Add a bitmap for non-zero local degree vertices in the hypersparse region; this information can be used to quickly filter out locally zero degree vertices which don't need to be processed in multiple instances.
* Store (global-)degree offsets for vertices in the hypersparse region; this information can be used to quickly identify the vertices with a certain global degree (e.g. for global degree 1 vertices, we can skip inter-GPU reduction as we know each vertex has only one neighbor).
* Skip kernel invocations in computing edge counts if the vertex list is empty.
* Add asynchronous functions to compute edge counts. This helps in preventing unnecessary serialization when we can process multiple such functions concurrently.
* Replace rmm::exec_policy with rmm::exec_policy_nosync in multiple places; the former enforces stream synchronization at the end. The latter does not.
* Enforce cache line alignment in NCCL communication in multiple places (NCCL communication performance is significantly affected by cache line alignment, often leading to 30-40% or more differences).
* For primitives working on a subset of vertices, broadcast a vertex list using a bitmap if the vertex frontier size is large. If the vertex frontier size is small (in case vertex_t is 8B and the local vertex partition range can fit into 4B), use vertex offsets instead of vertices to cut communication volume.
* Merge multiple host scalar communication function calls to a single one.
* Increase multi-stream concurrency in detail::extract_transform_e & detail::per_v_transform_reduce_e
* Multiple optimizations in template specialization (for update_major == true && reduce_op == any && key type is vertex && working on a subset of vertices) in detail::per_v_transform_reduce_e (this includes pre-processing vertices with non-zero local degrees; so we don't need to process such vertices using multiple GPUs, pre-filtering of zero local degree vertices, allreduce communication to reduce shuffle communication volumes, and special treatment of global degree 1 vertices, and so on).
* Multiple optimizations & specializations in detail::fill_edge_minor_property that works on a subset of vertices (this includes kernel fusion, specialization for bitmap properties including direct broadcast to the property buffer and special treatments for vertex partition boundaries, and so on).
* Add multiple optimizations & specializations in transform_reduce_v_frontier_outgoing_e (especially for reduce_op::any, to cut communication volumes, and to filter out (key, value) pairs that won't contribute to the final results).
* Multiple low-level optimizations in direction optimizing BFS (including approximations in deciding between bottom-up and top-down).
* Multiple optimizations to cut peak memory usage in graph creation.
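
The frontier-broadcast item above (bitmap for large frontiers, 4-byte offsets for small ones) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the C++ code in this PR: the function names `encode_frontier` and `decode_frontier` are hypothetical, and the byte-count comparison used to pick a format is an assumed heuristic, since the source does not state the actual threshold.

```python
def encode_frontier(frontier, range_first, range_last, vertex_bytes=8):
    """Pick a compact wire format for a sorted frontier within [range_first, range_last)."""
    range_size = range_last - range_first
    bitmap_bytes = (range_size + 7) // 8  # one bit per vertex in the local range
    # Offsets fit in 4 bytes when the local range spans at most 2**32 vertices.
    use_offsets = vertex_bytes == 8 and range_size <= 2**32
    element_bytes = 4 if use_offsets else vertex_bytes
    list_bytes = len(frontier) * element_bytes
    if bitmap_bytes < list_bytes:  # assumed heuristic: send whichever is smaller
        bits = bytearray(bitmap_bytes)
        for v in frontier:
            offset = v - range_first
            bits[offset // 8] |= 1 << (offset % 8)
        return "bitmap", bytes(bits)
    if use_offsets:
        return "offsets", [v - range_first for v in frontier]
    return "vertices", list(frontier)


def decode_frontier(kind, payload, range_first):
    """Recover the sorted vertex list from any of the three formats."""
    if kind == "bitmap":
        return [
            range_first + 8 * i + bit
            for i, byte in enumerate(payload)
            for bit in range(8)
            if (byte >> bit) & 1
        ]
    if kind == "offsets":
        return [range_first + offset for offset in payload]
    return list(payload)
```

With 8-byte vertex IDs, a dense frontier costs one bit per vertex in the range as a bitmap, while a sparse one costs 4 bytes per vertex as offsets instead of 8 bytes as raw IDs.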

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: rapidsai#4751
cugraph no longer depends on cugraph_ops and no longer pulls files from the cugraphops repo. However, two `assert` statements still assume that cugraphops files are available. These assert statements are compiled only in debug mode. This PR fixes the debug-build errors caused by these assert statements.

Closes rapidsai#4763

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Joseph Nke (https://github.com/jnke2016)

URL: rapidsai#4774
…e`, minor cleanup (rapidsai#4776)

* updates READMEs to remove outdated nx-cugraph text
* updates `core_number` docs, APIs, tests to properly ignore `degree_type` due to `core_number` not supporting directed graphs which `degree_type` is intended for - `degree_type` settings will be honored when directed graphs are supported.
* renames test helper function for clarity
* fixes issue with datasets API to properly recreate the edgelist for MG (dask) if previously created for SG.
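
For readers unfamiliar with the `core_number` item above, the following pure-Python sketch computes core numbers on an undirected graph by repeated peeling, accepting a `degree_type` argument that it deliberately ignores. It is a hypothetical stand-in written for illustration, not the cuGraph API or its implementation; the adjacency-dict input format is an assumption.

```python
def core_number(adjacency, degree_type="bidirectional"):
    """Core number of each vertex of an undirected graph given as {vertex: set(neighbors)}.

    `degree_type` is accepted but ignored, mirroring the behavior described
    above for undirected graphs (assumption: this sketch only models that case).
    """
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

Calling it with any `degree_type` value returns the same result, which is the property the updated tests are described as checking.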

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Don Acosta (https://github.com/acostadon)
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Brad Rees (https://github.com/BradReesWork)

URL: rapidsai#4776
I noticed the `Traversals` table is not showing up in the [nx-cugraph docs](https://docs.rapids.ai/api/cugraph/stable/nx_cugraph/supported-algorithms/). `sphinx-lint` catches the underlying issue, and it also caught a couple other minor issues that this PR fixes.

`sphinx-lint` is used by other notable repos such as `pandas` ([here](https://github.com/pandas-dev/pandas/blob/7fe270c8e7656c0c187260677b3b16a17a1281dc/.pre-commit-config.yaml#L92-L96)) and CPython ([here](https://github.com/python/cpython/blob/8fe1926164932f868e6e907ad72a74c2f2372b07/.pre-commit-config.yaml#L68-L73)).

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Ralph Liu (https://github.com/nv-rliu)
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#4771
@bdice bdice requested review from a team as code owners November 26, 2024 00:13
@bdice bdice added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 26, 2024
@bdice bdice requested a review from KyleFromNVIDIA November 26, 2024 00:13
@raydouglass raydouglass merged commit c5d3d23 into rapidsai:branch-25.02 Nov 26, 2024
62 of 73 checks passed
Labels
CMake cuGraph improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
6 participants