Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Optimize PG.add_data #2924

Merged
merged 3 commits into from
Nov 15, 2022

Conversation

VibhuJawa
Copy link
Member

@VibhuJawa VibhuJawa commented Nov 15, 2022

This PR fixes #2903 .

We reduce the memory foot print by 3.5x and speeds up the add_data by 557x and also allows us to not be limited in the size of edges we can save. (Time is in seconds vs ms)

Before PR:

Name (time in s, mem in bytes)       Mean                  GPU mem            GPU Leaked mem            Rounds            GPU Rounds          
----------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edge_data[15000000]      2.3044 (1.0)      2,160,000,064 (1.0)                   0 (1.0)           1           1
bench_add_edge_data[30000000]      4.7941 (2.08)     4,320,000,064 (2.00)                  0 (1.0)           1           1
bench_add_edge_data[60000000]      8.7235 (3.79)     8,640,000,064 (4.00)                  0 (1.0)           1           1
bench_add_edge_data[120000000]  FAILED
----------------------------------------------------------------------------------------------------------------------------------------------

After PR

-------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------
Name (time in ms, mem in bytes)        Mean                  GPU mem            GPU Leaked mem            Rounds            GPU Rounds          
------------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edge_data[15000000]       16.3785 (1.0)        615,000,080 (1.0)                   0 (1.0)           1           1
bench_add_edge_data[30000000]       17.3631 (1.06)     1,230,000,080 (2.00)                  0 (1.0)           1           1
bench_add_edge_data[60000000]       22.2947 (1.36)     2,460,000,080 (4.00)                  0 (1.0)           1           1
bench_add_edge_data[120000000]      26.9747 (1.65)     4,920,000,080 (8.00)                  0 (1.0)           1           1
------------------------------------------------------------------------------------------------------------------------------------------------

@VibhuJawa VibhuJawa requested a review from a team as a code owner November 15, 2022 02:12
@VibhuJawa VibhuJawa added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Nov 15, 2022
@@ -909,7 +910,8 @@ def add_edge_data(
if self.__series_type is cudf.Series:
# cudf does not yet support initialization with a scalar
tmp_df[TCN] = cudf.Series(
np.repeat(type_name, len(tmp_df)), index=tmp_df.index, dtype=cat_dtype
cudf.Series([type_name], dtype=cat_dtype).repeat(len(tmp_df)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one doing this operation on GPU.

@eriknw
Copy link
Contributor

eriknw commented Nov 15, 2022

Very nice. LGTM.

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.12@1f346de). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.12    #2924   +/-   ##
===============================================
  Coverage                ?   60.81%           
===============================================
  Files                   ?      122           
  Lines                   ?     6891           
  Branches                ?        0           
===============================================
  Hits                    ?     4191           
  Misses                  ?     2700           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@VibhuJawa VibhuJawa added this to the 22.12 milestone Nov 15, 2022
@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 605e2b5 into rapidsai:branch-22.12 Nov 15, 2022
rapids-bot bot pushed a commit that referenced this pull request Mar 28, 2023
This PR depends upon #2924 . 

We add `Graph-Sage` example to claim `dgl` , `cugraph` integration. 

Waiting on SWIPAT before merging

CC: @TristonC

Authors:
  - Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
  - Xiaoyun Wang (https://github.com/wangxiaoyunNV)
  - Rick Ratzel (https://github.com/rlratzel)
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Tingyu Wang (https://github.com/tingyu66)

URL: #2925
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: PG creates string series instead of categorical series leading to memory overhead
5 participants