Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't store redundant columns in PropertyGraph Dataframes #2449

Merged
merged 9 commits into from
Aug 2, 2022

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Jul 27, 2022

The main purpose of this is to reduce memory usage.

Closes #2400

I still need to update MG tests.

I'll also remove the in-code assertions, since they won't always be True, because a column name could have previously been used as a property. Nevertheless, seeing these assertions pass should give us warm-fuzzies :)

Closes rapidsai#2400

I still need to update MG tests.

I'll also remove the in-code assertions, since they won't always be True,
because a column name could have previously been used as a property.
Nevertheless, seeing these assertions pass should give us warm-fuzzies :)
@eriknw eriknw requested a review from a team as a code owner July 27, 2022 04:39
@codecov-commenter
Copy link

codecov-commenter commented Jul 27, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@93e15c0). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08    #2449   +/-   ##
===============================================
  Coverage                ?   61.15%           
===============================================
  Files                   ?      106           
  Lines                   ?     5543           
  Branches                ?        0           
===============================================
  Hits                    ?     3390           
  Misses                  ?     2153           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93e15c0...34f441e. Read the comment docs.

@BradReesWork BradReesWork added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 27, 2022
@BradReesWork BradReesWork added this to the 22.08 milestone Jul 27, 2022
@eriknw
Copy link
Contributor Author

eriknw commented Jul 28, 2022

One thing I don't like about the current state of this PR is that it makes the user use our column names for source, dest, etc. in the query string.

What if "_SRC_ == 4" was instead specified as e.g. "{src} == 4" (or f"{{src}} == 4" if the user was using an f-string). Any other syntax or spelling we should consider?

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There might be places in here that will be affected by the outcome of how we want to handle the "internal" column names in output and tests, but no need to change anything now.

Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have requested changes to drop inpace for cudf columns. LGTM otherwise.

python/cugraph/cugraph/structure/property_graph.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/structure/property_graph.py Outdated Show resolved Hide resolved
eriknw and others added 2 commits July 28, 2022 22:50
Co-authored-by: Vibhu Jawa <[email protected]>
Co-authored-by: Vibhu Jawa <[email protected]>
Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Erik

@eriknw
Copy link
Contributor Author

eriknw commented Jul 30, 2022

Test failures are now the same as seen in other PRs:

    from cupyx.scipy.sparse.coo import coo_matrix as cp_coo_matrix
E   ModuleNotFoundError: No module named 'cupyx.scipy.sparse.coo'

@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ac42e0b into rapidsai:branch-22.08 Aug 2, 2022
eriknw added a commit to eriknw/cugraph that referenced this pull request Aug 5, 2022
Updating these tests was missed in rapidsai#2449

Also, MG tests are finally working for me 🎉
rapids-bot bot pushed a commit that referenced this pull request Aug 5, 2022
Updating these tests was missed in #2449

Also, MG tests are finally working for me 🎉

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #2511
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Do not store redundant columns in _edge_prop_dataframe and _vertex_prop_dataframe
7 participants