Add vector properties #2882
@eriknw the code looks good, but I do not see any tests associated with the changes.
Codecov Report

    @@            Coverage Diff             @@
    ##    branch-22.12    #2882    +/-    ##
    ==========================================
      Coverage         ?     60.35%
    ==========================================
      Files            ?        122
      Lines            ?       7158
      Branches         ?          0
    ==========================================
      Hits             ?       4320
      Misses           ?       2838
      Partials         ?          0
@eriknw here are my responses to your questions:
Very helpful @alexbarghi-nv, thank you. My replies:
Perfect. This is as implemented.
I added a
K. I'm happy punting on this too and to give users the native object.
"Expand" to return a dataframe instead of an array, which can kind of match how data comes into PropertyGraph. I don't think this is necessary, since it's easy to go from an array to a dataframe.
Sounds good, but I don't want it on by default. Users should specify they want a vector property. I think I want to support this by adding another keyword We could squeeze this functionality into Edit: upon further thought, I think the most sense for
Agreed.
Alright, I think the API and functionality are complete and include feedback to:
I would like to add a couple more tests to wrap this up.
Small change, looks good otherwise
    try:
        fill = list(fillvalue)
    except Exception:
        fill = [fillvalue] * length
Can we avoid using list here and use numpy, cupy, or another array library instead, please?
I think we need to use list here; it's the only way I could get it to work.
Gotcha, I think that's fine for now. We can revisit this when we start perf tuning if it becomes a problem.
👍
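The fallback in the snippet under review can be sketched in isolation. This is only an illustration: `normalize_fill` is a hypothetical helper name that does not appear in this PR, it catches the narrower `TypeError` where the reviewed code catches `Exception`, and the final length check is an added assumption.

```python
def normalize_fill(fillvalue, length):
    """Return a list of `length` fill values (hypothetical helper)."""
    try:
        # Iterable fill values (list, tuple, array) are used element-wise.
        fill = list(fillvalue)
    except TypeError:
        # Scalars are not iterable, so repeat the scalar for every element.
        fill = [fillvalue] * length
    # Assumption: an iterable fill value must match the vector length.
    if len(fill) != length:
        raise ValueError(f"expected {length} fill values, got {len(fill)}")
    return fill
```

A scalar such as `0` is broadcast to every position, while a sequence such as `(1, 2)` is taken as given. Note that a string is iterable, so it would be split into characters under this sketch.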
LGTM.
Thanks for working on this, Erik. I expect some speedup with this PR on the workflow below: #2925
rerun tests
rerun tests
@gpucibot merge
WIP. This probably closes #2721.
This uses cudf List dtypes to store vectors. When converting a vector column to arrow, the data appears to be on host, so it's unclear how many copies and moves of the data we're doing, but I don't think we have many easy alternatives besides relying on what cudf gives us. In pandas, vector properties are object dtype and stored as numpy arrays.
I think it makes sense to require a vector property to have the same length every time it is added (i.e., if it's added multiple times).
We may want to add a method to convert a vector property to a numpy or cupy array.
When getting data, should we allow vector properties to be expanded?
Can we create a graph with vector property data?
Should we add a keyword argument to add_vertex_data to say "use all (or the rest) of columns as a vector property of this name"?
Should we allow vector properties to come in already as cudf List dtype?
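The open questions above (packing columns into a vector property, requiring a consistent length, and expanding a vector property back into columns) can be illustrated with plain Python lists standing in for dataframe columns. This is a minimal sketch under stated assumptions: `pack_columns`, `check_vector_length`, and `expand_vectors` are hypothetical names and are not part of the PropertyGraph API or of this PR.

```python
def pack_columns(columns, names):
    """Combine the named scalar columns into one vector per row (hypothetical)."""
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    # zip(*cols) walks the columns row by row; each row becomes one vector.
    return [list(row) for row in zip(*(columns[name] for name in names))]


def check_vector_length(vectors, expected=None):
    """Verify every vector has the same length and return that length."""
    for vec in vectors:
        if expected is None:
            expected = len(vec)
        elif len(vec) != expected:
            raise ValueError(f"expected length {expected}, got {len(vec)}")
    return expected


def expand_vectors(vectors, prefix):
    """Expand a vector property into one column per element (hypothetical)."""
    length = check_vector_length(vectors)
    if length is None:  # no rows, so there is nothing to expand
        return {}
    return {f"{prefix}_{i}": [vec[i] for vec in vectors] for i in range(length)}


columns = {"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]}
vectors = pack_columns(columns, ["x", "y", "z"])
expanded = expand_vectors(vectors, "feat")
```

Passing a known length as `expected` to `check_vector_length` is one way a later addition of the same property could be validated against the length established by the first.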