Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for neighbor sampling #210

Merged
merged 6 commits into from
Mar 21, 2022

Conversation

Padarn
Copy link
Contributor

@Padarn Padarn commented Mar 19, 2022

Addresses #208

This fixes the case where repeated nodes are provided to the neighbour_sampling functions. Previously, the "local graph" would essentially create a node for each of these repeats, now repeats are ignored.

Additionally adding tests for this op.

Remaining to do

  • Add heterogeneous graph tests

@Padarn
Copy link
Contributor Author

Padarn commented Mar 19, 2022

I didn't do anything about the idea of creating separate graphs for each input node here: Do you think this is best added as an option in torch_sparse or in torch_geometric

Also I added a question on the issue this references. I thought it would be good to add some extra documentation to this function as its a bit hard to tell what it does, but I wasn't sure how to do this in torch_sparse cpp extensions so that it fits with the rendered documentation.

What do you think?

@codecov-commenter
Copy link

codecov-commenter commented Mar 19, 2022

Codecov Report

Merging #210 (3717cac) into master (b38d0b5) will not change coverage.
The diff coverage is n/a.

❗ Current head 3717cac differs from pull request most recent head cf6ad69. Consider uploading reports for the commit cf6ad69 to get more accurate results

@@           Coverage Diff           @@
##           master     #210   +/-   ##
=======================================
  Coverage   72.32%   72.32%           
=======================================
  Files          28       28           
  Lines        1120     1120           
=======================================
  Hits          810      810           
  Misses        310      310           

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Owner

@rusty1s rusty1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I am personally a bit worried about the implications of this PR. In the end, this means that we no longer have a 1-to-1 mapping between node indices in input_nodes and the resulting samples vector.

It would be great to make this functionality at least optional.

csrc/cpu/neighbor_sample_cpu.cpp Outdated Show resolved Hide resolved
csrc/cpu/neighbor_sample_cpu.cpp Outdated Show resolved Hide resolved
csrc/cpu/neighbor_sample_cpu.cpp Outdated Show resolved Hide resolved
@Padarn
Copy link
Contributor Author

Padarn commented Mar 19, 2022 via email

@Padarn
Copy link
Contributor Author

Padarn commented Mar 20, 2022

Actually while working on the pytorch_geometric issue I noticed another problem with trying to implement link-based sampling using this function: There is no guarantee that given nodes X and Y, the edge X->Y will be sampled (unless directed is set to False)

I'm thinking the best way to get somewhere quickly would be to implement the link sampling in torch/torch_geometric code (similar to the old one) and then see if we want to move some of the functionality into torch_sparse.

@rusty1s
Copy link
Owner

rusty1s commented Mar 20, 2022

Yeah, let's do this step directly in PyG (which can be solved by a simple torch.unique call I think). With unique(return_inverse=True), you should be able to get the inverse mapping as well.

IMO, the edge X->Y do not need to be sampled. For link-level predictions, we only care of about sampling neighborhoods around X and Y to obtain their embeddings, and then try to predict the link X->Y based on it. This holds true independent of whether X and Y are originally connected or not.

@Padarn
Copy link
Contributor Author

Padarn commented Mar 20, 2022

You're right. It seemed weird the edge not being sampled, but there is indeed no real reason to need this...

Thanks for the discussion and clearing things up for me!

@Padarn Padarn closed this Mar 20, 2022
@rusty1s rusty1s reopened this Mar 21, 2022
@rusty1s rusty1s changed the title fixing neighbour sampling when repeated nodes Add tests for neighbor sampling Mar 21, 2022
@rusty1s
Copy link
Owner

rusty1s commented Mar 21, 2022

@Padarn I re-opened the PR to include the test suite of neighbor_sample. Thanks a lot :)

@rusty1s rusty1s merged commit 723039b into rusty1s:master Mar 21, 2022
@Padarn
Copy link
Contributor Author

Padarn commented Mar 21, 2022

Welcome!

RexYing pushed a commit to RexYing/pytorch_sparse that referenced this pull request Apr 26, 2022
* initial commit:

* code cov

* move progressbar
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants