Add tests for neighbor sampling #210

Padarn · 2022-03-19T06:35:31Z

Addresses #208

This fixes the case where repeated nodes are provided to the neighbour_sampling functions. Previously, the "local graph" would essentially create a node for each of these repeats; now repeats are ignored.

This also adds tests for this op.

Remaining to do

Add heterogeneous graph tests

Padarn · 2022-03-19T06:39:04Z

I didn't do anything about the idea of creating separate graphs for each input node here. Do you think this is best added as an option in torch_sparse or in torch_geometric?

Also, I added a question on the issue this references. I thought it would be good to add some extra documentation to this function, as it's a bit hard to tell what it does, but I wasn't sure how to do this in torch_sparse cpp extensions so that it fits with the rendered documentation.

What do you think?

codecov-commenter · 2022-03-19T06:45:24Z

Codecov Report

Merging #210 (3717cac) into master (b38d0b5) will not change coverage.
The diff coverage is n/a.

❗ Current head 3717cac differs from pull request most recent head cf6ad69. Consider uploading reports for the commit cf6ad69 to get more accurate results

@@           Coverage Diff           @@
##           master     #210   +/-   ##
=======================================
  Coverage   72.32%   72.32%           
=======================================
  Files          28       28           
  Lines        1120     1120           
=======================================
  Hits          810      810           
  Misses        310      310

rusty1s

Thank you. I am personally a bit worried about the implications of this PR. In the end, this means that we no longer have a 1-to-1 mapping between node indices in input_nodes and the resulting samples vector.

It would be great to make this functionality at least optional.
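To make the mapping concern concrete, here is a minimal sketch in plain Python. It is not the torch_sparse implementation; `seed_rows_keep_repeats` and `seed_rows_dedup` are hypothetical helpers that only model how the seed part of the output lines up with `input_nodes` under each behaviour.

```python
def seed_rows_keep_repeats(input_nodes):
    """One local row per input entry, repeats included (illustrative only)."""
    return list(input_nodes)


def seed_rows_dedup(input_nodes):
    """One local row per distinct node, in first-seen order (illustrative only)."""
    seen = {}
    for node in input_nodes:
        seen.setdefault(node, len(seen))
    return list(seen)


input_nodes = [4, 7, 4]

kept = seed_rows_keep_repeats(input_nodes)
deduped = seed_rows_dedup(input_nodes)

# With repeats kept, position i of the input is also row i of the output.
assert all(kept[i] == input_nodes[i] for i in range(len(input_nodes)))

# After deduplication the output is shorter, so positions no longer line up.
print(kept)     # [4, 7, 4]
print(deduped)  # [4, 7]
```

Any caller that indexes the result by input position would need an explicit mapping once repeats are dropped, which is why making the behaviour optional matters.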

csrc/cpu/neighbor_sample_cpu.cpp

Padarn · 2022-03-19T20:45:17Z

Yeah, understood. It seems a bit dangerous to assume that this is how it is being used, even if I think the behavior is currently unclear. I don't think it's nice to add yet another input argument to the registered PyTorch op, so I would propose that either this is optional in the `pytorch_geometric` change (i.e. deduplication happens there), or I register another op that works this way (the functions in `neighbour_sample_cpu` could be reused).

Personally, although it may be less efficient, I'd be in favor of just doing this all in `pytorch_geometric`. Reasoning:

- It's perhaps slightly more expensive to dedupe separately, but maybe not too bad, and given it's new functionality there will be no impact on existing users.
- It'd be better to keep this library smaller/cleaner until it is clear this is functionality that is needed in more than one place.

Thoughts? Happy to abandon this if we're aligned.

Padarn · 2022-03-20T06:23:07Z

Actually, while working on the pytorch_geometric issue I noticed another problem with trying to implement link-based sampling using this function: there is no guarantee that, given nodes X and Y, the edge X->Y will be sampled (unless directed is set to False).

I'm thinking the best way to get somewhere quickly would be to implement the link sampling in torch/torch_geometric code (similar to the old one) and then see if we want to move some of the functionality into torch_sparse.

rusty1s · 2022-03-20T10:37:12Z

Yeah, let's do this step directly in PyG (which can be solved by a simple torch.unique call I think). With unique(return_inverse=True), you should be able to get the inverse mapping as well.
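As a minimal sketch of the `torch.unique` suggestion (the tensor values are made up for illustration), deduplicating the seed nodes and keeping the inverse mapping could look like this:

```python
import torch

# Seed nodes with a repeat (node 4 appears twice).
input_nodes = torch.tensor([4, 7, 4, 2])

# unique_nodes holds each node once (sorted by default);
# inverse[i] is the position of input_nodes[i] inside unique_nodes.
unique_nodes, inverse = torch.unique(input_nodes, return_inverse=True)

# Indexing with the inverse mapping restores the original, repeated order.
restored = unique_nodes[inverse]
assert torch.equal(restored, input_nodes)

print(unique_nodes.tolist())  # [2, 4, 7]
print(inverse.tolist())       # [1, 2, 1, 0]
```

The sampler would then only see `unique_nodes`, and `inverse` recovers one result per original input entry.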

IMO, the edge X->Y does not need to be sampled. For link-level predictions, we only care about sampling neighborhoods around X and Y to obtain their embeddings, and then try to predict the link X->Y based on them. This holds true independent of whether X and Y are originally connected or not.
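A rough sketch of that link-level flow, under stated assumptions: `edge_label_index` is an illustrative name for the candidate links, the random matrix `z` stands in for embeddings that would come from neighborhood sampling plus a GNN, and the dot-product decoder is just one common choice, not something this thread prescribes.

```python
import torch

torch.manual_seed(0)

# Candidate links as a [2, E] tensor: row 0 holds X, row 1 holds Y.
# Nothing here requires X->Y to exist in the graph.
edge_label_index = torch.tensor([[0, 3, 0],
                                 [5, 5, 3]])

# Deduplicate the endpoints; `inverse` has the same shape as the input.
seed_nodes, inverse = torch.unique(edge_label_index, return_inverse=True)

# Placeholder for "sample a neighborhood around each seed and run a GNN":
# random embeddings, one row per unique seed node (assumption, not real output).
z = torch.randn(seed_nodes.numel(), 8)

# Look up the embeddings of X and Y for every candidate link.
z_src = z[inverse[0]]
z_dst = z[inverse[1]]

# Dot-product decoder: one score per candidate link.
scores = (z_src * z_dst).sum(dim=-1)
print(scores.shape)  # torch.Size([3])
```

The score for each pair depends only on the two endpoint embeddings, so it is computed the same way whether or not the edge was among the sampled ones.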

Padarn · 2022-03-20T12:58:01Z

You're right. It seemed weird that the edge might not be sampled, but there is indeed no real reason to need this.

Thanks for the discussion and clearing things up for me!

rusty1s · 2022-03-21T07:14:55Z

@Padarn I re-opened the PR to include the test suite of neighbor_sample. Thanks a lot :)

Padarn · 2022-03-21T07:36:54Z

Welcome!

Padarn added 2 commits March 19, 2022 06:28

fixing neighbour sampling when repeated nodes

092719a

fix problem with indexing

3717cac

rusty1s reviewed Mar 19, 2022

Padarn closed this Mar 20, 2022

rusty1s reopened this Mar 21, 2022

rusty1s added 3 commits March 21, 2022 06:49

reset

672a66c

reset

f824f1f

reset

2c33b5c

rusty1s changed the title ~~fixing neighbour sampling when repeated nodes~~ Add tests for neighbor sampling Mar 21, 2022

add tests

cf6ad69

rusty1s merged commit 723039b into rusty1s:master Mar 21, 2022

RexYing pushed a commit to RexYing/pytorch_sparse that referenced this pull request Apr 26, 2022

Clean up pytorch-lightning logger (rusty1s#210)

12a931e

* initial commit: * code cov * move progressbar
