Added DeepGCNs and DeeperGCN #1403

Merged
merged 15 commits into pyg-team:master on Jul 21, 2020
Conversation

@lightaime (Contributor) commented Jul 5, 2020

Hi @rusty1s. Following Issue #847 and Issue #684, I added the implementation of DeepGCNs: GenMessagePassing, GENConv, MsgNorm, an ogbn-proteins example, and activation/normalization layer helpers.

Related papers:
DeepGCNs: Can GCNs Go as Deep as CNNs?
DeeperGCN: All You Need to Train Deeper GCNs

Related project:
https://github.com/lightaime/deep_gcns_torch

@rusty1s (Member) commented Jul 5, 2020

Thank you very much for this wonderful contribution. I will have a closer look ASAP.

@lightaime (Contributor, Author)

Thanks for your quick response! I would like to mention that the class GenMessagePassing(MessagePassing) is implemented in a somewhat tricky way, by hacking the MessagePassing class to support softmax (SoftMax), softmax_sg (SoftMax with semi-gradient), and power (PowerMean) aggregations. It would be great if you have a better idea for improving it.
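For illustration, a minimal sketch of the idea (not the exact code in this PR) could look roughly like the following. The class name `SoftmaxAggrConv` and the learnable `beta` parameter are placeholders, and it assumes PyG's `MessagePassing` interface together with `torch_geometric.utils.softmax` and `torch_scatter.scatter`:

```python
# Rough sketch only: a conv that overrides `aggregate` to use a learnable
# softmax aggregation instead of the built-in add/mean/max reductions.
# `SoftmaxAggrConv` is a hypothetical name, not a class from this PR.
import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import softmax
from torch_scatter import scatter


class SoftmaxAggrConv(MessagePassing):
    def __init__(self, beta: float = 1.0, learn_beta: bool = True):
        super().__init__(aggr='add')  # `aggr` is unused; `aggregate` is overridden below
        self.beta = torch.nn.Parameter(torch.tensor(beta),
                                       requires_grad=learn_beta)

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    def message(self, x_j):
        return x_j

    def aggregate(self, inputs, index, dim_size=None):
        # Channel-wise softmax over the incoming messages of each target node.
        alpha = softmax(self.beta * inputs, index, num_nodes=dim_size)
        # Weighted sum of the messages, grouped by target node.
        return scatter(alpha * inputs, index, dim=self.node_dim,
                       dim_size=dim_size, reduce='sum')
```

Roughly speaking, the softmax_sg variant would detach the softmax weights before the weighted sum, and the power variant would replace the softmax weighting with a learnable power-mean.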

@rusty1s (Member) commented Jul 5, 2020

I don't think the implementation is hacky at all. Overriding aggregate is the way to go here. I'm just not sure whether this needs its own class. We can either integrate this directly into MessagePassing or GENConv.

@lightaime (Contributor, Author) commented Jul 5, 2020

Yes, I agree with you. I would prefer to integrate it into MessagePassing so that all graph conv layers can use these new aggregation functions as alternatives. But that might make MessagePassing too heavy, so I chose to keep them separate. I'm not sure whether that is a good idea.

@lightaime (Contributor, Author)

Hi @rusty1s. Sorry for the messy code. It looks like it took you a lot of effort to reformat it. Please let me know how I can help with the clean-up.

@rusty1s (Member) commented Jul 20, 2020

No, that's not messy code at all. I'm sorry I haven't merged this yet (I was really busy over the last week), but I'm planning to merge it by tomorrow.

@rusty1s (Member) commented Jul 20, 2020

Hi @lightaime, based on your code example, I created a more lightweight one, see here. It would be great if you could give me some suggestions on what can be improved and whether it matches the original architecture. Note that I needed to decrease the number of layers and increase the number of clusters in order to train it myself on an 11GB GPU.

@lightaime (Contributor, Author) commented Jul 20, 2020

Hi @rusty1s. Thanks for the clean reimplementation. I checked your code. The main differences are the res+ block and the training graph sampler.

For the res+ block, we implemented a pre-activation residual connection, which means the residual connection is applied before the activation function. See our original implementation for the details. We find that the order of the skip connection components is very important: by simply adjusting the order, we can improve performance by a considerable margin, and it is non-trivial to get it right. That is why I implemented this module for every skip connection variant we studied in our papers. In your example, only res+ is shown; I think it would be nice to also have examples for our original residual connection and the dense connection in PyG.
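To make the ordering concrete, here is a rough sketch of the res+ (pre-activation) ordering, not our exact code; `conv`, `norm`, and `act` are assumed to be any PyG graph conv layer, a normalization layer, and an activation module:

```python
# Sketch of the "res+" (pre-activation) ordering described above; not the
# exact code from our repository. `conv`, `norm`, and `act` are assumptions:
# any graph conv layer, a normalization layer, and an activation module.
import torch.nn.functional as F

def res_plus_block(x, edge_index, conv, norm, act, dropout=0.1, training=True):
    h = norm(x)                                    # normalization first
    h = act(h)                                     # activation before the conv
    h = F.dropout(h, p=dropout, training=training)
    h = conv(h, edge_index)                        # graph convolution
    return x + h                                   # residual added last, no activation on the sum
```

The important detail is that the skip connection is added to the conv output directly, with no activation applied to the sum.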

As for the training graph sampler, we use the same random partition/clustering scheme for training as for testing, whereas you use a GraphSAINTNodeSampler for training. But I guess this slight difference will not affect the results that much; I think it should be fine.

@rusty1s (Member) commented Jul 20, 2020

I see, thanks a lot. I modified the example accordingly. In addition, I think I will implement a new data loader that follows the one from the DeepGCN paper.

@lightaime (Contributor, Author)

Great! The modified code looks very neat. We appreciate your efforts very much!

@rusty1s (Member) commented Jul 20, 2020

Great :) Can you share some insights on how long it takes to reach 85% on ogbn-proteins?

@lightaime (Contributor, Author)

Do you mean the training time? Indeed, it is quite slow to train such a deep model with learnable aggregators. It takes about 24 hours to finish 1000 epochs when training a 112-layer model on an NVIDIA V100 32GB, and performance peaks at around 800-1000 epochs. For a 28-layer model, I don't remember exactly how long it takes. I am traveling at the moment, but I can check the log files once I settle down.

@rusty1s (Member) commented Jul 21, 2020

I added a random node sampler to PyG that implements the one used in the DeepGCN paper.
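Usage might look roughly like the following; this is a minimal sketch, and the class name and arguments (e.g. RandomNodeSampler(data, num_parts=...)) are assumptions rather than the confirmed final API:

```python
# Minimal usage sketch (assumed API, not confirmed in this thread): a loader
# that splits the nodes into `num_parts` random parts and yields the induced
# subgraph of each part, matching the DeepGCN training/testing scheme.
# `data`, `model`, `optimizer`, `criterion`, and `device` are assumed to exist,
# and train/valid/test masks are assumed to be attached to `data`.
from torch_geometric.data import RandomNodeSampler

train_loader = RandomNodeSampler(data, num_parts=40, shuffle=True)
test_loader = RandomNodeSampler(data, num_parts=5)

model.train()
for sub_data in train_loader:          # each batch is a node-induced subgraph
    sub_data = sub_data.to(device)
    optimizer.zero_grad()
    out = model(sub_data.x, sub_data.edge_index)
    loss = criterion(out[sub_data.train_mask], sub_data.y[sub_data.train_mask])
    loss.backward()
    optimizer.step()
```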

I think the current example is a good introduction to the DeepGCN model. I don't think we need to add huge configs to our examples. If one wants to dive deeper, one can always use the official repository. I hope you're okay with this :)

I'm merging this for now, please feel free to add any optimizations/extensions in a later PR. Thank you very much for this PR :)

@rusty1s merged commit 8dfcf8e into pyg-team:master on Jul 21, 2020
@lightaime (Contributor, Author)

Sure, your modified implementation is very nice and clean. I like it and learned a lot. Thanks for adding DeepGCN. Cheers!

@lightaime deleted the features branch on May 17, 2022