Close #21810 #22288
Conversation
Thanks for contributing to Ivy! 😊👏
Hi, thanks for looking into this @DevBhuyan 😄
You'll have to rewrite the syntax a bit so that it is in line with the general convention. I'd recommend checking out the other classes in that file.
Not super sure if the implementation is correct - ideally it should be similar to PyTorch's Transformer module - https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html#torch.nn.Transformer https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/transformer.py
and not a model with positional encodings pre-coded, like this official PyTorch example that makes use of the Transformer module - https://github.com/pytorch/examples/blob/13009eff7a80ebcf6ae89ed217d5d176bd3e019d/word_language_model/model.py#L107
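For reference, the interface being pointed to looks roughly like this (adapted from the linked PyTorch docs; the key point is that torch.nn.Transformer is a bare encoder-decoder block, and positional encodings are added by the caller, outside the module):

```python
import torch
import torch.nn as nn

# torch.nn.Transformer is a plain encoder-decoder block: no embeddings,
# no positional encodings. Default input shapes are (seq, batch, d_model).
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, 512)  # (source length, batch, d_model)
tgt = torch.rand(20, 32, 512)  # (target length, batch, d_model)
out = model(src, tgt)          # -> shape (20, 32, 512)
```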
@vedpatwardhan - do you think the above makes sense, or is @DevBhuyan's implementation fine as it is? 🤔
Hi @rishabgit, thanks for taking the time to review my PR. Yes, you're right, I need to change the syntax to fit with the other classes. My previous code was also just an attempt to write the layer as simply as possible (autoregressive, without a decoder). I guess I'll need to modify it to an encoder-decoder architecture (like PyTorch's). I'll work on it and make a commit as soon as I have it ready. Thanks again for the suggestions :)
Hey @rishabgit, your suggestion makes perfect sense, we should definitely try and align with the PyTorch implementation.
Hi @vedpatwardhan, I guess I was assuming a totally different direction, a decoder-only transformer. I agree with @vedpatwardhan and @rishabgit, and I'm rewriting it entirely to be in line with PyTorch's implementation and the other classes from layers.py. Please excuse the unforeseen closing and reopening of this Pull Request :) I'm still new to contributing on GitHub; I accidentally tried to remove another branch and this happened.
Hi @rishabgit, I have updated the Transformer class to fit in line with PyTorch's implementation as well as with the other classes in the layers.py file. Since I had used PyTorch's implementation as a starting point for the class(es), there were portions of PyTorch's implementation that relied on lower-level backend implementations to speed things up ('FlashAttention'). I have not totally removed those lines; instead, I commented them out. Kindly let me know if there are any changes you'd suggest. Thank you for your patience :)
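As a rough illustration of what aligning with the other stateful classes might look like, here is a minimal sketch of a single encoder layer, assuming Ivy's ivy.Module convention (sub-layers built in __init__, computation in _forward); the sub-layer choices, argument names, and the self-attention call signature are assumptions, not the actual PR code:

```python
import ivy


class TransformerEncoderLayer(ivy.Module):
    # Hypothetical sketch of one encoder layer, following the
    # ivy.Module pattern used by the other classes in layers.py.
    def __init__(self, embed_dim, num_heads, hidden_dim):
        # Sub-layers are created before Module.__init__ so that
        # their variables are tracked by the parent module.
        self._attn = ivy.MultiHeadAttention(embed_dim, num_heads=num_heads)
        self._ff1 = ivy.Linear(embed_dim, hidden_dim)
        self._ff2 = ivy.Linear(hidden_dim, embed_dim)
        ivy.Module.__init__(self)

    def _forward(self, x):
        # Self-attention with a residual connection (normalization and
        # dropout omitted for brevity; self-attention call is assumed).
        x = x + self._attn(x)
        # Position-wise feed-forward with a residual connection.
        return x + self._ff2(ivy.relu(self._ff1(x)))
```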
I guess I messed up this PR. I'll create a new fork of the repo and open a fresh PR.
Closes #21810
From ToDo list: #14945
Update:
- Added Transformer layer in ~/ivy/ivy/stateful/layers.py
- Added test test_transformer_layer in ~/ivy/ivy_tests/test_ivy/test_stateful/test_layers.py
- Added a custom composite strategy to generate test data (ref. transformer_data(); see the sketch below)
This is my first PR. I tried to make it as appropriate as possible. Please let me know if there are any modifications that you'd suggest. Thank you!😊
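For anyone unfamiliar with composite strategies, a transformer_data()-style generator could look something like the following hedged sketch (the shapes, bounds, and return values here are assumptions, not the strategy actually added in the PR):

```python
import numpy as np
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays


@st.composite
def transformer_data(draw):
    # Draw small, bounded dimensions to keep test cases fast.
    batch_size = draw(st.integers(min_value=1, max_value=4))
    src_len = draw(st.integers(min_value=1, max_value=8))
    tgt_len = draw(st.integers(min_value=1, max_value=8))
    embed_dim = draw(st.sampled_from([8, 16]))

    # Bounded float32 elements avoid NaN/inf and overflow issues.
    elems = st.floats(min_value=-1.0, max_value=1.0, width=32)
    src = draw(arrays(np.float32, (src_len, batch_size, embed_dim), elements=elems))
    tgt = draw(arrays(np.float32, (tgt_len, batch_size, embed_dim), elements=elems))
    return src, tgt, embed_dim
```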