Add CustomEngine parameter #1306
Conversation
I'm not sure this is the best way to do this. In general, I don't like deriving from classes and overwriting parts of them too much. Such code is hard to follow, especially if the base class is already complex. But it will also easily break, as the API between the user and RETURNN is not well defined: basically the current `Engine` internals would effectively become the API.

Btw, as part of the discussion in #1120, it was often said that if the user wants to implement a very custom training loop or other very custom scripts, we recommend that the user just writes their own custom script for that. The user would directly call the custom script, and the custom script would use from RETURNN whatever is needed. RETURNN should be modular, such that you can take individual pieces. And if some part of RETURNN is difficult to use this way, we should work on that and improve it. Most other frameworks work this way (Keras, PyTorch itself, PyTorch Lightning, whatever) and provide mostly the same functionality as RETURNN, so I think it is definitely possible to design things this way. PyTorch Lightning is maybe one example which is a bit similar to deriving from the `Engine`.
Hmm okay, so then I guess it is best if I work with a local version for now.
I am still in favor of versioning and linking experiments to a version. So far we have not managed to stop people from sticking to old RETURNN versions, and I am not so sure we can change that in the future.
This sounds like I should rather consider using e.g. PyTorch Lightning instead of RETURNN and just import the datasets from RETURNN? I do not think this is the right way. But still, we should keep track of what benefit RETURNN actually provides us. Anyway, this again goes away from the topic of this PR, so I guess I'll close it for now and figure something out myself.
You can still do that, but we should still try to make the transition to the newest RETURNN version easy for people and not create any friction there; otherwise people cannot really share common code if it requires some custom set of versions. That means, if possible, we should define APIs in a way that lets us keep compatibility easily. I.e., designing APIs such that they are likely to break in the future is a bad idea.
No, I meant that if we want to have an API in RETURNN which allows the user to define custom training loops, or to customize other things, we can look at frameworks like PyTorch Lightning as inspiration on how to design such an API. I am just saying that we should have a well-defined API for this, and just exposing the `Engine` internals is not that.
What I wrote before is another alternative though: the user can really write their own custom script and use parts of RETURNN in a modular way, e.g. using the LR scheduling, model checkpoint cleanup logic, model loading/restoring logic, or dataset loaders, but otherwise write their own custom training loop (see the sketch after this paragraph). We should try to design things in RETURNN in a way that makes this easy to do. I think that would improve the internal structure of RETURNN anyway, if things are more modular and decoupled. It's then up to the user whether they write a custom script with a custom training loop, or still use RETURNN and such a well-defined API to define a custom training loop while otherwise keeping everything from RETURNN. I don't really know which way is better. I think both ways are valid in certain situations, and we should make both possible.
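To make the modular idea concrete: here is a minimal sketch of such a standalone script, assuming the classic RETURNN `Dataset` API (`init_seq_order`, `load_seqs`, `get_data`). The `init_dataset` import path and the `DummyDataset` options are assumptions and may need adjusting for a given RETURNN version:

```python
import torch
from returnn.datasets import init_dataset  # assumed import path

# Borrow only the dataset loading from RETURNN; own the training loop yourself.
dataset = init_dataset({"class": "DummyDataset", "input_dim": 9, "output_dim": 2, "num_seqs": 20})
model = torch.nn.Linear(9, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

dataset.init_seq_order(epoch=1)
seq_idx = 0
while dataset.is_less_than_num_seqs(seq_idx):
    dataset.load_seqs(seq_idx, seq_idx + 1)
    x = torch.from_numpy(dataset.get_data(seq_idx, "data"))  # [time, input_dim]
    y = torch.from_numpy(dataset.get_data(seq_idx, "classes")).long()  # [time]
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    seq_idx += 1
```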
Yes, we should always think about this. When you think about writing your own custom script, RETURNN ideally provides all the building blocks which help you, like LR scheduling, model checkpoint cleanup logic, model loading/restoring logic, dataset loaders, whatever else, so you can easily put things together with just a few lines of code. PyTorch Lightning, just like many other frameworks, has a `fit` function which runs the whole training loop for you.

When you actually prefer to use the RETURNN entry point and a config to define things, the question is still valid, i.e. what benefit does RETURNN provide then, and what parts should it handle to make things easier.

So, coming back to the topic here: when you want to use the RETURNN entry point and want a way to define a custom training loop, we should think about a well-defined API for this. We can use PyTorch Lightning as inspiration for such an API. E.g. in the GAN example, or in Own your loop (advanced), the training loop is still unmodified, but they customize the train step by overriding `training_step`. Analogously, the config could define:

```python
from typing import Any
from returnn.tensor import TensorDict

def custom_training_step(*, model: Any, extern_data: TensorDict, training_state: Any):
    ...
```

But the difference to the existing `train_step` is that it also gets the `training_state`. This is just a very initial suggestion; maybe you have a better idea. But many people seem to like the way you can customize things in PT Lightning, so if we want to do things differently, we should have good reasons. Or if we do not want to think too much about it, we could just follow PT Lightning more or less.
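For reference, the PyTorch Lightning pattern mentioned above looks roughly like this (standard PL API; the tiny model and loss are placeholders for illustration):

```python
import torch
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(9, 2)

    def training_step(self, batch, batch_idx):
        # Only this hook is customized; the trainer loop itself stays untouched.
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=1)
# trainer.fit(LitClassifier(), train_dataloaders=...)
```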
At first glance, I do not really see how this is different to providing a custom engine. That would also only be:

```python
from returnn.torch.engine import Engine

class CustomEngine(Engine):
    def train_step(self, *args, **kwargs):
        # custom train step stuff
        ...
```

What parts can be modified is then of course an Engine-API question, but I do not see how the logic of custom hooks is really different to a custom engine.
The difference is the API. In one case, you really only allow the user to define `train_step`, and the API of that function is well defined. In the other case, the user can very easily touch anything from the `Engine` internals. In terms of functionality for the user, there is no real difference. But in terms of maintainability and future compatibility, this is a really big difference. I would prefer to have a well-defined API.
Allows defining a custom engine in the config, e.g. one that derives from an existing engine but overrides some functions.
This would be helpful for defining very custom training functions and gradient handling, e.g. turn-based generator/discriminator training, or experimenting directly with mixed-precision training (a hypothetical config sketch follows below).
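A minimal sketch of what this could look like in a config; the parameter name `custom_engine` is an assumption derived from the PR title, and whether the base `Engine` exposes exactly a `train_step` method is part of the Engine-API question discussed above:

```python
# Hypothetical RETURNN config; "custom_engine" as a parameter name is assumed.
from returnn.torch.engine import Engine

class CustomEngine(Engine):
    def train_step(self, *args, **kwargs):
        # e.g. alternate generator/discriminator updates, or custom
        # mixed-precision / gradient handling around the default step
        super().train_step(*args, **kwargs)

custom_engine = CustomEngine  # the new config parameter this PR proposes (assumed name)
```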