-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lightning checkpoint #40
Conversation
that implements the CheckpointIO interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congrats on first contribution to dataflux-pytorch!
...g/Getting Started with Dataflux PyTorch Lightning Checkpoint with Google Cloud Storage.ipynb
Show resolved
Hide resolved
BTW the Kokoro test is failing, we will need to fix the test failure there. |
...g/Getting Started with Dataflux PyTorch Lightning Checkpoint with Google Cloud Storage.ipynb
Outdated
Show resolved
Hide resolved
...g/Getting Started with Dataflux PyTorch Lightning Checkpoint with Google Cloud Storage.ipynb
Show resolved
Hide resolved
...g/Getting Started with Dataflux PyTorch Lightning Checkpoint with Google Cloud Storage.ipynb
Show resolved
Hide resolved
) | ||
self.bucket = self.storage_client.bucket(self.bucket_name) | ||
|
||
def _parse_gcs_path(self, path: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the client is initialized with a project name and bucket name, why is this necessary? Could we just have the user provide a path which is relative to the bucket rather than starting with "gcs:///"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the original implementation but after I wrote the demo/example file I saw that for automated checkpointing this needed to ingest a full path. The full path is what gets passed through to the trainer through the ModelCheckpoint object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we update the demo to not pass in the gcs://<bucket>/
part of the path? Alternatively, could we remove the bucket from the constructor in this file? It seems duplicative as-is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it sounds like we are taking on this feature essentially, which I think it's great to have to make the user experience similar with gcsfs, etc. I'd also agree on just taking away buckets from the constructor. We'd also need work to support this for the two datasets though but definitely for different PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any code changes here, am I missing them? @divrawal can you follow up on it?
…y one object concerning, and for save_checkpoint modified so it utilizes a context manager
Lightning Checkpoint support in dataflux for PyTorch.
Includes
CheckpointIO interface implementation
Unit tests
Example file
Notebook
README change.
Tests pass
Appropriate changes to documentation are included in the PR