This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

(Hybrid)Block export filenames return or override #17688

Closed
athewsey opened this issue Feb 26, 2020 · 3 comments

Comments

@athewsey
Contributor

Description

The HybridBlock.export() function currently saves two files: [path]-symbol.json describing structure and [path]-####.params for parameters.

The parameters filename contains the epoch number, zero-padded to minimum length 4 (but, given the implementation, will be longer if the number exceeds 9999).
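For illustration, this is roughly the logic a caller has to duplicate today in order to know what was written. It is a minimal sketch based only on the naming described above; `expected_export_filenames` is a hypothetical helper, not part of MXNet, and it would silently go stale if the library changed its naming scheme.

```python
def expected_export_filenames(path, epoch=0):
    """Hypothetical helper: guess the two files export(path, epoch) writes.

    Assumes the naming described in this issue: '<path>-symbol.json' and
    '<path>-<epoch>.params' with the epoch zero-padded to at least 4 digits.
    """
    symbol_file = "%s-symbol.json" % path
    # %04d pads to a minimum width of 4; larger epochs simply use more digits.
    params_file = "%s-%04d.params" % (path, epoch)
    return symbol_file, params_file


print(expected_export_filenames("model/best", 7))
print(expected_export_filenames("model/best", 12345))
```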

Because the calling code has no control over this naming, any program operating in potentially non-empty directories has to re-implement the API's naming logic to keep track of which files it has written, which makes user code fragile to API changes.

It would be better if the API either supported override of the filenames, included them in the return value (maybe just a tuple?), or both. Seems like none of this would be a particularly big code change.
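As a rough sketch of the "include them in the return value" option, here is what it could look like from the caller's side, emulated with a wrapper. `export_and_report` is a hypothetical name; it assumes `block` is any object with an `export(path, epoch=...)` method and that the files follow the naming described above. A real fix would return the names from inside `export()` itself rather than re-deriving them like this.

```python
def export_and_report(block, path, epoch=0):
    """Hypothetical wrapper: export the block, then return the assumed file names.

    `block` is assumed to expose export(path, epoch=...), as HybridBlock does.
    The returned names are re-derived from the naming described in this issue,
    not reported by the library.
    """
    block.export(path, epoch=epoch)
    symbol_file = "%s-symbol.json" % path
    params_file = "%s-%04d.params" % (path, epoch)
    return symbol_file, params_file
```

A training script could then keep the returned tuple for its current "best" export and delete exactly those two files when a better model supersedes it, without touching anything else in the directory.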

References

mxnet.recordio.MXIndexedRecordIO allows specification of both the record and index file names, even though complementary tools like gluoncv.data.RecordFileDetection require them to be named specifically.

In the latter's case, the two files *.idx and *.rec differ only by extension which is trivial. Introducing string number formatting logic is less so.

Use Case Example

As a training script author / MLOps engineer
I want to track the exact locations of best and interval-checkpoint model exports made by my script
So that I can tidy up superseded "best" parameters (keeping only current best and interval-driven checkpoints), without deleting any other jobs' outputs, no matter how disorganized my data scientist users are

@leezu
Contributor

leezu commented Feb 26, 2020

@athewsey would you like to contribute a PR?

@athewsey
Contributor Author

athewsey commented Mar 2, 2020

Happy to try @leezu, but which other language bindings besides Python would the PR need to update at the same time? As far as I can tell, only Perl seems to have an equivalent implementation, right?

@leezu
Contributor

leezu commented Mar 2, 2020

@sergeykolychev can clarify about Perl.

In general, it should be fine to target Python first, as this change won't lead to an incompatibility between language bindings (at least "include filenames in the return value"). If you're familiar with Perl, you can include the corresponding changes.
