Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should MatplotilbWriter multiple plot functionality be removed in favour of PartitionedDataSet? #529

Open
antonymilne opened this issue Jun 23, 2022 · 2 comments

Comments

@antonymilne
Copy link
Contributor

antonymilne commented Jun 23, 2022

MatplotlibWriter currently supports 3 different save modes:

  • save a single plt.figure to a png file
  • save List[plt.figure] to multiple png files (labelled 0.png, 1.png, etc.)
  • save Dict[str, plt.figure] to multiple png files (labelled by dictionary keys)

There's a recently-added overwrite option associated with the latter two modes (kedro-org/kedro#868). This also exists for PartitionedDataSet.

The current behaviour has some problems:

On the other hand, the ability to save multiple plots rather than define one dataset per plot is essential. I have used it myself many times and seen it used a lot.

So, my question is: should we replace the matplotlib save modes that do multiple plots with instead wrapping MatplotlibWriter in PartionedDataSet? Leaving aside how we do this technically for the moment, would this be a good change to make? i.e. will this be a user-friendly solution here? Will it allow everything we need to allow in terms of functionality?

My suspicion is that the only reason we don't already use PartionedDataSet for this is historical (MatplotlibWriter was added to contrib at the same time PartionedDataSet was added to core).

Tagging @Galileo-Galilei who I suspect will just have the answers here 😀

@deepyaman
Copy link
Member

One other (likely unnecessary) discrepancy is that PartitionedDataSet doesn't currently support versioning--either of the overarching or underlying dataset. MatplotlibWriter does for the overarching dataset. Perhaps relevant discussions, although more focused on the underlying dataset: kedro-org/kedro#521.

@antonymilne
Copy link
Contributor Author

@deepyaman thanks, that is a very pertinent point given that experiment tracking is one of the main motivations here, and that directly relies on versioned datasets to work... So if we were to move to PartitionedDataSet for MatplotlibWriter then we should try and get kedro-org/kedro#521 done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants