Should MatplotlibWriter multiple plot functionality be removed in favour of PartitionedDataSet? #529

antonymilne · 2022-06-23T08:33:07Z

MatplotlibWriter currently supports 3 different save modes:

1. save a single plt.figure to a png file
2. save List[plt.figure] to multiple png files (labelled 0.png, 1.png, etc.)
3. save Dict[str, plt.figure] to multiple png files (labelled by dictionary keys)
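To make the three modes concrete, here is a minimal sketch of how the saved data could map to output file names. It is not MatplotlibWriter's actual code: `target_paths` is a hypothetical helper, plain objects stand in for figures, and the naming rules are taken only from the description above.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List, Union


def target_paths(filepath: str, data: Union[Any, List[Any], Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: list the files each save mode would produce."""
    base = PurePosixPath(filepath)
    if isinstance(data, list):
        # List mode: files are labelled by position (0.png, 1.png, ...).
        return [str(base / f"{index}.png") for index in range(len(data))]
    if isinstance(data, dict):
        # Dict mode: files are labelled by the dictionary keys
        # (assumed here to be used as the file names unchanged).
        return [str(base / key) for key in data]
    # Single mode: one figure goes to the filepath itself.
    return [str(base)]


fig = object()  # stand-in for a plt.figure
print(target_paths("plots/single.png", fig))
print(target_paths("plots", [fig, fig]))
print(target_paths("plots", {"red.png": fig, "blue.png": fig}))
```

The same `filepath` is a file in the first mode and a directory in the other two, which is part of what makes the dataset unusual.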

There's a recently-added overwrite option associated with the latter two modes (kedro-org/kedro#868). This also exists for PartitionedDataSet.

The current behaviour has some problems:

It is inconsistent: MatplotlibWriter is the only dataset that has multiple save modes.
It complicates some things in kedro-viz (#1626 Show Matplotlib dataset pngs in the metadata panel, kedro-viz#783). This is less important, because it will still need to be solved in kedro-viz even if we change how the dataset works.

On the other hand, the ability to save multiple plots rather than define one dataset per plot is essential. I have used it myself many times and seen it used a lot.

So, my question is: should we replace the matplotlib save modes that handle multiple plots by wrapping MatplotlibWriter in PartitionedDataSet instead? Leaving aside how we do this technically for the moment, would this be a good change to make? That is, would it be a user-friendly solution, and would it support all the functionality we need?
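As a rough illustration of the idea, the sketch below shows how the two multi-plot modes reduce to "a dictionary of partitions plus a single-figure saver". It does not use Kedro: `save_partitions`, `list_to_partitions`, and the `save_one` callback are hypothetical names, strings stand in for figures, and the `filename_suffix` parameter only mimics the kind of suffix option a partitioned dataset might expose.

```python
from typing import Any, Callable, Dict, List


def list_to_partitions(items: List[Any]) -> Dict[str, Any]:
    """Turn a list into a dict keyed by position, so list mode becomes dict mode."""
    return {str(index): item for index, item in enumerate(items)}


def save_partitions(
    partitions: Dict[str, Any],
    save_one: Callable[[str, Any], None],
    base_path: str,
    filename_suffix: str = ".png",
) -> List[str]:
    """Save every partition with the single-item saver; return the paths written."""
    written = []
    for partition_id, item in sorted(partitions.items()):
        path = f"{base_path}/{partition_id}{filename_suffix}"
        save_one(path, item)  # the underlying dataset only needs a single save mode
        written.append(path)
    return written


# Usage: an in-memory "saver" records what would be written to disk.
saved: Dict[str, Any] = {}
paths = save_partitions(
    list_to_partitions(["figure A", "figure B"]),
    lambda path, item: saved.update({path: item}),
    "plots",
)
print(paths)
```

Under this arrangement the underlying writer keeps only the single-figure mode, and naming, iteration, and overwrite behaviour live in the wrapper.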

My suspicion is that the only reason we don't already use PartitionedDataSet for this is historical (MatplotlibWriter was added to contrib at the same time PartitionedDataSet was added to core).

Tagging @Galileo-Galilei who I suspect will just have the answers here 😀


deepyaman · 2022-07-06T11:58:42Z

One other (likely unnecessary) discrepancy is that PartitionedDataSet doesn't currently support versioning--either of the overarching or underlying dataset. MatplotlibWriter does for the overarching dataset. Perhaps relevant discussions, although more focused on the underlying dataset: kedro-org/kedro#521.

antonymilne · 2022-07-06T15:13:49Z

@deepyaman thanks, that is a very pertinent point given that experiment tracking is one of the main motivations here, and that directly relies on versioned datasets to work... So if we were to move to PartitionedDataSet for MatplotlibWriter then we should try and get kedro-org/kedro#521 done.

antonymilne mentioned this issue Jun 23, 2022

Ability to link plots to an experiment kedro-org/kedro#1626

Closed

antonymilne mentioned this issue Apr 18, 2023

Extend 'preview' functionality to PartitionedDatasets such as Plotly, Pandas Tables (CSV, Excel) kedro-org/kedro-viz#1319

Closed

astrojuanlu mentioned this issue Apr 25, 2023

Clarify documentation for matplotlib datasets kedro-org/kedro#2536

Open

Galileo-Galilei mentioned this issue Sep 28, 2023

kedro-datasets: Rename MatplotlibWriter to MatplotlibDataset #353

Open

merelcht transferred this issue from kedro-org/kedro Jan 30, 2024

merelcht added this to the Individual dataset improvements milestone Jan 30, 2024
