Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postmortem: All plots functionality broken in versions <= 0.2.5 #1520

Closed
mattseddon opened this issue Apr 3, 2022 · 6 comments
Closed

Postmortem: All plots functionality broken in versions <= 0.2.5 #1520

mattseddon opened this issue Apr 3, 2022 · 6 comments
Assignees
Labels
discussion 😇 postmortem Critical issue follow to reflect and improve. Not to blame anyone or point fingers!

Comments

@mattseddon
Copy link
Contributor

High-level summary

Prior to version 0.2.6 of the extension, any users relying on DVC 2.10.0 would have no access to any plots functionality.

Timeline

All times AEDT (GMT+11)

  1. 2022-03-30: DVC 2.10.0 released.
  2. 2022-04-01 11:58: Version 0.2.5 of the extension was released to the marketplace.
  3. 2022-04-01 12:05: @mattseddon began re-testing Use new plots diff data #1394 against 2.10.0
  4. 2022-04-01 12:26: Message sent from @mattseddon to @pared regarding a type change in the plots diff output.
  5. 2022-04-01 12:50: Further testing revealed that 2.10.0 was not backwards compatible with any version of the extension.
  6. 2022-04-01 12:55: Work began on an emergency release.
  7. 2022-04-01 13:46: Message sent from @mattseddon to @shcheklein detailing the issue and next steps were discussed.
  8. 2022-04-01 13:49: @shcheklein agreed to approve any PRs need for the next release.
  9. 2022-04-01 14:00: Version bumped for release.
  10. 2022-04-01 14:30: Patch merged into main.
  11. 2022-04-01 14:40: 0.2.6 was released to the marketplace (after CI for the above was completed).
  12. 2022-04-01 14:45: Manual testing confirmed that the fix reinstated plots functionality.
  13. 2022-04-01 14:45: Message sent to @shcheklein detailing resolution of the incident.
  14. 2022-04-01 19:58: @pared began investigating what happened on the DVC side.
  15. 2022-04-01 20:56: @pared raised PR with a fix that will be available in 2.10.1.
  16. 2022-04-01 21:27: plots: image: fix image to_json structure dvc#7533 merged.

Perf indicators:

  • Time to notice: 2 days.
  • Self-healed after: 2 hrs.
  • Resolved after: 2 days.

Impact

  • Any use of DVC 2.10.0 with any extension version prior to 0.2.6 will result in a complete loss of plots functionality.
  • Any users not on the current version of the extension will continue to experience this issue until such time that they upgrade.
  • It would take effort to have these versions of the extension removed from the marketplace.

Root cause analysis

revisions property of plots diff --json output for images was changed from an array/list to a string.

  1. Data type change detailed in plots: image: fix image to_json structure dvc#7533.
  2. Lack of CLI integration tests in this repository.
  3. Lack of a contract between CLI and extension.
  4. Plots code not resilient to change in data type.

Personal thoughts

This has highlighted the need to automate the testing of the contract between the CLI and extension. It is good that this has happened now as we are only in closed alpha testing. "Dead" releases will be a much bigger issue in the future.

Prevention and next steps

  • Add scheduled (daily) CLI output integration test (run against git+https://github.com/iterative/dvc version of DVC). This will give an early warning of issues with the output and should block DVC from releasing without a fix in place.
  • Add contract tests on the DVC side and ensure data is backwards compatible when making API changes.
  • Mark/decorate output relied on by extension in the DVC codebase, i.e plots diff, exp show, status.
  • Better inter-team communication regarding releases.
  • I am hesitant to pin the extension version to minor release ranges of DVC but this may be required in future.
@mattseddon mattseddon changed the title Postmortem: All plots functionality broken in version 0.2.5 Postmortem: All plots functionality broken in versions <= 0.2.5 Apr 3, 2022
@shcheklein shcheklein added the 😇 postmortem Critical issue follow to reflect and improve. Not to blame anyone or point fingers! label Apr 3, 2022
@shcheklein
Copy link
Member

cc @iterative/dvc for visibility and it would be great if you have any additional thoughts on this.

@mattseddon
Copy link
Contributor Author

#1521 also related.

@pared
Copy link

pared commented Apr 4, 2022

@shcheklein Initial implementation was implemented in a rush. Lack of proper testing caused this and #1521. I should have tested it properly since the beginning. For the future, even if we are implementing something as a initial/temporary feature for integration (which was the case in --json), we should start from the contract and define expected outcomes, and modify the contract as we modify the expectations.

@daavoo
Copy link
Contributor

daavoo commented Apr 4, 2022

This was my fault due to an overlook in iterative/dvc#7409 , apologies 🙏

Thoughts:

would have no access to any plots functionality

If the issue was with the output schema for image plots, how did it break any plots functionality?

Add scheduled (daily) CLI output integration test (run against git+https://github.com/iterative/dvc version of DVC).

👍

Add contract tests on the DVC side and ensure data is backwards compatible when making API changes.

There are tests but the issue was that I refactored all related tests as part of the extraction of dvc-render.
We could have a more explicit contract by having a JSON schema definition of expected output.

Mark/decorate output relied on by extension in the DVC codebase, i.e plots diff, exp show, status.

A side note about this is that, in the case of plots, the JSON output was created ad-hoc for VSCode. However, in the case of experiments and status the output is the internal DVC output (returned by Repo API). See iterative/dvc#7409 (comment)

pared added a commit to pared/dvc-render that referenced this issue Apr 4, 2022
pared added a commit to pared/dvc-render that referenced this issue Apr 4, 2022
pared added a commit to pared/dvc-render that referenced this issue Apr 4, 2022
pared added a commit to pared/dvc-render that referenced this issue Apr 4, 2022
@mattseddon
Copy link
Contributor Author

If the issue was with the output schema for image plots, how did it break any plots functionality?

In the rush to get a fix out I have actually made a mistake. The revisions field being broken only stopped images for loading with 2.10.0 and the new collection algorithm.

What killed the entire plots functionality for all previous versions of the extension is as follows:

We post-process the JSON into our internal data structure. Images and vega plots are processed in parallel with Promise.all, if a single promise fails then all are rejected. In vega collection, we previously relied on plot.content.data.values to provide the datapoints. A snippet of our code looked like this:

    for (const value of plot.content.data
        .values) {
        console.error(JSON.stringify(value))
        acc[value.rev][path].push(value);
    }

The rev field had been dropped from the provided dicts so we were now being provided with a list of dicts that looks like this:

[{
	"acc": "0.098",
	"filename": "logs/scalars/acc.tsv",
	"step": "0",
	"timestamp": "1646083445188"
}, 
... 
{
	"acc": "0.5076",
	"filename": "logs/scalars/acc.tsv",
	"step": "8",
	"timestamp": "1646083552005"
}]

value.rev being undefined means that we are trying to put data into the accumulator under undefined. That is not possible and fails with TypeError: Cannot read properties of undefined (reading 'logs[/scalars/acc.tsv]()'). Once that error was thrown the plots section of the code stopped running and we no longer had access to that functionality.

LMK if you need any further details on this. Sorry for the initial confusion.

@mattseddon
Copy link
Contributor Author

Closing this now, thanks to everyone.

pared added a commit to pared/dvc-render that referenced this issue Apr 25, 2022
pared added a commit to pared/dvc-render that referenced this issue Apr 26, 2022
daavoo pushed a commit to iterative/dvc-render that referenced this issue Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion 😇 postmortem Critical issue follow to reflect and improve. Not to blame anyone or point fingers!
Projects
None yet
Development

No branches or pull requests

4 participants