Notebooks are sometimes corrupted (made unloadable in Jupyter or Jupyter Lab) #2487

Closed
DonJayamanne opened this issue Jan 3, 2020 · 4 comments

@DonJayamanne
Contributor

We might need to create a notebook builder to ensure we never corrupt the notebook file.
This would be responsible for editing/creating notebook JSON (e.g. create a new cell that returns the JSON or inserts a new cell into the specific spot, edits a cell by appending the output...).

By corrupt I mean our code creates/modifies an ipynb file such that:

  • It cannot be opened in Jupyter Notebook or JupyterLab.
  • It opens but doesn't display correctly in Jupyter Notebook or JupyterLab.

We seem to do this every once in a while (incorrectly formatted html/markdown, or the like).
Issues I've seen:

  • execution count not stored correctly in ipynb
  • latex not formatted/escaped correctly
  • multiline strings not formatted correctly
  • html strings not escaped/formatted correctly
  • error information (output) not stored correctly.
  • Other issues with other attributes missing

See https://nbformat.readthedocs.io/en/latest/api.html#nbformat.validate
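To make the issues listed above concrete, here is a minimal sketch of the kind of structural checks involved, assuming the nbformat v4 layout for code cells. The function names `check_code_cell` and `_is_multiline_string` are hypothetical, and this is not a substitute for `nbformat.validate`, which checks the full schema.

```python
from typing import Any, Dict, List


def _is_multiline_string(value: Any) -> bool:
    """Notebook text is stored either as one string or as a list of strings."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_code_cell(cell: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a v4 code cell (empty if none).

    Hypothetical helper: it only covers the fields mentioned in this issue.
    """
    problems: List[str] = []
    required = ("cell_type", "metadata", "source", "outputs", "execution_count")
    for key in required:
        if key not in cell:
            problems.append(f"missing required key: {key}")
    if problems:
        return problems

    if cell["cell_type"] != "code":
        problems.append("cell_type must be 'code'")

    count = cell["execution_count"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if count is not None and (isinstance(count, bool) or not isinstance(count, int)):
        problems.append("execution_count must be an integer or null")

    if not _is_multiline_string(cell["source"]):
        problems.append("source must be a string or a list of strings")

    outputs = cell["outputs"]
    if not isinstance(outputs, list):
        problems.append("outputs must be a list")
        return problems

    for index, output in enumerate(outputs):
        if not isinstance(output, dict) or output.get("output_type") != "error":
            continue  # this sketch only inspects error outputs
        for key in ("ename", "evalue", "traceback"):
            if key not in output:
                problems.append(f"outputs[{index}]: missing {key}")
        if "traceback" in output:
            tb = output["traceback"]
            if not (isinstance(tb, list) and all(isinstance(t, str) for t in tb)):
                problems.append(f"outputs[{index}]: traceback must be a list of strings")
    return problems
```

A check like this could run in tests against every cell the extension writes; an empty list means none of the listed problems were found.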

Suggestions:

  • We could run nbformat.validate in the background and see if we break the notebook at the user end (capture as telemetry and see how bad it is)
    • We could always try to ship our own copy of nbformat (little benefit)
  • Create our own validator in nodejs (so we can run this every time; however, the problem still remains: when do we run this? If on user machines, it's too late, with the only benefit being telemetry)
  • Ensure the type definitions are used appropriately and JSON is validated accordingly, instead of casting to any and then re-casting to hide compiler warnings.
  • Personally I think we need a notebook builder/writer that'll encapsulate the logic of creating/editing cells (something to wrap up a model as described here https://github.com/microsoft/vscode-python/issues/9255#issue-541253579)
    • This will ensure we create cells in one place with the right structure.
    • E.g. when we need to create a markdown cell, the attributes have been initialized correctly (i.e. create JSON)
    • Similarly when updating cells. This way the logic is in one place.
    • Right now editing is done in different parts of the code, which I don't think is correct (especially if there's a schema that needs to be adhered to).
  • Others
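
As an illustration of the builder/writer idea, here is a minimal sketch of a cell factory, assuming the nbformat v4 cell layout. All function names are hypothetical and the extension's real code is TypeScript, so this only shows the shape of the approach: every cell and output is created in one place with its required attributes.

```python
import json
from typing import Any, Dict, List, Optional


def to_source_lines(text: str) -> List[str]:
    """Split text into lines that keep their line endings.

    Joining the returned list gives back the original text.
    """
    return text.splitlines(keepends=True)


def new_markdown_cell(source: str = "") -> Dict[str, Any]:
    """Markdown cells carry no outputs or execution_count."""
    return {"cell_type": "markdown", "metadata": {}, "source": to_source_lines(source)}


def new_code_cell(source: str = "", execution_count: Optional[int] = None) -> Dict[str, Any]:
    """Code cells always get outputs and execution_count, even when empty."""
    return {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "outputs": [],
        "source": to_source_lines(source),
    }


def append_error_output(cell: Dict[str, Any], ename: str, evalue: str,
                        traceback: List[str]) -> None:
    """Append an error output with all of its fields filled in."""
    if cell["cell_type"] != "code":
        raise ValueError("only code cells can have outputs")
    cell["outputs"].append({
        "output_type": "error",
        "ename": ename,
        "evalue": evalue,
        "traceback": list(traceback),
    })


def insert_cell(notebook: Dict[str, Any], index: int, cell: Dict[str, Any]) -> None:
    """Insert a cell at a specific spot in the notebook."""
    notebook["cells"].insert(index, cell)
```

Usage, under the same assumptions:

```python
nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
insert_cell(nb, 0, new_markdown_cell("# Title"))
cell = new_code_cell("import pandas as pd\nimport time", execution_count=3)
append_error_output(cell, "ImportError", "DLL load failed", ["traceback line"])
insert_cell(nb, 1, cell)
text = json.dumps(nb, indent=1)  # json.dumps handles all string escaping
```

Because callers never assemble cell JSON by hand, a missing attribute or a badly escaped string can only originate in one module.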

@rchiodo @IanMatthewHuff @DavidKutu Thoughts?

@greazer greazer changed the title from "Notebooks are corrupted by VS Code" to "Notebooks are sometimes corrupted (made unloadable in Jupyter or Jupyter Lab)" Jan 9, 2020
@DonJayamanne DonJayamanne self-assigned this Jan 9, 2020
DonJayamanne referenced this issue in microsoft/vscode-python Jan 14, 2020
For #9386
* Use notebook cell factory for cell manipulation
* Add news entry
* Fix formatting
* Oopsy
* Fix linter issues
* Code review comments
* Fix failing tests
* Fixes to auto save
* Fix linter issues
@IanMatthewHuff IanMatthewHuff self-assigned this Jan 17, 2020
@IanMatthewHuff
Member

I'm validating using microsoft/vscode-python#9385 and microsoft/vscode-python#8772 since those have specifics and this is general.

@IanMatthewHuff
Member

Validate

@lock lock bot locked as resolved and limited conversation to collaborators Jan 30, 2020
@microsoft microsoft unlocked this conversation Nov 13, 2020
@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-python Nov 13, 2020
@JamesDConley

JamesDConley commented Jan 18, 2021

I'm still having issues with this. I had a weird import error on pandas (it referenced a DLL and numpy). Previously I was able to fix it by exiting the notebook and reloading it. This time, when I exited, I accepted the prompt to save changes, and then I could no longer open the file. After I deleted the JSON for the first cell, which contained the import and the error, I was able to open it again.

Edit - I can provide both of the files for comparison if they are of interest. Here is the cell that broke it:
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "error", "ename": "ImportError", "evalue": "Unable to import required dependencies:\nnumpy: \n\nIMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!\n\nImporting the numpy C-extensions failed. This error can happen for\nmany reasons, often due to issues with your setup or how NumPy was\ninstalled.\n\nWe have compiled some common reasons and troubleshooting tips at:\n\n https://numpy.org/devdocs/user/troubleshooting-importerror.html\n\nPlease note and check the following:\n\n * The Python version is: Python3.7 from \"C:\\Users\\James\\anaconda3\\python.exe\"\n * The NumPy version is: \"1.19.1\"\n\nand make sure that they are the versions you expect.\nPlease carefully study the documentation linked above for further help.\n\nOriginal error was: DLL load failed: The specified module could not be found.\n", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mImportError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m<ipython-input-3-ffcf3b0a3ae9>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mimport\u001b[0m \u001b[0mpandas\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mwarframe_market\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mMarketItem\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mtime\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpyplot\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mplt\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[1;32mimport\u001b[0m 
\u001b[0mseaborn\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0msns\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\__init__.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mmissing_dependencies\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 16\u001b[0m raise ImportError(\n\u001b[1;32m---> 17\u001b[1;33m \u001b[1;34m\"Unable to import required dependencies:\\n\"\u001b[0m \u001b[1;33m+\u001b[0m \u001b[1;34m\"\\n\"\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmissing_dependencies\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 18\u001b[0m )\n\u001b[0;32m 19\u001b[0m \u001b[1;32mdel\u001b[0m \u001b[0mhard_dependencies\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdependency\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmissing_dependencies\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mImportError\u001b[0m: Unable to import required dependencies:\nnumpy: \n\nIMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!\n\nImporting the numpy C-extensions failed. 
This error can happen for\nmany reasons, often due to issues with your setup or how NumPy was\ninstalled.\n\nWe have compiled some common reasons and troubleshooting tips at:\n\n https://numpy.org/devdocs/user/troubleshooting-importerror.html\n\nPlease note and check the following:\n\n * The Python version is: Python3.7 from \"C:\\Users\\James\\anaconda3\\python.exe\"\n * The NumPy version is: \"1.19.1\"\n\nand make sure that they are the versions you expect.\nPlease carefully study the documentation linked above for further help.\n\nOriginal error was: DLL load failed: The specified module could not be found.\n" ] } ], "source": [ "import pandas as pd\n", "from warframe_market import MarketItem\n", "import time\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }
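
The manual workaround described above can be approximated with a short script. This is a minimal sketch that assumes the saved file is still parseable JSON and that the stored outputs are what prevent it from loading, neither of which this thread confirms. The function names are hypothetical. Unlike deleting the whole cell, it keeps each cell's source and only clears its outputs.

```python
import json
from typing import Any, Dict


def strip_outputs(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_outputs_in_text(text: str) -> str:
    """Parse notebook JSON text, clear code cell outputs, and re-serialize."""
    return json.dumps(strip_outputs(json.loads(text)), indent=1)
```

To use it, read the `.ipynb` file as text, pass it through `strip_outputs_in_text`, and write the result to a new file so the original stays available for comparison.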

@IanMatthewHuff
Member

@JamesDConley Thanks for helping to report this issue. Could you please open a new issue here in the vscode-jupyter repo with your info? This older issue was for a specific corruption cause, so I don't want to reactivate the old bug for a new and probably different issue. A new issue will help us look into this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 6, 2021