Better Error messages for Jupyter Notebook files (data science) #1553

Closed
Juan-132 opened this issue Jan 20, 2020 · 2 comments
Labels
notebook-workflow Issues that interrupt expected or desirable behavior

Comments

@Juan-132

Description

When a Python error is thrown, the error message includes a full traceback of what went wrong.
This is probably great for software development/debugging. However, as a Notebook user working on a Data Science project, the most likely causes of errors are: typos in functions/variables, incorrectly called functions/parameters, or other mistakes on my end.

The most useful bits of information that help me identify the root cause of the error are:

  • The error type (what went wrong)
  • The traceback frames from the notebook itself (where it went wrong in my code)

Problem

The most useful debug information is printed at the end of the Python error message. This works for traditional Python files, because the error output is printed to a terminal, where the end of the output stays in view. Notebook output, however, is read from start (top) to end (bottom), so users have to scroll to the end of the error message to see the most important debug information.
This is a pain, especially for errors with long tracebacks.

Improvement suggestion

(either as an option or by default)
Show concisely formatted error messages when an error occurs within a Notebook file (geared towards Data Science users). Include the following bits of information:

  • The Error Type
  • Only tracebacks of code within the Jupyter Notebook (not from dependencies like pandas/numpy).
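A rough sketch of how such filtering could work, using only the standard `traceback` module. This is not how the extension or IPython formats errors; the function names `is_user_frame` and `concise_error` are hypothetical, and the "not under site-packages" check is an assumed heuristic for telling notebook code apart from dependencies.

```python
import traceback


def is_user_frame(filename: str) -> bool:
    """Heuristic (an assumption): user code does not live under site-packages."""
    normalized = filename.replace("\\", "/")
    return "site-packages" not in normalized and "dist-packages" not in normalized


def concise_error(exc: BaseException) -> str:
    """Return the error type and message first, then only the user-code frames."""
    # "KeyError: 'item_price'" style summary line, placed at the top.
    header = "".join(traceback.format_exception_only(type(exc), exc)).rstrip()

    # Keep only the frames that the heuristic attributes to user code.
    frames = [
        frame
        for frame in traceback.extract_tb(exc.__traceback__)
        if is_user_frame(frame.filename)
    ]

    lines = [header, "", "Traceback (most recent call last)"]
    lines.extend(traceback.format_list(frames))
    return "\n".join(line.rstrip("\n") for line in lines)


# Example use inside a cell:
# try:
#     chipo.groupby('item_name').sum()['item_price']
# except KeyError as err:
#     print(concise_error(err))
```

One known gap in this heuristic: compiled frames such as `pandas\_libs\index.pyx` are reported with relative paths, so a path check alone would not filter them out. A real implementation would probably need a stronger test for what counts as notebook code.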

Example error code:

import pandas as pd
chipo = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv", sep="\t")
chipo.groupby('item_name').sum()['item_price']

Concisely formatted error message:

KeyError: 'item_price'

Traceback (most recent call last)
 in 
      1 import pandas as pd
      2 chipo = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv", sep="\t")
----> 3 chipo.groupby('item_name').sum()['item_price']

Current error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'item_price'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
 in 
      1 import pandas as pd
      2 chipo = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv", sep="\t")
----> 3 chipo.groupby('item_name').sum()['item_price']

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2978             if self.columns.nlevels > 1:
   2979                 return self._getitem_multilevel(key)
-> 2980             indexer = self.columns.get_loc(key)
   2981             if is_integer(indexer):
   2982                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'item_price'
@IanMatthewHuff
Member

@Juan-132 That's a good suggestion. Thanks for the report. Thanks for the clean example as well, very helpful when triaging.

@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-python Nov 13, 2020
@greazer greazer added the notebook-workflow Issues that interrupt expected or desirable behavior label Aug 8, 2021
@DonJayamanne
Contributor

Duplicate of #7764

@DonJayamanne DonJayamanne marked this as a duplicate of #7764 Oct 18, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022