Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jupyter nb output in PyCharm IDE breaks with Kedro 0.18.2 #1719

Closed
Tracked by #2928
CraftingLevi opened this issue Jul 21, 2022 · 10 comments
Closed
Tracked by #2928

Jupyter nb output in PyCharm IDE breaks with Kedro 0.18.2 #1719

CraftingLevi opened this issue Jul 21, 2022 · 10 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@CraftingLevi
Copy link

CraftingLevi commented Jul 21, 2022

Description

Jupyter notebook cell output related to the logging package and pandas package in the IDE Pycharm >= 2022.1.2 renders as a web file instead of an interactive 'Table' after upgrading from kedro 0.18.1 to 0.18.2. My hypothesis is that this is caused by the native integration of the Rich package as of kedro 0.18.2. Current resolution is to remain at kedro 0.18.1 for time being, hoping this issues is fixed in a future release.

Code run prior to the outputs in the image below:
import numpy as np
import pandas as pd
from kedro.framework.cli import catalog
%load_ext kedro.extras.extensions.ipython
%reload_kedro D:\Work\vz-ona-kedro
df = catalog.load("node_dataset").head(5)
df.columns = ['Example'+str(idx) for idx, _ in enumerate(df.columns) ]

*Outputs as of 0.18.2
output0182
*Output types types as of 0.18.2
output0182_outputtypes
*Outputs in 0.18.1
output01801
*Output Types in 0.18.1
output01801_outputtypes

Context

Pycharm version >= 2022.1.2
Upgrade from Kedro 0.18.1 to 0.18.2 caused issue. Downgrade resolves issue.
Jupyter cell outputs as a result of the logging package and pandas package render as 'Web' instead of Text / Table.
After outputting, cells might become unresponsive to editing for +/- 15 seconds.

Works fine in the browser version of Jupyter, however this does not have the functionality of an IDE, impacting productivity.

Steps to Reproduce

  1. Upgrade to Kedro 0.18.2 from 0.18.1
  2. Run notebook cell that outputs a pandas dataframe, stacktrace or logs

Expected Result

Output of cell is of type 'Table' and renders as an interactive table

Actual Result

Output of cell is of type Web, and is non-interactive

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version: 0.18.2
  • Python version: 3.8.13
  • Operating system and version: Windows 10 Home 21H2
  • Pycharm version >= 2022.1.2

Project Dependencies installed:

networkx~=2.5
plotly==5.9.0
pandas~=1.3.0
numpy~=1.22.3
holidays==0.11.3.1
tqdm~=4.56.0
python_dateutil==2.8.2
python-pptx==0.6.21
kaleido==0.2.1
scipy==1.7.3
Pillow~=8.4.0
python-dateutil~=2.8.1
kedro~=0.18.1
sphinx~=5.0.1
pyarrow~=6.0.1

@antonymilne
Copy link
Contributor

Thanks for raising this issue! Unfortunately I think this is going to be difficult (maybe impossible) to fix fully. The behaviour of rich in notebooks is out of our control. I would recommend raising an issue on their repo. Note they have had issues with PyCharm in the past (and also separately with Jupyter), so worth searching to see if anyone mentioned this before. The behaviour of Jupyter notebooks in PyCharm may also be out of their control though. In that case the issue will need to be fixed on the PyCharm side, which I think is unlikely to happen any time soon sadly.

For now, I would recommend just disabling rich logging in your kedro project:
https://kedro.readthedocs.io/en/latest/logging/logging.html#use-plain-console-logging

If you like having rich logging on the CLI but it can't be fixed for one particular notebooks environment then there should be a way of disabling rich logging just in that environment. Something involving logging removeHandler and addHandler (essentially run a programmatic version of the change in the docs at the start of your notebook).

I'd be curious to hear if any other notebook environments have a problem here, e.g. VS Code. From my experience of PyCharm, I think it's likely to just be acting weird in their particular implementation.

@CraftingLevi
Copy link
Author

Thanks for the quick response! I looked further into it, and it seems that rich and PyCharm just don't go hand in hand and found related issues on the JetBrains end.

I did some digging and found the root cause for the issues in PyCharm
This issue specifically is a result of calling rich.traceback.install and rich.pretty.install in kedro/kedro/framework/project/init.py. Commenting these out resolves all jupyter notebooks issues in PyCharm, and does not cause further issues in Kedro.

A resolution would be to rollback the decision on default_logging and continue use of the logging.yml file in conf/base, whilst setting the default to rich regardless, but providing end-users the option to use a different type of logging opposed to rich, in situations where rich might skrew with the dev environment.

For now, I'll rollback to 0.18.1. I'm new to contributing, so let me know what a good approach would be to get the suggestion above reviewed.

@noklam noklam added the Issue: Bug Report 🐞 Bug that needs to be fixed label Jul 21, 2022
@antonymilne
Copy link
Contributor

Thanks for letting us know @CraftingLevi. Just to check: if you comment out these lines:

        rich.traceback.install(
            show_locals=True, suppress=[click, str(Path(sys.executable).parent)]
        )
        rich.pretty.install()

and still have rich configured as the logging handler (easiest way to ensure this is just to delete your conf/base/logging.yml file), then things work ok in PyCharm Jupyter?

I wouldn't raise a PR for this just yet since we're going to need to have think about the right solution. There could be a few different ways to approach this I think, and we'll need to take a bit of time to figure out what the best way is. But thank you for the offer!

@antonymilne
Copy link
Contributor

antonymilne commented Jul 22, 2022

P.S. you can also completely uninstall rich tracebacks by adding these lines to your settings.py:

import sys
sys.excepthook = sys.__excepthook__

I'm not sure whether something similar is possible to revert rich.pretty.install() - would need to check the rich source code for that.

Edit: looks like sys.displayhook = sys.__displayhook__ might do it but I haven't tried it out.

@CraftingLevi
Copy link
Author

CraftingLevi commented Jul 22, 2022

@AntonyMilneQB I did the following steps as per your description:

  1. Commented out lines 203-206 in kedro/framework/project/__init__.py :
        rich.traceback.install(
            show_locals=True, suppress=[click, str(Path(sys.executable).parent)]
        )
        rich.pretty.install()
  1. Deleted the default logging.yml file in conf/base

Result
Logging by anything in the kedro project is still executed by rich

Anything outside of the scope of kedro (e.g. Tracebacks, Pandas DataFrame outputs) is now not affected by rich , meaning I can use dataframes in notebooks properly in the pycharm environment.

I ran my test and several run configurations after this change, both working as expected.

Therefore I am unsure if therich.traceback.install() and rich.pretty.install() are required here if the purpose of rich is to have better, clearer logging in the context of kedro.

Suggested next steps
Let's consider the removal of these lines, or making them optional in a future release.

In the current state, not only kedro logging, but also traceback and other console output is affected.

In the suggested state, after removing these lines, only kedro logging is affected by rich, and not traceback and other console output. If a user still wants to have fancier tracebacks and printing in the CLI, the rich.<...>.install() functions could be made optional and active by default in some way. If a user primarily works in an IDE, which might be affected by the rich.<...>.install() functions, a segment in the documentation can be adopted showing how to disable the effect of rich on console printing outside of kedro logging.

This is a response to the following:

Thanks for letting us know @CraftingLevi. Just to check: if you comment out these lines:

        rich.traceback.install(
            show_locals=True, suppress=[click, str(Path(sys.executable).parent)]
        )
        rich.pretty.install()

and still have rich configured as the logging handler (easiest way to ensure this is just to delete your conf/base/logging.yml file), then things work ok in PyCharm Jupyter?

I wouldn't raise a PR for this just yet since we're going to need to have think about the right solution. There could be a few different ways to approach this I think, and we'll need to take a bit of time to figure out what the best way is. But thank you for the offer!

@antonymilne
Copy link
Contributor

antonymilne commented Jul 29, 2022

Thanks very much for checking this @CraftingLevi, and sorry for the slow response - I was off on holiday for a few days.

I agree we probably need a better way of enabling/disabling and more generally configuring these two rich.<...>.install functions (which would help for not just for the issue you raised here but others too). I have a few ideas for how we might be able to do this - will follow up on it in #1728.

Ideally though, there would be some way of a user leaving these settings enabled and having rich work nicely in PyCharm Jupyter notebooks. This way you get all the benefits of the rich features and don't need to tinker with any settings depending on the IDE you're using. So I think we should see if that's possible first of all.

Did you find anything about this specific issue on the rich repo? From my experience of PyCharm bug reports, I think this is much more likely to be fixed (or at least some workaround developed) on rich rather than on PyCharm, even if the problem is on the PyCharm side. If there's nothing on the rich repo about it then I think we should open one there to see if they can figure out how to render things nicely in PyCharm Jupyter notebooks.

@CraftingLevi
Copy link
Author

It's been a while, and aside from commenting out the install lines for rich logging as suggested earlier, this might be a, albeit suboptimal, more sophisticated solution.

Inspired by how we handle rich logging and Databricks for now, we can do the same for jupyter notebooks in pycharm. However, there is only one environment variable (in my version of pycharm 2022.2) that only shows up in notebooks, PYDEVD_USE_FRAME_EVAL. I've checked and this is a flag related to debugging in pycharm.

So for now, I replace lines 217-220 in kedro/framework/project/init.py with:

        if "DATABRICKS_RUNTIME_VERSION" not in os.environ \
                and "PYDEVD_USE_FRAME_EVAL" not in os.environ:
            rich.traceback.install(suppress=[click, str(Path(sys.executable).parent)])
        if "PYDEVD_USE_FRAME_EVAL" not in os.environ:
            rich.pretty.install()

Maybe this helps bump this minor issue forward. I still agree with you that a proper solution with the interaction between PyCharms jupyter notebook integration and rich logging should be done in the rich repo, but in the meanwhile, this can bridge that gap.

@antonymilne
Copy link
Contributor

Hi @CraftingLevi, thanks for the update on this, much appreciated. I've just had a look at different environment variables that are present in my PyCharm in different run configurations (debug, emulate in terminal, emulate in Python console, Jupyter notebook), etc. and did not have PYDEVD_USER_FRAME_EVAL in any of them. The one that seemed to uniquely identify Jupyter for me was PYCHARM_HOSTED. This is on PyCharm 2021.3.1 (Professional Edition) so a bit older than yours.

If we can get another person to check that your environment variable seems to work for them then I'd happily accept your change in the kedro source. As you say, it's a hacky fix, but we do something similar for Databricks already so I'm not averse to it. I'd suggest that you open a PR for it and we get some others to review it to see if it works for them too (or I upgrade my PyCharm...).

I'm also currently considering how to make the whole kedro framework-side project logging configuration accessible to users so that any future modifications to the rich commands (or other bits of the logging setup) should be possible in future without having to modify your site-packages at all.

@CraftingLevi
Copy link
Author

In the latest release of Pycharm 2022.3.3 and as of Kedro 0.18.12, jupyter nb output is displayed as expected within the PyCharm IDE (i.e. interactive dataframe tables).

Not having Rich as a core dependency can still help older PyCharm distributions, but for the latest releases for both PyCharm and Kedro, this issue is resolved.

@astrojuanlu
Copy link
Member

Thanks a lot for the update @CraftingLevi 🙏🏽 Let's follow up on making rich optional in #2928, closing this old issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed
Projects
None yet
Development

No branches or pull requests

4 participants