-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: allow any value for expr #429
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #429 +/- ##
=======================================
Coverage 81.47% 81.47%
=======================================
Files 29 29
Lines 2618 2618
=======================================
Hits 2133 2133
Misses 485 485
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Sorry for the slow response, I didn't catch this one. FWIW, if this becomes a complex discussion we might want to have it in an issue so that there is more visibility to discussion that isn't attached to the implementation. I think this is an important option to consider. As you note, it's important to consider both the "pros", the "cons", and the "assumptions" we're bringing. Some quick thoughts on each from me: I assume most users are going to think of this feature very similarly to RMarkdown and knitr, which I believe do allow for arbitrary expressions (e.g. here's a tutorial that shows this). Most users also do not have a mental model of "returning a variable vs. defining a variable". They just think of it all as "code that gets executed". To that extent, if we can make the inline code behave the same as code blocks, that would match the assumptions users are bringing, and would also reduce the amount of writing needed for certain patterns (like the pattern @agoose77 shared above). The main downside that I can think of is that this might have "unintended consequences" for users if they're modifying state in those expressions. However I do think it's a reasonable argument that Jupyter is already rife with this anyway, and it doesn't seem too hard to teach users that "the code you run in-line might change other things in your page". (one example where state is confusing an RMarkdown user interactively). Are there any other major downsides? (e.g., in implementation, complexity, edge-cases and errors, etc?) I think @stefanv and @matthew-brett mentioned the value of in-line expressions as well, and suggested that |
I've updated my original post to condense the pros/cons as I see them, including those that you've mentioned @choldgraf |
Right - consider text like this:
It makes for more verbose code, having to make an otherwise unnecessary set of variables to insert here. And yes, I agree, it's perfectly reasonable to expect the user to take the same care about side effects that they do in code cells. If it's important, I bet you could get a good idea of how often people use variables vs expressions in RMarkdown, by analyzing R notebooks. And I bet that a large proportion of RMarkdown inline expressions are expressions rather than variables. |
@agoose77 just a note that I updated your top comment to try and make it clear what this PR was proposing and what it wasn't, since your terminology made me think that "this restriction" was being applied to this PR, not removed by it :-) |
Well I do want to be clear here, that what is being proposed is to ditch the whole concept of a myst markdown notebook having a 1:1 correspondence with a Jupyterbook, e.g. if you execute the notebook top to bottom in JupyterLab you may well now get a different outcome than executing it via myst-nb. Having a 1:1 correspondence was a clear design constraint when creating myst-nb. If we go this direction, then this is no different than allowing |
Right, this is a valid concern (and I've updated the table). As I see it, what we're actually discussing here is not about side-effects per-se, but rather that we right now do not support expressions in jupyter-cache (and native Lab). Rather than implementing that support, we have this restriction. I can understand this restriction, after all - you've done most (all?) of the implementation work on expressions and it's not a small task. So, keeping things constrained makes for a deliverable result. But, longer term, I think we would benefit from adding that support. Arguably, even just limiting users to Python expressions (and not Walrus assignments) would be nearly as safe, but with far fewer restrictions. Of course, this means we have a language-specific solution, which I'm not wholly in favour of, even if I think we could do it with a per-lang table. Practically speaking, I think there are two separate issues here;
The problem of caching is already a significant one. Any notebook using inline expressions can no longer use the caching system, and I found this to be very restrictive whilst writing my thesis. Even if we don't address this PR, I think we will want to solve the caching problem. As we discussed on Slack, I think we can do this in a generalised manner through the use of a MyST plugin to jupyter-cache. EDIT: Finally, I think this also falls into the category of "trust your users, but maybe warn them about footguns". |
Exactly, I'm not dismissing this, but I want to be very clearly why the constraint is in there in the first place, and what the philosophical implications of removing it are
Personally I think you would just need to create a different package to jupyter-cache, that acts different, |
Preface the below with the disclaimer that this is just design discussion rather than a concrete plan of action. To my mind,
I feel that it is in scope to extend the implementation of components (1) and (2) to incorporate Markdown expressions. To be pedantic, it's already trivially possible to fool the cache into thinking the cache state is valid. Any Python cell that inspects the current saved notebook will do this. Similarly, anything that references random or time-sensitive data. Right now I suspect we'd all argue that this is simply "wrong usage" as it violates the assumption of the cache-system (that only the ordering and contents of the code cells affects the final cache result) Nevertheless, I think we need to consider this for
Being able to standardise the execution interface of a cached notebook whilst supporting plugins would be immensely useful. I can think of other use cases:
It might be that we need to restructure |
I agree with @chrisjsewell's concerns above - those are definitely extra
It looks like RStudio has some of the same confusion (e.g., this SO post is from a person who saw one thing in RStudio and would get another when things are knit with knitr). I'd hope that we can document well-enough to let users know how the two are different, and what behavior to expect. We could include documentation like the following here or in Jupyter Book (feel free to take as much of this language as is useful and add it to the PR if you like):
I agree - though getting core APIs added / changed is a lot of work to align the stakeholders needed to make it happen. I think projects like Jupyter Book / JupyterLab are opportunities to prototype and experiment new patterns. We can see how users like them and work out the kinks, and this will hopefully make it easier to gain broader adoption within Jupyter at a later time (e.g., it'll be easier to value in-line execution to the So I think to implement this, we should add a big "Prototype / Beta" caution at the top of the documentation, and link to an issue or a discussion to gather feedback from people about it. We'd need to tell users that this behavior is not "core Jupyter" and so they should expect some more speed bumps if they want in-line execution. We should also say that the goal is to collect feedback and use-cases so that we can then make more informed decisions about how / whether inline execution should exist within Jupyter's standard behavior. Something like:
Also just a final note that some of these are things that we might be able to get funding to make happen, so it is good to keep this kind of stuff in mind for future grant opportunities etc. |
It's been a while since we looked at this. The regex-based approach we currently use here is fairly non-general, and limits legitimate useful expressions. Ultimately, the notion of what has side effects is not something that can be resolved statically, at least not without full static analysis tools. I do see merit in the argument that we don't want users to shoot themselves in the foot. I'm not sure how likely that is; this is an advanced feature, and to discover it you need to read the docs. That said, I'd be willing to move on throwing out the regex entirely; could we implement a switch to opt-out of this check? That way, users can turn off this sanity check if they need the feature. Better would be to also have a per-kernel identifier table so that languages with more dynamic identifier patterns are supported. Thoughts? |
I think we need to consider this again, now that I suspect VSCode is an outlier here, as I have not been following the development. @chrisjsewell do we implement inline-expression support in VSCode? I took a look at their extension API and it looks like it might be a challenge. In any case, the problem I'm trying to solve is myst-nb inline execution being unusable with certain kernels. I can revise this PR if we want to make it a config setting, rather than a default off! |
Until there are clear standards for this, there is no "canonical" way to write MyST in notebooks, I say this, perhaps prematurely, to put a marker down for issues amounting to: "jupyterlab-myst does XXX, so myst-nb MUST do XXX"
Even more so for inline variables, I feel it needs to be resolved in some kind of actual standard, i.e. https://discourse.jupyter.org/t/inline-variable-insertion-in-markdown/10525,
Yep I think that is the compromise, a config setting that is also very clearly documented as to why it is not the default (as I have already explained here) |
There's a nuance in what I wrote above; I mentioned this in reply to one of the arguments in favour of being aggressive in defining valid eval syntax:
Although we've had inline expression support for a while, the new iteration of the extension makes it simpler to install and prettier too!
It's worth noting here. My understanding is that we want to collectively standardise the various parts of MyST / JB so that we can collectively avoid divergences.
Agreed, and note that I'm not making this argument (see above).
As I understand it, the aspect to standardise here is the output metadata. AFAICR,
We can agree on introducing a config variable. Though I'd prefer it to be non-default (i.e. I think the onus is on the user to behave instead of breaking some workflows by default), I don't have the motivation to push that case any further. I'll revise this PR in due course to introduce a new config setting, with suitable documentation. |
yep exactly, the point is to converge on a central standard, that's not tied to one implementation, as opposed to any one implementation being the standard itself. If that makes sense. |
No I'd say, if |
Er, that too! |
Exactly! But, if you want to do it in a really "thorough" way, that all notebook clients (which is a subclass of jupyter clients) can standardise around, then thats gonna take some thinking. |
FWIW from the perspective of guiding principles to make decisions about introducing new functionality and potentially surprising people, giving us flexibility to re-implement, etc, my thoughts are still pretty much: So my suggestion is to figure out the minimum viable improvement that will satisfy "people want to be able to execute code inline", make it a feature flag, and then ask for feedback from others and iterate from there. We don't want "perfect" to be a gatekeeper of iterative progress if it's a feature that people clearly want, and that will be complex/tricky to get right. |
Yep then main is that feature flags/config give you a clear hook for deprecations/changes. If you introduce a feature, people will use and will be annoyed if you remove/change it, however many warnings you give them lol |
This is true, though I also think there's room for "we gave you sufficient warning, you shouldn't do that". There's a balance to be struck with these things, and it's all social convention. |
@choldgraf is this OK to merge? |
If you think it's fine to merge then I'd say go for it @agoose77 . I don't have time to give it a deep dive though! |
for more information, see https://pre-commit.ci
@choldgraf can you approve, so that I can merge? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course!
Currently, the
eval
role and directives only allow for C-like identifiers.Proposal in this PR
Remove the restriction on expression values in
eval
contexts.Advantages of this restriction (to be removed by this PR)
jupyter-cache
orJupyter Lab
1.Disadvantages of this restriction (to be resolved by this PR)
℘
orgänseblümchen
. Whilst some of these might be a little far-fetched, I find myself regularly usingτ
for Mathematical expressions.plot.display()
fork3d
) need to use a capturing context object + re-reraise the MIME bundle (ugly)e.g.
Given that notebooks can already have side effects, I fairly strongly disagree with trying to fix this at the MyST level. I feel it's better for users to burn their own fingers than for us to impose restrictions upon them.
Maybe, if someone can find use for this kind of restriction, we can make it an opt-in configuration where the user can turn on REGEX matching, and specify the REGEX?
EDIT: Expounded pros and cons
Footnotes
jupyterlab-myst does support inline expressions. ↩