RFC: deprecate hard/soft scope distinction #19324

Merged (1 commit) on Oct 26, 2017

vtjnash
Member

@vtjnash vtjnash commented Nov 14, 2016

This PR examines the impact of deprecating (much of?) the distinction between hard/soft scope. Instead it simply distinguishes between global and local scope. This means that all scope-blocks introduce the same type of scope (local), rather than distinguishing that toplevel functions have special, hard scope rules. I've updated the manual to try to show how this change would impact the user. The main change is that there would no longer be the concept of implicit globals computed from examining the module bindings. Instead the global/local computation would be purely syntactic. For example, take the following code snippet:

global x = 0
for x = 1:10 end
@show x

Under the current rules, this shows 10, because x had a value before the for-loop.
Under the new rules, this shows 0, since the for-loop introduces a new local scope.

Making this code work as before would require declaring x to be a global inside the for-loop scope block:

for x = 1:10; global x; end

Another option that this PR still permits is to make an assignment to x inside a begin/end block:

begin
  x = 0
  for x = 1:10 end
end
@show x

The impact to base is small (two corrections to code that will be broken anyways when #265 is fixed).

The impact to tests is larger, as it takes a large number of unintentionally-global variables and causes them to emit a deprecation warning. I think the resulting changes are arguably beneficial, even if we don't decide to change the scoping rules, since they reduce the number of objects being kept around in global variables.

@StefanKarpinski
Member

I like it and think we should just go ahead and merge all the changes that avoid creating bogus globals in test files. Any idea what the "(much of?)" might refer to in terms of what difference might remain between hard and soft scopes? I don't understand why the begin/end example works – can you elaborate?

@vtjnash
Member Author

vtjnash commented Nov 14, 2016

I don't understand why the begin/end example works – can you elaborate?

Yes, that's where "much of" is currently not entirely deprecated. I think there's two options here:

  1. a syntactic assignment to a global inside a begin/end or if/end block converts usages everywhere inside that block
  2. a local scope must always declare variables as global

Option 1 would mean you could do the following, and f() would change the global x:

begin
    x = 1
    f() = (x += 1)
end

This is the begin/end example above extended to functions (scope inheritance of surrounding global blocks).

Option 2 would mean that begin/end blocks lower to the same result whether they are lowered as a block or one statement at a time (never scope inherit). This option would also mean that we could (for example) delay expanding macros, and might help make it easier to get the lowering of if true; using Foo; Foo; end correct.

I left this undecided in the current parser deprecation.

@ararslan
Member

So if I understand correctly, the option 1 example you gave would require global x = 1 in order for f to modify x in option 2? If that's the case, option 2 IMHO sounds like a winner overall.

@@ -117,24 +128,29 @@ changed within their global scope and not from an outside module.
Note that the interactive prompt (aka REPL) is in the global scope of
the module ``Main``.

Within the global scopes, the `global` keyword is never necessary,
Member

@Sacha0 Sacha0 Nov 15, 2016

Perhaps "Within a global scope"?

introduced into the top-level scope:
The following rules and examples pertain to local scopes.
A newly introduced variable in a local scope does not
back-propagate to its parent scope.
Member

@Sacha0 Sacha0 Nov 15, 2016

Perhaps "propagate out to its parent scope" or "escape to"?

julia> function foobar()
global x = 2
end;
.. doctest::
Member

This doctest appears to belong following the statement "An explicit global..." on line 253. But due to the deletions above neither foobar nor x associated with the remaining parts of this doctest are defined?

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

option 1 example you gave would require global x = 1 in order for f to modify x in option 2?

x and global x can be treated the same for the statement at global scope, what would differ is that you would always need to be explicit when at the local scope level:

begin
    x = 0
    f() = (global x += 1)
end

(I'm not sure if you meant global x = 0 or global x+= 1. The latter case would be proposed option 2.)
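As a rough illustration of the option 2 style, here is a minimal sketch in which the function always names the global explicitly. The names counter and bump are hypothetical and chosen only for this example; it assumes the code is evaluated at the top level of a module or script.

```julia
# Sketch of the "always be explicit in local scope" style (option 2).
# `counter` and `bump` are hypothetical names used only for illustration.
begin
    counter = 0                      # assignment at global scope creates a global
    bump() = (global counter += 1)   # the function body states that counter is global
end

bump()
bump()
println(counter)
```

Because the global declaration sits inside the function body, the meaning of counter there does not depend on whether the surrounding begin/end block is present.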

@StefanKarpinski
Member

So, if I'm understanding this, option 1 makes begin/end blocks and if/else blocks soft. With option 1, how would you write that example without the begin/end block? Would you then need the global?

(find-assigned-vars ex '() '()))))
lv))
;; vars assigned anywhere, if they have been been explicitly defined
(gv-implicit (filter (lambda (x) (and (not (or (memq x lv) (memq x gv))) (defined-julia-global x)))
Member

I thought this change would mean lowering doesn't depend on defined-julia-global?

Member Author

The actual change wouldn't, but this is implementing the deprecation to precede the change, so it can't actually alter anything.

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

Yes, for option 1, if there is no containing soft scope block that assigns to it, then you would have to declare it global somewhere inside the local scope. (editorial note: this usage of "soft scope" is not related to the old behavior, since the difference in scope type is designed by the block construct used (if/begin) rather than the nesting (toplevel/inner). perhaps a new term should be used for clarity?)

@martinholters
Member

👍 for simplification of the scoping rules. I always found those a bit confusing.

However,

for x in 1:10
    global x
    # ...
end

looks a bit unintuitive. For readability of the code, I'd always want the global to precede the variable access. Maybe we could add (possibly in a separate PR) parsing of for global x in 1:10 to be equivalent to that? (I.e. adding a global to the iteration variable, not changing the scoping rules of the for otherwise.) Then again, this simply might not be needed often enough to justify dedicated syntax.

@StefanKarpinski
Member

I was about to suggest the for global x = 1:10 as well. I suspect with a change like this we'd need an outer or nonlocal keyword as well.

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

@StefanKarpinski I don't follow your outer scope comment. That keyword shouldn't affect the global/local distinction made here since it only changes the behavior of local scope. The outer keyword was previously suggested for making functions consistently hard-scoped, whereas this proposal makes all local scopes have consistent nesting rules.

@StefanKarpinski
Member

StefanKarpinski commented Nov 15, 2016

If we have to write f() = (global x += 1) to mutate x when it's global, how does one do this:

function g()
    x = 0
    f() = (??? x += 1)
end

What keyword (or lack thereof) goes where the ??? is?

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

Currently, g has a hard-scope and f has soft-scope. In the proposal, both would have local-scope. Local scope would be consistent, so you don't need a keyword there regardless of how you nest the local scope (for loop, function, etc.). With respect to the nesting of local scope, this proposal is not an attempt to change what we do now.
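To make the nested-local case concrete, here is a small sketch of the g/f example with nothing in place of the ???. It relies only on the existing behavior that an inner function assigns to a local of the enclosing function; the extra calls to f and the return value are additions for illustration.

```julia
# Nested local scopes: the inner function updates the enclosing function's local.
function g()
    x = 0
    f() = (x += 1)   # no keyword: x resolves to g's local variable
    f()
    f()
    return x
end

println(g())
```

Here g() returns 2, since both calls to f increment the same local x.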

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

@StefanKarpinski This PR is limited to the question of what the following code does:

for x = 1:10
    local function f()
        return x += 1
    end
    global g = f
end

The current answer is that you can't know from the information given, since I haven't given you the rest of the scope of x (the set of names previously declared to be global in the module).

@StefanKarpinski
Member

I agree that's a problem, but if we're going to fix our scoping rules, we probably shouldn't do it in bits and pieces, but rather all at once (or rather, in two deprecation and finalization phases).

So you're saying that under both variations of your proposal, no keyword would be required in my example?

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

Correct. If we decided eventually to go with an outer keyword, there's no particular reason, in my opinion, to couple that to this change. I don't believe the effort of dealing with the deprecation will be reduced by combining them, as they mostly only affect different code.

For the outer keyword to make sense, we would need to have a concept of hard vs. soft scope. But in the spirit of this change, that distinction would be based upon the keyword used to introduce the scope (function vs. for, let, etc.) rather than being a toplevel function. However, I don't believe that decision needs to affect this PR because I believe it is rare for someone to depend on the soft-scope rules of nested global scope blocks that don't introduce a function. So it is rare for someone to be affected by this change, while requiring outer to escape an assignment from an inner function would be much more disruptive. I believe that change would take 3 cycles to make (introduce outer keyword, deprecate implicit outer keyword, remove deprecation).

@StefanKarpinski
Member

I don't get your argument that outer only makes sense with two kinds of scope. If there are situations where one needs global to force a variable not to be local to the inner scope, how do you force the same local variable to be not global but local to an outer local scope? It seems to me that the need for that is caused by having only one kind of scope, not by having two kinds.

@vtjnash
Member Author

vtjnash commented Nov 15, 2016

The minimum set of scoping rules is global + nesting local scope, and I think you can make an entirely consistent model from this. You only need outer if you want to add a non-nesting local scope to the mix ("hard scope"). I don't think anyone is going to argue that we should always use non-nesting local scope (function f(); x = 0; for i = 1:10; outer x += i; end; end), so then it's just a question of whether nested functions use the same rules as other nested blocks or whether we change them to require an extra keyword to assign to an outer scope.

But again, in this PR I'm primarily just looking at whether we want to continue to have a soft-global-scope, or whether to make global scope always hard-scope. Local scope is already always a soft-scope. Whereas global scope may be hard or soft depending on the scope keyword and the set of names already declared in global scope.

@mauro3
Contributor

mauro3 commented Nov 16, 2016

Generally, +1 to ditching soft/hard scope distinction. However, I think this change would have negative impacts on using Julia for "scripting" (edit: so a -1 from me). A few points:

  • I'm a bit unclear whether the new "local" scope behaves like the "soft local" or the "hard local" scope from before. Or whether it is something new entirely. Judging by the doc-update: new local == old hard-local scope. If so, this would impact scripting and REPL work greatly: see example below.
  • note that the heads of scope blocks (loop variable, function arguments, let-assignments) have their own rules. So conceivably the very first example could be made to behave like it does currently.
  • currently begin-end blocks and if-blocks do not introduce a new scope. I'm a bit confused whether this would change with this PR (the examples above of @vtjnash, in particular this comment, suggest so, however the updated docs suggest no change). If changed, is that really desired?

Example of a future REPL session (or script in global scope, or Jupyter notebook):

julia> data = rand(5);

julia> for i=1:length(data)
         global data
         if data[i]>0.5
           data[i] += 2
         end
       end

(edit: this example is wrong as mutation has nothing to do with scope. See @vtjnash post two down for the correct example.)

So code will have to be peppered with lots of global statements, which is very annoying for REPL/scripting/notebook work. Also, a typical workflow of mine involves first scripting in global scope and then copying the reusable code to functions; this would now mean deleting a lot of global statements, since the loops are then in local scope and refer to variables in a containing local scope.

All in all, I'm not sure whether this is a good change considering Julia is also a scripting language (it would be if REPL/scripting is not important). As the main impact of this PR is on scripting, it makes sense that there was little impact of this PR on base, because that is library code. Whereas the tests are more script-like (although arguably that is bad).

One other change could be to keep the hard/soft distinction but make the function scopes properly hard, i.e. they can never write to a variable not in their own scope. This would require a nonlocal keyword to allow overriding this. See #10559.

@StefanKarpinski
Member

I think I agree with everything @mauro3 wrote – although, as he said, it's a little hard to understand just what these changes entail. The difference between code you write to mutate a global versus to mutate a local from an outer (function) scope is the primary concern since the workflow of developing with globals in the REPL and then using the same code as part of a function body is ubiquitous – and so far Julia's scoping rules are carefully designed to make almost everything work as similarly as possible going between those two contexts. If we're going to require a keyword to mutate in one case, we should also require a keyword in the other case. Currently we mostly don't require one in either case – a symmetry which the proposed change seems to break.

@vtjnash
Member Author

vtjnash commented Nov 16, 2016

Loop variables don't have their own scope rules.

This isn't changing your ability to use global data, only the ability to reassign it. If that was so, you may have also forgotten to annotate global length, getindex, setindex!, colon, start, next, done also. There's no way that's going to be up for discussion. So adding global data to the loop is unnecessary. This primarily only affects the ability to write reducers at global scope (the exact example above):

s = 0
for i = 1:10
  global s # Without this, it throws an UndefVarError. With this, it is very slow.
  s += 10
end
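For reference, a self-contained version of such a global-scope reducer with the explicit declaration might look like the sketch below. The name running_total is hypothetical, and the loop adds the index rather than a constant so the result is easy to check.

```julia
# Reducer at global scope with an explicit `global` declaration.
# `running_total` is a hypothetical name used only for this sketch.
running_total = 0
for i = 1:10
    global running_total   # assign to the global rather than a new loop-local
    running_total += i
end
println(running_total)
```

With the declaration in place, the loop accumulates into the global, so running_total ends up as 55 regardless of what was evaluated earlier.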

@mauro3
Contributor

mauro3 commented Nov 16, 2016

Yes, good point @vtjnash. However, I feel the point I tried to make with the example still stands with your corrected example.

Also note that even though the scope rules are pretty complex they seem intuitive, at least judged by the number of julia-user posts.

@vtjnash
Member Author

vtjnash commented Nov 16, 2016

Just to repeat myself, this PR only impacted test code that wasn't really doing the intended thing. I suspect that the lack of mailing list posts is not really that informative. I understand that the intended behavior of the above for loop is not obvious, but that's also why I think it should be made explicit.

Here's one of the non-intuitive results that fall out of the current global-soft-scope rules:

julia> let
         x = 0
         g = () -> (global x = 1)
         g()
         return x
       end
0

julia> let
         x = 0
         g = () -> (global x = 1)
         g()
         return x
       end
1

This also happens to be slow for lowering right now (it seems to be quite difficult for it to figure out the scope of x, so it makes something like 4 successive guesses recursively over the entire AST to refine its estimate – and it still got the wrong answer above in the first execution), which is why I started looking into it just now.

@mauro3
Contributor

mauro3 commented Nov 16, 2016

Shouldn't this be an error like this already is:

julia> let
           x = 2
           let
               global x = 3
           end
       end
ERROR: syntax: `global x`: x is local variable in the enclosing scope

@vtjnash
Member Author

vtjnash commented Nov 16, 2016

No, the function introduces a hard scope

@vtjnash vtjnash added this to the 0.6.0 milestone Dec 18, 2016
@JeffBezanson
Member

Let's merge this?

@mauro3
Contributor

mauro3 commented Oct 24, 2017

For me, this PR just shows two files changed. In particular the docs/news are missing. Weren't they around at some point? See e.g. #19324 (review)

vtjnash added a commit that referenced this pull request Oct 24, 2017
@vtjnash
Member Author

vtjnash commented Oct 24, 2017

Good catch. Looks like I had pushed those to the wrong branch.

@JeffBezanson JeffBezanson merged commit cc87d82 into master Oct 26, 2017
@JeffBezanson JeffBezanson deleted the jn/toplevel-scope branch October 26, 2017 03:41
@StefanKarpinski
Member

💥 💣 ✊

@stevengj
Member

stevengj commented Jan 7, 2018

One frustrating consequence of this PR is that I can't simply copy-and-paste code from a function into the REPL for interactive use. e.g. the following typical code is just fine in a function but gives a deprecation warning in the REPL:

s = zero(eltype(a))
for x in a
    s += x
end

Is there some way that this deprecation could be disabled in a REPL context?

(If I'm using the REPL to debug/understand code from a function, I want to be able to do bidirectional copy-paste, so being forced to insert global declarations is especially frustrating.)
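One workaround, sketched here under the assumption that pasting a whole function definition is acceptable, is to keep the loop inside a function so that s stays local and no global declaration is involved. The name total is hypothetical.

```julia
# Keeping the reducer inside a function: s is a local, so the loop
# updates it without any `global` declaration.
function total(a)
    s = zero(eltype(a))
    for x in a
        s += x
    end
    return s
end

println(total([1, 2, 3]))
```

This does not restore line-by-line copy-paste of the loop body, but the same definition behaves identically in a file and at the prompt.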

@StefanKarpinski
Member

Yes, this is awkward and why we designed the scope behavior the way it was in the first place – so that local and global scopes worked as similarly as possible. The ways forward I can see here are:

  1. Use a debugger instead of cut-and-paste into the REPL.

  2. Introduce an outer keyword so that you can write outer s += x and have it work the same in local or global scope. Then global is essentially "outermost" while outer is one scope outside of this one, which are the same thing if you're only inside one local scope.

I'm not sure how feasible turning off the deprecation in the REPL is, but it's certainly a thought.

@stevengj
Member

stevengj commented Jan 8, 2018

outer doesn't solve the problem that this works fine (with no keyword) in a local scope.

Ideally, there would be no deprecation here for any loop that occurs in global scope, not just in the REPL.

@StefanKarpinski
Member

Yes, that would be lovely, but then we have the hard/soft scope situation again and we're back where we started, so we can't do that.

@stevengj
Member

stevengj commented Jan 9, 2018

Frankly, this change makes writing loops interactively much more difficult, and is especially confusing for new users who are used to working transparently with globals.

A fix can't be too REPL-specific, since it also has to work for Jupyter, Juno, etcetera.

@StefanKarpinski
Member

StefanKarpinski commented Jan 9, 2018

This is the only solution that's been proposed that allows the meaning of a given variable to be statically determined. I can imagine not requiring static resolution of variables in the REPL, but it's not great pretty much everywhere else. The technical problems caused by this for inference and compilation are not insignificant, but it's also just not good to not be able to tell without running code what a given variable even refers to. Is it global or local? Who knows! Run the code and see.

@StefanKarpinski
Member

StefanKarpinski commented Jan 9, 2018

The fundamental issue here is this kind of expression:

for _ = 1:1
    x = value
end

Does this introduce a new x local to the for loop or does it assign to an outer x binding? If the for loop appears in a function body, then the rule is that if there is an outer local variable named x, it will be updated; if there isn't then x is local to the for loop. This is fine because the existence of a local variable is a static property, independent of whether it has a definition or not.
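The function-body rule can be sketched with two small functions. The function names and the variable loop_only are hypothetical, and @isdefined is used only to show that the loop's variable is not visible after the loop.

```julia
# Case 1: an outer local named x exists, so the loop assigns to it.
function updates_outer_local()
    x = 0
    for _ = 1:1
        x = 42          # updates the function's local x
    end
    return x
end

# Case 2: no outer local exists, so the variable is local to the loop.
function creates_loop_local()
    for _ = 1:1
        loop_only = 42  # local to the for loop body
    end
    # Outside the loop the name is not a local of the function; this is
    # false as long as no global called loop_only has been defined.
    return @isdefined(loop_only)
end

println(updates_outer_local())
println(creates_loop_local())
```

In both cases the outcome can be read off the function's source alone, which is the static property referred to above.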

In global scope, there is no notion of whether a global variable exists or not independent of whether it has a binding – it either has a value or doesn't. And that's the trouble since whether a global has a binding or not is not a static property, it depends on what previously evaluated code has done. The old behavior was that the above code would update an existing global binding for x if there was one or introduce a new local variable if there was not. So the meaning of the code depends on a dynamic unpredictable property based on what code that was evaluated previously may have done.

So we have a choice:

  1. Allow the meaning of code to depend on non-static control flow.

  2. Make the example code unconditionally create a new local variable when appearing in global scope, causing the behavior of code to become different in global and local scope.

  3. Make the example code unconditionally update an outer variable when appearing in global scope, causing the behavior of code to become different in global and local scope.

  4. Same as 2 but making local scope match. This would make it impossible to update an outer local variable from a for loop without explicitly declaring it as outer (a feature we don't yet have).

  5. Same as 3 but making local scope match. This would cause functions to have flat scope in which any appearance of a local variable name in the function is always the same variable.

  6. Introduce some static notion of global variable existence independent of having a binding. This would mean that a global variable could "exist" without having a binding just like a local variable can exist without being defined.

We've gone with option 2 in this PR. Which option would you prefer?

@StefanKarpinski
Member

Realized my analysis was wrong and fixed it.

@stevengj
Member

stevengj commented Jan 9, 2018

How about option 1 in global scope?

@StefanKarpinski
Member

That is what we had. It means that the meaning of x in the above code in global scope depends on global state. Are you ok with that? I find it to be problematic in many ways.

@stevengj
Member

I'm fine with that. Global loops depend on global state, why not?

Labels
deprecation This change introduces or involves a deprecation