Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered quantification (foreach loops and sequence comprehensions - take 2) #10

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

robin-aws
Copy link
Member

@robin-aws robin-aws commented May 12, 2022

(Second revision of the) design of the first step of dafny-lang/dafny#1753, and follow-up to #4. Replaces the first revision at #9

The key difference between this version and the old one is explicitly defining ordered quantification up front, and restricting the semantics to the first clauses of quantifier domains instead of inferring it from the range.

cc all the folks that provided such awesome feedback on the first version (turns out you can copy-paste this from the "participants" section of a PR's sidebar): @samuelgruetter @atomb @MikaelMayer @cpitclaudel @keyboardDrummer @RustanLeino. And while I'm at it why not @backtracking, @mschlaipfer, @parno, @amaurremi, @emina, and @seebees.

Edit: Revised to make <- required in ordered contexts (and drop the concept of natural ordering of types), and allow per quantified variable ranges. Also added motivation of idiomatic compilation.

Copy link
Member

@cpitclaudel cpitclaudel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely written proposal. I'm still concerned about the notion of ordering. In particular, the way the proposal is designed makes it so that in some cases you can't write seq x <- s | … and instead must write seq x | x in s && …. But in other cases, writing seq x <- s | … is possible and means something different from seq s | x in s && …:

seq x <- {1,2,3}     // rejected
seq x | x in {1,2,3} // ⇒ [1,2,3]

seq x <- [1,1,1]     // ⇒ [1,1,1]
seq x | x in [1,1,1] // ⇒ [1]

I would much prefer to ban sequence comprehensions without a <- clause.

the bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will
have such a natural ordering defined, but at a minimum `int` and `real`, and any subset type or newtype based on them,
will use the natural mathematical ordering.
`x: int | x in {2, 5, -4}`, for example, would bind `x` to `-4`, `2`, and `5` in that order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case is the natural ordering needed? I'm surprised that we still need an ordering at all: for the example above we can write seq x: real <- sorted(mySet) | P(x), right? And for this one the libraries can provide a conversion function from sets?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how will this work with types that don't have comparisons? I'm assuming that types that don't support == also don't support comparisons, and yet this feature will let me compare arbitrary values: var aLeB := (seq x | x in {a, b}) == [a, b]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how will this work with types that don't have comparisons? I'm assuming that types that don't support == also don't support comparisons, and yet this feature will let me compare arbitrary values: var aLeB := (seq x | x in {a, b}) == [a, b]

This is a very good point - I've been assuming in all my examples that the types were equality-supporting to focus on defining the semantics. That means in some cases you can't even create a compiled {a, b} value in the first place. I'll call that out more explicitly and watch out for under-specified examples.

But yes, not all types will have natural orderings (any reference type for example) so some types will require <-.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal makes sense, but it has caveats.
x: nat says that x is in [0, 1, 2, 3, 4, ....] and then the range says how to filter this infinite range, so it's natural to think that the result be sorted if the underlying type "int" is sorted. In practice, we would compile to something different: We would infer what to enumerate x from (the range), and sort it. But this is a hidden performance cost based on heuristics.
We might as well not accept this "implicit sorting" for sequences, and immediately suggest that the enumeration be part of the x: nat syntax, and enable non-deterministic set orderings.

The ordering of bindings is non-deterministic unless `C` is a sequence.
If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic.

The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C, ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recycling in might be fine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is only that the difference between x in C as a quantified variable declaration and x in C as a boolean expression will be pretty subtle, when C is a multiset or seq: if C is [1, 1, 1], for example, we'd have:

assert (seq x in C) == [1, 1, 1]
assert (seq x | x in C) == [1]

That may be less of a concern if I drop the natural ordering idea that allows the second form, so we'll see how this plays out in the next revision. :)

3. `<CasePatternLocal> <- C`

This is a generalization of the previous case that supports pattern matching, as in variable declarations and match statements.
It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have filtering assignments anywhere else, maybe this is best left to a later stage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree it doesn't have to be in the first set of changes - I've got delayed until phase 4 in the implementation plan. I strongly want it eventually though, if nothing else to support quantifying over the key-value pairs in a map succinctly and efficiently, which is a very common pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destructuring lets are already supported; I'm not worried about them. It's filtering ones that we don't have.
So (x, y) <- [(1, 2)] is good; Some(x) <- [None] is what I'm worried about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as a natural extension of the semantics of match statements, which is like filtering as well. These two loops are equivalent, for example:

foreach Some(x) <- [None, Some(42), None] {
  print x;
}

foreach o <- [None, Some(42), None] {
  match o {
    case Some(x) => print x;
  }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I love it and I want that feature; but I want it in more cases (including if var Some (x) := opt then …), and I think it distracts from the main aspect of this proposal without adding expressivity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well awesome then! :) I mainly wanted to enforce that there was a good option for the map iteration case, and it was a slippery slope to allow filtering as well.

I could make it yet another Future Possibility if that would make the proposal tighter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, that sounds great :)

It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`.

A single quantifier domain may include multiple such declarations separated by commas,
in which case the orderings described for each clause take lexicographic precedence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that x <- [2, 1], y <- [4, 3] is the same as x <- [1, 2], y <- [3, 4]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, seq x <- [2, 1], y <- [4, 3] :: (x, y) would be [(2, 4), (2, 3), (1, 4), (1, 3)]. This was a confusing use of "lexicographic" and I'll rewrite this.

0005-ordered-quantification.md Outdated Show resolved Hide resolved
The quantifier domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided.
`decreases *` is permitted, in which the `foreach` loop may never terminate. Any other `decreases` clause can be provided
to ensure the loop terminates even if the domain is potentially infinite. For example, the following (very slow) example collects
five arbitrary primes (assuming that Dafny can figure out how to enumerate the `allPrimes` infinite set):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also need to know that it's not empty, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worse, it would have to know there are at least five elements. Fortunately the code is sound, but my statement about how it behaves is wrong - it should say that it collects AT MOST five arbitrary primes. I'll fix that.


Since sequence comprehensions are expressions rather than statements, they carry an additional restriction
when used in functions: their ordering must be fully deterministic. If `s` is a `set<int>`, for example,
`seq x <- s` would be rejected in specification contexts, whereas `seq x | x in s` would be allowed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this distinction?

As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to
borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling
quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden
is ensuring that enumeration ordering is threaded through, and deterministic in contexts where it needs to be.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which contexts does it not need to be deterministic?

The proposed sequence comprehension expression allows more
logic that ultimately produces a sequence of values to be expressed as a value rather than a statement.
Just producing a sequence of the values in the set above, sorted by the natural ordering of `int` values,
is simple using a sequence comprehension expression: `seq i: int | i in s`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it's worth having this, since users can simply write SetToSeq(s). And, once you remove this (keeping only <-), I think you don't need the notion of quantification ordering at all? I expanded on this below.

and I will label the `x: nat` and `y: nat` sections as *quantifier variable declarations*.
We can augment this existing syntax with a notion of ordering, and allow quantified variables to bind to duplicate values. The key points are:

* Any quantifier domain defines a potentially-infinite, partially-ordered set of *quantifier bindings*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you can order all types, see below

Co-authored-by: Clément Pit-Claudel <[email protected]>
@robin-aws
Copy link
Member Author

Very nicely written proposal. I'm still concerned about the notion of ordering. In particular, the way the proposal is designed makes it so that in some cases you can't write seq x <- s | … and instead must write seq x | x in s && …. But in other cases, writing seq x <- s | … is possible and means something different from seq s | x in s && …:

seq x <- {1,2,3}     // rejected
seq x | x in {1,2,3} // ⇒ [1,2,3]

seq x <- [1,1,1]     // ⇒ [1,1,1]
seq x | x in [1,1,1] // ⇒ [1]

Shoot, I've incorrectly stated in a few places that the ordering has to be deterministic in specification contexts, where I should have said when used in a compiled expression. It's the same reason that let-such-that expressions can only be compiled if they have unique solutions, so that they behave deterministically at runtime. I'll fix that and also call this out more explicitly where relevant.

Independently of that, though, I don't think this kind of restriction is unreasonable, since there's already plenty of precedent in Dafny.

I would much prefer to ban sequence comprehensions without a <- clause.

The main reason I want to allow omitting <- clauses is to at least support ordered quantification over integers, so that you can say seq i | 0 <= i < 10 :: ..., without having to explicitly actually create the [0, 1, ..., 10] value, which will be a performance hit at runtime unless we're far more clever than I'd like :). Besides the fact that it is useful in several motivating examples, that allows us to eventually replace the existing less consistent seq(10, ...) sequence construction feature. It also then provides a nice, succinct way to refer to a sequence of the values in an unordered collection in specifications. I do agree that part could be provided as a standard library function by method though.

Would you be comfortable with the incremental implementation plan I'm proposing, where the first few iterations would not support plain quantifier variables without <- clauses anyway?

@cpitclaudel
Copy link
Member

The main reason I want to allow omitting <- clauses is to at least support ordered quantification over integers, so that you can say seq i | 0 <= i < 10 :: ..., without having to explicitly actually create the [0, 1, ..., 10] value, which will be a performance hit at runtime unless we're far more clever than I'd like :)

I don't think we would have to be very clever. If we had syntax for that sequence (say 1..10, then we could recognize it easily at compile time).

Would you be comfortable with the incremental implementation plan I'm proposing, where the first few iterations would not support plain quantifier variables without <- clauses anyway?

But step 1 claims that there is a need for ordered quantification, and I don't understand that part.

Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job figuring out that the core of everything you are suggesting is the "ordered" enumeration part. A few comments there and there.

// After:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
result := [];
foreach left <- s, right <- s {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to look stubborn with pointing back at your original idea 😁 , but anyway, what about the syntax:

foreach left <- s && right <- s {
}

? Other booleans operators like || or ! would not be permitted except possibly in ghost contexts.

With this generalization, you could easily add conditions, such as:
left <- s && right <- s && left < right
or
left <- s && left % 2 == 0 && right <- s && right % 2 == 0
or
left <- s && left !in other && right <- s
I know sometimes we use the word "heuristic" to describe how to "infer" some knowledge, but here it would be plainly deterministic, straightforward and well documented.
it would even work great for integers, as soon as their range is next to their declaration:
left: int && 0 <= left < max && ...
or you could reuse the current for syntax:
left <- 0 to max

Alternatively, if you really prefer the comma, you could enable both enumeration and variable declaration at the same time.

left <- s, right <- s, left < right
or
left <- s, left % 2 == 0, right <- s, right % 2 == 0
or
left <- s, left !in other, right <- s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a quick TL;DR as I revise the proposal, your last version is closest to what I'm thinking now. Those examples would be as follows instead:

left <- s, right <- s | left < right
or
left <- s | left % 2 == 0, right <- s | right % 2 == 0
or
left <- s | left !in other, right <- s

i.e. I'm keeping the hard distinction of x <- c not being a boolean operator, but a syntax attached directly to declaring the quantified variable. But each quantified variable can have an optional range attached with |. I realized that's still backwards compatible, because existing x1, x2, ..., xN | <Range> quantifier domains will just be reinterpreted as having a range attached to the final xN variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note: I like the left <- 0 to max syntax, if we would ever have a use for it. In fact, I like it better than left <- 0 .. max. If we were to introduce this at some point, then left <- max downto 0 is also a natural possibility, since for for loop supports similar syntax.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's nice that it wouldn't clash with the existing s[i..j] syntax, and that there is already precedent for how to handle the reverse order.

If we do that, is it really worth continuing to support both for x := 0 to 10 and foreach x <- 0 to 10?

the bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will
have such a natural ordering defined, but at a minimum `int` and `real`, and any subset type or newtype based on them,
will use the natural mathematical ordering.
`x: int | x in {2, 5, -4}`, for example, would bind `x` to `-4`, `2`, and `5` in that order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal makes sense, but it has caveats.
x: nat says that x is in [0, 1, 2, 3, 4, ....] and then the range says how to filter this infinite range, so it's natural to think that the result be sorted if the underlying type "int" is sorted. In practice, we would compile to something different: We would infer what to enumerate x from (the range), and sort it. But this is a hidden performance cost based on heuristics.
We might as well not accept this "implicit sorting" for sequences, and immediately suggest that the enumeration be part of the x: nat syntax, and enable non-deterministic set orderings.


```dafny
var myDoubleDeckerSeq: seq<seq<int>> := ...;
foreach x <- myDoubleDeckerSeq, y <- x | y != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you want to filter on x before you extract y from x? What would be the syntax like? Can you write down a proposal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See latest commit

```dafny
// Filtering to optional values
var s: seq<Option<T>> := ...;
var filtered: seq<T> := seq o <- s | o.Some? :: o.value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me that I would love to be able to create a comprehension that would create an Option<T> by "enumerating other options", like Scala enables. But for now, we have the elephant operator, which It think is strictly better.
Yet, I love your proposal of collect at the end of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, considering failure-compatible types to be collection values compatible with <- is another future possibility. But I realized that supporting pattern matching makes that unnecessary because you can just write Some(x) <- optionals instead.

Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual,
but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature.
Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as
`seq i | 0 <= i < size :: f(i)`. They also avoid the common trap of forgetting to explicitly attach the `requires 0 <= i < size`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this remark about the requires.

programmers how their code will behave at runtime, and hence these new features come with
the same downside. Dafny is free to implement an expression like `seq x <- a | x in b` by either enumerating the values
of `a` and skipping those not in `b`, or by sorting the elements of `b` according to the ordering in `a`. This flexibility
is excellent for future optimizations, but makes it harder to reason about runtime performance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 about reasoning about runtime performance.
I really read x <- a | x in b as "enumerate values in a, and skip those not in b". Sorting the elements of b according to the ordering in a and skipping the elements of a should be something explicit, such as x <- sorted(b, a) | x in a.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree.

no additional filtering, and with `C` directly providing the enumeration and not just the ordering of values.
This is certainly a well-formed and sound construct, but far less powerful than the proposed version, especially
when sequence comprehensions are excluded. Offering `foreach` loops along with sequence comprehensions means we can
recommend the latter where possible, with the former as a fallback that is inevitably more complicated to verify.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please discuss my alternative at least here?
The one that says "add '<-' as a synonym of 'in' in ranges, but such that they would be used for specifying the enumeration (i.e. no more conflicting heuristics needed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! I might as well summarize the best version of the first RFC here. :) I'm also adding Clement's version where <- is required everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACTUALLY @cpitclaudel has given me the kernel of an idea that allows us to chain filtering as you are asking for, while preserving the purity of boolean expressions, and still turns out to be backwards compatible with existing quantifier domains! I am tweaking the proposal now and will push a revision to see if you like it as much as I do. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got me very curious now.


Along similar lines, will users frequently want to customize the ordering of their quantifications without
creating explicit collection values? To produce the sequence `[4, 3, 2, 1, 0]`, for example,
we could consider supporting something like `seq i by > | 0 <= i < 5`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sequences are both increasing and decreasing in everyday's usage, so having a way to support both would be better.
Alternatively to your suggestion:

  • seq i: int | 0 <= i < 5 :: 4 - i would not require any change
  • seq i: int | 5 > i >= 4 :: i might seem a bit WTF.
  • seq i: int | i <- 4 downto 0 :: i to constrain the notion of ordering not in the sequence comprehension, but in the original iterable sequence.

Copy link
Member Author

@robin-aws robin-aws May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seq i: int | 0 <= i < 5 :: 4 - i would not require any change

Yup, that's a nice application of the generic comprehension syntax!

seq i: int | 5 > i >= 4 :: i might seem a bit WTF.

More than just a bit WTF methinks :D That's another case where I don't think we should mess with deeply established semantics of boolean operators. Plus I know @RustanLeino prefers the idiom of always using < and <= so values are ordered naturally left-to-right in expressions, and I tend to like that too.

seq i: int | i <- 4 downto 0 :: i

THAT'S not a bad idea, since there's precedent in for i := 4 downto 0 { .. } loops...


Since sequence comprehensions are expressions rather than statements, they carry an additional restriction
when compiled: their ordering must be fully deterministic. If `s` is a `set<int>`, for example,
`seq x <- s` would be rejected in compiled code, whereas `seq x | x in s` would be allowed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I would buy into that. Deterministic means that the result will always be the same if the same set is given. The compilers should allow traversing a set in a deterministic order if they are equal anyway, even if it just means comparing references of non-comparable types.
So seq x <- s could be accepted in compiled code.

I suspect that, most of the time, however, users will not want to convert sets to sequences, but always use sequences, and use set in ghost contexts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I would buy into that. Deterministic means that the result will always be the same if the same set is given. The compilers should allow traversing a set in a deterministic order if they are equal anyway, even if it just means comparing references of non-comparable types. So seq x <- s could be accepted in compiled code.

This can be done in some cases but definitely makes the runtime more complicated: most HashSet implementations might give you a different iteration order for {1, 2, 3} vs. {3, 2, 1}. I tried to explain how that could be addressed in many cases with a more specialized implementation in this section in the original RFC, but didn't go back to clean that up since I've dropped the requirement in this version. I can take another stab at it here if it would help though.

I suspect that, most of the time, however, users will not want to convert sets to sequences, but always use sequences, and use set in ghost contexts.

Well now that it's public (🎉!), allow me to point to the natively-implemented ComputeSetToOrderedSequence method that the ESDK uses in several places to deterministically serialize the keys of a map!

collect(*) x: T | P(x) :: Q(x) == (product)
collect(<) x: T | P(x) :: Q(x) == (minimum)
collect(Averager) x: T | P(x) :: Q(x) == (average)
collect(First(n)) x: T | P(x) :: Q(x) == Take(seq x: T | P(x) :: Q(x), n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this blows my mind I love it !

1. `x [: T]`

In the existing syntax for quantifier variables (where the type `T` may be explicitly provided or omitted and inferred),
the bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will
Copy link
Member

@atomb atomb May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My immediate reaction to the difference between in and <- is to think of in as more "mathematical" and <- as more "implementation-focused" (given how those symbols have been used in other contexts before). Because of that, it seems more natural to me to have <- impose more ordering, and in impose less. What if in produced elements in a non-deterministic order, with no duplicates, and <- always used a collection-defined ordering with the potential for duplicates? That would eliminate the need for "natural ordering" altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at the latest commit, I went with almost exactly that option. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉


This mirrors the `Collector` concept from the Java streaming APIs, and ideally the shape of the
parameter to `collect` expressions would define similar operations for merging intermediate results
to leave the door open for parallel or even distributed computation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would collect require a "zero" or "identity" element? Most such formalisms do, I think (such as Monoid in Haskell). In Dafny, we could avoid it by requiring T to be non-empty, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that thought too. Formalizing the parameter to collect definitely needs a lot more thought and I didn't attempt to make this too concrete yet - it might work better to parameterize with a type parameter that implements a particular "type characteristic".

@robin-aws
Copy link
Member Author

The main reason I want to allow omitting <- clauses is to at least support ordered quantification over integers, so that you can say seq i | 0 <= i < 10 :: ..., without having to explicitly actually create the [0, 1, ..., 10] value, which will be a performance hit at runtime unless we're far more clever than I'd like :)

I don't think we would have to be very clever. If we had syntax for that sequence (say 1..10, then we could recognize it easily at compile time).

This is a fair point (and I'd considered the option of some kind of interval sequence syntax in earlier versions too). I decided I'm fine with replacing sequence constructions being out of scope for now, and requiring <- in ordered quantification.

Would you be comfortable with the incremental implementation plan I'm proposing, where the first few iterations would not support plain quantifier variables without <- clauses anyway?

But step 1 claims that there is a need for ordered quantification, and I don't understand that part.

In the latest version I've dropped the idea of a natural ordering for types, but still kept the idea of ordered quantification. Let me know if it resolves your concerns!


2. `x [: T] <- C`
Assuming that we have `posNat == iset n: nat | 0 <= n`, the set comprehension above could also then be expressed as:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't subset types just iset? :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! In general the quantification x: T is equivalent to x <- (iset x: T). Since Dafny supports potentially infinite collections, types and collections are very similar. But collections are strictly more expressive since we don't have a fully dependent type system; you can't express "sequences of length n" as a subset type if n is a dynamic value, but you can declare a dynamic, local iset collection for that. Part of the reason I was fine with dropping the notion of default orderings of types is that we could still support infinite ordered collections in the future, and hence achieve the same thing with x <- (collection of all T's ordered by >).

This may not be the strongest possible example to motivate quantifier variable domains, since it could be done with a subset type instead, and in fact I'd prefer that style in this case. A more complicated example where the domain depended on dynamic values and couldn't be a subset type would be better.

@@ -360,16 +383,10 @@ assert (seq x <- b | x in a) == [3, 1];

Since sequence comprehensions are expressions rather than statements, they carry an additional restriction
when compiled: their ordering must be fully deterministic. If `s` is a `set<int>`, for example,
`seq x <- s` would be rejected in compiled code, whereas `seq x | x in s` would be allowed.
`seq x <- s` would be rejected in compiled code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`seq x <- s` would be rejected in compiled code.
`seq x <- s` would be rejected in compiled code if the compiler does not have a way to enumerate `s` in a deterministic way.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point, I had a lot more detail on this in the first RFC that is still valid. Let me pull some of it in here.


1. Membership: `forall x :: P(x) <==> x in S`
1. Membership: `forall x :: x in C && P(x) ==> Q(x) in S`
2. Cardinality: `|S| <= |C|`
3. Ordering: `forall i, j | 0 <= i < j < |S| ::
exists i', j' | 0 <= i' < j' < |C| ::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you strengthen this by adding

Suggested change
exists i', j' | 0 <= i' < j' < |C| ::
exists i', j' | 0 <= i' < j' < |C| && i' <= i && j' <= j ::

2. Cardinality: `|S| <= |C|`
3. Ordering: `forall i, j | 0 <= i < j < |S| ::
exists i', j' | 0 <= i' < j' < |C| ::
C[i'] == S[i] && C[j'] == S[j]`
Q(C[i']) == S[i] && Q(C[j']) == S[j]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you have a new construct that says:

countIf(x <- C, P(x))

that is always between |0| and |C|, that increases with C,
that counts the number of P(x) that hold for all the x in C ?
That way, you wouldn't need an existential inside the forall:

Cardinality: `|S| == countIf(x <- C, Q(x))`
Ordering: 
  forall i <
  Q(C[i']) == S[i]
forall i | 0 <= i < |S| :: 
  var i' := countIf(x <- C[0..i], P(x));
  Q(C[i']) == S[i]

Your axioms could be deduced from that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly! I only wanted to propose an encoding that seemed feasible enough, I don't claim to have found the optimal encoding yet nor am I trying to nail that before starting on implementation.

## Range expressions

This would interpret the existing `i..j` syntax,
currently only usable in sequence or array indexing expressions such as `s[i..j]`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you envision it would be possible to say something like this:

var x := i..j;
s[x]

and obtain a slice of s?
I personally was thinking that i..j would be available only in [i..j] or after an àrrow <- i..j.
I do love this syntax, but what about reverse iteration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was thinking it would be possible, just so we're not introducing a new type of value or special-casing a lot. I also like the syntax but have no good suggestion for the reverse order, and that's one reason this idea is safely quarantined in Future Possibilities for now. :)

Copy link
Collaborator

@RustanLeino RustanLeino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

The one thing I'm not clear about is what happens with sequence comprehensions when the s in x <- s is a set. Is such a sequence comprehension allowed only in ghost contexts? Or is there something that, at run time, will figure out a deterministic ordering?

More generally, what types of e are allowed in x <- e?

I think the current proposal does not give a way to deterministically step through the elements of a set. Is that right? (I'm not saying this is necessary. Just checking my understanding.)

```dafny
// Before:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
for i := 0 to |s| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another whoops: missing initialization of result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To which I say "whoops" :)

I legitimately toyed with figuring out a way for this repo to test code snippets as the main dafny repo is.

var ss := s;
while ss != {}
{
var i: int :| i in ss;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The : int can be omitted, since it will be inferred.

// After:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
result := [];
foreach left <- s, right <- s {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note: I like the left <- 0 to max syntax, if we would ever have a use for it. In fact, I like it better than left <- 0 .. max. If we were to introduce this at some point, then left <- max downto 0 is also a natural possibility, since for for loop supports similar syntax.

Comment on lines +138 to +139
If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`;
`map.Items` can be used instead to bind key-value pairs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to keep the two consistent. But I have also thought that x in m ought to mean x in m.Items, rather than x in m.Keys. Maybe we'll change both in m and <- m in the future.

If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`;
`map.Items` can be used instead to bind key-value pairs.

Assuming that we have `posNat == iset n: nat | 0 <= n`, the set comprehension above could also then be expressed as:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the mathematics involved in this example, I think you intended 0 < n.

Comment on lines +568 to +569
This would allow destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`.
It could also support filtering, as in `Some(x) <- mySetOfOptionals`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this and I think we should do it (after the more basic functionality has been implemented and loved). Just a note: it will complicate the lookahead disambiguation that will need to be written for Coco (see my comments regarding the "," above).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like this, too.

Comment on lines +591 to +592
the necessary interface. This is also likely the best way to provide a better user experience
for looping or quantifying over the values produced by an `iterator` type.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

## Range expressions

This would interpret the existing `i..j` syntax,
currently only usable in sequence or array indexing expressions such as `s[i..j]`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless range expressions would be allowed everywhere and have their own type (which I resist at the moment), then ranges expressions would be allowed only to the right of <-. If so, I like the syntax @MikaelMayer suggested above, namely x <- i to j, which looks like in a for loop.

Comment on lines +631 to +633
There would have similar semantics to sequence comprehensions,
where the multiplicity of results is significant but not the ordering.
This feature would be very cheap to implement once sequence comprehensions are implemented.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

Comment on lines +416 to +417
The translation of `foreach` loops should be reducible to sequence comprehensions
and other existing features for loops in general,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequence comprehensions are more difficult to formalize for the verifier than foreach loops, so I don't suggest formalizing foreach in terms of sequence comprehensions. I'm thinking that the verifier would handle foreach loops like this:

  • Check the loop invariant initially (as usual for a loop).
  • Inside the loop, play havoc with the loop assignment targets (as usual for a loop) and the bound variables (as is also done in the for loop). Then, assume the loop invariant and assume the condition that you get by replacing every <- with an in in the foreach loop header. Then, check the body and check that the body maintains the loop invariant.
  • After the loop, assume the loop invariant (as usual for a loop). Here's what I don't know what to do with: one also needs to assume some condition that's analogous to "the negation of the loop guard". I don't know what that condition is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought translating to sequence comprehensions was appealing because it at least solves the last point: the invariants reduce to those of for loops, so the loop guard is just an expression of completing the enumeration. I may still try this encoding first, especially since I plan to have sequence comprehensions committed and stable before moving to foreach loops.

Copy link
Member

@keyboardDrummer keyboardDrummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent writing and exciting proposal!

}
```

The feature supports much more sophisticated uses as well, however,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wary of adding syntax for features whose use-cases are also covered by multiple foreach loops and higher-order method calls.

I think it's important that if we add support for the 'sophisticated uses' syntax, that it's exactly the same syntax as used in comprehensions so it doesn't require any additional documentation or implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the big motivating styles: https://github.com/dafny-lang/rfcs/pull/10/files#diff-e3e01b715e6d0ac6dce19064a36b244c495633e9760a43233eaf22cdb18849f8R376 Supporting multiple quantified variables in one expression is necessary to support that flattening pattern, as you get a different value if you use nested comprehensions instead.

Totally agree that keeping the syntax and semantics consistent is very important. Hence the strategy of extending the existing quantification syntax slightly in a way that makes sense everywhere, but then also enables the new features.

Copy link
Member

@keyboardDrummer keyboardDrummer May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was commenting only on providing sophisticated syntax for the foreach statement. For comprehensions I think it opens up new use-cases so that syntax is warranted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For foreach I think the stronger argument is consistency (given all other uses of quantification allow multiple quantified variables, and it won't be very hard to implement).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it would be so inconsistent not to do it, that doing it reduces the documentation and implementation we need, then of course let's do it!!!

```

Since sequence comprehensions are expressions rather than statements, they carry an additional restriction
when compiled: their ordering must be fully deterministic. If `s` is a `set<int>`, for example,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the ordering must only be deterministic when compiled? Wouldn't it be difficult to ensure (seq x <- someSet) == (seq x <- someSet) at the verification level as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is easy to ensure at the verification level. In fact, it difficult NOT to ensure it, because expressions in math are deterministic. This is the reason that the compiler also must make sure to do this consistently.

Copy link
Member Author

@robin-aws robin-aws May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is SUCH a common point of confusion, it needs to be taught and documented more prominently. We already have this problem with :| and it surprises even advanced Dafny users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, I wonder whether it wouldn't be better to disallow seq x <- someSet in all contexts. What would be the use-case of doing this in a ghost context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely need some efficient way of accessing the values in a set in sequence at runtime, but it might be enough to only support foreach. Supporting collections other than sequences as quantified variable domains is not until step 4 of the implementation plan, so I'm definitely happy to hold off until we feel there's a need for it.


This RFC proposes adding two closely-related new features to Dafny for working with collections and iteration:

1. `foreach` loops over any enumerable domain, initially only supporting the builtin collection types but leaving the door open for user-defined collections.
Copy link
Member

@keyboardDrummer keyboardDrummer May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered introducing the concept of an iterator/enumerator type prior to introducing a foreach? The iterator/enumerator would be similar to that in Java/C#, and you could use it to do something similar to a foreach:

method ForEach<T>(hasEnumerator: HasEnumerator<T>)
{
   var enum: BoundedEnumerator<T> := hasEnumerator.GetEnumerator();
   while(enum.HasNext()) 
      decreases enum.MaxLeft() {
      var iterationVariable := enum.MoveNext();
      // do something useful
   }
}

trait HasEnumerator<T> {
  method GetEnumerator() returns (result: BoundedEnumerator<T>)
}

trait BoundedEnumerator<T> {
   function MaxLeft(): int
   function method HasNext(): bool
   method MoveNext() returns (value: T)
      requires HasNext()
      ensures MaxLeft() < old(MaxLeft())
}

Building the foreach concept on top of such an iterator would mean we immediately enable user-defined collections to work with foreach. More importantly, starting out with simple fundamentals and building higher-level language constructs on top of those might mean we avoid having to backtrack in our implementation in the future.

Can you provide a sketch for how you plan to support foreach for user-defined collections?


Similarly, for supporting user-defined collections to be used in the quantified variable domain of a sequence comprehension, I expect we would need to define a trait Ordered that can be used like this:

function SequenceComprehension<T>(domain: Ordered<T>): seq<T> {
  domain.Fold([], (value: T, result: seq<T>) => [value] + result)
}

trait Ordered<T> {
  ghost var values: seq<T>

  function method Fold<Result>(initial: Result, combine: (T, Result) -> Result): Result
    ensures Fold(initial, combine) == FoldSpec(values, initial, combine)

  function FoldSpec<Result>(values: seq<T>, initial: Result, combine: (T, Result) -> Result): Result {
    if |values| == 0 then initial else FoldSpec(values[1..], combine(values[0], initial), combine)
  }
}

And for set comprehensions we'd need a trait Unordered:

function SetComprehension<T>(domain: Unordered<T>): set<T> 
   reads *
{
  var sets := domain.Select<set<T>>(x => {x});
  var combine := (a,b) => a + b;
  assert forall x, y | x in sets.values && y in sets.values :: combine(x,y) == combine(y,x); 
  sets.Aggregate({}, combine)
}

trait Unordered<T(==)> {
  ghost var values: multiset<T>;

  function method Select<U>(f: T -> U): Unordered<U>

  function method Aggregate(zero: T, aggregate: (T, T) -> T): T
   reads this
   requires forall x, y | x in values && y in values :: aggregate(x,y) == aggregate(y,x)
   ensures Aggregate(zero, aggregate) == AggregateSpec(values, zero, aggregate)

  function AggregateSpec(values: multiset<T>, zero: T, aggregate: (T, T) -> T): T
   requires forall x, y | x in values && y in values :: aggregate(x,y) == aggregate(y,x) {
     if |values| == 0 then 
       zero
     else 
     (var x: T :| x in values; AggregateSpec(values - multiset{x}, aggregate(x, zero), aggregate))
   }
}

function FailedSeqComprehension<T>(domain: Unordered<T>): seq<T> 
   reads *
{
  var seqs := domain.Select<seq<T>>(x => [x]);
  var combine := (a,b) => a + b;
  //. Error: assertion might not hold
  assert forall x, y | x in seqs.values && y in seqs.values :: combine(x,y) == combine(y,x); 
  seqs.Aggregate([], combine)
}

Copy link
Member Author

@robin-aws robin-aws May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered introducing the concept of an iterator/enumerator type prior to introducing a foreach?

Oh absolutely, and dafny-lang/libraries#37 has a snapshot of that work to date. It looks similar to what you sketched out with an Enumerator<T> trait, and I'd expect an Enumerable<T> trait with a GetEnumerator() method. That would be the bare minimum to allow more values to be used as quantified variable domains.

There are a number of factors that make that approach much harder to implement and less beneficial to users though, at least in the current state of Dafny:

  1. Your // do something useful comment in ForEach skips over a big gap. :) Lambda expressions in Dafny can't have any side-effects, so you can't make a call like ForEach(myCollection, x => { print x; } work. That means you have to declare a general-purpose "Action" trait and an explicit class for every use case, e.g. https://github.com/aws/aws-encryption-sdk-dafny/blob/c8be143088490301d486958ad126d53c46ec1311/src/AwsCryptographicMaterialProviders/Keyrings/AwsKms/AwsKmsKeyring.dfy#L599-L600. It's a massive pain and an order of magnitude harder to use than a built in foreach construct.
  2. I found that attaching accurate enough termination metrics and invariants to the various Enumerator<T> implementations required creating a combinatorial explosion of concrete classes. I'm not against doing that in the standard library if necessary (and possibly leaning on code generation to make it easier/more maintainable), but I'd like to explore solving the more fundamental limitation first.
  3. Consuming such enumerator types using a particular idiom of while loops (e.g. https://github.com/dafny-lang/libraries/pull/37/files#diff-1927bc3853704ac5f502a37509d998e3ac29f7fbe8c4e8c479810b6a02ca4c3bR15-R28) still takes a lot of boilerplate, and requires you to then prove your enumerator's Repr is disjoint from any other objects in your code.
  4. Dafny also doesn't support having "default" implementations of methods in a trait that can be overridden in classes. That means we can't decide the Enumerator<T> interface should really have a Contains(x) method (for example) in the future without breaking existing code. Supporting default implementations was necessary in Java 8 so they could add the streams API to existing interfaces like Collection without breaking existing implementations.
  5. Modelling the enumeration constraints as a trait is also limiting, since only reference types can currently implement them. The alternative is introducing yet more type characteristics similar to "failure-compatible" (and more complicated ones at that), and I'd rather avoid that and wait until Non-reference traits dafny#1226 is addressed.

The appealing thing about the features in this RFC is that they fully abstract away from the details of enumerating at runtime. I absolutely plan for us to support user-defined collections in the future, but supporting ordered quantification over just the built-in collection types provides a ton of value while keeping the door open for the future. I've tried very hard to keep the semantics as general and flexible as possible, to reduce the risk you point out of finding that they interfere with how we'd like to define lower-level features that can implement them, but please let me know if you're concerned about anything specific.

Note that the aggregation/Fold functionality you also sketch out is out of scope for now. I suspect it will also work out better as a more built in concept as the "collect" future possibility sketches out, for a lot of the same reasons - aggregation is essentially the dual of enumeration!

Copy link
Member

@keyboardDrummer keyboardDrummer May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but how do you plan to support foreach for user-defined collections?

If that means introducing Enumerable<T> and rebuilding foreach on top of it as syntax sugar for while based approach, wouldn't that mean we lose some of the effort that we spend now on a first-class foreach implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial foreach implementation will translate pretty directly to the existing BoundedPool / SinglePassCompiler.Create[Guarded]ForeachLoop/CompileGuardedLoops implementations already used to compile features such as set comprehensions and assign-such-that statements, so the implementation cost beyond the language changes is actually quite cheap.

If we end up replacing that code with a more pure-Dafny implementation of builtin collections in the future, very little effort will be wasted. And it's likely sequence comprehensions will be useful in in the specification of such code!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good :-)

Copy link
Member Author

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

The one thing I'm not clear about is what happens with sequence comprehensions when the s in x <- s is a set. Is such a sequence comprehension allowed only in ghost contexts? Or is there something that, at run time, will figure out a deterministic ordering?

Only allowed in ghost contexts, since the current runtimes can't provide a deterministic ordering. I added a bit of content here about that point: https://github.com/dafny-lang/rfcs/pull/10/files#diff-e3e01b715e6d0ac6dce19064a36b244c495633e9760a43233eaf22cdb18849f8R391

More generally, what types of e are allowed in x <- e?

Any "collection" type, or alternatively any value for which the expression x in e is valid.

I think the current proposal does not give a way to deterministically step through the elements of a set. Is that right? (I'm not saying this is necessary. Just checking my understanding.)

That's correct. There are a number of options for providing that feature outside the scope of this RFC:

  1. A standard library function by method can use seq x <- s in the method implementation and sort the result.
  2. The runtime representation of sets could be changed to guarantee a deterministic ordering (not necessarily sorted, but purely determined by the contents).
  3. As described in one of the future possibilities, we could add support for an infinite ordered collection type, so that an expression like seq x <- intsInIncreasingOrder | x in s would then express "the contents of s sorted", and would be implemented at runtime by building a sorted data structure with the contents of s and then enumerating that.

```dafny
// Before:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
for i := 0 to |s| {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To which I say "whoops" :)

I legitimately toyed with figuring out a way for this repo to test code snippets as the main dafny repo is.

// After:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
result := [];
foreach left <- s, right <- s {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's nice that it wouldn't clash with the existing s[i..j] syntax, and that there is already precedent for how to handle the reverse order.

If we do that, is it really worth continuing to support both for x := 0 to 10 and foreach x <- 0 to 10?


2. `x [: T] <- C`
Assuming that we have `posNat == iset n: nat | 0 <= n`, the set comprehension above could also then be expressed as:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! In general the quantification x: T is equivalent to x <- (iset x: T). Since Dafny supports potentially infinite collections, types and collections are very similar. But collections are strictly more expressive since we don't have a fully dependent type system; you can't express "sequences of length n" as a subset type if n is a dynamic value, but you can declare a dynamic, local iset collection for that. Part of the reason I was fine with dropping the notion of default orderings of types is that we could still support infinite ordered collections in the future, and hence achieve the same thing with x <- (collection of all T's ordered by >).

This may not be the strongest possible example to motivate quantifier variable domains, since it could be done with a subset type instead, and in fact I'd prefer that style in this case. A more complicated example where the domain depended on dynamic values and couldn't be a subset type would be better.

Comment on lines 151 to 156
quantified variable domains also specify the ordering of quantification bindings, and allow variables to be bound to duplicate values.
This ordering of bindings is non-deterministic unless `C` is a sequence.
If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic.
Note that ordering and multiplicity is irrelevant for all current uses:
the semantics of `[i]set` and `[i]map` comprehensions, `forall` and `exists` expressions,
and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like that version better - I've massaged it slightly and used it.

Comment on lines +213 to +214
This means that the parser will accept attributes in-between variables,
but these can be easily rejected by the resolver instead.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three technical points:

  • If the resolver is what rejects the in-between attributes, then the AST still needs a place to hang them. Better seems to be for the parser to reject them. More precisely, the parser would first parse in-between attributes, but before when it has gathered all the variables and "from" expressions and is about to create the AST node, then the parser would generate errors for any in-between attributes it may have parsed along the way.

I came to the same conclusion. I may still add a new QuantifiedVar type for the AST in the first PR so that we can have better resolution errors without losing source tokens, but I'm going to try to avoid it.

  • Don't worry about the specific attribute {:heapQuantifier}, since I don't think it's supported any more. We should remove it.

Sure, I was only looking for an example of a quantification attribute. As long as we still support at least one I just want to make sure we can still parse them.

  • Note that the "," causes an ambiguity in the grammar. To resolve it, the lookahead mechanism needs to determine if what follows the "," is another QuantifierVarDecl or not.

Ah you spotted what I only figured out after making the parsing changes. :) I did figure out that it was helpful to look ahead for ", <Identifier>" at least. But note that this is still not fully backwards compatible! The statement:

print set x: int | 0 <= x < 10, y;

Is currently parsed as:

print (set x: int | 0 <= x < 10), (y);

But will then be parsed as:

print (set x: int | 0 <= x < 10, y);

And produce an error, since it is then a comprehension with more than one variable but no term. Our plan is to accept this minor breaking change with a helpful error message like "if y is not part of the comprehension, try adding parentheses or a term expression before it". We could add an option similar to /functionSyntax to avoid breaking people before the next major version.

2. Cardinality: `|S| <= |C|`
3. Ordering: `forall i, j | 0 <= i < j < |S| ::
exists i', j' | 0 <= i' < j' < |C| ::
C[i'] == S[i] && C[j'] == S[j]`
Q(C[i']) == S[i] && Q(C[j']) == S[j]`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly! I only wanted to propose an encoding that seemed feasible enough, I don't claim to have found the optimal encoding yet nor am I trying to nail that before starting on implementation.

Comment on lines +533 to +534
including Scala, Cryptol, and Haskell. Using something other than `in` is recommended to avoid confusion with the `in` boolean operator,
but I don't have a strong opinion about which symbol or keyword we use.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from reads quite nicely too, but <- seems to fit better with the mathematical notation leanings of Dafny IMO. This is something we can bikeshed more effectively on in the first PR when looking at lots of code examples, I think.

Comment on lines 578 to 580
match o {
case Some(x) => print x;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another whoops because my markdown renderer is not verifying my Dafny code. :P Thanks.

Comment on lines +416 to +417
The translation of `foreach` loops should be reducible to sequence comprehensions
and other existing features for loops in general,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought translating to sequence comprehensions was appealing because it at least solves the last point: the invariants reduce to those of for loops, so the loop guard is just an expression of completing the enumeration. I may still try this encoding first, especially since I plan to have sequence comprehensions committed and stable before moving to foreach loops.


```dafny
ghost var xs := [];
foreach x <- s {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See top level response

robin-aws added a commit to dafny-lang/dafny that referenced this pull request Jun 10, 2022
Resolves #2187. First phase of implementing dafny-lang/rfcs#10.

This change adds two new features on several uses of quantified variables in the language:

1. Quantified variable *domains*, specified with the syntax `x <- C`, where C is an expression with a collection type (i.e. set, multiset, sequence, or map). Besides offering a more compact syntax for a common pattern (i.e. replacing `myLovelyButLongVariableName | myLovelyButLongVariableName in someSet` with `myLovelyButLongVariableName <- someSet`), this is forwards-compatible with the notion of *ordered quantification*, which the two features designed in the linked RFC depend on, and potentially others in the future.

2. Quantified variable *ranges*, specified with the syntax `x | E`, where E is a boolean expression. These replace the existing concept of a single `| E` range after all quantified variables (which is now bound to the last declared variable, with equivalent semantics). These are helpful for restricting the values of one quantified variable so that it can be used in the domain expression of another variable.

These features apply to the quantifier domains used in `forall` and `exists` expressions, `set` and `map` comprehensions, and `forall` statements, and the parsing for quantifier domains is now shared between these features. As a small consequence, set comprehensions now no longer require range expressions, and will default to `| true`.

This new syntax is not fully backwards-compatible: the statement `print set x | 0 <= x < 10, y`, for example, was previously parsed as `print (set x | 0 <= x < 10), (y)` but will now be parsed as `print (set x | 0 <= x < 10, y)` and rejected. The `/quantifierSyntax` option is used to help migrate this otherwise breaking change:

* `/quantifierSyntax:3` keeps the old behaviour where a `| <Range>` always terminates the list of quantified variables.
* `/quantifierSyntax:4` instead attempts to parse additional quantified variables.

Like `/functionSyntax`, this option will default to `4` instead in Dafny 4.0.

This is implemented with pure desugaring for now: the `QuantifiedDomain` production parses a list of `QuantifiedVar` objects, but then immediately rewrites a snippet such as `x <- C | P(x), y <- D(x) | Q(x, y)` into the equivalent (for now) `x | x in C && P(x) && y in D(x) && Q(x, y)`. Token wrappers are used to ensure error messages can still refer to the original syntax. This will not be equivalent once features that depend on the ordering of quantification are added, though, so a subsequent change will add the new fields to `BoundVar` instead and plumb these extra components through the whole Dafny pipeline.

On testing: I duplicated several existing test files and modified them to use the new syntax features, to get good white-box test coverage even though the implementation is very simple for now.

Also resolves #2038 (and the same issue for Java) since I hit it when converting test cases, and it was dead easy to fix.
@davidcok
Copy link

Coming to this late, and I understand the useful generalizations, but now the very intuitive simple case seq(n, F) is what? seq j: j < n :: F(j)? If so, to me, that is not a syntactical improvement. seqs are naturally indexed by nat up to a limit. The generalizations are great in that one need not make that indexing explicit (e.g., in making a seq from a set), but in the common case where there is a natural indexing, as when there is a function from nat -> T, that just needs a limit, the indexing ought to be implicit in the syntax.

@robin-aws
Copy link
Member Author

Coming to this late, and I understand the useful generalizations, but now the very intuitive simple case seq(n, F) is what? seq j: j < n :: F(j)? If so, to me, that is not a syntactical improvement. seqs are naturally indexed by nat up to a limit. The generalizations are great in that one need not make that indexing explicit (e.g., in making a seq from a set), but in the common case where there is a natural indexing, as when there is a function from nat -> T, that just needs a limit, the indexing ought to be implicit in the syntax.

Since we decided to require a quantification domain for sequence comprehensions (i.e. seq x <- C :: f(x) is allowed but not seq x | P(x) :: f(x)), we still need seq(n, F) sequence constructions too for now.

We have a strong inclination to support seq i <- 0 to n :: F(i) after further discussion, but I'm not planning to include that in the initial version as we haven't fully designed that yet. I would argue that the natural indexing is very frequently indexing into a sequence anyway, so the initial version will cover a lot of cases.

@MikaelMayer
Copy link
Member

Coming to this late, and I understand the useful generalizations, but now the very intuitive simple case seq(n, F) is what? seq j: j < n :: F(j)? If so, to me, that is not a syntactical improvement. seqs are naturally indexed by nat up to a limit. The generalizations are great in that one need not make that indexing explicit (e.g., in making a seq from a set), but in the common case where there is a natural indexing, as when there is a function from nat -> T, that just needs a limit, the indexing ought to be implicit in the syntax.

After @robin-aws 's improvement, we could always define the previously built-in constructor as a simple library helper:

function seq<T>(n: nat, f: nat ~> T): seq<T>
  requires forall i | 0 <= i < n :: f.requires(i)
{
  seq i | 0 <= i < n :: f(i)
}

@robin-aws
Copy link
Member Author

After @robin-aws 's improvement, we could always define the previously built-in constructor as a simple library helper:

function seq<T>(n: nat, f: nat ~> T): seq<T>
  requires forall i | 0 <= i < n :: f.requires(i)
{
  seq i | 0 <= i < n :: f(i)
}

Unfortunately not, since we decided to require a domain to define the ordering in sequence comprehensions. It would have to be this (adding a necessary reads clause, and changing seq to Seq so that it actually parses :)

function Seq<T>(n: nat, f: nat ~> T): seq<T>
  requires forall i | 0 <= i < n :: f.requires(i)
  reads set o, i | 0 <= i < n && o in f.reads(i) :: o
{
  seq(n, f)
}

But if we allowed lo to hi to be a domain, either only in that syntax or as another form of sequence display (which I personally like, and could easily implement quickly and efficiently after dafny-lang/dafny#2390), then we could define it like this:

function Seq<T>(n: nat, f: nat ~> T): seq<T>
  requires forall i | 0 <= i < n :: f.requires(i)
  reads set o, i | 0 <= i < n && o in f.reads(i) :: o
{
  seq i <- 0 to n :: f(i)
}

@davidcok
Copy link

I would vote for keeping the seq(n,F) syntax, and defining it as a sugar of what you write in the previous comment. I expect it will be the commonest use.

And in this context seq i <- 0 to n :: f(i) is (almost) always going to start with 0 . Isn't seq i <- k to n :: f(i) just seq i <- 0 to n-k :: f(i+k) ? And I expect it will be confusing as some users will think it is the k .. n elements of the sequence that are being set. [So maybe even seq i <- to n :: f(i) :-)]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants