Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern Matching #17

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Pattern Matching #17

wants to merge 12 commits into from

Conversation

rattrayalex
Copy link

@rattrayalex rattrayalex commented Apr 25, 2017

Won't merge until the compiler implementation is done (haven't started it yet).

Example:

x = match z:
  | < 2: "less than two"
  | > 3 and < 10: "between three and ten"
  | 2: "equal to two"
  | 3 or 10: "equal to three or ten"
  | "hello" or "goodbye": "its a greeting"
  | instanceof MyClass: "its a MyClass instance"
  | ~isFunction() and instanceof OtherClass: z()
  | ~isFunction() with f: f()
  | ~isObject() and .x and .y with { x, y }: x + y
  | ~isArray() and [0] with [ first, ...rest ]: [...rest, first]
  | /f([oO]+)/ or /b([aA]+)(r)+/ with [ fullRegMatch, ...parts ]: match parts.length:
    | 0: "no paren groups matched (not possible here)"
    | 1: `found an o: ${parts.0}`
    | 2: `found an a and r: ${parts.0} ${parts.1}`
  | else: "none of the above"

This also should be preceded by outlawing of naked decimals (enforcing 0.123 instead of .123) and removing all bitwise operators from ordinary use.

@rattrayalex rattrayalex force-pushed the match branch 4 times, most recently from b515769 to ffa53ce Compare June 1, 2017 20:59
rattrayalex referenced this pull request in wcjohnson/babylon-lightscript Jun 1, 2017
commit a6224c4
Author: William C. Johnson <[email protected]>
Date:   Wed May 17 01:00:30 2017 -0400

    Revert "1.0.3"

    This reverts commit 084045a.

commit 084045a
Author: William C. Johnson <[email protected]>
Date:   Wed May 17 00:58:45 2017 -0400

    1.0.3

commit f2572dd
Author: William C. Johnson <[email protected]>
Date:   Tue May 16 22:34:24 2017 -0400

    Parse consequent bindings

commit 1a73a60
Author: William C. Johnson <[email protected]>
Date:   Tue May 16 21:51:45 2017 -0400

    Update fixtures

commit 586bf27
Author: William C. Johnson <[email protected]>
Date:   Tue May 16 21:50:12 2017 -0400

    Enable parsing for `MatchStatement`s
@rattrayalex
Copy link
Author

(significant pertinent discussion has been occurring on this thread)

@rattrayalex
Copy link
Author

rattrayalex commented Jun 1, 2017

Branching from the above, @wcjohnson, I'm considering moving from with to as for renaming/destructuring:

incremented = match x:
  | 1 as one: one + 1
  | 2 as two: two + 1
  | ~isObject() as { number }: number + 1

note that as is used as a keyword in imports, but it is not a reserved word, so syntax like this needs to be supported (which should be trivial):

match x:
  | as as as: as

EDIT: I think I'll stick with with, as as somehow feels like it's renaming the test (eg; 1) not the discriminant (eg; x)

@rattrayalex
Copy link
Author

rattrayalex commented Jun 1, 2017

I feel good about where the syntax/parsing is at now. My biggest concern with the syntax is that the indentation is a bit shallow:

fooBar = match barFoo:
  | "foo":
    "foo!"
  | (
    "bar" or
    "baaar" or
    "barr"
  ):
    "bah!"

it's a little hard to distinguish between tests and consequents, visually.

changing to, eg, case instead of | could help:

fooBar = match barFoo:
  case "foo": 
    "foo!"
  case (
    "bar" or
    "baaar" or
    "barr"
  ): 
    "bah!"

(at least the start of the consequent wouldn't line up exactly with the start of the test, which happens now).

thoughts?

eg, from https://github.com/wcjohnson/lightscript-utils/blob/16020cad80d9d2f7fb762691122bd7323224f6e7/packages/lightscript-ast-transforms/src/tails.lsc#L15:

  match node:
    | ~isa("IfStatement"):
      add(path.get("consequent"))
      add(path.get("alternate"))
    | ~isa("DoExpression"):
      add(path.get("body"))
    | allowLoops and (~isa("For") or ~isa("While")):
      add(path.get("body"))
    | ~isa("Program") or ~isa("BlockStatement"):
      add(path.get("body").pop())
    | ~isa("TryStatement"):
      add(path.get("block"))
      add(path.get("handler"))
      add(path.get("finalizer"))
    | else:
      paths.push(path)

would become:

  match node:
    case ~isa("IfStatement"):
      add(path.get("consequent"))
      add(path.get("alternate"))
    case ~isa("DoExpression"):
      add(path.get("body"))
    case allowLoops and (~isa("For") or ~isa("While")):
      add(path.get("body"))
    case ~isa("Program") or ~isa("BlockStatement"):
      add(path.get("body").pop())
    case ~isa("TryStatement"):
      add(path.get("block"))
      add(path.get("handler"))
      add(path.get("finalizer"))
    else:  
      paths.push(path)

@wcjohnson
Copy link

I agree that the fact that the delimiter is 2 characters long creates an unfortunate situation with indent alignment, but I also like the concision and visual flow of |. Going to case might be just as unfortunate. Tough call here.

@rattrayalex
Copy link
Author

Actually, looking at the example above, I like the case a lot more – you? @citycide you around to comment?

@rattrayalex
Copy link
Author

rattrayalex commented Jun 1, 2017

Copping a few more examples:

export toStatement(body) ->
  t = getTypes()
  match body:
    | ~isa("Expression"):
      buildAtLoc(getLoc(body), t.expressionStatement, body)
    | else: body

export toFunctionBody(body) ->
  match body:
    | ~isa("Expression"): body
    | else: toBlockStatement(body)

would become

export toStatement(body) ->
  t = getTypes()
  match body:
    case ~isa("Expression"):
      buildAtLoc(getLoc(body), t.expressionStatement, body)
    else: body

export toFunctionBody(body) ->
  match body:
    case ~isa("Expression"): body
    else: toBlockStatement(body)

@wcjohnson
Copy link

wcjohnson commented Jun 1, 2017

Other examples might make | look better, though:

export toBlockStatement(body) ->
  t = getTypes()
  bodyLoc = getLoc(body)
  match body:
    | ~isa("BlockStatement"): body
    | ~isa("Statement"): buildAtLoc(bodyLoc, t.blockStatement, [body])
    | else: buildAtLoc(bodyLoc, t.blockStatement, [
        buildAtLoc(bodyLoc, t.expressionStatement, body)
      ])
export blockToExpression(path) ->
  t = getTypes()
  node = path.node
  match node:
    | ~isa("BlockStatement"):
      path.canSwapBetweenExpressionAndStatement = () => true
      // XXX: source maps - this code is inside babel and makes unmapped
      // nodes.
      path.replaceExpressionWithStatements(path.node.body)
      path.node
    | ~isa("FunctionDeclaration"):
      // Special case: the parser promotes a block consisting of a single declaration
      // from ExprStatement(FunctionExpr) to FunctionDeclaration. Convert back to
      // expression here.
      nextNode = t.clone(node)
      if nextNode.type === "NamedArrowDeclaration":
        nextNode.type = "NamedArrowExpression";
      else:
        nextNode.type = "FunctionExpression";
      nextNode
    | ~isa("ExpressionStatement"):
      node.expression
    | else:
      node

@rattrayalex
Copy link
Author

rattrayalex commented Jun 1, 2017

case versions of the above:

export toBlockStatement(body) ->
  t = getTypes()
  bodyLoc = getLoc(body)
  match body:
    | ~isa("BlockStatement"): body
    | ~isa("Statement"): buildAtLoc(bodyLoc, t.blockStatement, [body])
    | else: buildAtLoc(bodyLoc, t.blockStatement, [
        buildAtLoc(bodyLoc, t.expressionStatement, body)
      ])
export blockToExpression(path) ->
  t = getTypes()
  node = path.node
  match node:
    | ~isa("BlockStatement"):
      path.canSwapBetweenExpressionAndStatement = () => true
      // XXX: source maps - this code is inside babel and makes unmapped
      // nodes.
      path.replaceExpressionWithStatements(path.node.body)
      path.node
    | ~isa("FunctionDeclaration"):
      // Special case: the parser promotes a block consisting of a single declaration
      // from ExprStatement(FunctionExpr) to FunctionDeclaration. Convert back to
      // expression here.
      nextNode = t.clone(node)
      if nextNode.type === "NamedArrowDeclaration":
        nextNode.type = "NamedArrowExpression";
      else:
        nextNode.type = "FunctionExpression";
      nextNode
    | ~isa("ExpressionStatement"):
      node.expression
    | else:
      node

@haltcase
Copy link

haltcase commented Jun 1, 2017

In terms of syntax only I lean toward case because in this situation I think the clarity wins over the terse. The conciseness of | is appealing but I don't know if it's that appealing.

Here's another thing though, do you think the familiarity of match with case to switch statements would be a benefit or could it end up being confusing?

If going the case route, maybe consider using else instead of default - it's symmetrical with ifs and aligns vertically with case ?

@rattrayalex
Copy link
Author

Definitely seems that | looks better with one-liners.

Also definitely think I agree that else looks better than default here. Updating examples to reflect that...

I am somewhat concerned that case's familiarity to switch might be confusing...

@rattrayalex
Copy link
Author

Another possibility would be to allow either, but I think this feature is dizzying enough as it is.

@rattrayalex
Copy link
Author

In super-simple cases like this:

  match body:
    | ~isa("Expression"): body
    | else: toBlockStatement(body)

I can see the | being nice enough that developers might choose it over an if; with case, it might tilt the balance juuust enough to write if instead:

  if body~isa('Expression'): body
  else: toBlockStatement(body)

I don't think that's necessarily a bad thing; match is super great, but idk if it should really be used for every conditional.

@wcjohnson
Copy link

I would characterize that particular example (or more generally, branching on the type of something) as an ideal case for using match over if.

@rattrayalex
Copy link
Author

Probably going to let this simmer for a bit while I implement the compiler bit for the feature.

@rattrayalex
Copy link
Author

@wcjohnson interesting – things like that were why I hesitated to build match in the first place – with convenient elif expressions, who needs it? But I definitely see how clear it makes the code: "we have this thing, now depending on its attributes, we will return different results".

@rattrayalex
Copy link
Author

rattrayalex commented Jun 1, 2017

(Jon's a friend I'm meeting up with soon)

Just pushed a commit that allows both | and case so they can be played with, will probably choose one option (or something not yet thought of) later.

EDIT: One idea: match x: with 1 as one: one + 1

@haltcase
Copy link

haltcase commented Jun 1, 2017

@JonAbrams I think classes & RegExps already work the way you're talking about:

result = match value:
  | instanceof Foo: `it's a Foo!`
  | /^hello$/: `it's a greeting!`
  | /^bye$/: `it's a farewell!`

I think the current < / > syntax is fine also - the value basically just becomes the left side of the binary operator in there. Having syntactic ranges would be a whole other story.

@haltcase
Copy link

haltcase commented Jun 1, 2017

Just pushed a commit that allows both | and case so they can be played with, will probably choose one option (or something not yet thought of) later.

Definitely think we should have just one only but this will be good to mess with.

@wcjohnson
Copy link

wcjohnson commented Jun 1, 2017

Actually, while we're on the subject of RegExps, I'm currently compiling them into .test:

  elif is("RegExpLiteral", testNode):
    // /^regexp/.test(ref)
    // XXX: let consequent see the match results? this could be something like:
    // _ref2 = ref.match(/^regexp/); consequent(_ref2);
    buildAtLoc(testNodeLoc, t.callExpression,
      buildAtLoc(testNodeLoc, t.memberExpression,
        testNode, buildAtLoc(testNodeLoc, t.identifier, "test")
      )
      [discriminantRef]
    )

But it may be better to have special-case code that compiles it into .match and then passes the match object to the consequent if there are any matches. (In fact I think @rattrayalex may have said this at some point though I can't find it)

That would, of course, add additional complexity to the construct (now devs have to remember that regexps are special cases -- your consequent gets the match object rather than the discriminant) but the full match object would make matching on regexps way more powerful.

@rattrayalex
Copy link
Author

rattrayalex commented Jun 2, 2017

Coming back to this, the aesthetics of | are hard to beat, especially in shorter scenarios (which should be encouraged).

My biggest concern with it is multiline conditionals, which would be especially confusing:

match x:
  | ~someLongNamedFn(otherArg, otherOtherArg) or
  ~someOtherLongNamedFn(otherArg, otherOtherArg) or
  ~someOtherOherLongNamedFn(otherArg, otherOtherArg):
    doSomeLongNamedFnThing(x, otherArg, otherOtherArg)
  | ~someLongNamedFn(otherArg, otherOtherArg)
  or ~someOtherLongNamedFn(otherArg, otherOtherArg)
  or ~someOtherOherLongNamedFn(otherArg, otherOtherArg):
    doSomeLongNamedFnThing(x, otherArg, otherOtherArg)
  | ~someLongNamedFn(otherArg, otherOtherArg) or
    ~someOtherLongNamedFn(otherArg, otherOtherArg) or
    ~someOtherOherLongNamedFn(otherArg, otherOtherArg):
      doSomeLongNamedFnThing(x, otherArg, otherOtherArg)
  | (
    ~someLongNamedFn(otherArg, otherOtherArg) or
    ~someOtherLongNamedFn(otherArg, otherOtherArg) or
    ~someOtherOherLongNamedFn(otherArg, otherOtherArg)
  ):
    doSomeLongNamedFnThing(x, otherArg, otherOtherArg)   

Of those, I like the 2nd and 4th best, personally... but 2nd clashes with the way most people do multiline conditionals now (at least, how prettier does it). 4th is a little ugly (| (, ehh) but matches the recommended multiline if style.

So as long as there's a style which assuages this, I think I feel okay about it. Hopefully a linter/formatter enforces one style in the future.

@rattrayalex
Copy link
Author

rattrayalex commented Jun 2, 2017

(apologies for the deluge)

...another option would be to go back to -> instead of :, which has far more visual "pop".

We left that because of the , -> redundancy – the comma was annoying, but necessary because of situations like this: | x -> x + 1, which would be parsed as, "does this match a function with param x that returns x plus one?". Hence the annoying comma.

However, by disabling function syntax in match-test's, which wouldn't make any sense anyway, this could be lifted, allowing comma-free syntax.

That would still leave the problem of how to rename/destructure params, and I think with would still be appropriate:

match x:
  | y with z -> z + 1

This would be moderately messy in the parser, but not the end of the world – two or three places, I think, where if (this.state.inMatchCaseTest) return false would need to be added.

Hackier (but less messy) would be the fact that the -> wouldn't be a real function at all – the code for parsing it would look something like this:

if (this.eat(tt._with)) {
  node.binding = this.parseBindingAtom();
}
this.expect(tt.arrow); // TODO: how to handle -*> etc – disallow? what about => vs -> ?
if (this.isLineBreak()) {
  node.consequent = this.parseStatement(false);
} else {
  node.consequent = this.parseMultilineWhiteBlock(...);
}

This would allow for slightly surprising (but, I think, desired) behavior in MatchStatements, like this:

f() ->
  match x: 
    | y ->
      if blah: return
  foo() // not reached if x == y && blah

EDIT:

When I originally conceived the feature, I had the even more space-age syntax:

match x:
  | ~isNumber(), n |> n * 2
  | ~isString(), s |> s + " times two"
  | ~isObject() and .x and .y, { x, y } |>
    x + y

which visually I like, but couldn't figure out a way to add renaming/destructuring... (EDIT: trying a ,-based syntax for that)

@wcjohnson
Copy link

I definitely agree there's an issue with some of that multiline test expression formatting, and I think having a consistent style guide between match and if (i.e. option 4) is one way out of the bind.

-> is a symbol I don't think should be overloaded to have other meanings. Any time I read an -> I think "I am making a function here." One of the reasons I think : is good there is because : has the same standard meaning of "initiating a block" that it has everywhere else, and the general syntax is consistent with IfExpressions.

I also don't love making up new kinds of arrows. Already I think things like -*/> are confusing (I'll bet you a lot of money programmers forget whether it's -/*> or -*/> and have to rely on a live linter to remind them. Fortunate that we have one of those.) I think runic languages in general make programmers fill limited head space with remembering what all the runes mean.

Out of the available options, I think a recommended style for multiline test expressions is better than the others. If that isn't sufficient to solve the issue to everyone's satisfaction, then I myself prefer going to case to any of the other options so far.

@haltcase
Copy link

haltcase commented Jun 2, 2017

I agree on all @wcjohnson 's points there.

Have you come across Brian Terlson's strawman proposal for JS? I'm just wondering, not proposing we use this necessarily - it's very different from this but probably worth a look if you haven't seen it. It's an alternative switch body basically.

https://gist.github.com/bterlson/da8f02b95b484cd4f8d9#file-patternmatching-md

@wcjohnson
Copy link

Another possible style:

match x:
  | ~test1()
    or ~test2()
    or ~test3():
      result

@rattrayalex
Copy link
Author

I hadn't seen that proposal before, thanks for pointing it out. Worth considering for sure. The Symbol.matches and whole-object-matching bits are interesting.

I think the point about not overloading -> is a good one.

Sounds like there's no enthusiasm for |> which isn't too surprising.

Will probably stick with | and : though remain somewhat concerned about visibility.

The power of destructuring before the test is appealing, though, which might change the grammar... In general, being able to match on multiple values would be great. ~isObject() and .x and .y with { x, y }: is unfortunate (especially with longer property names or nesting).

@wcjohnson
Copy link

Regarding auto-instanceof, my input would be that I kind of like it (especially the String/Integer typecases) , but on the other hand, I think we just at least doubled the cognitive burden associated with using and remembering all the intricacies of pattern matching with this latest round of proposals. There's a LOT going on here now.

@rattrayalex
Copy link
Author

Nice, well spelled out!

oh, but it is the logical operator 😉 that's the beauty of it. You could also do things like | someFlag == true and { prop } and prop > 3: - it stands in for "check that this has these properties, and destructure them".

that said, I see your point, and will keep that alternative in mind if people find this confusing.

Agreed about the dangers of cognitive overload, though in the String/Number situation, people who have come from other languages with matching are probably accustomed to that feature. Definitely need to be vigilant about (and probably scale back) the number of capabilities here...

@wcjohnson
Copy link

So if I want to test the discriminant, non-assertively destructure, then test one of the bound vars, what do I write? Would it be

| ~isWidget() as {price, color} and price == 2.50: "two fiddy"

@rattrayalex
Copy link
Author

rattrayalex commented Jun 3, 2017

no, you wouldn't be able to do that, which I think makes sense, because price would be "unsafe", and color would not exist.

EDIT: you could, however, do | ~isWidget() and .price == 2.50 as {color}: "two fiddy for the "+color+" one"

... the more I think about it, the more I think non-assertion should be opt-in, which is easy with { price = null, color = null }... I actually really don't see much of a case for a destructured as...

@wcjohnson
Copy link

Both price and color are safe, because ~isWidget() is standing in for a test that checks that the thing is a valid widget and therefore has those fields set. That's the whole point of the non-assertion as I see it; it's the programmer saying "I don't want you to generate check code because I've already done the checks."

@rattrayalex
Copy link
Author

hmm, k. I guess I just don't currently see that as such a problem, but open to being proven wrong.

If it's a performance concern (seems questionable), can always use { prop = null }. If it's the prettiness of the output code, I'll wait and see... if it's really ugly, no harm in adding a destructured as.

@rattrayalex rattrayalex force-pushed the match branch 3 times, most recently from 2815b3d to 85d3bb1 Compare June 9, 2017 18:58
@rattrayalex
Copy link
Author

plugin PR at lightscript/babel-plugin-lightscript#22

@DuncanMacWeb
Copy link

DuncanMacWeb commented Aug 12, 2017

Jumping in late to this discussion, but can I throw this in please?

How about omitting any symbol (like | or when) preceding the test? Since match already implies “Here follows a list of conditions and consequents:” the | is redundant; and reading the code examples above, in most cases if you omit the | at the start of each test the meaning would still be clear:

x = match z:
  < 2: "less than two"
  > 3 and < 10: "between three and ten"
  2: "equal to two"
  3 or 10: "equal to three or ten"
  "hello" or "goodbye": "its a greeting"
  instanceof MyClass: "its a MyClass instance"
  ~isFunction() and instanceof OtherClass: z()
  ~isFunction() with f: f()
  ~isObject() and .x and .y with { x, y }: x + y
  ~isArray() and [0] with [ first, ...rest ]: [...rest, first]
  /f([oO]+)/ or /b([aA]+)(r)+/ with [ fullRegMatch, ...parts ]: match parts.length:
    0: "no paren groups matched (not possible here)"
    1: `found an o: ${parts.0}`
    2: `found an a and r: ${parts.0} ${parts.1}`
  else: "none of the above"

Whatever syntax is decided on regarding with, when, as or , in the test expressions would work with this, wouldn’t it, since the indentation itself would be enough to continue the match block.

@rattrayalex
Copy link
Author

rattrayalex commented Aug 15, 2017

Hi @DuncanMacWeb , thanks for pitching in!

Given the direction the TC39 pattern-matching proposal is headed, it's very likely that syntax will be adopted in LightScript as well (eg; your proposal will probably be adopted 😄 ).

I'm currently waiting to see how that proposal shakes out before taking further action here.

@DuncanMacWeb
Copy link

@rattrayalex good plan, keeping LightScript as close as possible to JS — keeps the learning curve shallow for developers from a JS background 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants