Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: a macro DSL based on bidirectional type checking #61

Open
gelisam opened this issue Jan 14, 2018 · 2 comments
Open

Proposal: a macro DSL based on bidirectional type checking #61

gelisam opened this issue Jan 14, 2018 · 2 comments

Comments

@gelisam
Copy link
Contributor

gelisam commented Jan 14, 2018

Hackett's goal has always been for "types and macros to be able to cooperate and communicate with each other". Today, however, Hackett macros are written using syntax-parse's DSL, which makes it easy to write a macro which alters its output depending on the syntactic categories of its inputs, not their types. This proposal introduces a monadic DSL which makes it easy for the macro to depend on the type of its inputs, without unnecessarily forcing the type-checker to commit to a type for all of the inputs before expanding the macro.

Design goals

  1. It should be possible for a macro to ask the type-checker for the inferred type of an input and/or its output.
  2. It should be possible for a macro to tell the type-checker about the expected type of an input and/or its output.
  3. The implementation should not use backtracking.

Unification API

When implementing type inference using unification, the important primitive operations are:

  1. newSVar : (Solver SVar): create a new solver variable about which nothing is known yet.
  2. unify : {TypeExp -> TypeExp -> (Solver Unit)}: unify two type expressions, each of which may contain some solver variables.
  3. readSVar : {SVar -> (Solver TypeExp)}: look up a solver variable to obtain a type expression which may contain some solver variables; in the worst case, the entire expression may itself be a solver variable, in which case we learn nothing.

The core idea of this proposal's DSL is to expose those primitives to the macro implementation, so that it can use readSVar to inspect types, and unify to tell the type-checker about expected types. However, exposing readSVar directly (or equivalently, allowing the macro to call the type checker on a term) is problematic, because how much information we get from readSVar depends on which other expressions the type checker has already visited. Moreover, what should the macro implementation do in the worst case in which readSVar doesn't reveal any new information?

Stuck macros

Here's how I propose to solve both problems at once: instead of giving back an entire type expression containing everything we currently know about the type variable, readSVar should always return a type expression in which only the outermost type constructor is known, and all its arguments (if any) are solver variables. Let's call that variant readSVar'. If the outermost type constructor isn't yet known, readSVar' cannot return yet, so the macro expansion remains stuck until more information is obtained, by visiting and type checking other expressions (which can involve expanding other macros, which might also get stuck). If some macros are still stuck after everything is visited, then type checking fails with an "ambiguous type" error for that solver variable.

This way, the order in which the type checker visits the expressions doesn't affect the output of readSVar': either the outermost type constructor is uniquely determined, in which case the macro will receive that value (perhaps after being stuck and resumed), or two different unification calls give conflicting information for that type constructor, in which case type checking is going to fail with a type error anyway. This is similar to the way in which concurrent programs which only communicate via IVars are guaranteed to give a deterministic answer even though the order in which the threads are scheduled isn't deterministic.

Pattern matching

Repeatedly calling readSVar' in order to expand the outermost type constructor, then second outermost type constructor, etc., can be quite tedious. Let's provide a nicer API based on pattern-matching: type-pattern matching.

In the same way that ordinary pattern-matching expands to a tree of nested case statements, type-pattern matching can be desugared to a tree in which each node calls readSVar' and then examines the resulting type constructor using a case statement. The subtle relationship between the order of the type-patterns and the order in which the readSVar' calls are made (which determine the behaviour of the type checker) is the same as the relationship between the order of ordinary patterns and the order in which inputs are forced by a pure function.

One important difference between ordinary pattern-matching and type-pattern matching is the case in which a case statement with a single alternative is generated: for ordinary pattern-matching, the input is forced and then an exception is raised if the result doesn't match the pattern. For type-pattern matching, we could similarly call readSVar' and then fail with a type error if the outermost constructor doesn't match the pattern. But we can do better: we can unify the input (a solver variable) with the pattern!

Unifying SVars in context

The macro implementation should definitely receive an SVar representing the type of its output, so it can give the type checker some information (via unify) about the type of the expression it plans to produce before the macro is done producing that expression. In particular, providing this information before the macro becomes stuck might help type inference succeed more often.

Which other SVars should we provide to the macro implementation? When type-checking an expression such as (f x), we create one solver variable for f, one solver variable for x, and one solver variable for (f x). So when expanding the macro application (my-macro (f x)), we might naively think of providing the macro implementation with one solver variable for f, one solver variable for x, one solver variable for (f x), and one solver variable for (my-macro (f x)). That would be a mistake, because the macro is free to rearrange and duplicate those components arbitrarily, and it's this rearranged term which needs one solver variable per sub-term, not the input term.

Another difficulty is that an API in which macros can ask about the type of arbitrary expressions encourages a counter-productive style in which the macro asks a lot of questions about expressions without providing much information in return, thereby accidentally hurting type inference.

My proposed solution to both problems in an alternate API in which the only way to ask about the type of a target expression is to put this expression in a bigger expression which the macro implementation promises has the same type as the expression which the macro will output. This is done via a preliminary-output : {Syntax -> (Solver Unit)} command, where the syntax object may contain type annotations in which the type expression contains SVars. The command adds the syntax object to the type checker's list of expressions to visit, so that when readSVar' causes the control to return to the type checker, there is some hope for it to infer some information about the SVars in those type annotations.

Some macros may still wish to delay choosing their output type until they have learned the type of a few target expressions. This can still be done using this API, by using unsafe-coerce! or undefined! around their target expression to ensure that the output type remains unconstrained. However, due to the stigma around those partial functions, macro implementer will be encouraged to only do that if that's what they really want to do, and so in the common case the macro implementer will write an example output containing the target expression, thereby helping type inference.

Summary

A macro is implemented as a function of type {Syntax -> (Solver Syntax)}. The usual syntax-parse mechanism can be used to determine what to do with the input syntax object. If at some point the macro wants to get some information from the type checker, it must first create a solver variable, then it must use that solver variable in a type annotation inside an example output expression, then it must pattern-match on the solver variable in order to force type inference to find some information about the solver variable, or fail trying. Each time an example output is given, its type is unified with the type of the macro's call site (and thus with all the previous example outputs as well), thereby giving some important information for type inference. Eventually, the macro returns its final output, whose type is again unified with the earlier preliminary outputs.

@gelisam
Copy link
Contributor Author

gelisam commented Jul 27, 2018

I believe the proposed API would be sufficiently-powerful to express the cata and sql-find examples you gave in your Curry On Amsterdam talk.


Here are the examples you gave for cata:

> (cata 0 negate (Just 42))
-42

> (cata id show (Right True))
"True"
> (cata id show (Left "bang"))
"bang"

> (cata 0 + (List 1 2 3))
6

The second case is the interesting one: we know that the type of (Right True) has to be (Either String Bool), but how does the compiler figure it out? The implementation of cata definitely cannot rely on make-expected-type-transformer to obtain the expected type of the expansion, because there is no context at all around the call to cata.

First, notice that cata doesn't need to know about that most-general type, it only needs to know about the outermost type constructor, Either. So cata would begin by creating an SVar named t and calling preliminary-output on (undefined! {(Right True) : t}) in order to allow the type-checker to infer something about t. cata would then ask for the outermost type constructor for t (causing cata to get stuck and to transfer the control to the type-checker, who unifies t with Either a Bool, allowing cata to resume), and would learn that this type constructor is Either. cata would then look up the definition of Either in order to discover the number of constructors and their arity, and it would generate

(let [[f (λ [either] (case either
                       [(Left x) (id x)]
                       [(Right y) (show y)]))]]
  (f (Left "bang")))

At this point the macro is done, and the type-checker can continue its unification in order to discover that a ~ String because the two branches must return the same type.


Here is the example you gave for sql-find:

(do [maybe-user <- (sql-find (and {(string-downcase user.email) = email}
                                  {user.password = password}))]
    (case maybe-user
      [(Just user) (set-session/redirect user)]
      [nothing     render-login-403]))

In this particular example, a definition of sql-find based on make-expected-type-transformer would work just fine because the surrounding context is enough to conclude that the output type of the sql-find call has to be Maybe User. Let's look at a more challenging case instead:

(do [maybe-user <- (sql-find (and {(string-downcase user.email) = email}
                                  {user.password = password}))]
    (traverse set-session/redirect maybe-user))

This time, the surrounding context is only enough to determine that the expansion must unify with (Traversable f) => (f User). I guess (get-expected this-syntax) could return that type and that would be enough for sql-find to figure out that the table type is User. But allowing (get-expected this-syntax) to return the information it has so far about a type is a bit unsatisfying, because this information changes over time and depends on the order in which macros are expanded. For example, if the results of two calls to sql-find are used in a do block, one of the calls would receive (Monad f) => (f User), and then the expansion would make it clear that f is Maybe, and so the second call would receive Maybe User instead. With a more complicated macro, the expansion might be different depending on which macro gets expanded first!

Anyway, let's go back to the proposal. sql-find would begin by creating an SVar named t and calling preliminary-output on {undefined! : t} in order to specify that t is the type which the macro promises to return. Then, sql-find would pattern-match on t with the single case (Maybe s), thereby informing the type checker that the macro intends to return a Maybe. sql-find would then ask for the outermost type constructor for s and learn that it is User. It would then look up the definition of User to find the names of its fields, so it can generate the variables user.email and user.password. It would then generate raw-sql code which returns a (Maybe User).

Here is an even more challenging example:

(do [maybe-user <- (sql-find (and {(string-downcase user.email) = email}
                                  {user.password = password}))]
    (cata render-login-403 set-session/redirect maybe-user))

Here, both sql-find and cata get stuck while asking for the type of an outermost constructor. Before getting stuck, however sql-find revealed that t unifies with Maybe s, and so cata gets unstuck, which allows the type-checker to learn that s must unify with the input of set-session/redirect and is thus User, which gets sql-find unstuck. It doesn't matter whether we try to expand sql-find or cata first; both attempts will get stuck, and the data-dependency order will force cata to complete first.


I'm a bit disappointed that both examples had to use undefined!, as I was expecting this to be rare. Hopefully this is because those particular macros happen to be especially challenging, because you picked them as aspirational examples, and not as examples of everyday macros.

@wilbowma
Copy link

Sounds like some ideas Turnstile could use @stchang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants