-
Notifications
You must be signed in to change notification settings - Fork 126
The type system
This section details the type system of Opa, including:
- definition of types
- polymorphism and rank 2 polymorphism
- interactions with packages and modules
- type error messages
- type best practices
Opa is a statically and strongly typechecked language. This means that the type of an expression is computed once and for all at compile-time, an expression has one unique type and this type can't be used as another one. In Opa, any expression evaluates into a value and functions are first class values. This means that, like integers, booleans, strings in other languages, functions in Opa can be passed as arguments, returned as results, stored in data-types, and so on.
Opa compiler offers type inference, i.e. it automatically determines the type of expressions from the context, without the user needing to write these types. Hence, a definition like
x = 5
will lead x
to have the type int
because the integer literal 5
is
known to be of type int
, this type representing the type of integer values.
As a bit more complex example, the function
function chose(cond, if_true, otherwise) {
if (cond) { if_true } else { otherwise }
}
will be found having the type
(bool, int, int) -> int
where bool
represents the type or boolean values
and ->
is the function taking arguments of the types
stated on the left side and returning a result of the type stated on the right
side of the arrow symbol.
The type system of Opa features polymorphism, i.e. types in which some parts are not constrained and can be of any type. For instance, the identity function:
function identity(x) { x }
doesn't set any constraint on it argument's type:
it can safely be applied to any kind of value and will return this kind of
value. Its type will then be 'a -> 'a
where 'a
is a type variable,
i.e. any type will do for 'a. Using identity
with an
integer as argument will then instantiate 'a
with int
. Using identity
with a boolean will then instantiate 'a
with bool
.
In addition to usual data-types (integers, booleans, string, floating point
numbers), Opa provides two cornerstone structures: records and sums.
A record can be seen as a struct
in C, i.e. a non-ordered sequence of
fields packed into one global entity. However, conversely to C these records
can be either closed or open: the list of fields can be fixed or can
contain at least the ones stated in the record type.
A sum is a set of possible cases, each case being a record. Hence, a sum
allows to express the fact that a data-structure can be of different shapes. This
can be compared to union
in C, with the difference that the effective shape
of the data is directly embedded inside its representation.
In the Opa system, type abstraction, i.e. ability to hide the actual structure of a type is provided thanks to the notions of private and abstract types. The first ones are types that only have an existence in the package where they are defined; they are not visible outside this package. The second ones are transparently visible (i.e. their structure can be manipulated) only in the package where they are defined; outside this package, they become opaque, i.e. are still visible but their internal structure is not visible anymore.
Opa provides a certain notion of module, i.e. packages containing value definitions via a particular form of records. Such records are syntactically marked as being modules and will get a particular type called type forall. Under some restrictions due to typechecking decidability, it is possible to use these modules like regular record, hence passing them through functions or returning them as result.
Type inference is not complete for some parts of Opa features and a few type annotations are required from the user. This is mostly due to the expressiveness of the language that provides very flexible features at the price of the non-existence of a principal type. Existence of a principal type means that there is always one better type, i.e. all the other types that could be chosen for an expression would reduce the contexts where this expression can be used. In such a case, the user will have to indicate himself the type he wishes using a type annotation to enforce the expression to be of this type. This construct is called a coercion. On the other side, coercion by type annotations may also be required when the typechecker infers a cyclic type. In effect, such types are internally used by the typechecker but are not currently allowed outside in the language.
As previously stated, expressions evaluate to values of certain types. The type algebra describes the shape of these types. The present section doesn't explains how types are synthesized from expressions even if some examples are given. We rather address here an intuitive presentation of the types.
Natively, Opa provides three basic types:
-
int
: The type of signed integer values (e.g.0
,-4
). -
string
: The type of characters strings (e.g."this is a string"
). -
float
: The type of signed floating point values (e.g.0.489
,-5.94
).
As stated in introduction, type variables are used to represent
types with not particular constraint,
i.e. types we don't need to know about or we don't know yet their effective
internal structure. Type variables are written by a leading quote followed by
regular identifier, for instance: 'a
, 'myvar5
.
Type variables act in two particular parts of the type system.
- During type inference, they are used by the typechecker to temporarily give types to expressions that at a certain point do not exhibit the need to have a particular type.
- At the end of type inference of a definition, type variables that still remain unconstrained are generalized. This means that the typechecker discovered that there is no need to know more about the structure of the types they represent and hence decides to make the type polymorphic: any type will be suitable in place of these variables. Hence, in the type of a definition (not expression), all finally appearing type variables are implicitly quantified by a for all qualifier, telling that the definition having this type has all the possible type schemes where type variables can be instantiated (replaced) by any type. In other words, a type scheme is a model, a pattern of types, describing a family of possible types.
This last point raises an important concept:
- Expressions have a type.
- Definitions have a type scheme.
In other words:
- In an expression a type variable is just an unknown that may one day be known equal to some type.
- In a definition a type variable is a universally quantified type. Each instance, usage of the identifier bound by this definition is free to replace these variables by types it needs in its context.
A function is characterized by the type of its arguments and the type of its
result. The type constructor for a function is the arrow symbol: ->
. On the
left part the types of the arguments are listed, comma-separated if there are
several arguments. On the right part is the type of the result of the function.
-
-> int
: Function taking no argument and returning an integer. -
int -> bool
: Function taking one argument of type integer and returning a boolean. -
int, string -> string
: Function taking two arguments, the first one being of type integer, the second of type characters string and returning a characters string. -
int, (int -> bool), int -> bool
: Function taking three arguments, the first and third ones being of type integer, the second one being a function taking an integer and returning a boolean. Finally, the function returns a boolean. Note the presence of parentheses enclosing the function type of the second argument.
Types can be inductively combined to form complex data-types and bound to names
thanks to type definitions. Hence, the following extract defines a new type
name natural
, whose structure is int
.
type natural = int
Once this type is defined, we may use type name natural
to denote this new
type.
Types can have parameters. For instance, the type of pairs with their two components being of a same type can be defined as:
type same_comp_pair_ty('a) = ('a, 'a)
A pair denoted by the type same_comp_pair_ty(float)
will have its two
components carrying floating point numbers.
The type of pairs with components of any types can be defined as:
type same_comp_pair_ty('a,'b) = ('b, 'a)
Note that the order of the type variables 'a
and 'b
is different in the
quantification (i.e. left-part of the definition, on the left of the =
sign)
and in the body of the type definition. As a result, this means that pairs
having type same_comp_pair_ty(int, bool)
will carry integers in their second
component and booleans in their first component.
Finally the type of relational operators (i.e. functions taking two arguments of the same type and returning a boolean value) can be defined as:
type relational_ty('a) = 'a, 'a -> bool
Following this definition, the type of relational operators testing integers
is relational_ty(int)
.
Hence, a named type is an identifier followed by as many types comma-separated and enclosed between parentheses as the corresponding type definition has parameters.
See also the section dealing with type definitions.
Sums are a sequence of cases, each case being a record type.
A record is a sequence of typed fields. In a record, accessing a fields will yield a value of the type of this field. Sums as record can be either closed of open. A closed record type (respectively sum) enumerates exactly the fields it contains (respectively cases for a sum). An open record type (respectively sum) enumerates the fields (respectively cases for a sum) it may contain at minimum, and allows to have more if needed. In case of open record or sum, the type ends by a trailing ellipsis, otherwise it is silently closed.
An open sum ends by a column variable, i.e. a kind of type variable that can be instantiated by another sum. In other words, this column variable allows the sum to gain more sum cases.
An open record ends by a row variable, i.e. a kind of type variable that can be instantiated by another record (also called a row). In other words, this row variable allows the record to gain more fields.
In error messages (where they need to be clearly named), column variables
(resp. row variables) are denoted by 'c.a
, 'c.b
, etc
(resp. 'r.a
, 'r.b
, etc). In other cases, when types are printed and
the identity of such variables doesn't matter, they are both written "...".
Here follow some examples of correct record types :
-
{ }
: Empty record type. It contains no field. -
{ int x, float y }
: Closed record containing exactly the two fieldsx
of typeint
and 'y
of typefloat
. -
{ int x, ... }
: Open record type containing at least a fieldx
of typeint
and possibly any other fields of any type.
Sums are a form of sequence of records. They can be non-disjoint, meaning that some records forming cases may overlap.
**IMPORTANT: **
A strong restriction imposes that an *open sum* must not host *any* case
that is an open record and any (*closed or open*) *sum* must not
host *several* cases that are open records.
Moreover, if several cases (i.e. records) of a sum share a same label, this label
must have the same type in each cases. Finally, if a sum has several cases,
none of them must be open.
Here follow some correct sum types examples:
-
{ int a } or { float b }
: Closed sum containing two cases, one being a record having exactly one fielda
of typeint
, the second begin a record containing a record having exactly a fieldb
of typefloat
. -
{ int a } or ...
: Open sum containing at least one case being a record having exactly a fielda
of typeint
and possibly any other cases. -
{ int a } or { int a, float b }
: Closed sum containing exactly two cases, one being a record containing exactly one fielda
of typeint
, the second being a record containing exactly two fields,a
of typeint
andb
of typefloat
. We see in this example that fields can be common to several cases as long they keep the same type. For example, havinga
of typebool
in one of the cases would be ill-typed. -
{ int a } or { int a, float b } or ...
: Open sum type similar to the previous example. It may contains additional other cases compared to the previous example. -
{ int a }
: Closed sum with only one case. It can be hence seen as a record (i.e. like a struct a la C) exactly containing one fielda
of typeint
. -
{ int a, ... }
: Closed sum with only one case. This case is open hence can be seen as a record containing at least one fielda
of typeint
. Any other record containing additional fields is compatible with this type.
We saw in section Type variable the difference between types
and type schemes. Variables present in type schemes are universally
quantified, hence a type scheme represent a family of types. A type forall
embeds a type scheme inside a type and represents the type of a module field.
Such a type is written by prefixing its body by the keyword forall
followed
by the list of polymorphic type variables of the type between parentheses
followed by a ".
" (dot) then the body of the type. Type-forall are in fact a
way to introduce a flavor of rank-2 polymorphism in Opa type system.
Rank-2 polymorphism is a system in which a forall quantifier may not appear to the left of more than 2 arrows (when the type is drawn as a tree). To illustrate this concept, consider the function:
function f(x) { x }
_ = (f (true), f(2))
variables bound with a definition (sometimes called let-definition may be
polymorphic, as in f
above. That's the reason why f
can be applied to two
different types, bool
and int
.
However, in a rank-1 polymorphism system, function arguments may not be:
function g(f) {
(f(true), f(2))
}
results in an error:
> *Error*
> Function was found of type bool -> 'a but application expects it to be of
> type int -> 'b.
> Types int and { false } or { true } are not compatible
In a rank-2 polymorphism system, we can give g
a signature like
(forall a. a -> a) -> (bool, int)
. This is called a rank 2 type, because a
function argument is polymorphic, as indicated by the forall quantifier. Now the
function g
may be applied to expression whose generalized type is at least
as general as that declared.
Modules are particular records in which each field type gets generalized. This way, a module can be seen as a package of definitions grouped in the module rather than laying at toplevel.
When a module field definition appears to be polymorphic, its type is turned into a type forall where all generalizable variables are universally quantified. The type of the module is hence a closed record type whose fields names are the fields definitions names with their related type turned into a type forall if they are polymorphic. Hence, types forall are closely related to module fields types and can only be created via module expressions.
The following program clearly show the difference between the type of r
which
is a regular record and the one a m
which is a record containing types
forall for fields having a polymorphic type.
r = {
function id(x) { x },
function pair(x) { (x, x) },
one: 1
}
> r : {('v0 -> 'v0) id, ('v1 -> tuple_2('v1, 'v1)) pair, int one} or ... =
> { id: <fun>, one: 1, pair: <fun> }
m = {{
function id(x) { x },
function pair(x) { (x, x) },
one: 1
}}
> m : {(forall('v0). 'v0 -> 'v0) id,
> (forall('v1). 'v1 -> tuple_2('v1, 'v1)) pair,
> int one } =
> { id: <fun>, one: 1, pair: <fun> }
In the record type of r
, no type forall appear even if the type scheme of
r
itself is polymorphic ('v0
and 'v1
are generalized). In the record
type of m
, since it is a module, all fields having a polymorphic type get
generalized and turned into forall types. One can notice that one
being
not polymorphic, it is not turned into a forall type.
**IMPORTANT: **
Types forall are explicitly created by the developer when he uses a module
construct: there is hence no automatic type forall creation. Conversely, when
a type forall appears in an expression, it is automatically instantiated, hence
leading to a type in which generalized variables are replaced by fresh variables
and universal quantification disappears (in other words, the type forall
becomes a type non-forall again).
For sake of completeness, we give here a presentation of the type algebra like usually described in theoretical approaches. This section is mostly dedicated to advanced users or people familiar with type systems design. A more complete description of this theoretical presentation is available in the technical and architecture documents of Opa.
τ ::= `int` Integer constant type
| `bool` Boolean constant type
| `string` String constant type
| `float` Floating point numbers type
| `void` Trivial (i.e. having 1 unique constructor) constant type
| α Type variable
| t ( τ ) Named type constructor (possibly parameterized)
| τ `->` τ Function type
| κ Sum type
| ∀ α, ρ, δ . τ Type forall
φ ::= • Closed row
| ρ Row variable
| l : τ, φ Field sequence
κ ::= • Closed column
| ζ Column variable
| { φ } or κ Case (record) sequence
Now that the form of Opa's types has been presented, we will address how Opa programs are typechecked. The aim of this section is to understand the type to expect for each construct of the language. This presentation is split in two parts, one dealing with type definitions and one dealing with expressions.
Type definitions allow to create new data-structure from previously existing ones or allow to alias, i.e. provide a name as a shortcut for cumbersome type expressions. Both usages follow the same syntax and rules.
A type definition is introduced by:
[directive] type [directive] ident[(parameters, ...)] = type expression
where ident is the name of the created type, parameters is a comma-separated enumeration between parentheses of the type variables parameterizing the definition (or nothing if the definition is not parameterized), and type expression is the body of the defined type. We will see directives in the next section.
Once defined, the new type name will be compatible with itself and any
type expression compatible with the body of its definition. Hence, following
the definition type t = int
, the type t
can be used anywhere an int
is
expected and reciprocally.
Directives allow to alter the visibility of a type definition to ensure
abstraction. Two directives are available: private
and abstract
.
private
makes the type definition only visible in the package where it
it defined. Outside this package, the type is no more visible : it doesn't
exist for other packages. Such types are typically internal to a
package and serve only to implement functionalities without appearing in
the interface of the package. As a consequence, they must not escape outside
the package, i.e. must not appear in the signatures of values visible outside
the package. To make this possible, values manipulating such types must also
be declared private, using the private
directive also available for
value definitions. If a private type appears in the signature of an exported
value, an error is raised, advising to make this value private, like in the
following example:
package private_ty_escape2
private type prv_string_ty = string
vv = "" : prv_string_ty
> *Error*
> Description:
> Definition of vv is not private but its type contains a type private to the
> package. Type prv_string_ty must not escape its scope.
> Hint:
> Add a private directive on this definition.
abstract
makes the type definition transparently visible inside the
package where it is defined, i.e. its internal structure can be accessed and
manipulated, but outside this package it turns opaque: the type is still
visible (i.e. it still exists) but its internal structure is not visible
anymore.
Such types are typically data-structures implemented in a package, provided to other packages that must not directly manipulate the effective implementation of the data-structure and instead use the API provided by the package implementing the data-structure. This is a kind of abstract data type (ADT), which permits to preserve consistency and internal invariants required by the implementation of a data-type. The basic idea is that the definition package "knows" how to safely manage the type, provides a safe API to the outer "world", hence one gets sure that nobody will come and "hack by hand", hence break, invariants required by the data-structure to keep consistent. Outside its definition module, an abstract type loses its internal representation, hence becomes compatible only with itself.
package p1
abstract type u = int
function u to_u(int x) { x }
function int from_u(u x) { x }
// Legal since we are in the defining package: we see u's internals.
_ = 1 + ((u) 1)
package p2
import p1
// Obtain a value of type u from the integer 0.
v = to_u(0)
_ = 1 + v
Because the last line of code of package p2
tries to rely on u
's
internal structure although it is not the package defining it, u
being
turned abstract can't be compatible with int
anymore, hence leading to the
error:
> *Error*
> Function was found of type 'a, 'a -> 'a but application expects it to be of
> type int, u -> 'b.
> Types int and u are not compatible
Parameters allow to define types polymorphically, in order to instantiate them later by providing effective type arguments to the type name in future type definitions or type expressions. Let's examine the following sample code.
type fun_t('a, 'b) = 'a -> 'b
type int_string_fun_t = fun_t(int, string)
function twice_general(f , i) {
f(i) + f(i)
}
> twice_general : ('v0 -> int), 'v0 -> int = <fun>
function twice(fun_t(int, int) f, i) {
f(i) + f(i)
}
> twice : fun_t(int, int), int -> int = <fun>
fun_t
is defined as the type of functions taking one argument an returning
a value, both of them not being fixed in the definition. Hence the type
fun_t
is partially defined and will need to be instantiated with
effective arguments in further usages.
Next comes the definition of int_string_fun_t
using fun_t
and specializing
this latter with int
for both the types of the argument and returned value of
the function. By this definition, we used the general fun_ty
to create a more
specialized type definition.
Let's now consider the definition of twice_general
taking a function, a
parametric value and adding twice the result of applying this function to this
value. The type of `` (internally int, int -> int
) enforces the result of
`f` to be `int` but no constraint applies on the type of the argument of
`f`. Hence `f` has type `'v0 -> int`, and by consequence `twice_general` also
has a parameterized type `('v0 -> int), 'v0 -> int`.
Using the type definition int_string_fun_t
, we can define a function similar
to twice_general
, i.e. with the same body but with a more restricted type:
twice
.
Hence, by adding a type coercion on the parameter f
of twice
, we can
enforce it to be of type fun_t(int, int)
hence forcing the parameter of f
to be int
instead of leaving it free like in twice_general
.
**NOTE: **
Note that both definitions of `fun_t` and `int_string_fun_t` do not really
create _new_ types. They are aliases, _shortcuts_ for a type expression.
Instead of using these 2 defined type names, we could directly use their body
but this would be less handy. Creation of really new types is achieved by
defining sums and records as described below.
The basic type expressions that can be used allow to represent the types described in the Opa type algebra.
int
, bool
, string
, float
, void
.
type t = int
defines t
as an alias on the type of integers int
.
-
type t1 = int -> float
definest1
as the type of functions taking anint
and returning afloat
. -
type t2 = -> float
definest2
as the type of functions taking no argument and returning afloat
. -
type t3('a, 'b) = 'a -> 'b
definest3
as the type of functions taking an argument and returning a value whose types are not fixed and are parameters of the type definition. Hence,t3(int, float)
specializes these parameters and lead to a type compatible with the above typet1
. -
type t4 = int, string -> bool
definest4
as the type of functions taking two arguments (of typesint
andstring
) and returning abool
.
Type definitions bind type expressions to names. So defined type names can be on their turn used to form new type definitions. If a type name represents a parameterized type, it must be instantiated providing effective type arguments according to its arity (i.e. number of required arguments).
-
type t5 = t1 -> t2
definest5
as the type of functions taking a parameter of typet1
and returning a result of typet2
, both types being those presented in the above paragraph. -
type t6 = t3(bool, int)
definest6
using the previously defined typet3
and applying it tobool
andint
. As a consequence,t6
is the type of functions taking an argument of typebool
and returning a value of typeint
.
A sum type is an enumeration of cases separated by or
.
Each case is a record i.e. an enumeration between braces of labels
with their type.
type t7 = { int A } or { string B, bool C }
defines t7
as a new
sum type containing two cases. The first one is a record containing a
field A
of type int
; The second one is a record containing two fields,
B
of type string
and C
of type bool
. Values of this type can be
pattern matched to discriminate their effective form and access the
particular fields of each case.
Sum type definitions always lead to closed sum, i.e. the resulting type doesn't ends by a "..." (column variable). This simply means that when defining a type, the definition simply describes all the cases this type can host.
As a shortcut, when a field of record has type void
, its type
specification can be ommited. Hence,
type color = { Red } or { Green } or { Blue }
defines color
as a sum
with three cases, each of them implicitly of type void
. This is equivalent
to { Red : void } or { Green : void } or { Blue : void }
. The idea in
such a definition is that the fields of the record are self-sufficient and
do not need to carry additional information to be precise enough.
type pair('a, 'b) = { 'a first, 'b second }
defines pair
as a
parameterized sum with only one case. This case is a record
containing two fields, one per component of the pair. Since the pair
data-structure is naturally polymorphic, this type definition has in effect
2 parameters 'a
and 'b
.
**IMPORTANT: **
Remember, as already state while presenting Opa type algebra, in the
[[sum types section|The type system#algebra_sum_type]], that a strong restriction imposes that
an *open sum* must not host *any* case that is an open record and any
(*closed or open*) *sum* must not host *several* cases that are open records.
Moreover, if several cases (i.e. records) of a sum share a same label, this label
must have the same type in each cases. Finally, if a sum has several cases,
none of them must be open.
A record type is defined like (and compatible with) a sum type with only one case. This way, by default a record type is a closed row in an open column.
Type-forall being strongly related to modules, they are defined using a syntax close to modules, allowing to avoid the explicit quantification of type variables. In fact, type-forall are not directly written, they are introduced on need through module types. Any type variable appearing in the type of a field of the definition and not bound outside will be implicitly considered as quantified.
type t = {{ identity : 'a -> 'a }}
defines t
as a module type (i.e. a
record) in which the field identity
has the type-forall
forall('a).'a -> 'a
.
Note that variables not bound outside the module type have a scope local to
the field's type where they appear. For instance, in
type t = {{ ('a -> 'a) identity, 'a val }}
, 'a
represents 2
different variables in identity
and val
.
Variables bound outside the module type are not quantified, hence not
participating into type-forall creation. For instance in
type t('a) = {{ ('a -> 'a) id, 'b v }} -> 'a
the type variable 'a
being a type parameter is bound outside the module type, hence is not
quantified. By consequence, the type of the field identity
is no more a
type-forall. Conversely, 'b
being not bound remains quantified and
introduces a type-forall. This also shows that type-forall are introduced
only when polymorphism exists: there exists no trivial type-forall,
i.e. no type-forall with an empty set of quantified variables.
A few syntactic extensions allow shortcuts to make life easier.
Additionally, a syntactic shortcut allows to merge several sum types
definitions to create a new definition containing the cases of all the
initial types. In the same idea than when creating a sum type, i.e. separating
its different cases by a or
, the names of types to merge are simply separated
by or
.
type t = { A } or { B }
type u = { C }
type v = { D }
type w = t or u or v
_ = (u) { C }
> _do_ : u = { C: void }
In the above sample code, w
finally stands for the type definition
{ A } or { B } or { C } or { D }
and expression { C }
can
be coerced into w
although it is initially a value of type u
.
You may note the _ = expr
construct which allows evaluating the
expression expr
then throw its result value. Since this value is not bound
to any identifier, Opa names it _do_
in its feedback messages.
Tuples are generalization of pairs. A pair is a data-structure with 2 components: a tuple has an arbitrary number of such components. Two tuples types are compatible if their components are compatible 2 by 2. This especially means that 2 compatible tuples types must at least have the same number of components. Syntactic sugar allows to write tuples types as comma-separated sequence of types enclosed between parentheses.
type t10 = (int, bool, int, int)
defines t10
as a tuple with 4
components, all being of type int
except the second one.
type t11('a) = ('a, 'a, 'a)
defines t11
as a tuple with 3 components
all being of the same type, parameter of the type definition. Using
t11
in a type expression with bool
as effective argument will hence
represent the type of a tuple with 3 components all being booleans.
**NOTE: **
Opa standard library already proposes numerous data-types like lists, maps,
sets, etc which may be directly used in developments, avoiding the need to
implement again the related common and widespread algorithms.
In this section, we will review each Opa construct and explain which way its type is computed. In some cases, we will see that inference is not complete and type annotations may be added by the developper to help the typechecker in its job.
Constant expressions are given their natively defined type:
- Integer values:
int
- Boolean values (
{ true }
and{ false }
):bool
- Characters strings literals:
string
- Signed floating point values:
float
- The trivial type value (
{}
):void
i = 5
> i : int = 5
f = 3.14159
> f : float = 3.14159
s = "foo"
> s : string = "foo"
v = {}
> v : {} or ... = void
b = true
> b : bool = { void true }
First, notice the apparition of a construct ident = ...
we will detail
deeper later. For the intuition, let's simply
say it binds (attaches the result of its right-side expression to the
name ident
). This allows to name values. Be aware that this is different
from the notion of assignment "à-la C" which permits to modify in place the
value stored in the memory cell touched by the assignment. The difference between
a binding and an assignment refers to the notion of mutability, i.e. the
possibility to alter the value represented by the identifier. A binding links
an identifier once for all to a value. Some kind of values are mutable, this is
the way assignments are obtained aside bindings.
Note in the above sample code that the void and boolean values have been typed
as sums, not as named types void
and bool
. This is due to the fact that
type inference is structural: named types are introduced by type
annotations. Types bool
and void
are just abbreviations on the
structural types {}
and { void true } or { void false }
. If no explicit
type annotation imposes to see these void and boolean values as void
and
bool
, the structural types induced by their structures remain the primarily
ones inferred.
A function may have arguments but always return a value. Hence, defining a function requires to define its possible arguments and write its body, i.e. the expression building the returned value. The form of a function expression is:
[parameters, ...] -> expr
Parameters of the function are bound by the definition, hence are only available in the function's body. Obviously, to typecheck the body of the function, we will need the types of its arguments. Type schemes bound to identifiers (function parameters or identifiers introduced by the identifier binding construct) are recorded in an internal structure we call here a typing environment acting as the memory of already seen identifiers' types.
Typechecking a function expression is performed in 3 steps:
- First, each parameter is given a temporary unknown type: a trivial type scheme (i.e. with no generalized variables) whose body is a type variable. This especially means that a function parameter can't be used polymorphically inside the body of the function.
- The body
expr
is then typechecked in the typing environment extended by the function's parameters, hence leading to the typeτ
of the returned value. - Finally, an arrow type is built, grouping the types of the parameters (that were possibly instantiated because of type constraints found while typechecking the function's body) on the left-side of the arrow and setting the returned type on the right-side, finally giving a type of the form
τ_1, ..., τ_n -> τ
.
_ = x -> x + 1
> _do_ : int -> int = <fun>
First, notice the _ = ...
notation that allows to throw the result of the
expression laying on its right-side. This is a particular form of identifier
binding, a construct described in detail
later. For the moment, the intuition to keep
is that it binds the result of the expression to nothing.
In this sample code, x
is inserted in the typing environment with a type
variable, say 'a
(what we previously called an unknown type). The body of
the function is then typechecked. Even is we didn't present yet the typechecking
of function application, driven by the intuition, the operator +
expects two
integers and returns an integer. Hence, to be correctly typed, the parameter
x
of the function must be of type int
i.e. the type variable 'a
previously introduced for x
gets instantiated by int
. Now, 1
also has type
int
, so the operator +
is
used in a consistent typing context, returning a value of type int
which is
the global type of the function's body. Finally, the whole function type is
built having 'a
now being int
on the left-side of the ->
and int
on
its right-side, giving int -> int
.
_ = x, y -> x + 1
> _do_ : int, 'v0 -> int = <fun>
This second example shows a function with 2 arguments. As previously, each of
them is entered in the typing environment with a type variable before
typechecking the body of the function. This body is the same than in the
previous example, hence it doesn't use the y
parameter of the function. As a
consequence, no constraint applies on y
and its type remains a variable. As a
final result, grouping the types of the arguments on the left-side of the ->
and the result type on its right, we obtain the final type int, 'v0 -> int
.
_ = -> "constant"
> _do_ : -> string = <fun>
In this last code snippet, since the function doesn't have any argument, its
body is typechecked in the unmodified typing environment. The string literal
"constant"
representing the body of the function had type string
which
is the type of its returned value. Since there is no argument, there is no
type to set on the left-side of the final ->
type, only the result type will
appear on its right-side, leading to the function type -> string
.
Application is the construct allowing to "use" a function by providing it some effective arguments. In Opa, an application expression is denoted by a functional expression on the left-side and the enumeration comma-separated and between parentheses of the expressions being effective arguments to pass to the function:
expr1([expr2, ...])
If the function requires no argument, then the enumeration between the parentheses is empty, however, the parentheses still remain required. Note that there must not be any blank between the functional expression and the opening parenthesis. Typechecking an application is performed in 4 steps:
- First arguments expressions
expr2
, ... are typechecked, leading to as many typesτ_2, ...
. - Next, the functional expression
expr1
is typechecked. It is then expected to lead to a functional type with as many arguments' types as there are effective arguments and returning a value of a certain type:τ
. Hence, the expected type must have the formτ'_2, ... -> τ
. - Each effective argument's type
τ_i
is checked against the correspondingτ'_i
of the type inferred for the function to ensure they are compatible. - Finally, the type of the application is the result type of the function, i.e.
τ
function f(x, y) { x + y + 1 }
> f : int, int -> int = <fun>
_ = f(2, 4)
> _do_ : int = 7
To begin, simply ignore the typechecking of the function f
, we will have
a look at it later. Simply accept it has type int, int -> int
and let's
examine the second expression. The arguments expressions 2
and 4
have
type int
. We then typecheck the functional expression which is the identifier
f
. Even if we don't have described typechecking of identifiers yet, we already
introduced the notion of typing environment keeping trace of types of
identifiers (see Functions). Looking-up in this
environment we find that f
has type int, int -> int
. Hence, we see that
the types of the effective arguments provided in the application are compatible
with the expected types arguments of f
(i.e. int
). Hence the result type
is the one on the right-side of f
's type: int
.
To go further, we can study how f
is typechecked since its body contains the
application of the operator +
. Like previously described for functions, x
and
y
are entered in the typing environment both with temporary type variables,
say 'a
and 'b
. Then the body of the function which is an application must
be typechecked. We then first type the arguments, i.e. x
and y
that have
types 'a
and 'b
. Assuming that +
has the obvious type int, int -> int
we
must now ensure that the types of the effective arguments are compatible with
the type of +
. This means that 'a
must be int
and same thing for 'b
: the
two type variables get then instantiated, i.e. made equal to int
. The
result
type of the application is then the result type of +
, i.e. int
. We iterate
exactly the same process to typecheck the addition with the constant 1
which
finally gives type int
. Hence, int
is the type of f
's body, so the
returned type of f
. As a consequence, f
is inferred the final type
int, int ->int
.
The construct
ident = expr1
expr2
allows to evaluate expr1
, bind its value to ident
and evaluate
expr2
in which ident
possibly appears and will be replaced by its
bound value. Identifier binding introduces polymorphism in Opa: once the type
of the expression to bind to the identifier is computed, all the type
variables remaining in it with no constraint are considered polymorphic
and generalized. As a consequence, in identifier binding expressions,
identifiers obtain not a type but a type scheme (see also
the presentation of type variables) that can be
polymorphic.
Typechecking this construct processes in 3 steps:
- First,
expr1
is typechecked, leading to a typeτ1
. - Then
τ1
gets generalized: all type variables without any constraint are generalized and universally quantified to lead to a type scheme. This type scheme is then bound to the identifierident
in the typing environment. - Finally,
expr2
is typechecked in the extended typing environment, giving a typeτ2
which is the final type of the whole expression.
v =
one = 1
one + one
> v : int = 2
The identifier one
is bound to the value 1
which has type int
. Then it
is used in an addition. The operator +
expects 2 integers and returns an
integer.
Hence, since one
was found of type int
, it can safely be used with the
addition, hence this latter expression leads to the final type int
.
By the way, we can note a special identifier binding construct, the
toplevel one that binds the identifier v
. In its toplevel flavor, this
construct doesn't have any trailing expression (we previously called
expr2
). This allows to bind identifiers at the outermost level of a
program, i.e. to give it a scope (a life duration) spanning on
all the rest of the program.
pair =
function identity(x) { x }
(identity(1), identity("foo"))
> pair : tuple_2(int, string) = { f1: 1, f2: "foo" }
The above code snippet introduces 2 features. First, the shortcut for defining
a function by directly enumerating its arguments. Instead of writing
f = x, y -> 4
, it is possible to write function f(x, y) { 4 }
. Second, we make use of
the built-in syntactic sugar to represent tuples whose components are enclosed
between parentheses and comma-separated.
Hence, the identifier identity
gets bound to a function type directly
returning its argument. Clearly, such a function can take any kind of value and
will return this kind of value. The only constraint on the type of this
function is that it returns the same type it got for its argument, hence
giving the type 'a -> 'a
. This type can then be generalized giving identity
the polymorphic type scheme ∀ 'a.'a -> 'a
. The function is then invoked
twice, once with an int
, once with a string
, which is legal since it has
a polymorphic type. Hence, the results of these invocations have types int
and string
and are grouped in a tuple, tuple_2(int, string)
which is the
final type of this expression.
Similarly to the identifier binding construct, the recursive flavor
rec ident_1 = expr_1
...
rec ident_N = expr_N
expr
allows to bind each ident_i
to the value of expr_i
but with
all the ident_x
possibly appearing in the bodies, expr_x
, of
the bound expressions. This hence allows (mutually) recursive
definitions. Intuitively, this implies that to compute the type to bind to
ident_i
, we must compute the type of all the expressions expr_x
for which we may need to already known the types bound to the ident_x
.
Note that toplevel definitions are automatically reordered, i.e. it is possible
to use a function and define it later and if needed, definitions are implicitly
made recursive. Hence, the rec
construct mostly serves to define local
recursive definitions, i.e. definitions not spanning at toplevel (or in other
words, definitions nested in other definitions).
Typechecking this construct processes in 5 steps:
- First, each
ident_i
is temporarily given a unknown type: a trivial type scheme (i.e. with no generalized variables) whose body is a type variable. This especially means that a recursively defined identifier can't be used polymorphically in its own definition. The typing environment gets extended, binding eachident_i
to its related temporary type scheme. - Then each
expr_i
is typechecked in the extended typing environment, leading to a typeτ_i
. - Now the bodies of the bindings are typechecked, each
τ_i
and the related temporary type scheme bound toident_i
(which have possibly be instantiated by some type constraints) are checked to ensure they are compatible. - If so, each
τ_i
is then generalized and becomes the type scheme bound toident_i
in the typing environment. - Finally,
expr_n+1
is typechecked in the extended typing environment, giving a typeτ_n+1
which is the final type of the whole expression.
As we previously introduced (see Functions), the typing environment is a memory keeping trace of types inferred and bound to identifiers (bound by a function definition or an identifier binding construct). More accurately, it contains, not the types, but the type schemes bound to identifiers since Opa supports polymorphism.
Typechecking an identifier expression aims to find a type for this identifier and is performed in two steps.
- First a looking-up is performed in the typing environment to find the type scheme entered for this identifiers. Providing that the identifier was actually defined, it das obviously been typechecked and bound a type scheme in this environment.
- The obtained type scheme must be transformed into a type. Remember that a type scheme is a model, a pattern of types, describing a family of possible types. Generalized (i.e. universally quantified) type variables in the scheme represent parts that can be instantiated at need by any type expression. In order to allow the identifier to be used with different types in place of the type variables of its scheme, the scheme gets specialized, i.e. the type for this occurrence of the identifier is built by copying the scheme's body, replacing all generalized type variables by fresh (i.e. not appearing anywhere in the program) type variables.
function id(x) { x }
> id : 'v0 -> 'v0 = <fun>
one = 1
> one : int = 1
hello = "Hi"
> hello : string = "Hi"
_ = (id(one), id(hello), id(id))
> _do_ : tuple_3(int, string, 'v1 -> 'v1)
In this sample code, the interesting part deals with the 3 applications used
to build the tuple expression. We see that id
was entered in the typing
environment with the polymorphic type scheme ∀ 'a.'a -> 'a
.
Hence when
typechecking id(one)
, the type scheme of one
is fetched, which is not
polymorphic and whose body is int
. Hence, the occurrence of one
is given
the type int
. The type scheme of id
is then fetched and instantiated: the
quantified type variable 'a
is replaced by a fresh one, say 'b
leading to
type 'b ->'b
(note there is no more ∀ since this is a type, not a type
scheme). Application then makes this 'b
instantiated into int
and leading to
the final application type int
for the first component of the tuple.
The
second component of the tuple is processed the same way, the type scheme of id
being fetched again in the environment and specialized again by replacing 'a
by another fresh type variable, say 'c
. The type scheme of hello
is fetched
and is not polymorphic, leading to type string
. As a consequence, not
surprisingly the type of the second component of the tuple is string
.
Finally
remains the third and last component of the tuple. The type scheme of id
is
fetched twice: one first time for the argument of the application, one second
time
for the functional part of it. Each occurrence of id
is then given a type
obtained by specialization of the scheme, each one having fresh type variables
in place of the ones generalized in id
's scheme, leading to types
'd ->'d
for the functional usage of id
and 'e ->'e
for id
used as
argument. Then typechecking of the application imposes that 'd
and 'e->'e
must be the same type. The result of the application expression being 'd
,
from the constraint we have just seen, we obtain that id(id)
has type
'e->'e
, which is the type of the third component of the tuple.
A sum expression consists in the record representing the case of sum, i.e. an enumeration (possibly empty) between braces and semi-separated of the field labels with their related values that represent the sum case:
{ [label: expr or ...] }
Such an expression always lead to an open sum type containing a closed record. The reasons for closed / open are the following:
- Since the record enumerates all the fields of the case, this means that this record has no more fields in this expression. For this reason, it must be closed because it contains no other fields than thoses explicitly stated in the expession.
- Since the expression represents one case of sum, its type at least contains this case, but possibly some other cases. Hence it must be open to accept being compatible with these possible future other cases.
Typechecking of a record expression is performed in 3 steps:
- Each field expression
expr_i
is typechecked, leading to as many typesτ_i
. - A closed record type is then built, binding each
label_i
to its related found typeτ_i
. This record being closed means that it doesn't end by a row variable, hence won't be compatible with record containing other fields than those explicitly specified. - Finally, an open sum type is created, plugging the above record type as one of its case. This sum being open means that it ends by a column variable, hence means that the type of this expression contains at least this case, but may be more if needed.
Pattern-matching is the construct allowing to discriminate by cases on the structure of a value. According to the syntax:
match (expr) {
case case_1: expr_1
case case_2: expr_2
...
case case_n: expr_n
default: expr_def
}
the expression expr
is matched against each pattern in sequence, stopping
on the first one that matches. This fitting between the value and the pattern
may induce bindings of pattern variables. The environment is then extended by
these bindings and the right-side part expression of the matching case is then
evaluated as the result of the whole expression.
As a consequence, the matched value and all the patterns must have a same type. Because any right-side part expression of the cases can be returned, they must also have a same type.
More accurately, typechecking of this construct is performed as follows. First,
the matched expression is typechecked, leading to a type τ
. Then, each
pattern is typechecked leading to types τ_1
, ... τ_n
that are checked
to be compatible together and with τ
. While typechecking patterns, pattern
variable types are inferred and reminded for later.
At this point, a special operation
is performed on catchall patterns ("_", or pattern variables) having a sum
type. In effect, a catchall pattern absorbs any other value not already matched by previous patterns.
In the case of sum types, this means that the
column type representing this sum can have any other cases, hence is
open. For this reason, sum types appearing in the type of a catchall pattern
get recursively open (i.e. opening is done in depth).
Now the type of the matched
expression and patterns is computed, each right-side part expression of cases
is typechecked in the environment extended by bindings of its related pattern
variables (previously recorded). As a result, each branch gets a type
τ'1, ..., τ'n
that are checked to be compatible.
The resulting type of the whole expression is then the type of the right side
parts expressions of the pattern-matching.
Below follow some examples showing pattern-matching typechecking.
function f(x) {
match (x) {
case 0: false
case 1: false
default: true
}
}
> f : int -> bool = <fun>
Typechecking the matched expression x
doesn't impose any particular
constraint. Hence x
is assigned a type variable. Then, each pattern is
typechecked and appears to be int
. Now, each right-side part
is typechecked leading to type bool
which is the return type of the
function. The last pattern is a catchall, but since it doesn't have a sum type,
opening has no effect. Since this function takes x
as parameter and this
latter was inferred the type int
, the final type of this function is as
expected int -> bool
.
function g1(x) {
match (x) {
case { a: 1 }: 0
case { b: bval }: bval
}
> g1 : {int b} or {int a} -> int = <fun>
Typechecking the patterns, it can be seen that the matched expression
must be of a type sum containing either { a : int }
or { b : 'a }
.
At this point, the pattern variable bval
is not yet constrained to any
particular type from the current context, so it remains a type variable. Hence,
the type of the matched expression must be, a priori, {b : 'a} or {a : int}
.
The first right side part is then typechecked leading to type int
. The
second one, thanks to the previously recorded binding of bval
to 'a
,
has type 'a
and must be compatible with int
. Hence 'a
must be
instantiated by int
. As a result, the matched value must have type
{b : int } or {a : int}
and the result of the pattern-matching must have
type int
. Finally, we have a catchall pattern, the variable bval
,
but since its type is not a sum type, opening has no effect and types are not
changed. Hence, the function g1
is inferred of type
{b : int } or {a : int} -> int
.
function g2(x) {
match (x) {
case { a: 1 }: 0
case { b: bval }: bval
default: 42
}
}
> g2 : {int b} or {int a} or ... -> int = <fun>
This example only differs from the previous one by the addition of a final
catchall case. Typechecking is done in a similar way than in the previous
example. The only difference is that because the catchall pattern is related
to a sum type ({b : int} or {a : int}
), opening columns has now an effect.
The type of the matched value finally gets open, under the form
{b : int} or {a : int} or ...
to reflect the fact that this pattern-matching
can handle sums with a case { a }
, a case { b }
or any other case.
function g3(x) {
match (x) {
case { a: { c } }: { d }
case { b: bval }: bval
}
}
> g3 : {{d} b or ...} or {{c} a} -> {d} or ... = <fun>
More subtle, the above example shows that while typechecking the patterns,
the catchall pattern variable bval
is not yet constrained, hence remains
a type variable 'a
. If we have a look "in the future", we will see that
right-side parts impose that 'a
is instantiated by a sum type. However, this
is not yet known and the opening operation has no effect on the type
currently bound to bval
.
The type of right-side parts is now inferred. The first one imposes to have a
sum type containing a case { d }
, hence has type { d } or ...
(remember that a record expression is by default inferred as plugged in an
open column). The second one imposes now 'a
to be instantiated by the
former type. This instantiation of 'a
propagates both in the types of the
argument and of the result of the function g3
, stating that this
pattern-matching can in fact handle sums with a case { a : { c } }
and a
case { b }
itself having a sum type containing "d
or any other case".
This difference then appears in the global type of the function g3
since
it shows that it returns a sum with either a case a
or any other case
being the embedded value of its argument when it is a sum of the form
{ b = ... }
.
function g4(x) {
match (x) {
case { a: 1, b: { c: 3 } }: 0
case { a: 1, b: { c: 4 } }: 0
default: 0
}
}
> g4 : {int a, ({c : int} or ...) b} or ... -> int = <fun>
This example demonstrates the opening of sums due to catchall patterns operating
in depth. In effect, the final "_" pattern is bound to a sum type
{int a, {int c} b}
, i.e. a sum whose first case has a field b
also having a sum type. Since this catchall "absorbs all the other cases" of
this sum type, it absorbs others cases than { a, b }
but recursively also
" the other cases than { c }
in the type of b
". For this reason, both
sum types get open in the argument of the function (note the apparition of
the 2 column variables or ...
).
By default, when typechecking patterns, if all the patterns lead to a same record type, the inferred type is a sum type with only one case being this record type:
function g5(x) {
match (x) {
case { a: 1, b: "" }: x
case { a: 2, b: "A" }: x
default: x
}
}
> g5 : {int a, string b} or ... -> {int a, string b} or ... = <fun>
v = g5({ a: 1, b: "" })
> v : {int a, string b} or ... = { a: 1, b: "" }
_ = v.a
> _do_ : int = 1
Since such a sum only has one case, it remains compatible with a record as shows
the creation of v
then its field access in the above sample code. However,
it is possible to add constraints enforcing the matched expression to have a
pure record type, i.e. a closed sum. For instance, in:
function g6(x) {
match (x) {
case { a: 1, b: "" }: x.a
case { a: 2, b: "A" }: 1
default: 2
}
}
> g6 : {int a, string b} -> int = <fun>
the field access performed on x
in the right-side part of the first case
imposes x
to be a record (see section dealing with
field access), i.e. to be plugged into a closed column.
This appears in the type of g6
since is has any more column variable in
spite of the catchall pattern. In effect, opening operation is performed after
typechecking the patterns and before typechecking right-side parts. Hence, the
field access construct enforce closing again the column that was open due to
the final catchall pattern. This is sound because if the sum was remained
open, the following example
function g6bug(x) {
match x {
case { a: 1, b: "" }: x.a
case { a: 2, b: "A" }: 1
default: x.a
}
}
would also have type {int a, string b} or ... -> int
which would be
unsound because it would mean that in the default case, when none of the 2
first patterns matches the tested value, we return its field x
. Passing
this function a value of any form not compatible with the 2 first patterns and
not containing a field x
, for instance, { c: 1 }
, would fall in this
default case and since no field a
is present, the field access would fail.
A special form of pattern, the elipsis written "..." can be used in record patterns involved with both sums and records. However, it currently has 2 different meanings depending on the context and these 2 meanings may sometimes conflict. Elipsis is used in patterns under the form:
match (x) {
case { a: 1, ... }: 1
}
The first meaning of the elipsis is "this pattern is an open record". In other words, "this pattern matches records with at least all the specified fields, all other ones being of no interest". For instance:
function h1(x) {
match (x) {
case { a: v, ... }: v + 1
}
}
> h1 : {int a, ...} -> int = <fun>
_ = h1({ a: 3, b: "useless", c: 3.14159 })
> _do_ : int = 4
states that the function h1
takes as argument a record having at least a
field a
of type int
and ignores all other possible fields of the
record actually passed.
The second meaning of the elipsis deals with sum cases in which only some fields of each case are useful and means "this pattern matches a case of a sum containing the specified fields that are useful and the other fields of this case are of no interest but I don't want to enumerate all of them". For instance
type t =
{ int a, int b, int c, int d } or
{ int e, int f }
> type t = {int a, int b, int c, int d} or {int e, int f}
function h2(x) {
match (x : t) {
case { a: a_val, ... }: a_val
case { f: f_val, ... }: f_val
}
> h2 : t -> int = <fun>
defines a sum type with 2 cases and several fields inside. The function h2
performs a pattern-matching on a value of this type but is only interested in
one of the fields of each case. To avoid the tedious enumeration of all the
other useless cases, elipsis is used to let the typechecker determine the
remaining part of the fields of each case.
However, this 2-sided aspect of the elipsis introduces tricky behavior when the context doesn't permit to guess if the elipsis is used in the context of a record or a sum type. Without enough information, the elipsis is implicitly considered as meaning an open record type. This means that this creates, for the pattern, a record type ended by a row variable. When unifying the type of patterns and matched expression to ensure they are compatible together, if the open record types of 2 patterns get unified, they will be along the record and not along the sum. In other words, 2 open records will join their fields in a unique final type although 2 closed records with incompatible fields will join into a sum type with these 2 records as cases. Hence,
function h3(x) {
match (x) {
case { a: 1, ... }: 0
case { b: 2, ... }: 0
}
}
> h3 : {int a, int b, ...} -> int = <fun>
having 2 open cases, infers for them { int a, ... }
and
{ int b, ... }
, that get unified into {int a, int b, ...},
giving this possibly surprising type to h3
. In fact, the point is that
there is not enough information in the context to guess if these 2 patterns
represent 2 different "interesting" fields of a record or represent 2 cases
of a sum in which we don't want to enumerate useless fields of each case.
The difference with the previous example is that in the previous one, there was
a type constraint on the matched expression bringing the information that the
matched value had a sum type. With this extra information, the typechecker
"knew" that patterns were related to a sum type and could complete the omitted
fields of each case instead of considering them as 2 records to merge together.
As we stated above, the elipsis is implicitly considered as meaning an open record type hence creates a type ended by a row variable. If when checking that patterns have compatible types unification fails unifying along the record, i.e. merging the fields of the record, then it tries to unify along the column, i.e. creating a sum case with the 2 records as cases. Consider
function h4(x) {
match (x) {
case { a: 1, ... }: 0
case { b: "true" }: 0
case { a: _ }: 0
}
}
> h4 : {int a} or {string b} -> int = <fun>
the types inferred for each pattern is { int a, ... } or ...
,
{ string b } or ...
and { int a }
. Unifying the 2 first types
can't succeed along the row because the second record type is closed, then can't
acquire the field a
. Hence unification is performed along the column,
leading to a sum type {int a, ...} or {string b} or ...
. Now unification
is done between this type and the one of the last pattern, { int a }
.
These 2 types can be unified since {int a, ... }
and { int a }
can
be, instantiating and closing the ending row variable. Since there is no
catchall pattern, the resulting sum is not open, hence doesn't show any column
variable.
**CAUTION: **
What happens if both row and column variables appear ? The answer is that in
this case, it may be possible to unify types along row or columns, i.e. to
merge fields of records or to catenate records as sum cases. In some cases,
there is no better choice and hence no principal type. This weakness of the
type system is known, as well as the fact that unification "prefers" (first
tries) to unify along the rows. That's some reasons why some type annotations
are sometimes required to get out of tying errors.
Let's consider the following example:
function h5(x) {
match (x) {
case { a: 1, ... }: 0
case { b: "true" }: 0
default: 0
}
}
The first pattern has type { int a, ... } or ...
, the second one has
type { string b } or ...
and the third has type 'a
. When unifying
the 2 first types to ensure they are compatible, the only solution is to unify
along the columns, hence leading to a sum type:
{ int a, ... } or { string b } or ...
. Unification with the last type
only instantiates 'a
. Because there is a catchall pattern, all sum types
remain open, hence the global type of the patterns is finally
{ int a, ... } or { string b } or ...
. We are exactly in presence of a
type that can be unified either along its column or row variable, hence in
presence of possibilities of non-existence of principal type. In such a case,
an error is raised by the typechecker:
> Sum type with row and column variables
> { string b } or { int a, 'r.a } or 'c.a
> Error
> Exportation failure
It is worthy to note that at this point, the typechecker detected an issue while working with a richer type algebra that the one available to state the signatures. In effect, this algebra temporarily tolerates sums with row and column variables but is only internally used by the typechecker. Error messages related to the internal type algebra print types a bit differently.
**CAUTION: **
Because a sum type represents an enumeration of cases, possibly open, but
with each case having a fixed and finite set of fields, sums with cases
containing row variables are also not legal types. For this reason, even if
the richer and internal type algebra used by the typechecker temporarily
tolerates such types, once typecheking ends such types, if they escape are
rejected and don't fit the public type algebra in which signatures of the
values are stated. Hence, the following example:
function h6(x) {
match (x) {
case { a: 1, ... }: 0
case { b: "true" }: 0
}
}
> Closed sum type with row variable(s) { string b } or { int a, 'r.a }##
> *Error*
> Exportation failure
is rejected since the first pattern has type { int a, ... } or ...
. The
second one has type { string b } or ...
. Hence, unifying these 2 types
gives { int a, ... } or { string b } or ...
. Since there is no catchall
pattern, the sum doesn't remain open hence leading to type
{ int a, ... } or { string b }
. This type is a sum and it actually has
a case being a row ended by a variable.
Binary conditional is introduced under the form:
if expr_1 then expr_2 else expr_3
where expr_1
is the tested condition, expected to be of type bool
,
expr_2
is the result value in case the condition holds (i.e. the
evaluation of expr_1
yields true
) and expr_3
is the result
value in case the condition doesn't holds. Since this construct can return
the value of either expr_2
or expr_3
, both expressions must have
the same type.
Hence, typechecking of this construct if performed in 3 steps:
- The condition expression
expr_1
is typechecked and it is checked to be of typebool
. - The expression
expr_2
andexpr_3
are typechecked, leading to typesτ_2
andτ_3
. - Types
τ_2
andτ_3
are checked to be compatible, i.e. are unified in a common typeτ
and if so, the type of the whole expression is thenτ
.
ok = if (true) { 1 } else { 2 }
> ok : int = 1
reok = if (1 != 2) { {A} } else { {B} }
> reok : {{} A} or {{} B} or ... = { A: void }
ko = if (1) { {A} } else { {B} }
> *Error*
> Matched expression or previous patterns have type int but new pattern is
> found of type { true } or 'c.a.
reko = if (false) { 1 } else { "one" }
> *Error*
> Previous right-side parts of the pattern matching return type int but current
> one returns type string.
> Types int and string are not compatible
test(c, default) = if (c) { "Holds" } else { default }
> test : bool, string -> string = <fun>
In the first example, the condition expression true
is correctly of type
bool
, then both 1
and 2
have type int
hence the whole expression is
well-typed and of type int
.
In the second example, assuming that the comparison operator "different"
!=
has type 'a -> 'a -> bool
, its application to 2 integers is
well-typed and of type bool
. Hence the condition expression is well-typed.
The "then" part has type { A } or ...
, the "else "part has type
{ B } or ...
, hence the common type obtained by unification of these 2
types is { A } or { B } or ...
which is the type of the whole expression.
The third example shows that because the condition expression 1
has type
int
and not bool
the whole expression is ill-typed.
The next example shows that because both branches of the binary conditional do
not have the same type (string
versus int
), even if the condition
expression is of the correct type bool
, the whole expression is ill-typed.
Finally, the last example exhibits the constraints imposed by the binary
conditional construct, making them appearing in the type of the function using
it. In effect, since the tested condition is the first parameter, c
of the
function test
, this enforce this parameter to be of type int
. Moreover,
since the "then" branch has type string
and since both branches are assumed
to have the same type, the "else" part must also have type string
, which
requires default
, the second parameter of test
to be of type
string
, which is also the type of the whole conditional expression. From
these constraints, we actually find that test
has type
bool, string -> string
.
**NOTE: **
Internally, the binary conditional construct is transformed in a
pattern-matching on boolean values. In other words,
`if c then t else e` is rewritten as
match (c) {
case { true }: t
case { false }: e
}
That's the reason why, in case
of typechecking error on a binary conditional, one sometimes get a message
involving pattern-matching as we could see in some above examples.
Opa provides a "light" notion of modules through a special form of records that are written between double-braces instead of simple-braces. Hence, a module expression has the form:
{{ [label: exp, ...] }}
The main feature brought by modules compared to records is the ability to have each field separately generalized and all of them implicitly mutually recursive. Hence, a module can be seen as a record grouping several definitions instead of leaving them spanning at toplevel.
Since modules are a sort of records they are first-class values, i.e. can be passed to functions, returned as result, stored in data-structures, etc. However their type is not a simple record type. In effect, since each field, i.e. definition of the module can be generalized, the type of the definitions are especially processed before being part of the module type: if they are polymorphic then their type is turned into a type-forall (see types forall) to mimic a type scheme. The final type of a module is them a closed record type in which labels are bound to their respective type (possibly being type-forall), embedded in a closed sum (i.e. the sum has the record as unique case).
Typechecking of a module is performed in 4 steps:
- Like for record expressions, each field expression
expr_i
is typechecked and like for a recursive identifier binding (see dedicated section), fields are considered as implicitly mutually recursive. As a result, we obtain as many typesτ_i
. - Each type
τ_i
is inspected, any unconstrained type variables get generalized and a type-forallτ'_i
is built quantifying these variables and whose body itτ_i
. The related module field is then assigned this new typeτ_i
. However if no such type variable exist inτ_i
, then no change is done and the module field keeps its initial typeτ_i
. - A closed record type is then built, binding each
label_i
to its related found type. - Finally, an closed sum type is created, plugging the above record type as its unique case.
M = {{
function plus(x, y) { x + y }
function f(x) { x }
}}
> M : { f : forall('v0). 'v0 -> 'v0, plus : int, int -> int} or ... = { f: <fun>, plus: <fun> }
As we stated before, the type of a module is a closed record in a closed sum. The reason for having a closed record is that, like any record, a module expression enumerates all the fields this module has, and it has no more fields that those making its definition. There are 2 reasons for having a closed column. First, modules are not sum types: they do not present "different cases". Next, having an ending column variable would make it unconstrained, hence generalized. In such a case, when creatign sub-modules, the type of the sub-module would have a generalizable column variable and would naturally turn itself into a type-forall which is not the expected type for a module.
This feature, pretty related to rank 2 polymorphism, however introduces some
usage restrictions. The first important point is that modules are explicitly
introduced: when the syntactic construct {{ }}
is used, a module is to be
created. The second point to remind is that instantiation of type-forall is
automatic: each time an expression is found having a type-forall, it is
immediately specialized, replacing its generalized variables by fresh ones and
removing the forall quantification (i.e. the obtained type is an instance of the
type-forall's body).
Let's examine a simple example.
// Definition of a module type.
type Mty = {{ ('a -> 'a) f, (int -> int) g }}
> type Mty = { (forall('a). 'a -> 'a) f, (int -> int) g}
// Coercion to a module type
M = {{ function f(x) { x }, function g(x) { x + 1 } }} : Mty
> M : Mty = { f: <fun>, g: <fun> }
_ = M.f(5)
> _do_ : int = 5
_ = M.f("foo")
> _do_ : string = "foo"
The first type definition introduces a module type, automatically adding the
forall-types on polymorphic fields. This avoid having to explicitly state them
in the type de finition. Next a module M
is defined and its signature gets
constrained by Mty
. Note that this constraint is not mandatory but set here
to show how to constrain a module and to avoid getting a verbose structural
type like
{(forall('a). 'a -> 'a) f, (int -> int) g} = { f: <fun>, g: <fun> }
we would get otherwise. Module fields access is performed like for record, via
the dot-notation. The point to note is that M.f
is a forall-type, hence during
the application to the expression 5
, M.f
appears to have type
forall('a). 'a -> 'a
. As stated above, this forall-type get automatically
instantiated into a fresh type 'b -> 'b
, and the type of the whole expression
is int
as expected. Following this point, when applied a second time, to the
expression "foo"
, this type is instantiated again into another fresh type
'c -> 'c
, hence preserving the polymorphism and allowing to apply M.f
to
a string, returning the expected string
final type.
An important remark is that module types are strict and there is no implicit instantiation of explicit quantification. Automatic instantiation is only done when encountering a forall-type as a type, not as a subterm-of a type. Hence, coercion cannot be used to restrict the type of a module:
// Rejected.
_ = {{ function f(x) { x }, function g(x) { x + 1 } }} : {{ f : int -> int, g : int -> int }}
> *Error*
Expression has type { f: !for all 'a . 'a -> 'a, g: int -> int } but is
coerced into { f: int -> int, g: int -> int }.
Types !for all 'a . 'a -> 'a and int -> int are not compatible
Hint:
Error occurred through field f.
// Ok.
_ = {{ function f(int x) { x }, function g(x) { x + 1 } }} : {{ f : int -> int, g : int -> int }}
> _do_ : {f : int -> int, g : int -> int} = { f = <fun>, g = <fun> }
// Rejected.
_ = M : {{ f : int -> int, g : int -> int }}
> *Error*
Expression has type Mty but is coerced into { f: int -> int, g: int -> int }.
Types int -> int and !for all 'a . 'a -> 'a are not compatible
Hint:
Error occurred through field f.
// Ok.
_ = {{ f = M.f : int -> int, g = M.g }} : {{ f : int -> int, g : int -> int }}
> _do_ : {f : int -> int, g : int -> int} = { f = <fun>, g = <fun> }
// Ok.
_ = {{ function f(int x) { M.f(x) }, function g(int x) { M.g(x) } }} : {{ f : int -> int, g : int -> int }}
> _do_ : {f : int -> int, g : int -> int} = { f = <fun>, g = <fun> }
Modules can be returned as results of functions, hence providing the mean to create functors. For instance:
type Mty2 = {{ f : 'a -> 'a }}
> type Mty2 = {f : forall('a). 'a -> 'a}
M2 = {{ function f(x) { x } }} : Mty2
> M2 : Mty2 = { f = <fun> }
// A functor.
F(m : Mty2) = {{
function a(x) { m.f(x) }
function b(x, y) {
{ f1: m.f(1)
, f2: m.f(x)
, f3: m.f("string")
, f4: m.f(y)
}
}
}}
> F : Mty2 ->
> {a : forall('v0). 'v0 -> 'v0,
> b : forall('v0, 'v1, rows:, cols:...).
> 'v0, 'v1 -> {f1 : int, f2 : 'v0, f3 : string, f4 : 'v1} or ...} = <fun>
// Application of the functor.
M3 = F(M2)
> M3 : {a : forall('v0). 'v0 -> 'v0,
> b : forall('v0, 'v1, rows:, cols:...).
> 'v0, 'v1 -> {f1 : int, f2 : 'v0, f3 : string, f4 : 'v1} or ...} = { a = <fun> , b = <fun> }
_ = M3.a(1)
> _do_ : int = 1
_ = M3.b("toto", 18.26)
> _do_ : {f1 : int, f2 : string, f3 : string, f4 : float} or ... =
> { f1: 1 , f2: "toto" , f3: "string" , f4: 18.26 }
However, an important consequence of module types strictness is that a module cannot be passed to a function expecting a less general module or a record, for instance:
f(m) = {{
function a(x) {
{ f1: m.f(x)
, f2: m.f(365)
}
}
}}
> f : {f : int -> 'v0, ...} ->
> {a : forall(, rows:, cols:...). int -> {f1 : 'v0, f2 : 'v0} or ...} = <fun>
_ = f(M) // Rejected because M is too general.
> *Error*
> Function was found of type
> { f: int -> 'a, 'r.a } ->
> { a: !for all 'c.a . int -> { f1: 'a, f2: 'a } or 'c.a } but application
> expects it to be of type Mty -> 'b.
> Types int -> 'a and !for all 'c . 'c -> 'c are not compatible
> Hint:
> Error occurred through field f.
// Ok because M.f is instantiated.
_ = f({ f = M.f })
> _do_ : {a : forall(, rows:, cols:...). int -> {f1 : int, f2 : int} or ...} = { a = <fun> }
// Rejected.
_ = f({{ f = M.f }})
> *Error*
> Function was found of type
> { f: int -> 'a, 'r.a } ->
> { a: !for all 'c.a . int -> { f1: 'a, f2: 'a } or 'c.a } but application
> expects it to be of type { f: !for all 'b . 'b -> 'b } -> 'c.
> Types int -> 'a and !for all 'b . 'b -> 'b are not compatible
// Ok.
_ = f({{ f = M.f : int -> int }})
> _do_ : {a : forall(, rows:, cols:...). int -> {f1 : int, f2 : int} or ...} = { a = <fun> }
This construct allows to access the value of a record field following the
form expr.field_name
. As a consequence, it is expected that
expr
has a record type in which the field field_name
exists with a
given type τ
. The type of the whole expression is then this type τ
.
type t('a) = { 'a a }
> type t('a) = { 'a a }
x = { a: 3.14159 } : t(float)
> x : t(float) = { a: 3.14159 }
_ = x.a
> _do_ : float = 3.14159
_ = x.b
> *Error*
Record has type t(float) but field access expected it to have type
{ b: 'a, 'r.a }.
As shows the last example, if the "dotted" expression doesn't have a record
type containing at least the required field, the expression is ill-typed. It
is important to note that the expected record type must contain at least the
accessed fields and may contain any other ones as the ending row variable
'r.a
suggests.
The construct:
{ expr_0 with field_1: expr_1 , ... , field_n: expr_n }
allows creating a new record value, borrowing all the fields from
expr_0
but replacing the values of field_1
to field_n
by
those evaluated for expr_1
to `expr_n.
Three important points are:
- The fields updated must belong to the type of
expr_0
. This means that record update doesn't add new fields to the record. This restriction is required to keep Opa's type system consistent. - The types of the updated fields can however change from the ones they had in
expr_0
. - This construct only applies on records, i.e. open or closed rows hosted by a closed column.
Let's assume that expr_0
has a record type with fields field_1
to
field_n
and other possible fields fields_m
of type τ_ms
.
Expressions expr_1
, to expr_n
are typechecked leading to types
τ_1
, ..., τ_n
. The type of the whole expression is then the record
type where field_1
is bound to type τ_1
, ..., field_n
bound
to type τ_n
and all other fields from the type of expr_0
are
unchanged. In particular, if the original record type was open (resp.
closed), so will be the resulting one. This is clearly shown by the type of the
function f
in the example below where both the argument and return types
are open records since the body of this function only add the constraint
that the record must contain at least the field x
.
v = { x: 3 , y: 4 }
> v : {x : int, y : int} or ... = { x: 3 , y: 4 }
w = { v with x = 3.14159 }
> w : {x : float, y : int} = { x: 3.14159 , y: 4 }
function f(r) {
{ r with x: "str" }
}
> f : {x : 'v0, ...} -> {x : string, ...} = <fun>
x = f({ z: 5 })
> *Error*
> Function was found of type { x: 'a, 'r.a } -> { x: string, <'r.a> } but
> application expects it to be of type { z: int } or 'c.a -> 'b.
> Types { x: 'a, 'r.a } and { z: int } or 'c.a are not compatible
> Hint:
> Field(s) x only appear(s) in the first type.
> Hint:
> Field(s) z only appear(s) in the second type.
Although Opa proposes type inference, it is sometimes useful to force an expression to have a certain type instead of the one naturally guessed by the typechecker. This mostly happen when a particular signature is wanted for an expression (for instance for sake of abstraction) or when a type was structurally inferred although the user prefers to get a named type having the same structure (either for sake of shortness or because the type structurally inferred is recursive and hence must be explicitly replaced by a named type since cyclic types are not accepted in types appearing in values interfaces).
Type coercion, expr : t
, typechecks expr
leading to a type
τ
. Then is converts the type expression t
leading to a second type
τ'
. Finally, if τ
and τ'
are compatible, it gives the whole
expression the type τ'
.
type t = int
> type t = int
_ = 4 : t
> _do_ : t = 4
_ = 4.5 : t
> *Error*
> Description: This expression has type float.
> However, according to the context, it seems that it should have type t.
type ilst = { head : int , tail : ilst } or { nil }
> type ilst = {head : int, tail : ilst} or {nil}
function f(x, y) {
if (x) {
{ nil }
} else {
{ head: y , tail: f(true, y) }
}
> Cyclic type 'a
> where 'a = { head: 'b, tail: 'a } or { nil } or 'c.a##
> *Error*
> Exportation failure
function ilst f(x, y) {
if (x) {
{nil}
} else {
{head: y , tail: f(true, y)}
}
> f : bool, int -> ilst = <fun>
The first coercion in the above sample code show that because t
and int
are compatible, the value 4
being naturally of type int
can be assigned
the type t
. The second example shows that because 4.5
has type float
and this latter and int
are not compatible, coercion is a failure.
Finally, the last example exhibits a case where an explicit naming of a
recursive type must be done. First, a type similar to a list of integer is
defined, ilst
. Then the function f
that builds empty or one-length
lists of integer is defined. Since Opa typechecking is performed in a
structural way, nothing in the body of f
indicates that it works on a
particular named type. The constraints only indicates that the function returns
a record type containing a field tail
having this record type. As a result,
an error is issued since cyclic types must not escape in signatures.
Adding a coercion on the return type of f
, like in the last line of code,
allows the typechecker to find a named type hiding the recursion and removes
the error.
When using a parameterized type name in a coercion, it is possible to omit the arguments by which its parameters must be instantiated. The typechecker will then automatically instantiate them according to the context. Omission applies to all or none of the parameters: only omitting some of them will result in a typechecking error related to the arity of the used type name. To omit only some of the arguments, they must explicitly be replaced by some underscore ("_") characters upon matching the right arity of the type constructor.
type t('a, 'b) = { 'a first, 'b second }
> type t('a, 'b) = { 'a first, 'b second }
_ : t = { first: 1 , second: "str" }
> _do_ : t(int, string) = { first: 1 , second: "str" }
_ : t(int) = { first: 1 , second: "str" }
> *Error*
> Description: The type instanciation for t is invalid. According to the
> definition of this type, it should have 2 argument(s). However, here, it is
> applied to 1 argument(s).
_ : t(int, _) = { first: 1 , second: "str" }
> _do_ : t(int, string) = { first: 1 , second: "str" }
In the above sample code, a record type, t
, with 2 parameters is defined.
The first coercion omits all of them. According to the context, since field
first
is assigned the value 1
and field the value "str"
it is
inferred that 'a
maps onto int
and 'b
onto string
. The second
coercion only omits 1 of the parameters but doesn't explicitly state the
omission, hence making the type t
used with a wrong number of argument.
Finally, the last example explicitly claims that the first parameter of t
is
expected to be int
but second one is not explicitly given, hence
making the t
type constructor used with its right number of arguments and
avoiding the previous error.
Type variables appearing in coercion type expressions have precise scoping rules and may influence the generalization process, i.e. the polymorphism level of definitions. Mostly, in type annotations, type variables are identified by their name. This means that inside a definition, several occurrences of a same type variable name implies the sharing of this type variable among these occurrences. The scope of a type variable name is:
- the current toplevel value definition,
- the current module field definition.
In other words, all along these 2 constructs, a same variable name represents a unique type variable. For instance, in:
function 'a f(x, y) { y }
> f : 'v0, 'a -> 'a = <fun>
function 'a g('a x, y) { y }
> g : 'a, 'a -> 'a = <fun>
the definition of f
introduces a type variable 'a
used only once.
As a result, x
and y
have 2 different types since they are not
related. Conversely, in g
, a new type variable 'a
is introduced
(diferent from the one of f
), but used twice, once to constrain x
and
once to constrain y
. Since these 2 occurrences of 'a
are hosted in the
same toplevel definition, they represent the same variable. As a result, the
types of the 2 arguments of g
are enforced to be the same which was not the
case in f
.
This example can be reproduced the same way in a module since the scope of a variable introduced in a module field is this module field:
M = {{
function 'a f(x, y) { y }
function 'a g(x:'a, y) { y }
}}
> M : {
> f : forall('v0, 'a). 'v0, 'a -> 'a ,
> g : forall('a). 'a, 'a -> 'a } =
> { f: <fun> , g: <fun> }
As the type-forall's suggest variables are differents (i.e. not shared) between
the fields f
and g
since they are generalized separatly. And like in the
previous example, f
has 2 arguments with different types although g
have
its ones of the same type.
**CAUTION: **
Introducing type variables in coercions *doesn't* mean that the annotated
expression is polymorphic ! This only means that its type is named by this
type variable. Generalization, i.e. polymorphism introduction is a separate
process that arises at the end of a definition. In particular, introducing a
type variable in a annotation will postpone its possible generalization upon the
end of the current toplevel or module field definition. In effect, as soon as
a name is introduced for a type, we can't make sure this name will be or not
used later, until we get out of the definition where it is introduced. For
this reason, it is not safe to generalize it as soon as we leave the definition
where it appears since some later type constraints may change this variable
into a non-variable type, i.e. the variable will not remain a variable hence
can't be subject to generalization. Let's consider:
function f1() {
rec function 'b v1('a x) { v1(x) }
v1
}
> f1 : -> ('a -> 'b) = <fun>
function f2() {
rec function 'b v1('a x) { v1(x) }
_ = v1(1)
_ = v1(1.0)
v1
}
> *Error*
> Function was found of type int -> 'a but application expects it to be of type
> float -> 'b.
> Types int and float are not compatible
function f3() {
rec function 'b v1('a x) { v1(x) }
v2 : bool -> int = v1
v1
}
> f3 : -> (bool -> int) = <fun>
Let's first have a look at `f1`. It introduces 2 variables, `'a` and
`'b` that finally get generalized in the final type of `f1` that really
is polymorphic. However, it must be clear that these 2 type variables were *not*
generalized at the end of the local definition of `v1` ! They were
generalized at the end of the toplevel definition of `f1` through the
generalization of the type of `f1`. In other words, inside the body of
`f1`, the type of `v1` was not polymorphic because of the presence of the
type variables. If we used `v1` with several types in the body of `f1`,
we would get a typing error like it is the case in `f2`.
The definition of `f3` also shows that `'a` and `'b` are not generalized
as soon as exiting from the definition of `v1`. In effect, before returning
`v1` as the result of `f3`, we define a local variable `v2` as being equal
to `v1` but coercing its type. This implies a unification between the type
of `v2` which is forced to be `bool -> int` and an instance of the type
scheme of `v1`. But, because `v1` is not yet generalized, its type scheme
is "trivial", i.e. has no generic variables. Hence, taking an instance of it
implies directly sharing its scheme's body, then instantiating its non-generic
variables during the unification with the type of `v2`. For this reason,
`'a` becomes `bool` and `'b` becomes `int`, hence leading `f3` to
finally have type `-> (bool -> int)`.
Since databases are explicitly typed at their definition, accesses to paths simply have the type they were given when the database was defined. For instance, defining a database as:
db /test/i : int
db /test/map : map(int, string)
will explicitly state that the path /test/i
has type int
and the
path /test/map
has type map(int, string)
. This way, paths can only
by used in contexts where their types are compatible with expected ones.
Sum type definitions always lead to closed columns. In effect, such a definition enumerates all the values of the type, hence there is no reason for having a trailing column variable. In some particular cases, when an expression has a type coming from such a definition, it may be useful to alter its type to make it an anonymous open sum in order to "plug" it in a larger sum type. Let's consider the following example:
type small = { a } or { b } or { c }
> type small = {a} or {b} or {c}
type larger = small or { d }
> type larger = small or {d}
function larger_to_string(larger x) {
match (x) {
case { a }: "a"
case { b }: "b"
case { c }: "c"
case { d }: "d"
}
}
> larger_to_string : larger -> string = <fun>
small_v = { a } : small
> small_v : small = { void a }
_ = larger_to_string(small_v)
> *Error*
> Function was found of type larger -> string but application expects it to be
> of type small -> 'a.
> Types { a } or { b } or { c } or { d } and { a } or { b } or { c } are not
> compatible
> Hint:
> One of the sum types may be missing the following cases of the other:
> { d }.
In the above example, the type larger
clearly contains all the values of
the type small
, and even more. The function larger_to_string
operates
on all these values, hence is also able to operate on the type small
.
However, applying it to a value of type small
fails. The reason is that
the type small
is a closed sum. Hence it can't unify with larger
since
it has no column variable to acquire the field c
.
To workaround this issue, the @opensums
directive can be applied on the
expression small_v
in order to replace its type small
by the structure
of this type in which all columns are turned open (i.e. ending by a column
variable).
_ = larger_to_string(@opensums(small_v))
> _do_ : string = "a"
This directive only has an effect when applied to an expression having a sum type. Applying it to another type will not change the type unless it is a type variable. In effect, type constraints are discovered in an arbitrary order by the typechecker. At some point, it may arise that the type of an expression is not yet complete, hence still remain a type variable even if later it will appear to be a sum type. If the directive is applied to such an expression, it can't yet have any effect although if the type constraints were acquired sooner it would have an effect. When later the type variable may get instantiated by a sum type, it will be too late to open it since the directive expression will have been already processed. To prevent such non-commutative behavior, an error message is issued requesting to add a type constraint on the expression in order to be able to have its type before applying the directive.
function f(x) { @opensums(x) }
> *Error*
> Expression argument of the @opensums directive has a type that can't be subtype of another
> type (sum or record).
> Hint:
> Its type is currently unconstrained. Consider explicitly adding a type constraint toward
> a sum type to precise the expression's type.
It is important to note that this directive recursively (i.e. searching in depth) opens all the sums present in the type of its argument, except on function types where it only descends on the result type and not on the arguments types. In particular, named types are examined and their structure is also recursively traversed by the opening operation. It is worthy to make clear that because opening a sum type changes its nature, if an expression appears to have a named type being a sum, once open, it won't have anymore this named type, but instead, the structure of this named type in which sums are now open.
As presented in the section dedicated to the record update construct, fields borrowed from a record must exist in this record: it is forbidden to add a new field. This restriction is imposed for sake of consistency of Opa's type system. However, when 2 different record types have numerous fields in common, writing a function transforming a value of the first type to a value of the second one, we would like not to have to manually map the common fields' values 2 by 2 between the 2 records. For instance:
type house = { number : int , street : string , city : string , zip : int }
type rented_house = { number : int , street : string , city : string ,
zip : int , start_date : int , end_date : int }
defines 2 types, the first one describing houses in general and the second one describing houses being rented. The only difference is that the latter has 2 more fields than the first one. When modelling the fact that a house gets rented, we would like to write a function like:
function rented_house rent(house h, start_day, stop_day) {
{ h with start_date: start_day, end_date: stop_day }
}
However, above restriction forbids the addition of new fields as reported by the error message:
> *Error*
> Record to update has type house but extension requires it to have type
> { end_date: 'a, 'r.a }.
> Types { end_date: 'a, 'r.a } and
> { city: string, number: int, street: string, zip: int } are not compatible
> Hint:
> Field(s) end_date only appear(s) in the first type.
> Hint:
> Field(s) city number street zip only appear(s) in the second type.
One solution would then be to write this function under the form:
function rented_house rent(house h, start_day, stop_day) {
{ number: h.number, street: h.street, city: h.city, zip: h.zip,
start_date: start_day, end_date: stop_day }
}
which is very tedious when the number of common fields is important. As an
alternative it is possible to explicitly request a record type to get open,
hence adding it a fresh row variable to accept addition of new fields by
the directive @openrecord
. The renting function can then be written as:
function rented_house rent(house h, start_day, stop_day) {
{ @openrecord(h) with start_date: start_day, end_date: stop_day }
}
and finally gets the expected type: house, int, int -> rented_house
.
The @openrecord
directive can only be applied on an expression
having a record type. Applying it to another type will raise an error depending
on the encountered type.
First, when the type is a variable, for the same kind of reason than explained about the @opensums directive, it is requested to add a type constraint to explicitly precise the type as shown in the following example:
function f(x) { @openrecord(x) }
> *Error*
> Expression argument of the @openrecord directive has a type that can't be subtype of
> another type (record).
> Hint:
> Its type is currently unconstrained. Consider explicitly adding a type constraint
> toward a record type to precise the expression's type.
Second, for any other type not being a record or a variable the default error message is issued without any hint:
_ = @openrecord(1.0)
> *Error*
> Expression argument of the @openrecord directive has a type that can't be subtype of a record type.
It is important to note that, conversely to the @opensums
directive,
@openrecord
doesn't recursively traverse types: it only acts on the
toplevel structure of the type being expected, as previously explained, a
record type.
By default, Opa typechecker infers types in a structural way. This means that, without user coercions, it determines the types by guessing their structure rather by giving them names. This is pretty satisfactory since several named types can have a same structure and there is no reason the typechecker decides itself to give a name a priori to a type structure. For instance, consider:
type t = { a } or { b } or { c }
> type t = { a } or { b } or { c }
type u = { a } or { z } or { c }
> type u = { a } or { z } or { c }
function f(x) {
match (x) {
case { a }: "a"
case { c }: "c"
default: "something else"
}
}
> f : {c} or {a} or ... -> string = <fun>
the function f
has no reason to be considered as taking an argument of type
t
more than of type u
. The typechecker inferred a structural type,
i.e. the shape of the type.
If we want f
to take explicitly a particular named type, it must be
coerced by f(x : t)
or f(x : u)
. The fact that types are inferred
structurally is powerful since f
can naturally be used with a value of
type t
or u
since both of them have a compatible structure:
_ = f({ a } : t)
> _do_ : string = "a"
_ = f({ c } : u)
> _do_ : string = "c"
However, this also has drawbacks. During type inference, the typechecker knows how to deal with, i.e. build if required, cyclic (recursive) types. However such types can't escape in values signatures. They must get mapped onto a named type, whose definition is obviously recursive, who hides the structural recursion. This may require the user to add some type annotations, i.e coercions, to prevent these recursive types from remaining in the final signature of the definition which leads to an error message.
function len(l) {
match (l) {
case { end }: 0
case { head: _, tail: t }: 1 + len(t)
}
}
> Cyclic type 'a
> where 'a = { 'b head, 'a tail } or { end }
> *Error*
> Exportation failure
In the above example, the system determined that len
is a function taking
a sum type with 2 cases, the second one containing a field, tail
being
of this sum type. Hence, the type is really cyclic as reported in the error
message. To avoid such a recursive type, one must define a named type having
an equivalent structure:
type mylst('a) = { end } or { 'a head, mylst('a) tail }
allowing to "name" the recursion. From now, adding a type annotation in the
source of len
will give the typechecker a clue "which type is used here"
and will allow replacing the cyclic parts of the structure by the type name.
function len(mylst('a) l) {
match (l) {
case { end }: 0
case { head: _ , tail: t }: 1 + len(t)
}
}
> len : mylst('a) -> int = <fun>
It could have been possible to put the type constraint at some other points, like:
function len(l) {
match (l) {
case { end }: 0
case { head: _ , tail: t } -> 1 + len((mylst('a)) t)
}
but a good practice is to put constraints directly in the function's header,
the closest to its parameters and return types definition. Hence, the fully
annotated definition of len
according to this good practice would be:
function int len(mylst('a) l) {
match (l) {
case { end }: 0
case { head: _ , tail: t }: 1 + len(t)
}
}
where the return type is stated after the parameters of the function and before its body.
- A tour of Opa
- Getting started
- Hello, chat
- Hello, wiki
- Hello, web services
- Hello, web services -- client
- Hello, database
- Hello, reCaptcha, and the rest of the world
- The core language
- Developing for the web
- Xml parsers
- The database
- Low-level MongoDB support
- Running Executables
- The type system
- Filename extensions