-
Notifications
You must be signed in to change notification settings - Fork 55
Lists
Lists are a common data structure in Mercury, as in most functional programming languages.
list(T)
is a discriminated union type defined in the standard list
module:
:- type list(T)
---> []
; [T | list(T)].
All elements of a list must have the same type T.
[]
denotes the empty list, also pronounced "nil".
If you have an element X
and another list Xs
then [X | Xs]
is a
list with X
at the head of the list, followed by the tail of the list
Xs
. The [|]/2
constructor is pronounced "cons".
List terms are written as comma-separated terms between square brackets.
[a, b, c]
is short for [a | [b | [c | []]]]
.
You can write multiple comma-separated values before the vertical bar.
[a, b, c | Tail]
is short for [a | [b | [c | Tail]]]
.
The list
module has a lot of useful operations on lists.
We'll look at a few of them now, and consider how they are implemented.
The list.length/1
function return the lengths of a list;
there is also a predicate version. They are declared as:
:- func length(list(T)) = int.
:- pred length(list(T), int).
:- mode length(in, out) is det.
We can implement the function straightforwardly:
length([]) = 0.
length([_X | Xs]) = 1 + length(Xs).
That works, but there is a small problem. Each call to +
cannot proceed
until both its arguments have been computed. To compute the right operand
we make a recursive call to length/1
. For each element in the list,
the call stack will grow by one stack frame. For a very long input list,
the program may run out of stack space and crash.
Fortunately, there is a simple solution. We can take advantage of the associativity of addition and write the function in an "accumulator style", where we evaluate the addition first and make the recursive call afterwards.
length(Xs) = Len :-
length_acc(Xs, 0, Len).
:- pred length_acc(list(T), int, int).
:- mode length_acc(in, in, out) is det.
length_acc([], N, N).
length_acc([_X | Xs], N0, N) :-
length_acc(Xs, N0 + 1, N).
The length_acc
procedure is tail recursive. The recursive call (in the
second clause) is the final action before returning to its caller.
The return values of the recursive call (in this case, just N
) will be
exactly the values, in the right order, that the current call needs to return to its
caller. Therefore, there is no need for the recursive call to return
to the current call; it could return the values directly to the parent
caller. In such a case, the compiler can perform tail call elimination:
the tail call will be compiled to a jump that reuses the stack frame
of the current call, and when it returns will return directly to the
current call's caller.
The accumulator version of length/1
uses constant amount of stack
space regardless of the length of the input list.
The Mercury compiler has an option --warn-non-tail-recursion
to help
detect non-tail recursive procedures.
Another useful operation on lists is to append them together.
This can be done
with the predicate append(Start, End, List)
, which is true if and
only if List
is the result of concatenating Start
and End
.
:- pred append(list(T), list(T), list(T)).
:- mode append(di, di, uo) is det.
:- mode append(in, in, out) is det.
:- mode append(in, in, in) is semidet. % implied
:- mode append(in, out, in) is semidet.
:- mode append(out, out, in) is multi.
There are also function versions:
:- func append(list(T), list(T)) = list(T).
:- func list(T) ++ list(T) = list(T).
so you can write [1, 2, 3] ++ [4, 5, 6]
or append([1, 2, 3], [4, 5, 6])
.
Let's consider the implementation of the predicate. There are two cases.
Appending an empty list to another list just produces that second list.
To append a non-empty list [X | Xs]
to another list Ys
, we just need to
"cons" X
to the result of appending Xs
and Ys
.
append([], Ys, Ys).
append([X | Xs], Ys, [X | Zs]) :-
append(Xs, Ys, Zs).
That's all there is to it.
What about all those modes?
:- mode append(in, in, out) is det.
This means append/3
can append two input lists to produce an output list,
as expected.
:- mode append(in, in, in) is semidet.
Given three input lists, this mode can tell you if the third list is the
result of appending the first two lists. If not, the predicate fails,
so this mode is semidet
. You could append two lists then compare it
with the third, but this mode avoids building the temporary list.
:- mode append(in, out, in) is semidet.
For a prefix list Xs
and a list Zs
, this mode can produce a list
Ys
such that appending Xs
and Ys
produces Zs
. If Xs
is not a
prefix of Zs
then the predicate can fail, so this mode is semidet
.
:- mode append(out, out, in) is multi.
For a list Zs
, append/3
can generate the possible pairs of Xs
and Ys
such that appending Xs
and Ys
produces Zs
.
So append(Xs, Ys, [1, 2, 3])
has these solutions:
append([], [1, 2, 3], [1, 2, 3])
append([1], [2, 3], [1, 2, 3])
append([1, 2], [3], [1, 2, 3])
append([1, 2, 3], [], [1, 2, 3])
Since this mode must have at least one solution, and possibly more, its
determinism is multi
.
:- mode append(di, di, uo) is det.
In theory, you could provide two unique lists to append/3
. Unique
means there are no other references to either list, so append
could
destructively update them or reuse their memory cells to produce the
result, which will also be unique. Unfortunately, it doesn't work; the
result list is allocated afresh. It's also very hard to provide unique
arguments of the input arguments in the first place.
Now, is the append(in, in, out)
procedure tail recursive?
If we rearrange the second clause a little:
append([], Ys, Ys).
append([X | Xs], Ys, Result) :-
append(Xs, Ys, Zs),
Result = [X | Zs].
It's clear that the recursive call is not is not in a tail position. After the
recursive call returns Zs
, we still need to produce the Result
term.
If we could construct most of the result term first, leaving a space for the
recursive call to fill in, then we could move the recursive call into a tail
position and thus enable tail call elimination.
This is called the tail call modulo cons transformation (or last call modulo
cons), enabled by the --optimise-constructor-last-call
option.
The append(in, in, out)
procedure will be compiled as if it were written
something like this (if Mercury had C's &
and *
operators):
% not actual Mercury code
append([], Ys, Ys).
append([X | Xs], Ys, Result) :-
% allocate cell for Result, but fill in only X
Result = [X | Zs],
% call helper predicate, pass address of the second field of Result cell
append_lcmc(Xs, Ys, &Zs).
append_lcmc([], Ys, AddrOfZs) :-
% place Ys at location given by AddrOfZs
*AddrOfZs = Ys.
append_lcmc([X | Xs], Ys, AddrOfZs) :-
% allocate cell but fill in only X
Result = [X | Zs1],
% place Result at location given by AddrOfZs
*AddrOfZs = [X | Zs1],
% tail call, pass address of the second field of Result cell
append_lcmc(Xs, Ys, &Zs1).
The recursive call to append_lcmc
is in a tail position and, after tail call
elimination, this version of the append(in, in, out)
procedure uses constant
stack space.
member(Element, List)
is true if and only if Element
is a member of
List
. It can be called in the forward direction to test if Element
occurs
in List
. Or it can be called in the reverse direction, to generate all the
elements which occur in the list. The predicate is declared and defined thus:
:- pred member(T, list(T)).
:- mode member(in, in) is semidet.
:- mode member(out, in) is nondet.
member(X, [X | _]).
member(X, [_ | Xs]) :-
member(X, Xs).
Perhaps the only thing you might get wrong in the implementation is if you wrote the first clause as:
member(X, [X]).
That clause would only succeed if the second argument is unified with a list of exactly one element X, not what was intended.
reverse(List, Reverse)
is true if and only Reverse
is a list
containing the same elements as List
but in the reverse order.
The predicate and its functional counterpart are declared thus:
:- pred reverse(list(T), list(T)).
:- mode reverse(in, out) is det.
:- func reverse(list(T)) = list(T).
One way to implement reverse is: to reverse a non-empty list, remove the head of the list, reverse the rest of the list, then append the head element to the end of the reversed list.
reverse([], []).
reverse([X | Xs], Reverse) :-
reverse(Xs, RevXs),
append(RevXs, [X], Reverse).
It's very inefficient. The call to append(RevXs, [X], Rev)
calls the
procedure with mode append(in, in, out)
which cannot destructively
update its input arguments. The tail part of the appended list will be
exactly the second input list, so the result can share memory with the
second input list. However, the procedure must allocate a cons cell for
every element in the first input list to construct the result list.
We can count how many memory allocations will be incurred:
-
reverse([], [])
allocates 0 cons cells.[]
is a constant term that does not require memory allocation. -
reverse([3], Reverse)
will- call
reverse([], RevXs)
- allocates 0 cons cells - create
[3]
- allocates 1 cons cell (remember[3]
is shorthand for[3 | []]
) - call
append([], [3], Reverse)
- allocates 0 cons cells - for a total of 1 cons cell
- call
-
reverse([2, 3], Reverse)
will- call
reverse([3], RevXs)
- allocates 1 cons cell - create
[2]
- allocates 1 cons cell - call
append([3], [2], Reverse)
- allocates 1 cons cell - for a total of 3 cons cells
- call
-
reverse([1, 2, 3], Reverse)
will- call
reverse([2, 3], RevXs)
- allocates 3 cons cells - create
[1]
- allocates 1 cons cell - call
append([3, 2], [1], Reverse)
- allocates 2 cons cells - for a total of 6 cons cells
- call
In general, this definition of reverse/2
will allocate (n+1)*n/2 memory cells for
an input and output list of length n, a lot of unnecessary garbage.
A better implementation of reverse/2
builds up the output list as we
go down the input list. This implementation allocates one cons cell for
each element of the input list, and is also tail recursive.
reverse(List, Rev) :-
reverse_acc(List, [], Rev).
:- pred reverse_acc(list(T), list(T), list(T)).
:- mode reverse_acc(in, in, out) is det.
reverse_acc([], L, L).
reverse_acc([X | Xs], L0, L) :-
reverse_acc(Xs, [X | L0], L).
It turns out the helper predicate is quite useful in of itself, so the
standard library exports it under the name list.reverse_prepend/3
.
We previously declared a single mode for reverse/2
. If B is the reverse
of A then A is the reverse of B so, logically, we ought to be able to
call reverse/2
with the second argument as input and get the reversed
list out as the first argument. We can try to declare a second mode for
reverse/2
:
:- mode reverse(out, in) is det.
Alas, it doesn't work (for either definition of reverse). For the accumulator version you get this error message:
In clause for `reverse(out, in)':
in argument 1 of call to predicate `rev.reverse_acc'/3:
mode error: variable `List' has instantiatedness `free',
expected instantiatedness was `ground'.
We're missing a mode for reverse_acc/3
where the third argument is input
(ground). Try as we might, we can't get such a mode to compile.
reverse_acc
doesn't work in reverse.
In this case, reordering conjunctions is insufficient to get the reverse(out, in)
mode that we want. In other cases, reordering conjunctions might produce
a workable but inefficient procedure.
To handle both cases, Mercury allows you to annotate clauses with modes so
that the clause is only used for that mode of the predicate or function.
Then you can provide different implementations for different modes,
such as for reverse/2
:
reverse(List::in, Rev::out) :-
reverse_acc(List, [], Rev).
reverse(Rev::out, List::in) :-
reverse_acc(List, [], Rev).
There is one more piece of administravia. When you provide different clauses for the same predicate or function, you may have inadvertently (or deliberately!) written a predicate or function that does not have the same logical meaning for all modes. By default, the compiler will assume that's the case and treat the predicate or function as "impure" (we will discuss the purity system later).
Since we're sure reverse/2
means the same thing in both modes,
we can go ahead and declare that to the compiler:
:- pragma promise_equivalent_clauses(reverse/2).
reverse(List::in, Rev::out) :-
reverse_acc(List, [], Rev).
reverse(Rev::out, List::in) :-
reverse_acc(List, [], Rev).
The list
module provides some predicates to sort lists.
The basic one has the signature:
:- pred sort(list(T)::in, list(T)::out) is det.
The sorting algorithm of choice for linked lists is merge sort, and that's the algorithm used by the standard library. We'll implement it now.
Recall the merge sort algorithm: to sort a list, split the list into two sub-lists (of approximately equal lengths), recursively sort each sub-list, then merge the two sorted sub-lists. In code:
:- pred merge_sort(list(T)::in, list(T)::out) is det.
merge_sort(List, SortedList) :-
length(List, Length),
( if Length > 1 then
HalfLength = Length // 2,
det_split_list(HalfLength, List, Front, Back),
merge_sort(Front, SortedFront),
merge_sort(Back, SortedBack),
merge(SortedFront, SortedBack, SortedList)
else
SortedList = List
).
int.//
performs truncating integer division (but so does int./
).
det_split_list(N, List, Start, End)
splits List
into two lists
Start
and End
, where Start
has length N
. You might wonder why
it has a "det" prefix. There is a semidet predicate split_list/4
which fails if List
has fewer than N
elements. det_split_list/4
throws an exception instead of failing. We don't want to deal with a
failure here (it can't happen anyway) so we use det_split_list/4
.
Next, we make two recursive calls to sort the sub-lists. Notice
merge_sort/2
always recomputes the length of its input, but we already
know the length of the sub-lists. You may want to improve on this.
Once we have the two sorted sub-lists, we need to merge them together.
This is the job of merge/3
.
:- pred merge(list(T)::in, list(T)::in, list(T)::out) is det.
Let's get the base cases out of the way. Merging an empty list with another list just produces the other list:
merge([], [], []).
merge([A | As], [], [A | As]).
merge([], [B | Bs], [B | Bs]).
To merge two non-empty lists, we just need to compare the heads of both lists and place the lesser element at the start of the (recursively) merged list:
merge([A | As], [B | Bs], Cs) :-
( if A < B then
merge(As, [B | Bs], Cs0),
Cs = [A | Cs0]
else
merge([A | As], Bs, Cs0),
Cs = [B | Cs0]
).
Oops! There is a int.<
predicate and there is a float.<
predicate,
but we want merge
to work for elements of any type.
How can we compare two elements of some arbitrary type?
Is there even an ordering on elements of arbitrary types?
The reference manual says:
For every Mercury type there exists a standard ordering; any two values of the same type can be compared under this ordering by using the
builtin.compare/3
predicate. The ordering is total, meaning that the corresponding binary relations are reflexive, transitive and anti-symmetric.
The existence of this ordering makes it possible to implement generic data structures such as sets and maps, without needing to know the specifics of the ordering. Furthermore, different platforms often have their own natural orderings which are not necessarily consistent with each other. As such, the standard ordering for most types is not fully defined.
So if we don't mind the standard ordering, we can use builtin.compare/3
to
compare elements to define the ordering of our sort.
The builtin
module is implicitly imported into every Mercury module.
It defines, amongst other things, the compare/3
predicate and the
comparison_result
type:
:- type comparison_result
---> (=)
; (<)
; (>).
:- pred compare(comparison_result, T, T).
:- mode compare(uo, in, in) is det.
% some more modes
Then merge
can be written as:
merge([], [], []).
merge([A | As], [], [A | As]).
merge([], [B | Bs], [B | Bs]).
merge([A | As], [B | Bs], Cs) :-
compare(R, A, B),
(
( R = (<)
; R = (=)
),
merge(As, [B | Bs], Cs0),
Cs = [A | Cs0]
;
R = (>),
merge([A | As], Bs, Cs0),
Cs = [B | Cs0]
).
The builtin
module also defines some inequality predicates @<
, @=<
,
@>
, @>=
defined in terms of compare/3
, but some people may consider
those symbols a bit cryptic.
The list
module also provides the usual higher-order predicates
and functions that you'd expect: map, filter, and way too many
folds/reduce. Since they make for good examples, we'll discuss them when
we introduce higher-order programming in Mercury.
Here is the code from this page. Some names have been prefixed to
disambiguate them from names in the list
module.
:- module lists.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module int.
:- import_module list.
% length
:- func my_length(list(T)) = int.
my_length(Xs) = Len :-
length_acc(Xs, 0, Len).
:- pred length_acc(list(T), int, int).
:- mode length_acc(in, in, out) is det.
length_acc([], N, N).
length_acc([_X | Xs], N0, N) :-
length_acc(Xs, N0 + 1, N).
% append
:- pred my_append(list(T), list(T), list(T)).
:- mode my_append(di, di, uo) is det.
:- mode my_append(in, in, out) is det.
:- mode my_append(in, in, in) is semidet. % implied
:- mode my_append(in, out, in) is semidet.
:- mode my_append(out, out, in) is multi.
my_append([], Ys, Ys).
my_append([X | Xs], Ys, [X | Zs]) :-
my_append(Xs, Ys, Zs).
% member
:- pred my_member(T, list(T)).
:- mode my_member(in, in) is semidet.
:- mode my_member(out, in) is nondet.
my_member(X, [X | _]).
my_member(X, [_ | Xs]) :-
my_member(X, Xs).
% reverse
:- pred my_reverse(list(T), list(T)).
:- mode my_reverse(in, out) is det.
:- mode my_reverse(out, in) is det.
:- pragma promise_equivalent_clauses(my_reverse/2).
my_reverse(List::in, Rev::out) :-
reverse_acc(List, [], Rev).
my_reverse(Rev::out, List::in) :-
reverse_acc(List, [], Rev).
:- pred reverse_acc(list(T), list(T), list(T)).
:- mode reverse_acc(in, in, out) is det.
reverse_acc([], L, L).
reverse_acc([X | Xs], L0, L) :-
reverse_acc(Xs, [X | L0], L).
% merge sort
:- pred merge_sort(list(T)::in, list(T)::out) is det.
merge_sort(List, SortedList) :-
length(List, Length),
( if Length > 1 then
HalfLength = Length // 2,
det_split_list(HalfLength, List, Front, Back),
merge_sort(Front, SortedFront),
merge_sort(Back, SortedBack),
merge(SortedFront, SortedBack, SortedList)
else
SortedList = List
).
:- pred my_merge(list(T)::in, list(T)::in, list(T)::out) is det.
my_merge([], [], []).
my_merge([A | As], [], [A | As]).
my_merge([], [B | Bs], [B | Bs]).
my_merge([A | As], [B | Bs], Cs) :-
compare(R, A, B),
(
( R = (<)
; R = (=)
),
my_merge(As, [B | Bs], Cs0),
Cs = [A | Cs0]
;
R = (>),
my_merge([A | As], Bs, Cs0),
Cs = [B | Cs0]
).
% main
main(!IO) :-
merge_sort([3, 1, 4, 1, 5, 9], SortedList),
write(SortedList, !IO),
nl(!IO).