HOWTO-precompile.adoc

Precompile (a.k.a. ahead-of-time compile) Scheme code

Note
This feature is still in development and is very likely to change over time. In particular, we usually keep ABI compatibility across micro versions so that you don’t need to recompile extensions for every new release, but we don’t guarantee that for precompiled binaries yet. If you precompile your library for distribution, let users know that they need to recompile it for each new Gauche release, even if it’s just a micro version update.

As a script engine, Gauche by default reads Scheme source code at run-time, compiles it on the fly, and executes it. That incurs some loading and compiling overhead, which you might want to cut if you have tons of Scheme code.

Gauche has a feature to compile Scheme code and dump the result as C code, which can be compiled by a C compiler into a DSO and dlopen-ed at runtime. At this moment, we don’t really "compile Scheme to C": we compile Scheme into Gauche’s VM instruction sequence, as we do at run-time, and dump the result as a C static data structure. Thus precompilation won’t improve run-time performance, but it does cut the parsing time and the compile time, including macro expansion. In the future, we may improve the performance of precompiled code, since we can spend more time on optimization.

Things to consider before precompiling

  1. The precompiler needs to make sense of the source statically (i.e. without executing it), so we recommend that the source follow these constraints:

    • Each scm source begins with a define-module form.

    • One scm source defines one module.

  2. In normal operation, Gauche compiles and executes each toplevel form in turn. That lets the macro expansion of a later toplevel form use a value computed at run-time by an earlier form in the same file. This doesn’t work in precompilation, since macro expansion precedes execution. The limitation includes procedure definitions. For example:

    (define (quote-it x) `(quote ,x))
    
    (define-syntax quote-self
      (er-macro-transformer
        (^[f r c] (quote-it f))))
    
    (define (test)
      (quote-self a b c))

    This works when you load the source, but won’t work in precompilation, since expanding quote-self requires calling quote-it, which is only available at run-time.

    You need to split the file in two: one contains the values the macro expander depends on, and the other contains the forms that use the macro. The macro definition itself can be in either file. (If the macro expander does not depend on a run-time value computed in the same file, you can put the macro definition and the macro use in the same file.)

  3. The cond-expand macro is handy for switching architecture-dependent code. Since it is a macro, it’s expanded at precompile time, so the features are selected based on the platform you’re precompiling on, not the platform you actually run the program on. That means you can’t simply precompile a Scheme file and carry the generated C file over to another platform if the Scheme file selects architecture-dependent parts with cond-expand. The C code itself is portable, but the unportable selection has already been made by the time the C file is created.

    If you need to distribute generated C code and you need to use cond-expand, just extract the part that uses cond-expand into a Scheme file and keep it from being precompiled.
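
The split described in the second point above could look like the following sketch. The file and module names quote-util and quote-user are made up for illustration, and the shell here-documents do nothing more than write the two Scheme files. It assumes precomp can find and load quote-util while it expands the macro in the second file (for example, with -I. on the command line).

```shell
# Hypothetical file names; only the idea of the split comes from the text.

# File 1: the helper that the macro expander calls at expansion time.
cat > quote-util.scm <<'EOF'
(define-module quote-util
  (export quote-it))
(select-module quote-util)

(define (quote-it x) `(quote ,x))
EOF

# File 2: the macro definition and the code that uses the macro.
# quote-it now comes from another module instead of from an earlier
# toplevel form in the same file.
cat > quote-user.scm <<'EOF'
(define-module quote-user
  (use quote-util)
  (export test))
(select-module quote-user)

(define-syntax quote-self
  (er-macro-transformer
    (^[f r c] (quote-it f))))

(define (test)
  (quote-self a b c))
EOF
```

Both files keep to the constraints from the first point: each begins with a define-module form and defines exactly one module.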

The first step

Let’s begin with the simplest case: precompiling a single Scheme file.

Suppose we have greet.scm:

(define-module greet
  (export greeting))
(select-module greet)

(define *message* '("Hello, " ".  How are you?"))

(define (greeting who)
  #"~(car *message*)~|who|~(cadr *message*)")

Run the tools/precomp subcommand as follows. The -e option is important: it tells precomp to create a C file that works as a Gauche extension module on its own.

$ gosh tools/precomp -e greet.scm

Now you see that two files have been generated: greet.c and greet.sci.

$ ls
greet.c  greet.sci  greet.scm

The .sci suffix stands for SCheme Interface. The file contains a small Scheme code fragment with the module definitions, any forms that can’t be compiled to C, and the call to dynamic-load that loads the DSO file:

$ cat greet.sci
;; generated automatically.  DO NOT EDIT
#!no-fold-case
(define-module greet (export greeting))
(select-module greet)
(dynamic-load "greet")

Precompiled DSOs are always loaded via their interface files, which resolve module dependencies. That means you have to distribute the interface file along with the DSO. You don’t usually see *.sci files, though, since the gauche-install command can rename *.sci to *.scm while it copies the file (with the -C option). We use the *.sci suffix only because we want to be able to generate the file in the same directory as the source *.scm file.

Next, you can compile greet.c to generate greet.so (or greet.dylib or greet.dll, depending on your platform):

$ gauche-package compile greet greet.c
$ ls
greet.c  greet.o  greet.sci  greet.scm  greet.so

The gauche-package compile command knows which C compiler and flags were used when Gauche was compiled, where the Gauche header files are, etc. You have to compile the *.c file with the same compiler and the same CFLAGS; otherwise Gauche can’t safely dlopen the resulting DSO. You can pass extra flags to gauche-package compile; run the command with no arguments to see its usage.
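
The two steps so far can be wrapped in a small shell function, sketched below. The function name precompile_one and the RUN dry-run switch are inventions of this example, not part of Gauche; the two commands themselves are the ones shown above.

```shell
# Dry-run switch (an assumption of this sketch, not a Gauche feature):
# set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# precompile_one NAME
#   NAME.scm -> NAME.c + NAME.sci -> NAME.so (or .dylib / .dll)
precompile_one() {
    name=$1
    # Step 1: dump the compiled code as C, plus the interface file.
    $RUN gosh tools/precomp -e "$name.scm" || return 1
    # Step 2: build the DSO with the compiler and flags Gauche was built with.
    $RUN gauche-package compile "$name" "$name.c"
}
```

Calling precompile_one greet runs both steps for greet.scm; with RUN=echo set beforehand it only prints the two command lines.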

Now, let’s try the precompiled module. We give the -I. option to gosh so that it can find greet.sci and greet.so. We also give the -fload-verbose option to check that it really loads the precompiled file, not the source file:

$ gosh -I. -fload-verbose
;;Loading /usr/share/gauche-0.9/0.9.5/lib/gauche/interactive.scm...
;;  Loading /usr/share/gauche-0.9/0.9.5/lib/gauche/common-macros.scm...
;;  Loading /usr/share/gauche-0.9/0.9.5/lib/gauche/defvalues.scm...
gosh> (use greet)
;;Loading ./greet.sci...
;;  Dynamically Loading ./greet.so...
gosh> (greeting "John")
"Hello, John.  How are you?"

Woohoo! Note that *.sci takes precedence over *.scm, so even if we have both greet.sci and greet.scm, gosh loads greet.sci (and subsequently, greet.so).

In practice, you have to install the *.sci file and the *.so file in appropriate places so that other people’s code can find them. The standard places can be queried with the gauche-config command:

$ gauche-config --sitelibdir
/usr/share/gauche-0.9/site/lib
$ gauche-config --sitearchdir
/usr/lib/gauche-0.9/site/x86_64-pc-linux-gnu

You can use gauche-install to copy the files:

$ gauche-install -T `gauche-config --sitelibdir` -C greet.sci
$ gauche-install -T `gauche-config --sitearchdir` greet.so

The -C option renames greet.sci to greet.scm in the target directory.

(If you use the template makefile generated by gauche-package generate, it has the skeleton to set up the installation destination.)
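
As a sketch, the installation step could be scripted like this. The function name install_precompiled, the RUN dry-run switch, and the SOEXT variable are assumptions of this example; only the gauche-install invocations come from the text above. The function refuses to install a DSO without its interface file, since the DSO is only ever loaded through that file.

```shell
# Dry-run switch (not a Gauche feature): RUN=echo prints the commands.
RUN=${RUN:-}
# DSO suffix; "so" is assumed here, use dylib or dll where appropriate.
SOEXT=${SOEXT:-so}

# install_precompiled NAME LIBDIR ARCHDIR
install_precompiled() {
    name=$1 libdir=$2 archdir=$3
    # Both halves must be present before anything is copied.
    for f in "$name.sci" "$name.$SOEXT"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done
    # -C renames NAME.sci to NAME.scm in the target directory.
    $RUN gauche-install -T "$libdir" -C "$name.sci" || return 1
    $RUN gauche-install -T "$archdir" "$name.$SOEXT"
}
```

A typical call would be install_precompiled greet "$(gauche-config --sitelibdir)" "$(gauche-config --sitearchdir)".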

Multiple source files

The previous section shows how to convert one Scheme source into one DSO. Usually, however, you want to make one DSO per library, which may contain lots of source files.

Suppose you have the following source code structure, where foo.scm uses the files in the subdirectory as the modules foo.boo and foo.woo.

foo.scm
foo/boo.scm
foo/woo.scm

You have to generate a C file for each Scheme source, then compile and link them all together.

$ gosh tools/precomp -I. -e foo.scm foo/boo.scm foo/woo.scm
$ ls
foo/  foo--boo.c  foo--woo.c  foo.c  foo.sci  foo.scm
$ ls foo
boo.sci  boo.scm  woo.sci  woo.scm

Note the -I. option, which allows precomp to find foo/boo.scm and foo/woo.scm referred to from foo.scm.

You see that the C files are generated in the current directory, while each SCI file is placed next to its source file, including those under the subdirectory.

Now you can compile those C files into a single DSO:

$ gauche-package compile foo foo.c foo--boo.c foo--woo.c
$ ls
foo/        foo--boo.o  foo--woo.o  foo.o    foo.scm
foo--boo.c  foo--woo.c  foo.c       foo.sci  foo.so

You still have one SCI file per Scheme source, but there’s only one DSO, and it can be loaded at once.
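
The multi-file build can be sketched as a shell function as well. The names c_name and build_library and the RUN dry-run switch are made up for this example. The mapping from source path to C file name is inferred from the listing above (foo/boo.scm becomes foo--boo.c); that deeper nesting follows the same pattern is an assumption.

```shell
# Dry-run switch (not a Gauche feature): RUN=echo prints the commands.
RUN=${RUN:-}

# c_name PATH: foo.scm -> foo.c, foo/boo.scm -> foo--boo.c
c_name() {
    printf '%s\n' "$1" | sed -e 's|/|--|g' -e 's|\.scm$|.c|'
}

# build_library NAME SRC...
build_library() {
    name=$1; shift
    cfiles=
    for src in "$@"; do
        cfiles="$cfiles $(c_name "$src")"
    done
    # One precomp run for all sources, then one DSO from all the C files.
    $RUN gosh tools/precomp -I. -e "$@" || return 1
    # $cfiles is unquoted on purpose so it splits into one argument per file
    # (so source paths containing spaces are not supported).
    $RUN gauche-package compile "$name" $cfiles
}
```

For the example tree, the call would be build_library foo foo.scm foo/boo.scm foo/woo.scm.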

But I don’t want a bunch of SCI files!

The reason we have one *.sci file per *.scm file is to guarantee consistent behavior between the source form and the precompiled form. By keeping each interface file in the same relative path as its source, we can guarantee that the library user can say not only (use foo), but also (use foo.boo) and (use foo.woo). The latter two still work, since we have foo/boo.sci and foo/woo.sci, each with the appropriate initialization in it.

However, if you have hundreds of source files and you know that external users will only use the toplevel module and never use the submodules directly, you can consolidate those SCI files into a single one. Just give the --single-interface option to precomp.

$ gosh tools/precomp -I. -e --single-interface foo.scm foo/boo.scm foo/woo.scm
$ ls
foo/  foo--boo.c  foo--woo.c  foo.c  foo.sci  foo.scm
$ ls foo
boo.scm  woo.scm

With everything consolidated into foo.sci, you only need foo.sci and foo.so for the library to work:

$ gosh -I. -fload-verbose
;;Loading /usr/share/gauche-0.9/0.9.5/lib/gauche/interactive.scm...
;;  Loading /usr/share/gauche-0.9/0.9.5/lib/gauche/common-macros.scm...
;;  Loading /usr/share/gauche-0.9/0.9.5/lib/gauche/defvalues.scm...
gosh> (use foo)
;;Loading ./foo.sci...
;;  Dynamically Loading ./foo.so...
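
To see which files have to ship in either mode, a small helper can list the interface files next to the DSO. This is only a sketch: dist_files and SOEXT are names invented here, and the helper just inspects the build directory without calling any Gauche tool.

```shell
# DSO suffix; "so" is assumed here, use dylib or dll where appropriate.
SOEXT=${SOEXT:-so}

# dist_files NAME [DIR]: print every *.sci under DIR, then the DSO if present.
dist_files() {
    name=$1 dir=${2:-.}
    (
        cd "$dir" || exit 1
        find . -name '*.sci' | sed 's|^\./||' | LC_ALL=C sort
        if [ -f "$name.$SOEXT" ]; then echo "$name.$SOEXT"; fi
    )
}
```

After a --single-interface build of the example tree, dist_files foo should list only foo.sci and foo.so; after a default build it also lists foo/boo.sci and foo/woo.sci.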