The ambient tool is a static binary verification tool for proving that weird machines in programs do (or do not) execute without errors under a variety of environmental conditions.
See the requirements and design documents in the doc
directory for more details on the motivation and structure of the verifier:
To build the verifier, first clone this repository with:
git clone [email protected]:ambient/verifier.git cd verifier git submodule update --init
Note: if these steps complete successfully, the directories in submodules/
will be populated (not empty).
Important: do not use --recursive
- it's not needed, and will result in downloading unnecessary content.
The verifier is built with the GHC Haskell compiler (versions 8.10, 9.0, and 9.2). To get the compiler, use your distribution packages or the ghcup tool:
ghcup install ghc 9.2.5 ghcup install cabal 3.6.2.0
Then, in the verifier
directory:
ln -s cabal.project.dist cabal.project ulimit -n 2560 # Might be required on OSX. cabal update && cabal configure -w ghc-9.2.5 pkg:ambient-verifier cabal build pkg:ambient-verifier
Note that there is an optional dependency on a C compiler (gcc
by default)
that is used when running C overrides. (See the "User-specified Function
Overrides" section for more details.)
Functionality in the verifier is broken into subcommands:
verify
: Verify a given binary with inputs satisfies verification conditionstest-overrides
: Run tests present in user override files
More details on available options for the top level verifier, as well as the
various subcommands can be found using --help
:
cabal run exe:ambient-verifier -- --help cabal run exe:ambient-verifier -- verify --help cabal run exe:ambient-verifier -- test-overrides --help
To run the verifier's test suite, run:
cabal run test:ambient-tests
To record the results of a test suite run to a junit.xml
file, run:
cabal run test:ambient-tests -- --xml=junit.xml
The verifier includes functionality for profiling its performance. To generate
a profiling report, pass --profile-to <DIR>
to an invocation of verify
.
This will generate a <DIR>/profile.html
file that can be viewed in a web
browser. The data that this HTML file presents will be periodically regenerated
as the verifier runs on the program. A typical example looks like:
The profiling report presents a hierarchical view of the functions that are invoked during the simulation of a program. The x-axis represents the time, and the y-axis shows the call stack depth, counting from zero at the bottom. Each stack frame is given a score based on some simulator heuristics (the time spent simulating it, its number of allocations, its numbers of merges, etc.). Generally speaking, the higher the score, the more likely it is to be a performance bottleneck.
There is also a nightly Docker image that gets built after each commit to the
master
branch. To run the verifier through Docker, take a verifier command
and replace the cabal run exe:ambient-verifier --
bit with
docker run artifactory.galois.com:5017/nightly
:
docker run artifactory.galois.com:5017/nightly --help docker run artifactory.galois.com:5017/nightly verify --help docker run artifactory.galois.com:5017/nightly test-overrides --help
The verifier's test suite can also be run through Docker, although it requires
changing the entrypoint to use ambient-tests
instead:
docker run --entrypoint ambient-tests artifactory.galois.com:5017/nightly
To record the results of a test suite run to a junit.xml
file, run:
docker run --entrypoint ambient-tests --volume $(pwd):/test-output artifactory.galois.com:5017/nightly --xml=/test-output/junit.xml
The --volume $(pwd):/test-output
mounts the current directory to
/test-output
in the container so that when the test suite records the
results to /test-output/junit.xml
, the junit.xml
file is copied back
to the current directory on your host machine.
When verifying programs, it is almost always useful to be able to stub out program functionality that is not important for the verification. For example, one may want to turn calls to printf
into no-ops to significantly speed up verification. Normally, these function overrides are written in Haskell; this is expressive, but not user friendly (end users are unlikely to know Haskell or to have a Haskell build environment available).
To better support end users, and enable faster experimentation, ambient-verifier
supports defining custom overrides in the following languages:
The crucible syntax language, which is based on a simple s-expression grammar. The concrete syntax is documented in the crucible-syntax repository. In addition to the base constructs provided by the core concrete syntax,
ambient-verifier
supports additional primitives.Example:
(defun @padd ((p1 Pointer) (p2 (Bitvector 64))) Pointer (start start: (let res (pointer-add p1 p2)) (return res)))
A limited subset of the C language, which we compile into crucible syntax. See the overrides-dsl repository for more information. Note that this is not currently at feature parity with crucible syntax overrides, so if you encounter a missing feature, please open an issue at this issue tracker.
Example:
int abs(int x) { int ret = x; if(x < 0) { ret = -x; } return ret; }
Note that the verifier uses a C compiler to preprocess the C file, so a C compiler must be available on your
PATH
. By default, the verifier looks forgcc
, but this can be overridden using the--with-cc
option.
A directory containing overrides can be specified to the verifier using the --overrides
command line option.
The overrides
directory in this repo contains various overrides that we have curated for particular applications.
The --overrides <DIR>
option expects the name of a directory DIR
whose
contents look like:
DIR/ ├── function/ │ ├── fun1.cbl │ ├── fun2.cbl │ ├── fun3.c │ ├── fun4.c │ └── ... └── overrides.yaml (optional)
The function
subdirectory contains .cbl
and .c
files for crucible
syntax and C overrides, respectively. Each of these file should be named after
the function that should be overridden. For instance, printf.cbl
would
correspond to an override for the printf
function.
The overrides.yaml
is an optional file that can be present if one desires
more fine-grained control over which functions in a binary should receive
particular overrides. The contents of an overrides.yaml
file will look
like this:
function address overrides: main.exe: 0x123: "foo" 0x456: "bar" ... libc.so: 0x123: "baz" 0x456: "quux" ... startup overrides: - "start1" - "start2" - ...
Here:
function address overrides
specifies an optional mapping from function addresses to override names. This can be useful for situations where a function in a binary has no corresponding symbol name (for instance, as in stripped binaries). A separate mapping is specified for each binary or shared library. The name that each address maps to must correspond the name of an override file in thefunction
subdirectory (not including the file extension).Note that the mapping only cares about the file names of each binary and does not care about the parent directories. For example, if the verifer is invoked on
/foo/bar/main.exe
, then theoverrides.yaml
only needs to specifymain.exe
, not its full path.startup overrides
specifies a list of overrides to run at the very start of simulating a binary, before the entry point (e.g.,main
) is run. Each override will be run in the order it appears in the list. Each startup override is expected to have no arguments and returnUnit
. Attempting to specify a startup override with a different type will result in an error before simulation begins.A key use case for startup overrides is for initializing the values of global variables, especially in conjunction with the
get-global-pointer-addr
,get-global-pointer-named
, andmalloc-global
functions. This can serve as a more straightforward way to initialize global state than, say, simulating everything inglibc
's_start
function.
Override names that appear in function address overrides
take precedence
over other overrides. To illustrate how this works, suppose a user specifies
--overrides DIR
, where the contents of DIR
are the following:
DIR/ ├── function/ │ ├── foo.cbl │ └── bar.cbl └── overrides.yaml
Where the contents of overrides.yaml
are as follows:
function address overrides: main.exe: 0x123: "bar"
Now suppose that the verifier encounters a function in main.exe
at address
0x123
named foo
. Although there is a foo.cbl
override present, the
function address overrides
mapping also maps the address 0x123
to
bar
. In such situations, the function address overrides
take higher
precedence, so the verifier will use the bar
override.
Each <name>.{cbl,c}
file is expected to define a function named @<name>
.
For instance, an add_bvs.cbl
file should define an @add_bvs
function,
e.g.:
(defun @add_bvs ((x (Bitvector 32)) (y (Bitvector 32))) (Bitvector 32) (start start: (let sum (+ x y)) (return sum)))
An override file is also permitted to define other functions. These functions
are considered local to the file defining it and are not visible to other
files. For instance, an alternative way to implement add_bvs.cbl
would
be:
(defun @add_bvs ((x (Bitvector 32)) (y (Bitvector 32))) (Bitvector 32) (start start: (let res (funcall @add_bvs_2 x y)) (return res))) ; Local to this file (defun @add_bvs_2 ((x (Bitvector 32)) (y (Bitvector 32))) (Bitvector 32) (start start: (let sum (+ x y)) (return sum)))
An override file is allowed to invoke functions defined in other override files
by way of forward declarations. A forward declaration states the type of a
function that is not defined in the file itself, but will be provided later by
some other means. For instance, suppose that @add_bvs_2
were defined in its
own .cbl
file and that you want to invoke it from add_bvs.cbl
. To do
so, one must declare add_bv_2
's type using a forward declaration in
add_bvs.cbl
:
(declare @add_bvs_2 ((x (Bitvector 32)) (y (Bitvector 32))) (Bitvector 32)) (defun @add_bvs ((x (Bitvector 32)) (y (Bitvector 32))) (Bitvector 32) (start start: (let res (funcall @add_bvs_2 x y)) (return res)))
The verifier will ensure that the invocation of add_bvs_2
will be resolved
to the same function defined in add_bvs_2.cbl
. The verifier will raise an
error if it cannot find a function of the same name, or if it finds a function
with a different type than what is stated in the forward declaration.
Currently, forward declarations can be resolved to overrides defined in other
override files as well as overrides that are built into the verifier (e.g.,
the override for memcpy
). Note that forward declarations cannot be used to
resolve functions that are local to a particular override file. Resolving
forward declarations to functions defined in binaries is not currently
supported.
One main type addition is for representing pointers:
Pointer
Unlike C/C++, these pointers are untyped and essentially correspond to uint8_t*
.
ambient-verifier
also adds a few wrappers around Bitvector
types for portability and convenience:
Byte
is an alias forBitvector 8
.Int
is an alias forBitvector 32
.Long
is an alias forBitvector 32
on Arm32 andBitvector 64
on X86_64.PidT
is an alias forBitvector 32
.Short
is an alias forBitvector 16
.SizeT
is an alias forBitvector 32
on Arm32 andBitvector 64
on X86_64.UidT
is an alias forBitvector 32
.
The extra operations supported in ambient-verifier
are:
bv-typed-literal :: Type -> Integer -> Bitvector w
where the first argument is aBitvector
type alias (see the Types section), the second argument is the value theBitvector
should contain, andw
is the number of bits in the returnedBitvector
(will match the width of theType
argument).fresh-vec :: String Unicode -> forall (t :: Type) -> Nat -> Vector t
, where(fresh-vec s t n)
generates a length-n
vector where each element is a fresh constant of typet
with the name<s>_<i>
(for eachi
between0
and<n> - 1
). Note thatt
must be a scalar type (e.g., no nestedVector
s), ands
andn
must both be concrete values.make-null :: Pointer
returns a null pointer.pointer-add :: Pointer -> Bitvector w -> Pointer
wherew
is the number of bits in a pointer (usually 32 or 64).pointer-diff :: Pointer -> Pointer -> Bitvector w
wherew
is the number of bits in a pointer (usually 32 or 64).pointer-sub :: Pointer -> Bitvector w -> Pointer
wherew
is the number of bits in a pointer (usually 32 or 64).pointer-eq :: Pointer -> Pointer -> Bool
.pointer-read :: forall (t :: Type) -> Endianness -> Pointer -> t
where the first argument is the type of the value to read and the second argument isle
orbe
.Type
must either beBitvector (8 * w)
(for some positive numberw
) or one of the type aliases listed above.pointer-write :: forall (t :: Type) -> Endianness -> Pointer -> t -> Unit
where the first argument is the type of the value to read and the second argument isle
orbe
.Type
must either beBitvector (8 * w)
(for some positive numberw
) or one of the type aliases listed above.
There are some overrides that are built-in to the verifier, as they cannot easily be defined in terms of the primitives that the syntax override language provides. The following overrides can be invoked from both binaries and syntax overrides:
accept :: Int -> Pointer -> Pointer -> Int
bind :: Int -> Pointer -> SizeT -> Int
calloc :: SizeT -> SizeT -> Pointer
close :: Int -> Int
connect :: Int -> Pointer -> SizeT -> Int
execve :: Pointer -> Pointer -> Pointer -> Int
exit :: Int -> Void
free :: Pointer -> Unit
getppid :: PidT
listen :: Int -> Int -> Int
recv :: Int -> Pointer -> SizeT -> Int -> SizeT
malloc :: SizeT -> Pointer
memcpy :: Pointer -> Pointer -> SizeT -> Pointer
memset :: Pointer -> Int -> SizeT -> Pointer
mkdir :: Pointer -> SizeT -> Int
open :: Pointer -> Int -> SizeT -> Int
read :: Int -> Pointer -> SizeT -> SizeT
send :: Int -> Pointer -> SizeT -> Int -> SizeT
shmat :: Int -> Pointer -> Int -> Pointer
shmget :: SizeT -> SizeT -> Int -> Int
socket :: Int -> Int -> Int -> Int
read :: Int -> Pointer -> SizeT -> SizeT
write :: Int -> Pointer -> SizeT -> SizeT
The following overrides can be invoked from both binaries and syntax overrides, but with the limitation that they can only be invoked from syntax overrides without any variadic arguments:
sprintf :: Pointer -> Pointer -> ... -> Int
sscanf :: Pointer -> Pointer -> ... -> Int
The following overrides can only be invoked from syntax overrides:
malloc-global :: SizeT -> Pointer
is likemalloc
, except that it is explicitly meant for allocating memory for use in global variables.read-bytes :: Pointer -> Vector (Bitvector 8)
reads a null-terminated, sequence of bytes from thePointer
. The null terminator at the end of the sequence of bytes will be concrete, but the preceding bytes may be symbolic. Unlikeread-c-string
, this function reads the raw bytes without converting to a particular text encoding.read-c-string :: Pointer -> String Unicode
reads a null-terminated, UTF-8–encoded, concrete string from thePointer
and converts it to aString
. Representing it as aString
can be more convenient in the syntax override language, as it is easier to manipulate and check for equality.write-bytes :: Vector (Bitvector 8) -> Pointer
writes a sequence of bytes to aPointer
, including a null terminator (which does not need to be in theVector
). The null terminator written at the end will be concrete, but the preceding bytes may be symbolic. Unlikewrite-c-string
, this function writes the raw bytes without converting to a particular text encoding. For example, to write the string"abc"
, supply(vector (bv 8 97) (bv 8 98) (bv 8 99))
as an argument, as the bytes97
,98
, and99
correspond to the numeric values of thea
,b
, andc
characters, respectively.write-c-string :: Pointer -> String Unicode -> Unit
writes a UTF-8–encoded, concrete string to aPointer
, including a null terminator.Note that in order to write an escaped Unicode character, one must supply an extra backslash. For instance, to write the
☃
character, supply the string"\\9731"
. Note that some Unicode characters require multiple bytes in the UTF-8 encoding, so make sure that thePointer
has enough space allocated to store all of the bytes.print-pointer :: Pointer -> String Unicode
converts a pointer to a string representation. This prints the pointer as(block, offset)
, or simplyoffset
ifblock
is0
.
The following overrides can only be invoked from syntax overrides when using
the verify
command, as they require interfacing with a binary. Attempting
to use any of these overrides from the test-overrides
command (which
operates independently of any binary) will result in an error:
get-global-pointer-addr :: String Unicode -> SizeT -> Pointer
will return a pointer to a global variable, where:- The first argument must be a concrete string that indicates the binary in which the global variable is defined.
- The second argument must be a concrete address for the global variable.
Note that only the file names of the binary needs to be specified, not the full path. For example, if a global variable is located in
/foo/bar/main.exe
, then the first argument should simply bemain.exe
.get-global-pointer-named :: String Unicode -> String Unicode -> Pointer
will return a pointer to a global variable, where- The first argument must be a concrete string that indicates the binary in which the global variable is defined.
- The second argument must be a concrete string that indicates the name of for the global variable. At present, only unversioned names are supported.
The same caveats about full paths mentioned in
get-global-pointer-addr
also apply toget-global-pointer-named
.
Overrides may declare global variables using defglobal
at the top level:
(defglobal $$varname Type)
The verifier permits global variable declarations anywhere in the top level,
including after their use sites. Global variables are scoped to the files they
are declared in. The verifier instantiates global variables as fresh symbolic
values. To change the value of a global variable, use set-global!
:
(set-global! $$varname value)
A .cbl
file can also access global variables defined externally by using an
extern. An extern declaration states the type of a global variable that is
not defined in the file itself, but will be provided later by some other means.
(Externs are to global variables what forward declarations are to functions.)
For instance, suppose we have overrides for functions named f
and g
,
where f
defines a global variable:
(defglobal $$f-glob Int) (defun @f () Unit (start start: (let val (bv-typed-literal Int 42)) (set-global! $$f-glob val) (return ())))
And g
accesses f
's global variable:
(extern $$f-glob Integer) (defun @g () Int (start start: (return $$f-glob))
Suppose that we first invoke f
, then g
. By the time that g
is
invoked, the value of $$f-glob
will already have been set to 42
, so
g
will return 42
.
Currently, externs can be resolved to global variables defined in other
override files. Note that the verifier will raise an error if it cannot find a
global variable of the same name, or if it finds a global variable with a
different type than what is stated in the extern
declaration. Resolving
externs to global variables defined in binaries is not currently supported.
There are some global variables that are built in to the verifier itself and
are accessible via extern
declarations. These global variables are often
useful in startup overrides, as they can be written to particular locations in
the binary before simulating the entry point function.
AMBIENT_environ :: Pointer
: An array ofKEY=VALUE
pairs representing each environment variable. This has the same contents as theenvp
argument to themain()
function. (See the "main()
Arguments" section.)
Crucible syntax files may optionally contain functions starting with @test_
that use assert!
to test the behavior of an override. Under normal
operation the verifier ignores these test functions, but when run with the
test-overrides
subcommand the verifier will execute any test functions it
finds and report test results. The test-overrides
subcommand has two
mandatory options:
--overrides
must point to the directory containing crucible syntax overrides.--abi
must be eitherX86_64Linux
orAArch32Linux
. This flag sets the ABI to use when interpreting crucible syntax overrides. For example, using theX86_64Linux
will cause the verifier to execute function override tests using the X86_64Bitvector
type aliases.
There are various pre-set configurations for controlling the behavior of how
memory is represented during simulation. These configurations can be selected
with the --memory-model MODEL-NAME
flag, where MODEL-NAME
must be one
of the following:
default
: The default memory model. Each call tomalloc
orcalloc
allocates a single, contiguous region of memory on the verifier's heap model.bump-allocator
: There is a single allocation representing the entire heap, and there is a global variable pointing to the start of unused memory in the heap. Each call tomalloc
orcalloc
will hand out some unused memory from the heap and bump the global variable accordingly.
Note that the choice of allocator may affect the number of goals that the verifier produces. As a result, running the verifier on a program with one allocator may produce failing goals that would not arise with a different choice of allocator.
By default, the verifier begins simulating binaries at the main()
function
rather than _start()
. This is because the implementations of _start()
in glibc
(related issue) and musl
(related issue)
often give the verifier trouble. Beginning at main()
is usually an effective
workaround, but this comes at the expense of skipping initialization-related
code in _start()
, which may be important for certain binaries. If you are
brave start simulation at a different entry point, you can do so with the
--entry-point-name <function-name>
option.
Note that the verifier is only able to discover a main()
function is the
binary contains the relevant function symbol. If a binary is stripped, however,
then the verifier will not be able to discover the main
symbol and will
give up as a result. To work around this problem, one can manally specify the
address of the entry point function (be in main()
or otherwise) with the
--entry-point-addr <function-address>
option.
If main()
is the entry point function for your binary, you may want to
supply arguments or environment variables to it. These can be specified with
the following command-line arguments:
--argv0 <ARG>
specifies the name of the first command-line argument to pass to the process (i.e.,argv[0]
). Usually, this is the name of the binary itself, so if--argv0
is not specified, its value will default to the--binary
path.--argument <ARG>
specifies a command-line argument to pass to the process. In other words, these specify all of the elements of theargv
array besides the first one. This can be supplied multiple times to pass multiple arguments (e.g.,--argument foo --argument bar ...
.--env-var <KEY>=<VALUE>
defines an environment variable namedKEY
with the concrete valueVALUE
for the duration of the process. In other words, these specify elements of theenvp
array. This can be supplied multiple times to define multiple environment variables.--env-var-from-bytes <KEY>[FILE]
defines an environment variable namedKEY
with a concrete value consisting of the bytes inFILE
's contents. This is like--env-var
, but--env-var
assumes a particular text encoding for its value (by virtue of being supplied directly on the command line), whereas--env-var-from-bytes
does not assume any text encoding.--env-var-symbolic <KEY>[LEN]
defines an environment variable namedKEY
with a value containingLEN
symbolic characters (plus a null terminator) for the duration of the process. Like--env-var
, this is a way to specify elements of theenvp
array. This can be supplied multiple times to define multiple symbolic environment variables.Note that unlike
argv
, the order of environment variables in theenvp
array is not specified. For example, passing--env-var FOO=f --env-var BAR=g
does not guarantee thatFOO=f
will appear beforeBAR=g
.If an environment variable is redefined on the command-line (e.g.,
--env-var TWEEDLE=dee --env-ar TWEEDLE=dum
), it is not specified which valueTWEEDLE
will have inenvp
. As a result, we don't recommend redefining environment variables like this.
The verifier fully supports statically linked libraries and has partial support
for dynamically linked libraries. In order to simulate a dynamically linked
library, it is required to put all of the shared libraries into a single
directory and pass the --shared-objects <lib-dir>
option to the verifier.
Be aware of the following limitations in the verifier's support for dynamically linked libraries:
- The verifier makes certain assumptions about the layout of PLT stubs that do
not hold for binaries compiled with
-fcf-protection
, which is now the default for many versions of GCC (e.g., the one provided on recent versions of Ubuntu). See this issue. To avoid any issues, it is recommended that you compile binaries with-fcf-protection=none
. - The verifier currently only supports a small number of dynamic relocations and will abort if it encounters a relocation that it doesn't support. See `this issue <https://gitlab-ext.galois.com/ambient/verifier/-/issues/93`_. If you encounter an unsupported relocation, please file an issue.