Skip to content

Latest commit

 

History

History
293 lines (254 loc) · 17.7 KB

Tour.md

File metadata and controls

293 lines (254 loc) · 17.7 KB

Let’s understand static analysis of Go code: A guided tour of “mingo”

“Static analysis” is what a program does when it parses some other program’s source code in order to transform, report on, or operate on it in some way. A simple example is turning the include/import/require statements in a collection of files into a dependency graph. A more complex example is a linter, which looks for style and correctness problems. Your IDE does static analysis when it helps you jump to the definition of some identifier. And of course the compiler statically analyzes source code in order to turn it into an executable binary.

The Go programming language was designed in part to be easy to parse and analyze, for the sake of fast compilation, and to simplify the creation of all manner of development tooling. To that end, its standard library includes some relevant packages, such as go/ast (for representing a program’s abstract syntax tree) and go/types (for supplementing such a tree with type information). There are additional static-analysis tools in the golang.org/x/tools module — about which, more below.

Mingo” is a Go static-analysis tool that I created recently in order to answer the question, “What is the oldest version of Go that can compile my code?” The answer to that question is what belongs in the go directive in a Go program’s go.mod file, but it has historically been challenging to know what to put there.

At a high level, mingo works by parsing a Go program, then walking its syntax tree looking for three things:

  1. Language constructs introduced at a specific version of Go. For example, if a program contains a for … range statement with no variable assignment, it needs at least Go 1.4. If a bit-shift expression uses an signed integer on the right-hand side, it needs at least Go 1.13.
  2. Identifiers added to the Go standard library at specific versions. For example, if the program refers to bufio.ErrFinalToken, it needs at least Go 1.6. If it calls bytes.Clone, it needs at least Go 1.20.
  3. The minimum Go version declared by the dependencies in go.mod.

Each time it finds one of these, it may bump up its idea of the minimum Go version needed.

Mingo refers to this process as a “scan,” and the behavior of the scan is controlled by a Scanner, defined here. The caller configures it with the desired settings and then calls its ScanDir method. The result of that is a Result, which can report the lowest minor version of Go needed for the scanned code (e.g., the 18 in “Go 1.18”) and a string explaining what feature of that version made it necessary (e.g., “generic function type”).

Scanner.ScanDir uses packages.Load to turn a tree of Go files into a sequence of packages.Package objects. Each encapsulates a lot of information about each Go package encountered, including syntax trees and types.

This list of packages is passed to Scanner.ScanPackages, which loops over them and calls Scanner.scanPackage on each one (stopping early if it ever discovers that the maximum known version of Go is needed).

This ends up creating a pkgScanner object containing the Scanner itself and information from the packages.Package object. That is then used to continue the scan by calling pkgScanner.file for each file in the package (again stopping early if the max-known-Go-version condition is reached).

Here’s where we can delve a bit into the types from the go/ast package. The syntax trees of the package are a sequence of ast.File objects, each of which contains a Decls field, a list of top-level declarations (among a few other pieces of information). For mingo’s purposes, pkgScanner.file needs only to iterate over those declarations and scan those.

The type ast.Decl is an interface, implemented by these concrete types:

(How do I know it’s those and no others? Decl uses the Go trick of requiring implementations to define an unexported no-op method − here it’s declNode − and these three types are the only ones in $GOROOT/src/go/ast that define it.)

We don’t care about BadDecl, which is a placeholder for declarations that couldn’t be parsed properly. If there is any unparseable code, mingo will already have returned an error (here). So pkgScanner.decl does a type switch, calling pkgScanner.funcDecl for function declarations and pkgScanner.genDecl for generalized declarations (types, variables, constants, and imports).

So far, this tree walk has taken us from containers to the things they contain: from a directory to a set of packages; from a package to a set of files; from a file to a set of declarations. Now at last we’re getting to where some actual computation happens.

A function declaration consists of an optional doc comment, an optional “receiver” (if the function is a method), a name, a “signature” (the names and types of any parameters and results), and a body. The receiver, the parameters, the results, and the body we descend as before (with calls to pkgScanner.fieldList and pkgScanner.funcBody), but now we also get to do our first check for a version-specific feature: generic type parameters. If we see any, we know this code requires Go 1.18 or later. The call to p.greater updates the Result in the Scanner (if 18 is greater than what’s already there) and returns the result of an isMax check that might allow us to quit early.

(How do we know this change requires Go 1.18? By reading the “Changes to the language” section of each Go version’s release notes. Here is the one for Go 1.18.)

Let’s drill down into pkgScanner.funcBody, which checks the list of statements that constitute a function body, and then checks that there’s a final return statement. (If there isn’t, then Go 1.1 or later is required.)

The types ast.Stmt, ast.Decl, and ast.Expr (which we’ll see in a moment), form the core of the ast package. Stmt is an interface, and it’s implemented by a wide assortment of concrete types, representing variable assignments, for loops, conditionals, and every other kind of statement in the Go language. Mingo inspects each statement with pkgScanner.stmt, which is just a big type switch for dispatching to functions for each concrete statement type. Let’s zoom in to one of those concrete types, ast.AssignStmt, handled by pkgScanner.assignStmt.

An assignment statement is a sequence of left-hand expressions (the variables, struct fields, or other lvalues being assigned to), a sequence of right-hand expressions (the values to assign to them), and an assignment operator. This is one of the token.Token constants from the go/token package, representing operators like =, :=, +=, and so on.

Scanning an assignment statement involves descending into the expressions it contains, of course, but there is some additional logic for checking whether this is a bit-shift assignment (operator <<= or >>=) and, if it is, whether its right-hand side is a signed rather than an unsigned integer. In that case, Go 1.13 is required.

How is that check done? By looking up the ast.Expr on the right-hand side of the assignment in the Types map of the types.Info contained in the pkgScanner. The result of the lookup is a types.TypeAndValue. This contains at least the type, and for constant expressions also the value, of expressions in the abstract syntax tree.

The Type in a TypeAndValue is another interface, representing categories of Go type. One implementation of Type is types.Array. Another is types.Map. And so on. The one we’re interested in for now is types.Basic, which represents booleans, numbers, and strings. To test whether that right-hand-side expression of the bit-shift-assignment statement is signed, we check whether it’s a *types.Basic and then whether its types.Integer flag is set and its types.IsUnsigned flag is unset.

Let’s now take a closer look at the last of our major interface types, ast.Expr, which represents every kind of expression in Go: identifiers, pointer dereferences, multiplications, and also types that are spelled out in the code. This interface is implemented by this collection of concrete types.

When pkgScanner.assignStmt needs to descend into the left-hand and right-hand expressions, it does so by calling pkgScanner.expr on each of them. This leads to another big type switch that dispatches to another set of type-specific pkgScanner methods. By now the techniques for walking these data types should be familiar, but there’s still one important thing we haven’t seen, so let’s focus on the expression type ast.Ident, handled in pkgScanner.ident.

First the function checks to see whether the identifier is any, and is being used as a type, and is predefined. If so, the code requires Go 1.18.

Otherwise it’s time to see what object this identifier is a use of, by consulting the Uses map of the types.Info contained in the pkgScanner. This maps the identifier to the thing it’s being used to denote. That thing − a type, a function, a constant, etc. − is represented by a types.Object, and from it we can get the path of the package it lives in.

Now we can use the package’s path and the identifier to do an API history lookup. This is where we check to see whether the code is referring to something from the standard library that was introduced at some particular version of Go. But how?

One thing we glossed over, back at the beginning of Scanner.ScanPackages, was this call to Scanner.ensureHistory, which ensures that the scanner’s API history information (this field) is populated. That happens here. It parses the files in the api directory that ships with Go (a snapshot of which is embedded into mingo itself). Each file describes the new identifiers added to the standard library at a particular version of Go. Here, for example, is the addition of the context package in Go 1.7.

Armed with that history, looking up an identifier in a particular package path is relatively simple, and happens here.

Back in pkgScanner.ident, if we get a hit from this history lookup, we update the required Go version using the value from the lookup.

That takes care of detecting language changes and standard-library changes. Of course there are a lot of other such cases that mingo covers, but all of those are handled in one or another of the ways described here. There’s one more way in which the minimum-required version of Go can increase, though, and that’s if it depends on a module that declares a higher minimum Go version. This is handled in Scanner.scanDeps.

That function parses the go.mod file (here) to discover direct dependencies. (Indirect ones are discarded here.) Each of those is then individually scanned by Scanner.scanDep.

The first thing that Scanner.scanDep does is to get a “download” of information about the dependency. This is the result of running go mod download -json, which happens here. (The extra logic delegating that job to a depScanner allows a mock dependency scanner to be injected for testing.) It is then possible to parse the dependency’s go.mod file to discover the version in its go directive, which mingo accepts on faith (rather than scanning all the code in the dependency too).

That’s everything about how mingo does its job. But there’s still one more topic to cover: the Analyzer API.

The golang.org/x/tools module defines a framework for Go static-analysis tools. To participate in this framework, an analyzer (like mingo) must present itself as an analysis.Analyzer. Doing so allows it to participate in command-line tools based on singlechecker and multichecker. (Which isn’t quite good enough for many purposes, which is why, as of this writing, there is work in progress to improve its API.)

Mingo includes an adapter for turning a Scanner into an analysis.Analyzer, here.

To continue learning about static analysis in Go, I recommended studying the code in golang.org/x/tools/go/analysis/passes, each subdirectory of which implements a different analyzer. For example, appends looks for misuse of the builtin append function, and defers looks for the common mistake of passing the result of a (non-deferred) time.Since call to a deferred function.