Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Granular nix #15848

Merged
merged 11 commits into from
Sep 12, 2024
Merged

Granular nix #15848

merged 11 commits into from
Sep 12, 2024

Conversation

georgeee
Copy link
Member

@georgeee georgeee commented Jul 21, 2024

This PR is ought to be rebased after the following PRs are merged: #15510

This PR updates flake.nix, nix/ocaml.nix to integrate o1-labs/dune-nix library into the repository.

This adds a family of derivations that allow:

  1. Building any Mina library/executable in isolation from rest of the code
  2. Running unit tests
  3. Extract useful information about internal dependencies of Mina's Ocaml packages

Crucially, it allows Mina to be built package-by-package, caching build/test results for code that didn't change. This brings CI run (similar to buildkite/scripts/unit-test.sh) to be shortened from ~40 minutes to something between 2 minutes and 17 minutes (depending on how deep the changes are).

Build methodology

Dune files are analyzed by the o1-labs/describe-dune tool. Libraries, executables, tests and generated files, along with their mutual dependencies (as present by dune files) are extracted.

After that, dependencies between all of the units (libraries, executables, tests) and files are determined. Then a greedy algorithm attempts to "upgrade" each library/executable dependency to a package dependency. It ends with an error if it fails. But if it succeeds, it comes up with a dependency tree (that can be plotted with nix build mina#info.deps-graph) that allows compilation of project's Ocaml code to be executed package-by-package.

Then packages are compiled one by one using dune, with dependencies of a package built before it and then provided to the package via OCAMLPATH environment variable. Code of each package is isolated from its dependencies and packages that depend on it.

New nix derivations

There is a bunch of new derivations available using nix flake for Mina repository.
In subsections there are some examples and the full list of new derivations.

A note on building process and treatment of packages. All of the code is build on a package level. That is, for compilation units (executables, libraries, tests) that are assigned to a package, they are compiled with dune build --only-packages <package>. Any of dependencies are pre-built the same way and are provided to dune via OCAMLPATH.

For compilation units that have no package defined, a synthetic package name is generated. Path to dune directory with these units is quoted by replacing . with __ and / with - in the package path, and also prepending and appending the resulting string with __. E.g. for path src/test/command_line_tests a synthetic package __src-test-command_line_tests__ is generated. Such synthetic packages are built with dune build <path-to-dune-directory> (isolation is ensured by filtering out all of irrelevant Ocaml sources).

Examples

CLI commands below assume to be executed from a directory with Mina repository checked out and ./nix/pin.sh executed.

Description Command
Build all Ocaml code, run same tests as in unit-test.sh nix build mina#granular
Build all Ocaml code and run all unit tests nix build mina#all-tested
Build all Ocaml code without running tests nix build mina#all
Build mina_lib package nix build mina#pkgs.mina_lib
Build mina_net2 package and run its tests nix build mina#tested.mina_net2
Build validate_keypair executable nix build mina#exes.validate_keypair
Run tests from src/lib/staged_ledger/test nix build mina#tested.__src-lib-staged_ledger-test__
Plot dependencies of package mina_lib nix build mina#info.deps-graphs.mina_lib
Plot dependency graph of Mina repository nix build mina#info.deps-graph
Extract json description of dependencies nix build mina#info.deps

Dependency description generated via nix build mina#info.deps --out-link deps.json is useful for investigation of depencies in a semi-automated way. E.g. to check which executables implicitly depend on mina_lib, just run the following jq command:

$ jq '[.units | to_entries | .[] | { key: .key, value: [ .value.exe? | to_entries? | .[]? | select(.value.pkgs? | contains(["mina_lib"])?) | .key ] } | select (.value != []) | .key ]' <deps.json
[
  "__src-app-batch_txn_tool__",
  "__src-app-graphql_schema_dump__",
  "__src-app-test_executive__",
  "cli",
  "zkapp_test_transaction"
]

Combined derivations

Derivations that combine all packages: all of the Ocaml code is built, three options vary in which unit tests are executed.

  • #all
    • builds all the Ocaml code discovered in the dune root (libraries, executables, tests)
    • tests aren't executed
  • #granular
    • #all + running all of tests except for tests in src/app (behavior similar to buildkite/scripts/unit-test.sh)
  • #all-tested
    • #all + running all discovered tests
    • discovery of tests is done by finding test stanzas and libraries with inline_tests stanza

Individual compilation units

These allow every package to be compiled/tested in isolation, without requiring all of the other packages to be built (except for dependencies, of course).

  • #pkgs.<package>
    • takes sources of the package and builds it with dune build <package>
    • all library dependencies are provided via OCAMLPATH
    • derivation contains everything from the _build directory
  • #src.pkgs.<package>
    • show sources of a package (and some relevant files, like dune from the parent directory), as filtered for building the #pkgs.<package>
  • #files.<path>
    • build all file rules in the <path> used by stanzas in other directories
    • defined only for generated files that are used outside <path>/dune scope
  • #src.files.<path>
    • source director used for building #files.<path>
  • #tested.<package>
    • same as #pkgs.<package>, but also runs tests for the package
    • note that tests for package's dependencies aren't executed

There are also a few derivations that help build a particular executable. These are merely shortcuts for building a package with an executable and then copying the executable to another directory.

  • #all-exes.<package>.<exe>
    • builds a derivation with a single file bin/<exe> that is executable with name <exe> from package <package>
    • when a public name is available, <exe> stands for executable's public name (and private name otherwise)
  • #exes.<exe>
    • shortcut for #all-exes.<package>.<exe>
    • if <exe> is defined for multiple packages, error is printed
    • if <exe> is defined in a single package <pkg>, it's same as #all-exes.<pkg>.<exe>

Metadata/debug information

  • #info.src
    • mapping from dune directory path dir to metadata related to files outside of dune directory that is needed for compilation
    • in particular the following fields:
      • subdirs, containing list of file paths (relative to the dir) that contain dune files with compilation units
      • file_deps, containing list of file paths from other dirs which should be included into source when compiling units from dir (e.g. some top-level dune files which use env stanza)
      • file_outs, containing a mapping from absolute path of a file generated by some rule stanza of the dune file to type of this file output (for type being one of promote, standard and fallback)
  • #info.exe
    • mapping from executable path (in format like src/app/archive/archive.exe) to an object with fields name and package
    • package field contains name of the package that declares the executable
    • name is either public_name of executable (if defined) or simply name
  • #info.package
    • mapping from package name to an object containing information about all of the units defined by the package
    • object schema is the following:
      { exe | lib | test : { name:  { public_name: string (optional), name: string, src: string, type: exe | lib | test, deps: [string] } }
      
    • this object contains raw data extracted from dune files
    • deps contains opam library names exactly as defined by dune (ppx-related libraries are concatenated to libraries)
  • #info.separated-libs
    • when there is a package-to-package circular dependency, this would be a non-empty object in the format similar to #info.deps containing information about edges in dependency graph that form a cycle
    • useful for debugging when an error Package ${pkg} has separated lib dependency to packages which may occur after future edits to dune files
  • #info.deps
    • mapping from files and units to their dependencies
    • external dependencies (defined outside of repository) are ignored
    • schema:
      { files: { <path> : { exes: { <package> : [<exe>] } } },
        units: { <package> : { exe | lib | test : { <name> : { 
           exes: { <package> : <exe> },
           files: [<dune dir path>],
           libs: { <package> : [<lib name>] },
           pkgs: [ <package> ]
        } } 
      }
      
  • #dune-description
  • #base-libs
    • a derivation that builds an opam repo-like file structure with all of the dependencies from opam.export (plus some extra opam packages for building)

Dependency graphs

  • #info.deps-graph
    • generates a dot graph of all of Mina packages (a huge one)
    • see example (after generating an svg with nix build mina#info.deps-graph --out-link all.dot && dot -Tsvg -o all.svg all.dot)
  • #info.deps-graphs.<package>
    • plots package dependencies for the <package>

Here's example of graph for mina_wire_types package:

Dependencies of mina_wire_types

Note that there are a few details of this graph. Graph generated for a package p displays may omit some of transitive dependencies of a dependency package if they're formed by units on which p has no dependency itself. And dependencies A -> B and B -> C do not always imply A -> C: package B may have a test dependent on package C, but A doesn't depend on that tests, only libraries it uses.

True meaning of this graph is that one can build package by package following edges backwards, building all of the units of a package all at once on each step. Interpretation of a graph for dependency analysis is handy, just it's useful to keep in mind certain details.

Testing

Via CI:

  • Created a branch out of this, merged an unrelated PR
  • Posted a comment !ci-nix-me
  • Observed build completed
  • Checked that the build succeeded

Manually:

  • Changed some code of tests for some packages
  • Ran nix build mina#tested.<pkg>, observed failures
  • Reverted the change, observed build success

Also, to check the caching:

  • Changed some files of packages that are at various depth of the dependency stack
  • Ran nix build mina#granular
  • Confirmed that only necessary packages are rebuilt and have their tests executed
  • Confirmed the total build/test time to vary in relation to depth of a change

Future work

Some work items are left for future:

  • Start checking that generated files with promote propagation are generated the same way they appear in repository
  • Generate Docker images and post their URLs in form of a PR comment after !ci-nix-me call
  • Optimize nix flake "warm-up": rewrite parts of dune-nix from nix to Ocaml (processing is inefficient in the current form)
  • Improve UX on handling build jobs that failed (upload separate logs to the Buildkite job)

Checklist

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? None

@georgeee georgeee requested review from a team, bkase, psteckler, mrmr1993 and nholland94 as code owners July 21, 2024 12:34
@georgeee georgeee force-pushed the georgeee/granular-nix branch from 1106bc1 to d97766c Compare July 21, 2024 13:13
@georgeee georgeee requested a review from a team as a code owner July 26, 2024 10:26
docs/README.md Outdated
@@ -3,3 +3,4 @@
The docs for the Mina Protocol website are published on [docs.minaprotocol.com](https://docs.minaprotocol.com/).

The docs repository is [https://github.com/o1-labs/docs2/](https://github.com/o1-labs/docs2/).
Hey
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: revert the random change

@georgeee georgeee force-pushed the georgeee/granular-nix branch from 69c9b7b to 73917d7 Compare July 31, 2024 21:27
@georgeee georgeee changed the base branch from compatible to georgeee/remove-unnecessary-submodules-mini July 31, 2024 21:27
@georgeee georgeee self-assigned this Aug 15, 2024
@georgeee georgeee force-pushed the georgeee/granular-nix branch 5 times, most recently from 66b2076 to 9e925c5 Compare August 23, 2024 18:01
Base automatically changed from georgeee/remove-unnecessary-submodules-mini to compatible September 3, 2024 17:38
Reverts change introduced in fcbea58.

In develop it isn't used. And it prevents further work in incremental
nix builds of Mina. If a patch will need to be applied to a source tree,
it's recommended to come up with a more involved process where patch
file is given as an argument to nix/ocaml.nix, then split to a number of
separate files (using patchutils) and then every patch is applied in
patch phases of derivations that work on chunks of tree.
Using the o1-labs/dune-nix library allows for "granular" builds.

Every Ocaml package, test, executable etc. are treated by nix as
standalone build units. This allows to use nix's capabilities of
parallelizing builds and caching build results.
@georgeee georgeee force-pushed the georgeee/granular-nix branch from 9e925c5 to 97f63d0 Compare September 3, 2024 20:18
@georgeee
Copy link
Member Author

georgeee commented Sep 3, 2024

!ci-build-me

@georgeee
Copy link
Member Author

georgeee commented Sep 3, 2024

!ci-nix-me

Copy link
Member

@Trivo25 Trivo25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping only product

@georgeee georgeee requested review from volhovm and dkijania and removed request for a team, bkase, mrmr1993, psteckler and nholland94 September 4, 2024 17:44
Copy link
Member

@volhovm volhovm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super great feature!

@@ -85,16 +91,150 @@ let
[ zlib bzip2 gmp openssl libffi ]
++ lib.optional (!(stdenv.isDarwin && stdenv.isAarch64)) jemalloc;

dune-nix = inputs.dune-nix.lib.${pkgs.system};

base-libs = dune-nix.squashOpamNixDeps scope.ocaml.version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when someone changes opam.export? Does dune-nix re-generate some nix parts, or nix reads opam.export every time?

From looking at dune-description and how it's used in duneDescLoaded and info, it seems the latter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's being read in full after every change. However parsing it with dune-description is fast, and result will be cached. So this isn't an issue.

sqlSchemaFiles = with inputs.nix-filter.lib;
filter {
root = ../src/app/archive;
include = [ (matchExt "sql") ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a bit offtop, but assuming these sql files create/drop tables, what happens during DB migrations? Migrations are all outside of scope of nix building, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any migrations as of now. This folder contains SQL schema files which should always be enough for a unit-like testing of archive DB.

commitShort = builtins.substring 0 8 commit;
cmdLineTest = ''
mina --version
mv _build/default/src/test/command_line_tests/command_line_tests.exe tests.exe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for including just this set of tests? (I understand this just checks that the node does not crash immediately? and can recover from a crash?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are very simple tests of CLI interface. I think idea behind this test was to run a comprehensive check of all of the CLI commands, that they do what they are supposed to do.

As of now, it's very basic. Something along the lines of what you said.

It's wrapped in a vmtest because it opens tcp/ip ports which is prohibited within nix environment.

./tests.exe --mina-path mina
'';
in [
(dune-nix.testWithVm { } "mina_net2" [ pkgs.libp2p_helper ])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this testWithVm somehow use the vmOverlays above? It seems that vmOverlays is only passed to granularOverlay.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it doesn't. Every entry in vmOverlays is independent from another.

Most tests we define in vmOverlays use tcp/ip localhost connections within the test, which we can't run explicitly from within nix. With an exception of __src-lib-staged_ledger-test__, which needs a VM because it uses far too many file descriptors, and nix is restrictive about that (VM specifes a specifically set high limit on open file descriptors).

pkgs.__src-lib-ppx_mina-tests__ =
makefileTest "__src-lib-ppx_mina-tests__" super;
pkgs.__src-lib-ppx_version-test__ =
makefileTest "__src-lib-ppx_version-test__" super;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried that these declarations are so explicit. I'd assume there would be a script that auto-generates them, or at least tells if you if a new package you created is not explicitly declared in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly this way. Absolute majority of tests are autodetected. Anything that dune test runs will be discovered automatically.

For some of these tests we need to specify external dependencies. E.g. for mina_lib tests we need libp2p_helper compiled from Go code. With make build we'd inject it into environment for all the packages, in nix wrapping I decided to make it more delicate: make only relevant tests/modules dependent on an external dependency, not every single package. My approach has a benefit of say mina_wire_types tests not requiring libp2p_helper to be compiled/executed.

In this case though it's different. Both ppx_mina and ppx_version have their tests turned off the execution via dune test, because dune test interface wasn't flexible enough (sometimes we want to test that some test code doesn't compile with a ppx, this is not easy to do from within dune test). For this reason tests for these packages are overridden with explicit makefileTest scenario.

@@ -31,7 +31,7 @@
mina_metrics.none
logger.fake
)
(forbidden_libraries node_config)
(forbidden_libraries mina_node_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this renamed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't renamed, just public name will be used instead of private name. Some context: Ocaml packages always have a private name and sometimes have a public name. When two packages are compiled simultaneously from their codes, both private and public names are good to go. This is the case of running dune build src/lib that we currently have in CI.

Nix wrapping introduced in this PR has a different strategy: it compiles every package in isolation with code of other packages not being available. Dependencies are supplied as precompiled units via OCAMLPATH variable. When package is used as a precompiled dependency from path, it only can be referred to by its public name. Hence the need to update this dune file.

@georgeee georgeee merged commit a1163fd into compatible Sep 12, 2024
45 checks passed
@georgeee georgeee deleted the georgeee/granular-nix branch September 12, 2024 14:04
@volhovm
Copy link
Member

volhovm commented Sep 12, 2024

🤩 🤩 🤩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants