Re-enable coverage #16913

tokenrove · 2023-01-03T18:31:10Z

This has been languishing for a bit, and the "hard part" is figuring out exactly what llvm-cov invocations we want to best present the information, what tests should run in nightly to get useful coverage information, and so on. So I'm going to punt on these bits and aim to get the rest merged, which enables using mzcompose with --coverage and does demonstrate adding the fast sqllogictests with coverage to nightly, which is useful as a starting point to experiment with that coverage data. We can iterate on the rest.

Motivation

This PR adds a known-desirable feature, see MaterializeInc/database-issues#4781.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
This PR includes the following user-facing behavior changes:

def- · 2023-02-13T13:49:09Z

ci/nightly/coverage.py

+            [
+                "rust-cov",
+                "export",
+                "--format=lcov",


Any reason to export as lcov?

I'm in the middle of experimenting with a bunch of ways of presenting the output. Originally I exported the JSON, which no tools seem to consume at all... so at least lcov output can be consumed by lcov's genhtml. Which I know is more useful than llvm-cov show's own HTML format which is completely unusable because it produces a giant, multi-hundred-megabyte HTML file. Suggestions appreciated though!

def- · 2023-02-13T13:49:51Z

ci/nightly/coverage.py

+            "rust-profdata",
+            "merge",
+            "-sparse",
+            *Path("test/sqllogictest/coverage").glob("*.profraw"),


Are we sure each profraw file has a unique name? environmentd and clusterd might be colliding with same name.

Yes, that's the problem with the environment variable set in mzcompose; FYI, reviewing this right now is probably not very useful since I am working on it. Or you could take it over as it is if you want?

Nono, just wrote down my thoughts as I read it out of interest, not a real review.

def- · 2023-02-13T13:52:01Z

I have a whole bunch of issues open regarding llvm-cov, but that was with C++, I'd be interested to see a coverage report and if things work well with Rust. I basically reimplemented llvm-cov show last time around based on the json export to show coverage as inline reviews on a PR.

def- · 2023-02-13T13:54:17Z

.gitignore

@@ -29,6 +29,7 @@ junit_*.xml
 perf.data*
 **/*.rej
 **/*.orig
+**/*.profraw


Can also add .profdata

def- · 2023-03-22T15:10:17Z

@tokenrove Any update on this? I might try working on PR code coverage soon, and would take this PR as the basis for my work.

tokenrove · 2023-03-22T15:35:16Z

Yes, I was going to suggest that I get rid of the commits that add this to nightly, and just get the other commits merged so mzcompose can be invoked with --coverage, and then you can take the rest.

def- · 2023-03-22T15:38:53Z

Alright, that would be fine.

tokenrove · 2023-03-22T20:06:17Z

I haven't re-run the nightly so I haven't checked if this still does the right thing, but I'll do that now.

instrument-coverage has been stable since Rust 1.60. I am removing the unresolved symbols warning hack for Nix because, like the commentors who closed the reference issues, it's unclear to me what would cause this, and I'd rather resolve it by fixing the actual issue. Looking at related issues like rust-lang/rust#92769, it's probably an artifact of the implementation in nightly, which hopefully is fixed now that it's stable.

Necessary to run rust-profdata and rust-cov for working with coverage data.

Without this, we end up mutating the same config again, which then causes docker compose to fail with: services.sqllogictest.environment array items[1,2] must be unique

def-

This shouldn't affect anything except coverage, which is unused currently anyway. So fine by my to submit, then I'll start on https://github.com/MaterializeInc/materialize/issues/18340

The goal here is to avoid sqllogictest and clusterd from overwriting each-other's profraw outputs. %p is by PID, which one would expect to be sufficient, and %Nm allows up to 9 concurrent profiles for the same build-ID; perhaps the two together is overkill, but this seems like a reasonable starting point.

We can now use the stable Rust toolchain and hopefully this no longer suffers from the aforementioned OOMs. Just fast sqllogictests for now, with coverage in the raw JSON format until we figure out what we want to do with it.

So that one can conveniently locally run environmentd or sqllogictest with coverage.

tokenrove · 2023-03-23T13:25:32Z

Just to note: we don't get accurate results for clusterd currently because it gets killed by SIGTERM and libc's atexit handlers don't run, which would normally invoke __llvm_profile_write_file. Specifying %c in the LLVM_PROFILE_FILE variable should fix this, but seems to cause some other problem and profiling doesn't get written at all.

def- · 2023-03-23T13:44:38Z

Based on my experience %c is only functional on macOS, not on Linux. I'll try to fix this by handling SIGTERM and writing the profile file.

tokenrove · 2023-03-23T13:48:57Z

Based on my experience %c is only functional on macOS, not on Linux.

Oof, good to know.

handling SIGTERM and writing the profile file.

I considered this but signal handling in Rust is still a disaster. It should be sufficient to catch SIGTERM and then make sure the program exits so that atexit's handlers run.

I had hopes that rust-lang/rust#102900 would fix this since I get a ton of LLVM Profile Error: __llvm_profile_counter_bias is undefined messages when using %c, but alas it does not.

def- · 2023-03-23T14:03:04Z

The %c problems are independent of Rust, see https://clang.llvm.org/docs/SourceBasedCodeCoverage.html

“%c” expands out to nothing, but enables a mode in which profile counter updates are continuously synced to a file. This means that if the instrumented program crashes, or is killed by a signal, perfect coverage information can still be recovered. Continuous mode does not support value profiling for PGO, and is only supported on Darwin at the moment. Support for Linux may be mostly complete but requires testing, and support for Windows may require more extensive changes: please get involved if you are interested in porting this feature.

I tried enabling the Linux support for %c, but the results were nonsensical. I didn't spend more effort on getting it working.

tokenrove · 2023-03-23T14:51:20Z

Right, but this error message is pertinent, as it prevents any profiling from happening: https://github.com/llvm/llvm-project/blob/0eabf59528f3c3f64923900cae740d9f26c45ae8/compiler-rt/lib/profile/InstrProfilingFile.c#L574

Maybe we need to add the -runtime-counter-relocation option to LLVM.

In fact, this helps:

diff --git a/misc/python/materialize/cli/run.py b/misc/python/materialize/cli/run.py
index 964c2e1ac..5069dca69 100644
--- a/misc/python/materialize/cli/run.py
+++ b/misc/python/materialize/cli/run.py
@@ -268,7 +268,9 @@ def _build(args: argparse.Namespace, extra_programs: list[str] = []) -> int:
         features += ["tokio-console"]
         env["RUSTFLAGS"] = env.get("RUSTFLAGS", "") + " --cfg=tokio_unstable"
     if args.coverage:
-        env["RUSTFLAGS"] = env.get("RUSTFLAGS", "") + " -Cinstrument-coverage"
+        env["RUSTFLAGS"] = (env.get("RUSTFLAGS", "")
+                            + " -Cinstrument-coverage"
+                            + " -Cllvm-args=-runtime-counter-relocation")
     if args.features:
         features.extend(args.features.split(","))
     if features:
diff --git a/misc/python/materialize/mzbuild.py b/misc/python/materialize/mzbuild.py
index 08f6f0cda..c23c9b60d 100644
--- a/misc/python/materialize/mzbuild.py
+++ b/misc/python/materialize/mzbuild.py
@@ -232,6 +232,7 @@ class CargoBuild(CargoPreImage):
         if rd.coverage:
             self.rustflags += [
                 "-Cinstrument-coverage",
+                "-Cllvm-args=-runtime-counter-relocation",
             ]
         if len(self.bins) == 0 and len(self.examples) == 0:
             raise ValueError("mzbuild config is missing pre-build target")

I'll open a followup PR to add this though perhaps it's a rustc bug that this doesn't get supplied.

def- · 2023-03-23T15:21:22Z

I mean I changed the LLVM/compiler-rt source code to enable bias variables on Linux, but they didn't work then.

tokenrove force-pushed the julian/reenable-coverage branch 3 times, most recently from b4fb7f6 to 843e26d Compare January 11, 2023 16:43

tokenrove force-pushed the julian/reenable-coverage branch from 843e26d to 93210e5 Compare January 16, 2023 15:28

tokenrove force-pushed the julian/reenable-coverage branch 7 times, most recently from 38b1b3a to 0e3ca6a Compare February 13, 2023 13:11

def- reviewed Feb 13, 2023

View reviewed changes

tokenrove force-pushed the julian/reenable-coverage branch from 0e3ca6a to 9591786 Compare February 13, 2023 13:55

tokenrove force-pushed the julian/reenable-coverage branch from 9591786 to ba58d69 Compare March 22, 2023 18:01

tokenrove changed the title ~~WIP: Re-enable coverage~~ Re-enable coverage Mar 22, 2023

tokenrove marked this pull request as ready for review March 22, 2023 20:05

tokenrove requested a review from a team as a code owner March 22, 2023 20:05

tokenrove added 4 commits March 22, 2023 17:47

Ignore profiling files generated by -Cinstrument-coverage

b6f5599

ci: install llvm-tools-preview in the ci-builder image

40747b3

Necessary to run rust-profdata and rust-cov for working with coverage data.

mzcompose: don't duplicate coverage configuration

6284a9b

Without this, we end up mutating the same config again, which then causes docker compose to fail with: services.sqllogictest.environment array items[1,2] must be unique

tokenrove force-pushed the julian/reenable-coverage branch from ba58d69 to dce2ab2 Compare March 22, 2023 20:17

def- approved these changes Mar 22, 2023

View reviewed changes

tokenrove added 3 commits March 23, 2023 09:03

ci: re-enable coverage in nightly

cccca25

We can now use the stable Rust toolchain and hopefully this no longer suffers from the aforementioned OOMs. Just fast sqllogictests for now, with coverage in the raw JSON format until we figure out what we want to do with it.

bin/run: add a --coverage argument

402003b

So that one can conveniently locally run environmentd or sqllogictest with coverage.

tokenrove force-pushed the julian/reenable-coverage branch from 4d741e3 to 402003b Compare March 23, 2023 11:37

tokenrove merged commit 6f60699 into MaterializeInc:main Mar 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-enable coverage #16913

Re-enable coverage #16913

tokenrove commented Jan 3, 2023 •

edited

Loading

def- Feb 13, 2023

tokenrove Feb 13, 2023

def- Feb 13, 2023

tokenrove Feb 13, 2023

def- Feb 13, 2023

def- commented Feb 13, 2023

def- Feb 13, 2023

def- commented Mar 22, 2023

tokenrove commented Mar 22, 2023

def- commented Mar 22, 2023

tokenrove commented Mar 22, 2023

def- left a comment

tokenrove commented Mar 23, 2023

def- commented Mar 23, 2023

tokenrove commented Mar 23, 2023

def- commented Mar 23, 2023 •

edited

Loading

tokenrove commented Mar 23, 2023

def- commented Mar 23, 2023

Re-enable coverage #16913

Re-enable coverage #16913

Conversation

tokenrove commented Jan 3, 2023 • edited Loading

Motivation

Checklist

def- Feb 13, 2023

Choose a reason for hiding this comment

tokenrove Feb 13, 2023

Choose a reason for hiding this comment

def- Feb 13, 2023

Choose a reason for hiding this comment

tokenrove Feb 13, 2023

Choose a reason for hiding this comment

def- Feb 13, 2023

Choose a reason for hiding this comment

def- commented Feb 13, 2023

def- Feb 13, 2023

Choose a reason for hiding this comment

def- commented Mar 22, 2023

tokenrove commented Mar 22, 2023

def- commented Mar 22, 2023

tokenrove commented Mar 22, 2023

def- left a comment

Choose a reason for hiding this comment

tokenrove commented Mar 23, 2023

def- commented Mar 23, 2023

tokenrove commented Mar 23, 2023

def- commented Mar 23, 2023 • edited Loading

tokenrove commented Mar 23, 2023

def- commented Mar 23, 2023

tokenrove commented Jan 3, 2023 •

edited

Loading

def- commented Mar 23, 2023 •

edited

Loading