-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: Make envoy_select_quiche no-op. #6393
Conversation
Signed-off-by: Bin Wu <[email protected]>
1) add "bazel clean" for bazel.release, at the end of bazel_release_binary_build(). 2) add "allow-multiple-definition" for coverage and clang_tidy. Signed-off-by: Bin Wu <[email protected]>
Signed-off-by: Bin Wu <[email protected]>
/retest |
🙀 failed invoking rebuild of |
/retest |
🙀 failed invoking rebuild of |
Signed-off-by: Bin Wu <[email protected]>
Signed-off-by: Bin Wu <[email protected]>
…erage.sh. Signed-off-by: Bin Wu <[email protected]>
/retest |
🙀 failed invoking rebuild of |
…i, directory. Signed-off-by: Bin Wu <[email protected]>
/retest |
🔨 rebuilding |
Signed-off-by: Bin Wu <[email protected]>
@lizan @htuch While working on this, the coverage ci(example) keeps failing at some EXPECT_DEATH statements(example), the symptom is that the test does die at the EXPECT_DEATH, but does not emit the expected error message before it dies, e.g. test/extensions/quic_listeners/quiche/platform/quic_platform_test.cc:381: Failure I noticed run_envoy_bazel_coverage.sh sets "-c dbg --copt=-DNDEBUG" in the bazel test command, could it be related to the test failure? |
@wu-bin I'm a bit confused by this PR. Why are there QUICHE test changes as well? Is this related? |
Yes, QUICHE test is related. The test has 2 classes of failures when running in "coverage" ci:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't see why CHECK
would behave differently in coverage mode only, that's pretty weird. The main difference of coverage builds is we are using gcc
and we compile all tests into a single test binary. I had a look at the CHECK
implementation and it seems legit.
Signed-off-by: Bin Wu <[email protected]>
Signed-off-by: Bin Wu <[email protected]>
/retest |
🔨 rebuilding |
Signed-off-by: Bin Wu <[email protected]>
Signed-off-by: Bin Wu <[email protected]>
/retest |
🔨 rebuilding |
@htuch The latest coverage run actually went ok, it's "failed" because the code coverage is 0.1% lower than the threshold(see below). Is it ok if I lower the threshold to 97%? sent 100,311,982 bytes received 19,076 bytes 200,662,116.00 bytes/sec |
@wu-bin we can't lower it; the only reason we ever do that is for structural reasons (e.g. switching coverage tools or toolchains). Why do you think this resulted in such as decrease? I think we have a more serious situation though with the state of Envoy coverage. @envoyproxy/maintainers do we need to send out an all-hands to get folks to try and improve coverage again? I'm wondering if there is b/w to go fix some of the worst offending code. |
I will send an email and ask folks to look at the master coverage report to see if they recently added changes that are missing coverage. |
I guess the reason for the slight drop of coverage is that, some quiche platform impl code are included in the report, but their tests are not under //test, instead they are tested by the (exernal) QUICHE code I believe quiche/platform/*.cc was not part of the report before this PR. Does it count as a structural reason?
|
@wu-bin I think you have 3 options:
If (2) can be made to work, that would be ideal, otherwise I would argue for (1) as being slightly more preferable to (3), but (3) is probably OK given the nature of this project and integration. At least IMHO, other maintainers might differ. |
Signed-off-by: Bin Wu <[email protected]>
/retest |
🔨 rebuilding |
@htuch 2 is done. See the change in test/coverage/gen_build.sh. The PR is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a few comments and we can ship it. Thanks.
@@ -54,6 +54,9 @@ function bazel_release_binary_build() { | |||
cp -f "${ENVOY_DELIVERY_DIR}"/envoy "${ENVOY_SRCDIR}"/build_release | |||
mkdir -p "${ENVOY_SRCDIR}"/build_release_stripped | |||
strip "${ENVOY_DELIVERY_DIR}"/envoy -o "${ENVOY_SRCDIR}"/build_release_stripped/envoy | |||
# TODO(wu-bin): Remove once https://github.com/envoyproxy/envoy/pull/6229 is merged. | |||
bazel clean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to clean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What appears to be happening is that the "bazel build" in this function and the "bazel build" after this function created same object files in different directories, once in bazel-bin/external/envoy/source/..., and once in bazel-bin/source/...
When the second "bazel build" runs, both instances of object files are given into the linker, causing "multiple definition" errors. See this link for the actual failure message.
The bazel clean hack here removes one copy of the objects, thus avoids the "multiple definition" error. The hack won't be needed after the ci WORKSPACE is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm completely stuck on that one as the coverage is hard to debug (and not reproducible locally very well), will try something tonight but no ETA for now. The coverage failure seems unrelated to the change though. It would be nice if you can take a look (pushed latest master to the PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lizan The latest coverage run from your PR failed because some GRPC related test failed. Those tests seems to be flaky, since I've seen the same failures while debugging my PR. I've just started a retest.
@htuch #6229 was stuck on coverage failure, which is likely fixed in this PR, since my change in do_ci.sh's "bazel.coverage" section is very similar to that one. I think we should get this PR in first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the coverage test failure in #6229 is not a flake, I have retested it 3 times, the following test fails every time:
SslIpVersionsClientType/GrpcSslClientIntegrationTest.BasicSslRequestWithClientCert/IPv4_EnvoyGrpc
@wu-bin hmm, I just had a look a the CI logs. We've gone from ~400 Envoy tests to 11961 tests in total. That seems a scary increase :) CI took 1h 41m, vs. a recent run in another PR of 1h 30m. If we could just filter down to the platform tests, maybe this wouldn't be so bad. |
Impressive growth:) All the current QUICHE tests are platform tests, I'll discuss with @danzh2010 and @mpwarres on how to manage the non-platform tests. |
Signed-off-by: Bin Wu <[email protected]>
Oh sorry I might misunderstood you. To clarify, the 11K tests are mostly Envoy proper tests, QUICHE platform currently adds about 11961(from this pr)-11889(from a test of master)=72 tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I think I was looking at a non-coverage run which didn't aggregate before or something. LGTM once CI passes. We are going into a merge freeze in 45 mins until tomorrow afternoon for the 1.9.1 security release fix FYI.
* master: (137 commits) test: router upstream log to v2 config stubs (envoyproxy#6499) remove idle timeout validation (envoyproxy#6500) build: Change namespace of chromium_url. (envoyproxy#6506) coverage: exclude chromium_url (envoyproxy#6498) fix(tracing): allow 256 chars in path tag (envoyproxy#6492) Common: Introduce StopAllIteration filter status for decoding and encoding filters (envoyproxy#5954) build: update PGV url (envoyproxy#6495) subset lb: avoid partitioning host lists on worker threads (envoyproxy#6302) ci: Make envoy_select_quiche no-op. (envoyproxy#6393) watcher: notify when watched files are modified (envoyproxy#6215) stat: Add counterFromStatName(), gaugeFromStatName(), and histogramFromStatName() (envoyproxy#6475) bump to 1.11.0-dev (envoyproxy#6490) release: bump to 1.10.0 (envoyproxy#6489) hcm: path normalization. (#1) build: import manually minified Chrome URL lib. (envoyproxy#3) codec: reject embedded NUL in headers. (envoyproxy#2) Added veryfication if path contains query params and add them to path header (envoyproxy#6466) redis: basic integration test for redis_proxy (envoyproxy#6450) stats: report sample count as an integer to prevent loss of precision (envoyproxy#6274) Added VHDS protobuf message and updated RouteConfig to include it. (envoyproxy#6418) ... Signed-off-by: Michael Puncel <[email protected]>
Description:
Remove envoy_select_quiche from envoy_build_system.bzl.
Risk Level: none. build only.
Testing:
bazel test --test_output=all test/extensions/quic_listeners/quiche/platform:all @com_googlesource_quiche//:all
bazel test --test_output=all --define quiche=enabled test/extensions/quic_listeners/quiche/platform:all @com_googlesource_quiche//:all
Docs Changes: none
Release Notes: none