ci: Make envoy_select_quiche no-op. #6393

wu-bin · 2019-03-27T00:42:06Z

Description:

Remove envoy_select_quiche from envoy_build_system.bzl.

Risk Level: none. build only.
Testing:

bazel test --test_output=all test/extensions/quic_listeners/quiche/platform:all @com_googlesource_quiche//:all
bazel test --test_output=all --define quiche=enabled test/extensions/quic_listeners/quiche/platform:all @com_googlesource_quiche//:all

Docs Changes: none
Release Notes: none

Signed-off-by: Bin Wu <[email protected]>

1) add "bazel clean" for bazel.release, at the end of bazel_release_binary_build(). 2) add "allow-multiple-definition" for coverage and clang_tidy. Signed-off-by: Bin Wu <[email protected]>

Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-03-28T16:31:09Z

/retest

repokitteh-read-only · 2019-03-28T16:31:14Z

🙀 failed invoking rebuild of ci/circleci: coverage: 500 Internal Server Error

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

wu-bin · 2019-03-28T17:30:08Z

/retest

repokitteh-read-only · 2019-03-28T17:30:12Z

🙀 failed invoking rebuild of ci/circleci: docker: 500 Internal Server Error

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

Signed-off-by: Bin Wu <[email protected]>

…erage.sh. Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-03-28T19:38:35Z

/retest

repokitteh-read-only · 2019-03-28T19:38:39Z

🙀 failed invoking rebuild of ci/circleci: coverage: 500 Internal Server Error

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

…i, directory. Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-03-29T00:47:34Z

/retest

repokitteh-read-only · 2019-03-29T00:47:37Z

🔨 rebuilding ci/circleci: asan (failed build)

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-03-29T04:21:01Z

@lizan @htuch While working on this, the coverage ci (example) keeps failing at some EXPECT_DEATH statements (example). The symptom is that the test does die at the EXPECT_DEATH, but does not emit the expected error message before it dies, e.g.

test/extensions/quic_listeners/quiche/platform/quic_platform_test.cc:381: Failure
Death test: { switch (0) default: if (!(((__builtin_expect(!(false), 0))) && quic::IsLogLevelEnabled(quic::FATAL))) { } else quic::QuicLogEmitter(quic::FATAL).stream() << "CHECK failed: " "false" "." << " Supposed to fail in all modes."; }
Result: died but not with expected error.
Expected: contains regular expression "CHECK failed:.* Supposed to fail in all modes."
Actual msg:

I noticed run_envoy_bazel_coverage.sh sets "-c dbg --copt=-DNDEBUG" in the bazel test command. Could that be related to the test failure?

htuch · 2019-03-29T15:30:50Z

@wu-bin I'm a bit confused by this PR. Why are there QUICHE test changes as well? Is this related?

wu-bin · 2019-03-29T15:44:49Z

> @wu-bin I'm a bit confused by this PR. Why are there QUICHE test changes as well? Is this related?

Yes, the QUICHE test is related. The test has 2 classes of failures when running in "coverage" ci:

1) Some tests expect the log level to be "error" at the beginning of the test; however, the test run in coverage ci sets the default log level to "trace". This is easily fixed in the QUICHE test.
2) A few tests fail at EXPECT_DEATH(). For example,
   EXPECT_DEATH(PANIC("abort now"), "abort now");
   will fail, because the PANIC statement causes the child process to die, but the child process doesn't print "abort now" before it dies. This is the issue I asked you and @lizan about in the previous comment. I'm still struggling with this one. Do you have any idea?
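
To make the failure mode concrete, here is a minimal Python sketch of what a death test checks, assuming it behaves like gtest's EXPECT_DEATH: the statement runs in a child process, and the check passes only if the child exits abnormally and its stderr matches the expected regular expression. The helper name `expect_death` and the two child snippets are hypothetical; this is an illustration, not the QUICHE or Envoy implementation.

```python
import re
import subprocess
import sys


def expect_death(child_source: str, expected_regex: str) -> str:
    """Run child_source in a subprocess and classify the outcome.

    Hypothetical helper that mimics the two conditions a death test checks:
    the child must exit abnormally, and its stderr must match the regex.
    """
    proc = subprocess.run(
        [sys.executable, "-c", child_source],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "failed to die"
    if re.search(expected_regex, proc.stderr) is None:
        return "died but not with expected error"
    return "ok"


# Child that writes the message to stderr, flushes it, then exits abnormally.
LOUD_CHILD = (
    "import os, sys; sys.stderr.write('abort now\\n'); "
    "sys.stderr.flush(); os._exit(3)"
)
# Child that exits abnormally without ever writing the message to stderr.
SILENT_CHILD = "import os; os._exit(3)"

loud_result = expect_death(LOUD_CHILD, r"abort now")
silent_result = expect_death(SILENT_CHILD, r"abort now")
```

The coverage failure described here corresponds to the second case: the child dies, but the captured message is empty, so the regular expression has nothing to match.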

htuch

Yeah, I don't see why CHECK would behave differently in coverage mode only, that's pretty weird. The main difference of coverage builds is we are using gcc and we compile all tests into a single test binary. I had a look at the CHECK implementation and it seems legit.

bazel/envoy_build_system.bzl

Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-03-31T01:58:59Z

/retest

repokitteh-read-only · 2019-03-31T01:59:04Z

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-04-02T11:02:38Z

/retest

repokitteh-read-only · 2019-04-02T11:02:41Z

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

wu-bin · 2019-04-02T13:16:28Z

@htuch The latest coverage run actually went ok; it "failed" because the code coverage is 0.1% lower than the threshold (see below). Is it ok if I lower the threshold to 97%?

sent 100,311,982 bytes received 19,076 bytes 200,662,116.00 bytes/sec
total size is 100,145,559 speedup is 1.00
Code coverage 97.4 is lower than limit of 97.5
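
The failing check amounts to a numeric comparison against a fixed limit. A minimal sketch, assuming the coverage value and the limit are available as decimal strings; the function name `check_coverage` and the wording of the passing message are hypothetical, and this is not the actual CI script:

```python
from decimal import Decimal
from typing import Tuple


def check_coverage(coverage: str, limit: str) -> Tuple[bool, str]:
    """Compare a coverage percentage against a limit.

    Decimal avoids binary floating-point surprises when comparing
    values such as 97.4 and 97.5 that arrive as text.
    """
    passed = Decimal(coverage) >= Decimal(limit)
    if passed:
        # Hypothetical wording for the passing case.
        message = f"Code coverage {coverage} meets limit of {limit}"
    else:
        # Mirrors the log line quoted above.
        message = f"Code coverage {coverage} is lower than limit of {limit}"
    return passed, message


passed, message = check_coverage("97.4", "97.5")
```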

wu-bin · 2019-04-02T13:37:23Z

This should also unblock @lizan 's #6229

htuch · 2019-04-02T15:06:27Z

@wu-bin we can't lower it; the only reason we ever do that is for structural reasons (e.g. switching coverage tools or toolchains). Why do you think this resulted in such a decrease?

I think we have a more serious situation though with the state of Envoy coverage. @envoyproxy/maintainers do we need to send out an all-hands to get folks to try and improve coverage again? I'm wondering if there is b/w to go fix some of the worst offending code.

mattklein123 · 2019-04-02T15:10:06Z

> I think we have a more serious situation though with the state of Envoy coverage. @envoyproxy/maintainers do we need to send out an all-hands to get folks to try and improve coverage again? I'm wondering if there is b/w to go fix some of the worst offending code.

I will send an email and ask folks to look at the master coverage report to see if they recently added changes that are missing coverage.

wu-bin · 2019-04-02T15:23:28Z

> @wu-bin we can't lower it; the only reason we ever do that is for structural reasons (e.g. switching coverage tools or toolchains). Why do you think this resulted in such a decrease?

I guess the reason for the slight drop in coverage is that some quiche platform impl code is included in the report, but its tests are not under //test; instead it is tested by the (external) QUICHE code.
For example: quiche/platform/string_utils.cc has low coverage in the report, but it is covered in this QUICHE test.

I believe quiche/platform/*.cc was not part of the report before this PR.

Does it count as a structural reason?

htuch · 2019-04-02T15:26:59Z

@wu-bin I think you have 3 options:

1) Write Envoy-side tests for this code; after all, it's Envoy code :)
2) Run the external QUICHE tests as part of coverage in CI.
3) Exclude the QUICHE platform layer from Envoy's code coverage in CI.

If (2) can be made to work, that would be ideal, otherwise I would argue for (1) as being slightly more preferable to (3), but (3) is probably OK given the nature of this project and integration. At least IMHO, other maintainers might differ.

Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-04-02T18:37:48Z

/retest

repokitteh-read-only · 2019-04-02T18:37:52Z

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6393 (comment) was created by @wu-bin.

see: more, trace.

wu-bin · 2019-04-02T20:34:19Z

> @wu-bin I think you have 3 options:
>
> 1) Write Envoy-side tests for this code; after all, it's Envoy code :)
> 2) Run the external QUICHE tests as part of coverage in CI.
> 3) Exclude the QUICHE platform layer from Envoy's code coverage in CI.
>
> If (2) can be made to work, that would be ideal, otherwise I would argue for (1) as being slightly more preferable to (3), but (3) is probably OK given the nature of this project and integration. At least IMHO, other maintainers might differ.

@htuch 2 is done. See the change in test/coverage/gen_build.sh. The PR is ready for review now.

htuch

Looks good, a few comments and we can ship it. Thanks.

bazel/envoy_build_system.bzl

htuch · 2019-04-02T22:50:38Z

ci/do_ci.sh

@@ -54,6 +54,9 @@ function bazel_release_binary_build() {
  cp -f "${ENVOY_DELIVERY_DIR}"/envoy "${ENVOY_SRCDIR}"/build_release
  mkdir -p "${ENVOY_SRCDIR}"/build_release_stripped
  strip "${ENVOY_DELIVERY_DIR}"/envoy -o "${ENVOY_SRCDIR}"/build_release_stripped/envoy
+  # TODO(wu-bin): Remove once https://github.com/envoyproxy/envoy/pull/6229 is merged.
+  bazel clean


Why do we need to clean?

What appears to be happening is that the "bazel build" in this function and the "bazel build" after this function created the same object files in different directories, once in bazel-bin/external/envoy/source/..., and once in bazel-bin/source/...

When the second "bazel build" runs, both instances of the object files are passed to the linker, causing "multiple definition" errors. See this link for the actual failure message.

The bazel clean hack here removes one copy of the objects, thus avoiding the "multiple definition" error. The hack won't be needed after the ci WORKSPACE is removed.
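
One way to picture the failure: if the same source is compiled under two output roots, the linker receives two objects that define the same symbols. A minimal sketch, assuming the two roots named above and hypothetical object paths (`common/foo.o` and `common/bar.o` are made up for illustration). It only finds relative paths that appear under both roots; it does not invoke Bazel or a linker.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Output roots taken from the explanation above.
ROOTS = ("bazel-bin/external/envoy/source/", "bazel-bin/source/")


def duplicated_objects(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each root-relative object path to the full paths that provide it,
    keeping only entries that appear under more than one root."""
    by_relative: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        for root in ROOTS:
            if path.startswith(root):
                by_relative[path[len(root):]].append(path)
                break
    return {rel: full for rel, full in by_relative.items() if len(full) > 1}


# Hypothetical linker inputs: foo.o was built twice, bar.o only once.
linker_inputs = [
    "bazel-bin/external/envoy/source/common/foo.o",
    "bazel-bin/source/common/foo.o",
    "bazel-bin/source/common/bar.o",
]
duplicates = duplicated_objects(linker_inputs)
```

With one of the two trees gone, which is the effect attributed to the bazel clean hack, the same function returns an empty mapping.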

OK, @lizan do you know when we might see #6229 merged? Would be great to avoid having to include this, but if not we can hopefully get a cleanup PR ASAP.

I'm completely stuck on that one as the coverage is hard to debug (and not reproducible locally very well), will try something tonight but no ETA for now. The coverage failure seems unrelated to the change though. It would be nice if you can take a look (pushed latest master to the PR).

@lizan The latest coverage run from your PR failed because some GRPC related tests failed. Those tests seem to be flaky, since I've seen the same failures while debugging my PR. I've just started a retest.

@htuch #6229 was stuck on coverage failure, which is likely fixed in this PR, since my change in do_ci.sh's "bazel.coverage" section is very similar to that one. I think we should get this PR in first.

Looks like the coverage test failure in #6229 is not a flake: I have retested it 3 times, and the following test fails every time:

SslIpVersionsClientType/GrpcSslClientIntegrationTest.BasicSslRequestWithClientCert/IPv4_EnvoyGrpc

ci/do_ci.sh

test/coverage/gen_build.sh

htuch · 2019-04-02T23:00:32Z

@wu-bin hmm, I just had a look at the CI logs. We've gone from ~400 Envoy tests to 11961 tests in total. That seems a scary increase :) CI took 1h 41m, vs. a recent run in another PR of 1h 30m. If we could just filter down to the platform tests, maybe this wouldn't be so bad.

wu-bin · 2019-04-03T00:39:35Z

> @wu-bin hmm, I just had a look at the CI logs. We've gone from ~400 Envoy tests to 11961 tests in total. That seems a scary increase :) CI took 1h 41m, vs. a recent run in another PR of 1h 30m. If we could just filter down to the platform tests, maybe this wouldn't be so bad.

Impressive growth:)

All the current QUICHE tests are platform tests, I'll discuss with @danzh2010 and @mpwarres on how to manage the non-platform tests.

Signed-off-by: Bin Wu <[email protected]>

wu-bin · 2019-04-04T21:15:04Z

Oh sorry, I might have misunderstood you.

To clarify, the 11K tests are mostly Envoy proper tests. QUICHE platform currently adds about 11961 (from this PR) - 11889 (from a test of master) = 72 tests.

htuch

Thanks for the explanation. I think I was looking at a non-coverage run which didn't aggregate before or something. LGTM once CI passes. We are going into a merge freeze in 45 mins until tomorrow afternoon for the 1.9.1 security release fix FYI.

* master: (137 commits) test: router upstream log to v2 config stubs (envoyproxy#6499) remove idle timeout validation (envoyproxy#6500) build: Change namespace of chromium_url. (envoyproxy#6506) coverage: exclude chromium_url (envoyproxy#6498) fix(tracing): allow 256 chars in path tag (envoyproxy#6492) Common: Introduce StopAllIteration filter status for decoding and encoding filters (envoyproxy#5954) build: update PGV url (envoyproxy#6495) subset lb: avoid partitioning host lists on worker threads (envoyproxy#6302) ci: Make envoy_select_quiche no-op. (envoyproxy#6393) watcher: notify when watched files are modified (envoyproxy#6215) stat: Add counterFromStatName(), gaugeFromStatName(), and histogramFromStatName() (envoyproxy#6475) bump to 1.11.0-dev (envoyproxy#6490) release: bump to 1.10.0 (envoyproxy#6489) hcm: path normalization. (#1) build: import manually minified Chrome URL lib. (envoyproxy#3) codec: reject embedded NUL in headers. (envoyproxy#2) Added veryfication if path contains query params and add them to path header (envoyproxy#6466) redis: basic integration test for redis_proxy (envoyproxy#6450) stats: report sample count as an integer to prevent loss of precision (envoyproxy#6274) Added VHDS protobuf message and updated RouteConfig to include it. (envoyproxy#6418) ... Signed-off-by: Michael Puncel <[email protected]>

Make envoy_select_quiche return the input list.

2837ed5

Signed-off-by: Bin Wu <[email protected]>
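
A minimal sketch of what "return the input list" in the commit above means, written as plain Python because Starlark shares its syntax. The `repository` parameter, its default, and the example label are assumptions for illustration and are not taken from this thread; the real definition lives in bazel/envoy_build_system.bzl.

```python
def envoy_select_quiche(xs, repository=""):
    """No-op variant: return the given list unchanged.

    `repository` is an assumed parameter, kept only so that call sites
    which pass it would keep working.
    """
    return xs


# Hypothetical label, used only to show that the input passes through.
quiche_deps = envoy_select_quiche(["//hypothetical:quiche_dep"])
```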

wu-bin mentioned this pull request Mar 27, 2019

ci: remove ci workspace #6229

Merged

wu-bin added 2 commits March 28, 2019 10:33

Add workarounds to make the ci happy

6abf3a7

1) add "bazel clean" for bazel.release, at the end of bazel_release_binary_build(). 2) add "allow-multiple-definition" for coverage and clang_tidy. Signed-off-by: Bin Wu <[email protected]>

Try again.

b50c6ab

Signed-off-by: Bin Wu <[email protected]>

wu-bin changed the title ~~Get rid of envoy_select_quiche.~~ ci: Make envoy_select_quiche no-op. Mar 28, 2019

mattklein123 assigned htuch Mar 28, 2019

wu-bin added 3 commits March 28, 2019 13:56

Still trying.

db38d83

Signed-off-by: Bin Wu <[email protected]>

Merge remote-tracking branch 'upstream/master' into envoy_select_quiche

b8e90c2

Signed-off-by: Bin Wu <[email protected]>

Move the workaround for coverage from do_ci.sh to run_envoy_bazel_cov…

3b59c2b

…erage.sh. Signed-off-by: Bin Wu <[email protected]>

Try fixing coverage by running bazel test in the source, instead of c…

a3e6331

…i, directory. Signed-off-by: Bin Wu <[email protected]>

Try to fix failing quic_platform_test in coverage ci.

12212fa

Signed-off-by: Bin Wu <[email protected]>

wu-bin requested review from alyssawilk and mattklein123 as code owners March 29, 2019 02:26

htuch reviewed Mar 29, 2019

bazel/envoy_build_system.bzl

htuch added the waiting:any label Mar 29, 2019

repokitteh-read-only bot removed the waiting:any label Mar 29, 2019

wu-bin added 2 commits March 30, 2019 21:51

Change EXPECT_DEATH to EXPECT_DEATH_LOG_TO_STDERR for coverage ci.

8e64ba9

Signed-off-by: Bin Wu <[email protected]>

Merge remote-tracking branch 'upstream/master' into envoy_select_quiche

652415a

Signed-off-by: Bin Wu <[email protected]>

wu-bin added 2 commits April 1, 2019 22:56

Try fixing coverage again.

4e3dc84

Signed-off-by: Bin Wu <[email protected]>

Merge remote-tracking branch 'upstream/master' into envoy_select_quiche

1af8f68

Signed-off-by: Bin Wu <[email protected]>

Run QUICHE platform api tests in coverage.

f7d5c27

Signed-off-by: Bin Wu <[email protected]>

htuch suggested changes Apr 2, 2019

htuch added the waiting label Apr 2, 2019

Merge remote-tracking branch 'upstream/master' into envoy_select_quiche

d5cdd33

Signed-off-by: Bin Wu <[email protected]>

repokitteh-read-only bot removed the waiting label Apr 4, 2019

htuch approved these changes Apr 4, 2019

htuch merged commit d1cdd25 into envoyproxy:master Apr 5, 2019

wu-bin mentioned this pull request Apr 8, 2019

quiche: implement quic_port_utils #6488

Merged
