-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve kind e2e tests #3922
Improve kind e2e tests #3922
Conversation
4c972ac
to
258b38b
Compare
Codecov Report
@@ Coverage Diff @@
## main #3922 +/- ##
==========================================
+ Coverage 64.00% 65.10% +1.09%
==========================================
Files 293 293
Lines 43300 43300
==========================================
+ Hits 27716 28192 +476
+ Misses 13340 12880 -460
+ Partials 2244 2228 -16
Flags with carried forward coverage won't be shown. Click here to find out more.
|
/test-all |
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I much prefer this approach of not enabling features directly in e2e test cases.
.github/workflows/kind.yml
Outdated
mkdir test-e2e-encap-proxy-all-coverage | ||
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-proxy-all-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --proxy-all --coverage --skip mode-irrelevant | ||
mkdir test-e2e-encap-all-features-disabled-coverage | ||
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-all-features-disabled-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --coverage --feature-gates AllAlpha=false,AllBeta=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just test with default values - keep default enabled features enabled, and default disabled features disabled? Usually, no one will disable default enabled features right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"test-e2e-encap" tests the configuration you describe, it has been there from the begining.
If we don't have this job "all-features-disabled", there will be no job testing the code under the ! DefaultFeatureGate.Enabled(FEATURE)
condition, which were previously covered by "test-e2e-encap-no-proxy", "test-e2e-encap-no-np". Is it safe that once a feature graduates to beta, we never run test with it being disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In this case, do you think it makes more sense to test only no-proxy, assuming people may like to disable AntreaProxy, but still enable other default features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianjuns do you agree with above or have other comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the feature specific code under ! DefaultFeatureGate.Enabled(FEATURE)
except AntreaProxy will not be covered by any e2e test, is this reliable? For example, if a PR makes a change to a flow in L3Forwarding with the assumption that the packet will be handled by some flows installed by Egress feature, no test would catch the problem. It's not normal that users disable features that are enabled by default, but should we cover them in test with best effort? this may also affect test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added "all features disabled" not because it's an important combination, but a single job to cover the code path that all optional features are disabled and ensure core functions work properly without relying on any optional feature (When these features are disabled, no feature specific test case will run, the test cases are only the generic ones like Pod connectivey, K8s NetworkPolicy, etc.). I think it's not just adding 1 feature combination, but 0 coverage vs. full coverage on related code path, "all features enabled" covers the other side. And I think "all features disabled" covers "no-proxy" in some way from core functions's perspective, assuming optional features are not too coupled with each other, (and if they are coupled explicitly, disabling either of them is not a supported scenario and we won't even test it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding configuration combination, I very agree we cannot cover all combinations, I think all tests are only testing the default combination with single factor mutated:
test-e2e-ipv6 and test-e2e-ipv6-only's factor is ip family
test-windows's factor is platform
test-e2e-noencap and test-e2e-hybrid's factor is encapulation mode
test-e2e-all-features-enabled and test-e2e-all-features-disabled is planed for the factor of feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the questions are:
- do we believe "AntreaProxy disabled" is a case used by considerable number of users
- if so, are these users would like other default features enabled
- if so, could this combination of "AntreaProxy disabled" + "other features enabled" have a good chance to be broken, if we do not test it
Do you mean you think the chance of 3) is low, if we test "AntreaProxy and all other features disabled"? I feel AntreaProxy flows can impact other features like ANP and Egress, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I only know one company disables it because of insufficient kernel version
- I guess no because the above reason doesn't apply to other features, however, just disabling AntreaProxy and keeping other configuration as default is not exactly their configuration. AFAIK, at least they change encapsulation
mode and tunnel type: they use both noEncap mode and encap mode with vxlan tunnel. I'm not sure other configurations. - It has a chance to be broken, but I imagine it shouldn't be too related to other features. For examples, ANP doesn't care whether it's kube proxy or antrea proxy, it cares the 5-tuple anyway, with one exception, it relies on AntreaProxy to implement
toServices
, but it's documented as unsupported and test didn't run when AntreaProxy is disabled. For Egress, if the traffic is desined to external address directly, how proxy works doesn't matter at all; if the traffic is destined to Service whose backend is external address, I guess it even doesn't work today when AntreaProxy is disabled because we matchInPort
when determining whether to SNAT.
I agree with your proposal after reconsidering the test from another perspective: every feature should finally graduate to GA or deprecated, however as long as the kernel version restriction exists or users have a reason to stick to kube-proxy, AntreaProxy will GA and be removed from feature gate and perhaps a configuration "antreaProxy.enable" should be added like --run-service-proxy
of kube-router. And if we think it's an important scenario like noEncap, we could keep a dedicated job testing it being disabled and all features being kept as default. Before that happens, we configure the job via "AntreaProxy feature gate", but it's for a typical configuration like noEncap. From feature gate's perspective, we only test a job with their default setting and a job with all of them enabled, the former of which covers the most common configuration and the latter ensures a non-default feature will be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I always wonder if we really need to test with default enabled feature gates disabled. The only reason to disable a feature gate should be either defects or maybe a feature does not work in particular envs. We should know if we really have the latter case (and ideally feature gate should not be for that).
I know a confusion is that we use feature gate for both gating and enablement configuration in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I found I did not publish my earlier reply.
.github/workflows/kind.yml
Outdated
mkdir test-e2e-encap-proxy-all-coverage | ||
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-proxy-all-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --proxy-all --coverage --skip mode-irrelevant | ||
mkdir test-e2e-encap-all-features-disabled-coverage | ||
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-all-features-disabled-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --coverage --feature-gates AllAlpha=false,AllBeta=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In this case, do you think it makes more sense to test only no-proxy, assuming people may like to disable AntreaProxy, but still enable other default features?
.github/workflows/kind.yml
Outdated
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-no-proxy-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --no-proxy --coverage --skip mode-irrelevant | ||
mkdir test-e2e-encap-all-features-enabled-coverage | ||
# Currently multicast tests require specific testbeds, exclude it for now. | ||
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-all-features-enabled-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --coverage --feature-gates AllAlpha=true,AllBeta=true,Multicast=false --proxy-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a look at this issue. 👀
There were 3 feature specific jobs running e2e tests with AntreaProxy disabled, AntreaProxy and proxyAll enabled, and AntreaPolicy disabled separately. Having a job for each feature is not extensible and requires maintenance efforts to configure the job along with the feature's lifecycle. There is a lot of redundancy between these jobs as unrelated tests ran repeatedly in them. There were other feature specific test cases enabling the features at runtime, which doesn't cover the scenario when two features are enabled at the same time. This patch removes some feature specific jobs and creates one job running tests with all features enabled, which can test all features' interaction. The patch also makes necessary changes to kind test script and manifest generating script to achieve it. Signed-off-by: Quan Tian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
There were 3 feature specific jobs running e2e tests with AntreaProxy
disabled, AntreaProxy and proxyAll enabled, and AntreaPolicy disabled
separately. Having a job for each feature is not extensible and
requires maintenance efforts to configure the job along with the
feature's lifecycle. There is a lot of redundancy between these jobs as
unrelated tests ran repeatedly in them.
There were other feature specific test cases enabling the features at
runtime, which doesn't cover the scenario when two features are enabled
at the same time.
This patch removes some feature specific jobs and creates one job
running tests with all features enabled, which can test all features'
interaction. The patch also makes necessary changes to kind test script
and manifest generating script to achieve it.
Signed-off-by: Quan Tian [email protected]