Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing rules when NodePort support is disabled #2026

Conversation

antoninbas
Copy link
Contributor

@antoninbas antoninbas commented Jul 8, 2022

What type of PR is this?

bug

Which issue does this PR fix:

#2025

What does this PR do / Why do we need it:

  • the rules that need to be installed for NodePort support and SNAT
    support are very similar. The same traffic mark is needed for both. As
    a result, rules that are currently installed only when NodePort
    support is enabled should also be installed when external SNAT is
    disabled, which is the case by default.
  • remove "-m state --state NEW" from a rule in the nat table. This is
    always true for packets that traverse the nat table.
  • fix typo in one rule's name (extra whitespace).

Testing done on this change:

  • Unit testing
  • Tested manually by disabling NodePort support and adding the missing rule.
  • Tested with custom container image for amazon-k8s-cni and amazon-k8s-cni-init

Automation added to e2e:

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Should not break upgrades or downgrade (may have some superfluous iptables rules). Not tested.

Does this change require updates to the CNI daemonset config files to work?:
No

Does this PR introduce any user-facing change?:
No, bug fix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Fixes #2025

Co-authored-by: Quan Tian [email protected]

Signed-off-by: Antonin Bas [email protected]

@antoninbas antoninbas requested a review from a team as a code owner July 8, 2022 20:07
@antoninbas
Copy link
Contributor Author

I am happy to do more testing, once maintainers validate the approach

@antoninbas antoninbas force-pushed the add-missing-rules-when-node-port-support-is-disabled branch from 4e888c7 to 8802030 Compare July 8, 2022 23:18
@antoninbas antoninbas force-pushed the add-missing-rules-when-node-port-support-is-disabled branch 2 times, most recently from f0219c0 to 6fe5050 Compare July 13, 2022 00:03
@kishorj
Copy link
Contributor

kishorj commented Jul 13, 2022

@antoninbas, thanks for the fix. Could you test the following?

  • upgrade from previous v1.10 to the one with your fix, verify nat PREROUTING rules are configured as expected

@antoninbas
Copy link
Contributor Author

@kishorj I did 2 upgrade tests, one with NodePort support enabled, and one with NodePort support disabled. Full iptables rules for nat and mangle tables are below.

There is only one issue. Because I removed the unnecessary -m state --state NEW matcher from a PREROUTING rule, the rule is now duplicated, and will be until the next Node restart:

# old rule
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
# new rule
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0

I see 3 solutions, let me see which one you want:

  1. add some logic to remove the old rule
  2. rollback this change in my patch and keep the unnecessary matcher; it won't impact the correctness of the patch and the improved rule was just a "nice-to-have"
  3. keep the patch as it is, since the duplicated rule doesn't impact correctness. On new K8s clusters and after a Node restart, there will be no duplicates.
Full iptables rules

NodePort support enabled (default)

Default build (v1.10.1-eksbuild.1)

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle 
-A PREROUTING -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A PREROUTING -i vlan+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

After upgrade to patched build

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle
-A PREROUTING -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A PREROUTING -i vlan+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

NodePort support disabled

Default build (v1.10.1-eksbuild.1)

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle
# empty

After upgrade to patched build

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

@antoninbas
Copy link
Contributor Author

@kishorj friendly ping for this

@kishorj
Copy link
Contributor

kishorj commented Aug 16, 2022

3. keep the patch as it is, since the duplicated rule doesn't impact correctness. On new K8s clusters and after a Node restart, there will be no duplicates.

This option is acceptable to me

Option 1 is cleaner, it removes the old rule.

@@ -571,15 +573,15 @@ func (n *linuxNetwork) buildIptablesConnmarkRules(vpcCIDRs []string, ipt iptable
}

var iptableRules []iptablesRule
log.Debugf("Setup Host Network: iptables -t nat -A PREROUTING -i %s+ -m comment --comment \"AWS, outbound connections\" -m state --state NEW -j AWS-CONNMARK-CHAIN-0", n.vethPrefix)
log.Debugf("Setup Host Network: iptables -t nat -A PREROUTING -i %s+ -m comment --comment \"AWS, outbound connections\" -j AWS-CONNMARK-CHAIN-0", n.vethPrefix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the old rule with redundant state NEW, we can force delete. Append the old rule with shouldExist field set to false.

Tests:

  • Add a new UT with old rule, verify the old one gets deleted
  • Update the remaining UTs to refer to the new rule
  • Upgrading from old ds to new, old rule should not be there
  • once the new rules get added, restarting the DS should be fine - no errors in the logs

@kishorj
Copy link
Contributor

kishorj commented Aug 16, 2022

networking.go:703 has the following log line - log.Debugf("host network setup: found potentially stale SNAT rule for chain %s: %v", chain, ruleSpec).
I greatly appreciate if you can modify it as follows:
log.Debugf("host network setup: found potentially stale rule for chain %s: %v", chain, ruleSpec)

@antoninbas
Copy link
Contributor Author

@kishorj I will update the PR with all your feedback some time this week

* the rules that need to be installed for NodePort support and SNAT
  support are very similar. The same traffic mark is needed for both. As
  a result, rules that are currently installed only when NodePort
  support is enabled should also be installed when external SNAT is
  disabled, which is the case by default.
* remove "-m state --state NEW" from a rule in the nat table. This is
  always true for packets that traverse the nat table.
* fix typo in one rule's name (extra whitespace).

Fixes aws#2025

Co-authored-by: Quan Tian <[email protected]>

Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
* Delete legacy nat rule
* Fix an unrelated log message

Signed-off-by: Antonin Bas <[email protected]>
@antoninbas antoninbas force-pushed the add-missing-rules-when-node-port-support-is-disabled branch from 6fe5050 to df0e125 Compare August 19, 2022 22:37
@antoninbas
Copy link
Contributor Author

@kishorj I updated this PR with the changes you requested (it's all in the most recent commit)

I also ran the requested tests on an EKS cluster.

  1. start with old DS

Here are the PREROUTING chains for the nat table:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
  1. update to the new DS

Here are the PREROUTING chains for the nat table:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

The rule has been updated correctly.

  1. restart the DS Pod and check the logs
$ kubectl -n kube-system logs pod/aws-node-b88wv -c aws-node
{"level":"info","ts":"2022-08-19T22:36:08.333Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-08-19T22:36:08.334Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-08-19T22:36:08.350Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-08-19T22:36:08.352Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-08-19T22:36:10.363Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T22:36:12.370Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T22:36:12.391Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2022-08-19T22:36:12.395Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2022-08-19T22:36:12.397Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}
$ kubectl -n kube-system logs pod/aws-node-b88wv -c aws-vpc-cni-init
Copying CNI plugin binaries ...
Configure rp_filter loose...
net.ipv4.conf.eth0.rp_filter = 2
2
net.ipv4.tcp_early_demux = 1
CNI init container done

@kishorj
Copy link
Contributor

kishorj commented Aug 19, 2022

/lgtm

@antoninbas
Copy link
Contributor Author

@kishorj I see that the CI tests are not running. I think you may have to approve that on your side since I am a first time contributor to this repo.

@jayanthvn
Copy link
Contributor

@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.

@antoninbas
Copy link
Contributor Author

@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.

Ok. For what it's worth, I have run the unit tests locally already (with make docker-unit-tests).

@jayanthvn
Copy link
Contributor

@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.

Ok. For what it's worth, I have run the unit tests locally already (with make docker-unit-tests).

Thank you :) But we will have to also run the integration tests. I will see if we can manually trigger it for now :)

@jayanthvn
Copy link
Contributor

Sorry for the delay, we had disabled the GitHub runners. Now it is up and running, I have submitted for integration tests - https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3139125467

/cc @kishorj

@jayanthvn
Copy link
Contributor

jayanthvn commented Sep 28, 2022

2 attempts of Upstream conformance tests are failing - https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3139125467. I can look into it next week.

@sushrk
Copy link
Contributor

sushrk commented Oct 31, 2022

I ran the conformance tests locally (after updating the branch), and can confirm that all the tests were passing.
I will re-run the tests in Github.

git log
commit 85e2ef5ac82dfd3f79f2a5b05ec52f485de5a0c0 (HEAD -> add-missing-rules-when-node-port-support-is-disabled)

Ran 19 of 5771 Specs in 1065.502 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Flaked | 0 Pending | 5752 Skipped
PASS

@sushrk
Copy link
Contributor

sushrk commented Oct 31, 2022

@sushrk
Copy link
Contributor

sushrk commented Nov 1, 2022

The latest run of the integration tests have passed: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3364855441/jobs/5582884522

@sushrk sushrk merged commit f21b587 into aws:master Nov 1, 2022
@jayanthvn jayanthvn added this to the v1.12.1 milestone Nov 23, 2022
jdn5126 added a commit that referenced this pull request Dec 12, 2022
* create publisher with logger (#2119)

* Add missing rules when NodePort support is disabled (#2026)

* Add missing rules when NodePort support is disabled

* the rules that need to be installed for NodePort support and SNAT
  support are very similar. The same traffic mark is needed for both. As
  a result, rules that are currently installed only when NodePort
  support is enabled should also be installed when external SNAT is
  disabled, which is the case by default.
* remove "-m state --state NEW" from a rule in the nat table. This is
  always true for packets that traverse the nat table.
* fix typo in one rule's name (extra whitespace).

Fixes #2025

Co-authored-by: Quan Tian <[email protected]>

Signed-off-by: Antonin Bas <[email protected]>

* Fix typos and unit tests

Signed-off-by: Antonin Bas <[email protected]>

* Minor improvement to code comment

Signed-off-by: Antonin Bas <[email protected]>

* Address review comments

* Delete legacy nat rule
* Fix an unrelated log message

Signed-off-by: Antonin Bas <[email protected]>

Signed-off-by: Antonin Bas <[email protected]>
Co-authored-by: Jayanth Varavani <[email protected]>
Co-authored-by: Sushmitha Ravikumar <[email protected]>

* downgrade test go.mod to align with root go.mod (#2128)

* skip addon installation when addon info is not available (#2131)

* Merging test/Makefile and test/go.mod to the root Makefil and go.mod, adjust the .github/workflows and integration test instructions (#2129)

* update troubleshooting docs for CNI image (#2132)

fix location where make command is run

* fix env name in test script (#2136)

* optionally allow CLUSTER_ENDPOINT to be used rather than the cluster-ip (#2138)

* optionally allow CLUSTER_ENDPOINT to be used rather than the kubernetes cluster ip

* remove check for kube-proxy

* add version to readme

* Add resources config option to cni metrics helper (#2141)

* Add resources config option to cni metrics helper

* Remove default-empty resources block; replace with conditional

* Add metrics for ec2 api calls made by CNI and expose via prometheus (#2142)

Co-authored-by: Jay Deokar <[email protected]>

* increase workflow role duration to 4 hours (#2148)

* Update golang 1.19.2 EKS-D (#2147)

* Update golang

* Move to EKS distro builds

* [HELM]: Move CRD resources to a separate folder as per helm standard (#2144)

Co-authored-by: Jay Deokar <[email protected]>

* VPC-CNI minimal image builds (#2146)

* VPC-CNI minimal image builds

* update dependencies for ginkgo when running integration tests

* address review comments and break up init main function

* review comments for sysctl

* Simplify binary installation, fix review comments

Since init container is required to always run, let binary installation
for external plugins happen in init container. This simplifies the main
container entrypoint and the dockerfile for each image.

* when IPAMD connection fails, try to teardown pod network using prevResult (#2145)

* add env var to enable nftables (#2155)

* fix failing weekly cron tests (#2154)

* Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER and remove no-op setter (#2153)

* Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER

* update release version comments

Signed-off-by: Antonin Bas <[email protected]>
Co-authored-by: Jeffrey Nelson <[email protected]>
Co-authored-by: Antonin Bas <[email protected]>
Co-authored-by: Jayanth Varavani <[email protected]>
Co-authored-by: Sushmitha Ravikumar <[email protected]>
Co-authored-by: Jerry He <[email protected]>
Co-authored-by: Brandon Wagner <[email protected]>
Co-authored-by: Jonathan Ogilvie <[email protected]>
Co-authored-by: Jay Deokar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switching AWS_VPC_CNI_NODE_PORT_SUPPORT to false breaks SNAT for Pods
4 participants