Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.15-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy #32932

Merged
merged 3 commits into from
Jun 7, 2024

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Jun 6, 2024

@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Jun 6, 2024
@jschwinger233
Copy link
Member Author

/ci-ipsec-upgrade

[ upstream commit: f93a40c ]

We have an iptables rule to set 0x200 mark for transparent socket:

```
*mangle
-A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_mangle" -j CILIUM_PRE_mangle
-A CILIUM_PRE_mangle -m socket --transparent -m mark ! --mark 0xe00/0xf00 -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff
```

This rule is in the mangle PREROUTING which checks packets ingressed
from a netdev.

Let's then focus on the pod to world traffic when IPsec=on + proxy=on +
tunnel=off.

Currently, a pod-to-world packet will go through the path:
1. from_lxc@lxc: skb->mark is set to 0x200 and returned to stack
2. iptables: skb is hijacked by tproxy (due to 0x200), to be accepted by proxy
3. proxy process: the old skb is consumed by proxy, an new skb is sent to upstream (world)
4. stack routing: the new skb is routed to eth0
5. stack iptables: the new skb is traversing OUTPUT chain and POSTROUTING chain
6. to_netdev@eth0: the new skb is going to world

Please note the new skb won't hit PREROUTING chain, where there is a
rule setting skb->mark=0x200.

To fix #31984, we are going to
change the routing for packets from egress proxy; consequently, on the
step 4 above, the new skb will be routed to cilium_host instead:

4. stack routing: the new skb is routed to cilium_host
5. from_host@cilium_host: the new skb is returned to stack
6. to_host@cilium_net: the new skb is returned to stack
7. stack: PREROUTING, routing, FORWARD, POSTROUTING

Look at step 7, we are hitting PREROUTING! Because of
cilium/proxy#742, this to-world skb is also
linked to a transparent socket, matching the "-m socket --transparent"
condition, the packet will fortunately have the 0x200 mark.

If we do nothing, this to-world skb marked with 0x200 will then hit
routiong rule "from all fwmark 0x200/0xf00 lookup 2004" and be routed to
local. It should have gone to the world.

This patch fixes this future issue as a precaution (otherwise we'll
break git-bisect).

This patch provides a straightforward solution: at step 5
from_host@cilium_host, we set a specical mark 0x800
(MARK_MAGIC_PROXY_TO_WORLD), then iptables can exclude this mark using
"-m mark ! --mark 0x800/0xf00".

Signed-off-by: gray <[email protected]>
[ upstream commit: 3384d73 ]

After cilium/proxy#742, proxy traffic keeps
original pod IP as source IP for to-world packets, which must be
masqueraded to eth0 IP. There is no issue for now, but the new
routing rule (0xb00 lookup 2005) to be added for #31984
will cause a side effect breaking masquerading. This patch fixes the
that side effect as a precaution, otherwise git-bisect breaks.

The new routing rule (0xb00 lookup 2005) will cause proxy packets going
through POSTROUTING for twice: first time happens when proxy sends
packets which are routed to cilium_host, these are hitting OUTPUT +
**POSTROUTING**; the second time takes place after packets ingressed
from cilium_net, these skbs will traverse PREROUTING + FORWARD +
**POSTROUTING**.

However, due to kernel's implementation details, an skb won't be
processed by nat POSTROUTING for twice: after the first POSTROUTING
check, skb's ct `(struct nf_conn*)(skb->_nfct & ~7)` has a status
IPS_SRC_NAT_DONE to skip the further traversal at all. [1]

To avoid being set the IPS_SRC_NAT_DONE flag, this patch adds an
iptables rule `--mark 0xb00 -j CT --notrack` at OUTPUT to skip the first
round iptables ct, just for proxy traffic which is characterized by
0xb00 mark.

[1] https://elixir.bootlin.com/linux/v6.6.2/source/net/netfilter/nf_nat_core.c#L825
[1] https://elixir.bootlin.com/linux/v6.6.2/source/include/net/netfilter/nf_nat.h#L111

Signed-off-by: gray <[email protected]>
@jschwinger233 jschwinger233 force-pushed the pr/gray/1.15/egress-proxy-ipsec-fix2 branch from 1ae3b83 to c05c2e6 Compare June 6, 2024 11:26
@jschwinger233
Copy link
Member Author

/ci-ipsec-upgrade

[ upstream commit: 1ce4c7f ]

This commit installs "0xb00/0xf00 lookup 2005" routing rule when IPsec
is enabled with native routing and envoy. This is a necessary step
towards fixing encryption leaks, otherwise egress proxy's return traffic
gets no chance to be set IPsec mark. The new routing rule ensures these
packets are routed to cilium_host, where we have bpf_host to handle
encryption datapath.

This patch uses a different condition from requireFromProxyRoutes() to
determine whether to install the new routing rule, otherwise we
will see breakage on IPsec=off + envoy=on. Specially, the new routing
rule is isolated to IPsec only.

Signed-off-by: gray <[email protected]>
@jschwinger233 jschwinger233 force-pushed the pr/gray/1.15/egress-proxy-ipsec-fix2 branch from c05c2e6 to 4cebdfd Compare June 6, 2024 11:35
@jschwinger233
Copy link
Member Author

/test-backport-1.15

@jschwinger233
Copy link
Member Author

/test-backport-1.15

@jschwinger233 jschwinger233 changed the title Pr/gray/1.15/egress proxy ipsec fix2 [1.14-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy Jun 7, 2024
@jschwinger233 jschwinger233 changed the title [1.14-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy [1.15-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy Jun 7, 2024
@jschwinger233
Copy link
Member Author

CI is almost green except ci-ipsec-e2e https://github.com/cilium/cilium/actions/runs/9411867166/job/25925849574

image

ci-ipsec-e2e's failure is a known racing issue of leak detection that will be fixed by #32930.

I'll drop the last three leak detection patches and mark ready for review.

@jschwinger233 jschwinger233 force-pushed the pr/gray/1.15/egress-proxy-ipsec-fix2 branch from 4cebdfd to 2cb2ae1 Compare June 7, 2024 07:44
@jschwinger233
Copy link
Member Author

/test-backport-1.15

@jschwinger233 jschwinger233 marked this pull request as ready for review June 7, 2024 09:43
@jschwinger233 jschwinger233 requested a review from a team as a code owner June 7, 2024 09:43
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thank you!

@julianwiedmann julianwiedmann merged commit 5197d4c into v1.15 Jun 7, 2024
249 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 7, 2024
@julianwiedmann julianwiedmann deleted the pr/gray/1.15/egress-proxy-ipsec-fix2 branch June 7, 2024 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

2 participants