-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.15-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy #32932
Conversation
jschwinger233
commented
Jun 6, 2024
•
edited
Loading
edited
- ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy #32683 @jschwinger233
/ci-ipsec-upgrade |
[ upstream commit: f93a40c ] We have an iptables rule to set 0x200 mark for transparent socket: ``` *mangle -A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_mangle" -j CILIUM_PRE_mangle -A CILIUM_PRE_mangle -m socket --transparent -m mark ! --mark 0xe00/0xf00 -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff ``` This rule is in the mangle PREROUTING which checks packets ingressed from a netdev. Let's then focus on the pod to world traffic when IPsec=on + proxy=on + tunnel=off. Currently, a pod-to-world packet will go through the path: 1. from_lxc@lxc: skb->mark is set to 0x200 and returned to stack 2. iptables: skb is hijacked by tproxy (due to 0x200), to be accepted by proxy 3. proxy process: the old skb is consumed by proxy, an new skb is sent to upstream (world) 4. stack routing: the new skb is routed to eth0 5. stack iptables: the new skb is traversing OUTPUT chain and POSTROUTING chain 6. to_netdev@eth0: the new skb is going to world Please note the new skb won't hit PREROUTING chain, where there is a rule setting skb->mark=0x200. To fix #31984, we are going to change the routing for packets from egress proxy; consequently, on the step 4 above, the new skb will be routed to cilium_host instead: 4. stack routing: the new skb is routed to cilium_host 5. from_host@cilium_host: the new skb is returned to stack 6. to_host@cilium_net: the new skb is returned to stack 7. stack: PREROUTING, routing, FORWARD, POSTROUTING Look at step 7, we are hitting PREROUTING! Because of cilium/proxy#742, this to-world skb is also linked to a transparent socket, matching the "-m socket --transparent" condition, the packet will fortunately have the 0x200 mark. If we do nothing, this to-world skb marked with 0x200 will then hit routiong rule "from all fwmark 0x200/0xf00 lookup 2004" and be routed to local. It should have gone to the world. This patch fixes this future issue as a precaution (otherwise we'll break git-bisect). This patch provides a straightforward solution: at step 5 from_host@cilium_host, we set a specical mark 0x800 (MARK_MAGIC_PROXY_TO_WORLD), then iptables can exclude this mark using "-m mark ! --mark 0x800/0xf00". Signed-off-by: gray <[email protected]>
[ upstream commit: 3384d73 ] After cilium/proxy#742, proxy traffic keeps original pod IP as source IP for to-world packets, which must be masqueraded to eth0 IP. There is no issue for now, but the new routing rule (0xb00 lookup 2005) to be added for #31984 will cause a side effect breaking masquerading. This patch fixes the that side effect as a precaution, otherwise git-bisect breaks. The new routing rule (0xb00 lookup 2005) will cause proxy packets going through POSTROUTING for twice: first time happens when proxy sends packets which are routed to cilium_host, these are hitting OUTPUT + **POSTROUTING**; the second time takes place after packets ingressed from cilium_net, these skbs will traverse PREROUTING + FORWARD + **POSTROUTING**. However, due to kernel's implementation details, an skb won't be processed by nat POSTROUTING for twice: after the first POSTROUTING check, skb's ct `(struct nf_conn*)(skb->_nfct & ~7)` has a status IPS_SRC_NAT_DONE to skip the further traversal at all. [1] To avoid being set the IPS_SRC_NAT_DONE flag, this patch adds an iptables rule `--mark 0xb00 -j CT --notrack` at OUTPUT to skip the first round iptables ct, just for proxy traffic which is characterized by 0xb00 mark. [1] https://elixir.bootlin.com/linux/v6.6.2/source/net/netfilter/nf_nat_core.c#L825 [1] https://elixir.bootlin.com/linux/v6.6.2/source/include/net/netfilter/nf_nat.h#L111 Signed-off-by: gray <[email protected]>
1ae3b83
to
c05c2e6
Compare
/ci-ipsec-upgrade |
[ upstream commit: 1ce4c7f ] This commit installs "0xb00/0xf00 lookup 2005" routing rule when IPsec is enabled with native routing and envoy. This is a necessary step towards fixing encryption leaks, otherwise egress proxy's return traffic gets no chance to be set IPsec mark. The new routing rule ensures these packets are routed to cilium_host, where we have bpf_host to handle encryption datapath. This patch uses a different condition from requireFromProxyRoutes() to determine whether to install the new routing rule, otherwise we will see breakage on IPsec=off + envoy=on. Specially, the new routing rule is isolated to IPsec only. Signed-off-by: gray <[email protected]>
c05c2e6
to
4cebdfd
Compare
/test-backport-1.15 |
/test-backport-1.15 |
CI is almost green except ci-ipsec-e2e https://github.com/cilium/cilium/actions/runs/9411867166/job/25925849574 ci-ipsec-e2e's failure is a known racing issue of leak detection that will be fixed by #32930. I'll drop the last three leak detection patches and mark ready for review. |
4cebdfd
to
2cb2ae1
Compare
/test-backport-1.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thank you!