Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect MTU from skb->_skb_refdst #388

Merged
merged 1 commit into from
Jul 7, 2024
Merged

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Jun 14, 2024

According to source code, kernel uses MTU from skb->_skb_refdst. Let pwru collect MTU from there using the same logic.

It requires to cast skb->_skb_refdst to dst_entry*, then fetch dst_metric_raw(dst, RTAX_MTU) and dst->dev->mtu.

With this patch, pwru can output the correct MTU used by OS.

Case 1: Cilium could reduce the route MTU to 1423 inside a pod. This couldn't be detected by pwru until this patch because it only checked link MTU.

Case 2: Xfrm could reduce the route MTU to 1446 in ip_forward(). Pwru must inspect route MTU from skb->_skb_dstref to understand that, this patch makes it happen.

// https://elixir.bootlin.com/linux/v6.5/source/net/ipv4/ip_forward.c#L86
// net/ipv4/ip_forward.c
int ip_forward(struct sk_buff *skb)
{
[...]
	rt = skb_rtable(skb);
[...]
	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
	if (ip_exceeds_mtu(skb, mtu)) {
		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		SKB_DR_SET(reason, PKT_TOO_BIG);
		goto drop;
	}
[...]
}

// include/linux/skbuff.h
static inline struct rtable *skb_rtable(const struct sk_buff *skb)
{
	return (struct rtable *)skb_dst(skb);
}

// include/linux/skbuff.h
static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
[...]
	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

// include/net/ip.h
static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
						    bool forwarding)
{
[...]
	mtu = dst_metric_raw(dst, RTAX_MTU);
	if (mtu)
		goto out;

	mtu = READ_ONCE(dst->dev->mtu);
[...]

// include/net/dst.h
#define DST_METRICS_FLAGS		0x3UL
#define __DST_METRICS_PTR(Y)	\
	((u32 *)((Y) & ~DST_METRICS_FLAGS))
#define DST_METRICS_PTR(X)	__DST_METRICS_PTR((X)->_metrics)

}

@jschwinger233 jschwinger233 marked this pull request as ready for review June 14, 2024 10:49
@jschwinger233 jschwinger233 requested a review from a team as a code owner June 14, 2024 10:49
@jschwinger233 jschwinger233 requested review from brb and removed request for a team June 14, 2024 10:49
@jschwinger233 jschwinger233 marked this pull request as draft June 17, 2024 17:30
According to source code, kernel uses MTU from skb->_skb_refdst. Let
pwru collect MTU from there using the same logic.

It requires to cast skb->_skb_refdst to dst_entry*, then fetch
dst_metric_raw(dst, RTAX_MTU) and dst->dev->mtu.

```
// https://elixir.bootlin.com/linux/v6.5/source/net/ipv4/ip_forward.c#L86
// net/ipv4/ip_forward.c
int ip_forward(struct sk_buff *skb)
{
[...]
	rt = skb_rtable(skb);
[...]
	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
	if (ip_exceeds_mtu(skb, mtu)) {
		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		SKB_DR_SET(reason, PKT_TOO_BIG);
		goto drop;
	}
[...]
}

// include/linux/skbuff.h
static inline struct rtable *skb_rtable(const struct sk_buff *skb)
{
	return (struct rtable *)skb_dst(skb);
}

// include/linux/skbuff.h
static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
[...]
	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

// include/net/ip.h
static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
						    bool forwarding)
{
[...]
	mtu = dst_metric_raw(dst, RTAX_MTU);
	if (mtu)
		goto out;

	mtu = READ_ONCE(dst->dev->mtu);
[...]
}

// include/net/dst.h
	((u32 *)((Y) & ~DST_METRICS_FLAGS))
```

With this patch, pwru can output the correct MTU used by OS.

Case 1: Cilium could reduce the route MTU to 1423 inside a pod. This
can't be detected by pwru because it only checks link MTU.

Case 2: Xfrm could reduce the route MTU to 1446 in ip_forward(). Pwru
must inspect route MTU from skb->_skb_dstref to understand that.

Signed-off-by: gray <[email protected]>
@jschwinger233 jschwinger233 marked this pull request as ready for review June 18, 2024 05:28
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@brb brb merged commit b33c3bb into cilium:main Jul 7, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants