Commit
As Eric Dumazet pointed out, this also needs to be fixed in IPv6.

v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right, a crash then ensues when the freed dst_entry is referenced later on.

A common crashing backtrace is:

    #8 [] page_fault at ffffffff8163e648
       [exception RIP: __tcp_ack_snd_check+74]
    .
    .
    #9 [] tcp_rcv_established at ffffffff81580b64
    #10 [] tcp_v4_do_rcv at ffffffff8158b54a
    #11 [] tcp_v4_rcv at ffffffff8158cd02
    #12 [] ip_local_deliver_finish at ffffffff815668f4
    #13 [] ip_local_deliver at ffffffff81566bd9
    #14 [] ip_rcv_finish at ffffffff8156656d
    #15 [] ip_rcv at ffffffff81566f06
    #16 [] __netif_receive_skb_core at ffffffff8152b3a2
    #17 [] __netif_receive_skb at ffffffff8152b608
    #18 [] netif_receive_skb at ffffffff8152b690
    #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
    #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
    #21 [] net_rx_action at ffffffff8152bac2
    #22 [] __do_softirq at ffffffff81084b4f
    #23 [] call_softirq at ffffffff8164845c
    #24 [] do_softirq at ffffffff81016fc5
    #25 [] irq_exit at ffffffff81084ee5
    #26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.
The freed dst_entry was found here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
    	const struct inet_connection_sock *icsk = inet_csk(sk);
    	const struct dst_entry *dst = __sk_dst_get(sk);

    	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
    		(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry is added for each such host, which creates more dst_entrys with lower reference counts and makes the race more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

    LockDroppedIcmps 267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the reference count of the same dst_entry cached in sk->sk_dst_cache is decremented twice for the same socket via:

    do_redirect()->__sk_dst_check()->dst_release()

This leads to the dst_entry being prematurely freed while another socket still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead, let the redirect take place later when user space does not have the socket locked.

The dccp/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
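
The race and the guard described in the message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: `model_dst`, `model_sock`, `model_release_if_obsolete()` and `model_interleave()` are hypothetical stand-ins for the dst_entry, the socket, the check-and-release path, and one particular interleaving of user context and softirq. The `owned_by_user` flag is assumed to correspond to the kernel's "user space has the socket locked" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst_entry: only a refcount and an obsolete flag. */
struct model_dst {
	int refcnt;
	bool obsolete;
};

/* Hypothetical stand-in for a socket. */
struct model_sock {
	bool owned_by_user;          /* assumed: user space holds the socket lock */
	struct model_dst *dst_cache; /* models sk->sk_dst_cache */
};

/* Models the release half of the check: drop the reference if the route is obsolete. */
static void model_release_if_obsolete(struct model_sock *sk, struct model_dst *dst)
{
	if (dst && dst->obsolete) {
		sk->dst_cache = NULL;
		dst->refcnt--;
	}
}

/*
 * Replays one interleaving: user context has read the cached pointer,
 * a redirect is then handled in softirq, and user context resumes.
 * Returns the final reference count of the route.
 */
int model_interleave(bool skip_when_owned)
{
	/* Two references: one from this socket, one from another socket. */
	struct model_dst dst = { .refcnt = 2, .obsolete = true };
	struct model_sock sk = { .owned_by_user = true, .dst_cache = &dst };

	/* User context: pointer already fetched, release not yet done. */
	struct model_dst *seen_by_user = sk.dst_cache;

	/* Softirq: redirect arrives; the guard skips it while the socket is owned. */
	if (!(skip_when_owned && sk.owned_by_user)) {
		struct model_dst *seen_by_softirq = sk.dst_cache;
		model_release_if_obsolete(&sk, seen_by_softirq);
	}

	/* User context resumes and releases the pointer it saw earlier. */
	model_release_if_obsolete(&sk, seen_by_user);

	return dst.refcnt;
}
```

In this model, calling `model_interleave(false)` drops the count to 0 even though the other socket still holds its reference, which corresponds to the premature free. Calling `model_interleave(true)` leaves the count at 1, because the redirect path is skipped while the socket is owned.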