Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use-after-free in tcp_rearm_rto #693

Closed
keshonok opened this issue Mar 3, 2017 · 3 comments
Closed

use-after-free in tcp_rearm_rto #693

keshonok opened this issue Mar 3, 2017 · 3 comments

Comments

@keshonok
Copy link
Contributor

keshonok commented Mar 3, 2017

The setup:

  • Tempesta's current master at c40924b;
  • Tempesta kernel 4.8.15 built with debugging and KASAN support;
  • Both Tempesta and Nginx run in the same VM;
  • Nginx runs with default configuration where is resets connections every 100 requests;
  • Simplest Tempesta configuration:
server 127.0.0.1:8080

The configuration means that Tempesta bypasses cache on all requests. All requests are forwarded to the back-end server without exceptions.

The test is a simple wrk -t 8 -c 1024 -d 60s http://192.168.10.230 from the host machine to the VM with Tempesta and Nginx. The bug may not present itself immediately. I may require several test runs to get it.

The current master at c40924b is the result of the merge of #660. It's likely that this is a bug introduced with that merge. The traces lead to believe that sk pointer to a socket is used after the socket was freed. Please see the traces and the preliminary analysis below.

The kernel (KASAN) reports the following:

[208019.986397] ==================================================================
[208019.987005] BUG: KASAN: use-after-free in tcp_rearm_rto.part.59+0x24a/0x2a0 at addr ffff88013a19e014
[208019.987005] Read of size 4 by task nginx/9565
[208019.987005] page:ffffea0004e86780 count:0 mapcount:0 mapping:          (null) index:0x0
[208019.987005] flags: 0x1000000000000000()
[208019.987005] page dumped because: kasan: bad access detected
[208019.987005] CPU: 0 PID: 9565 Comm: nginx Tainted: G    B   W  O    4.8.15-ab+ #2
[208019.987005] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[208019.987005]  ffff88013a19e000 ffff8801f6e07ab8 ffffffff8199f263 ffff8801f6e07b48
[208019.987005]  ffff88013a19e014 ffff8801f6e07b38 ffffffff8151c33e ffff880134cbf8f8
[208019.987005]  0000000000000000 ffff8801f6e07c00 0000000000000296 ffff8801ceb2cc80
[208019.987005] Call Trace:
[208019.987005]  <IRQ>  [<ffffffff8199f263>] dump_stack+0x67/0x94
[208019.987005]  [<ffffffff8151c33e>] kasan_report_error+0x4ae/0x4e0
[208019.987005]  [<ffffffff8151c453>] __asan_report_load4_noabort+0x43/0x50
[208019.987005]  [<ffffffff81f60800>] ? skb_page_frag_refill+0x280/0x330
[208019.987005]  [<ffffffff820fc53a>] ? tcp_rearm_rto.part.59+0x24a/0x2a0
[208019.987005]  [<ffffffff820fc53a>] tcp_rearm_rto.part.59+0x24a/0x2a0
[208019.987005]  [<ffffffff821136d2>] tcp_rearm_rto+0x72/0xc0
[208019.987005]  [<ffffffff82126505>] tcp_event_new_data_sent+0x225/0x2c0
[208019.987005]  [<ffffffff8212ebd6>] tcp_write_xmit+0x596/0x4d10
[208019.987005]  [<ffffffff82142863>] ? tcp_v4_md5_lookup+0x13/0x20
[208019.987005]  [<ffffffff821241a4>] ? tcp_established_options+0x94/0x3f0
[208019.987005]  [<ffffffff82127e00>] ? tcp_sync_mss+0x5b0/0x5b0
[208019.987005]  [<ffffffff82133add>] __tcp_push_pending_frames+0x8d/0x2a0
[208019.987005]  [<ffffffff820dcce0>] tcp_push+0x460/0x740
[208019.987005]  [<ffffffff820dd1b4>] ? tcp_send_mss+0x24/0x2b0
[208019.987005]  [<ffffffffa2d4cdc2>] ss_tx_action+0x962/0x1210 [tempesta_fw]
[208019.987005]  [<ffffffffa2d4c460>] ? ss_send+0x5b0/0x5b0 [tempesta_fw]
[208019.987005]  [<ffffffff81fbc09a>] ? net_tx_action+0x25a/0x9d0
[208019.987005]  [<ffffffffa2d4c460>] ? ss_send+0x5b0/0x5b0 [tempesta_fw]
[208019.987005]  [<ffffffff81fbc084>] net_tx_action+0x244/0x9d0
[208019.987005]  [<ffffffff81fbc020>] ? net_tx_action+0x1e0/0x9d0
[208019.987005]  [<ffffffff8112932c>] ? irq_exit+0x15c/0x190
[208019.987005]  [<ffffffff822e72ac>] __do_softirq+0x22c/0x99f
[208019.987005]  [<ffffffff813d0c8e>] ? irq_work_run_list+0xbe/0x130
[208019.987005]  [<ffffffff81580fcd>] ? getname_flags+0xbd/0x510
[208019.987005]  [<ffffffff8112932c>] irq_exit+0x15c/0x190
[208019.987005]  [<ffffffff810b74c0>] smp_call_function_single_interrupt+0x70/0x90
[208019.987005]  [<ffffffff822e6249>] call_function_single_interrupt+0x89/0x90
[208019.987005]  <EOI>  [<ffffffff81580fcd>] ? getname_flags+0xbd/0x510
[208019.987005]  [<ffffffff81515722>] ? __slab_alloc+0x52/0x70
[208019.987005]  [<ffffffff81580fcd>] ? getname_flags+0xbd/0x510
[208019.987005]  [<ffffffff81516ad5>] kmem_cache_alloc+0x205/0x300
[208019.987005]  [<ffffffff81580fcd>] getname_flags+0xbd/0x510
[208019.987005]  [<ffffffff81581432>] getname+0x12/0x20
[208019.987005]  [<ffffffff8154c14f>] do_sys_open+0x13f/0x340
[208019.987005]  [<ffffffff81318525>] ? __audit_syscall_entry+0x325/0x6f0
[208019.987005]  [<ffffffff8154c010>] ? filp_open+0x60/0x60
[208019.987005]  [<ffffffff81004c11>] ? syscall_trace_enter+0x351/0x8e0
[208019.987005]  [<ffffffff8154c350>] ? do_sys_open+0x340/0x340
[208019.987005]  [<ffffffff8154c36e>] SyS_open+0x1e/0x20
[208019.987005]  [<ffffffff810062f6>] do_syscall_64+0x196/0x440
[208019.987005]  [<ffffffff8100301a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[208019.987005]  [<ffffffff822e445a>] entry_SYSCALL64_slow_path+0x25/0x25
[208019.987005] Memory state around the buggy address:
[208019.987005]  ffff88013a19df00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[208019.987005]  ffff88013a19df80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[208019.987005] >ffff88013a19e000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[208019.987005]                          ^
[208019.987005]  ffff88013a19e080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[208019.987005]  ffff88013a19e100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[208019.987005] ==================================================================

Then, the kernel crashes on NULL pointer dereference:

[208019.987005] ==================================================================
[208020.095667] kasan: CONFIG_KASAN_INLINE enabled
[208020.096277] kasan: GPF could be caused by NULL-ptr deref or user memory access
[208020.097247] general protection fault: 0000 [#1] SMP KASAN
[208020.097852] Modules linked in: tfw_sched_rr(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) ghash_clmulni_intel ppdev aesni_intel aes_x86_64 glue_helper 9pnet_virtio lrw gf128mul ablk_helper cryptd input_leds led_class virtio_balloon acpi_cpufreq parport_pc pcspkr 9pnet i2c_piix4 i2c_core parport ip_tables ata_generic pata_acpi crc32c_intel serio_raw virtio_pci virtio_ring e1000 virtio ata_piix floppy ipv6 crc_ccitt autofs4 [last unloaded: tempesta_tls]
[208020.098241] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G    B   W  O    4.8.15-ab+ #2
[208020.098241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[208020.098241] task: ffff8801d5ba1840 task.stack: ffff8801d5428000
[208020.098241] RIP: 0010:[<ffffffff820fd8b8>]  [<ffffffff820fd8b8>] tcp_clean_rtx_queue+0x4f8/0x2ce0
[208020.098241] RSP: 0018:ffff8801d542f1d8  EFLAGS: 00010202
[208020.098241] RAX: 0000000000000000 RBX: ffff88013bd35280 RCX: 0000000000000004
[208020.098241] RDX: 0000000000000001 RSI: ffff88013a19e010 RDI: 0000000000000008
[208020.098241] RBP: ffff8801d542f300 R08: ffff88013a19e035 R09: 0000000000000000
[208020.098241] R10: ffff88013bd35c00 R11: ffff8801d24144d2 R12: ffff88013a19e000
[208020.098241] R13: dffffc0000000000 R14: 0000000000000004 R15: 0000000000000000
[208020.098241] FS:  0000000000000000(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[208020.098241] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[208020.098241] CR2: 000055c6149dfa58 CR3: 00000001c7272000 CR4: 00000000001406f0
[208020.098241] Stack:
[208020.098241]  0000000000000292 0000000000000292 0000000000000004 1ffff1003aa85e01
[208020.098241]  ffff88013bd35660 ffff88013bd35a00 000000003bd35b50 0c618e946ef691d0
[208020.098241]  1ffff1003aa85e4f ffff8801bd1fa70f 00000f8400000000 ffffed00277a6acc
[208020.098241] Call Trace:
[208020.098241]  [<ffffffff820fd3c0>] ? tcp_rtt_estimator+0x7f0/0x7f0
[208020.098241]  [<ffffffff821105b7>] tcp_ack+0x1487/0x2e60
[208020.098241]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[208020.098241]  [<ffffffff8210f130>] ? tcp_fastretrans_alert+0x2f50/0x2f50
[208020.098241]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[208020.098241]  [<ffffffff8211ac0c>] tcp_rcv_established+0x58c/0x2de0
[208020.098241]  [<ffffffff8200b932>] ? sk_filter_trim_cap+0x2a2/0x700
[208020.098241]  [<ffffffff8200b766>] ? sk_filter_trim_cap+0xd6/0x700
[208020.098241]  [<ffffffff8211a680>] ? tcp_data_queue+0x4ed0/0x4ed0
[208020.098241]  [<ffffffff82154480>] ? tcp_v4_rcv+0x1ee0/0x3420
[208020.098241]  [<ffffffff8214fdf9>] tcp_v4_do_rcv+0x529/0x8c0
[208020.098241]  [<ffffffff821546de>] tcp_v4_rcv+0x213e/0x3420
[208020.098241]  [<ffffffffa2cfb265>] ? tfw_classify_ipv4+0x5/0x190 [tempesta_fw]
[208020.098241]  [<ffffffff820a3a0b>] ip_local_deliver_finish+0x2cb/0x9b0
[208020.098241]  [<ffffffff820a386a>] ? ip_local_deliver_finish+0x12a/0x9b0
[208020.098241]  [<ffffffff820a484d>] ip_local_deliver+0x24d/0x330
[208020.098241]  [<ffffffff820a4600>] ? ip_call_ra_chain+0x510/0x510
[208020.098241]  [<ffffffff8207d2a6>] ? nf_hook_slow+0x186/0x2a0
[208020.098241]  [<ffffffff8207d2c5>] ? nf_hook_slow+0x1a5/0x2a0
[208020.098241]  [<ffffffff8207d125>] ? nf_hook_slow+0x5/0x2a0
[208020.098241]  [<ffffffff820a2469>] ip_rcv_finish+0x599/0x1870
[208020.098241]  [<ffffffff820a51cb>] ip_rcv+0x89b/0x11d0
[208020.098241]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[208020.098241]  [<ffffffff820a1ed0>] ? inet_del_offload+0x40/0x40
[208020.098241]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[208020.098241]  [<ffffffff81fbdea0>] __netif_receive_skb_core+0x1690/0x2bc0
[208020.098241]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[208020.098241]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[208020.098241]  [<ffffffff81fbc810>] ? net_tx_action+0x9d0/0x9d0
[208020.098241]  [<ffffffff810c4015>] ? default_send_IPI_single+0x75/0xa0
[208020.098241]  [<ffffffff810b6190>] ? native_send_call_func_single_ipi+0x60/0x80
[208020.098241]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[208020.098241]  [<ffffffff81fbf42a>] __netif_receive_skb+0x5a/0x190
[208020.098241]  [<ffffffff81fbf638>] process_backlog+0xd8/0x650
[208020.098241]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[208020.098241]  [<ffffffff819e5d95>] ? find_next_bit+0x15/0x20
[208020.098241]  [<ffffffff81fc46e5>] net_rx_action+0x655/0xde0
[208020.098241]  [<ffffffff81fc4090>] ? sk_busy_loop+0xb90/0xb90
[208020.098241]  [<ffffffff8121699d>] ? trace_hardirqs_on+0xd/0x10
[208020.098241]  [<ffffffff8118a88a>] ? finish_task_switch+0x1fa/0x5a0
[208020.098241]  [<ffffffff8118a85e>] ? finish_task_switch+0x1ce/0x5a0
[208020.098241]  [<ffffffff8112908f>] ? run_ksoftirqd+0x1f/0x60
[208020.098241]  [<ffffffff822e72ac>] __do_softirq+0x22c/0x99f
[208020.098241]  [<ffffffff81129070>] ? cpu_callback+0x850/0x850
[208020.098241]  [<ffffffff8112908f>] run_ksoftirqd+0x1f/0x60
[208020.098241]  [<ffffffff8117f144>] smpboot_thread_fn+0x3c4/0x840
[208020.098241]  [<ffffffff8117ed80>] ? smpboot_update_cpumask_percpu_thread+0x590/0x590
[208020.098241]  [<ffffffff822d5d0c>] ? schedule+0x9c/0x1c0
[208020.098241]  [<ffffffff8117502c>] ? __kthread_parkme+0xdc/0x200
[208020.098241]  [<ffffffff8117ed80>] ? smpboot_update_cpumask_percpu_thread+0x590/0x590
[208020.098241]  [<ffffffff8117ed80>] ? smpboot_update_cpumask_percpu_thread+0x590/0x590
[208020.098241]  [<ffffffff811757ee>] kthread+0x20e/0x2d0
[208020.098241]  [<ffffffff811755e0>] ? kthread_create_on_node+0x390/0x390
[208020.098241]  [<ffffffff822e45bf>] ret_from_fork+0x1f/0x40
[208020.098241]  [<ffffffff811755e0>] ? kthread_create_on_node+0x390/0x390
[208020.098241] Code: 80 3c 2a 00 0f 85 90 26 00 00 48 8d 78 08 4d 8b 7c 24 08 49 c7 04 24 00 00 00 00 49 c7 44 24 08 00 00 00 00 48 89 fa 48 c1 ea 03 <42> 80 3c 2a 00 0f 85 4c 26 00 00 4c 89 fa 4c 89 78 08 48 c1 ea 
[208020.098241] RIP  [<ffffffff820fd8b8>] tcp_clean_rtx_queue+0x4f8/0x2ce0
[208020.098241]  RSP <ffff8801d542f1d8>

The address reported in the initial use-after-free message from KASAN leads to this line of code: https://github.com/tempesta-tech/linux-4.8.15-tfw/blob/5b3aa7fe30ce46bb8f112a9544548da926cb3714/net/ipv4/tcp_input.c#L3026

# nm ~/linux-4.8.15-tfw/vmlinux | grep tcp_rearm_rto
ffffffff82113660 T tcp_rearm_rto
ffffffff820fc2f0 t tcp_rearm_rto.part.59
# addr2line -e ~/linux-4.8.15-tfw/vmlinux -i ffffffff820fc53a
/root/linux-4.8.15-tfw/net/ipv4/tcp_input.c:3026
@keshonok keshonok added this to the 0.5.0 Web Server milestone Mar 3, 2017
@keshonok keshonok self-assigned this Mar 3, 2017
@keshonok
Copy link
Contributor Author

keshonok commented Mar 6, 2017

Other relevant traces:

https://github.com/tempesta-tech/linux-4.8.15-tfw/blob/5b3aa7fe30ce46bb8f112a9544548da926cb3714/net/ipv4/tcp_input.c#L3126

[238403.195392] ==================================================================
[238403.196362] BUG: KASAN: use-after-free in tcp_clean_rtx_queue+0x2bbc/0x2ce0 at addr ffff8801ccec9a35
[238403.196362] Read of size 1 by task swapper/3/0
[238403.196362] page:ffffea000733b240 count:0 mapcount:-127 mapping:          (null) index:0x0
[238403.196362] flags: 0x1000000000000000()
[238403.196362] page dumped because: kasan: bad access detected
[238403.196362] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G    B      O    4.8.15-ab+ #2
[238403.196362] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[238403.196362]  dffffc0000000000 ffff8801f6f87358 ffffffff8199f263 ffff8801f6f873e8
[238403.196362]  ffff8801ccec9a35 ffff8801f6f873d8 ffffffff8151c33e ffff8801f6e97d00
[238403.196362]  000000000c000001 ffff8801f6f873a0 0000000000000296 0000000000000046
[238403.196362] Call Trace:
[238403.196362]  <IRQ>  [<ffffffff8199f263>] dump_stack+0x67/0x94
[238403.196362]  [<ffffffff8151c33e>] kasan_report_error+0x4ae/0x4e0
[238403.196362]  [<ffffffff8127aa0e>] ? mod_timer+0x56e/0xe30
[238403.196362]  [<ffffffff812714f9>] ? lock_timer_base+0xf9/0x190
[238403.196362]  [<ffffffff8151c3b3>] __asan_report_load1_noabort+0x43/0x50
[238403.196362]  [<ffffffff820fff7c>] ? tcp_clean_rtx_queue+0x2bbc/0x2ce0
[238403.196362]  [<ffffffff820fff7c>] tcp_clean_rtx_queue+0x2bbc/0x2ce0
[238403.196362]  [<ffffffff820fd3c0>] ? tcp_rtt_estimator+0x7f0/0x7f0
[238403.196362]  [<ffffffff821105b7>] tcp_ack+0x1487/0x2e60
[238403.196362]  [<ffffffff8210f130>] ? tcp_fastretrans_alert+0x2f50/0x2f50
[238403.196362]  [<ffffffff8125b8c7>] ? debug_lockdep_rcu_enabled+0x77/0x90
[238403.196362]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[238403.196362]  [<ffffffff8211ac0c>] tcp_rcv_established+0x58c/0x2de0
[238403.196362]  [<ffffffff8200b932>] ? sk_filter_trim_cap+0x2a2/0x700
[238403.196362]  [<ffffffff8200b766>] ? sk_filter_trim_cap+0xd6/0x700
[238403.196362]  [<ffffffff8211a680>] ? tcp_data_queue+0x4ed0/0x4ed0
[238403.196362]  [<ffffffff82154480>] ? tcp_v4_rcv+0x1ee0/0x3420
[238403.196362]  [<ffffffff8214fdf9>] tcp_v4_do_rcv+0x529/0x8c0
[238403.196362]  [<ffffffff821546de>] tcp_v4_rcv+0x213e/0x3420
[238403.196362]  [<ffffffff820a3a0b>] ip_local_deliver_finish+0x2cb/0x9b0
[238403.196362]  [<ffffffff820a386a>] ? ip_local_deliver_finish+0x12a/0x9b0
[238403.196362]  [<ffffffff820a484d>] ip_local_deliver+0x24d/0x330
[238403.196362]  [<ffffffff820a4600>] ? ip_call_ra_chain+0x510/0x510
[238403.196362]  [<ffffffff8207d2a6>] ? nf_hook_slow+0x186/0x2a0
[238403.196362]  [<ffffffff8207d2c5>] ? nf_hook_slow+0x1a5/0x2a0
[238403.196362]  [<ffffffff8207d125>] ? nf_hook_slow+0x5/0x2a0
[238403.196362]  [<ffffffff820a2469>] ip_rcv_finish+0x599/0x1870
[238403.196362]  [<ffffffff812165ae>] ? trace_hardirqs_on_caller+0x19e/0x580
[238403.196362]  [<ffffffff820a51cb>] ip_rcv+0x89b/0x11d0
[238403.196362]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[238403.196362]  [<ffffffff820a1ed0>] ? inet_del_offload+0x40/0x40
[238403.196362]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[238403.196362]  [<ffffffff81fbdea0>] __netif_receive_skb_core+0x1690/0x2bc0
[238403.196362]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[238403.196362]  [<ffffffff81fbc810>] ? net_tx_action+0x9d0/0x9d0
[238403.196362]  [<ffffffff8100301a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[238403.196362]  [<ffffffff8125b8c7>] ? debug_lockdep_rcu_enabled+0x77/0x90
[238403.196362]  [<ffffffff822e4d52>] ? retint_kernel+0x10/0x10
[238403.196362]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[238403.196362]  [<ffffffff81fbf42a>] __netif_receive_skb+0x5a/0x190
[238403.196362]  [<ffffffff81fbf638>] process_backlog+0xd8/0x650
[238403.196362]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[238403.196362]  [<ffffffff81fc46e5>] net_rx_action+0x655/0xde0
[238403.196362]  [<ffffffff81fc4090>] ? sk_busy_loop+0xb90/0xb90
[238403.196362]  [<ffffffff822e72ac>] __do_softirq+0x22c/0x99f
[238403.196362]  [<ffffffff8112977e>] ? __raise_softirq_irqoff+0x19e/0x220
[238403.196362]  [<ffffffff8112932c>] irq_exit+0x15c/0x190
[238403.196362]  [<ffffffff810b74c0>] smp_call_function_single_interrupt+0x70/0x90
[238403.196362]  [<ffffffff822e6249>] call_function_single_interrupt+0x89/0x90
[238403.196362]  <EOI>  [<ffffffff81073303>] ? default_idle+0x53/0x3b0
[238403.196362]  [<ffffffff81074a4f>] arch_cpu_idle+0xf/0x20
[238403.196362]  [<ffffffff811fb18d>] default_idle_call+0x4d/0x60
[238403.196362]  [<ffffffff811fb6a2>] cpu_startup_entry+0x502/0x710
[238403.196362]  [<ffffffff810ba038>] start_secondary+0x2b8/0x3f0
[238403.196362]  [<ffffffff810b9d80>] ? set_cpu_sibling_map+0x1bf0/0x1bf0
[238403.196362] Memory state around the buggy address:
[238403.196362]  ffff8801ccec9900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]  ffff8801ccec9980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362] >ffff8801ccec9a00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]                                      ^
[238403.196362]  ffff8801ccec9a80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]  ffff8801ccec9b00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362] ==================================================================

https://github.com/tempesta-tech/linux-4.8.15-tfw/blob/5b3aa7fe30ce46bb8f112a9544548da926cb3714/net/ipv4/tcp_input.c#L3129
https://github.com/tempesta-tech/linux-4.8.15-tfw/blob/5b3aa7fe30ce46bb8f112a9544548da926cb3714/net/ipv4/tcp_input.c#L3090

[238403.196362] ==================================================================
[238403.196362] BUG: KASAN: use-after-free in tcp_clean_rtx_queue+0x2a29/0x2ce0 at addr ffff8801ccec9a37
[238403.196362] Read of size 1 by task swapper/3/0
[238403.196362] page:ffffea000733b240 count:0 mapcount:-127 mapping:          (null) index:0x0
[238403.196362] flags: 0x1000000000000000()
[238403.196362] page dumped because: kasan: bad access detected
[238403.196362] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G    B      O    4.8.15-ab+ #2
[238403.196362] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[238403.196362]  dffffc0000000000 ffff8801f6f87358 ffffffff8199f263 ffff8801f6f873e8
[238403.196362]  ffff8801ccec9a37 ffff8801f6f873d8 ffffffff8151c33e 0000000000000010
[238403.196362]  0000000000000000 ffffed00399d9346 0000000000000296 ffffffff00000046
[238403.196362] Call Trace:
[238403.196362]  <IRQ>  [<ffffffff8199f263>] dump_stack+0x67/0x94
[238403.196362]  [<ffffffff8151c33e>] kasan_report_error+0x4ae/0x4e0
[238403.196362]  [<ffffffff8151c3b3>] __asan_report_load1_noabort+0x43/0x50
[238403.196362]  [<ffffffff820ffde9>] ? tcp_clean_rtx_queue+0x2a29/0x2ce0
[238403.196362]  [<ffffffff820ffde9>] tcp_clean_rtx_queue+0x2a29/0x2ce0
[238403.196362]  [<ffffffff820fd3c0>] ? tcp_rtt_estimator+0x7f0/0x7f0
[238403.196362]  [<ffffffff821105b7>] tcp_ack+0x1487/0x2e60
[238403.196362]  [<ffffffff8210f130>] ? tcp_fastretrans_alert+0x2f50/0x2f50
[238403.196362]  [<ffffffff8125b8c7>] ? debug_lockdep_rcu_enabled+0x77/0x90
[238403.196362]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[238403.196362]  [<ffffffff8211ac0c>] tcp_rcv_established+0x58c/0x2de0
[238403.196362]  [<ffffffff8200b932>] ? sk_filter_trim_cap+0x2a2/0x700
[238403.196362]  [<ffffffff8200b766>] ? sk_filter_trim_cap+0xd6/0x700
[238403.196362]  [<ffffffff8211a680>] ? tcp_data_queue+0x4ed0/0x4ed0
[238403.196362]  [<ffffffff82154480>] ? tcp_v4_rcv+0x1ee0/0x3420
[238403.196362]  [<ffffffff8214fdf9>] tcp_v4_do_rcv+0x529/0x8c0
[238403.196362]  [<ffffffff821546de>] tcp_v4_rcv+0x213e/0x3420
[238403.196362]  [<ffffffff820a3a0b>] ip_local_deliver_finish+0x2cb/0x9b0
[238403.196362]  [<ffffffff820a386a>] ? ip_local_deliver_finish+0x12a/0x9b0
[238403.196362]  [<ffffffff820a484d>] ip_local_deliver+0x24d/0x330
[238403.196362]  [<ffffffff820a4600>] ? ip_call_ra_chain+0x510/0x510
[238403.196362]  [<ffffffff8207d2a6>] ? nf_hook_slow+0x186/0x2a0
[238403.196362]  [<ffffffff8207d2c5>] ? nf_hook_slow+0x1a5/0x2a0
[238403.196362]  [<ffffffff8207d125>] ? nf_hook_slow+0x5/0x2a0
[238403.196362]  [<ffffffff820a2469>] ip_rcv_finish+0x599/0x1870
[238403.196362]  [<ffffffff812165ae>] ? trace_hardirqs_on_caller+0x19e/0x580
[238403.196362]  [<ffffffff820a51cb>] ip_rcv+0x89b/0x11d0
[238403.196362]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[238403.196362]  [<ffffffff820a1ed0>] ? inet_del_offload+0x40/0x40
[238403.196362]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[238403.196362]  [<ffffffff81fbdea0>] __netif_receive_skb_core+0x1690/0x2bc0
[238403.196362]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[238403.196362]  [<ffffffff81fbc810>] ? net_tx_action+0x9d0/0x9d0
[238403.196362]  [<ffffffff8100301a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[238403.196362]  [<ffffffff8125b8c7>] ? debug_lockdep_rcu_enabled+0x77/0x90
[238403.196362]  [<ffffffff822e4d52>] ? retint_kernel+0x10/0x10
[238403.196362]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[238403.196362]  [<ffffffff81fbf42a>] __netif_receive_skb+0x5a/0x190
[238403.196362]  [<ffffffff81fbf638>] process_backlog+0xd8/0x650
[238403.196362]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[238403.196362]  [<ffffffff81fc46e5>] net_rx_action+0x655/0xde0
[238403.196362]  [<ffffffff81fc4090>] ? sk_busy_loop+0xb90/0xb90
[238403.196362]  [<ffffffff822e72ac>] __do_softirq+0x22c/0x99f
[238403.196362]  [<ffffffff8112977e>] ? __raise_softirq_irqoff+0x19e/0x220
[238403.196362]  [<ffffffff8112932c>] irq_exit+0x15c/0x190
[238403.196362]  [<ffffffff810b74c0>] smp_call_function_single_interrupt+0x70/0x90
[238403.196362]  [<ffffffff822e6249>] call_function_single_interrupt+0x89/0x90
[238403.196362]  <EOI>  [<ffffffff81073303>] ? default_idle+0x53/0x3b0
[238403.196362]  [<ffffffff81074a4f>] arch_cpu_idle+0xf/0x20
[238403.196362]  [<ffffffff811fb18d>] default_idle_call+0x4d/0x60
[238403.196362]  [<ffffffff811fb6a2>] cpu_startup_entry+0x502/0x710
[238403.196362]  [<ffffffff810ba038>] start_secondary+0x2b8/0x3f0
[238403.196362]  [<ffffffff810b9d80>] ? set_cpu_sibling_map+0x1bf0/0x1bf0
[238403.196362] Memory state around the buggy address:
[238403.196362]  ffff8801ccec9900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]  ffff8801ccec9980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362] >ffff8801ccec9a00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]                                      ^
[238403.196362]  ffff8801ccec9a80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]  ffff8801ccec9b00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362] ==================================================================

This shows that not just skb is used after it was freed, but sk as well.
https://github.com/tempesta-tech/linux-4.8.15-tfw/blob/5b3aa7fe30ce46bb8f112a9544548da926cb3714/net/ipv4/tcp_recovery.c#L86
https://github.com/tempesta-tech/linux-4.8.15-tfw/blob/5b3aa7fe30ce46bb8f112a9544548da926cb3714/include/linux/skbuff.h#L558

[238403.196362] ==================================================================
[238403.196362] BUG: KASAN: use-after-free in tcp_rack_advance+0x2c3/0x3a0 at addr ffff8801ccec9a14
[238403.196362] Read of size 4 by task swapper/3/0
[238403.196362] page:ffffea000733b240 count:0 mapcount:-127 mapping:          (null) index:0x0
[238403.196362] flags: 0x1000000000000000()
[238403.196362] page dumped because: kasan: bad access detected
[238403.196362] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G    B      O    4.8.15-ab+ #2
[238403.196362] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[238403.196362]  ffff8801c3959fc8 ffff8801f6f87318 ffffffff8199f263 ffff8801f6f873a8
[238403.196362]  ffff8801ccec9a14 ffff8801f6f87398 ffffffff8151c33e 0000000000000018
[238403.196362]  ffff8801f6f87380 ffffed00399d9370 0000000000000296 ffffffff8151be77
[238403.196362] Call Trace:
[238403.196362]  <IRQ>  [<ffffffff8199f263>] dump_stack+0x67/0x94
[238403.196362]  [<ffffffff8151c33e>] kasan_report_error+0x4ae/0x4e0
[238403.196362]  [<ffffffff8151be77>] ? kasan_end_report+0x37/0x50
[238403.196362]  [<ffffffff8151c2bd>] ? kasan_report_error+0x42d/0x4e0
[238403.196362]  [<ffffffff8151c453>] __asan_report_load4_noabort+0x43/0x50
[238403.196362]  [<ffffffff82168683>] ? tcp_rack_advance+0x2c3/0x3a0
[238403.196362]  [<ffffffff82168683>] tcp_rack_advance+0x2c3/0x3a0
[238403.196362]  [<ffffffff820fe4fe>] tcp_clean_rtx_queue+0x113e/0x2ce0
[238403.196362]  [<ffffffff820fd3c0>] ? tcp_rtt_estimator+0x7f0/0x7f0
[238403.196362]  [<ffffffff821105b7>] tcp_ack+0x1487/0x2e60
[238403.196362]  [<ffffffff8210f130>] ? tcp_fastretrans_alert+0x2f50/0x2f50
[238403.196362]  [<ffffffff8125b8c7>] ? debug_lockdep_rcu_enabled+0x77/0x90
[238403.196362]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[238403.196362]  [<ffffffff8211ac0c>] tcp_rcv_established+0x58c/0x2de0
[238403.196362]  [<ffffffff8200b932>] ? sk_filter_trim_cap+0x2a2/0x700
[238403.196362]  [<ffffffff8200b766>] ? sk_filter_trim_cap+0xd6/0x700
[238403.196362]  [<ffffffff8211a680>] ? tcp_data_queue+0x4ed0/0x4ed0
[238403.196362]  [<ffffffff82154480>] ? tcp_v4_rcv+0x1ee0/0x3420
[238403.196362]  [<ffffffff8214fdf9>] tcp_v4_do_rcv+0x529/0x8c0
[238403.196362]  [<ffffffff821546de>] tcp_v4_rcv+0x213e/0x3420
[238403.196362]  [<ffffffff820a3a0b>] ip_local_deliver_finish+0x2cb/0x9b0
[238403.196362]  [<ffffffff820a386a>] ? ip_local_deliver_finish+0x12a/0x9b0
[238403.196362]  [<ffffffff820a484d>] ip_local_deliver+0x24d/0x330
[238403.196362]  [<ffffffff820a4600>] ? ip_call_ra_chain+0x510/0x510
[238403.196362]  [<ffffffff8207d2a6>] ? nf_hook_slow+0x186/0x2a0
[238403.196362]  [<ffffffff8207d2c5>] ? nf_hook_slow+0x1a5/0x2a0
[238403.196362]  [<ffffffff8207d125>] ? nf_hook_slow+0x5/0x2a0
[238403.196362]  [<ffffffff820a2469>] ip_rcv_finish+0x599/0x1870
[238403.196362]  [<ffffffff812165ae>] ? trace_hardirqs_on_caller+0x19e/0x580
[238403.196362]  [<ffffffff820a51cb>] ip_rcv+0x89b/0x11d0
[238403.196362]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[238403.196362]  [<ffffffff820a1ed0>] ? inet_del_offload+0x40/0x40
[238403.196362]  [<ffffffff820a4930>] ? ip_local_deliver+0x330/0x330
[238403.196362]  [<ffffffff81fbdea0>] __netif_receive_skb_core+0x1690/0x2bc0
[238403.196362]  [<ffffffff81216ec0>] ? debug_check_no_locks_freed+0x280/0x280
[238403.196362]  [<ffffffff81fbc810>] ? net_tx_action+0x9d0/0x9d0
[238403.196362]  [<ffffffff8100301a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[238403.196362]  [<ffffffff8125b8c7>] ? debug_lockdep_rcu_enabled+0x77/0x90
[238403.196362]  [<ffffffff822e4d52>] ? retint_kernel+0x10/0x10
[238403.196362]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[238403.196362]  [<ffffffff81fbf42a>] __netif_receive_skb+0x5a/0x190
[238403.196362]  [<ffffffff81fbf638>] process_backlog+0xd8/0x650
[238403.196362]  [<ffffffff81fbf777>] ? process_backlog+0x217/0x650
[238403.196362]  [<ffffffff81fc46e5>] net_rx_action+0x655/0xde0
[238403.196362]  [<ffffffff81fc4090>] ? sk_busy_loop+0xb90/0xb90
[238403.196362]  [<ffffffff822e72ac>] __do_softirq+0x22c/0x99f
[238403.196362]  [<ffffffff8112977e>] ? __raise_softirq_irqoff+0x19e/0x220
[238403.196362]  [<ffffffff8112932c>] irq_exit+0x15c/0x190
[238403.196362]  [<ffffffff810b74c0>] smp_call_function_single_interrupt+0x70/0x90
[238403.196362]  [<ffffffff822e6249>] call_function_single_interrupt+0x89/0x90
[238403.196362]  <EOI>  [<ffffffff81073303>] ? default_idle+0x53/0x3b0
[238403.196362]  [<ffffffff81074a4f>] arch_cpu_idle+0xf/0x20
[238403.196362]  [<ffffffff811fb18d>] default_idle_call+0x4d/0x60
[238403.196362]  [<ffffffff811fb6a2>] cpu_startup_entry+0x502/0x710
[238403.196362]  [<ffffffff810ba038>] start_secondary+0x2b8/0x3f0
[238403.196362]  [<ffffffff810b9d80>] ? set_cpu_sibling_map+0x1bf0/0x1bf0
[238403.196362] Memory state around the buggy address:
[238403.196362]  ffff8801ccec9900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]  ffff8801ccec9980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362] >ffff8801ccec9a00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]                          ^
[238403.196362]  ffff8801ccec9a80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362]  ffff8801ccec9b00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[238403.196362] ==================================================================

@keshonok
Copy link
Contributor Author

keshonok commented Mar 10, 2017

In attempt to close in on the cause of this bug, the previous version of Linux kernel for Tempesta was tested. The changes included:

  • Linux kernel 4.1.27 for Tempesta. The FPU patch from the kernel 4.8.15 for Tempesta was backported to this version of the kernel. The patch from 4.8.15 that removed the use of the accept queue was NOT backported due to difficulty of doing that in 4.1.27. The changes are in the branch ab-4.8.15-backport of the kernel 4.1.27.
  • Tempesta code base. Current master at c40924b was made compatible with the kernel 4.1.27 and with the patches to that kernel. The changes are in the branch ab-compat-4.1.27 of Tempesta. The changes include the manipulations with the accept queue the way it was before (plus a small bug fix).

These versions of Tempesta and the kernel were tested in the original setup for this issue. The tests did not produce a single warning from KASAN. The whole system worked stable for many hours.

This results of this research may be considered as a proof that the real issue is NOT in the latest massive changes to Tempesta code base, but rather it's in the patch to Tempesta kernel 4.8.15. It appears that there's something wrong with memory management in the patch. The traces above lead to believe that kernel memory may be taken away at any random time right from under Tempesta or the kernel itself.

This issue is related to #691.

@keshonok
Copy link
Contributor Author

After the latest kernel patch https://github.com/tempesta-tech/linux-4.8.15-tfw/commit/0bf3cf888a6bc61da0919c5c989bec938bda6e4b this bug still occurs. By observations of overall behaviour, there are a few small differences in my VM:

  • The issue occurs less often than before.
  • The issue occurs in "bursts". During a 10 min test there's just one burst of use-after-free warnings from KASAN, obviously occurring on all CPUs at the same time. The kernel stays put and doesn't crash - yet.
  • The kernel crash occurs eventually, but NOT as before on every test after these use-after-free warnings from KASAN.

This issue and the issue mentioned in #692 (comment) appear to be duplicates.

test trace.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants