Client connections are not closed when Tempesta is stopped #116

Closed · keshonok opened this issue Jun 16, 2015 · 3 comments

@keshonok (Contributor)

A loaded Tempesta has a handful of listening sockets open, a handful of connections to backend servers, and a potentially huge number of client connections at any given time. When Tempesta is stopped, the listening sockets and the connections to backend servers are closed. However, client connections are not closed at that time. The sockets will eventually be closed by the kernel, but leaving them to the kernel causes a memory leak, because the Tempesta-internal data linked to each socket is never released.

The issue is somewhat complicated by the fact that Tempesta currently doesn't keep track of client connections. A solution needs to be developed that takes the minimal amount of resources, both CPU and memory.
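
As one possible direction, below is a minimal sketch of such bookkeeping: every established client connection is linked into a list protected by a spinlock, and the stop path splices the list off and closes whatever is left. All names here (tfw_cli_conn_track(), tfw_cli_conn_untrack(), tfw_cli_conn_close_all(), tfw_cli_conn_close(), and the list member assumed on TfwConnection) are hypothetical and not taken from the Tempesta sources; a per-CPU variant of the same list would trade a little extra memory for less lock contention.

#include <linux/list.h>
#include <linux/spinlock.h>
/* Assumed Tempesta header providing TfwConnection; the sketch also
 * assumes TfwConnection gains a struct list_head member named "list"
 * and a hypothetical close helper tfw_cli_conn_close(). */
#include "connection.h"

static LIST_HEAD(tfw_cli_conn_list);        /* all live client connections */
static DEFINE_SPINLOCK(tfw_cli_conn_lock);  /* protects tfw_cli_conn_list */

/* Accept path: remember the new client connection. */
static void
tfw_cli_conn_track(TfwConnection *conn)
{
	spin_lock_bh(&tfw_cli_conn_lock);
	list_add(&conn->list, &tfw_cli_conn_list);
	spin_unlock_bh(&tfw_cli_conn_lock);
}

/* Close/failover path: forget a connection that goes away on its own. */
static void
tfw_cli_conn_untrack(TfwConnection *conn)
{
	spin_lock_bh(&tfw_cli_conn_lock);
	list_del_init(&conn->list);
	spin_unlock_bh(&tfw_cli_conn_lock);
}

/* Stop path: close everything that is still alive. */
static void
tfw_cli_conn_close_all(void)
{
	TfwConnection *conn, *tmp;
	LIST_HEAD(closing);

	/* Detach all entries under the lock and close them outside of it,
	 * since closing a socket may sleep. */
	spin_lock_bh(&tfw_cli_conn_lock);
	list_splice_init(&tfw_cli_conn_list, &closing);
	spin_unlock_bh(&tfw_cli_conn_lock);

	list_for_each_entry_safe(conn, tmp, &closing, list) {
		list_del_init(&conn->list);
		tfw_cli_conn_close(conn);
	}
}

The per-connection cost of this scheme is one list_head plus one list operation on connection setup and one on teardown, which seems close to the minimum needed to be able to enumerate the connections at stop time.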

krizhanovsky added this to the 0.5.0 SSL, Stable milestone Jun 19, 2015
krizhanovsky modified the milestones: 0.4.0 Web Server, 0.5.0 SSL & TDB Jun 26, 2015
@krizhanovsky (Contributor)

Linked with #100, so it's a relatively heavy task and we're not in time to deliver it for 0.4. As a temporary workaround (dirty and nasty): just reboot the system if you need to stop Tempesta.

krizhanovsky modified the milestones: 0.5.0 SSL & TDB, 0.4.0 Web Server Jun 26, 2015
@i-rinat (Contributor) commented Aug 3, 2015

I've seen this warning:

[  113.711530] kmem_cache_destroy tfw_cli_conn_cache: Slab cache still has objects
[  113.712777] CPU: 0 PID: 820 Comm: rmmod Tainted: G           O 3.10.10-syncsock+ #4
[  113.714094] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[  113.715494]  ffffffff814afc55 ffffffff811353e7 0000000000000800 ffffffffa049e302
[  113.716687]  ffff88007b3d3ef8 ffffffffa0499cce 0000000700000800 ffffffffa04b2440
[  113.717755]  ffffffff810b1f2e 000000007bcc4cd8 ffffffffa04b2440 0000000000000800
[  113.718828] Call Trace:
[  113.719172]  [<ffffffff814afc55>] ? dump_stack+0xc/0x15
[  113.719891]  [<ffffffff811353e7>] ? kmem_cache_destroy+0xe7/0xf0
[  113.720744]  [<ffffffffa049e302>] ? tfw_sock_clnt_exit+0x18/0x1a [tempesta_fw]
[  113.721792]  [<ffffffffa0499cce>] ? tfw_exit+0x3c/0x48 [tempesta_fw]
[  113.722647]  [<ffffffff810b1f2e>] ? SyS_delete_module+0x16e/0x2e0
[  113.723483]  [<ffffffff814b4f88>] ? async_page_fault+0x28/0x30
[  113.724293]  [<ffffffff814bd05d>] ? system_call_fastpath+0x1a/0x1f

It appears after I start Tempesta, make a request (GET / HTTP/1.0\r\n\r\n) while keeping the connection open, and then stop Tempesta.
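
For reference, a minimal client performing exactly these steps might look like the sketch below; it assumes Tempesta is listening on 127.0.0.1:80, which is not part of the report and should be adjusted to the actual configuration.

/* Hypothetical reproduction client: send one HTTP/1.0 request and then
 * keep the connection open so Tempesta can be stopped while the client
 * socket is still alive.  The address and port are assumptions. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char req[] = "GET / HTTP/1.0\r\n\r\n";
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port   = htons(80),
	};
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
	if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr))) {
		perror("connect");
		return 1;
	}
	if (write(fd, req, sizeof(req) - 1) < 0)
		perror("write");

	/* Keep the connection open: stop Tempesta now to hit the warning
	 * above, then kill this client (closing the socket) to trigger
	 * what follows. */
	pause();
	return 0;
}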

If the client then closes the connection, the server's kernel panics:

[  223.572785] BUG: unable to handle kernel paging request at ffffffffa049cb60
[  223.574477] IP: [<ffffffffa049cb60>] 0xffffffffa049cb5f
[  223.575745] PGD 1811067 PUD 1812063 PMD 797c5067 PTE 0
[  223.576613] Oops: 0010 [#1] SMP 
[  223.576613] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd dns_resolver fscache sunrpc crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 lrw gf128mul snd_hwdep glue_helper qxl ttm drm_kms_helper ablk_helper evdev joydev psmouse drm cryptd snd_pcm pcspkr snd_page_alloc serio_raw processor snd_timer virtio_balloon snd soundcore virtio_console i2c_piix4 thermal_sys i2c_core button autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk uhci_hcd ehci_pci ehci_hcd ata_piix usbcore usb_common libata virtio_pci e1000 virtio_ring virtio scsi_mod floppy [last unloaded: tempesta_db]
[  223.576613] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.10-syncsock+ #4
[  223.576613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[  223.576613] task: ffffffff818154c0 ti: ffffffff81800000 task.ti: ffffffff81800000
[  223.576613] RIP: 0010:[<ffffffffa049cb60>]  [<ffffffffa049cb60>] 0xffffffffa049cb5f
[  223.576613] RSP: 0018:ffff88007fc03b30  EFLAGS: 00010246
[  223.576613] RAX: 0000000000000302 RBX: ffff88007b3d1100 RCX: 0000000000000000
[  223.576613] RDX: 0000000000000003 RSI: 0000000000000008 RDI: ffff88007b3d1100
[  223.576613] RBP: ffff88007b3d1708 R08: 0000000000000000 R09: 0000000000000000
[  223.576613] R10: ffff880078f8cd00 R11: 0000000000000002 R12: ffff88007a9d0f62
[  223.576613] R13: ffff88007a9d0f00 R14: 0000000000000000 R15: ffff88007a9d0f5a
[  223.576613] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  223.576613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  223.576613] CR2: ffffffffa049cb60 CR3: 00000000794e8000 CR4: 00000000000406f0
[  223.576613] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  223.576613] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  223.576613] Stack:
[  223.576613]  ffffffff81407379 ffff88007b3d1100 ffff880078f8cd00 ffffffff8140a5c8
[  223.576613]  fffffffff7b34805 0000000000000000 0000000000000000 ffff88007b3d1100
[  223.576613]  ffff88007a9d0f62 ffff880078f8cd00 ffff88007a9d0f4e 0000000000000000
[  223.576613] Call Trace:
[  223.576613]  <IRQ> 
[  223.576613]  [<ffffffff81407379>] ? tcp_fin+0x159/0x1d0
[  223.576613]  [<ffffffff8140a5c8>] ? tcp_data_queue+0x6e8/0xc80
[  223.576613]  [<ffffffff8140d458>] ? tcp_rcv_established+0x1a8/0x840
[  223.576613]  [<ffffffff8141860e>] ? tcp_v4_do_rcv+0x1ae/0x500
[  223.576613]  [<ffffffff8108e9e5>] ? update_group_power+0x135/0x220
[  223.576613]  [<ffffffff814198ef>] ? tcp_v4_rcv+0x6df/0x7d0
[  223.576613]  [<ffffffff813f3aae>] ? ip_local_deliver_finish+0xbe/0x1f0
[  223.576613]  [<ffffffff813bfc5a>] ? __netif_receive_skb_core+0x48a/0x8a0
[  223.576613]  [<ffffffff813c00ff>] ? netif_receive_skb+0x1f/0x90
[  223.576613]  [<ffffffff813c0b38>] ? napi_gro_receive+0x68/0x90
[  223.576613]  [<ffffffffa006a573>] ? e1000_clean_rx_irq+0x193/0x5a0 [e1000]
[  223.576613]  [<ffffffffa006cce4>] ? e1000_clean+0x2b4/0x9a0 [e1000]
[  223.576613]  [<ffffffff813c0432>] ? net_rx_action+0x132/0x250
[  223.576613]  [<ffffffff8105c4b4>] ? __do_softirq+0x114/0x270
[  223.576613]  [<ffffffff8105c775>] ? irq_exit+0xa5/0xb0
[  223.576613]  [<ffffffff814bee4e>] ? do_IRQ+0x4e/0xb0
[  223.576613]  [<ffffffff814b4c6d>] ? common_interrupt+0x6d/0x6d
[  223.576613]  <EOI> 
[  223.576613]  [<ffffffff810195f0>] ? idle_notifier_unregister+0x20/0x20
[  223.576613]  [<ffffffff8103dfe2>] ? native_safe_halt+0x2/0x10
[  223.576613]  [<ffffffff81019609>] ? default_idle+0x19/0xb0
[  223.576613]  [<ffffffff8109e862>] ? cpu_startup_entry+0x102/0x290
[  223.576613]  [<ffffffff818fcd7e>] ? start_kernel+0x42a/0x432
[  223.576613]  [<ffffffff818fc120>] ? early_idt_handlers+0x120/0x120
[  223.576613]  [<ffffffff818fc59d>] ? x86_64_start_kernel+0xf2/0xff
[  223.576613] Code:  Bad RIP value.
[  223.576613] RIP  [<ffffffffa049cb60>] 0xffffffffa049cb5f
[  223.576613]  RSP <ffff88007fc03b30>
[  223.576613] CR2: ffffffffa049cb60
[  223.576613] ---[ end trace d2a9240abc2c522d ]---
[  223.576613] Kernel panic - not syncing: Fatal exception in interrupt
[  223.576613] drm_kms_helper: panic occurred, switching back to text console

@krizhanovsky (Contributor)

The fix is also required to finish #228 (comment): we have to close all active client connections in the TfwCfgMod->stop() callback (see f3fc64d#diff-0de6ca483cd4f24745525383f12d374eR335) to guarantee that there are no users of server sockets left when sock_srv runs its stop() callback.

Since the issue is also linked with #100, it must be possible to close a client connection concurrently with other operations on it. Also, a server socket can be in the middle of transferring a huge amount of data in response to some client request. So a generic, synchronized closing of a connection (the TfwConnection as well as the underlying struct sock) must be implemented to satisfy all the requirements.
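
For illustration, a sketch of that wiring is shown below; it reuses the hypothetical tfw_cli_conn_close_all() helper sketched in the issue description above, and the TfwCfgMod field names are written from memory rather than verified against the current sock_clnt code.

/* Sketch: let sock_clnt's stop() close all live client connections so
 * that nothing on the client side can still reference a server socket
 * by the time sock_srv runs its own stop() callback. */
static void
tfw_sock_clnt_stop(void)
{
	/* Listening sockets are already closed on stop, so no new client
	 * connections can appear; close the remaining ones now. */
	tfw_cli_conn_close_all();
}

TfwCfgMod tfw_sock_clnt_cfg_mod = {
	.name = "sock_clnt",
	.stop = tfw_sock_clnt_stop,
	/* .start, .specs and the other fields are omitted here. */
};

The closing itself still has to be synchronized with concurrent activity on the socket (for example, softirq processing of incoming data), which is exactly the generic synchronized closing of TfwConnection/struct sock described above.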
