Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel panic when Tempesta starts with an error in the config file #418

Closed
sergsever opened this issue Feb 9, 2016 · 4 comments
Closed
Assignees
Labels
Milestone

Comments

@sergsever
Copy link
Contributor

There's an error in tempesta_fw.conf file - the listen option is listed twice. When Tempesta starts we get kernel panic:

kernel BUG at /usr/src/projects/tempesta/tempesta_fw/sched/tfw_sched_rr.c:113!
invalid opcode: 0000 [#1] SMP 
Modules linked in: tfw_sched_rr(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) ppdev serio_raw pcspkr i2c_piix4 microcode parport_pc parport i2c_core acpi_cpufreq ipv6 autofs4 ata_generic pata_acpi crc32c_intel e1000 floppy ata_piix
CPU: 0 PID: 716 Comm: apache2 Tainted: G           O    4.1.12 #2
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff88003b09d570 ti: ffff88003b0a0000 task.ti: ffff88003b0a0000
RIP: 0010:[<ffffffffa01fc14c>]  [<ffffffffa01fc14c>] tfw_sched_rr_update_data+0xbc/0xc0 [tfw_sched_rr]
RSP: 0018:ffff88003fc03a28  EFLAGS: 00010246
RAX: ffffffffa01fc090 RBX: ffff88003b1bcb98 RCX: 0000000000000000
RDX: ffff88003fc11038 RSI: ffff88003fc0d758 RDI: ffff88003b11ea40
RBP: ffff88003fc03a28 R08: 0000000000000000 R09: 6e6f632020205d61
R10: 0000000000002698 R11: 3231203a74736f6c R12: ffff880039473800
R13: ffffffffa01ca4fc R14: ffff880037c35f50 R15: ffff88003b1b40f8
FS:  00007f00bf7f6700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f00ccb849f0 CR3: 000000003a6d7000 CR4: 00000000000006f0
Stack:
 ffff88003fc03a38 ffffffffa01b781d ffff88003fc03aa8 ffffffffa01bacee
 ffff880039473898 2e302e3732310246 0000000000312e30 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffffa01b781d>] tfw_sg_update+0x1d/0x20 [tempesta_fw]
 [<ffffffffa01bacee>] tfw_sock_srv_do_failover+0x7e/0xf0 [tempesta_fw]
 [<ffffffffa01bad95>] tfw_sock_srv_connect_failover+0x15/0x20 [tempesta_fw]
 [<ffffffffa01b8873>] ss_droplink+0x33/0x50 [tempesta_fw]
 [<ffffffffa01b8dfd>] ss_tcp_state_change+0xdd/0x370 [tempesta_fw]
 [<ffffffff8151b5f9>] tcp_fin+0x179/0x1f0
 [<ffffffff8151d7a8>] tcp_data_queue+0x728/0xd30
 [<ffffffff81520b11>] tcp_rcv_established+0x1e1/0x730
 [<ffffffff8152ba53>] tcp_v4_do_rcv+0x103/0x3e0
 [<ffffffff812908f6>] ? security_sock_rcv_skb+0x16/0x20
 [<ffffffff8152d0d4>] tcp_v4_rcv+0x854/0xa30
 [<ffffffff812afb4f>] ? blk_peek_request+0x4f/0x290
 [<ffffffff81505bfa>] ip_local_deliver_finish+0xba/0x240
 [<ffffffff81505f6a>] ip_local_deliver+0x9a/0xb0
 [<ffffffff812aa845>] ? blk_run_queue+0x35/0x40
 [<ffffffff81404cb0>] ? scsi_run_queue+0x230/0x2d0
 [<ffffffff81505858>] ip_rcv_finish+0x88/0x370
 [<ffffffff81506262>] ip_rcv+0x2e2/0x3c0
 [<ffffffff81404dfe>] ? scsi_end_request+0xae/0x1d0
 [<ffffffff814c7a13>] __netif_receive_skb_core+0x6e3/0xa20
 [<ffffffff814c7d6f>] __netif_receive_skb+0x1f/0x80
 [<ffffffff814c8b4c>] process_backlog+0x9c/0x130
 [<ffffffff814c82ea>] net_rx_action+0x1ea/0x320
 [<ffffffff8105e5c7>] __do_softirq+0xa7/0x290
 [<ffffffff81005ecc>] __tempesta_do_softirq_fpusafe+0x1c/0x50
 [<ffffffff8159c0cc>] do_softirq_own_stack+0x1c/0x30
 <EOI> 
 [<ffffffff8105db8d>] do_softirq.part.20+0x2d/0x30
 [<ffffffff8105dc26>] __local_bh_enable_ip+0x96/0xa0
 [<ffffffff8150a6db>] ip_finish_output+0x1eb/0x860
 [<ffffffff8150c71b>] ip_output+0xab/0xc0
 [<ffffffff811480e3>] ? __alloc_pages_nodemask+0x193/0xa10
 [<ffffffff8150bdf1>] ip_local_out_sk+0x31/0x40
 [<ffffffff8150c192>] ip_queue_xmit+0x152/0x3e0
 [<ffffffff81524306>] tcp_transmit_skb+0x4d6/0x9c0
 [<ffffffff815249e8>] tcp_write_xmit+0x1f8/0xe90
 [<ffffffff81528d23>] ? tcp_v4_md5_lookup+0x13/0x20
 [<ffffffff815258b2>] __tcp_push_pending_frames+0x32/0xd0
 [<ffffffff8152734d>] tcp_send_fin+0x6d/0x1a0
 [<ffffffff81514d08>] tcp_shutdown+0x58/0x60
 [<ffffffff8154151b>] inet_shutdown+0x8b/0x120
 [<ffffffff814ad23d>] SyS_shutdown+0x7d/0x90
 [<ffffffff8159a857>] system_call_fastpath+0x12/0x6a
Code: c7 01 48 c1 e2 04 48 c1 e0 08 48 8d 44 02 10 49 89 74 00 08 48 8b 41 38 49 39 c2 48 8d 48 c8 0f 85 7a ff ff ff 49 89 78 08 5d c3 <0f> 0b 66 90 66 66 66 66 90 48 89 f8 48 8b 7f 30 48 85 ff 74 27 
RIP  [<ffffffffa01fc14c>] tfw_sched_rr_update_data+0xbc/0xc0 [tfw_sched_rr]
 RSP <ffff88003fc03a28>
@keshonok keshonok changed the title Bug - kernel panic whith an error in the config Kernel panic when Tempesta starts with an error in the config file Feb 9, 2016
@keshonok keshonok added the bug label Feb 9, 2016
@keshonok keshonok added this to the 0.5.0 Web Server milestone Feb 9, 2016
@keshonok
Copy link
Contributor

To reproduce the issue, the following configuration file is causing the kernel crash with the above trace.

cache 0;
listen 80;
server 127.0.0.1:8080;
listen 80;

Tempesta gives the following output in console when it starts, but then the kernel crashes with the above trace.

# ./tempesta.sh --start
Loading Tempesta kernel modules...
Starting Tempesta...
...setup interface eth0
...setup interface lo
sysctl: setting key "net.tempesta.state": Address already in use
net.tempesta.state = start
done

@keshonok
Copy link
Contributor

This issue depends on #254.

The primary reason for this bug appearance is that ss_close() has changed from synchronous to asynchronous execution.

When Tempesta stops it executes tfw_sock_srv_disconnect() for each server connection, which calls ss_close(). Synchronous ss_close() does just that, closes the connection. Therefore, additional code in tfw_sock_srv_disconnect() takes care of releasing all other resources linked with the connection. In essence, it does the same job as tfw_sock_srv_do_failover() and tfw_srv_conn_release() with the exception of starting a reconnect. However, asynchronous ss_close() performs complete connection closing, that includes the release of all resources linked with the connection.

When tfw_sock_srv_disconnect() is executed at the time Tempesta stops it calls 'ss_close()', then proceeds to execute the additional code that releases connection resources. Then Tempesta proceeds to release other resources such as server groups, schedulers, etc. It happens so that all of that is executed before the asynchronous ss_close() has a chance to run. By the time it runs lots of resources have been released.

Also, a different scenario may occur. A connect to a back end server is initiated when Tempesta starts. If there's an error in the configuration file then Tempesta stops and releases resources, as described above. But at the same time connect completes, and tfw_sock_srv_connect_complete() is called from ss_tcp_state_change(). Then tfw_sg_update() is called as in the trace above, and we hit the same BUG_ON() at tempesta_fw/sched/tfw_sched_rr.c:113 because the specific resource has been released already and Tempesta stops. Please see the trace below.

[11191.861989] [tempesta] ERROR: failed to start modules
[11192.826445] [sync_sockets]   ss_tcp_state_change: cpu=1 sk=ffff8800954f4100 state=Established
[11192.827098] ------------[ cut here ]------------
[11192.827098] kernel BUG at /home/natsys-lab/tfw-run/tempesta/tempesta_fw/sched/tfw_sched_rr.c:113!
[11192.827098] invalid opcode: 0000 [#1] SMP
[11192.827098] Modules linked in: tfw_sched_rr(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) 9p ppdev microcode i2c_piix4 i2c_core pcspkr 9pnet_virtio 9pnet serio_raw virtio_balloon parport_pc parport acpi_cpufreq uinput xfs libcrc32c ata_generic pata_acpi virtio_net virtio_pci virtio_ring virtio ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod ipv6 autofs4
[11192.827098] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O    4.1.12-ab+ #15
[11192.827098] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[11192.827098] task: ffff88011a37a290 ti: ffff88011a37c000 task.ti: ffff88011a37c000
[11192.827098] RIP: 0010:[<ffffffffa03d0113>]  [<ffffffffa03d0113>] tfw_sched_rr_update_data+0xa3/0xb0 [tfw_sched_rr]
[11192.827098] RSP: 0018:ffff88013f503a38  EFLAGS: 00010246
[11192.827098] RAX: ffffffffa03d0070 RBX: ffff880095a0ede0 RCX: 0000000000000003
[11192.827098] RDX: ffff88013f517580 RSI: ffffffff81c41880 RDI: ffff8800bae55080
[11192.827098] RBP: ffff88013f503a38 R08: 0000000000000001 R09: 0000000000000001
[11192.827098] R10: 0000000000000000 R11: 0000000000000430 R12: 0000000000000000
[11192.827098] R13: ffff8800021ea088 R14: 0000000000000218 R15: ffff8800954f4100
[11192.827098] FS:  0000000000000000(0000) GS:ffff88013f500000(0000) knlGS:0000000000000000
[11192.827098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11192.827098] CR2: 00007f117caf4620 CR3: 00000000baa73000 CR4: 00000000000006e0
[11192.827098] Stack:
[11192.827098]  ffff88013f503a48 ffffffffa038a278 ffff88013f503ab8 ffffffffa038cd3d
[11192.827098]  ffff88013f503a88 ffffffff810b1764 ffff880096e62d00 ffff88011a37a290
[11192.827098]  0000000000000246 ffff88010c394ba8 ffff88013f503aa8 0000000000000246
[11192.827098] Call Trace:
[11192.827098]  <IRQ>
[11192.827098]  [<ffffffffa038a278>] tfw_sg_update+0x18/0x20 [tempesta_fw]
[11192.827098]  [<ffffffffa038cd3d>] tfw_sock_srv_connect_complete+0x5d/0x1c0 [tempesta_fw]
[11192.827098]  [<ffffffff810b1764>] ? __lock_is_held+0x54/0x80
[11192.827098]  [<ffffffffa038bd3e>] ss_tcp_state_change+0x10e/0x370 [tempesta_fw]
[11192.827098]  [<ffffffff8156b6b1>] tcp_finish_connect+0xc1/0x120
[11192.827098]  [<ffffffff8156bd27>] tcp_rcv_state_process+0x617/0xf30
[11192.827098]  [<ffffffff81577b3e>] ? tcp_v4_rcv+0xade/0xc20
[11192.827098]  [<ffffffff815764f1>] tcp_v4_do_rcv+0x171/0x420
[11192.827098]  [<ffffffff81577b65>] tcp_v4_rcv+0xb05/0xc20
[11192.827098]  [<ffffffff8154e60f>] ip_local_deliver_finish+0x13f/0x3f0
[11192.827098]  [<ffffffff8154e517>] ? ip_local_deliver_finish+0x47/0x3f0
[11192.827098]  [<ffffffff8154eb3a>] ip_local_deliver+0x9a/0xb0
[11192.827098]  [<ffffffff810b3cd9>] ? __lock_acquire+0x4c9/0x1e70
[11192.827098]  [<ffffffff8154e080>] ip_rcv_finish+0x160/0x5b0
[11192.827098]  [<ffffffff8154ee3f>] ip_rcv+0x2ef/0x410
[11192.827098]  [<ffffffff815051a8>] __netif_receive_skb_core+0x328/0xcd0
[11192.827098]  [<ffffffff81505b68>] __netif_receive_skb+0x18/0x60
[11192.827098]  [<ffffffff81505c26>] process_backlog+0x76/0x240
[11192.827098]  [<ffffffff81505c8b>] ? process_backlog+0xdb/0x240
[11192.827098]  [<ffffffff815077b0>] ? net_rx_action+0x90/0x4e0
[11192.827098]  [<ffffffff8150786b>] net_rx_action+0x14b/0x4e0
[11192.827098]  [<ffffffff8106278e>] __do_softirq+0xde/0x5d0
[11192.827098]  [<ffffffff81062f35>] irq_exit+0xa5/0xb0
[11192.827098]  [<ffffffff8103a0a5>] smp_apic_timer_interrupt+0x45/0x60
[11192.827098]  [<ffffffff815fd6a0>] apic_timer_interrupt+0x70/0x80
[11192.827098]  <EOI>
[11192.827098]  [<ffffffff810828f0>] ? __atomic_notifier_call_chain+0x120/0x120
[11192.827098]  [<ffffffff8100d5c0>] ? default_idle+0x20/0x200
[11192.827098]  [<ffffffff8100d5be>] ? default_idle+0x1e/0x200
[11192.827098]  [<ffffffff8100e3aa>] arch_cpu_idle+0xa/0x10
[11192.827098]  [<ffffffff810a946b>] cpu_startup_entry+0x25b/0x4f0
[11192.827098]  [<ffffffff810e9dfc>] ? clockevents_register_device+0xec/0x1c0
[11192.827098]  [<ffffffff81037dc3>] start_secondary+0x143/0x170
[11192.827098] Code: 8b 50 18 48 83 c6 01 48 39 d1 48 8d 42 e8 75 d5 48 89 77 08 49 83 c0 01 48 8b 41 68 49 39 c1 48 8d 48 98 75 8b 4d 89 42 08 5d c3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 89 fb
[11192.827098] RIP  [<ffffffffa03d0113>] tfw_sched_rr_update_data+0xa3/0xb0 [tfw_sched_rr]
[11192.827098]  RSP <ffff88013f503a38>

@krizhanovsky
Copy link
Contributor

The crash is fixed in #676 . However, it appears that #65 isn't fixed at all. The sample configuration file

    # cat etc/tempesta_fw.conf 
    cache 0;
    listen 80;
    server 127.0.0.1:8080;
    listen 80;

Produces wrong output (note that the actual error is at 4th line):

    [13936.069834] [tempesta] ERROR: configuration handler returned error: -22
    [13936.071569] [tempesta] ERROR: configuration parsing error: str:1;w:(null)
    [13936.073360] [tempesta] ERROR: can't parse configuration data
    [13936.075681] [tempesta] ERROR: failed to start modules

So I reopen #65.

@krizhanovsky
Copy link
Contributor

Fixed in 2027734

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants