Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on creating multiple server groups #698

Closed
vankoven opened this issue Mar 13, 2017 · 2 comments
Closed

Crash on creating multiple server groups #698

vankoven opened this issue Mar 13, 2017 · 2 comments
Assignees
Milestone

Comments

@vankoven
Copy link
Contributor

Seems, that merged #660 has bugs in schedulers configuration.

First, setting multiple server groups causes kernel crash.
Config:

srv_group uri_p {
server 10.0.10.1:8000;
}
srv_group uri_s {
server 10.0.10.1:8001;
}
srv_group host_p {
server 10.0.10.1:8002;
}
srv_group host_s {
server 10.0.10.1:8003;
}
srv_group host_e {
server 10.0.10.1:8004;
}
srv_group hdr_c_e {
server 10.0.10.1:8005;
}
srv_group hdr_h_p {
server 10.0.10.1:8006;
}
srv_group hdr_h_s {
server 10.0.10.1:8007;
}
srv_group hdr_h_e {
server 10.0.10.1:8008;
}
srv_group hdr_r_p {
server 10.0.10.1:8009;
}
sched round-robin;
server 10.0.10.1:8010;
cache 0;

sched_http_rules {
  match uri_p    uri       prefix  "/static";
  match uri_s    uri       suffix  ".php";
  match host_p   host      prefix  "static.";
  match host_s   host      suffix  "tempesta-tech.com";
  match host_e   host      eq      "foo.example.com";
  match hdr_c_e  hdr_conn  eq      "keep-alive";
  match hdr_h_p  hdr_host  prefix  "bar.";
  match hdr_h_s  hdr_host  suffix  "natsys-lab.com";
  match hdr_h_e  hdr_host  eq      "bar.natsys-lab.com";
  match hdr_r_p  hdr_raw   prefix  "X-Custom-Bar-Hdr: ";
}

Crash:

[ 1611.028138] [tempesta] Initializing Tempesta FW kernel module...
[ 1611.032196] [tempesta] Registering new scheduler: hash
[ 1611.034688] [tempesta] Registering new scheduler: http
[ 1611.037787] [tempesta] Registering new scheduler: round-robin
[ 1611.041819] [tempesta] Starting all modules...
[ 1611.042525] ------------[ cut here ]------------
[ 1611.043185] kernel BUG at /home/user/tempesta/tempesta_fw/sock_srv.c:910!
[ 1611.044075] invalid opcode: 0000 [#1] PREEMPT SMP
[ 1611.044692] Modules linked in: tfw_sched_rr(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) mousedev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_generic bochs_drm aesni_intel aes_x86_64 snd_hda_intel lrw snd_hda_codec gf128mul glue_helper ttm ablk_helper ppdev iTCO_wdt cryptd iTCO_vendor_support drm_kms_helper snd_hda_core evdev snd_hwdep psmouse input_leds led_class mac_hid snd_pcm snd_timer pcspkr snd soundcore parport_pc shpchp parport drm acpi_cpufreq syscopyarea tpm_tis sysfillrect i2c_i801 sysimgblt tpm_tis_core fb_sys_fops intel_agp i2c_smbus lpc_ich intel_gtt tpm qemu_fw_cfg button sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom serio_raw virtio_console virtio_balloon virtio_blk atkbd libps2 virtio_net uhci_hcd i8042 serio virtio_pci virtio_ring virtio ahci libahci libata scsi_mod ehci_pci ehci_hcd usbcore usb_common [last unloaded: tempesta_tls]
[ 1611.063414] CPU: 0 PID: 2116 Comm: sysctl Tainted: G           O    4.8.15-3-tempesta #3
[ 1611.065257] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[ 1611.067514] task: ffff880056de2ac0 task.stack: ffff880056198000
[ 1611.068856] RIP: 0010:[<ffffffffa052f00e>]  [<ffffffffa052f00e>] tfw_cfgop_finish_srv_group+0x1de/0x210 [tempesta_fw]
[ 1611.071390] RSP: 0018:ffff88005619b9d8  EFLAGS: 00010246
[ 1611.072593] RAX: ffff88005b45a6b8 RBX: 0000000000000000 RCX: 0000000000000000
[ 1611.074214] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0543c28
[ 1611.075828] RBP: ffff88005619ba58 R08: 0000000000000000 R09: ffffffffa050f30a
[ 1611.077235] R10: ffffea00015b4780 R11: 0000000000000001 R12: ffff88005582ec60
[ 1611.078217] R13: ffffffffa0543cc0 R14: ffffffffa0543cc0 R15: ffffffffa0543ca0
[ 1611.079821] FS:  00007f9ecc3ff440(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1611.081667] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1611.082978] CR2: 00007f9ecbe90da0 CR3: 0000000056ce1000 CR4: 00000000003406f0
[ 1611.084610] Stack:
[ 1611.085077]  ffff88005619bb48 ffffffffa0543cc0 ffffffffa0543cc0 ffffffffa0543ca0
[ 1611.086812]  ffff88005619ba28 ffffffffa052fee9 ffff88005619ba38 ffff88005619bb48
[ 1611.088559]  0000000000000000 00000000b3c7abbf ffff88005619ba58 0000000000000000
[ 1611.090293] Call Trace:
[ 1611.090867]  [<ffffffffa052fee9>] ? tfw_cfgop_in_server+0x39/0x90 [tempesta_fw]
[ 1611.092524]  [<ffffffffa050fb51>] tfw_cfg_handle_children+0x1f1/0x240 [tempesta_fw]
[ 1611.094272]  [<ffffffffa050f216>] spec_handle_entry+0x26/0xa0 [tempesta_fw]
[ 1611.095855]  [<ffffffffa050fcf6>] tfw_cfg_parse_mods_cfg+0x156/0x210 [tempesta_fw]
[ 1611.097581]  [<ffffffff81411200>] ? wait_for_xmitr+0xa0/0xa0
[ 1611.098868]  [<ffffffffa050e469>] ? read_next_token+0x129/0x350 [tempesta_fw]
[ 1611.101134]  [<ffffffffa051018b>] tfw_cfg_start+0x5b/0x1d0 [tempesta_fw]
[ 1611.102857]  [<ffffffffa052a7b6>] handle_sysctl_state_io+0xf6/0x1d0 [tempesta_fw]
[ 1611.104787]  [<ffffffffa052a6ff>] ? handle_sysctl_state_io+0x3f/0x1d0 [tempesta_fw]
[ 1611.106735]  [<ffffffff81089eef>] ? ns_capable_common+0x2f/0x80
[ 1611.108250]  [<ffffffff81287129>] proc_sys_call_handler.isra.14+0xe9/0x110
[ 1611.109951]  [<ffffffff81287168>] proc_sys_write+0x18/0x20
[ 1611.111214]  [<ffffffff8120c777>] __vfs_write+0x37/0x140
[ 1611.112572]  [<ffffffff810cbbd7>] ? percpu_down_read+0x17/0x50
[ 1611.114046]  [<ffffffff8120d546>] vfs_write+0xb6/0x1a0
[ 1611.115357]  [<ffffffff8120e9c5>] SyS_write+0x55/0xc0
[ 1611.116649]  [<ffffffff815ff9f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 1611.118289] Code: 00 49 8d 54 24 50 48 c7 c7 60 cf 53 a0 48 8b 08 48 8b 45 80 48 8b 30 31 c0 e8 75 b0 c4 e0 c7 45 8c ea ff ff ff eb 9b 0f 0b 0f 0b <0f> 0b 48 8b 45 80 49 8d 54 24 50 48 8d 4d 90 48 c7 c7 a0 cf 53 
[ 1611.123815] RIP  [<ffffffffa052f00e>] tfw_cfgop_finish_srv_group+0x1de/0x210 [tempesta_fw]
[ 1611.125951]  RSP <ffff88005619b9d8>
[ 1611.128960] ---[ end trace 9d5b3d130adfe0dd ]---
[ 1611.132001] [tempesta] Un-registering scheduler: hash
[ 1611.163527] [tempesta] Un-registering scheduler: http
[ 1611.198420] [tempesta] Un-registering scheduler: round-robin
[ 1611.234623] [tempesta] exiting...
[ 1611.236753] kmem_cache_destroy tfw_srv_conn_cache: Slab cache still has objects
[ 1611.238849] CPU: 0 PID: 2131 Comm: rmmod Tainted: G      D    O    4.8.15-3-tempesta #3
[ 1611.240655] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[ 1611.242867]  0000000000000286 00000000a394b34c ffff880056e03e38 ffffffff81303600
[ 1611.244520]  ffff880059bb63c0 ffff880059bb63c0 ffff880056e03ea8 ffffffff811a88e2
[ 1611.246234]  ffff880059aa6420 00ff880056e03eb8 ffff880056e03e58 ffff880056e03e58
[ 1611.248047] Call Trace:
[ 1611.248673]  [<ffffffff81303600>] dump_stack+0x63/0x83
[ 1611.249779]  [<ffffffff811a88e2>] kmem_cache_destroy+0x242/0x250
[ 1611.250603]  [<ffffffffa05300d5>] tfw_sock_srv_exit+0x15/0x20 [tempesta_fw]
[ 1611.251606]  [<ffffffffa052a8c2>] tfw_exit+0x32/0x60 [tempesta_fw]
[ 1611.253022]  [<ffffffff81109d12>] SyS_delete_module+0x192/0x260
[ 1611.254376]  [<ffffffff8100360e>] ? exit_to_usermode_loop+0x5e/0xc0
[ 1611.255791]  [<ffffffff815ff9f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Next, trying to add sched configuration option to server groups raises configuration parse error.
Config:

srv_group uri_p {
sched round-robin;
server 10.0.10.1:8000;
}
srv_group uri_s {
sched round-robin;
server 10.0.10.1:8001;
}
srv_group host_p {
sched round-robin;
server 10.0.10.1:8002;
}
srv_group host_s {
sched round-robin;
server 10.0.10.1:8003;
}
srv_group host_e {
sched round-robin;
server 10.0.10.1:8004;
}
srv_group hdr_c_e {
sched round-robin;
server 10.0.10.1:8005;
}
srv_group hdr_h_p {
sched round-robin;
server 10.0.10.1:8006;
}
srv_group hdr_h_s {
sched round-robin;
server 10.0.10.1:8007;
}
srv_group hdr_h_e {
sched round-robin;
server 10.0.10.1:8008;
}
srv_group hdr_r_p {
sched round-robin;
server 10.0.10.1:8009;
}
sched round-robin;
server 10.0.10.1:8010;
cache 0;

sched_http_rules {
  match uri_p    uri       prefix  "/static";
  match uri_s    uri       suffix  ".php";
  match host_p   host      prefix  "static.";
  match host_s   host      suffix  "tempesta-tech.com";
  match host_e   host      eq      "foo.example.com";
  match hdr_c_e  hdr_conn  eq      "keep-alive";
  match hdr_h_p  hdr_host  prefix  "bar.";
  match hdr_h_s  hdr_host  suffix  "natsys-lab.com";
  match hdr_h_e  hdr_host  eq      "bar.natsys-lab.com";
  match hdr_r_p  hdr_raw   prefix  "X-Custom-Bar-Hdr: ";
}

Error message:

[ 1231.495883] [tdb] Start Tempesta DB
[ 1231.500296] [tempesta] Initializing Tempesta FW kernel module...
[ 1231.505233] [tempesta] Registering new scheduler: hash
[ 1231.507382] [tempesta] Registering new scheduler: http
[ 1231.509813] [tempesta] Registering new scheduler: round-robin
[ 1231.513607] [tempesta] Starting all modules...
[ 1231.514275] [tempesta] ERROR: duplicate entry: 'sched', only one such entry is allowed.
[ 1231.515532] [tempesta] ERROR: configuration handler returned error: -22
[ 1231.516542] [tempesta] ERROR: configuration parsing error: str:1;w:(null)
[ 1231.517518] [tempesta] ERROR: can't parse configuration data
[ 1231.518264] [tempesta] ERROR: failed to start modules
[ 1231.744186] [tdb] Shutdown Tempesta DB
@vankoven vankoven added this to the 0.5.0 Web Server milestone Mar 13, 2017
@keshonok keshonok self-assigned this Mar 13, 2017
@vankoven vankoven self-assigned this Mar 14, 2017
@keshonok keshonok changed the title Regression: Crash on create multiple server groups Crash on creating multiple server groups Mar 14, 2017
@keshonok
Copy link
Contributor

The conducted research shows that this is an issue in cfg.c module and not a regression. Therefore the issue has been renamed.

@keshonok
Copy link
Contributor

keshonok commented Mar 14, 2017

cfg.c module maintains call_counter variable that is an internal (not public) member of TfwCfgSpec{} structure used to define configuration details for a Tempesta directive. This variable works as a use counter for a directive. Then the use counter is used determine the following actions:

  • Applying the default configuration for a directive if the counter is zero (the directive was not found in the configuration);
  • Creating a default section if the counter is zero (the section was not found in the configuration file);
  • Calling a cleanup() callback if the counter is NOT zero (the directive was found in the configuration and therefore there may be things to clean up).

The issue however is that this counter is maintained in the exact TfwCfgSpec{} structure that defines the configuration for a directive. That leads to all sorts of unexpected behaviour.

  • Each server group defined by srv_group directive must have a proper scheduler assigned to it. The scheduler can be defined explicitly with sched directive, otherwise it's set implicitly to the default value. The condition for setting the default value is that the directive was not found within a group, i.e. call_counter is zero. The default value is set and processed via the same mechanisms in cfg.c as explicit directive. Therefore, after the default scheduler is set for a group, call_counter for TfwCfgSpec{} that defines sched directive in the code is set to 1. Processing of the next srv_group directive in the configuration file goes the same route. However, this time call_counter equals to 1 already as it's maintained in the same structure. The default value is not set and processed, and the code catches this with a BUG_ON().
  • The bug with a duplicate directive occurs exactly the same way. The configuration for sched directive says that this directive can be seen only once per srv_group section. After it's processed in the first server group in the configuration file, call_counter for the sched directive is set to 1. The next sched directive which is in another srv_group section then causes this error message.

Fixing this issue is not as simple as resetting call_counter after each section is processed. The same call_counter is used on emergency or regular shutdown clean up as the condition for calling the cleanup() callback function. If the counter is not zero, then there may be something that requires a clean up. So the consequence of resetting call_counter after each section was that the cleanup() callback was not called on shutdown.

This appears to be a very old bug in cfg.c.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants