Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warnings on dirty socket destruction during regression.test_stress_pipeline.Pipeline test #984

Closed
vankoven opened this issue Mar 28, 2018 · 4 comments
Assignees

Comments

@vankoven
Copy link
Contributor

vankoven commented Mar 28, 2018

Tempesta is at 7ee5ded linux kernel is at 4.9.35-tfw6 tag (latest release). Simply run regression.test_stress_pipeline.Pipeline functional tests and multiple oopses will happen on wrk shutdown (client disconnects). At least 100 concurrent connections was required to reproduce the issue.

Test log:

% ./run_tests.py -vvv regression.test_stress_pipeline.Pipeline
Not resuming: File tests_resume.json not found

----------------------------------------------------------------------
Running functional tests...
----------------------------------------------------------------------

test_pipelined_requests (regression.test_stress_pipeline.Pipeline) ... 
	Init test case...
	Starting Nginx on 192.168.122.12:8000	Starting Nginx on 192.168.122.12:8002

	Starting Nginx on 192.168.122.12:8004	Starting Nginx on 192.168.122.12:8006

	Starting Nginx on 192.168.122.12:8003
	Starting Nginx on 192.168.122.12:8001
	Starting Nginx on 192.168.122.12:8007
	Starting Nginx on 192.168.122.12:8005
	Starting Nginx on 192.168.122.12:8008
	Starting Nginx on 192.168.122.12:8010
	Starting Nginx on 192.168.122.12:8012
	Starting Nginx on 192.168.122.12:8014
	Starting Nginx on 192.168.122.12:8009
	Starting Nginx on 192.168.122.12:8011
	Starting Nginx on 192.168.122.12:8013	Starting Nginx on 192.168.122.12:8015

	Starting Nginx on 192.168.122.12:8016
	Starting Nginx on 192.168.122.12:8018
	Starting Nginx on 192.168.122.12:8017
	Starting Nginx on 192.168.122.12:8020
	Starting Nginx on 192.168.122.12:8022
	Starting Nginx on 192.168.122.12:8019
	Starting Nginx on 192.168.122.12:8024
	Starting Nginx on 192.168.122.12:8023
	Starting Nginx on 192.168.122.12:8021
	Starting Nginx on 192.168.122.12:8025
	Starting Nginx on 192.168.122.12:8026
	Starting Nginx on 192.168.122.12:8028
	Starting Nginx on 192.168.122.12:8030
	Starting Nginx on 192.168.122.12:8027
	Starting Nginx on 192.168.122.12:8029
	Starting Nginx on 192.168.122.12:8031
	Starting TempestaFW on 192.168.122.12
	Running 1 HTTP clients on localhost
Running 10s test @ http://192.168.122.12/
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    25.43ms   19.63ms 178.98ms   80.18%
    Req/Sec     6.03k   562.57     7.72k    69.25%
  241173 requests in 10.04s, 4.15GB read
Requests/sec:  24022.62
Transfer/sec:    423.26MB
---- RESULTS --------
Status 200 : 241173 times
---- END ------------
 
	Client: errors: 0, requests: 241173, rate: 24022
	Clients in total: errors: 0, requests: 241173, rate: 24022
	Stoping TempestaFW on 192.168.122.12
	Stoping Nginx on 192.168.122.12:8000
	Stoping Nginx on 192.168.122.12:8002
	Stoping Nginx on 192.168.122.12:8004
	Stoping Nginx on 192.168.122.12:8006
	Removing Nginx config for 192.168.122.12:8000
	Removing Nginx config for 192.168.122.12:8002
	Removing Nginx config for 192.168.122.12:8006
	Removing Nginx config for 192.168.122.12:8004
	Stoping Nginx on 192.168.122.12:8005
	Stoping Nginx on 192.168.122.12:8001
	Stoping Nginx on 192.168.122.12:8007
	Stoping Nginx on 192.168.122.12:8003
	Removing Nginx config for 192.168.122.12:8001
	Removing Nginx config for 192.168.122.12:8005
	Removing Nginx config for 192.168.122.12:8003
	Removing Nginx config for 192.168.122.12:8007
	Stoping Nginx on 192.168.122.12:8008
	Stoping Nginx on 192.168.122.12:8010
	Stoping Nginx on 192.168.122.12:8012
	Stoping Nginx on 192.168.122.12:8014
	Removing Nginx config for 192.168.122.12:8014
	Removing Nginx config for 192.168.122.12:8012
	Removing Nginx config for 192.168.122.12:8010
	Removing Nginx config for 192.168.122.12:8008
	Stoping Nginx on 192.168.122.12:8011
	Stoping Nginx on 192.168.122.12:8015
	Stoping Nginx on 192.168.122.12:8013
	Stoping Nginx on 192.168.122.12:8009
	Removing Nginx config for 192.168.122.12:8015
	Removing Nginx config for 192.168.122.12:8011
	Removing Nginx config for 192.168.122.12:8013
	Stoping Nginx on 192.168.122.12:8016
	Removing Nginx config for 192.168.122.12:8009
	Removing Nginx config for 192.168.122.12:8016
	Stoping Nginx on 192.168.122.12:8018
	Stoping Nginx on 192.168.122.12:8020
	Stoping Nginx on 192.168.122.12:8017
	Stoping Nginx on 192.168.122.12:8022
	Removing Nginx config for 192.168.122.12:8018
	Removing Nginx config for 192.168.122.12:8020
	Removing Nginx config for 192.168.122.12:8017
	Removing Nginx config for 192.168.122.12:8022
	Stoping Nginx on 192.168.122.12:8019
	Stoping Nginx on 192.168.122.12:8021
	Stoping Nginx on 192.168.122.12:8024
	Stoping Nginx on 192.168.122.12:8023
	Removing Nginx config for 192.168.122.12:8019
	Removing Nginx config for 192.168.122.12:8021
	Removing Nginx config for 192.168.122.12:8024
	Removing Nginx config for 192.168.122.12:8023
	Stoping Nginx on 192.168.122.12:8026
	Stoping Nginx on 192.168.122.12:8028
	Stoping Nginx on 192.168.122.12:8025
	Stoping Nginx on 192.168.122.12:8030
	Removing Nginx config for 192.168.122.12:8026
	Removing Nginx config for 192.168.122.12:8028
	Removing Nginx config for 192.168.122.12:8025
	Removing Nginx config for 192.168.122.12:8030
	Stoping Nginx on 192.168.122.12:8027
	Stoping Nginx on 192.168.122.12:8029
	Stoping Nginx on 192.168.122.12:8031
	Removing Nginx config for 192.168.122.12:8027
	Removing Nginx config for 192.168.122.12:8029
	Removing Nginx config for 192.168.122.12:8031
ok

----------------------------------------------------------------------
Ran 1 test in 22.256s

OK

Kernel log (a part of it since it was flooded and overflowed):

[154655.676795] ------------[ cut here ]------------
[154655.676797] WARNING: CPU: 0 PID: 3 at /home/user/kernel/linux-packages/net/ipv4/af_inet.c:155 inet_sock_destruct+0x1f1/0x210
[154655.676814] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tcp_diag inet_diag xt_tcpudp iptable_filter crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support snd_hda_codec_generic ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_timer sg snd soundcore virtio_balloon virtio_console binfmt_misc qxl ttm evdev drm_kms_helper serio_raw pcspkr lpc_ich mfd_core drm shpchp parport_pc parport button ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache sr_mod cdrom crc32c_intel virtio_blk aesni_intel virtio_net aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci psmouse sym53c8xx libata scsi_transport_spi i2c_i801 i2c_smbus ehci_pci uhci_hcd ehci_hcd
[154655.676816]  usbcore usb_common scsi_mod virtio_pci virtio_ring virtio [last unloaded: tempesta_tls]
[154655.676817] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W  O    4.9.0-tempesta-amd64 #1 Debian 4.9.35-tfw6-1
[154655.676817] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[154655.676819]  0000000000000000 ffffffffb392f194 0000000000000000 0000000000000000
[154655.676820]  ffffffffb367adee ffff9da78b9c6900 ffff9da78b9c6a88 00000000000001cb
[154655.676821]  ffff9da7bfc19200 000000000001a780 ffffffffc0c785f8 ffffffffb3b84ab1
[154655.676822] Call Trace:
[154655.676824]  [<ffffffffb392f194>] ? dump_stack+0x5c/0x78
[154655.676825]  [<ffffffffb367adee>] ? __warn+0xbe/0xe0
[154655.676827]  [<ffffffffb3b84ab1>] ? inet_sock_destruct+0x1f1/0x210
[154655.676829]  [<ffffffffb3aee561>] ? __sk_destruct+0x21/0x190
[154655.676837]  [<ffffffffc083110c>] ? ss_tx_action+0x1dc/0x4b0 [tempesta_fw]
[154655.676839]  [<ffffffffb3b08d08>] ? process_backlog+0x88/0x130
[154655.676841]  [<ffffffffb3b0493b>] ? net_tx_action+0x9b/0x1d0
[154655.676843]  [<ffffffffb3c10dda>] ? __do_softirq+0x10a/0x29a
[154655.676845]  [<ffffffffb369dcd0>] ? sort_range+0x20/0x20
[154655.676846]  [<ffffffffb3680e15>] ? run_ksoftirqd+0x25/0x40
[154655.676848]  [<ffffffffb369ddce>] ? smpboot_thread_fn+0xfe/0x150
[154655.676850]  [<ffffffffb369a577>] ? kthread+0xd7/0xf0
[154655.676851]  [<ffffffffb369a4a0>] ? kthread_park+0x60/0x60
[154655.676853]  [<ffffffffb3c0e535>] ? ret_from_fork+0x25/0x30
[154655.676854] ---[ end trace 58b01fd3af7038e2 ]---
[154655.676856] ------------[ cut here ]------------
[154655.676858] WARNING: CPU: 0 PID: 3 at /home/user/kernel/linux-packages/net/ipv4/af_inet.c:154 inet_sock_destruct+0x1d2/0x210
[154655.676883] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tcp_diag inet_diag xt_tcpudp iptable_filter crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support snd_hda_codec_generic ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_timer sg snd soundcore virtio_balloon virtio_console binfmt_misc qxl ttm evdev drm_kms_helper serio_raw pcspkr lpc_ich mfd_core drm shpchp parport_pc parport button ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache sr_mod cdrom crc32c_intel virtio_blk aesni_intel virtio_net aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci psmouse sym53c8xx libata scsi_transport_spi i2c_i801 i2c_smbus ehci_pci uhci_hcd ehci_hcd
[154655.676886]  usbcore usb_common scsi_mod virtio_pci virtio_ring virtio [last unloaded: tempesta_tls]
[154655.676888] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W  O    4.9.0-tempesta-amd64 #1 Debian 4.9.35-tfw6-1
[154655.676888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[154655.676890]  0000000000000000 ffffffffb392f194 0000000000000000 0000000000000000
[154655.676892]  ffffffffb367adee ffff9da78b9c3480 ffff9da78b9c3608 00000000000001cb
[154655.676894]  ffff9da7bfc19200 000000000001a780 ffffffffc0c785f8 ffffffffb3b84a92
[154655.676895] Call Trace:
[154655.676897]  [<ffffffffb392f194>] ? dump_stack+0x5c/0x78
[154655.676898]  [<ffffffffb367adee>] ? __warn+0xbe/0xe0
[154655.676901]  [<ffffffffb3b84a92>] ? inet_sock_destruct+0x1d2/0x210
[154655.676902]  [<ffffffffb3aee561>] ? __sk_destruct+0x21/0x190
[154655.676911]  [<ffffffffc083110c>] ? ss_tx_action+0x1dc/0x4b0 [tempesta_fw]
[154655.676913]  [<ffffffffb3b08d08>] ? process_backlog+0x88/0x130
[154655.676914]  [<ffffffffb3b0493b>] ? net_tx_action+0x9b/0x1d0
[154655.676916]  [<ffffffffb3c10dda>] ? __do_softirq+0x10a/0x29a
[154655.676917]  [<ffffffffb369dcd0>] ? sort_range+0x20/0x20
[154655.676918]  [<ffffffffb3680e15>] ? run_ksoftirqd+0x25/0x40
[154655.676919]  [<ffffffffb369ddce>] ? smpboot_thread_fn+0xfe/0x150
[154655.676921]  [<ffffffffb369a577>] ? kthread+0xd7/0xf0
[154655.676923]  [<ffffffffb369a4a0>] ? kthread_park+0x60/0x60
[154655.676924]  [<ffffffffb3c0e535>] ? ret_from_fork+0x25/0x30
[154655.676924] ---[ end trace 58b01fd3af7038e3 ]---
[154655.068546] ------------[ cut here ]------------
[154655.070104] WARNING: CPU: 0 PID: 9268 at /home/user/kernel/linux-packages/net/core/stream.c:203 sk_stream_kill_queues+0x11f/0x160
[154655.073075] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tcp_diag inet_diag xt_tcpudp iptable_filter crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support snd_hda_codec_generic ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_timer sg snd soundcore virtio_balloon virtio_console binfmt_misc qxl ttm evdev drm_kms_helper serio_raw pcspkr lpc_ich mfd_core drm shpchp parport_pc parport button ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache sr_mod cdrom crc32c_intel virtio_blk aesni_intel virtio_net aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci psmouse sym53c8xx libata scsi_transport_spi i2c_i801 i2c_smbus ehci_pci uhci_hcd ehci_hcd
[154655.092574]  usbcore usb_common scsi_mod virtio_pci virtio_ring virtio [last unloaded: tempesta_tls]
[154655.095211] CPU: 0 PID: 9268 Comm: systemd-journal Tainted: G        W  O    4.9.0-tempesta-amd64 #1 Debian 4.9.35-tfw6-1
[154655.098253] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[154655.100952]  0000000000000000 ffffffffb392f194 0000000000000000 0000000000000000
[154655.103616]  ffffffffb367adee ffff9da79bcd6180 ffff9da79bcd6308 ffff9da798e47100
[154655.106208]  ffff9da79a717a62 0000000000000001 ffff9da79bcd6180 ffffffffb3afa72f
[154655.108750] Call Trace:
[154655.109289]  <IRQ> 
[154655.109913]  [<ffffffffb392f194>] ? dump_stack+0x5c/0x78
[154655.111118]  [<ffffffffb367adee>] ? __warn+0xbe/0xe0
[154655.112557]  [<ffffffffb3afa72f>] ? sk_stream_kill_queues+0x11f/0x160
[154655.113878]  [<ffffffffb3b5404f>] ? inet_csk_destroy_sock+0x4f/0x160
[154655.115816]  [<ffffffffc08309ee>] ? ss_linkerror+0xe/0x60 [tempesta_fw]
[154655.117370]  [<ffffffffb3b61ade>] ? tcp_validate_incoming+0x27e/0x370
[154655.118895]  [<ffffffffb3b62bab>] ? tcp_rcv_established+0x19b/0x6c0
[154655.120802]  [<ffffffffb3b20d4d>] ? sk_filter_trim_cap+0x2d/0x2c0
[154655.122821]  [<ffffffffb3b6d793>] ? tcp_v4_do_rcv+0x133/0x1f0
[154655.124723]  [<ffffffffb3b6ef37>] ? tcp_v4_rcv+0x8a7/0x9b0
[154655.126145]  [<ffffffffb3b484e9>] ? ip_local_deliver_finish+0x99/0x1c0
[154655.127948]  [<ffffffffb3b487ab>] ? ip_local_deliver+0x6b/0xf0
[154655.129392]  [<ffffffffb3b6e602>] ? tcp_v4_early_demux+0xb2/0x140
[154655.130242]  [<ffffffffb3b48450>] ? ip_rcv_finish+0x3f0/0x3f0
[154655.131337]  [<ffffffffb3b48ab6>] ? ip_rcv+0x286/0x3d0
[154655.132992]  [<ffffffffb3b48060>] ? inet_del_offload+0x40/0x40
[154655.134862]  [<ffffffffb3b07669>] ? __netif_receive_skb_core+0x4f9/0xa00
[154655.137026]  [<ffffffffb36ac400>] ? update_cfs_rq_load_avg+0x1a0/0x490
[154655.139044]  [<ffffffffc083c3ed>] ? tfw_wq_pop_ticket+0x7d/0xc90 [tempesta_fw]
[154655.140900]  [<ffffffffb3b08d08>] ? process_backlog+0x88/0x130
[154655.142180]  [<ffffffffb3b08420>] ? net_rx_action+0x240/0x370
[154655.143367]  [<ffffffffb3c10dda>] ? __do_softirq+0x10a/0x29a
[154655.144276]  [<ffffffffb3680f4e>] ? irq_exit+0xae/0xb0
[154655.145727]  [<ffffffffb3c104a2>] ? call_function_single_interrupt+0x82/0x90
[154655.148726]  <EOI> 
[154655.149475]  [<ffffffffb382a370>] ? mntput_no_expire+0x170/0x170
[154655.150562]  [<ffffffffb3812989>] ? terminate_walk+0x89/0xf0
[154655.151603]  [<ffffffffb3816162>] ? path_lookupat+0x112/0x120
[154655.152583]  [<ffffffffb3818a21>] ? filename_lookup+0xb1/0x180
[154655.153744]  [<ffffffffb380490a>] ? __check_object_size+0xfa/0x1e0
[154655.155617]  [<ffffffffb395d558>] ? strncpy_from_user+0x48/0x160
[154655.157126]  [<ffffffffb381865a>] ? getname_flags+0x6a/0x1e0
[154655.158719]  [<ffffffffb380643d>] ? SyS_access+0xad/0x220
[154655.160294]  [<ffffffffb3603b1c>] ? do_syscall_64+0x7c/0xf0
[154655.162050]  [<ffffffffb3c0e36f>] ? entry_SYSCALL64_slow_path+0x25/0x25
[154655.163296] ---[ end trace 58b01fd3af7038d7 ]---
[154655.164605] ------------[ cut here ]------------
[154655.165930] WARNING: CPU: 0 PID: 9268 at /home/user/kernel/linux-packages/net/core/stream.c:204 inet_csk_destroy_sock+0x4f/0x160
[154655.169386] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tcp_diag inet_diag xt_tcpudp iptable_filter crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support snd_hda_codec_generic ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_timer sg snd soundcore virtio_balloon virtio_console binfmt_misc qxl ttm evdev drm_kms_helper serio_raw pcspkr lpc_ich mfd_core drm shpchp parport_pc parport button ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache sr_mod cdrom crc32c_intel virtio_blk aesni_intel virtio_net aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci psmouse sym53c8xx libata scsi_transport_spi i2c_i801 i2c_smbus ehci_pci uhci_hcd ehci_hcd
[154655.191281]  usbcore usb_common scsi_mod virtio_pci virtio_ring virtio [last unloaded: tempesta_tls]
[154655.193902] CPU: 0 PID: 9268 Comm: systemd-journal Tainted: G        W  O    4.9.0-tempesta-amd64 #1 Debian 4.9.35-tfw6-1
[154655.196550] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[154655.199111]  0000000000000000 ffffffffb392f194 0000000000000000 0000000000000000
[154655.201252]  ffffffffb367adee ffff9da79bcd6180 0000000000000001 ffff9da798e47100
[154655.203454]  ffff9da79a717a62 0000000000000001 ffff9da79bcd6180 ffffffffb3b5404f
[154655.205529] Call Trace:
[154655.205989]  <IRQ> 
[154655.206474]  [<ffffffffb392f194>] ? dump_stack+0x5c/0x78
[154655.207830]  [<ffffffffb367adee>] ? __warn+0xbe/0xe0
[154655.208927]  [<ffffffffb3b5404f>] ? inet_csk_destroy_sock+0x4f/0x160
[154655.210258]  [<ffffffffc08309ee>] ? ss_linkerror+0xe/0x60 [tempesta_fw]
[154655.212398]  [<ffffffffb3b61ade>] ? tcp_validate_incoming+0x27e/0x370
[154655.214254]  [<ffffffffb3b62bab>] ? tcp_rcv_established+0x19b/0x6c0
[154655.216208]  [<ffffffffb3b20d4d>] ? sk_filter_trim_cap+0x2d/0x2c0
[154655.217747]  [<ffffffffb3b6d793>] ? tcp_v4_do_rcv+0x133/0x1f0
[154655.219076]  [<ffffffffb3b6ef37>] ? tcp_v4_rcv+0x8a7/0x9b0
[154655.220569]  [<ffffffffb3b484e9>] ? ip_local_deliver_finish+0x99/0x1c0
[154655.222480]  [<ffffffffb3b487ab>] ? ip_local_deliver+0x6b/0xf0
[154655.224354]  [<ffffffffb3b6e602>] ? tcp_v4_early_demux+0xb2/0x140
[154655.226479]  [<ffffffffb3b48450>] ? ip_rcv_finish+0x3f0/0x3f0
[154655.228320]  [<ffffffffb3b48ab6>] ? ip_rcv+0x286/0x3d0
[154655.229684]  [<ffffffffb3b48060>] ? inet_del_offload+0x40/0x40
[154655.231281]  [<ffffffffb3b07669>] ? __netif_receive_skb_core+0x4f9/0xa00
[154655.232497]  [<ffffffffb36ac400>] ? update_cfs_rq_load_avg+0x1a0/0x490
[154655.233843]  [<ffffffffc083c3ed>] ? tfw_wq_pop_ticket+0x7d/0xc90 [tempesta_fw]
[154655.235407]  [<ffffffffb3b08d08>] ? process_backlog+0x88/0x130
[154655.237355]  [<ffffffffb3b08420>] ? net_rx_action+0x240/0x370
[154655.239297]  [<ffffffffb3c10dda>] ? __do_softirq+0x10a/0x29a
[154655.241292]  [<ffffffffb3680f4e>] ? irq_exit+0xae/0xb0
[154655.242962]  [<ffffffffb3c104a2>] ? call_function_single_interrupt+0x82/0x90
[154655.245299]  <EOI> 
[154655.245844]  [<ffffffffb382a370>] ? mntput_no_expire+0x170/0x170
[154655.247903]  [<ffffffffb3812989>] ? terminate_walk+0x89/0xf0
[154655.249826]  [<ffffffffb3816162>] ? path_lookupat+0x112/0x120
[154655.251450]  [<ffffffffb3818a21>] ? filename_lookup+0xb1/0x180
[154655.253438]  [<ffffffffb380490a>] ? __check_object_size+0xfa/0x1e0
[154655.255357]  [<ffffffffb395d558>] ? strncpy_from_user+0x48/0x160
[154655.257411]  [<ffffffffb381865a>] ? getname_flags+0x6a/0x1e0
[154655.259077]  [<ffffffffb380643d>] ? SyS_access+0xad/0x220
[154655.260919]  [<ffffffffb3603b1c>] ? do_syscall_64+0x7c/0xf0
[154655.262764]  [<ffffffffb3c0e36f>] ? entry_SYSCALL64_slow_path+0x25/0x25
[154655.265101] ---[ end trace 58b01fd3af7038d8 ]---

Seems that same Oopses was seen in #692

@krizhanovsky
Copy link
Contributor

Probably the same problem with non-updating socket memory after adjusting skb size as in https://github.com/tempesta-tech/linux-4.9.35-tfw/commit/60883ec7aeec1000af98b69e62ec173ff5c1b988

@krizhanovsky
Copy link
Contributor

Just hit the bug on Tempesta FW proxying 5000-byte index.html sericed by Nginx running on the same VM with Tempesta FW:

# cat ~/tempesta/etc/tempesta_fw.conf
listen 192.168.100.4:80;
server 127.0.0.1:9090;
cache 0;
# ulimit -n 65536; ./wrk -c 4096 -t 8 -d 30 http://192.168.100.4:80/
[11329969575.738787] [tdb] Start Tempesta DB
[11329969575.759262] [tempesta] Initializing Tempesta FW kernel module...
[11329969575.768698] [tempesta] Registering new scheduler: hash 
[11329969575.776608] [tempesta] Registering new scheduler: http
[11329969575.807501] [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff8d0cba800000
[11329969589.422002] TCP: request_sock_TCP: Possible SYN flooding on port 80. Sending cookies.  Check SNMP counters.
[11329969616.403364] ------------[ cut here ]------------
[11329969616.407433] WARNING: CPU: 3 PID: 28 at net/core/stream.c:205 sk_stream_kill_queues+0x106/0x120
[11329969616.411813] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tcp_diag inet_diag kmemcpy(O) binfmt_misc crct10dif_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 bochs_drm ttm drm_kms_helper drm fb_sys_fops syscopyarea crypto_simd glue_helper cryptd ppdev parport_pc sysfillrect sysimgblt parport serio_raw sg button pcspkr ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod ata_generic ata_piix libata i2c_piix4 psmouse scsi_mod e1000 [last unloaded: tempesta_tls]
[11329969616.430415] CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G           O    4.14.32-kdump+ #1
[11329969616.433921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[11329969616.437326] task: ffff8d0c9acfb1c0 task.stack: ffffafce00418000 
[11329969616.441104] RIP: 0010:sk_stream_kill_queues+0x106/0x120
[11329969616.444787] RSP: 0018:ffffafce0041bb08 EFLAGS: 00010286
[11329969616.447491] RAX: 00000000ffffff00 RBX: ffff8d0c9a6f7000 RCX: 0000000000000020
[11329969616.449393] RDX: ffffffffffffffe0 RSI: 0000000000000100 RDI: ffff8d0c9a6f70b0
[11329969616.451050] RBP: ffff8d0c9a6f70b0 R08: 0000000000000000 R09: 0000000000000000
[11329969616.452854] R10: 0000000000000001 R11: 0000000000015bc6 R12: ffff8d0c9a6f7158
[11329969616.454647] R13: ffff8d0c9749fc62 R14: ffff8d0c9749fc62 R15: ffff8d0c9a6f7000
[11329969616.456506] FS:  0000000000000000(0000) GS:ffff8d0cbfd80000(0000) knlGS:0000000000000000
[11329969616.458783] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11329969616.462618] CR2: 00005575e4992628 CR3: 000000007c80a003 CR4: 00000000003606e0
[11329969616.464943] Call Trace:
[11329969616.466191]  inet_csk_destroy_sock+0x50/0x100
[11329969616.467667]  ss_linkerror+0x9/0x60 [tempesta_fw] 
[11329969616.469242]  tcp_reset+0x129/0x170
[11329969616.470692]  tcp_validate_incoming+0x124/0x3f0
[11329969616.473524]  tcp_rcv_established+0x1ca/0x570
[11329969616.475004]  ? tcp_v4_inbound_md5_hash+0x5d/0x1b0
[11329969616.476537]  tcp_v4_do_rcv+0x124/0x1c0 
[11329969616.477840]  tcp_v4_rcv+0x93f/0xa40
[11329969616.479207]  ip_local_deliver_finish+0x95/0x1c0
[11329969616.480775]  ip_local_deliver+0x66/0xe0
[11329969616.482553]  ? tcp_v4_early_demux+0x10d/0x140
[11329969616.484080]  ? ip_rcv_finish+0x175/0x400 
[11329969616.485643]  ip_rcv+0x284/0x3b0
[11329969616.486950]  ? inet_del_offload+0x30/0x30
[11329969616.488636]  __netif_receive_skb_core+0x84a/0xb30
[11329969616.492160]  ? process_backlog+0x92/0x120
[11329969616.494935]  process_backlog+0x92/0x120
[11329969616.496660]  net_rx_action+0x261/0x3a0
[11329969616.498065]  __do_softirq+0x104/0x297

Looks very close to #926, it seems socket buffers weren't updated in all the places correctly.

@krizhanovsky krizhanovsky self-assigned this Apr 18, 2018
@krizhanovsky
Copy link
Contributor

Just got plenty of net/ipv4/af_inet.c:155 inet_sock_destruct warnings on tests from #635 (comment) on hardware testbed, so the issues is kind of crucial.

krizhanovsky added a commit that referenced this issue Nov 29, 2018
* Encrypt hash for server finished (missed functionality).
* Multiple fixes in handling scatter lists;
* Multiple fixes for IV handling in encryption and decryption code.
* Fix TLS record header and tag allocation in skb (linked with #391.11).
* Many cleanups and nicer debug and errors reporting.

Kernel:
* Fix TLS skb type handling to call sk_write_xmit() callback.
* Reserve room for TLS header in skb headroom.
* Reset TCP connection if we can not encrypt data on it instead of retransmit
  it in plaintext. This leads to warning similar to #984 - leave as TODO for now.
krizhanovsky added a commit that referenced this issue Dec 25, 2018
* Encrypt hash for server finished (missed functionality).
* Multiple fixes in handling scatter lists;
* Multiple fixes for IV handling in encryption and decryption code.
* Fix TLS record header and tag allocation in skb (linked with #391.11).
* Many cleanups and nicer debug and errors reporting.

Kernel:
* Fix TLS skb type handling to call sk_write_xmit() callback.
* Reserve room for TLS header in skb headroom.
* Reset TCP connection if we can not encrypt data on it instead of retransmit
  it in plaintext. This leads to warning similar to #984 - leave as TODO for now.
@krizhanovsky krizhanovsky changed the title Oopses on dirty socket destruction during regression.test_stress_pipeline.Pipeline test Warnings on dirty socket destruction during regression.test_stress_pipeline.Pipeline test Jan 5, 2019
@krizhanovsky
Copy link
Contributor

krizhanovsky commented Jan 8, 2019

The assertions fail on client sockets on Tempesta FW's side and seem caused by wrong data writings to the client sockets.

Able to reproduce the issue just with

wrk -d 10 -c 2 -t 1 http://192.168.100.4/index.html

, where index.html is ~19KB file. Some concurrency is required - I couldn't reproduce the issue for only one connection. Also I couldn't hit the problem for very small (~20B) files. HTTP headers adjustment seems doesn't affect the issue anyhow.

Only one backend connection is enough:

# cat etc/tempesta_fw.conf 
listen 192.168.100.4:80;

srv_group default {
	server 127.0.0.1:8080 conns_n=1;
}
vhost default {
	proxy_pass default;
}

cache 0;

http_chain {
	-> default;
}

However, bigger number of backend connections doesn't affect the issue. There are no connection resets from the Apache backend.

One more concurrency scenario to reproduce the issue is putting single connection to a backend directly and one more through Tempesta:

$ wrk -d 600 -c 1 -t 1 http://192.168.100.4:8080/index.html
$ wrk -d 5 -c 1 -t 1 http://192.168.100.4/index.html

Again, both the connections must request the big file, if only one of them requests the big file while the other one requests a small file, then the warnings don't appear.

krizhanovsky added a commit that referenced this issue Jan 20, 2019
1. accurately fix skb->truesize and TCP write memory in kernel
   by tcp_skb_unclone();

2. __split_pgfrag_del() if we just move pointers, then we do not free
   TCP write memory, so do not change skb->truesize.

3. ss_skb_unroll(): truesize and data_len/len are completely different counters,
   so do not mix them in ss_skb_adjust_data_len(). By the way, during the tests
   I saw crazy skb overheads - truesize can be larger than len in tens kilobytes.
   The explanation for such overheads is various fragments stoling (e.g. our
   __split_pgfrag_del) and cloning.

4. cleanup: move ss_skb coalescing functions closer to their calls.
@krizhanovsky krizhanovsky mentioned this issue Jan 20, 2019
krizhanovsky added a commit that referenced this issue Jan 26, 2019
1. accurately fix skb->truesize and TCP write memory in kernel
   by tcp_skb_unclone();

2. __split_pgfrag_del() if we just move pointers, then we do not free
   TCP write memory, so do not change skb->truesize.

3. ss_skb_unroll(): truesize and data_len/len are completely different counters,
   so do not mix them in ss_skb_adjust_data_len(). By the way, during the tests
   I saw crazy skb overheads - truesize can be larger than len in tens kilobytes.
   The explanation for such overheads is various fragments stoling (e.g. our
   __split_pgfrag_del) and cloning.

4. cleanup: move ss_skb coalescing functions closer to their calls.
krizhanovsky added a commit that referenced this issue Jan 28, 2019
1. accurately fix skb->truesize and TCP write memory in kernel
   by tcp_skb_unclone();

2. __split_pgfrag_del() if we just move pointers, then we do not free
   TCP write memory, so do not change skb->truesize.

3. ss_skb_unroll(): truesize and data_len/len are completely different counters,
   so do not mix them in ss_skb_adjust_data_len(). By the way, during the tests
   I saw crazy skb overheads - truesize can be larger than len in tens kilobytes.
   The explanation for such overheads is various fragments stoling (e.g. our
   __split_pgfrag_del) and cloning.

4. cleanup: move ss_skb coalescing functions closer to their calls.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants