
TCPListener may have memory exploit by clients that don't send FIN. #1998

Open
mfelsche opened this issue Jul 1, 2017 · 4 comments
Labels
needs investigation This needs to be looked into before its "ready for work"

Comments

@mfelsche
Contributor

mfelsche commented Jul 1, 2017

EDIT by @jemc: This turns out to be due to a bug in wrk/wrk2, but may have security implications for TCP in general, when dealing with a malicious/misbehaving client. Read on to the rest of the comments for more info.

I wanted to reproduce the benchmark discussed on the pony user mailing list: https://pony.groups.io/g/user/topic/pony_did_poorly_in_a_recent/5411536

A slightly modified version of the example HTTPServer is used: it returns only the string "Hello world." and runs with DiscardLog.

A run of wrk with the Connection: close header set results in more than 99% of the requests ending in socket read errors, and the RAM of the httpserver process grows rapidly (a 30 s benchmark run resulted in 2.8 GB VIRT). In contrast, a fresh httpserver process benchmarked without Connection: close stays at 621 MB VIRT during and after the whole run.
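For anyone who wants to reproduce a single Connection: close request outside of wrk, here is a minimal sketch in Python (an illustration only, not part of the original benchmark; it assumes the server from the report is listening on localhost:50000): it sends one request with Connection: close and reads until the server closes the connection.

```python
import socket

def one_request(host="localhost", port=50000):
    # Open a TCP connection to the (assumed) Pony HTTP server and send a
    # single request that asks the server to close the connection afterwards.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(b"GET / HTTP/1.1\r\n"
                  b"Host: localhost\r\n"
                  b"Connection: close\r\n"
                  b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # empty read: the server closed its side (sent FIN)
                break
            chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    print(one_request().decode(errors="replace"))
```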

Example output:

~/dev/wrk/wrk -t 4 -c 10 -d30s --timeout 2000 -H 'Connection: close' http://localhost:50000
Running 30s test @ http://localhost:50000
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.97ms   14.85ms 333.71ms   98.19%
    Req/Sec     1.07k   244.22     1.79k    78.19%
  127616 requests in 30.02s, 6.57MB read
  Socket errors: connect 0, read 127615, write 0, timeout 0
Requests/sec:   4251.25
Transfer/sec:    224.19KB

Example output without Connection: close:

 ~/dev/wrk/wrk -t 4 -c 10 -d30s --timeout 2000 http://localhost:50000 
Running 30s test @ http://localhost:50000
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.96ms    1.50ms  52.50ms   98.83%
    Req/Sec    50.00     10.00    60.00    100.00%
  6000 requests in 30.01s, 457.20KB read
Requests/sec:    199.96
Transfer/sec:     15.24KB

I dug through the net and net/http source code to understand the architecture and possible problems of the TCP/HTTP code in Pony.
I also tried to profile the server, but I have not been very successful.
Most of the time seems to be spent in epoll, which is fine afaik.
But there is also a lot of time spent in socket close syscalls and, most importantly, in GC and alloc activity around the _sessions set where connections are kept in order to close them. I am not 100% sure about this, though.

My best guess was that the _sessions code was the bottleneck, as it keeps references to the TCP connection actors. Those can only be fully GC'd after their reference has been removed from the server actor. But removing the code that stores references to them did not change the results.

I suspect there is a bug in how TCP connection close is handled, but I was not able to find the root cause.
That is why I am filing this issue here.

Operating system: Ubuntu 16.04 on Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
Pony: 0.14.0-0d920b8ff [debug]
compiled with: llvm 3.9.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

Further observations

  • Setting a connection limit made no difference.
  • Using CommonLog instead of DiscardLog also didn't change the results.
  • Only 2 threads were busy during the benchmark run.
  • Removing the _sessions-related code (not storing actor references in a set, so they can be GC'd before the server actor gets rid of its reference) did not change the results.
  • Increasing or decreasing the number of open connections kept on the wrk side did not change the results.
  • The ulimit for max open files is 128000.
@mfelsche
Contributor Author

mfelsche commented Jul 1, 2017

To make it clear: when I talk about the problematic results here, I mean that nearly all connections (requests) were closed with socket read errors on the client side. This issue is not actually concerned with performance.

@mfelsche
Contributor Author

mfelsche commented Jul 2, 2017

When testing with wrk2 from another machine, the number of successful requests reported by the tool equals the number of connections configured with the limit parameter to HTTPServer. All further requests seem to run into a socket timeout.

Looking at the Wireshark packet capture data, I can see that:

  • The FIN from the server is sent with the HTTP response; the client (wrk2) ACKs it but does not send its own FIN until the end of the wrk2 process (when the OS closes all fds and sends FINs on its behalf). A sketch of this client pattern follows below.
  • In all sessions, another request was sent on that port after roughly 2 seconds.

A Wireshark screenshot of an example session is attached:

[screenshot: Wireshark capture of an example session]
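To make the misbehaving-client scenario concrete, here is a minimal sketch in Python (an illustration of the pattern seen in the capture, not wrk2's actual code; the host, port, and connection count are assumptions): the client drains the response, the kernel ACKs the server's FIN, but the client never calls close(), so no FIN goes back and the server keeps its per-connection state alive.

```python
import socket
import time

def open_and_abandon(n, host="localhost", port=50000):
    socks = []
    for _ in range(n):
        s = socket.create_connection((host, port))
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        while s.recv(4096):   # drain the response; the server sends its FIN after it
            pass
        socks.append(s)       # keep the fd open: no close(), so no FIN goes back
    time.sleep(3600)          # hold the half-closed connections open on purpose

if __name__ == "__main__":
    open_and_abandon(100)
```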

It also seems it could be related to wrk/wrk2 misbehaving.

@mfelsche
Contributor Author

mfelsche commented Jul 2, 2017

aaaaaaaaaand .....

It really is a wrk2 problem. Running the version from giltene/wrk2#33 fixes the memory and socket read error issues.

I actually don't know whether this issue is worth keeping open to safeguard against malicious clients making the httpserver eat up memory. So I will keep it open and leave the decision to you.
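For reference, one possible safeguard would be a bounded grace period for the close handshake, so a connection whose peer never sends its FIN is still torn down and its state reclaimed. Below is a minimal sketch in Python (a language-neutral illustration, not the actual Pony TCPListener/net code); the handler name and the 5-second grace period are assumptions for the example.

```python
import socket

CLOSE_GRACE_SECONDS = 5  # assumed grace period, not a value from the report

def finish_connection(conn: socket.socket, response: bytes) -> None:
    conn.sendall(response)
    conn.shutdown(socket.SHUT_WR)      # send our FIN now
    conn.settimeout(CLOSE_GRACE_SECONDS)
    try:
        while conn.recv(4096):         # wait for the peer's FIN (empty read)
            pass
    except socket.timeout:
        pass                           # peer never sent its FIN; stop waiting
    finally:
        conn.close()                   # always release the fd and its buffers
```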

@jemc
Member

jemc commented Jul 2, 2017

Great work with your investigation, it's much appreciated!

I'm going to change the ticket title and add a "needs discussion" tag to this to discuss whether it is an important security issue to keep open (for TCP in general, not just HTTP), and how it might be mitigated.

@jemc jemc changed the title HTTPServer: socket read errors on 'Connection: close' requests TCPListener may have memory exploit by clients that don't send FIN. Jul 2, 2017
@SeanTAllen SeanTAllen added needs investigation This needs to be looked into before its "ready for work" bug and removed bug: 1 - needs investigation labels May 12, 2020
@SeanTAllen SeanTAllen removed the bug label Jan 22, 2025