
TCPListener may have memory exploit by clients that don't send FIN. #1998

Open
mfelsche opened this issue Jul 1, 2017 · 4 comments
Labels
needs investigation This needs to be looked into before its "ready for work"

Comments

@mfelsche
Contributor

mfelsche commented Jul 1, 2017

EDIT by @jemc: This turns out to be due to a bug in wrk/wrk2, but may have security implications for TCP in general, when dealing with a malicious/misbehaving client. Read on to the rest of the comments for more info.

I wanted to reproduce the benchmark discussed on the pony user mailing list: https://pony.groups.io/g/user/topic/pony_did_poorly_in_a_recent/5411536

A slightly modified version of the example HTTPServer is used: it returns only the string "Hello world." and runs with DiscardLog.

A run of wrk with the Connection: close header set results in more than 99% of the requests ending in socket read errors, and the RAM of the httpserver process grows rapidly (a 30 s benchmark run resulted in 2.8 GB VIRT). In contrast, a fresh httpserver process benchmarked without Connection: close stays at 621 MB VIRT during and after the whole run.
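For anyone who wants to reproduce a single Connection: close request outside of wrk, here is a minimal sketch in Python (an illustration only, not part of the original benchmark; it assumes the server from the report is listening on localhost:50000): it sends one request with Connection: close and reads until the server closes the connection.

```python
import socket

def one_request(host="localhost", port=50000):
    # Open a TCP connection to the (assumed) Pony HTTP server and send a
    # single request that asks the server to close the connection afterwards.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(b"GET / HTTP/1.1\r\n"
                  b"Host: localhost\r\n"
                  b"Connection: close\r\n"
                  b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # empty read: the server closed its side (sent FIN)
                break
            chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    print(one_request().decode(errors="replace"))
```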

Example output:

~/dev/wrk/wrk -t 4 -c 10 -d30s --timeout 2000 -H 'Connection: close' http://localhost:50000
Running 30s test @ http://localhost:50000
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.97ms   14.85ms 333.71ms   98.19%
    Req/Sec     1.07k   244.22     1.79k    78.19%
  127616 requests in 30.02s, 6.57MB read
  Socket errors: connect 0, read 127615, write 0, timeout 0
Requests/sec:   4251.25
Transfer/sec:    224.19KB

Example output without Connection: close:

 ~/dev/wrk/wrk -t 4 -c 10 -d30s --timeout 2000 http://localhost:50000 
Running 30s test @ http://localhost:50000
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.96ms    1.50ms  52.50ms   98.83%
    Req/Sec    50.00     10.00    60.00    100.00%
  6000 requests in 30.01s, 457.20KB read
Requests/sec:    199.96
Transfer/sec:     15.24KB

I dug through the net and net/http source code to understand the architecture and possible problems of the TCP/HTTP code in Pony.
I also tried to profile the server, but I have not been very successful.
Most of the time seems to be spent in epoll, which is fine afaik.
But there is also a lot of time spent in socket close syscalls and, most importantly, in GC and alloc activity around the _sessions set where connections are kept in order to close them. I am not 100% sure about this, though.

My best guess was that the _sessions code was the bottleneck, as it keeps references to the TCP connection actors. Those can only be fully GC'd after their reference has been removed from the server actor. But removing the code that stores references to them did not change the results.

I suspect there is a bug in how TCP connection close is handled, but I was not able to find the root cause.
That is why I am filing this issue here.

Operating system: Ubuntu 16.04 on Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
Pony: 0.14.0-0d920b8ff [debug]
compiled with: llvm 3.9.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

Further observations

  • Setting a connection limit made no difference.
  • Using CommonLog instead of DiscardLog also didn't change the results.
  • Only 2 threads were busy during the benchmark run.
  • Removing the _sessions-related code (not storing actor references in a set, so they can be GC'd before the server actor gets rid of its reference) did not change the results.
  • Increasing or decreasing the number of open connections kept on the wrk side did not change the results.
  • The ulimit for max open files is 128000.
@mfelsche
Contributor Author

mfelsche commented Jul 1, 2017

To make it clear: when I talk about the problematic results here, I mean that nearly all connections (requests) were closed with socket read errors on the client side. This issue is not actually concerned with performance.

@mfelsche
Contributor Author

mfelsche commented Jul 2, 2017

When testing with wrk2 from another machine, the number of successful requests reported by the tool equals the number of connections configured with the limit parameter to HTTPServer. All further requests seem to run into a socket timeout.

Looking at the Wireshark packet capture data, I can see that:

  • The FIN from the server is sent with the HTTP response; the client (wrk2) ACKs it but does not send its own FIN until the end of the wrk2 process (when the OS closes all fds and sends FINs on its behalf). A sketch of this client pattern follows below.
  • In all sessions, another request was sent on that port after roughly 2 seconds.

A Wireshark screenshot of an example session is attached:

[screenshot: Wireshark capture of an example session]
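To make the misbehaving-client scenario concrete, here is a minimal sketch in Python (an illustration of the pattern seen in the capture, not wrk2's actual code; the host, port, and connection count are assumptions): the client drains the response, the kernel ACKs the server's FIN, but the client never calls close(), so no FIN goes back and the server keeps its per-connection state alive.

```python
import socket
import time

def open_and_abandon(n, host="localhost", port=50000):
    socks = []
    for _ in range(n):
        s = socket.create_connection((host, port))
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        while s.recv(4096):   # drain the response; the server sends its FIN after it
            pass
        socks.append(s)       # keep the fd open: no close(), so no FIN goes back
    time.sleep(3600)          # hold the half-closed connections open on purpose

if __name__ == "__main__":
    open_and_abandon(100)
```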

It also seems it could be related to wrk/wrk2 misbehaving.

@mfelsche
Contributor Author

mfelsche commented Jul 2, 2017

aaaaaaaaaand .....

It really is a wrk2 problem. Running the version from giltene/wrk2#33 fixes the memory and socket read error issues.

I actually don't know whether this issue is worth keeping open to safeguard against malicious clients making the httpserver eat up memory. So I will keep it open and leave the decision to you.
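For reference, one possible safeguard would be a bounded grace period for the close handshake, so a connection whose peer never sends its FIN is still torn down and its state reclaimed. Below is a minimal sketch in Python (a language-neutral illustration, not the actual Pony TCPListener/net code); the handler name and the 5-second grace period are assumptions for the example.

```python
import socket

CLOSE_GRACE_SECONDS = 5  # assumed grace period, not a value from the report

def finish_connection(conn: socket.socket, response: bytes) -> None:
    conn.sendall(response)
    conn.shutdown(socket.SHUT_WR)      # send our FIN now
    conn.settimeout(CLOSE_GRACE_SECONDS)
    try:
        while conn.recv(4096):         # wait for the peer's FIN (empty read)
            pass
    except socket.timeout:
        pass                           # peer never sent its FIN; stop waiting
    finally:
        conn.close()                   # always release the fd and its buffers
```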

@jemc
Member

jemc commented Jul 2, 2017

Great work with your investigation, it's much appreciated!

I'm going to change the ticket title and add a "needs discussion" tag to this to discuss whether it is an important security issue to keep open (for TCP in general, not just HTTP), and how it might be mitigated.

@jemc jemc changed the title HTTPServer: socket read errors on 'Connection: close' requests TCPListener may have memory exploit by clients that don't send FIN. Jul 2, 2017
@SeanTAllen SeanTAllen added needs investigation This needs to be looked into before its "ready for work" bug and removed bug: 1 - needs investigation labels May 12, 2020
@SeanTAllen SeanTAllen removed the bug label Jan 22, 2025