This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unhandled error in connectproxyclient.py (503 status response from CONNECT proxy) #15883

Closed
ghost opened this issue Jul 5, 2023 · 11 comments
Labels
  • A-Logging: Synapse's logs (structured or otherwise). Not metrics.
  • O-Occasional: Affects or can be seen by some users regularly or most users rarely.
  • S-Minor: Blocks non-critical functionality, workarounds exist.
  • T-Enhancement: New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@ghost

ghost commented Jul 5, 2023

Description

Hundreds of critical log entries claiming an error wasn't handled, with the traceback passing through:

  • synapse/http/connectproxyclient.py
  • twisted/web/http.py
  • twisted/protocols/basic.py
  • synapse/http/connectproxyclient.py
  • twisted/internet/tcp.py

My instance is being blocked by some other instances, or they are simply unreachable or out of order. This doesn't cause any errors when I'm not using a proxy. I suppose the error should be handled at a lower layer rather than producing a critical log entry with the full call stack. I also observe performance degradation because of these errors.

Steps to reproduce

  • run the instance with the config as described
  • let it federate with other instances a lot
  • just observe the logs

Homeserver

my own homeserver

Synapse Version

{"server":{"name":"Synapse","version":"1.87.0"}}

Installation Method

Debian packages from packages.matrix.org

Database

Single PostgreSQL, ported with portdb, never restored from a backup

Workers

Single process

Platform

Linux 6.1.0-9-amd64 Debian 6.1.27-1 x86_64 GNU/Linux

Configuration

Configured to support Tor connections by adding this to the systemd service definition:

Environment="http_proxy=http://127.0.0.1:8118"
Environment="https_proxy=http://127.0.0.1:8118"
Environment="matrix_proxy=http://127.0.0.1:8118"

Running privoxy on port 8118 with the config including:

listen-address  127.0.0.1:8118
forward-socks5t .onion  127.0.0.1:9050 .

Running tor on port 9050 with the config including:

SOCKSPort 127.0.0.1:9050
DNSPort 127.0.0.2:53

AutomapHostsOnResolve 1

Using Tor for DNS resolution, in /etc/resolv.conf:

nameserver 127.0.0.2

And the homeserver.yaml includes:

use_insecure_ssl_client_just_for_testing_do_not_use: true # doesn't cause the issue
federation_verify_certificates: true
suppress_key_server_warning: true
federation_certificate_verification_whitelist:
  - "*.onion"

Relevant log output

2023-07-05 06:28:55,908 - twisted - 275 - CRITICAL - sentinel - Unhandled Error
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/log.py", line 96, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/log.py", line 80, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
--- <exception caught here> ---
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/posixbase.py", line 487, in _doReadOrWrite
    why = selectable.doRead()
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/tcp.py", line 248, in doRead
    return self._dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/endpoints.py", line 149, in dataReceived
    return self._wrappedProtocol.dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/http/connectproxyclient.py", line 230, in dataReceived
    return self.http_setup_client.dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/protocols/basic.py", line 549, in dataReceived
    why = self.lineReceived(line)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/web/http.py", line 664, in lineReceived
    self.handleStatus(version, status, message)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/http/connectproxyclient.py", line 270, in handleStatus
    raise ProxyConnectError(f"Unexpected status on CONNECT: {status!s}")
synapse.http.connectproxyclient.ProxyConnectError: An error occurred while connecting: Unexpected status on CONNECT: b'503'.

Anything else that would be useful to know?

Using Caddy as a reverse proxy, no delegation.

Further config details and rationale in: #5152 (this is not a tor2tor instance but a regular one plus proxy).

Tried to investigate and patch it myself, but Twisted is not straightforward.

@MadLittleMods added the A-Logging label on Jul 13, 2023
@MadLittleMods
Contributor

To explain plainly what's going on, this just means your Tor proxy is returning 503 responses (a problem with your proxy).

I suppose the error should be handled in a lower layer and not cause a critical log entry with all the call stack. I also observe performance degradation because of these errors.

Not being able to CONNECT to your proxy feels like a critical thing to log about. I'm not quite sure what you're expecting beyond just ignoring the errors. Can you expand on your expectations?

@reivilibre changed the title from "Unhandled error in connectproxyclient.py" to "Unhandled error in connectproxyclient.py (503 status response from CONNECT proxy)" on Jul 18, 2023
@reivilibre
Contributor

Agreed with MadLittleMods here; this is an error coming from your proxy I'm afraid (503 = Service Unavailable).

If you search around for '503 tor privoxy' you can see a few people having similar problems with other applications. I'm afraid I can't really help you out with this; but as a basic sanity check you might want to see whether curl is capable of using your proxy to perform the same request, and then take the debugging from there.
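
For reference, the same CONNECT exchange can also be reproduced outside Synapse with just the Python standard library. A minimal sketch, where the destination host is a placeholder:

# Drive a CONNECT through the local privoxy instance without involving Synapse.
# "example.onion" stands in for a real destination homeserver.
import http.client

conn = http.client.HTTPSConnection("127.0.0.1", 8118, timeout=30)
conn.set_tunnel("example.onion", 8448)  # sends "CONNECT example.onion:8448" to the proxy
try:
    conn.request("GET", "/_matrix/federation/v1/version")
    print("CONNECT succeeded, HTTP status:", conn.getresponse().status)
except OSError as exc:
    # A 503 from the proxy surfaces here as "Tunnel connection failed: 503 ..."
    print("CONNECT failed:", exc)

If the CONNECT itself succeeds, any later TLS or certificate error is a separate problem from the proxy.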

@ghost
Author

ghost commented Jul 25, 2023

It shouldn't matter whether the error is a 503 from privoxy or a 404 from an HTTP server; Synapse just shouldn't raise a CRITICAL Unhandled Error in either case, as it causes massive performance degradation and perhaps security issues as well. At the very least it could have a placeholder that knowingly ignores these errors for .onion homeservers (see #5152), and in the future it could be handled better as part of Tor support.

@MatMaul added the S-Minor, T-Enhancement and O-Occasional labels on Jul 27, 2023
@reivilibre
Contributor

If your forward proxy is not working, this is as bad as e.g. the database not working and I don't see why Synapse should not raise critical errors for that — indeed, this is something that, on a heavy production service, you'd want to be woken up for in order to fix.
It seems like it would be worse not to log any errors as this would mask the underlying problem, if you see what I mean?

I don't currently see how this would lead to security issues — do you have any more information on that?

@ghost
Author

ghost commented Aug 25, 2023

forward proxy is not working

Who said it's not working? I said explicitly:

My instance is being blocked by some other ones, or they are simply unreachable/out of order. This doesn't cause any errors when I'm not using proxy

It's a perfectly normal situation that some network connections fail. This shouldn't cause a critical error inside the library handling the connection; such connections should be retried or closed gracefully.

Unhandled errors are a well-identified anti-pattern of system design. I couldn't tell you what specific attack vector (if any) is possible due to this specific mistake; you should probably ask for a code review if you wish to know more. More on the topic: https://owasp.org/www-community/Improper_Error_Handling

@ghost
Author

ghost commented Aug 25, 2023

A related issue, though I'm not creating a ticket for it today: when Synapse receives an inbound connection from a .onion instance, it tries to perform a TLS handshake with it, which fails if no .onion-resolving DNS is configured (true for most instances, and arguably the default situation). This causes the inbound connection to fail with error 503, which could perhaps be improved by responding with 501 or even 400, just for clarity. Apparently the 503 causes people to think there is something wrong with the proxy and to ignore the fact that the same proxy connects fine to other instances (the ones that can connect to .onion homeservers). Synapse should probably have a default configuration setting that ignores .onion addresses in all situations, and attempt to connect to them only if the homeserver admin intentionally toggles that setting (implying they have enabled a proxy, or put the instance behind a router that can route .onion connections).

@reivilibre
Contributor

Who said it's not working?

Your logs are showing that the proxy is giving a 503 error, which is not how a proxy should operate normally. This is why I say it's not working.

@DMRobertson
Contributor

To summarise, I think there are three points in this report.

  1. If a CONNECT attempt fails, Synapse raises errors which are unhandled.
  2. Unhandled errors are bad.
  3. If a CONNECT attempt fails, Synapse gives up immediately rather than retrying gracefully.

We agree on 1.

We are not particularly convinced by 2. I doubt there is a performance cost---this is Python, and exceptions are being raised internally all the time. I also don't see how there is a security problem. Your OWASP link recommends not failing open and not presenting error messages to users; AFAICS Synapse does neither of these things in this situation.

On 3: fair enough. There may be some better way to drive twisted here. It looks like

def clientConnectionFailed(self, connector: IConnector, reason: Failure) -> None:
    logger.debug("Connection to proxy failed: %s", reason)
    if not self.on_connection.called:
        self.on_connection.errback(reason)
    return self.wrapped_factory.clientConnectionFailed(connector, reason)

would be the natural place to attempt a reCONNECT. (Do you see the debug message in your logs?)
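
For illustration only, a bounded retry in that spot might look roughly like the following. This is a hypothetical sketch, not the actual implementation: the _retries/_max_retries counters don't exist in the current factory, and Twisted's ReconnectingClientFactory implements the same pattern more completely.

from twisted.internet import reactor

def clientConnectionFailed(self, connector: IConnector, reason: Failure) -> None:
    logger.debug("Connection to proxy failed: %s", reason)
    if self._retries < self._max_retries:  # hypothetical counters, not in the real class
        self._retries += 1
        # Back off briefly, then ask the connector to repeat the same connection attempt.
        reactor.callLater(2 ** self._retries, connector.connect)
        return
    if not self.on_connection.called:
        self.on_connection.errback(reason)
    return self.wrapped_factory.clientConnectionFailed(connector, reason)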

Unfortunately the core team is unlikely to find the time to improve this. We would gladly review a PR which reproduces the problem in a test case and finds a better way to handle the CONNECT failure.

@ghost
Author

ghost commented Aug 30, 2023

We are not particularly convinced by 2. I doubt there is a performance cost---this is Python, and exceptions are being raised internally all the time. I also don't see how there is a security problem. Your OWASP link recommends not failing open and not presenting error messages to users; AFAICS Synapse does neither of these things in this situation.

# for f in homeserver.log.2023-08-2* ; do grep "Unexpected status on CONNECT: b'503'" $f | wc -l ; done
61930
34683
90579

A 2-kB error message (as shown above) is occurring about 60,000 times a day in my log on average, yielding roughly 120 MB of critical error logs daily, just because of connection errors. I'd say that simply spamming the logfile is a plausible attack vector against the availability of the instance: a malicious agent could create a network of instances that attract such connections and cause enormous disk space usage. It also forces a reasonable homeserver admin like myself either to disable logging completely (this is a critical log level, after all) or to work with log files that are not informative at all, as they consist mostly of these unwanted entries. And that was just the security side; I don't have to tell you how silly it looks.

would be the natural place to attempt a reCONNECT. (Do you see the debug message in your logs?)

Debug logs follow (from another occurrence of the issue: Synapse 1.91, Python 3.11, and so on; redacted with XXX):

2023-08-30 18:08:06,356 - synapse.http.matrixfederationclient - 656 - DEBUG - PerspectivesKeyFetcher-15 - {POST-O-9745} [matrix.org] Sending request: POST matrix-federation://matrix.org/_matrix/key/v2/query; timeout 60.000000s
2023-08-30 18:08:06,356 - synapse.util.metrics - 163 - DEBUG - PerspectivesKeyFetcher-15 - Entering block outbound_request
2023-08-30 18:08:06,747 - synapse.http.connectproxyclient - 286 - DEBUG - sentinel - Got Status: b'503' b'Connect failed' b'HTTP/1.0'
2023-08-30 18:08:06,747 - twisted - 275 - CRITICAL - sentinel - Unhandled Error
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/python/log.py", line 96, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/python/log.py", line 80, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
--- <exception caught here> ---
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/internet/posixbase.py", line 487, in _doReadOrWrite
    why = selectable.doRead()
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/internet/tcp.py", line 248, in doRead
    return self._dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/internet/endpoints.py", line 149, in dataReceived
    return self._wrappedProtocol.dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/synapse/http/connectproxyclient.py", line 248, in dataReceived
    return self.http_setup_client.dataReceived(data)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/protocols/basic.py", line 549, in dataReceived
    why = self.lineReceived(line)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/twisted/web/http.py", line 664, in lineReceived
    self.handleStatus(version, status, message)
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/synapse/http/connectproxyclient.py", line 288, in handleStatus
    raise ProxyConnectError(f"Unexpected status on CONNECT: {status!s}")
synapse.http.connectproxyclient.ProxyConnectError: An error occurred while connecting: Unexpected status on CONNECT: b'503'.
2023-08-30 18:08:11,020 - synapse.access.http.8008 - 420 - DEBUG - PUT-420 - XXX:XXX:XXX:XXX::XXX - 8008 - Received request: PUT /_matrix/federation/v1/send/1693390193400
2023-08-30 18:08:11,021 - synapse.crypto.keyring - 372 - DEBUG - keyring_server-189 - Starting fetch for [_FetchKeyRequest(server_name='matrix.org', minimum_valid_until_ts=1693433291021, key_ids=['ed25519:a_RXGa'])]
2023-08-30 18:08:11,021 - synapse.crypto.keyring - 390 - DEBUG - keyring_server-189 - Deduplicated key requests to [_FetchKeyRequest(server_name='matrix.org', minimum_valid_until_ts=1693433291021, key_ids=['ed25519:a_RXGa'])]
2023-08-30 18:08:11,021 - synapse.crypto.keyring - 433 - DEBUG - keyring_server-189 - Starting fetch for _FetchKeyRequest(server_name='matrix.org', minimum_valid_until_ts=1693433291021, key_ids=['ed25519:a_RXGa'])
2023-08-30 18:08:11,021 - synapse.crypto.keyring - 442 - DEBUG - keyring_server-189 - Getting keys from <synapse.crypto.keyring.StoreKeyFetcher object at 0x7f459a2ce750> for _FetchKeyRequest(server_name='matrix.org', minimum_valid_until_ts=1693433291021, key_ids=['ed25519:a_RXGa'])
2023-08-30 18:08:11,021 - synapse.federation.transport.server._base - 131 - DEBUG - PUT-420 - Request from matrix.org
2023-08-30 18:08:11,021 - synapse.federation.transport.server.federation - 103 - DEBUG - PUT-420 - Decoded 1693390193400: {'edus': [{'content': {'room_id': '!XXX:XXX.XXX', 'typing': False, 'user_id': '@XXX:matrix.org'}, 'edu_type': 'm.typing'}], 'origin': 'matrix.org', 'origin_server_ts': 1693433290951, 'pdus': []}
2023-08-30 18:08:11,021 - synapse.federation.transport.server.federation - 105 - INFO - PUT-420 - Received txn 1693390193400 from matrix.org. (PDUs: 0, EDUs: 1)
2023-08-30 18:08:11,021 - synapse.federation.federation_server - 307 - DEBUG - PUT-420 - [1693390193400] Got transaction
2023-08-30 18:08:11,021 - synapse.util.caches.response_cache - 242 - DEBUG - PUT-420 - [fed_txn_handler]: no cached result for [('matrix.org', '1693390193400')], calculating new one
2023-08-30 18:08:11,022 - synapse.storage.txn - 738 - DEBUG - PUT-420 - [TXN START] {get_received_txn_response-1209}
2023-08-30 18:08:11,023 - synapse.storage.txn - 842 - DEBUG - PUT-420 - [TXN END] {get_received_txn_response-1209} 0.000637 sec
2023-08-30 18:08:11,023 - synapse.federation.federation_server - 384 - DEBUG - PUT-420 - [1693390193400] Transaction is new

Now, I'm not really an expert on Synapse federation protocols, but from what I understand it does some retrying of its own; that's why I'm getting these messages daily.

While I had debug enabled for a moment, I also observed a lot of:

2023-08-30 18:24:27,183 - synapse.http.matrixfederationclient - 758 - INFO - federation_transaction_transmission_loop-465 - {PUT-O-44100} [XXX] Request failed: PUT matrix-federation://XXX/_matrix/federation/v1/send/1693433016556: ProxyConnectError('')

Perhaps this one should notice the connection error and stop trying to reach instances that can't handle connections coming from Tor. I suppose this also causes a lot of messages on the instances that receive such connections.

The only PR I could provide at this point would be to catch this error intentionally and downgrade it to an INFO message.
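
For what it's worth, a stopgap is also possible purely at the logging layer, without touching Synapse: a filter that drops these particular records. This is only a sketch; the class name and the match string are illustrative, and depending on how the traceback reaches the logging module you may need to match on the formatted message rather than exc_info.

import logging

class DropProxyConnectSpam(logging.Filter):
    """Drop the repeated unhandled-error reports caused by proxy CONNECT failures."""

    NEEDLE = "Unexpected status on CONNECT"

    def filter(self, record: logging.LogRecord) -> bool:
        if self.NEEDLE in record.getMessage():
            return False  # drop the record
        if record.exc_info and self.NEEDLE in str(record.exc_info[1]):
            return False  # drop records whose attached exception matches
        return True  # keep everything else

# The log output above shows these lines coming from the "twisted" logger:
logging.getLogger("twisted").addFilter(DropProxyConnectSpam())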

@reivilibre
Contributor

A 2-kB error message (as shown above) is occurring about 60,000 times a day in my log on average, yielding roughly 120 MB of critical error logs daily, just because of connection errors. I'd say that simply spamming the logfile is a plausible attack vector against the availability of the instance: a malicious agent could create a network of instances that attract such connections and cause enormous disk space usage. It also forces a reasonable homeserver admin like myself either to disable logging completely (this is a critical log level, after all) or to work with log files that are not informative at all, as they consist mostly of these unwanted entries. And that was just the security side; I don't have to tell you how silly it looks.

Connection errors to your proxy are a critical fault though; if something critical like that happens, you'd want to know about it, as in theory it might indicate that federation is not working at all. Of course it's not ideal to have logs fill your disk, but in this instance it seems justifiable, because something is actually wrong.
In production you would ideally want to be using logrotate or journalctl or something else to ensure log files are rotated out and aren't filling your disk anyway.
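
If logrotate is awkward to set up, the Python standard library can enforce a size cap as well; a minimal sketch, with an arbitrary path and arbitrary limits:

import logging
from logging.handlers import RotatingFileHandler

# Keep at most ~500 MB of logs: one active file plus four backups of 100 MB each.
handler = RotatingFileHandler(
    "/var/log/matrix-synapse/homeserver.log",
    maxBytes=100 * 1024 * 1024,
    backupCount=4,
)
handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
logging.getLogger().addHandler(handler)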
(In your case the proxy server is returning 503 when it probably should be returning 502 or 504 instead — we should ideally handle those at least and treat them as though they were direct connection failures. If proxy servers returning 503 is common in this situation despite seeming incorrect, we might want to support handling 503 as a type of quirk.)

@reivilibre
Contributor

I've tried to summarise what I think we should do about this in #16462 to save future people from having to make sense of this thread. That issue supersedes this one.
