[2.6 beta1 w/ dco] server side explicit-exit-notify not working #189
Comments
I cannot reproduce this. Server side:
Client side:
|
Not only this, also this(reneg 15 secs for debug):
Something is wrong with the control channel. |
I'm using dh and tls-crypt, try add those two? data-ciphers is AES-128-GCM |
|
My network is super fast/low latency w/ effectively 0 packet loss. And yes, this happened on default 3600. I even never used reneg-*/hand-window configs before. |
Add dh and tls-crypt to your config to see if you can reproduce; adding the cannot-reproduce label is too early. PS:
So if TLS didn't complete even in one sec I can safely assume it's failed(a test by SIGUSR1):
I'll move on without reneg; it doesn't seem to matter with AES-128. |
@Originalimoc I cannot reproduce this. You need to provide proper instructions to reproduce this. Please also try to reproduce it with a minimal configuration, without all kinds of extra/unusual options, to figure out whether it is related to one of the options that you seem to be using. Try removing tls-crypt and see if you can reproduce it, and so on. |
full both side config: #191 (comment) |
Hi,
On Thu, Dec 08, 2022 at 06:42:03AM -0800, Originalimoc wrote:
add dh and tls-crypt to your config to see if you can reproduce.
You are missing the point.
"Adding arbitrary stuff to the config just to make it stop working"
is not helpful for diagnosing actual breakage - in this case,
`reneg-sec 15` is something that is very likely going to break things,
and we will not(!) fix this. Reasonable values for `reneg-sec` are
"a few minutes and up" - anything below 120 can be helpful for testing
*iff* you are trying to diagnose TLS renegotiation issues, and exactly
know what you are looking for.
A good sample config has "the absolute minimum lines needed to reproduce
a given problem". Then this can be fixed, and the next problem tackled.
gert
…--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress
Gert Doering - Munich, Germany ***@***.***
|
It does happen on default. I already stated reneg 15 secs for debug from the very beginning |
Hi,
On Thu, Dec 08, 2022 at 06:58:46AM -0800, Originalimoc wrote:
It does happen on default. I already stated **reneg 15 secs for debug**
This adds no value, just makes it much harder to follow what you're
actually trying to demonstrate (= it is taking away valuable time that
we could instead use for actual fixing the issue).
Stick to one set of configs, then show a complete log from both sides
with `verb 3` or `verb 4` that demonstrates the problem.
gert
|
Hi,
On Thu, Dec 08, 2022 at 06:58:46AM -0800, Originalimoc wrote:
It does happen on default. I already stated **reneg 15 secs for debug**
This ticket, for example, is about "explicit-exit-notify not working".
What happens at renegotiation has nothing to do whatsoever with EEN,
so the whole subthread about reneg-sec, dh, tls-crypt is just
irrelevant.
gert
|
To me both look like something is wrong with the control channel, so I post them both here. And full config is given: #191 (comment) |
@Originalimoc It did not happen for me with default settings, as I already posted in my comment. So currently this is not reproducible. You are adding all kinds of questionable non-default settings like To repeat myself, unless you give a proper way to reproduce this issue, I consider this issue as non-reproducible. You have not even provided full logs from server and client yet. |
After disable server side dco, both problem(EEN/RENEG) goes away. And hand-window 5/reneg 15s work perfectly even it's "aggressive". Nah... |
Hi,
On Thu, Dec 08, 2022 at 07:06:57AM -0800, Originalimoc wrote:
To me both look like something is wrong with the control channel,
so I post them both here.
This ticket is not about "things wrong with the control channel", but
very explicit about "EEN not working with DCO".
There's a reason why each problem needs to go into an individual ticket -
it is much less time consuming to work on one problem at a time, and
not having to read through multiple unrelated different aspects intermixed
in one ticket. We do this in our spare time, and time is limited...
gert
|
Hi,
On Thu, Dec 08, 2022 at 07:17:31AM -0800, Originalimoc wrote:
After disable server side dco, both problem(EEN/RENEG) goes away. And hand-window 5/reneg 15s work perfectly even it's "aggressive".
"It happens to work".
EEN with DCO changes signalling to use the control-channel for EEN, and
that does no longer work with your aggressive timings. As you have seen.
"Leave the timers alone" is generally good advice unless you know very
exactly what these values do, and why you want to change this.
gert
|
@Originalimoc let me tell you, as someone who actually understands OpenVPN, that these timing settings can cause problems that you do not even understand. But if you think you are more knowledgeable about OpenVPN than I am, then please fix the issue yourself and post a patch instead of questioning our knowledge of OpenVPN. |
In case you didn't read above. I already said: |
It's not my job either. I can move on without reneg necessary. It's all under a watchdog anyway I can just auto SIGUSR1 restart it all within 0.5 sec. I report, hoping for an improve, you patch if can, great, not really enter arguing mode, having a bad day huh @swb? |
@Originalimoc if you are not going to help us to figure out why it is not working on your side, I am just going to close the ticket. |
You're not even use the same config to say "it's not reproducible" |
@cron2 hand-window 5, reneg-sec, tran-window IS NOT THERE WHEN PROBLEM FIRST APPEAR. Can you read FGS? |
Whatever "FGS" is supposed to stand for, but yes, I can read. And I read that you are bringing up new and unrelated stuff all the time, which eats my time. |
For the original problem, it is caused by your aggressive reconnect timers. In DCO mode (EEN in control channel) the server waits for 3 seconds after sending the EEN so all clients have time to ACK the control-channel message. Your client connects so fast that it hits the server "going down", and then the server goes down "for real" and will not send another EEN. If you look properly into your client log files, you see that it receives the RESTART, reconnects, and then the server disappears (without another RESTART). We're going to fix this server-side by not allowing reconnect in this 2-3 second time frame. |
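The race described above can be pictured as a simple timeline. The following is only an illustrative sketch, not OpenVPN code: the function name `reconnect_outcome`, the 3-second wait (taken from this comment), and the hypothetical `restart_time_s` value are all assumptions.

```python
def reconnect_outcome(retry_delay_s, shutdown_wait_s=3.0, restart_time_s=5.0):
    """Classify what a reconnecting client meets (illustrative model only).

    Assumed timeline, in seconds:
      t = 0                server sends EEN (RESTART) and waits for ACKs
      t = shutdown_wait_s  server goes down "for real"
      t = restart_time_s   server is back up (hypothetical value)
    The client reconnects retry_delay_s seconds after receiving RESTART.
    """
    if retry_delay_s < shutdown_wait_s:
        # Client connects to a server that is about to disappear and
        # never receives a second RESTART for the new session.
        return "stale"
    if retry_delay_s < restart_time_s:
        # Server is down; the connection attempt fails and is retried later.
        return "refused"
    return "ok"


if __name__ == "__main__":
    for delay in (1, 4, 5):
        print(delay, reconnect_outcome(delay))
```

With these assumed numbers, a 1-second retry lands inside the wait window, while the default-style 5-second retry does not.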
ForGodSake. original problem? Let's turn off reneg(0) for now. So you mean connect-retry? |
We better start clean, a new issue post with all info packed into one. |
EEN with DCO needs to be sent over the control channel, and control-channel packets always need an ACK. This is the way the OpenVPN protocol works (we could make the wait time shorter if all ACKs are in, but this would be a much larger code change, which nobody had time and interest to implement). Old-style EEN are sent as part of the data channel (OCC messages), which does not work with DCO. |
Do not randomly open new issues. One issue per problem, relevant content in there, non-relevant content left out. If you feel part of your contributions have not been relevant, delete them. |
DONE, NOT A BUG? Can the wait be removed between Better man page needed, warn regarding |
@Originalimoc either change your attitude or go away. Telling us something like "for god's sake" and questioning our ability to read is something I perceive as an insult. I do not like working with people that insult me. |
I don't like talking to you from beginning either thanks. Getting angry afterwards. c2 is clearly more actually knowledgeable on the subject. |
Currently we still allow clients to connect while the server is waiting to shut down. This window is very small (2s) and is only used when explicit-exit-notify is enabled on the server side. The chance of a client connecting during this time period is very low unless someone puts something stupid like --connect-retry 1 3 into his/her client config and forces the client to reconnect during this time period. Github: #189 Signed-off-by: Arne Schwabe <[email protected]> Acked-by: Gert Doering <[email protected]> Message-Id: <[email protected]> URL: https://www.mail-archive.com/[email protected]/msg25638.html Signed-off-by: Gert Doering <[email protected]> (cherry picked from commit 7d0a903)
Currently we still allow clients to connect while the server is waiting to shut down. This window is very small (2s) and is only used when explicit-exit-notify is enabled on the server side. The chance of a client connecting during this time period is very low unless someone puts something stupid like --connect-retry 1 3 into his/her client config and forces the client to reconnect during this time period. Github: #189 Signed-off-by: Arne Schwabe <[email protected]> Acked-by: Gert Doering <[email protected]> Message-Id: <[email protected]> URL: https://www.mail-archive.com/[email protected]/msg25638.html Signed-off-by: Gert Doering <[email protected]>
This patch (from @schwabe, as the subject matter expert on this) will sidestep the problem somewhat - when "in the process of shutting down", the server will no longer accept new connections from clients. So the race condition between "the client reconnects very quickly, and the server is not yet restarted" will no longer lead to clients thinking they have a valid connection, which isn't true anymore (= client has to wait for ping timeout). Yes, we should probably rework the server side code, to take less time to actually shut down - but the openvpn event loop is complex, and not very good in doing things "really quick now!". So this will not happen in the 2.6 beta cycle, but we might revisit this in the "refactor big parts in master, after a release has been done" phase |
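The idea behind the patch, as described here, can be sketched with a toy state model. This is an assumption-laden illustration and not the actual patch: the class `ToyServer` and its methods are hypothetical names, and the real change lives in OpenVPN's C code.

```python
class ToyServer:
    """Toy model: refuse new clients once shutdown has begun."""

    def __init__(self):
        self.shutting_down = False
        self.clients = set()

    def accept(self, client_id):
        # While waiting to shut down, new connections are rejected, so a
        # fast-reconnecting client cannot end up in a session that is
        # about to vanish without another RESTART.
        if self.shutting_down:
            return False
        self.clients.add(client_id)
        return True

    def begin_shutdown(self):
        # Mark the server as going down and return the clients that
        # should be notified (the EEN recipients in this model).
        self.shutting_down = True
        return sorted(self.clients)


if __name__ == "__main__":
    server = ToyServer()
    server.accept("client-a")
    print(server.begin_shutdown())
    print(server.accept("client-a-reconnect"))
```

A rejected client simply retries later and reaches the restarted server instead of the one that is going away.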
And I am not sure that the extra logic/code for doing a really quick shutdown is worth adding. Aside from this ticket, I have never heard that the shutdown time of a server that waits 5s causes problems. So adding 30+ lines of code to fix such an obscure problem is not worth the complexity it adds. |
"Something stupid like --connect-retry 1 3" LOL. |
It can/need to be a resource concern but not an architectural one. Unless you're EXPECTING server/client not working anymore of a session after an unexpected incoming connection, this is a serious bug. |
This is not "arbitrary". It's called exponential backoff, and is standard practice for well-behaving software. If you have your first system go down because a software retried in milliseconds and logged every attempt, filling all disk space in the process, you'll understand. |
Not the point. |
No, we did not expect this. Control-channel EEN is new, and you triggered an unexpected behaviour by using more aggressive timers than our test framework does. So @schwabe fixed it. Was this an annoying side effect? I'm sure it was. Was it "a serious bug"? Certainly not. No side crashed, no security impact. |
The starting point is 5s, first retry 5s, then 10, 20,... - and if a remote server is down hard, retrying every 60 seconds is a very reasonable compromise between "reconnecting quickly" and "needlessly burning resources" |
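The retry schedule described above can be sketched as follows. This is a minimal illustration, not OpenVPN's implementation: the function name `backoff_delays` is hypothetical, and the 5-second base and 60-second cap are simply the numbers mentioned in this comment (the cap actually used by a given OpenVPN version may differ).

```python
def backoff_delays(base_s=5, cap_s=60, attempts=7):
    """Return the wait before each connection attempt, in seconds.

    Follows the sequence from the comment: base, base, then doubling
    (5, 5, 10, 20, ...), never exceeding cap_s.
    """
    delays = []
    for attempt in range(attempts):
        if attempt == 0:
            delay = base_s
        else:
            # Doubling starts after the first retry: base * 2**(attempt - 1).
            delay = min(cap_s, base_s * 2 ** (attempt - 1))
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays())
```

Capping the delay keeps a client from hammering a server that is down hard while still reconnecting reasonably soon once it returns.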
The EEN part essentially crashed because on next session peer receive nothing. |
And besides, the defaults are chosen to be a compromise that works for as many people as possible. Your setup might be able to tolerate a 3s retry from clients, but that is not the case for everyone. I implemented this exponential backoff because mobile phones that lost their server connection for an hour or two would quickly burn their battery with the default of 5s. |
I don't think you understand what "crashed" means. "Crashed" means the program is not able to recover. And your "big problem" is only caused because you tweak all kinds of obscure settings in even more obscure ways. So you are walking beside the established and well-tested paths. That your setup has more problems than the ones that run with well-tested defaults is not surprising at all. |
Hand window is 60. |
Nah the state maintenance still sucks. I now configured to bypassed these. Not mean there's no bug though, way too much legacy debt. |
And what, exactly, does this have to do with the issue here?
I will now block you. The way you refuse to listen, and keep being offensive ("sucks") is really a waste of our time. |
You are just trolling at this point. I don't think there is any use in continuing the conversation with you. |
Huh blocking, that's a good move. I'm here spending time to praise you or what? Coding from ground up is better than deal with all these legacy bad decision without proper state management mess and that's is exactly what I'll do next. You are the one that refused to listen and insisting "this is good and well tested", live in your old dream. What sucks, remains sucks. |
If you pay for software and support, feel free to be as unfriendly as you want. If you get some piece of software for free, and get support for free as well, we expect you to spend some of your time on "be friendly and provide the information that is asked from you" in return. You prefer to insult us, which is okay, but we are free to just not listen to you. So, yes, block. |
Describe the bug
server client both 2.6 beta1 w/ dco
server:
udp
explicit-exit-notify 1
client:
udp
explicit-exit-notify 2
To Reproduce
Establish a TLS config connection first
then send server a SIGUSR1/SIGHUP/SIGTERM
server will log(this one is SIGTERM):
but client receives nothing/log nothing, need a manual SIGUSR1 on client to reestablish connection.
Expected behavior
Client receives RESTART, then generates an internal SIGUSR1. This makes the client reconnect after the server restarts.
Version information (please complete the following information):