Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to clear partial state when debug connection restarted #262

Closed
fatcerberus opened this issue Aug 8, 2015 · 13 comments
Closed

Option to clear partial state when debug connection restarted #262

fatcerberus opened this issue Aug 8, 2015 · 13 comments

Comments

@fatcerberus
Copy link
Contributor

Duktape currently parses debug commands piece by piece, often executing the command even before EOM is reached. As a result, it's possible to end up with the engine in a state where it's in the middle of reading a command, but the transport link is severed.

In minisphere I have a system where if a TCP reset is detected during debugging, it will wait 30 seconds for the client to reconnect. However, if Duktape hasn't received the full command by this time, the client won't know which bytes are missing to resend them. Attempting to send a new command while Duktape is in this state will likely result in issues similar to that described in #244 (or in the worst case, a segfault).

What I suggest to solve this issue is some method to reset the command-reading state such that the next byte received by Duktape is assumed to start a new command.

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

Just to be sure, in your case the logical debugger transport is not severed, but the concrete transport (TCP) is? In other words, you'd want the debugger transport to support transparent TCP reconnect, so that the debug session won't become detached if a TCP connection breaks.

There's no easy way to reset the command-reading state - there is really no such state, as Duktape is just pulling in data (and writing data) as it needs using read calls. Resetting the reading state would be equivalent to longjmp'ing out of the current debug command with all the necessary cleanup (if any), and all commands would need to support it.

But you should still be able to solve this by writing an appropriate transport framing yourself. For example, you could send data in numbered chunks and buffer them on both sending sides so that a chunk can be resent if it wasn't received completely; this is essentially a pretty standard queue protocol rather than a plain stream. Debug read/write callbacks would then be oblivious to the reconnection process. This is one reason why the transport is abstract so that you can implement something like this if you want.

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

Btw, coming back to the original problem (recovering from a dropped TCP connection), I don't think it'll be possible to resynchronize the debug connection using plain TCP alone even if Duktape changes were made. Some issues one would have:

  • The current command being executed would need aborting like you say. Similarly the remote peer must be able to abort parsing a current reply/notify or other messages at any point, including in the middle of a dvalue.
  • Aborting a command means the reply for that command won't be sent - or it may mean that a partial reply has been written out by Duktape (and it may be in the middle of a dvalue even). Note that replies are just streamed out so it's very normal for them to be written out in pieces.
  • Any number of requests may have been lost; any number of replies may have been lost by TCP.
  • Note that whatever Duktape has sent and what TCP has delivered are different things; even if Duktape had send out complete reply messages, the remote peer might not have received them or might have received them only partially. Same thing in the other direction.

As a result, the request/reply matching on the remote client has no chance of matching requests and replies after a recovery. You might be able to do this in special cases though, and apply a lot of hacks to make it somewhat workable :-)

However, I'd just much rather solve the problem reliably, by using a transport that doesn't trust individual TCP connections and sends data in a queued and acknowledged fashion. That will solve all cases and will be fully transparent.

@fatcerberus
Copy link
Contributor Author

I had assumed the read callbacks were done statefully, where Duktape maintains a state that says what the next byte(s) should be, for example "start of command", "argument", "skipping to EOM", etc. so that a reset would be as simple as resetting the state to "start of command". Basically an attempt to resynchronize things and return to a known state.

In any case, the main thing I'm concerned with is the case where Duktape hasn't received a complete message by the time of disconnection (and therefore never will). My client is set up to send out full messages wholesale, and if it detects a disconnect, all its own state is reset, so reply matching, etc. on the client side isn't an issue. The client is in a known state. My issue is that the client can't trust that Duktape is in a consistent state at this point to be able to receive new messages.

My workaround at present is, after a disconnect/reconnect cycle, to detach the debugger on the target side and immediately reattach:

duk_debugger_detach(g_duk);
duk_debugger_attach(g_duk,
    duk_cb_debug_read,
    duk_cb_debug_write,
    duk_cb_debug_peek,
    NULL,
    NULL,
    duk_cb_debug_detach,
    NULL);

But I don't know what other negative side effects this may have, if any.

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

I'm sure you understand why I'd hesitate to add a workaround for one debug client if it didn't address a more general set of issues. It's also important to avoid creating a situation where a partial solution creates the need for more workarounds (which is very uncomfortable because it will never work for everyone). That's why I'd suggest solving this in the transport level because it's quite difficult to solve elsewhere without something always breaking.

Regarding the workaround, detaching and reattaching is probably a pretty good solution. At the moment breakpoint state is not lost in this process (IIRC), but that might need to be changed so that each debug session began with a clean slate. If that change was made, you'd need to re-add all breakpoints before resuming.

@fatcerberus
Copy link
Contributor Author

I'm not sure that in the general case it can be solved at the transport level--the transport callbacks are, by design, stateless, and merely serve to stream the bytes through to Duktape. Only Duktape itself knows what those bytes represent.

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

Duktape expects the transport to provide a bi-directional reliable byte stream. You can most certainly write a transport which provides this over multiple TCP connections. There's no need to know about message boundaries etc.

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

The transport callbacks are certainly not stateless in the general case by the way. In the case of using TCP directly, the state is in the operating system sockets. For packet-based transports there's quite often the need for coalescing writes which needs a small write buffer for doing so. For compressing a transport there'd be need for compression state, and so on. Duktape doesn't assume the transport is stateless - it just expects the transport to provide a reliable stream fulfilling the callback semantics in whatever way.

@fatcerberus
Copy link
Contributor Author

Hm, that must be a little outside of my level of expertise because I can't imagine how that would work any better than a single TCP link. For example, if you're debugging remotely and either end goes offline momentarily, the multiple TCP connections don't help you any more than a single one would.

You mentioned in the past that many users want to implement message parsing on the target, and you couldn't understand why. I think this is a big part of the rationale--since any remote protocol is intrinsically unreliable (neither side can know that the other one hasn't lost information), providing a 100% guaranteed reliable byte stream is, in practice, very difficult. As a result it's easier to write a client if both sides can know that messages will either be received in their entirety or not at all. Duktape's raw protocol provides no such guarantee, so a proxy is needed.

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

What TCP does is send segmented data over an unreliable stream of raw IP packets. When packets are lost, data gets resent, with the mechanism being based on a window over sent and acknowledged byte offsets.

In your case, if you don't trust TCP to be reliable enough, you'd do a similar construct over TCP. You buffer sent data on each end, and re-send it if a TCP connection is lost from the offset indicated by the peer. For example, once you reconnect, your peer can tell you "I've received 123456 bytes from you"; you'd then resend any bytes after that to ensure reliability. You probably also need acknowledgements so you can forget data (= you know it will never need to be retransmitted), etc.

But considering TCP is very persistent in doing its job, in what scenarios is TCP not reliable enough in practice? Why do you get connection drops?

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

Regarding the stream assumption made by Duktape: I think you'll find many other debug protocols take the same approach; if they're IP based they're very often based on TCP and will not gracefully recover from a lost connection.

The problem with adding a generic recovery mechanism to Duktape would be that it'd be addressing a pretty difficult problem over an unknown transport ranging from serial to Ethernet. It would complicate the implementation a great deal and it would also prevent a simple TCP client from being used - there'd need to be a recovery protocol etc for all clients, not just those who need reliability better than TCP (or whatever naive transport is avaiable, like serial).

Also, one major reason for the current design is to make it possible to run a debugger session with very little RAM overhead. I think it's possible to do this now with ~1 kB constant RAM overhead with virtually no memory churn. Buffering full messages before sending would be very awkward in these environments because there's no upper limit for message size - consider e.g. an Eval whose result is a string of arbitrary size.

@fatcerberus
Copy link
Contributor Author

It's weird, but I've actually seen TCP connections dropped over loopback. Admittedly it's a rare occurrence, but I'd still like to handle it gracefully

To clarify, I don't actually expect Duktape to be able to recover by itself--that's indeed outside the scope of its responsibility (it should assume its input is reliable). All I'm asking for is an API call which does the equivalent of my detach/attach cycle above, so that when my transport callbacks detect TCP reset they can call that to get the debugger back into a consistent state. If the detach/attach already accomplishes this adequately, however, then feel free to disregard. :)

@svaarala
Copy link
Owner

svaarala commented Aug 8, 2015

Ensuring a detach/attach works well is much more feasible than trying to work in additional recovery mechanisms :)

Forced detach handling (caused by format errors and such) may be a small issue. Right now a forced detach will cause Duktape to stop debugging and automatically resume execution. This is obviously not what you want if you want to re-attach and resume debugging.

I'm not sure if this a concrete problem though: quite possibly the detached callback can just wait for the debug client to reconnect, and then call duk_debugger_attach() from inside the detached callback. If not, this is probably fixable, so that there is a clear place where to (a) wait for the reconnect, and (b) reattach so that execution doesn't autoresume. I'll need to check the current behavior.

Duktape could also be changed to handle user requested detach and format/transport error differently (not sure if that's needed or a good idea). For example, if user code requests detach it's reasonable to resume automatically. If the detach is caused by a format/transport error, perhaps user code could decide what to do next. I'd like to avoid introducing more hooks and callbacks if possible though, so it'd be nice if the detached callback would be enough to handle this. Adding an argument to the detach callback indicating the detach cause would be relatively straightforward if it helps.

@fatcerberus
Copy link
Contributor Author

The use cases for this were covered in 1.4.0 so I think the issue can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants