-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate git daemon
problems
#304
Comments
Not easy to report a MCVE, because after a few times of retry, it succeeds. |
@wq9 the trick, now, is to put in some wait cycles at the appropriate point to trigger this always (i.e. also here)... ;-) |
I also get this problem when I try fetching from remote machine. I cannot reproduce it using Remote machine: Windows 7 64bit, Git 2.7.0 64bit Command on remote machine:
Command on local machine:
I think you can try any repo on remote machine and |
Still a problem in 2.7 |
I prepared the following MCVE: My setup is
The problem is reproducible when cloning from a local machine. To
Then, try cloning this repository:
Most of the time (but not always), the above
The corresponding git daemon output is
Sometimes,
I also observe the following surprising behavior. When I select any text The problem is also reproducible in 32-bit version 2.11.0.windows.3. |
It's 100% reproducible on my 6 MB repository with 183 history entries. Can't you remove the mcve-required label having @guban script? |
If there is a reason for keeping this issue as "mcve-required", it would be nice to let me know what needs to be improved in my MCVE that was provided back in January. |
I did not yet have time to come back to this issue. So I am reluctant to remove that tag, especially given that no contributor who would be able to fix the underlying problem gave the MCVE a whirl. Sorry, busy times! |
We have a team of ~ 10 people who run into this exact issue daily on windows Server 2016 on multiple Git versions through Git-2.16.1.3-64-bit I support removing the mcve-required tag. |
@edropps are you saying that you are willing to contribute time and effort to see this problem fixed? If so, do the steps outlined in #304 (comment) reproduce the problem on your machine? Do you have a Git for Windows SDK set up already to investigate further? |
A dirty fix to the problem is to add
Without this fix, I reproduce the problem about 75% of the time |
I think toshiyuki-ogawa determined the root cause correctly: To eliminate the race condition, we need to wait in
where Then, to introduce a timeout, I replaced
This works just like I also tried an alternative solution that involves using the To summarize, I think that polling with timeout in |
@guban excellent work! Thank you very much.
I agree. Would you mind opening a Pull Request? Most likely, this comment can be turned into a stellar commit message with minimal effort.
That is a good workaround. I think we can do even better, though: there is already an That way, all users of In short: it would be awesome if you could prepare a Pull Request with one patch that teaches |
…nnects This fixes the issue identified in git-for-windows#304 On Windows, closing a socket destroys its OS-allocated buffer even when the latter still contains the last chunk of the data to be sent. Hence, closing the socket immediately after sending the pack results in a race condition between sending that last chunk and destroying the OS socket object. When the OS socket object is destroyed first, git-daemon fails to actually send the last chunk of the pack to the client, and the latter fails with the `early EOF` / `index-pack failed`. That is, the above race condition results in intermittent failures to clone a repository hosted on a Windows machine by git-daemon. To eliminate the race condition, `upload-pack` now sends all data and then waits until the client closes connection from its side.
…nnects This fixes the issue identified in git-for-windows#304 On Windows, closing a socket destroys its OS-allocated buffer even when the latter still contains the last chunk of the data to be sent. Hence, closing the socket immediately after sending the pack results in a race condition between sending that last chunk and destroying the OS socket object. When the OS socket object is destroyed first, git-daemon fails to actually send the last chunk of the pack to the client, and the latter fails with the `early EOF` / `index-pack failed`. That is, the above race condition results in intermittent failures to clone a repository hosted on a Windows machine by git-daemon. To eliminate the race condition, `upload-pack` now sends all data and then waits until the client closes connection from its side. Signed-off-by: Oleg Gubanov <[email protected]>
@dscho you are welcome! |
@guban I propose a different solution. Can you try kgybels/git@b7fb4db? This way the logic stays isolated to git daemon, and upload-pack doesn't need to worry about its stdout being a network connection. |
@kgybels your solution works. I couldn't reproduce the bug after about 300 attempts. I also verified that I can still reproduce the bug after reverting your changes. I agree that your solution is better. Unlike mine, it does not need to explain why polling for read from stdout would make sense :) Accepting your proposal is the way to go. |
Excellent. Thanks, both! |
@kgybels what is the state of your patch? I am eager to get this into Git for Windows... |
@dscho I have some time this Thursday to continue were I left off. |
I submitted my changes to the git mailing list: I tested on Windows 10 and Ubuntu 17.10. |
On Windows, a connection is shutdown when the last open handle to it is closed. When that last open handle is stdout of our child process, an abortive shutdown is triggered when said process exits. Ensure a graceful shutdown of the client connection by keeping an open handle until we detect our child process has finished. This allows all the data to be sent to the client, instead of being discarded. Fixes git-for-windows#304 Signed-off-by: Kim Gybels <[email protected]>
Thank you. This stalled, though. You hinted at something you wanted to teach |
Any chance we could get the workaround applied to only Git for Windows and not upstream git in the interim? It would be really useful to have. |
Same situation here. Will it be possible to provide "alpha-pre-release" git-daemon binary version mentioned in [PATCH 2/2] daemon: graceful shutdown of client connection with xpoll() implementation? Last message is from April but at v2.19.0 bug still exists. Anyway, a hint to compile project would be helpful. Doc at INSTALL and Makefile refeer to *nix. I would want to try @guban code untill @kgybels is up to standards. |
Go to the Git-for-Windows home page, scroll down to Contribute, install the SDK (which will do the first full compile..) and the whole thing is right there for you to try. Search all the issues for SDK, and you'll see a number of repeats of the (fairly easy, step by step) instructions[1] of how to build and test a new version. (so simple even an engineer like me can do it ;-) See also the wiki/FAQs. HTH [1] https://github.com/git-for-windows/git/wiki/Building-Git |
OK, thanks for the fast response and sorry if question seemed "dumb". Just didn't look at web, only the repositories. 🤷♂️ |
On Windows, a connection is shutdown when the last open handle to it is closed. When that last open handle is stdout of our child process, an abortive shutdown is triggered when said process exits. Ensure a graceful shutdown of the client connection by keeping an open handle until we detect our child process has finished. This allows all the data to be sent to the client, instead of being discarded. Fixes git-for-windows#304 Signed-off-by: Kim Gybels <[email protected]>
Done and done, thanks! I rebased @kgybels patches onto |
Wow! fast replies! |
On Windows, a connection is shutdown when the last open handle to it is closed. When that last open handle is stdout of our child process, an abortive shutdown is triggered when said process exits. Ensure a graceful shutdown of the client connection by keeping an open handle until we detect our child process has finished. This allows all the data to be sent to the client, instead of being discarded. Fixes git-for-windows#304 Signed-off-by: Kim Gybels <[email protected]>
There is no issue that I know of with the patches for Windows builds. They, however, introduce a problem for linux builds, where connections would remain open, as explained on the git mailing list:
Maybe it is time for that reroll, anyone up for some pair programming? :) |
Ok, I misread it in mailing list. Thanks again for your work. |
@kgybels would you kindly give me a summary of the current state? |
No change since the patches on the git mailing list, to make the change correct for linux, the code needs to be changed to use ppoll, however, for windows this is problematic, since we don't have ppoll in compat. |
So how about introducing a |
Just to be clear, I'm not working on this, nor do I plan on doing so. |
Okay, so I have to admit openly, to myself as well as to everybody else, that I won't find the time to work on this and drive this forward. I'll leave it open as an "up for grabs", as @kgybels' patch provides a really good head start for anybody who is actually interested. (And by "actually interested" I don't mean "likes to talk about it" but "likes to do something about it".) |
I removed this from the milestone because, realistically, I won't have time to complete this, ever. Having said that, I will be glad to help everybody who is interested in driving this to the finish line. |
…b check This check verifies that the ODB matches what we supplied, but there are some subtleties around Windows path names that can cause inexact matches to be logically the same. Since this check is really intended only for debugging and development purposes, let's remove it for now as a quick workaround. This was originally part of git-for-windows#304, as I hit it more often in that effort. However, this was hit during the Scalar release process, so I'll need to generate new installers.
Let's just close this as stale. |
It had been reported in msysgit#70 (with a quite large patch that I did not manage to review due to its sheer size) that
git daemon
might close the socket before the client read everything, and as a consequence, the transaction fails.I still need to be able to reproduce; if I manage that, I will try to come up with a substantially smaller patch myself.
The text was updated successfully, but these errors were encountered: