-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory leak and hangs in WriteTo. #152
Conversation
} | ||
} | ||
|
||
if inFlight == 0 { | ||
if firstErr.err == nil && len(pendingWrites) > 0 { | ||
return copied, errors.New("internal inconsistency.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error strings should not end with punctuation
Maybe it makes sense to drop 1.4.3 from the CI? |
I'd prefer it if the 2 issues were split out, 1 PR for the leak and 1 PR for the hang. It'd also be great if there were a basic test for each, as I'd really like to reproduce the problem and then see the fix in action. For the leak, I assume the map you are referring to is pendingWrites. From my reading it would get freed/gc'd after the function returned, but it would keep the all incoming data in memory until that point. So not so much a leak as just inefficient memory use. Or am I missing something? |
Also, I've removed 1.4.3 from travis testing. |
Greetings. So, in reverse order. As for testing - yes it would be nice to have tests, unfortunately as is the code isn't very testable, and I don't want to refactor incorrect code into another incorrect code. |
I agree that saving all the data in memory even after being sent to the writer is bad behavior and should be addressed. In other words, putting semantics of "leaks" aside, I agree. Would you consider reworking the PR to better split up the changes for the 2 fixes. Either as 2 PRs or as 2 commits in one? I am a strong believer in 1 commit/changeset per issue. Thanks. |
When reassembling out-of-order responses, give Go a chance to free blocks we already wrote.
When writer returns an error, the code would mess up inFlight counter and the loop will never finish as the result. We switch to only use inFlight to count requests to sftp server - we delete from pendingWrites now so we can use len(pendingWrites) to count.
2 commits now. |
Thank you for splitting up the commits. I'll look at getting it merged soon. I'm first trying to see what I can do about all the races in the integration tests that cause the intermittent failures of the automated/travis testing. Your's fails every time on travis, but usually passes when I run the tests locally. I can get it to race occasionally locally and am trying to track those down. |
Any luck merging this? |
I will get to it soon. I just pushed a fix to a race and found that the upstream ssh lib package made a change that breaks one of our tests. I've located the commit that breaks things and am currently working to figure out where the bug is. Once I have the tests running reliably I'll work on getting your PR merged first. |
Filed a ticket on the problem (#155). Tracked it down to an upstream bug introduced a couple weeks ago in the x/crypto/ssh lib. There is already a change out for review that fixes it. Our travis tests will continue to fail until it gets reviewed and applied. I have it applied locally and all tests are passing here. |
@theraphim have you email address? I have some technical problem about sftp client response to discuss . My email address: [email protected]. Thanks! |
FYI, I've reviewed this and am happy with it. I'll merge it as soon as the bug breaking the testing is resolved as it looks like they intend to fix that one very soon. |
Just added a hack/workaround to the travis config so it would pass (pulls in the fix). So I'll be merging this... |
Leak is when requests are arriving out of order - they are never cleaned from the map. Hang is when Writer returns an error - because of the way inFlight is used, the function is stuck forever on receiving from channel.
This may fix #150. I don't have tests sadly, to properly test this and other corner cases I'll need to do a larger refactoring. I tested this in production instead.