
Fix memory leak and hangs in WriteTo. #152
Merged: 2 commits into pkg:master, Feb 5, 2017
Conversation

@theraphim (Contributor):

The leak happens when requests arrive out of order: they are never cleaned out of the map. The hang happens when the Writer returns an error: because of the way inFlight is used, the function is stuck forever receiving from a channel.

This may fix #150. Sadly I don't have tests; properly testing this and the other corner cases will need a larger refactoring. I tested this in production instead.
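To make the two failure modes concrete, here is a minimal sketch of the pattern being described, not the actual pkg/sftp code; only the names inFlight and pendingWrites come from this PR, everything else (chunk, reassemble, the channel plumbing) is hypothetical:

```go
package main

import "fmt"

type chunk struct {
	offset int
	data   []byte
}

// reassemble writes chunks to w in offset order, buffering
// out-of-order arrivals in pendingWrites.
func reassemble(results chan chunk, w func([]byte) error, inFlight int) error {
	pendingWrites := make(map[int][]byte)
	next := 0
	for inFlight > 0 {
		// Hang: if the writer errored and the request senders stopped,
		// inFlight never reaches zero and this receive blocks forever.
		c := <-results
		inFlight--
		pendingWrites[c.offset] = c.data
		for d, ok := pendingWrites[next]; ok; d, ok = pendingWrites[next] {
			if err := w(d); err != nil {
				return err
			}
			// Leak: the entry just written is never deleted from
			// pendingWrites, so every block of the transfer stays
			// referenced by the map until the function returns.
			next += len(d)
		}
	}
	return nil
}

func main() {
	ch := make(chan chunk, 2)
	ch <- chunk{offset: 5, data: []byte("world")}
	ch <- chunk{offset: 0, data: []byte("hello")}
	err := reassemble(ch, func(b []byte) error { fmt.Printf("%s", b); return nil }, 2)
	fmt.Println("\nerr:", err)
}
```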

Review comment on this part of the diff:

            }
        }

        if inFlight == 0 {
            if firstErr.err == nil && len(pendingWrites) > 0 {
                return copied, errors.New("internal inconsistency.")

error strings should not end with punctuation
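(For reference, that note matches the convention from the Go Code Review Comments guide: error strings should not be capitalized or end with punctuation, because they are usually composed into longer messages. A tiny illustration:)

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	// Preferred: lower case, no trailing punctuation.
	err := errors.New("internal inconsistency")
	// Error strings get wrapped into longer messages, where a trailing
	// period would read as "WriteTo: internal inconsistency." mid-text.
	fmt.Println(fmt.Errorf("WriteTo: %v", err))
}
```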

@theraphim (Contributor, Author):

Maybe it makes sense to drop 1.4.3 from the CI?

@eikenb (Member) commented Jan 16, 2017:

I'd prefer it if the two issues were split out: one PR for the leak and one PR for the hang. It'd also be great if there were a basic test for each, as I'd really like to reproduce the problem and then see the fix in action.

For the leak, I assume the map you are referring to is pendingWrites. From my reading it would get freed/GC'd after the function returned, but it would keep all the incoming data in memory until that point. So it's not so much a leak as inefficient memory use. Or am I missing something?

@eikenb (Member) commented Jan 16, 2017:

Also, I've removed 1.4.3 from travis testing.

@theraphim (Contributor, Author):

Greetings.

So, in reverse order:

You may call this inefficient memory use, but if I use it to stream 16 gigabytes and it eats 16 gigabytes, I call that a leak.

As for testing: yes, it would be nice to have tests. Unfortunately the code as it stands isn't very testable, and I don't want to refactor incorrect code into more incorrect code.

@eikenb (Member) commented Jan 16, 2017:

I agree that keeping all the data in memory even after it has been sent to the writer is bad behavior and should be addressed. In other words, putting the semantics of "leak" aside, I agree.

Would you consider reworking the PR to better split up the changes for the two fixes, either as two PRs or as two commits in one? I'm a strong believer in one commit/changeset per issue.

Thanks.

theraphim added 2 commits:

1. When reassembling out-of-order responses, give Go a chance to free blocks we already wrote.

2. When the writer returns an error, the code would mess up the inFlight counter and, as a result, the loop would never finish. We switch to using inFlight only to count requests to the SFTP server; since we now delete from pendingWrites, we can use len(pendingWrites) to count.
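Read against the sketch in the opening comment, a hedged illustration of what these two commits describe; again only inFlight, pendingWrites, firstErr, and copied are names from the PR, the rest is hypothetical and not the actual patch:

```go
package main

import (
	"errors"
	"fmt"
)

type chunk struct {
	offset int
	data   []byte
}

func reassembleFixed(results chan chunk, w func([]byte) error, inFlight int) (int, error) {
	pendingWrites := make(map[int][]byte)
	var firstErr error // the PR wraps this in a struct; a bare error suffices here
	copied, next := 0, 0
	// inFlight now counts only outstanding requests to the SFTP server,
	// so the loop drains every response and always terminates.
	for inFlight > 0 {
		c := <-results
		inFlight--
		if firstErr != nil {
			continue // writer already failed; just drain remaining responses
		}
		pendingWrites[c.offset] = c.data
		for d, ok := pendingWrites[next]; ok; d, ok = pendingWrites[next] {
			// Deleting written blocks gives the GC a chance to free them,
			// and lets len(pendingWrites) serve as the pending count.
			delete(pendingWrites, next)
			if err := w(d); err != nil {
				firstErr = err
				break
			}
			copied += len(d)
			next += len(d)
		}
	}
	if firstErr == nil && len(pendingWrites) > 0 {
		// The check the review comment above is attached to.
		return copied, errors.New("internal inconsistency")
	}
	return copied, firstErr
}

func main() {
	ch := make(chan chunk, 2)
	ch <- chunk{offset: 5, data: []byte("world")}
	ch <- chunk{offset: 0, data: []byte("hello")}
	fmt.Println(reassembleFixed(ch, func([]byte) error { return nil }, 2))
}
```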
@theraphim (Contributor, Author):

2 commits now.

@eikenb (Member) commented Jan 24, 2017:

Thank you for splitting up the commits. I'll look at getting it merged soon. I'm first trying to see what I can do about all the races in the integration tests that cause the intermittent failures of the automated/Travis testing. Yours fails every time on Travis, but usually passes when I run the tests locally. I can get it to race occasionally locally and am trying to track those down.

eikenb self-assigned this Jan 24, 2017
@theraphim (Contributor, Author):

Any luck merging this?

@eikenb (Member) commented Feb 1, 2017:

I will get to it soon. I just pushed a fix to a race and found that the upstream ssh lib package made a change that breaks one of our tests. I've located the commit that breaks things and am currently working to figure out where the bug is. Once I have the tests running reliably I'll work on getting your PR merged first.

@eikenb (Member) commented Feb 1, 2017:

Filed a ticket on the problem (#155). Tracked it down to an upstream bug introduced a couple weeks ago in the x/crypto/ssh lib. There is already a change out for review that fixes it. Our travis tests will continue to fail until it gets reviewed and applied. I have it applied locally and all tests are passing here.

@AlexLi1001:
@theraphim do you have an email address? I have a technical problem about the SFTP client response to discuss. My email address: [email protected]. Thanks!

@eikenb (Member) commented Feb 3, 2017:

FYI, I've reviewed this and am happy with it. I'll merge it as soon as the bug breaking the testing is resolved, as it looks like they intend to fix that one very soon.

@eikenb (Member) commented Feb 5, 2017:

Just added a hack/workaround to the travis config so it would pass (pulls in the fix). So I'll be merging this...

eikenb merged commit b4b1d29 into pkg:master Feb 5, 2017
eikenb mentioned this pull request Feb 5, 2017
Successfully merging this pull request may close these issues: Hanging occurs when opening multi-connection download