goroutine leaks from beginSegment #51

Closed
elan-sg opened this issue Apr 10, 2018 · 14 comments

elan-sg commented Apr 10, 2018

```go
go func() {
	select {
	case <-ctx.Done():
		seg.handleContextDone()
	}
}()
```

This goroutine will leak if a context that is never cancelled (e.g. context.Background()) is passed to BeginSegment: the select blocks on <-ctx.Done() forever. I think Segment.Close should clean this up.
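
A minimal sketch that makes the leak observable (the segment name and loop count are arbitrary, and running it without an X-Ray daemon may also log emit errors):

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"github.com/aws/aws-xray-sdk-go/xray"
)

func main() {
	before := runtime.NumGoroutine()
	for i := 0; i < 1000; i++ {
		// context.Background() is never cancelled, so the goroutine
		// started inside BeginSegment blocks on <-ctx.Done() forever,
		// even after the segment is closed.
		_, seg := xray.BeginSegment(context.Background(), "leak-demo")
		seg.Close(nil)
	}
	fmt.Printf("goroutines before: %d, after: %d\n", before, runtime.NumGoroutine())
}
```

The goroutine count grows by roughly one per BeginSegment call and never comes back down.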

@haotianw465
Contributor

Hi,

Sorry for the late response. Thank you for reporting the issue. We are aware of this and will have a fix soon. Please stay tuned.

@Cyberax

Cyberax commented Aug 8, 2019

Guys, it's been over a year now. The fix is simple: add a select{} statement that also monitors a channel that is closed in segment.Close().

I have even done this for you: https://github.com/Cyberax/aws-xray-sdk-go/commit/033bad50abd4f462219e5972a278fae1f26dd05b
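
For illustration, here is a minimal sketch of that pattern on a hypothetical segment type; the real change is in the commit linked above, and handleContextDone stands in for the SDK's internal handler:

```go
package xraysketch

import "context"

// segment is a hypothetical stand-in for xray.Segment, reduced to the
// one field the fix needs.
type segment struct {
	done chan struct{} // closed exactly once, by Close
}

func beginSegment(ctx context.Context) *segment {
	seg := &segment{done: make(chan struct{})}
	go func() {
		select {
		case <-ctx.Done():
			seg.handleContextDone()
		case <-seg.done:
			// Close was called first: exit instead of blocking forever
			// on a context that may never be cancelled.
		}
	}()
	return seg
}

// Close ends the segment and releases the watcher goroutine.
func (seg *segment) Close() {
	close(seg.done)
}

func (seg *segment) handleContextDone() {
	// In the real SDK this records that the context ended before Close.
}
```

A production version would guard close(seg.done) with a sync.Once so that a double Close cannot panic.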

@yogiraj07
Contributor

yogiraj07 commented Aug 8, 2019

Hello @Cyberax ,
Apologies for the delay in solving this issue. I am happy to work with you and prioritize solving it. Can you post a PR for the above commit, with unit tests?

I would also like to know how this commit behaves with a context that can be cancelled. Since Close now actively stops the goroutine, will there be any behavioral change? Will the <-ctx.Done() case ever be executed?

Best,
Yogi

@yogiraj07 yogiraj07 self-assigned this Aug 8, 2019
@LeoAdamek

Any update on this? It's quite a big issue: even when I explicitly complete the context passed to the segment, goroutines are still leaked. In an API server these leaks can occur on every request.

In my case I've written a minimal middleware for Gin to handle tracing:

```go
g.Use(func(c *gin.Context) {
	ctx := context.WithValue(c, ctxURL, c.Request.URL.String())

	serviceName := "solos-api"

	if envName := viper.GetString("xray.name"); envName != "" {
		serviceName += " " + envName
	}

	parts := strings.Split(c.HandlerName(), ".")
	serviceName += " " + parts[len(parts)-1]

	ctx, s := xray.BeginSegment(ctx, serviceName)

	remote := c.Request.Header.Get("X-Forwarded-For")
	if remote == "" {
		remote = c.Request.RemoteAddr
	}

	s.HTTP = &xray.HTTPData{}
	s.HTTP.Request = &xray.RequestData{
		Method:    c.Request.Method,
		ClientIP:  remote,
		URL:       c.Request.URL.Path,
		UserAgent: c.Request.UserAgent(),
	}

	// Add the trace ID as a header so the request can be referenced later
	// for troubleshooting.
	c.Writer.Header().Set("X-Trace-Id", s.TraceID)

	c.Set("ctx", ctx)
	c.Next()

	status := c.Writer.Status()
	s.HTTP.Response = &xray.ResponseData{
		Status: status,
	}

	s.Error = false
	s.Fault = false
	if status >= 400 {
		s.Error = true
		if status >= 500 {
			s.Fault = true
		}
	}

	// Note: ctx.Done() only returns the channel; it does not cancel the
	// context, so the goroutine started by BeginSegment is not released here.
	ctx.Done()
	s.Close(nil)
})
```

However, no matter what I try, goroutines are still leaked on almost every request.

@dtarico-suplari
Contributor

dtarico-suplari commented Nov 26, 2019

This just crashed a production deployment after we added X-Ray. The bug has been open for a year and a half, and there's neither a fix nor a prominent warning in the docs. Not cool.

@JakeKalstad

Is there any foreseeable action on this issue? We're seeing massive memory leaks resulting in production crashes upon release. The documentation for "Start a custom segment/subsegment" is either incorrect or incomplete, as we are properly closing the segment and still seeing issues. Even after migrating from a background context to context.WithCancel(), we are still seeing the memory leak.

Thanks.

@bhautikpip
Contributor

Hi @JakeKalstad,

Apologies for the inconvenience. We recently merged PR #156, which might fix some of these issues. Feel free to submit a PR or open an issue if you find any other source of memory leaks.

@JakeKalstad

Thanks for the quick response; we've confirmed that it has resolved our issue.

@michelaquino

michelaquino commented Jan 9, 2020

Hi @bhautikpip and @dtarico-suplari,
Has this fix been released yet? The latest version is v1.0.0-rc.14, and it does not contain the merged code.

@bhautikpip
Contributor

Hi @michelaquino,

This fix has not been released yet; it will not be in v1.0.0-rc.14.

@bhautikpip
Contributor

Hi @michelaquino,

Just to keep you updated: we are planning a release that includes this fix soon.

@bhautikpip
Contributor

I am closing this issue since we have pushed out a fix in release v1.0.0-rc.15 (https://github.com/aws/aws-xray-sdk-go/releases/tag/v1.0.0-rc.15). Feel free to open another issue if you think it's not addressed.

cosnicolaou added a commit to vanadium/core that referenced this issue Sep 25, 2021
…ithSampling (#219)

aws/aws-xray-sdk-go#51 describes the goroutine leak in xray.BeginSegmentWithSampling and an appropriate fix (Cyberax/aws-xray-sdk-go@033bad5). However, a different, incomplete fix was adopted: https://github.com/aws/aws-xray-sdk-go/pull/156/files.

This PR works around this bug by artificially creating a context and then canceling that new context when the xray segment is closed.
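
For reference, a sketch of that workaround against the public xray API (the wrapper and its closeSeg helper are ours, not part of either SDK):

```go
package xraywrap

import (
	"context"

	"github.com/aws/aws-xray-sdk-go/xray"
)

// beginSegment derives a cancellable context before handing it to the SDK,
// so the goroutine BeginSegment starts always has a Done signal to exit on,
// even when the caller's context is never cancelled.
func beginSegment(ctx context.Context, name string) (context.Context, func(error)) {
	ctx, cancel := context.WithCancel(ctx)
	ctx, seg := xray.BeginSegment(ctx, name)
	closeSeg := func(err error) {
		seg.Close(err)
		cancel() // unblocks the goroutine waiting on <-ctx.Done()
	}
	return ctx, closeSeg
}
```

Callers then use the returned closeSeg(nil) in place of seg.Close(nil), so the watcher goroutine started by the SDK is always released.
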
@byrnedo

byrnedo commented May 30, 2022

I just saw this on v1.7.0, using BeginSegment. Is this definitely fixed?
Using a cancellable context solved it for me.

@zapling

zapling commented Aug 29, 2022

This issue is not fixed in v1.7.0; we ran into it in our services when we upgraded xray. Should we open a new ticket for this? @bhautikpip

For anyone else looking: #364
