Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 595 watchbackend #629

Merged
merged 11 commits into from
Apr 17, 2019
Merged

Conversation

murphymj25
Copy link
Contributor

No description provided.

@murphymj25
Copy link
Contributor Author

This pull request addresses #595.

@murphymj25
Copy link
Contributor Author

murphymj25 commented Mar 25, 2019

Here is some detail around memory performance comparing the pull request to the current 1.5.11 build. In the image below, the yellow line is the current 1.5.11 binary. The green line is a build with switching next in main.watchBackend to a bytes.Buffer. The fabio processes were both started at the same time and allow to run for approximately 10 hours. As you can see the 1.5.11 build continues to increase in memory, while the instance running the pull request build is relatively flat.

Screen Shot 2019-03-25 at 8 29 48 AM

If we look at the pprofs taken from the 1.5.11 build we see the following:

	(pprof) top
	Showing nodes accounting for 2401.57MB, 97.69% of 2458.23MB total
	Dropped 469 nodes (cum <= 12.29MB)
	Showing top 10 nodes out of 15
	      flat  flat%   sum%        cum   cum%
	 2316.95MB 94.25% 94.25%  2411.38MB 98.09%  main.watchBackend
	   78.66MB  3.20% 97.45%    78.66MB  3.20%  github.com/rcrowley/go-metrics.newExpDecaySampleHeap (inline)
	    2.18MB 0.089% 97.54%    83.69MB  3.40%  github.com/fabiolb/fabio/route.(*Route).addTarget
	    1.69MB 0.069% 97.61%    87.58MB  3.56%  github.com/fabiolb/fabio/route.Table.addRoute
	    1.44MB 0.058% 97.67%    21.90MB  0.89%  net/http.(*Transport).dialConn
	    0.44MB 0.018% 97.69%    79.09MB  3.22%  github.com/rcrowley/go-metrics.NewExpDecaySample
	    0.20MB 0.0083% 97.69%    80.77MB  3.29%  github.com/rcrowley/go-metrics.NewTimer
	         0     0% 97.69%    81.06MB  3.30%  github.com/fabiolb/fabio/metrics.(*gmRegistry).GetTimer
	         0     0% 97.69%    90.17MB  3.67%  github.com/fabiolb/fabio/route.NewTable
	         0     0% 97.69%    81.06MB  3.30%  github.com/rcrowley/go-metrics.(*StandardRegistry).GetOrRegister
	
	(pprof) list main.watchBackend
	Total: 2.40GB
	ROUTINE ======================== main.watchBackend in /Users/frank/src/github.com/fabiolb/fabio/main.go
	    2.26GB     2.35GB (flat, cum) 98.09% of Total
	         .          .    439: case mancfg = <-man:
	         .          .    440: }
	         .          .    441:
	         .          .    442: // manual config overrides service config
	         .          .    443: // order matters
	    2.26GB     2.26GB    444: next := svccfg + "\n" + mancfg
	         .          .    445: if next == last {
	         .          .    446: continue
	         .          .    447: }
	         .          .    448:
	         .   169.22kB    449: aliases, err := route.ParseAliases(next)
	         .          .    450: if err != nil {
	         .          .    451: log.Printf("[WARN]: %s", err)
	         .          .    452: }
	         .          .    453: registry.Default.Register(aliases)
	         .          .    454:
	         .    90.17MB    455: t, err := route.NewTable(next)
	         .          .    456: if err != nil {
	         .          .    457: log.Printf("[WARN] %s", err)
	         .          .    458: continue
	         .          .    459: }
	         .          .    460: route.SetTable(t)
	         .     4.09MB    461: logRoutes(t, last, next, cfg.Log.RoutesFormat)
	         .          .    462: last = next
	         .          .    463:
	         .          .    464: once.Do(func() { close(first) })
	         .          .    465: }
         .          .    466:}

In the 1.5.11 build, the "next := svccfg + "\n" + mancfg" is over 2 GB in size after only 10 hours of running. This matches our behavior where our production nodes with 64GB or memory are exhausted in a few days.

In the pull request build we see the following from the pprof:

	(pprof) top
	Showing nodes accounting for 189.15MB, 84.85% of 222.93MB total
	Dropped 355 nodes (cum <= 1.11MB)
	Showing top 10 nodes out of 76
	      flat  flat%   sum%        cum   cum%
	   78.79MB 35.34% 35.34%    78.79MB 35.34%  github.com/rcrowley/go-metrics.newExpDecaySampleHeap
	   39.90MB 17.90% 53.24%    40.09MB 17.98%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).DiffCleanupSemanticLossless
	   17.08MB  7.66% 60.90%    70.44MB 31.60%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).DiffMain
	   10.36MB  4.65% 65.55%    10.36MB  4.65%  bufio.NewReaderSize
	   10.32MB  4.63% 70.18%    10.32MB  4.63%  bufio.NewWriterSize
	    8.25MB  3.70% 73.88%     8.25MB  3.70%  strings.Join
	    7.80MB  3.50% 77.38%     7.80MB  3.50%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffLinesToRunes
	    6.43MB  2.89% 80.27%     6.43MB  2.89%  encoding/json.(*decodeState).literalStore
	    5.62MB  2.52% 82.79%     5.62MB  2.52%  reflect.New
	    4.59MB  2.06% 84.85%     4.59MB  2.06%  bytes.makeSlice
	
	(pprof) list main.watchBackend
	Total: 222.93MB
	ROUTINE ======================== main.watchBackend in /Users/z050785/go/src/github.com/fabiolb/fabio/main.go
	         0   171.73MB (flat, cum) 77.03% of Total
	         .          .    441: case mancfg = <-man:
	         .          .    442: }
	         .          .    443: // manual config overrides service config
	         .          .    444: // order matters
	         .          .    445: next.Reset()
	         .     4.52MB    446: next.WriteString(svccfg)
	         .          .    447: next.WriteString("\n")
	         .          .    448: next.WriteString(mancfg)
	         .          .    449: if next.String() == last {
	         .          .    450: continue
	         .          .    451: }
	         .   169.24kB    452: aliases, err := route.ParseAliases(next.String())
	         .          .    453: if err != nil {
	         .          .    454: log.Printf("[WARN]: %s", err)
	         .          .    455: }
	         .          .    456: registry.Default.Register(aliases)
	         .          .    457:
	         .    92.68MB    458: t, err := route.NewTable(next)
	         .          .    459: if err != nil {
	         .          .    460: log.Printf("[WARN] %s", err)
	         .          .    461: continue
	         .          .    462: }
	         .          .    463: route.SetTable(t)
	         .    74.37MB    464: logRoutes(t, last, next.String(), cfg.Log.RoutesFormat)
	         .          .    465: last = next.String()
	         .          .    466: once.Do(func() { close(first) })
	         .          .    467: }
	         .          .    468:}
         .          .    469:

The "next" buffer is only 4.5MB in size, which matches the current size of our routes from consul.

@aaronhurt
Copy link
Member

@murphymj25 LGTM! Thank you for all you effort debugging and providing a solution.

@aaronhurt
Copy link
Member

@murphymj25 would you mind re-basing this and resolving the conflicts created by merging #614?

@murphymj25
Copy link
Contributor Author

Yep, I can take care of that.

Thanks!

@aaronhurt
Copy link
Member

aaronhurt commented Apr 17, 2019

Not sure what exactly happened here. When I compare you fork against current master it looks pretty clean (only some minor differences) but this PR is a bit crazy.

master...murphymj25:issue-595-watchbackend

@murphymj25 murphymj25 force-pushed the issue-595-watchbackend branch from b0bbcd3 to c9705ca Compare April 17, 2019 15:26
@murphymj25
Copy link
Contributor Author

@leprechau Yeah, i don't think my rebase was correct at that time. Take a look now.

main.go Show resolved Hide resolved
@aaronhurt
Copy link
Member

@murphymj25 thank you! Looks better.

@aaronhurt aaronhurt merged commit 65b31ca into fabiolb:master Apr 17, 2019
@aaronhurt
Copy link
Member

@murphymj25 and thank you for that example link. That made me remember something I had forgotten. String concatenation with + in golang is quite inefficient and creates a new string object.

@murphymj25
Copy link
Contributor Author

@leprechau No Problem! Glad i could help get this merged in.

@aaronhurt
Copy link
Member

aaronhurt commented Apr 17, 2019

Could maybe also have used the new strings.Builder(). I may open another issue to go through and replace current + concatenations with strings.Builder() in other parts of the code ... especially in loops. Thanks again.

@murphymj25
Copy link
Contributor Author

I tested with the strings.builder, but didn't see a way to read directly from the buffer. When i had to convert it to a string in the route/parse_new.go to read it, I was seeing the memory leak at that point.

I think it would be a great idea to replace the remaining concatenations with strings.Builder(). That's something we were debating on tackling as well.

@murphymj25 murphymj25 deleted the issue-595-watchbackend branch April 17, 2019 17:17
aaronhurt added a commit to myENA/fabio that referenced this pull request Apr 19, 2019
@magiconair magiconair added this to the 1.5.12 milestone Oct 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants