# runtime: expand capacity of map by more than 1 at a time? #56182
cc @golang/runtime
Where does the allocation of …

I'm not really sure why/how you're seeing extra work here. Maps are grown by factors of 2, so asymptotically there isn't any extra work. If you would need to grow the map twice (to 4x the size), the 1x->2x grow is kinda pointless, but it is at most half the work of the 2x->4x grow, so it's not that much extra work.
That's the thing I don't have the information to decide: `cap(dst)` doesn't exist, so the destination might have room for 5 elements or for 100. I guess I can use the current `len()` as a guide. So I don't know whether I'm better off running `Clone()` to guarantee the capacity (and re-copying everything) or just inserting into the map and trusting the capacity to be there.
I wish I knew as well, but that's what the profiling data says -- that we're spending a HUGE amount of time expanding maps rather than doing useful work :(
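For illustration, here is a minimal sketch of the two strategies being weighed above, assuming string-keyed attribute maps and the `maps` helpers (`golang.org/x/exp/maps` at the time of this thread, `maps` in the standard library since Go 1.21); the function names are made up:

```go
package sketch

import "maps" // golang.org/x/exp/maps at the time of this thread

// mergeRebuild rebuilds into a fresh map presized for the worst case and
// re-copies everything: capacity is guaranteed, but every entry of dst is
// copied again.
func mergeRebuild(dst, src map[string]string) map[string]string {
	merged := make(map[string]string, len(dst)+len(src))
	maps.Copy(merged, dst)
	maps.Copy(merged, src)
	return merged
}

// mergeInPlace inserts into dst directly and trusts whatever spare capacity
// it happens to have; since cap() is not defined for maps, there is no way
// to check beforehand whether this will trigger incremental growth.
func mergeInPlace(dst, src map[string]string) map[string]string {
	maps.Copy(dst, src)
	return dst
}
```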
If there's lots of growing during that final …

I'm not sure how having the map capacity would help here. It lets you know whether the map needs to be grown, but you still have to do the grow in any case (by yourself by allocating a new map, maybe, but that's going to cost just as much). It sounds like the finisher attributes are large compared to the regular trace attributes, so maybe just doing … at the end should work?
Correct. While we don't know for sure what the "record immediately prior to send" attributes will be or what their values are (since they're evaluated just before sending the span), the list of keys usually doesn't change dynamically, so we could pre-allocate a capacity of …
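Concretely, since the finisher key set is usually stable, the pre-allocation being described could look roughly like the sketch below; `buildOutgoingAttrs` and `expectedFinisherKeys` are made up for illustration, not code from the beeline:

```go
package sketch

// buildOutgoingAttrs assembles the outgoing attribute map once, presized for
// the span's accumulated attributes plus the usually-stable number of
// "record immediately prior to send" keys, so the later inserts do not
// trigger incremental map growth.
func buildOutgoingAttrs(spanAttrs map[string]any, expectedFinisherKeys int) map[string]any {
	out := make(map[string]any, len(spanAttrs)+expectedFinisherKeys)
	for k, v := range spanAttrs {
		out[k] = v
	}
	return out
}
```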
## Which problem is this PR solving?

- Similar to honeycombio/libhoney-go#197, profiling shows we spend a lot of time in `github.com/honeycombio/beeline-go/trace.(*Span).send` generating the map.

## Short description of the changes

- Define the map capacity in advance before returning it.

A subsequent follow-up might be allowing a map to be passed in its entirety to `github.com/honeycombio/libhoney-go.(*Event).AddField` so that the map entries do not need to be added one at a time when merging the maps together. Or, better yet, golang/go#56182 could prevent this problem by giving us a more sensible API for mass additions to a map.
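As a rough illustration of that follow-up idea, a bulk-add entry point is what would let a reserve/grow step happen once per merge instead of once per key. The `event` type and `AddFields` method below are hypothetical stand-ins, not the actual libhoney-go API:

```go
package sketch

// event is a hypothetical stand-in for an event type that accumulates
// fields in a map; it is not libhoney-go's Event.
type event struct {
	data map[string]interface{}
}

// AddField mirrors the one-key-at-a-time style: each call can independently
// push the map past its load factor and trigger an incremental grow.
func (e *event) AddField(key string, val interface{}) {
	e.data[key] = val
}

// AddFields (hypothetical) receives the whole map up front. Today it can
// only loop, but knowing len(fields) here is exactly what a maps.Grow-style
// API (golang/go#56182) could use to do a single up-front grow.
func (e *event) AddFields(fields map[string]interface{}) {
	for k, v := range fields {
		e.data[k] = v
	}
}
```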
Hi @lizthegrey, I've been looking at map performance from a few different angles, and thought I'd share a quick benchmark that might be in the ballpark of what you initially reported. In this benchmark, keys are 20-byte strings, dst starts at a size of 2, src has between 92 and 108 elements, and src gets copied into dst. Those are not going to be your exact numbers, but hopefully they demonstrate a similar phenomenon. The size ranges were chosen to sweep across 105, which is a point at which a map grow happens.

BenchmarkMapCopy:

```go
package mapcopy_test

import (
	"fmt"
	"math/rand"
	"testing"
)

func BenchmarkMapCopy(b *testing.B) {
bms := []struct {
presize bool
dstStartSize int
}{
{presize: false, dstStartSize: 2}, // dst map starts with 1 bucket
{presize: true, dstStartSize: 2},
{presize: false, dstStartSize: 14}, // dst map starts with 4 buckets
{presize: true, dstStartSize: 14},
}
// a grow happens at a map size of 105, so here we sweep srcSize +/-20 around 105
for srcSize := 85; srcSize < 125; srcSize++ {
for _, bm := range bms {
b.Run(fmt.Sprintf("src_%d_dst_start_%d_dst_final_%d_presize_%v", srcSize, bm.dstStartSize, srcSize+bm.dstStartSize, bm.presize), func(b *testing.B) {
var src, dst map[string]string
src = make(map[string]string)
populate := func(m map[string]string, count int) {
for i := 0; i < count; i++ {
k := make([]byte, 20)
v := make([]byte, 20)
rand.Read(k)
rand.Read(v)
m[string(k)] = string(v)
}
}
populate(src, srcSize)
for i := 0; i < b.N; i++ {
b.StopTimer()
if bm.presize {
dst = make(map[string]string, srcSize+bm.dstStartSize)
} else {
dst = make(map[string]string)
}
populate(dst, bm.dstStartSize)
b.StartTimer()
// copy src into dst
for k, v := range src {
dst[k] = v
}
}
})
}
}
}
```

Here are some sample results, where 'old' is without presizing dst and 'new' is presizing dst. The names …

Two notable jumps in performance: …

I think those results generally seem explainable by the current runtime map implementation...
I know we don't like adding APIs for this kind of thing, but ISTM that a

```go
maps.Grow(dst, len(src))
maps.Copy(dst, src)
```

I don't think …
What about situations where you cannot overwrite the destination map variable? For example, I often want to populate the headers of an HTTP response on the basis of an existing … I wish there were something like @CAFxX's

```go
maps.Grow(w.Header(), len(moreHeaders)) // w is a http.ResponseWriter
maps.Copy(w.Header(), moreHeaders)
```

and thereby save on allocations incurred during repeated map resizing.
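For contrast, a sketch of what such code has to do today when the destination map (here the one returned by `w.Header()`) cannot be swapped out for a presized replacement; `moreHeaders` is the variable from the comment above:

```go
package sketch

import "net/http"

// addHeaders copies moreHeaders into the live response-header map. Because
// w.Header() hands back the map the server owns, it cannot be replaced with
// a presized copy, so inserts beyond its current capacity pay for incremental
// growth; an in-place maps.Grow could reserve the space up front instead.
func addHeaders(w http.ResponseWriter, moreHeaders http.Header) {
	h := w.Header()
	for k, vs := range moreHeaders {
		h[k] = vs
	}
}
```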
Follow-on to #52157 and #54454
Right now the reference code in `/x/exp/maps.Copy` will potentially call `runtime.growWork_faststr` and `runtime.hashGrow` many times if `src` and `dst` are mostly disjoint and `len(src)` is large.

There should be some smarter logic to be able to not just pre-allocate a map with a certain capacity at creation time, but to expand an existing map by a certain amount (e.g. by `len(src)`) -- there already exists this idea with variadic `append()` for concatenating slices.

Short of that, maybe `Copy()` should make a decision based on the sizes of `dst` and `src` & capacities available in `dst` on whether it's more efficient to create a new map with capacity = `len(src)+len(dst)` and then copy dst and src both into the new map. (You could then `Shrink()` the map if you were confident it was going to be read-only after that, to avoid overshooting usage by too much.)
The real-world pain here is that telemetry libraries like OpenTelemetry and Honeycomb Beelines like to keep maps of attributes associated with a given trace span and merge them together into a final map before sending; a lot of work is done in `growWork_faststr` and `hashGrow` when many attributes are added one at a time while merging/copying maps via iteration.
Parallels in other languages: Rust