Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: elide call to runtime.growslice with provable capacity availability #30509

Open
dsnet opened this issue Mar 1, 2019 · 4 comments
Labels
binary-size compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Mar 1, 2019

Using go1.12

Consider the following snippet:

/*12*/	func Append(b []byte) []byte {  
/*13*/		if cap(b)-len(b) > 3 {      
/*14*/			b = append(b, 0, 1, 2)  
/*15*/		}                           
/*16*/		return b                    
/*17*/	}

This compiles to the following snippet:

 "".Append STEXT size=192 args=0x30 locals=0x48
 	0x0000 00000 (/tmp/sandbox188310960/main.go:12)	TEXT	"".Append(SB), ABIInternal, $72-48
 	0x0000 00000 (/tmp/sandbox188310960/main.go:12)	MOVQ	(TLS), CX
 	0x0009 00009 (/tmp/sandbox188310960/main.go:12)	CMPQ	SP, 16(CX)
 	0x000d 00013 (/tmp/sandbox188310960/main.go:12)	JLS	182
 	0x0013 00019 (/tmp/sandbox188310960/main.go:12)	SUBQ	$72, SP
 	0x0017 00023 (/tmp/sandbox188310960/main.go:12)	MOVQ	BP, 64(SP)
 	0x001c 00028 (/tmp/sandbox188310960/main.go:12)	LEAQ	64(SP), BP
 	0x0021 00033 (/tmp/sandbox188310960/main.go:13)	MOVQ	"".b+96(SP), AX
 	0x0026 00038 (/tmp/sandbox188310960/main.go:13)	MOVQ	"".b+88(SP), CX
 	0x002b 00043 (/tmp/sandbox188310960/main.go:13)	MOVQ	AX, DX
 	0x002e 00046 (/tmp/sandbox188310960/main.go:13)	SUBQ	CX, AX
 	0x0031 00049 (/tmp/sandbox188310960/main.go:13)	CMPQ	AX, $3
 	0x0035 00053 (/tmp/sandbox188310960/main.go:13)	JLE	172
- 	0x0037 00055 (/tmp/sandbox188310960/main.go:14)	LEAQ	3(CX), AX
- 	0x003b 00059 (/tmp/sandbox188310960/main.go:14)	CMPQ	AX, DX
- 	0x003e 00062 (/tmp/sandbox188310960/main.go:14)	JHI	105
 	0x0040 00064 (/tmp/sandbox188310960/main.go:14)	MOVQ	"".b+80(SP), BX
 	0x0045 00069 (/tmp/sandbox188310960/main.go:14)	MOVW	$256, (BX)(CX*1)
 	0x004b 00075 (/tmp/sandbox188310960/main.go:14)	MOVB	$2, 2(BX)(CX*1)
 	0x0050 00080 (/tmp/sandbox188310960/main.go:16)	MOVQ	BX, "".~r1+104(SP)
 	0x0055 00085 (/tmp/sandbox188310960/main.go:16)	MOVQ	AX, "".~r1+112(SP)
 	0x005a 00090 (/tmp/sandbox188310960/main.go:16)	MOVQ	DX, "".~r1+120(SP)
 	0x005f 00095 (/tmp/sandbox188310960/main.go:16)	MOVQ	64(SP), BP
 	0x0064 00100 (/tmp/sandbox188310960/main.go:16)	ADDQ	$72, SP
 	0x0068 00104 (/tmp/sandbox188310960/main.go:16)	RET
- 	0x0069 00105 (/tmp/sandbox188310960/main.go:14)	LEAQ	type.uint8(SB), BX
- 	0x0070 00112 (/tmp/sandbox188310960/main.go:14)	MOVQ	BX, (SP)
- 	0x0074 00116 (/tmp/sandbox188310960/main.go:14)	MOVQ	"".b+80(SP), BX
- 	0x0079 00121 (/tmp/sandbox188310960/main.go:14)	MOVQ	BX, 8(SP)
- 	0x007e 00126 (/tmp/sandbox188310960/main.go:14)	MOVQ	CX, 16(SP)
- 	0x0083 00131 (/tmp/sandbox188310960/main.go:14)	MOVQ	DX, 24(SP)
- 	0x0088 00136 (/tmp/sandbox188310960/main.go:14)	MOVQ	AX, 32(SP)
- 	0x008d 00141 (/tmp/sandbox188310960/main.go:14)	CALL	runtime.growslice(SB)
- 	0x0092 00146 (/tmp/sandbox188310960/main.go:14)	MOVQ	40(SP), BX
- 	0x0097 00151 (/tmp/sandbox188310960/main.go:14)	MOVQ	48(SP), AX
- 	0x009c 00156 (/tmp/sandbox188310960/main.go:14)	MOVQ	56(SP), DX
- 	0x00a1 00161 (/tmp/sandbox188310960/main.go:14)	ADDQ	$3, AX
- 	0x00a5 00165 (/tmp/sandbox188310960/main.go:14)	MOVQ	"".b+88(SP), CX
- 	0x00aa 00170 (/tmp/sandbox188310960/main.go:14)	JMP	69
 	0x00ac 00172 (/tmp/sandbox188310960/main.go:16)	MOVQ	CX, AX
 	0x00af 00175 (/tmp/sandbox188310960/main.go:16)	MOVQ	"".b+80(SP), BX
 	0x00b4 00180 (/tmp/sandbox188310960/main.go:16)	JMP	80
 	0x00b6 00182 (/tmp/sandbox188310960/main.go:16)	NOP
 	0x00b6 00182 (/tmp/sandbox188310960/main.go:12)	CALL	runtime.morestack_noctxt(SB)
 	0x00bb 00187 (/tmp/sandbox188310960/main.go:12)	JMP	0
 	0x0000 64 48 8b 0c 25 00 00 00 00 48 3b 61 10 0f 86 a3  dH..%....H;a....
 	0x0010 00 00 00 48 83 ec 48 48 89 6c 24 40 48 8d 6c 24  [email protected]$
 	0x0020 40 48 8b 44 24 60 48 8b 4c 24 58 48 89 c2 48 29  @H.D$`H.L$XH..H)
 	0x0030 c8 48 83 f8 03 7e 75 48 8d 41 03 48 39 d0 77 29  .H...~uH.A.H9.w)
 	0x0040 48 8b 5c 24 50 66 c7 04 0b 00 01 c6 44 0b 02 02  H.\$Pf......D...
 	0x0050 48 89 5c 24 68 48 89 44 24 70 48 89 54 24 78 48  H.\$hH.D$pH.T$xH
 	0x0060 8b 6c 24 40 48 83 c4 48 c3 48 8d 1d 00 00 00 00  [email protected]......
 	0x0070 48 89 1c 24 48 8b 5c 24 50 48 89 5c 24 08 48 89  H..$H.\$PH.\$.H.
 	0x0080 4c 24 10 48 89 54 24 18 48 89 44 24 20 e8 00 00  L$.H.T$.H.D$ ...
 	0x0090 00 00 48 8b 5c 24 28 48 8b 44 24 30 48 8b 54 24  ..H.\$(H.D$0H.T$
 	0x00a0 38 48 83 c0 03 48 8b 4c 24 58 eb 99 48 89 c8 48  8H...H.L$X..H..H
 	0x00b0 8b 5c 24 50 eb 9a e8 00 00 00 00 e9 40 ff ff ff  .\$P........@...
 	rel 5+4 t=16 TLS+0
 	rel 108+4 t=15 type.uint8+0
 	rel 142+4 t=8 runtime.growslice+0
 	rel 183+4 t=8 runtime.morestack_noctxt+0

The regions marked in red seem superfluous. At a high-level, we already know that cap(b)-len(b) > 3 so the check at 0x0037 is entire redundant. Also since there is guaranteed capacity, the code at 0x0069 and on is entirely unnecessary.

@bcmills
Copy link
Contributor

bcmills commented Mar 1, 2019

CC @randall77 @josharian @martisch for proving things.
(Please let me know if I've got that list wrong — the prove pass doesn't quite correspond to a source path in https://dev.golang.org/owners.)

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 1, 2019
@bcmills bcmills added this to the Unplanned milestone Mar 1, 2019
@martisch
Copy link
Contributor

martisch commented Mar 1, 2019

Note sure how common the check (cap(b)-len(b) > 3) with subtraction is.
Just for hand optimisation: When using an addition (cap(b) > len(b)+3), which is the way append does the capacity check, the runtime.growslice call is removed in go1.11. https://godbolt.org/z/-2qtCX
(godbolt shows a call to growslice for go tip, I did not check go1.12 so there might be a regression).

For prove and any elimination of dead code the more general optimisation seems to be to infer from x-y > z that x > z+y where x & y & z >= 0 and eliminate branches accordingly. I did not check if the >= 0 condition is needed and this also always works with negative numbers and overflow.

@josharian
Copy link
Contributor

Yeah, the owners list for the compiler is unfortunate, in that there isn’t always a nice source path correspondence. For prove, I also think of @aclements, @rasky, and @zdjones.

@mariecurried
Copy link

Note sure how common the check (cap(b)-len(b) > 3) with subtraction is.
Just for hand optimisation: When using an addition (cap(b) > len(b)+3), which is the way append does the capacity check, the runtime.growslice call is removed in go1.11. https://godbolt.org/z/-2qtCX
(godbolt shows a call to growslice for go tip, I did not check go1.12 so there might be a regression).

For prove and any elimination of dead code the more general optimisation seems to be to infer from x-y > z that x > z+y where x & y & z >= 0 and eliminate branches accordingly. I did not check if the >= 0 condition is needed and this also always works with negative numbers and overflow.

I ran a bisection and found that there is in fact a regression there caused by 38e7177.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binary-size compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

7 participants