unsafe: add StringData, String, SliceData #53003

twmb · 2022-05-19T19:27:49Z

Now that reflect.StringHeader and reflect.SliceHeader are officially deprecated, I think it's time to revisit adding function that satisfy the reason people have used these types. AFAICT, the reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused. As well, the types have always been documented as unstable and not to be relied upon.

We can see in Github code search that usage of these types is ubiquitous. The most common use cases I've seen are:

converting []byte to string
converting string to []byte
grabbing the Data pointer field for ffi or some other niche use
converting a slice of one type to a slice of another type

The first use case can also commonly be seen as *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon. Under the hood, the shape of a string is less than a slice, so this seems valid per unsafe rule (1), but this is all relying on undocumented behavior. The second use case is commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule (1).

Regardless of the thought that people should never rely upon these types, people do, and they do so all over. People also rely on invalid conversions because Go has never made this easy. Part of the discussion on #19367 was about all the ways that people misuse these types today. These conversions are small tricks that can alleviate memory pressure and improve latencies and CPU usage in real programs. The use cases are real, and Go provides just enough unsafe and buggy ways of working around these problems such that now there is a large ecosystem of technically invalid code that just so happens to work.

Rather than consistently saying "don't use this", go veting somewhat, and then ducking all responsibility for buggy programs, Go should provide actual safe(ish) APIs that people can rely on in perpetuity. New functions can live in unsafe and have well documented rules around their use cases, and then Go can finally document what to do when people want this common escape hatch.

Concrete proposal

The following APIs in the unsafe package:

// StringToBytes returns s as a byte slice by performing a non-copying type conversion.
// Slices returned from this function cannot be modified.
func StringToBytes(s string) []byte

// BytesToString returns b as a string by performing a non-copying type conversion.
// The input bytes to this function cannot be modified while any string returned from
// this function is alive.
func BytesToString(b []byte) string

~~func DataPointer[T ~string|~[]E, E any](t T) unsafe.Pointer~~ eliminating, because realistically a person can just do &slice[0] (although a corresponding analogue does not exist for strings)

I think unsafe.Slice covers the use case for converting between slices of different types, although I'm not 100% sure what the use case of unsafe.Slice is.

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2022-05-19T22:47:24Z

Thanks, but I don't see an actual proposal here. If you want to discuss ideas, please use golang-nuts. The proposal process should be for a proposal. That is, suggest some specific functions that we should introduce. Thanks.

twmb · 2022-05-19T23:29:56Z

Sure, good point. I've edited my comment with an actual proposal of ~~three~~two new functions. I think unsafe.Slice covers my the fourth common use case I mentioned above, but I'm not 100% sure.

ianlancetaylor · 2022-05-20T02:29:49Z

One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap. I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.

ianlancetaylor · 2022-05-20T02:30:14Z

CC @mdempsky

rsc · 2022-05-25T17:57:30Z

Filed #53079. Perhaps we should undeprecate them for Go 1.19, since we don't have a replacement for all valid use cases yet. Thanks.

rsc · 2022-05-25T18:05:01Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

bcmills · 2022-05-25T19:08:11Z

FWIW, the proposed functions closely parallel the OfString and AsString functions in my unsafeslice package.

That package also provides best-effort mutation detection, since Go string variables are supposed to be immutable and Go programs may generally assume that variables passed as type string are never mutated until they are garbage-collected.

twmb · 2022-05-25T19:14:41Z

I've seen that package, and the one thing I'd not do is the mutation detection (which you gate behind the unsafe build tag). I'd expect mutation detection to be automatically done when testing / when built with -race, but not to automatically be applied in releases. I'm using the unsafe package, after all :).

I was thinking to link the package in my original writeup, since it is further evidence of the use case.

bcmills · 2022-05-25T19:22:40Z

At the very least, I think it's important for the documentation for the proposed functions to explicitly call out the lifetime issues, especially for BytesToString — a string used as a map key can cause arbitrary memory corruption if it is mutated while the map is still live.

ianlancetaylor · 2022-05-28T14:36:40Z

To fully replace reflect.StringHeader and reflect.SliceHeader, let's consider what they permit us to do. They can be used to read and/or to modify the contents of a string or a slice.

For reading, we can already extract all the elements of a slice, via &s[0], len(s), and cap(s). We can get the length of a string via len(s), but we can't get the pointer to the data.

For writing, we can create a new slice setting all the elements, via unsafe.Slice followed by a slice expression. But we can't create a new string.

So to me that suggests, in package unsafe,

// StringData returns a pointer to the bytes of a string.
// The bytes must not be modified; doing so can cause
// the program to crash or behave unpredictably.
func StringData(string) *byte

// String constructs a string value from a pointer and a length.
// The bytes passed to String must not be modified;
// doing so can cause the program to crash or behave unpredictably.
func String(*byte, int) string

These functions should remain meaningful even if we somehow change the representation of slices or strings in the future.

The restrictions on changing the bytes are unfortunate, but the fact is that people are doing these kinds of transformations today. Omitting these functions from the unsafe package doesn't mean that Go programs won't do them, it just means that they will do in ways that are sometimes even less safe.

It should be feasible to add a dynamic detector for any modifications to these bytes. This could perhaps be enabled when using the race detector. This would not be perfect but would detect egregious misuses.

The functions suggested above would then be written as

func StringToBytes(s string) []byte {
    return unsafe.Slice(unsafe.StringData(s), len(s))
}

func BytesToString(b []byte) string {
    return unsafe.String(&b[0], len(b))
}

(To be clear, the reverse is also possible: we can write String and StringData in terms of StringToBytes and BytesToString.)

rsc · 2022-06-01T17:17:06Z

It does seem like unsafe.String and unsafe.StringData match unsafe.Slice a bit better and are more fundamental operations
than providing StringToBytes and BytesToString as the primitives.
I wonder if we should add unsafe.SliceData as well (code often has to work around len 0 using &s[0]).

mdempsky · 2022-06-02T19:26:04Z

I'd like to suggest we reconsider the original proposal from #19367: to add new unsafe.StringHeader and unsafe.SliceHeader types, to handle the remaining advanced use cases that are supported by reflect.{String,Slice}Header but aren't covered by unsafe.Slice.

Concretely, I propose adding:

package unsafe

type StringHeader struct {
    Data *byte
    Len int
}

type SliceHeader[Elem any] struct {
    Data *Elem
    Len, Cap int
}

and allowing conversions between string and unsafe.StringHeader, and also between []Elem and unsafe.SliceHeader[Elem].

Converting an invalid unsafe.{String,Slice}Header (e.g., Len > Cap, or Data==nil and Len>0) into a normal string or slice type should fail, at least in -d=checkptr mode. I'm leaning towards making it a run-time panic (like unsafe.Slice) because the failure conditions are easy to specify/detect, but simply leaving it undefined (like unsafe.Add) seems not unreasonable too.

N.B., my original #19367 proposal also allowed conversions between *string and *unsafe.StringHeader, etc. I think we could still allow that (e.g., particularly to help users transition away from reflect.{String,Slice}Header), but I think it's less error-prone (and marginally better for escape analysis) if we encourage users to construct an unsafe.StringHeader, value convert it to string, and then store that into memory; rather than creating a *unsafe.StringHeader and individually assigning fields in memory.

ianlancetaylor · 2022-06-03T00:34:37Z

My concern about StringHeader and SliceHeader is that it locks all possible implementations into either using those exact headers or doing strange contortions in the compiler.

What can we do with StringHeader and SliceHeader that we can't do with Slice, SliceData, String, and StringData?

mdempsky · 2022-06-03T00:46:28Z

After thinking about it more, I'm inclined to agree that {Slice,String}{,Data} builtin functions are the way to go. As you say, they support the same functionality. I think that will be easier on tools authors than extending conversion semantics too.

rsc · 2022-06-08T17:39:21Z

It sounds like we've converged on considering unsafe.StringData, unsafe.String, and unsafe.SliceData.
Does anyone object to those?

rsc · 2022-06-15T18:35:42Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

rsc · 2022-06-22T18:08:16Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

gopherbot · 2022-11-10T23:30:20Z

Change https://go.dev/cl/449537 mentions this issue: spec: document the new unsafe functions SliceData, String, and StringData

…Data For #53003. Change-Id: If5d76c7b8dfcbcab919cad9c333c0225fc155859 Reviewed-on: https://go-review.googlesource.com/c/go/+/449537 Reviewed-by: Matthew Dempsky <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Run-TryBot: Robert Griesemer <[email protected]> Auto-Submit: Robert Griesemer <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Robert Griesemer <[email protected]>

griesemer · 2022-11-15T23:58:28Z

Both the spec and the unsafe package documentation are now done. Closing.

Ref: golang/go#53003 (comment)

simon28082 · 2023-08-14T05:05:17Z

In practice I found out about bytes to string using the following method will result in converted values that are not as expected, to over debug my application I got this

unsafe.String(&bytes[0], byteLen)

mitar · 2023-10-19T09:14:00Z

I still miss that StringToBytes and BytesToString are not directly provided. As seen here, currently implementation of those functions requires:

import "unsafe"

func String2ByteSlice(str string) []byte {
	if str == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(str), len(str))
}

func ByteSlice2String(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}

Those edge cases with empty strings/slices make current API very verbose as you have to check for those edge cases. Could those two functions be added to standard library?

On a related note, should examples like TextMarshalJSON one in json package use such conversion between strings and bytes? Or is that already automatically detected by the compiler and optimized?

thepudds · 2023-10-19T14:33:24Z

Hi @mitar, FWIW, I don't know that the standard library would want to make that more convenient.

Note that Go 1.22 will have some nice enhancements in this area, including recognizing in many cases when a []byte is not modified after a string->[]byte conversion to enable a safe zero-copy conversion (issue #2205).

Here is an example (with comparison of 1.21 vs. tip via godbolt):

func f() {
	s := "hello"
	x := []byte(s) // zero-copy string->[]byte conversion
	_ = x
}

database64128 · 2023-10-20T05:34:43Z

Those edge cases with empty strings/slices make current API very verbose as you have to check for those edge cases.

@mitar I don't think these checks are needed. unsafe.StringData's documentation says:

For an empty string the return value is unspecified, and may be nil.

So it's perfectly safe to feed the returned pointer directly to unsafe.Slice. Same for unsafe.String(unsafe.SliceData(b), len(b)).

zigo101 · 2023-10-20T07:03:54Z

@database64128
With the simple one-line way, converting different zero strings might result different slice values:

package main

import "unsafe"

func main() {
	{
		var s = ""
		println(s == "") // true
		var x = unsafe.Slice(unsafe.StringData(s), len(s))
		println(x == nil) // true
	}
	{
		var s = "go"[:0]
		println(s == "") // true
		var x = unsafe.Slice(unsafe.StringData(s), len(s))
		println(x == nil) // false
	}
}

It might satisfy the needs of some cases, but not all.

The situations of converting from zero-length slices to strings are comparatively simpler.
But the verbose way makes sure that the result string will never keep the pointer of the underlying array of blank slices
(it is hard to say this is a benefit, just a personal preference),

package main

import "unsafe"

func ByteSlice2String(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}

func main() {
	var x = make([]byte, 0, 1024)
	var s = unsafe.String(unsafe.SliceData(x), len(x))
	println(s == "") // true
	println(*(*uintptr)(unsafe.Pointer(&s))) // <a non-zero addr>
	
	var s2 = ByteSlice2String(x)
	println(s2 == "") // true
	println(*(*uintptr)(unsafe.Pointer(&s2))) // 0
}

hrissan · 2024-02-26T11:40:09Z

Why originally proposed functions StringToBytes and BytesToString are not added to unsafe package, so people stop writing various implementations with questionable correctness again and again is beyond my comprehension.

erikdubbelboer · 2024-02-26T17:57:56Z

Because StringToBytes is now just unsafe.Slice(unsafe.StringData(s), len(s)) and BytesToString is just unsafe.String(unsafe.SliceData(b), len(b)) which are both very easy and there shouldn't be any implementations with questionable correctness.

If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment)

If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment) Signed-off-by: Dave Dykstra <[email protected]>

) Closes #1147. The Godoc of `reflect.StringHeader` says [1]: "the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data." And the Godoc of `unsafe.Pointer` says [2]: "A uintptr is an integer, not a reference. ... Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed." The replacement `unsafe.StringData` is a more correct way to get the pointer to the string data. The original proposal can be seen in golang/go#53003 (comment). [1]: https://pkg.go.dev/reflect#StringHeader [2]: https://pkg.go.dev/unsafe#Pointer Signed-off-by: Eng Zer Jun <[email protected]>

twmb added the Proposal label May 19, 2022

gopherbot added this to the Proposal milestone May 19, 2022

This comment was marked as off-topic.

Sign in to view

rsc mentioned this issue May 25, 2022

reflect: undeprecate StringHeader, SliceHeader #53079

Closed

rsc changed the title ~~proposal: unsafe: add functions to replace reflect.StringHeader and reflect.SliceHeader~~ proposal: unsafe: add StringData, String, SliceData Jun 8, 2022

rsc added the Proposal-FinalCommentPeriod label Jun 15, 2022

rsc changed the title ~~proposal: unsafe: add StringData, String, SliceData~~ unsafe: add StringData, String, SliceData Jun 22, 2022

rsc added the Proposal-Accepted label Jun 22, 2022

griesemer closed this as completed Nov 15, 2022

QuLogic mentioned this issue Jan 23, 2023

go1.20: add StringData, String, SliceData tinygo-org/tinygo#3407

Closed

monkey92t mentioned this issue Mar 3, 2023

Util: add StringData, String, SliceData(go 1.20) redis/go-redis#2472

Open

eugenepaniot mentioned this issue Mar 21, 2023

Ripley performance improvements loveholidays/ripley#8

Closed

This was referenced Mar 24, 2023

feat: support peek multi value for query args cloudwego/hertz#684

Merged

optimize(bytesconv): add support for go1.20 cloudwego/hertz#686

Open

KasonBraley added a commit to KasonBraley/kit that referenced this issue Jun 9, 2023

Add bytesconv pkg for converting strings and slices efficiently

7b4ad88

Ref: golang/go#53003 (comment)

rsc removed this from Proposals Nov 16, 2023

ymz-ncnk mentioned this issue Jan 14, 2024

BENC's unsafe code alecthomas/go_serialization_benchmarks#130

Closed

dtrudg mentioned this issue Mar 21, 2024

Bump go to minimum of 1.21 sylabs/singularity#2767

Merged

dims mentioned this issue Jun 20, 2024

Replaced unsafe packages with []byte(s) as of go 1.22 - (#124656) kubernetes/kubernetes#124708

Closed

rleungx mentioned this issue Aug 6, 2024

*: reduce unsafe usage tikv/pd#8482

Merged

Juneezee mentioned this issue Oct 2, 2024

refactor: replace reflect.StringHeader with unsafe.StringData corazawaf/coraza#1162

Merged

4 tasks

karelbilek mentioned this issue Oct 10, 2024

use unsafe.String tinylib/msgp#371

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unsafe: add StringData, String, SliceData #53003

unsafe: add StringData, String, SliceData #53003

twmb commented May 19, 2022 •

edited

Loading

ianlancetaylor commented May 19, 2022

twmb commented May 19, 2022 •

edited

Loading

ianlancetaylor commented May 20, 2022

ianlancetaylor commented May 20, 2022

This comment was marked as off-topic.

This comment was marked as off-topic.

rsc commented May 25, 2022

rsc commented May 25, 2022

bcmills commented May 25, 2022

twmb commented May 25, 2022

bcmills commented May 25, 2022

ianlancetaylor commented May 28, 2022

rsc commented Jun 1, 2022

mdempsky commented Jun 2, 2022

ianlancetaylor commented Jun 3, 2022

mdempsky commented Jun 3, 2022

rsc commented Jun 8, 2022

rsc commented Jun 15, 2022

rsc commented Jun 22, 2022

gopherbot commented Nov 10, 2022

griesemer commented Nov 15, 2022

simon28082 commented Aug 14, 2023

mitar commented Oct 19, 2023

thepudds commented Oct 19, 2023

database64128 commented Oct 20, 2023

zigo101 commented Oct 20, 2023 •

edited

Loading

hrissan commented Feb 26, 2024

erikdubbelboer commented Feb 26, 2024

unsafe: add StringData, String, SliceData #53003

unsafe: add StringData, String, SliceData #53003

Comments

twmb commented May 19, 2022 • edited Loading

Concrete proposal

ianlancetaylor commented May 19, 2022

twmb commented May 19, 2022 • edited Loading

ianlancetaylor commented May 20, 2022

ianlancetaylor commented May 20, 2022

This comment was marked as off-topic.

This comment was marked as off-topic.

rsc commented May 25, 2022

rsc commented May 25, 2022

bcmills commented May 25, 2022

twmb commented May 25, 2022

bcmills commented May 25, 2022

ianlancetaylor commented May 28, 2022

rsc commented Jun 1, 2022

mdempsky commented Jun 2, 2022

ianlancetaylor commented Jun 3, 2022

mdempsky commented Jun 3, 2022

rsc commented Jun 8, 2022

rsc commented Jun 15, 2022

rsc commented Jun 22, 2022

gopherbot commented Nov 10, 2022

griesemer commented Nov 15, 2022

simon28082 commented Aug 14, 2023

mitar commented Oct 19, 2023

thepudds commented Oct 19, 2023

database64128 commented Oct 20, 2023

zigo101 commented Oct 20, 2023 • edited Loading

hrissan commented Feb 26, 2024

erikdubbelboer commented Feb 26, 2024

twmb commented May 19, 2022 •

edited

Loading

twmb commented May 19, 2022 •

edited

Loading

zigo101 commented Oct 20, 2023 •

edited

Loading