Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unsafe: add StringData, String, SliceData #53003

Closed
twmb opened this issue May 19, 2022 · 47 comments
Closed

unsafe: add StringData, String, SliceData #53003

twmb opened this issue May 19, 2022 · 47 comments

Comments

@twmb
Copy link
Contributor

twmb commented May 19, 2022

Now that reflect.StringHeader and reflect.SliceHeader are officially deprecated, I think it's time to revisit adding function that satisfy the reason people have used these types. AFAICT, the reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused. As well, the types have always been documented as unstable and not to be relied upon.

We can see in Github code search that usage of these types is ubiquitous. The most common use cases I've seen are:

  • converting []byte to string
  • converting string to []byte
  • grabbing the Data pointer field for ffi or some other niche use
  • converting a slice of one type to a slice of another type

The first use case can also commonly be seen as *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon. Under the hood, the shape of a string is less than a slice, so this seems valid per unsafe rule (1), but this is all relying on undocumented behavior. The second use case is commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule (1).

Regardless of the thought that people should never rely upon these types, people do, and they do so all over. People also rely on invalid conversions because Go has never made this easy. Part of the discussion on #19367 was about all the ways that people misuse these types today. These conversions are small tricks that can alleviate memory pressure and improve latencies and CPU usage in real programs. The use cases are real, and Go provides just enough unsafe and buggy ways of working around these problems such that now there is a large ecosystem of technically invalid code that just so happens to work.

Rather than consistently saying "don't use this", go veting somewhat, and then ducking all responsibility for buggy programs, Go should provide actual safe(ish) APIs that people can rely on in perpetuity. New functions can live in unsafe and have well documented rules around their use cases, and then Go can finally document what to do when people want this common escape hatch.

Concrete proposal

The following APIs in the unsafe package:

// StringToBytes returns s as a byte slice by performing a non-copying type conversion.
// Slices returned from this function cannot be modified.
func StringToBytes(s string) []byte

// BytesToString returns b as a string by performing a non-copying type conversion.
// The input bytes to this function cannot be modified while any string returned from
// this function is alive.
func BytesToString(b []byte) string

func DataPointer[T ~string|~[]E, E any](t T) unsafe.Pointer eliminating, because realistically a person can just do &slice[0] (although a corresponding analogue does not exist for strings)

I think unsafe.Slice covers the use case for converting between slices of different types, although I'm not 100% sure what the use case of unsafe.Slice is.

@twmb twmb added the Proposal label May 19, 2022
@gopherbot gopherbot added this to the Proposal milestone May 19, 2022
@ianlancetaylor
Copy link
Member

Thanks, but I don't see an actual proposal here. If you want to discuss ideas, please use golang-nuts. The proposal process should be for a proposal. That is, suggest some specific functions that we should introduce. Thanks.

@twmb
Copy link
Contributor Author

twmb commented May 19, 2022

Sure, good point. I've edited my comment with an actual proposal of threetwo new functions. I think unsafe.Slice covers my the fourth common use case I mentioned above, but I'm not 100% sure.

@ianlancetaylor
Copy link
Member

One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap. I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.

@ianlancetaylor
Copy link
Member

CC @mdempsky

@paulstuart

This comment was marked as off-topic.

@ianlancetaylor

This comment was marked as off-topic.

@rsc
Copy link
Contributor

rsc commented May 25, 2022

Filed #53079. Perhaps we should undeprecate them for Go 1.19, since we don't have a replacement for all valid use cases yet. Thanks.

@rsc
Copy link
Contributor

rsc commented May 25, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@bcmills
Copy link
Contributor

bcmills commented May 25, 2022

FWIW, the proposed functions closely parallel the OfString and AsString functions in my unsafeslice package.

That package also provides best-effort mutation detection, since Go string variables are supposed to be immutable and Go programs may generally assume that variables passed as type string are never mutated until they are garbage-collected.

@twmb
Copy link
Contributor Author

twmb commented May 25, 2022

I've seen that package, and the one thing I'd not do is the mutation detection (which you gate behind the unsafe build tag). I'd expect mutation detection to be automatically done when testing / when built with -race, but not to automatically be applied in releases. I'm using the unsafe package, after all :).

I was thinking to link the package in my original writeup, since it is further evidence of the use case.

@bcmills
Copy link
Contributor

bcmills commented May 25, 2022

At the very least, I think it's important for the documentation for the proposed functions to explicitly call out the lifetime issues, especially for BytesToString — a string used as a map key can cause arbitrary memory corruption if it is mutated while the map is still live.

@ianlancetaylor
Copy link
Member

To fully replace reflect.StringHeader and reflect.SliceHeader, let's consider what they permit us to do. They can be used to read and/or to modify the contents of a string or a slice.

For reading, we can already extract all the elements of a slice, via &s[0], len(s), and cap(s). We can get the length of a string via len(s), but we can't get the pointer to the data.

For writing, we can create a new slice setting all the elements, via unsafe.Slice followed by a slice expression. But we can't create a new string.

So to me that suggests, in package unsafe,

// StringData returns a pointer to the bytes of a string.
// The bytes must not be modified; doing so can cause
// the program to crash or behave unpredictably.
func StringData(string) *byte

// String constructs a string value from a pointer and a length.
// The bytes passed to String must not be modified;
// doing so can cause the program to crash or behave unpredictably.
func String(*byte, int) string

These functions should remain meaningful even if we somehow change the representation of slices or strings in the future.

The restrictions on changing the bytes are unfortunate, but the fact is that people are doing these kinds of transformations today. Omitting these functions from the unsafe package doesn't mean that Go programs won't do them, it just means that they will do in ways that are sometimes even less safe.

It should be feasible to add a dynamic detector for any modifications to these bytes. This could perhaps be enabled when using the race detector. This would not be perfect but would detect egregious misuses.

The functions suggested above would then be written as

func StringToBytes(s string) []byte {
    return unsafe.Slice(unsafe.StringData(s), len(s))
}

func BytesToString(b []byte) string {
    return unsafe.String(&b[0], len(b))
}

(To be clear, the reverse is also possible: we can write String and StringData in terms of StringToBytes and BytesToString.)

@rsc
Copy link
Contributor

rsc commented Jun 1, 2022

It does seem like unsafe.String and unsafe.StringData match unsafe.Slice a bit better and are more fundamental operations
than providing StringToBytes and BytesToString as the primitives.
I wonder if we should add unsafe.SliceData as well (code often has to work around len 0 using &s[0]).

@mdempsky
Copy link
Contributor

mdempsky commented Jun 2, 2022

I'd like to suggest we reconsider the original proposal from #19367: to add new unsafe.StringHeader and unsafe.SliceHeader types, to handle the remaining advanced use cases that are supported by reflect.{String,Slice}Header but aren't covered by unsafe.Slice.

Concretely, I propose adding:

package unsafe

type StringHeader struct {
    Data *byte
    Len int
}

type SliceHeader[Elem any] struct {
    Data *Elem
    Len, Cap int
}

and allowing conversions between string and unsafe.StringHeader, and also between []Elem and unsafe.SliceHeader[Elem].

Converting an invalid unsafe.{String,Slice}Header (e.g., Len > Cap, or Data==nil and Len>0) into a normal string or slice type should fail, at least in -d=checkptr mode. I'm leaning towards making it a run-time panic (like unsafe.Slice) because the failure conditions are easy to specify/detect, but simply leaving it undefined (like unsafe.Add) seems not unreasonable too.

N.B., my original #19367 proposal also allowed conversions between *string and *unsafe.StringHeader, etc. I think we could still allow that (e.g., particularly to help users transition away from reflect.{String,Slice}Header), but I think it's less error-prone (and marginally better for escape analysis) if we encourage users to construct an unsafe.StringHeader, value convert it to string, and then store that into memory; rather than creating a *unsafe.StringHeader and individually assigning fields in memory.

@ianlancetaylor
Copy link
Member

My concern about StringHeader and SliceHeader is that it locks all possible implementations into either using those exact headers or doing strange contortions in the compiler.

What can we do with StringHeader and SliceHeader that we can't do with Slice, SliceData, String, and StringData?

@mdempsky
Copy link
Contributor

mdempsky commented Jun 3, 2022

After thinking about it more, I'm inclined to agree that {Slice,String}{,Data} builtin functions are the way to go. As you say, they support the same functionality. I think that will be easier on tools authors than extending conversion semantics too.

@rsc
Copy link
Contributor

rsc commented Jun 8, 2022

It sounds like we've converged on considering unsafe.StringData, unsafe.String, and unsafe.SliceData.
Does anyone object to those?

@rsc rsc changed the title proposal: unsafe: add functions to replace reflect.StringHeader and reflect.SliceHeader proposal: unsafe: add StringData, String, SliceData Jun 8, 2022
@rsc
Copy link
Contributor

rsc commented Jun 15, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jun 22, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: unsafe: add StringData, String, SliceData unsafe: add StringData, String, SliceData Jun 22, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/449537 mentions this issue: spec: document the new unsafe functions SliceData, String, and StringData

gopherbot pushed a commit that referenced this issue Nov 14, 2022
…Data

For #53003.

Change-Id: If5d76c7b8dfcbcab919cad9c333c0225fc155859
Reviewed-on: https://go-review.googlesource.com/c/go/+/449537
Reviewed-by: Matthew Dempsky <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Robert Griesemer <[email protected]>
Auto-Submit: Robert Griesemer <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Robert Griesemer <[email protected]>
@griesemer
Copy link
Contributor

Both the spec and the unsafe package documentation are now done. Closing.

@simon28082
Copy link

In practice I found out about bytes to string using the following method will result in converted values that are not as expected, to over debug my application I got this

unsafe.String(&bytes[0], byteLen)

@mitar
Copy link
Contributor

mitar commented Oct 19, 2023

I still miss that StringToBytes and BytesToString are not directly provided. As seen here, currently implementation of those functions requires:

import "unsafe"

func String2ByteSlice(str string) []byte {
	if str == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(str), len(str))
}

func ByteSlice2String(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}

Those edge cases with empty strings/slices make current API very verbose as you have to check for those edge cases. Could those two functions be added to standard library?

On a related note, should examples like TextMarshalJSON one in json package use such conversion between strings and bytes? Or is that already automatically detected by the compiler and optimized?

@thepudds
Copy link
Contributor

Hi @mitar, FWIW, I don't know that the standard library would want to make that more convenient.

Note that Go 1.22 will have some nice enhancements in this area, including recognizing in many cases when a []byte is not modified after a string->[]byte conversion to enable a safe zero-copy conversion (issue #2205).

Here is an example (with comparison of 1.21 vs. tip via godbolt):

func f() {
	s := "hello"
	x := []byte(s) // zero-copy string->[]byte conversion
	_ = x
}

@database64128
Copy link
Contributor

Those edge cases with empty strings/slices make current API very verbose as you have to check for those edge cases.

@mitar I don't think these checks are needed. unsafe.StringData's documentation says:

For an empty string the return value is unspecified, and may be nil.

So it's perfectly safe to feed the returned pointer directly to unsafe.Slice. Same for unsafe.String(unsafe.SliceData(b), len(b)).

@zigo101
Copy link

zigo101 commented Oct 20, 2023

@database64128
With the simple one-line way, converting different zero strings might result different slice values:

package main

import "unsafe"

func main() {
	{
		var s = ""
		println(s == "") // true
		var x = unsafe.Slice(unsafe.StringData(s), len(s))
		println(x == nil) // true
	}
	{
		var s = "go"[:0]
		println(s == "") // true
		var x = unsafe.Slice(unsafe.StringData(s), len(s))
		println(x == nil) // false
	}
}

It might satisfy the needs of some cases, but not all.

The situations of converting from zero-length slices to strings are comparatively simpler.
But the verbose way makes sure that the result string will never keep the pointer of the underlying array of blank slices
(it is hard to say this is a benefit, just a personal preference),

package main

import "unsafe"

func ByteSlice2String(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}

func main() {
	var x = make([]byte, 0, 1024)
	var s = unsafe.String(unsafe.SliceData(x), len(x))
	println(s == "") // true
	println(*(*uintptr)(unsafe.Pointer(&s))) // <a non-zero addr>
	
	var s2 = ByteSlice2String(x)
	println(s2 == "") // true
	println(*(*uintptr)(unsafe.Pointer(&s2))) // 0
}

@hrissan
Copy link

hrissan commented Feb 26, 2024

Why originally proposed functions StringToBytes and BytesToString are not added to unsafe package, so people stop writing various implementations with questionable correctness again and again is beyond my comprehension.

@erikdubbelboer
Copy link
Contributor

Because StringToBytes is now just unsafe.Slice(unsafe.StringData(s), len(s)) and BytesToString is just unsafe.String(unsafe.SliceData(b), len(b)) which are both very easy and there shouldn't be any implementations with questionable correctness.

dtrudg added a commit to sylabs/singularity that referenced this issue Mar 21, 2024
If we are running with a PID namespace, we have a shim process by
default. We have code that sets the name of this process to `sinit` by:

 * Overwriting the value of argv[0]
 * Using PR_SET_NAME

Both are necessary as PR_SET_NAME only affects the value from
PR_GET_NAME.

In this code, unsafe.StringHeader is now deprecated. Rewrite using the
pattern that takes an unsafe.Slice from an unsafe.StringData to get the
raw bytes for the string os.Args[0].

See:

golang/go#53003 (comment)
dtrudg added a commit to dtrudg/singularity that referenced this issue Mar 21, 2024
If we are running with a PID namespace, we have a shim process by
default. We have code that sets the name of this process to `sinit` by:

 * Overwriting the value of argv[0]
 * Using PR_SET_NAME

Both are necessary as PR_SET_NAME only affects the value from
PR_GET_NAME.

In this code, unsafe.StringHeader is now deprecated. Rewrite using the
pattern that takes an unsafe.Slice from an unsafe.StringData to get the
raw bytes for the string os.Args[0].

See:

golang/go#53003 (comment)
dtrudg added a commit to dtrudg/singularity that referenced this issue Mar 21, 2024
If we are running with a PID namespace, we have a shim process by
default. We have code that sets the name of this process to `sinit` by:

 * Overwriting the value of argv[0]
 * Using PR_SET_NAME

Both are necessary as PR_SET_NAME only affects the value from
PR_GET_NAME.

In this code, unsafe.StringHeader is now deprecated. Rewrite using the
pattern that takes an unsafe.Slice from an unsafe.StringData to get the
raw bytes for the string os.Args[0].

See:

golang/go#53003 (comment)
DrDaveD pushed a commit to DrDaveD/apptainer that referenced this issue Jun 18, 2024
If we are running with a PID namespace, we have a shim process by
default. We have code that sets the name of this process to `sinit` by:

 * Overwriting the value of argv[0]
 * Using PR_SET_NAME

Both are necessary as PR_SET_NAME only affects the value from
PR_GET_NAME.

In this code, unsafe.StringHeader is now deprecated. Rewrite using the
pattern that takes an unsafe.Slice from an unsafe.StringData to get the
raw bytes for the string os.Args[0].

See:

golang/go#53003 (comment)
Signed-off-by: Dave Dykstra <[email protected]>
DrDaveD pushed a commit to DrDaveD/apptainer that referenced this issue Jun 18, 2024
If we are running with a PID namespace, we have a shim process by
default. We have code that sets the name of this process to `sinit` by:

 * Overwriting the value of argv[0]
 * Using PR_SET_NAME

Both are necessary as PR_SET_NAME only affects the value from
PR_GET_NAME.

In this code, unsafe.StringHeader is now deprecated. Rewrite using the
pattern that takes an unsafe.Slice from an unsafe.StringData to get the
raw bytes for the string os.Args[0].

See:

golang/go#53003 (comment)
Signed-off-by: Dave Dykstra <[email protected]>
fzipi pushed a commit to corazawaf/coraza that referenced this issue Oct 4, 2024
)

Closes #1147.

The Godoc of `reflect.StringHeader` says [1]:

  "the Data field is not sufficient to guarantee the data it references
will not be garbage collected, so programs must keep a separate,
correctly typed pointer to the underlying data."

And the Godoc of `unsafe.Pointer` says [2]:

  "A uintptr is an integer, not a reference. ... Even if a uintptr holds
the address of some object, the garbage collector will not update that
uintptr's value if the object moves, nor will that uintptr keep the
object from being reclaimed."

The replacement `unsafe.StringData` is a more correct way to get the
pointer to the string data. The original proposal can be seen in
golang/go#53003 (comment).

[1]: https://pkg.go.dev/reflect#StringHeader
[2]: https://pkg.go.dev/unsafe#Pointer

Signed-off-by: Eng Zer Jun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests