Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite docs for std::ptr #49767

Merged
merged 9 commits into from
May 15, 2018
Merged

Rewrite docs for std::ptr #49767

merged 9 commits into from
May 15, 2018

Conversation

ecstatic-morse
Copy link
Contributor

@ecstatic-morse ecstatic-morse commented Apr 7, 2018

This PR attempts to resolve #29371.

Rendered

This is a fairly major rewrite of the std::ptr docs, and deserves a fair bit of scrutiny. It adds links to the GNU libc docs for various instrinsics, adds internal links to types and functions referenced in the docs, adds new, more complex examples for many functions, and introduces a common template for discussing unsafety of functions in std::ptr.

All functions in std::ptr (with the exception of ptr::eq) are unsafe because they either read from or write to a raw pointer. The "Safety" section now informs that the function is unsafe because it dereferences a raw pointer and requires that any pointer to be read by the function points to "a valid vaue of type T".

Additionally, each function imposes some subset of the following conditions on its arguments.

  • The pointer points to valid memory.
  • The pointer points to initialized memory.
  • The pointer is properly aligned.

These requirements are discussed in the "Undefined Behavior" section along with the consequences of using functions that perform bitwise copies without requiring T: Copy. I don't love my new descriptions of the consequences of making such copies. Perhaps the old ones were good enough?

Some issues which still need to be addressed before this is merged:

  • The new docs assert that drop_in_place is equivalent to calling read and discarding the value. Is this correct?
  • Do write_bytes and swap_nonoverlapping require properly aligned pointers?
  • The new example for drop_in_place is a lackluster.
  • Should these docs rigorously define what valid memory is? Or should is that the job of the reference? Should we link to the reference?
  • Is it correct to require that pointers that will be read from refer to "valid values of type T"?
  • I can't imagine ever using {read,write}_volatile with non-Copy types. Should I just link to {read,write} and say that the same issues with non-Copy types apply?
  • write_volatile doesn't link back to read_volatile.
  • Update docs for the unstable swap_nonoverlapping
  • Update docs for the unstable unsafe pointer methods RFC

Looking forward to your feedback.

r? @steveklabnik

- Add links to the GNU libc docs for `memmove`, `memcpy`, and
  `memset`, as well as internally linking to other functions in `std::ptr`
- List sources of UB for all functions.
- Add example to `ptr::drop_in_place` and compares it to `ptr::read`.
- Add examples which more closely mirror real world uses for the
  functions in `std::ptr`. Also, move the reimplementation of `mem::swap`
  to the examples of `ptr::read` and use a more interesting example for
  `copy_nonoverlapping`.
- Change module level description
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @steveklabnik (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 7, 2018
@TimNN
Copy link
Contributor

TimNN commented Apr 7, 2018

Your PR failed on Travis. Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:00:46] configure: rust.quiet-tests     := True
---
[00:05:21] tidy error: /checkout/src/libcore/intrinsics.rs:995: line longer than 100 chars
[00:05:21] tidy error: /checkout/src/libcore/intrinsics.rs:1152: unexplained "```ignore" doctest; try one:
[00:05:21]
[00:05:21] * make the test actually pass, by adding necessary imports and declarations, or
[00:05:21] * use "```text", if the code is not Rust code, or
[00:05:21] * use "```compile_fail,Ennnn", if the code is expected to fail at compile time, or
[00:05:21] * use "```should_panic", if the code is expected to fail at run time, or
[00:05:21] * use "```no_run", if the code should type-check but not necessary linkable/runnable, or
[00:05:21] * explain it like "```ignore (cannot-test-this-because-xxxx)", if the annotation cannot be avoided.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Apr 7, 2018

The CI build failed because I added an ignored doctest to write_byteswhich demonstrates how to cause UB by overwriting a Box<T> with all zeroes. This example is fairly trivial; I could remove it, but is there a better way to fix this? Perhaps no_run?

This also fixes improper text wrapping.
@TimNN

This comment has been minimized.

The rendered version does not make clear that this is a link to another
page, and it breaks the anchor link.
@TimNN

This comment has been minimized.

@TimNN

This comment has been minimized.

@TimNN

This comment has been minimized.

@ecstatic-morse
Copy link
Contributor Author

Sorry for all the failing builds, it took me a while to figure out that I had to do x.py test src/tools/linkchecker.

@steveklabnik
Copy link
Member

Ill try to give this a review soon, thanks!

We should also probably have @rust-lang/libs look at it too

/// The caller must ensure that `src` points to a valid sequence of type
/// `T`.
///
/// # Undefined Behavior
Copy link
Contributor

@ExpHP ExpHP Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the role of having both Safety and Undefined Behavior sections. It seems to me that the convention is to have a single section named Safety.

[ lampam @ 21:52:55 ] (master •58 -7 +7) ~/asd/clone/rust
$ rg '#* Undefined [bB]ehavio(u)?r'
src/libcore/mem.rs
532:/// # Undefined behavior

src/liballoc/raw_vec.rs
154:    /// # Undefined Behavior
171:    /// # Undefined Behavior

$ rg '# Safety'
(far more results)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(granted, I often wish the # Safety sections tended to more closely resemble your # Undefined Behavior sections, but that is just my opinion)

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveklabnik mentioned that new docs should make use of Undefined Behavior (I think upper case is correct). All these functions are trivially unsafe because they dereference raw pointers, and the Safety section as I have it doesn't add any new information. Perhaps it could be moved to the module docs and not repeated everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would favor a change like you just mentioned, but will also wait to see what Steve says. 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I felt like these two cases were pretty distinct, but now, re-thinking about it, I'm not so sure, honestly. In some senses, UB is a subset of safety, and it feels like it's worth calling out separately, but maybe not. dunno.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused; I thought UB was the consequence of not following Safety. Is there a kind of unsafety that doesn't lead to UB?

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some background on my thought process:

While writing this, I took the point of view that Safety referred to the language feature (i.e. the subset of rust you must opt into with the unsafe keyword). Some functions (e.g. Vec::set_len) are not obviously unsafe from looking at their signature. The Safety section is probably a good place to discuss this.

In discussing why a function is unsafe, one must communicate what invariants must be maintained to avoid UB. Most of the standard library uses the Safety heading to communicate this. I am now of the opinion that there's never really a reason to have both of these top-level headings (as I do now) since the information is so closely related.

However, for functions whose unsafety is obvious because they take a raw pointer and have names like read, write, and copy, it seemed clearer to name the section discussing its invariants Undefined Behavior. RawVec::from_raw_parts also uses this convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm "safety" is "memory safety", there are kinds of UB that are not memory safety related.

///
/// # Undefined Behavior
///
/// `write` can trigger undefined behavior if any of the following conditions
Copy link
Contributor

@ExpHP ExpHP Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am bothered by the language "can trigger undefined behavior," since for all intents and purposes, there's no difference from saying that it is undefined behavior.

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. I thought I changed this for all functions to "Behavior is undefined if any of the following conditions are violated:", but managed to miss write{,_unaligned,_bytes}.

Copy link
Member

@steveklabnik steveklabnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than one tiny nit, the overall formatting and structure, etc, is awesome here! Once someone from libs signs off, and this gets tweaked, i'm happy to :shipit:

@@ -10,7 +10,7 @@

// FIXME: talk about offset, copy_memory, copy_nonoverlapping_memory

//! Raw, unsafe pointers, `*const T`, and `*mut T`.
//! Manually manage memory through raw, unsafe pointers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would drop the "unsafe" bit here and just use "through raw pointers"

- Remove redundant "unsafe" from module description.
- Add a missing `Safety` heading to `read_unaligned`.
- Remove weasel words in `Undefined Behavior` description for
  `write{,_unaligned,_bytes}`.
@pietroalbini pietroalbini added A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools labels Apr 16, 2018
@pietroalbini
Copy link
Member

Ping from triage! The reviewer asked a signoff from @rust-lang/libs, can someone check this PR?

@pietroalbini pietroalbini added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 16, 2018
@pietroalbini
Copy link
Member

Ping from triage @rust-lang/libs! Can someone check if the docs here are correct?

@alexcrichton
Copy link
Member

I'd personally prefer to fold the safety/undefined behavior sections together and also ensure it's worded something along the lines of "if violated these conditions will cause the function to be unsafe ..." instead of "this function is only unsafe if at least one of these conditions is violated ..." in the sense that these are descriptions of safety but not necessarily 100% exhaustive

@alexcrichton alexcrichton added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 2, 2018
Non-`Copy` types should not be in volatile memory.
@pietroalbini pietroalbini added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 14, 2018
@pietroalbini
Copy link
Member

Ping from triage @steveklabnik! The author addressed the comments from the libs team.

@steveklabnik
Copy link
Member

@bors: r+ rollup

thanks!

@bors
Copy link
Contributor

bors commented May 14, 2018

📌 Commit 827251e has been approved by steveklabnik

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 14, 2018
kennytm added a commit to kennytm/rust that referenced this pull request May 14, 2018
…bnik

Rewrite docs for `std::ptr`

This PR attempts to resolve rust-lang#29371.

This is a fairly major rewrite of the `std::ptr` docs, and deserves a fair bit of scrutiny. It adds links to the GNU libc docs for various instrinsics, adds internal links to types and functions referenced in the docs, adds new, more complex examples for many functions, and introduces a common template for discussing unsafety of functions in `std::ptr`.

All functions in `std::ptr` (with the exception of `ptr::eq`) are unsafe because they either read from or write to a raw pointer. The "Safety" section now informs that the function is unsafe because it dereferences a raw pointer and requires that any pointer to be read by the function points to "a valid vaue of type `T`".

Additionally, each function imposes some subset of the following conditions on its arguments.

* The pointer points to valid memory.
* The pointer points to initialized memory.
* The pointer is properly aligned.

These requirements are discussed in the "Undefined Behavior" section along with the  consequences of using functions that perform bitwise copies without requiring `T: Copy`. I don't love my new descriptions of the consequences of making such copies. Perhaps the old ones were good enough?

Some issues which still need to be addressed before this is merged:
- [ ] The new docs assert that `drop_in_place` is equivalent to calling `read` and discarding the value. Is this correct?
- [ ] Do `write_bytes` and `swap_nonoverlapping` require properly aligned pointers?
- [ ] The new example for `drop_in_place` is a lackluster.
- [ ] Should these docs rigorously define what `valid` memory is? Or should is that the job of the reference? Should we link to the reference?
- [ ] Is it correct to require that pointers that will be read from refer to "valid values of type `T`"?
- [x] I can't imagine ever using `{read,write}_volatile` with non-`Copy` types.  Should I just link to {read,write} and say that the same issues with non-`Copy` types apply?
- [x] `write_volatile` doesn't link back to `read_volatile`.
- [ ] Update docs for the unstable [`swap_nonoverlapping`](rust-lang#42818)
- [ ] Update docs for the unstable [unsafe pointer methods RFC](rust-lang/rfcs#1966)

Looking forward to your feedback.

r? @steveklabnik
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request May 15, 2018
…bnik

Rewrite docs for `std::ptr`

This PR attempts to resolve rust-lang#29371.

This is a fairly major rewrite of the `std::ptr` docs, and deserves a fair bit of scrutiny. It adds links to the GNU libc docs for various instrinsics, adds internal links to types and functions referenced in the docs, adds new, more complex examples for many functions, and introduces a common template for discussing unsafety of functions in `std::ptr`.

All functions in `std::ptr` (with the exception of `ptr::eq`) are unsafe because they either read from or write to a raw pointer. The "Safety" section now informs that the function is unsafe because it dereferences a raw pointer and requires that any pointer to be read by the function points to "a valid vaue of type `T`".

Additionally, each function imposes some subset of the following conditions on its arguments.

* The pointer points to valid memory.
* The pointer points to initialized memory.
* The pointer is properly aligned.

These requirements are discussed in the "Undefined Behavior" section along with the  consequences of using functions that perform bitwise copies without requiring `T: Copy`. I don't love my new descriptions of the consequences of making such copies. Perhaps the old ones were good enough?

Some issues which still need to be addressed before this is merged:
- [ ] The new docs assert that `drop_in_place` is equivalent to calling `read` and discarding the value. Is this correct?
- [ ] Do `write_bytes` and `swap_nonoverlapping` require properly aligned pointers?
- [ ] The new example for `drop_in_place` is a lackluster.
- [ ] Should these docs rigorously define what `valid` memory is? Or should is that the job of the reference? Should we link to the reference?
- [ ] Is it correct to require that pointers that will be read from refer to "valid values of type `T`"?
- [x] I can't imagine ever using `{read,write}_volatile` with non-`Copy` types.  Should I just link to {read,write} and say that the same issues with non-`Copy` types apply?
- [x] `write_volatile` doesn't link back to `read_volatile`.
- [ ] Update docs for the unstable [`swap_nonoverlapping`](rust-lang#42818)
- [ ] Update docs for the unstable [unsafe pointer methods RFC](rust-lang/rfcs#1966)

Looking forward to your feedback.

r? @steveklabnik
bors added a commit that referenced this pull request May 15, 2018
Rollup of 11 pull requests

Successful merges:

 - #49767 (Rewrite docs for `std::ptr`)
 - #50399 (save-analysis: handle aliasing imports a bit more nicely)
 - #50594 (Update the man page with additional --print options)
 - #50613 (Migrate the toolstate update bot to rust-highfive)
 - #50632 (Add minification process)
 - #50685 (ci: Add Dockerfile for dist-sparc64-linux)
 - #50691 (rustdoc: Add support for pub(restricted))
 - #50712 (Improve eager type resolution error message)
 - #50720 (Add “Examples” section header in f32/f64 doc comments.)
 - #50733 (Hyperlink DOI against preferred resolver)
 - #50745 (Uncapitalize "You")

Failed merges:
@bors bors merged commit 827251e into rust-lang:master May 15, 2018
@Gankra
Copy link
Contributor

Gankra commented May 15, 2018

My concerns weren't addressed, and I don't think this documentation should have landed as-is.

@pietroalbini
Copy link
Member

Ugh, why did GitHub mark the comments as resolved? :/

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented May 16, 2018

Yeah, this isn't ready to go yet.

I'm guessing @steveklabnik saw @alexcrichton's comment and not @gankro's concerns.

I'm thinking of ways to address the concerns; it will likely involve refactoring common invariants to the module-level docs. The module docs will then link to the nomicon for a more in-depth discussion on uninitialized memory and pointer aliasing. Stuff which is specific to each function will remain in a list in the function level docs.

Unfortunately, I am currently devoting all of my free time to studying for Code Jam Round 2. I will, in all likelihood, be eliminated this weekend and resume work on this. In the meantime, let me know if you need anything.

@Gankra
Copy link
Contributor

Gankra commented May 17, 2018

@steveklabnik could you revert this PR, since the author is also fine with not landing it?

@steveklabnik steveklabnik mentioned this pull request May 17, 2018
steveklabnik added a commit to steveklabnik/rust that referenced this pull request May 17, 2018
There was [some confusion](rust-lang#49767 (comment)) and I accidentally merged a PR that wasn't ready.
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this pull request May 17, 2018
…isdreavus

Revert rust-lang#49767

There was [some confusion](rust-lang#49767 (comment)) and I accidentally merged a PR that wasn't ready.
bors added a commit that referenced this pull request May 18, 2018
Rollup of 10 pull requests

Successful merges:

 - #50387 (Remove leftover tab in libtest outputs)
 - #50553 (Add Option::xor method)
 - #50610 (Improve format string errors)
 - #50649 (Tweak `nearest_common_ancestor()`.)
 - #50790 (Fix grammar documentation wrt Unicode identifiers)
 - #50791 (Fix null exclusions in grammar docs)
 - #50806 (Add `bless` x.py subcommand for easy ui test replacement)
 - #50818 (Speed up `opt_normalize_projection_type`)
 - #50837 (Revert #49767)
 - #50839 (Make sure people know the book is free oline)

Failed merges:
@ecstatic-morse
Copy link
Contributor Author

@gankro, @steveklabnik I've pushed a new commit which adds a module-level definition of what a "valid" pointer is. The list of invariants for each function now links back to this definition.

I put the newly built docs in a gh-pages branch on my fork so that they can be more easily reviewed. I'll keep this up to date as I push changes.

I also removed the erroneous requirement that memory must be initialized if it is to be read from. Instead there is a module-level discussion of how one cannot make any assumptions about the value of uninitialized memory. The example asserts that, even after calling copy on an uninitialized buffer, one cannot assume that the destination buffer is actually equal to the source. I think this effectively communicates the consequences of reading undef memory.

Also, do I need to open a new PR since this accidentally got merged and reverted?

Thanks!

@hanna-kruppe
Copy link
Contributor

What one can and can't do with uninitialized memory is a bit up in the air, but it might very well be much more restrictive than what the current wording implies, so I'd prefer to err on the side of caution in these docs. I don't have specific wording to suggest but here are some things that one might get from the current text that might (likely IMO) not be true when we finally get Unsafe Code Guidelines on this matter:

  • Uninitialized memory might change non-deterministically but after you load a value from it, it's some fixed bit pattern for which all the usual rules apply, e.g. after let x = src[0]; expressions like x + x == 2 * x and x >= 0 evaluate to true
  • You can branch on its current value (e.g. if src[0] == 0 { ... } else { ... }) and it randomly picks one of the branches
  • You can safely output uninitialized memory to a file or the network, it's just completely non-deterministic what bit pattern you get

Note that most of these aren't even true of LLVM's undef, and whatever we get in Rust may very well be much more restrictive than undef (e.g., one proposal is to allow only loading and storing, all arithmetic/comparisons/etc. would be instant UB).

@steveklabnik
Copy link
Member

Also, do I need to open a new PR since this accidentally got merged and reverted?

Yes please!

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented May 23, 2018

@rkruppe I was under the impression that this already was rigorously defined behavior and that I was merely too dumb to find it :) Since defining these semantics is out of scope for this PR, I think it's better to just not mention uninitialized memory at all. I'm assuming whatever choice gets made will permit loads and stores, which is all that is required by the functions in ptr. I'll leave out the module-level discussion, as well as omitting it from the list of invariants for each function.

Also, is there an issue or an internals thread where this is being discussed? I couldn't find one, and all that the nomicon says is that "attempting to interpret [uninitialized] memory as any type will cause undefined behavior".

Edit: here it is

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented May 25, 2018

New PR opened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API Docs: ptr