Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RefCell::try_borrow_unguarded #59211

Merged
merged 4 commits into from
Apr 11, 2019
Merged

Conversation

nox
Copy link
Contributor

@nox nox commented Mar 15, 2019

Come sit next to the fireplace with me, this is going to be a long story.

So, you may already be aware that Servo has weird design constraints that forces us developers working on it to do weird things. The thing that interests us today is that we do layout on a separate thread with its own thread pool to do some things in parallel, whereas the data it uses comes from the script thread, which implements the entire DOM and related pieces, with !Sync data types such as RefCell<T>.

The invariant we maintain is that script does not do anything ever with the DOM data as long as layout is doing its job. That's all nice and all, but one thing we don't ensure is that we don't actually know if script was currently mutably borrowing some RefCell<T> prior to starting layout, which may lead to aliasing mutable memory and obviously undefined behaviour.

This PR reinstates RefCell::borrow_state so that this method can make use of it and return None if the cell was mutably borrowed.

Cc @SimonSapin

@rust-highfive
Copy link
Collaborator

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 15, 2019
@nox nox added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 15, 2019
@Mark-Simulacrum
Copy link
Member

I might be misunderstanding, but is there a reason you can't have the function call borrow or try_borrow? And then do the unsafe cast to extend the lifetime of the reference returned as appropriate?

@nox
Copy link
Contributor Author

nox commented Mar 15, 2019

We call DomRefCell::borrow_for_layout from multiple threads in parallel. The layout threads. Using RefCell::try_borrow would potentially change the borrow flag from multiple threads at once non-atomically. I need to observe it without changing it.

@Mark-Simulacrum
Copy link
Member

Ah, I see. On the other hand, RefCell doesn't store the state atomically/via locking so I'm not sure it's safe to read the borrow state without some form of locking either?

I think the only reason the as_ptr call technically doesn't cause any sort of problem is that we're only returning a shared reference.

cc @RalfJung as well, might have relevance for stacked borrows and UCG in general

@nox
Copy link
Contributor Author

nox commented Mar 15, 2019

To send a value over a channel to a different thread, that channel already must do some synchronisation to ensure that memory accesses aren't observed in the wrong order etc, that's independent from what RefCell::borrow_state would allow me to do here.

@nox
Copy link
Contributor Author

nox commented Mar 15, 2019

Let me try to explain the use case in depth:

We can't get &mut values for the entire DOM at once

In Servo, all of DOM (Node, Element, NodeList, etc) is written in Rust but the actual struct values are stored and owned by JavaScript objects, which obviously are owned by the JavaScript engine. The lifecycle of the values is totally managed by the engine, and they will be freed only when no other JavaScript value points to them; furthermore, a lot of these values define cycles on the Rust side (children point to parents and parents point to children), let alone on the JS side where the sky is the limit. I hope that makes it clear that we can't avoid inner mutability.

Layout needs Sync data

Layout is a core piece of a Web browser, and one of the things that define Servo is that our layout component does its job parallelising a damn lot of things, which by definition means it needs Sync data. The problem is that one of its inputs is the DOM itself.

DOM cannot use expensive atomic data structures

DOM is the damn thing that JS accesses whenever it uses Element.getAttribute etc, and attributes and many other things in DOM are mutable. Given how crucial it is for DOM operations to be fast, we cannot afford to use atomic data structures when the JS itself is single-threaded and we only need Sync things when running layout.

So what do we do?

Servo currently defines a lot of Sync wrappers around non-Sync values, and provides methods on these wrappers such that the invariant maintained is "when those wrappers are used from the layout threads, we are sure script doesn't go behind our back and starts doing things again with the DOM".

Why do you need borrow_state then?

Because bugs can happen, and there is a single scenario that scares me:

  1. JavaScript calls DOM code.
  2. DOM code mutably borrows something in the DOM and calls back into JS.
  3. JS calls DOM code which triggers a layout operation.
  4. layout calls DomRefCell::borrow_for_layout on the thing that was mutably borrowed in step 2.

During step 4, script is ensured to not run anymore and thus cannot ever change any refcell's borrow state while layout runs, but layout cannot do anything defensively to avoid the UB caused by accessing a refcell that was mutably borrowed prior to running layout.

Does that make sense to you as to why I need RefCell::borrow_state now?

@Mark-Simulacrum
Copy link
Member

I think my confusion lies not in the need for borrow_state -- I see that -- but why would borrow_state be considered safe?

I'm envisioning that the layout thread calls borrow_state to check if it's currently mutably borrowed, and during that call/after it JS goes in and borrow_mut's the RefCell. Maybe that can't happen though?

@nox
Copy link
Contributor Author

nox commented Mar 15, 2019

I'm envisioning that the layout thread calls borrow_state to check if it's currently mutably borrowed, and during that call/after it JS goes in and borrow_mut's the RefCell. Maybe that can't happen though?

That's the responsibility of the script/layout interface that ensures that script doesn't run while layout is going on. It indeed can't happen.

@Mark-Simulacrum
Copy link
Member

Sounds good.

I think the concern/part of the reason we deprecated and removed this originally is that getting it right is hard; and this API is actually not useful for anyone who's not doing correct but hard to get right (i.e., using RefCell from multiple threads) things.

Maybe we should add to the docs on borrow_state that "you really shouldn't use this unless you really know what you are doing."

@nox
Copy link
Contributor Author

nox commented Mar 15, 2019

I think the concern/part of the reason we deprecated and removed this originally is that getting it right is hard; and this API is actually not useful for anyone who's not doing correct but hard to get right (i.e., using RefCell from multiple threads) things.

Yeah I can sympathise with that. I'm not even entirely sure that we 100% got it correctly in Servo, given there is too much unsafe code all over the place related to the script/layout invariant I described earlier, and nothing to try to avoid the specific scenario I told of. On a side note that's why I'm working on inert these days.

Maybe we should add to the docs on borrow_state that "you really shouldn't use this unless you really know what you are doing."

Sounds good to me, though if someone is calling it in multiple threads at once they already should be aware of being careful because they must have written an unsafe Sync implementation somewhere.

@KodrAus
Copy link
Contributor

KodrAus commented Mar 15, 2019

Can we think of any other use-case for this that doesn't involve pretending RefCell is Sync?

@Centril Centril requested a review from RalfJung March 16, 2019 02:46
src/libcore/cell.rs Outdated Show resolved Hide resolved
src/libcore/cell.rs Outdated Show resolved Hide resolved
@Centril
Copy link
Contributor

Centril commented Mar 16, 2019

As an alternative approach re. #59211 (comment), you may consider two `methods:

fn will_borrow_succeed(&self) -> bool { ... }
fn will_borrow_mut_succeed(&self) -> bool { ... }

These would correspond to the actual checks done for borrow and borrow_mut.

@RalfJung
Copy link
Member

RalfJung commented Mar 16, 2019

The method itself seems fine, just from a "is this a safe extension of RefCell" standpoint. But of course that's not the question here.

However, I'm afraid I don't understand enough about how Servo uses this to evaluate whether Servo's use of this is correct. Is it possible to produce a small amount of code demonstrating this?

JavaScript calls DOM code.
DOM code mutably borrows something in the DOM and calls back into JS.
JS calls DOM code which triggers a layout operation.
layout calls DomRefCell::borrow_for_layout on the thing that was mutably borrowed in step 2.

In the last step, which thread will this call happen in? You mentioned layout is in a different thread, so it seems to me that there's a race here between DOM code releasing the mutable borrow and layout calling borrow_state. You said above it is the responsibility of the layout code to make sure that does not happen, but then it seems like this check does not actually help to avoid UB. What am I missing?

@nox
Copy link
Contributor Author

nox commented Mar 16, 2019

In the last step, which thread will this call happen in? You mentioned layout is in a different thread, so it seems to me that there's a race here between DOM code releasing the mutable borrow and layout calling borrow_state. You said above it is the responsibility of the layout code to make sure that does not happen, but then it seems like this check does not actually help to avoid UB. What am I missing?

There is no race condition because we prevent them by doing the necessary synchronisation. That method prevents UB in the case we had a bug where a refcell was mutably borrowed before layout started. This check does avoid UB, because we wouldn't look into a refcell that was been mutably borrowed.

@RalfJung
Copy link
Member

I see, so the "doing the synchronization" part is separate from the "prevent reentrant accesses" part, and borrow_state lets you catch bugs in the latter, but you still rely on the former?

@nox
Copy link
Contributor Author

nox commented Mar 18, 2019

@RalfJung Yes.

@SimonSapin
Copy link
Contributor

This method has existed before, and was removed because:

We've added/stabilized try_borrow and try_borrow_mut, these should no longer be needed

But now there is a new use case that was not known at the time and that is not covered by the try_ methods. (More on this below.) Usually we tend to accept PRs for new unstable APIs without more process, but since this reverts a previous decision let’s get team sign off:

@rfcbot fcp merge

Having two methods that return booleans instead of one that returns an enum sounds fine in terms of functionality. (Or maybe just one? If there is no use case without UB for the other one…) Naming them might be awkward, but .borrow() would succeed might be easier to document than “state” semantics of each enum variant.


Imagine that some code has access to &mut RefCell<T>. By definition, no access to the refcell other than through this borrow is possible, as long as this borrow exists. Through .get_mut(), we can obtain &mut T and from that &T. If T: Sync, we can send that &T to multiple threads and operate on the value from there.

The Servo use case is similar: in that we want to do multi-threaded computation on &T with T: Sync, but different in that we start from only &RefCell<T>. We have external synchronization ensuring that no other access to that refcell are going to be made for the duration of this computation, because the thread that the refcell belongs to is blocked while waiting for those computation results. But it remains that accessing &T is still UB if there is cell::RefMut<T> (effectively &mut T) somewhere on the stack of the original thread. There shouldn’t be, but bugs happen. A runtime check helps catch this kind of bug before it causes UB.

Additionally there isn’t just one refcell, but a whole tree of nodes that each contain multiple refcells. Tree traversal is where we have parallelism, so we can’t do the runtime check on each refcell on the original thread. (What we send to computation threads is struct WrapperWithCarefullyLimitedApi<'a>(&'a Node).) Therefore, to avoid a need for atomic access or other synchronization, the check can only read the refcell’s borrow flag/counter, without writing like refcell.try_borrow().is_ok() would.

@rfcbot
Copy link

rfcbot commented Mar 18, 2019

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Mar 18, 2019
@Mark-Simulacrum
Copy link
Member

I'd be against the method names that @Centril proposed.

The primary reason is that will_borrow_succeed is misleading in that the "true" result is meaningless.

There seems to be no reason for the mut variant (will_borrow_mut_succeed) as there's no case where you can obtain a RefMut from a RefCell but can't use borrow_mut or try_borrow_mut.

Perhaps the API should instead be unsafe fn borrow_unchecked(&self) -> &T, and would panic if currently borrowed but not increment the borrow counter? That seems to fit Servo's use case and is more obviously not the API to use, unlike borrow_state.

I'm not a fan of the "unchecked" since it does check that it is currently safe; but because it doesn't increment the ref-count it would potentially be unsafe in the future.

To be clear, I'm opposed to re-adding borrow_state as-is because it feels like introducing a safe API that has no use case in safe code, as well as having a bunch of edge-cases that are difficult to properly document. The unsafe fn borrow_unchecked(&self) -> &T API seems better suited for our purposes.

@SimonSapin
Copy link
Contributor

This borrow is not completely unchecked, the point is to check that the refcell is in an appropriate state when the borrow starts. But then we don’t track this new borrow or when it ends.

Maybe unsafe fn borrow_untracked(&self) -> Option<&T>?

@SimonSapin
Copy link
Contributor

Can you say more about what bindgen did with the borrow sate? Could it have used try_borrow instead, if that was available at the time? Or was it something more complex?

@emilio
Copy link
Contributor

emilio commented Mar 20, 2019

This is the commit that removed the RefCell: rust-lang/rust-bindgen@cfdf15f

Looking at the BorrowState usage in the deleted files (rust-lang/rust-bindgen@cfdf15f#diff-2c09afcdc3c420ab0678ba9b5e83959cL421) looks like it wanted try_borrow_mut.

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Mar 21, 2019
@rfcbot
Copy link

rfcbot commented Mar 21, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@diwic
Copy link
Contributor

diwic commented Mar 22, 2019

@nox

  • DOM code mutably borrows something in the DOM and calls back into JS.

I'm not sure what that "something" is, but it sounds like a recipe for disaster in itself, because that JS code can in turn call into DOM trying to change the very same thing, causing a RefCell panic?

@nox
Copy link
Contributor Author

nox commented Mar 22, 2019

I'm not sure what that "something" is, but it sounds like a recipe for disaster in itself, because that JS code can in turn call into DOM trying to change the very same thing, causing a RefCell panic?

Something here is indeed any DOM call, and that is indeed a bug if we call JS while a RefCell from within the DOM is mutably borrowed, because that's a case that will eventually cause a panic as you describe. This PR is about being able to panic from layout too instead of causing straight UB if the JS code calls a DOM method that calls layout.

@nox
Copy link
Contributor Author

nox commented Mar 22, 2019

Err, misread your comment, "something" here is anything that needs to be mutated sometimes in DOM, be it element's attributes, ids, urls, whatever. Indeed, if we keep that mutably borrowed while we reenter into JS code, that will ultimately lead to a panic. This is by design and not going to change any time soon.

@diwic
Copy link
Contributor

diwic commented Mar 22, 2019

@nox

Okay, so when Layout threads call try_borrow_unguarded, they would typically unwrap the result, causing a panic rather than UB, since mutably borrowing something and then calling into JS would be a bug anyway?

Fwiw, if my two cents are worth anything, it seems to me like what you want to do (putting RefCell into some parallel-sync-mode when script is not running) is so niche that maybe you need something else than RefCell to suit your use case. (Maybe something where the borrow flag can be in "Sync" mode as well as "Reading" and "Writing" modes? Not sure.) And if you like to implement this on top of RefCell like you do today with DOMRefCell then borrow_state is at least a little less niche than try_borrow_unguarded.

@nox
Copy link
Contributor Author

nox commented Mar 22, 2019

I'm 100% fine with your two cents, and I would be ok with exposing borrow_state directly, I changed it to try_borrow_unguarded because other people said borrow_state is too vague and too general-without-a-purpose, if I may say.

@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this PR / Issue. label Mar 31, 2019
@rfcbot
Copy link

rfcbot commented Mar 31, 2019

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot removed the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Mar 31, 2019
@KodrAus
Copy link
Contributor

KodrAus commented Apr 11, 2019

Thanks for your patience @nox! I'll re-open the tracking issue and push this one through.

@bors r+

@bors
Copy link
Contributor

bors commented Apr 11, 2019

📌 Commit 38811a1 has been approved by KodrAus

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2019
@bors
Copy link
Contributor

bors commented Apr 11, 2019

⌛ Testing commit 38811a1 with merge 8509127...

bors added a commit that referenced this pull request Apr 11, 2019
Introduce RefCell::try_borrow_unguarded

*Come sit next to the fireplace with me, this is going to be a long story.*

So, you may already be aware that Servo has weird design constraints that forces us developers working on it to do weird things. The thing that interests us today is that we do layout on a separate thread with its own thread pool to do some things in parallel, whereas the data it uses comes from the script thread, which implements the entire DOM and related pieces, with `!Sync` data types such as `RefCell<T>`.

The invariant we maintain is that script does not do anything ever with the DOM data as long as layout is doing its job. That's all nice and all, but one thing we don't ensure is that we don't actually know if script was currently mutably borrowing some `RefCell<T>` prior to starting layout, which may lead to aliasing mutable memory and obviously undefined behaviour.

This PR reinstates `RefCell::borrow_state` so that [this method](https://github.com/servo/servo/blob/master/components/script/dom/bindings/cell.rs#L23-L30) can make use of it and return `None` if the cell was mutably borrowed.

Cc @SimonSapin
@bors
Copy link
Contributor

bors commented Apr 11, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: KodrAus
Pushing 8509127 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 11, 2019
@bors bors merged commit 38811a1 into rust-lang:master Apr 11, 2019
@nox
Copy link
Contributor Author

nox commented Apr 13, 2019

Just wanted to say that try_borrow_unguarded is now used in Servo and, believe it or not, all tests passed without a problem, which means that the soundness issue this method prevents is not currently present in our codebase, woohoo.

@nox nox deleted the refcell-borrow-state branch April 13, 2019 08:28
SimonSapin added a commit to SimonSapin/rust that referenced this pull request May 15, 2019
Servo has been using this since servo/servo#23196 to add a runtime check to some unsafe code, as discussed in PR rust-lang#59211. Stabilizing would help do more of the same in libraries that also have users on Stable.
Centril added a commit to Centril/rust that referenced this pull request May 30, 2019
Stabilize RefCell::try_borrow_unguarded

Servo has been using this since servo/servo#23196 to add a runtime check to some unsafe code, as discussed in PR rust-lang#59211. Stabilizing would help do more of the same in libraries that also have users on Stable.
Centril added a commit to Centril/rust that referenced this pull request May 30, 2019
Stabilize RefCell::try_borrow_unguarded

Servo has been using this since servo/servo#23196 to add a runtime check to some unsafe code, as discussed in PR rust-lang#59211. Stabilizing would help do more of the same in libraries that also have users on Stable.
@oconnor663
Copy link
Contributor

oconnor663 commented Oct 4, 2022

Apologies for pinging an old issue, but I saw try_borrow_unguarded in the official docs, and I wound up here trying to understand exactly what problem it solves. After reading the whole thread, I'm still unsure why exactly the following would've been unsound. Here I'll use Python's "global interpreter lock (GIL)" as a metaphor for whatever synchronization these Servo threads are doing. (Is that a reasonable metaphor?) What's unsound about this:

Thread A                 Thread B

guard = X.borrow_mut()
release the GIL
go to sleep
                         wake up
                         acquire the GIL
                         X.try_borrow()  // Err

If I'm reading the thread above correctly, there's some soundness problem with this arrangement that try_borrow_unguarded solves. Could anyone help me pinpoint what's unsound here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.