Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Auto merge of #116173 - m-ou-se:arc, r=<try>
New atomic reference counting algorithm This implements a new 'wait-free' atomic reference counting algorithm based on the paper [Wait-Free Weak Reference Counting](https://www.microsoft.com/en-us/research/uploads/prod/2023/04/preprint-644bdf17da97d.pdf) by `@mjp41,` `@sylvanc` and `@bensimner.` The paper contains a way to implement atomic reference counting *wait free*, for five operations: 1. Release weak (= Weak::drop) 2. Release strong (= Arc::drop) 3. Acquire weak (= Weak::clone and Arc::downgrade) 4. Acquire strong (= Arc::clone) 5. Acquire strong from weak (= Weak::upgrade) `Weak::upgrade` must increase the strong count if it is nonzero. Unfortunately, processors do not have native 'increment if nonzero' instructions. Therefore, it is usually implemented with a CAS loop to increment the strong count only if it is not zero. The paper shows a way to avoid this CAS loop to make it wait-free. By reserving the least significant bit in the strong counter (by shifting the counter one bit to the left), we can use that extra bit to indicate the final 'dropped' state in which the strong counter is permanently zero. Then `Weak::upgrade` can be implemented as a `fetch_add(2)`, which leaves the 'permanently zero' bit untouched. This however does mean that Arc::drop must now do an additional operation. Not only must it decrement the strong counter, it must also use a CAS operation to set the 'permanently zero' bit if the counter is zero. The paper also shows an optimized version of the algorithm in which an additional bit in the strong counter is reserved to indicate whether there have ever been any weak pointers. When this bit is not set, some steps can be skipped. However, the algorithm from the paper is unfortunately not something we can directly use in for Rust's standard `Arc` and `Weak`, because we have more operations: 1. Weak::drop 2. Arc::drop 3. Weak::clone 4. Arc::clone 5. Weak::upgrade 7. Arc::downgrade 8. Arc::get_mut 9. Arc::try_unwrap 10. Arc::into_inner Specifically `Arc::get_mut` requires locking the weak counter to temporarily block `Arc::downgrade` from completing. We cannot implement `Arc::downgrade` as just `Weak::clone`. Our `Arc::downgrade` implementation is implemented as a CAS loop to increment the weak counter if it doesn't hold the special 'locked' (usize::MAX) value, similar to how Weak::upgrade uses a CAS loop (which is what the paper solves). I have extended the algorithm by also reserving the lowest bit in the weak counter to represent the 'weak counter locked' state. That way, `Arc::downgrade` can be implemented as a `fetch_add(2)` rather than a CAS loop. It will still have to spin in case of a concurrent call to `Arc::get_mut`, but in absence of `Arc::get_mut`, this makes Arc and Weak wait-free. The paper shows some promising benchmarking results. I have not benchmarked this change yet.
- Loading branch information