-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make storage and mmr trait async, attempt 2. #121
Make storage and mmr trait async, attempt 2. #121
Conversation
This is an unmodified copy of twenty_first::storage at twenty-first revision 890c451e4e513018d8500bedcd5bf76dd0bafdd9 (master) It is not yet incorporated into the build.
fixes error: In order to be transferred, a Block must have a non-None proof field. ProofType enum enables specifying/transferring an unimplemented Proof. This is only temporary. BlockType enum enables specifying Genesis vs Standard block. A Standard block has a ProofType The Genesis block has no ProofType
Under crate::locks we previously had: - sync - tokio Each contains an impl of AtomicRw and AtomicMutex, with basically the same API. Yet: - sync refers to synchronous locks, ie sync vs async. - tokio refers to the tokio::sync lock implementation. So the names are referring to different things. Instead we change it to: - std - tokio Now each refers to a lock implementation, ie std::sync and tokio::sync. note: we could instead have changed `tokio` to `async`, but then there might be multiple async lock impls to choose from. So it seems cleanest to use the name of each impl.
Moved this benchmark over from twenty_first. Presently unable to build the twenty_first db_* bench tests because the storage layer is now async and the divan bench crate doesn't yet support async. However, it may soon, see: nvzqz/divan#39
Before this refactor: The Mmr trait has only sync methods and exists in twenty_first. The Mmr trait is implemented by ArchivalMmr and MmrAccumulator. ArchivalMmr uses database::storage, which is now fully async. MmrAccumulator uses only sync method. ArchivalMmr implements the sync Mmr trait methods by spawning a new OS thread and tokio runtime for each method call. This works but is inefficient. After this refactor: The Mmr trait has been copied into neptune_core, and the neptune_core version has only async methods. The Mmr trait is implemented by ArchivalMmr and MmrAccumulator. ArchivalMmr is fully async, and does not spawn any OS threads. MmrAccumulator has been copied into neptune_core and is fully async. Everything in neptune_core that uses MmrAccumulator is now async. certain tests and test-support scaffolding depend on tasm traits that are not async, and had to be commented out for now. proptests had to be commented because the the proptest crate does not yet support async tests. Refactor history: wip. add MmrAsync trait wip. attempting to make mmr trait async. very messy, build broken wip. Mmr trait is async. neptune-core builds without warnings. tests do not build wip. move mmr files from twenty_first into util_types/mmr wip. cleanup commented use statements, etc wip: cargo fmt wip. tests compile cleanly. (with proptests commented-out) wip: fix failing tests. all tests now pass, including doctests style: cargo fmt style: fix clippy warnings wip. cleanup commented code, add comments
I notice that checks are failing due to clippy errors. I think this is because github is using latest rust toolchain, and I am still on 1.75. I will update, and fix any errors it reports. |
Some extra context may be helpful for reviewers, as a starting point. With regards to the motivation for this async storage work, consider that neptune-core is architected to spawn tokio tasks for all activities such as the peer loop, mining loop, and main loop. We do not directly spawn OS threads anywhere [1]. The tokio docs and indeed most any async documentation makes it clear that concurrency will suffer if blocking calls, especially disk I/O are made from async environment. I found this comment insightful from someone who experienced exactly this type of problem with their redb storage layer:
I think that quote illustrates the matter better than I could. Ok, so... The groundwork for this PR began with
The LevelDB API itself remains sync of course, however spawn_blocking() informs tokio that a blocking sync call is being made, so tokio can run that on a separate OS thread in its thread-pool, and other tasks in the current thread can continue being polled meanwhile. So that's been working already for some DB accesses, but not for DbtVec, DbtSingleton, and RustyLevelDbVec, which are heavily used in neptune-core. Those types still used the direct LevelDb API from twenty-first. So the foundational work of this PR was to modify the storage layer to use the async NeptuneLevelDb APIs. This required replacing iterators with async streams, and some other changes. Once that was done, all the rest was largely a mechanical exercise of fixing everything that broke as a result. The primary challenges were due to trait entanglement from crates twenty-first, tasm-lib, and even arbitrary. I've documented that elsewhere, so won't go into detail here. [1] if we had a separate OS thread for each peer, mining, triton-vm, etc, we could perhaps get away with blocking I/O, as such tasks would not block eachother. |
It looks like this function needs to be async because the trait functions Was it the intention to make member functions of accumulator objects async, or was this an unintended consequence of making the trait |
That is the primary difference between PR #120 and this one. We have both paths available to choose from. In 120, the Mmr trait is sync, and ArchivalMmr implements the sync methods by spawning OS threads to call async methods for storage. MmrAccumulator and a bunch of things that use it, eg MutatorSetKernel remain sync. In this one, the Mmr trait is made async, which forces MmrAccumulator and everything that uses it to also become async. One additional consideration: I imagine that triton-vm execution of some scripts will be CPU intensive and comparitively long running. tokio docs make it clear that CPU-heavy (long running) function calls harm concurrency, just as blocking I/O does, and should be wrapped with spawn_blocking() if they can't be made async directly. I have not performed any analysis in this area yet, but it may turn out that there is benefit to having MmrAccumulator, MutatorSetKernel and related areas of neptune-core be async. So anyway, that's been at the back of my mind while working on this stuff.
Yeah, so that would be nice if we can make it work without duplicating a mountain of code. In fact I suggested the possibility above:
Indeed I already attempted this briefly, and quickly found it was complex. I discussed it here. Expanding on that explanation, I believe this is a complete list of places that require the Mmr trait:
Each of those would need to be re-written to work for both sync and async scenarios. But then anything that calls them with an ArchivalMmr would need to be async anyway, which appears to be a lot of code. So in the end, much of it needs to become async anyway, and we might end up with something not too different from this PR. at least that's how it appears to me, but..... @aszepieniec you understand this area of the code much better than I do. So if you can see a clear path to getting rid of the Mmr trait, then perhaps you could start a branch based off or anyway, please let me know your thoughts. |
Thanks for clarifying. There's a lot to think about and most of the time I'm silent it's because I don't have a qualified opinion.
I will consider that, but you should know that there are also other things vying for my attention. In regards to the As for the complexity of operations: I did not run any benchmarks but raison d'être for the accumulator variants is their light footprint. Phrased differently, if making them async can be justified on the basis of their long-running CPU profile, then we are doing something very wrong and should reconsider the architecture. |
Just a note that the author of proptest_async crate notified me that he has added tokio support. We don't seem to need async proptests right now, as we are moving forward with #124 instead of this PR, however its good to know it exists for possible future tests.
|
closing in favor of #124. |
addresses #108, #75
obsoletes #120
This is a second pass at making the storage layer async in neptune-core.
note: I've added some background context here.
The first attempt in #120 is simpler and affects less code in neptune-core. Also, all of the existing tests compile and pass. But it had a serious inefficiency with ArchivalMmr, where sync and async worlds collided. I encourage reviewer(s) to read #120, including comments, and perhaps ArchivalMmr code for background first.
If some way could be found to decouple ArchivalMmr from the Mmr trait, that might allow us to use the simpler code in #120. However, I do not see any simple solution because a lot of code is generic over the Mmr trait. Perhaps a reviewer might?
This second attempt builds on #120 (strict superset) and makes the Mmr trait async (in neptune-core only) which resolves the ArchivalMmr issue, though at the cost of requiring much more code to become async, and requiring some tests and test-related infrastructure to be commented out. See notes below.
I believe this is the more correct, performant, and idiomatic implementation, and should be the one we move forward with, although it presents a few challenges.
This PR does still have one inefficiency in tasm::RemovalRecordsIntegrity::rust_shadow(), whereby it spawns an OS thread for each invocation. I am unsure how frequently it is called. Perhaps it can be modified not to call any async method, or refactored out. Reviewers, please have a look.
To facilitate reviewing, I have compiled a summary of changes:
async_storage_pr_notes.txt
If this PR is merged, we will want to find a solution for async proptests, which I have commented out for now. There is a new proptest_async crate, however it presently only works with async_std, not tokio. Last night I sent the author an offer to help add a tokio feature-flag, and have not yet heard back. Anyway, that seems a path forward.
The arbitrary crate does not provide an async trait, so I had to comment out
impl Arbitrary
in 4 cases. (These seem like they should be inside test module anyway.) I did a quick look for an async fork of arbitrary crate, but didn't find anything right away. I imagine/hope that some alternative exists in the async ecosystem.This PR depends on specific revisions of twenty-first and tasm-lib in git. Those revisions now exist in master for each repo, so we need only make a new release for each, and also for triton-vm, which depends on twenty-first.
Testing
Runtime warnings/errors encountered: