
rust: switch to stable compilers #752

Merged

merged 4 commits from the rust-deps branch into ankitects:master on Sep 7, 2020
Conversation

@cyphar (Contributor) commented Sep 2, 2020

The only dependency which required nightly compilers was pyo3, which
supports stable Rust (1.39.0 and later) since version 0.11.0. Supporting
stable Rust makes it easier to package Anki for distributions.

In addition, we can't link against libzip2.so.1 when building manylinux
wheels, but the only dependency which required this was "zip" and we
don't appear to make use of bzip2 in Anki.

One additional problem is that Anki was inadvertently violating Rust's aliasing
rules (it was implicitly passing out multiple &mut Backend references) and
the pyo3 update revealed this behaviour. The solution is to make all of the
methods take &self -- removing all PyBorrowMutErrors -- and then clean up
anything that failed to compile. Since Backend already had plenty of internal
locking (further pointing to aliasing violations) this wasn't too complicated
to do.

Signed-off-by: Aleksa Sarai [email protected]
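
As a rough sketch of the shape of the change (the names below are illustrative stand-ins, not Anki's exact API), every method exposed through pyo3 now takes &self, with any mutation going through an internal lock:

```rust
use std::sync::Mutex;

use pyo3::prelude::*;

// Stand-in for Anki's real collection type.
struct Collection {
    path: String,
}

#[pyclass]
struct Backend {
    // Interior mutability: pyo3 only ever needs to hand out `&self`,
    // so concurrent calls from Python cannot raise PyBorrowMutError.
    col: Mutex<Option<Collection>>,
}

#[pymethods]
impl Backend {
    fn open_collection(&self, path: String) -> PyResult<()> {
        *self.col.lock().unwrap() = Some(Collection { path });
        Ok(())
    }
}
```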

@cyphar cyphar force-pushed the rust-deps branch 2 times, most recently from e96d8bc to 850ae85 on September 2, 2020 at 12:35
@dae (Member) commented Sep 3, 2020

I have no fundamental objections to switching to stable, but currently 'make check' is failing - either another way of excluding that file needs to be found, or it needs to be automatically formatted when it's created

@cyphar (Contributor, Author) commented Sep 3, 2020

The simplest solution might be to support compiling with stable but use cargo +nightly fmt for make check -- which isn't really used outside of development. Though requiring both compilers be installed would be a pain, so I'll see if there's another way to exclude the generated file from cargo fmt (or just format only that file after generation but before the check).

@cyphar (Contributor, Author) commented Sep 3, 2020

Alright, build.rs will now automatically rustfmt the generated protobuf code.
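
For reference, the approach looks roughly like this in build.rs (the proto paths and the generated file name are assumptions here, since prost-build names its output after the protobuf package):

```rust
// build.rs (sketch): run rustfmt over the prost-generated code so that a
// later `cargo fmt -- --check` on a stable toolchain doesn't trip over it.
use std::env;
use std::path::PathBuf;
use std::process::Command;

fn main() {
    prost_build::compile_protos(&["proto/backend.proto"], &["proto/"])
        .expect("protobuf code generation failed");

    // Best effort: if rustfmt isn't on PATH, skip formatting rather than
    // failing the whole build.
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let _ = Command::new("rustfmt")
        .arg(out_dir.join("backend.rs"))
        .status();
}
```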

@dae (Member) commented Sep 3, 2020

Thanks for doing that. Have you used this in testing for any period of time yet? I seem to recall issues using a newer pyo3 - it was previously rolled back in 17edbd1

@dae (Member) commented Sep 3, 2020

An easy way to trigger it is to use Tools>Check Database.

@cyphar (Contributor, Author) commented Sep 3, 2020

Yup, I just managed to trigger a PyBorrowMutError (I could've sworn I tried it before sending a PR and it worked, but I might've tested an old build). I'll take a look at what is needed to fix the BorrowMut problems -- there is a migration document for PyO3 which mentioned this issue (I did try to check if this would be a problem before opening the PR but I think I misunderstood exactly what it meant when talking about PyCell -- I haven't used pyo3 before).

@dae (Member) commented Sep 3, 2020

I fear there's no easy fix for this. We can't chuck a mutex at it because we need to be able to run multiple calls in parallel. Changing the code to run things like syncing on a background thread is probably the long term fix, but it will be some time before I have a chance to look into that.

prost-build doesn't generate rustfmt-safe code, so we had to add it to
the ignore list for rustfmt on "make check". However, the ignore list
isn't supported by stable rustfmt, so we have to work around this some
other way -- in this case, just do "rustfmt" on the generated file in
"build.rs" (this way, formatting errors in checked-in source code are
still caught but generated code doesn't cause spurious errors).

Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar (Contributor, Author) commented Sep 4, 2020

tl;dr I think I found the easy fix for this.

I started working on it yesterday and I have a version which doesn't trigger PyBorrowMutError errors, and doesn't require putting everything behind Mutex<Backend>. In fact it currently only adds a Mutex for the sync_abort in the backend -- and I actually think that the sync_abort backend handling could be done in a nicer fashion. I've pushed a WIP patch which implements this, but I'll give a quick explanation for my thought process:

The main change which makes this entire procedure much simpler is to simply switch BackendService to take &self everywhere -- this entirely removes all PyBorrowMutError errors because we now no longer need mutable borrows (and it eliminates a whole class of runtime errors). In fact almost all of the &mut usage is actually not needed because self.col, self.progress_state and self.state are all behind internal Mutexes anyway!

Making this minor change reveals that there are only two things that aren't Mutexed:

  1. self.runtime which is used as a tokio::runtime::Handle cache. Putting this behind a Mutex does solve things, but it's less than ideal -- we only ever take a clone of the handle on each self.runtime_handle() call anyway (so we only ever need immutable &T access)! In order to keep the current lazy-initialisation behaviour I just opted to use once_cell which gives us only immutable references and uses less heavy synchronisation than a full-on Mutex.

  2. self.sync_abort is a bit uglier to hack around. The solution I went with was to just wrap it in a Mutex (and then only hold the lock when we set or clear it) -- this works and should not cause /too much/ lock contention. I also decided to make this use an RAII-style guard but I can switch this back to being open-coded.
     
    However while implementing this, I noticed that sync_abort doesn't handle multiple sync operations being queued -- meaning that if you have an existing AbortHandle registered it will be cleared if another thread tries to trigger a sync at the same time. The code now can detect this and will trigger a panic! (not ideal, just did it this way to make it simpler to show a PoC) but a nicer solution would be to make sync_abort a Vec<AbortHandle> -- and then when we do the abort we abort all existing sync operations (or we abort any old sync operation if a new one is requested).

With these changes, I can now build Anki and it no longer has any PyBorrowMutErrors -- and as far as I can tell Anki works perfectly fine (no performance problems, synchronisation works perfectly fine, and Check Database works without a hitch). I'm not sure exactly how extensively you would like me to test things, but it does appear to work with the latest patch I've pushed.

(As an aside, this is probably also a general memory safety issue, because the need for locking behind a &mut makes little sense and seems to indicate that there are actually memory safety issues (aliasing &mut references is undefined behaviour, and I think that's what's happening right now -- hence the breaking change in pyo3). I imagine that if you actually had two threads in Python calling operations which involve sync_abort while the GIL is released, you would probably get lots of data races, because they would write to the same memory without any locks.)
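
For concreteness, a minimal sketch of the resulting layout (field and method names approximate the ones discussed above; this is not the merged code verbatim):

```rust
use std::sync::Mutex;

use futures::future::AbortHandle;
use once_cell::sync::OnceCell;
use tokio::runtime::{Handle, Runtime};

struct Backend {
    // Lazily-initialised runtime: OnceCell only ever hands out shared
    // references, so no &mut self (and no Mutex) is needed to get a Handle.
    runtime: OnceCell<Runtime>,
    // Set when a sync starts, cleared when it finishes; the lock is only
    // held for those brief moments.
    sync_abort: Mutex<Option<AbortHandle>>,
}

impl Backend {
    fn runtime_handle(&self) -> Handle {
        self.runtime
            .get_or_init(|| Runtime::new().expect("failed to create tokio runtime"))
            .handle()
            .clone()
    }

    fn set_sync_abort(&self, handle: AbortHandle) {
        let mut guard = self.sync_abort.lock().unwrap();
        if guard.is_some() {
            // Two syncs queued at once: don't silently clobber the first.
            eprintln!("warning: a sync abort handle was already registered");
        }
        *guard = Some(handle);
    }
}
```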

@cyphar cyphar force-pushed the rust-deps branch 2 times, most recently from 4ea905c to 9fda265 on September 4, 2020 at 07:00
@cyphar: this comment has been minimized.

@dae (Member) commented Sep 5, 2020

Appreciate you looking into it - that didn't turn out to be as bad as I was expecting.

Rather than having the runtime created as needed, I'd been planning to pass a handle in at backend creation time, so that a single runtime can be shared by multiple collections & configured by the callers. It's not urgent though, so happy to run with once_cell for now if you don't feel like changing it.

A vec or hashmap of abort handles would be better, but until then, I'd prefer the code doesn't panic - just logging to the console should suffice.

@cyphar (Contributor, Author) commented Sep 5, 2020

In addition to the following, I'll remove BackendAbortGuard and simply use scopeguard to avoid cluttering the code with our own RAII guard implementation for something as trivial as this.
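
Roughly what that looks like with scopeguard (a sketch under the same assumed field names as above, not the final code):

```rust
use std::sync::Mutex;

use futures::future::{AbortHandle, Abortable};
use scopeguard::guard;

struct Backend {
    sync_abort: Mutex<Option<AbortHandle>>,
}

impl Backend {
    async fn full_sync(&self) {
        let (abort_handle, abort_reg) = AbortHandle::new_pair();
        *self.sync_abort.lock().unwrap() = Some(abort_handle);

        // scopeguard clears the handle on every exit path, so no bespoke
        // RAII type is needed for something this small.
        let _clear = guard((), |_| {
            *self.sync_abort.lock().unwrap() = None;
        });

        // The real sync future would be wrapped in Abortable here.
        let _ = Abortable::new(async { /* network sync */ }, abort_reg).await;
    }
}
```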

Rather than having the runtime created as needed, I'd been planning to pass a handle in at backend creation time, so that a single runtime can be shared by multiple collections & configured by the callers. It's not urgent though, so happy to run with once_cell for now if you don't feel like changing it.

I'll take a look but if it ends up being a bit odd to implement, I can do that in a follow-up PR.

A vec or hashmap of abort handles would be better, but until then, I'd prefer the code doesn't panic - just logging to the console should suffice.

Yup, I'll just make it log for now -- the slightly tricky thing is that AbortHandle and AbortRegistration don't implement Eq (or Hash) so we'd need to generate our own unique IDs each time and that'll just be a bit too hairy to do in the middle of this PR. I'd be happy to look into it in the future though.

Though it should be noted that media_sync_abort handles this by just making sync_media_inner a no-op...


One other thing I wanted to mention, which I noticed while looking a bit more at the abort code, is that there appear to be three different ways of aborting an operation:

  • progress_state.wants_abort which appears to be a mechanism to tell a program that uses the progress_fn that the user has requested that they stop doing work.
  • state.media_sync_abort and sync_abort, which are split into handles for aborting media synchronisation and collection synchronisation (but not media?) respectively.

I won't touch these in this PR (since it's not really relevant), but I do wonder whether this could be simplified a little bit -- is there a case where a user or plugin would want to abort media synchronisation but not any other kind of synchronisation that is happening at the same time?

In addition (maybe I'm wrong here), shouldn't the whole Future abort system avoid the need for progress_state.wants_abort? If you also abort the future which would call progress_fn and be told it should stop doing work, then it won't ever read wants_abort. Or do long-running threads get spawned which need an explicit mechanism to say "stop working"?

Thanks.

@cyphar (Contributor, Author) commented Sep 5, 2020

I'll take a look but if it ends up being a bit odd to implement, I can do that in a follow-up PR.

I'll leave the backend configuration for a separate PR since we'd need to update the protobuf schema in order to add the necessary information so that Python can configure the runtime -- and if you wanted to share the same runtime among Backends you'd actually need to have a corresponding #[pyclass] of some kind to represent a Runtime. But I can take a look at that some other time.
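
If it goes that way, one plausible shape (purely hypothetical, not something that exists in Anki today) would be a small #[pyclass] owning the shared runtime, constructed once on the Python side and handed to each Backend:

```rust
use std::sync::Arc;

use pyo3::prelude::*;
use tokio::runtime::Runtime;

// Hypothetical wrapper: Python would create one of these and pass it to
// every Backend so that all open collections share a single tokio runtime.
#[pyclass]
struct SharedRuntime {
    rt: Arc<Runtime>,
}

#[pymethods]
impl SharedRuntime {
    #[new]
    fn new() -> Self {
        SharedRuntime {
            rt: Arc::new(Runtime::new().expect("failed to create tokio runtime")),
        }
    }
}
```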

The previous implementation had some slightly questionable memory safety
properties (older versions of PyO3 didn't uphold the Rust aliasing rules
and would thus create multiple &mut references to #[pyclass] objects).
This explains why Backend has internal Mutex<T>s even though all of its
methods took &mut self.

The solution is to simply make all methods take &self, which luckily
doesn't pose too many issues -- most of the code inside Backend already
has sufficient locking. The only two things which needed to be
explicitly handled were:

1. "self.runtime" which was fairly easy to handle. All usages of
   the Runtime only require an immutable reference to create a new
   Handle, so we could switch to OnceCell which provides
   lazy-initialisation semantics without needing a more heavy-handed
   Mutex<tokio::runtime::Handle>.

2. "self.sync_abort" was simply wrapped in a Mutex<>, though some of the
   odd semantics of sync_abort (not being able to handle multiple
   processes synchronising at the same time) become pretty obvious with
   this change (for now we just log a warning in that case). In
   addition, switch to an RAII-style guard to make sure we don't forget
   to clear the abort_handle.

As a result, we now no longer break Rust's aliasing rules and we can
build with newer versions of PyO3 which have runtime checks for these
things (and build on stable Rust).

Signed-off-by: Aleksa Sarai <[email protected]>
We can't link against libzip2.so.1 when building manylinux wheels, but
the only dependency which required this was "zip" and we don't appear to
make use of bzip2 in Anki.

This fixes the following "make build" error on Linux:

  x maturin failed
    Caused by: Failed to ensure manylinux compliance
    Caused by: Your library is not manylinux compliant because it links the following forbidden libraries: ["libbz2.so.1"]

Signed-off-by: Aleksa Sarai <[email protected]>
The only dependency which required nightly compilers was pyo3, which
supports stable Rust (1.39.0 and later) since version 0.11.0. Supporting
stable Rust makes it easier to package Anki for distributions. No other
code changes were required.

Signed-off-by: Aleksa Sarai <[email protected]>
@dae dae merged commit 0f1e277 into ankitects:master Sep 7, 2020
@dae (Member) commented Sep 7, 2020

Thank you for your work on this!

The two separate methods of cancelling are ugly, but they address different needs - the progress callback can be used to abort long-running operations on the current thread, like "check media" and the media folder scan at the start of a media sync, while the abort handle is used to abort network operations that may hang for long periods of time without calling the progress handler. When the media checking code was written we hadn't yet bought into any of the async ecosystem, and while the code could theoretically be moved to a blocking future pool, AFAIK tokio can not actually terminate code running in the blocking pool - so an abort handle would return control, but the code would continue running in the background until it completed, with its computed results being discarded.
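
To make the first mechanism concrete, cooperative cancellation boils down to the long-running code polling a flag that the progress callback can set (a simplified sketch; Anki's actual progress state lives behind its own lock rather than an atomic):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Simplified stand-in for progress_state.wants_abort: CPU-bound work on the
// current thread checks the flag periodically and bails out when it is set.
fn check_media(wants_abort: &AtomicBool) -> Result<u32, &'static str> {
    let mut checked = 0;
    for _file in 0..10_000 {
        if wants_abort.load(Ordering::Relaxed) {
            return Err("aborted by user");
        }
        // ... examine one media file ...
        checked += 1;
    }
    Ok(checked)
}
```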

By hashmap, I was thinking of the key identifying which kind of operation it is, eg "media_sync" or an integer constant for each different operation.
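
Presumably something along these lines (an illustrative sketch of the keyed registry, not merged code):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

use futures::future::AbortHandle;

// One registry for all cancellable operations, keyed by which operation it
// is ("media_sync", "collection_sync", ...) rather than a single Option.
struct AbortHandles {
    handles: Mutex<HashMap<&'static str, AbortHandle>>,
}

impl AbortHandles {
    fn register(&self, op: &'static str, handle: AbortHandle) {
        self.handles.lock().unwrap().insert(op, handle);
    }

    fn abort(&self, op: &'static str) {
        if let Some(handle) = self.handles.lock().unwrap().remove(op) {
            handle.abort();
        }
    }
}
```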

@cyphar cyphar deleted the rust-deps branch September 7, 2020 07:11
dae added a commit that referenced this pull request Sep 22, 2020
rust: switch to stable compilers
@timokau commented Dec 21, 2020

Just in case you aren't aware, an Anki build currently still needs nightly Rust to build the orjson dependency. This has been blocking the update for us at NixOS. Since one of the stated goals of this PR was to make packaging easier, I thought you might appreciate a heads-up.

In any case, thank you for your efforts! This PR is already a big step.

@cyphar: this comment has been minimized.

@cyphar (Contributor, Author) commented Dec 21, 2020

Oh, I misunderstood -- orjson is a Python dependency. Yeah that sucks, but aside from moving away from orjson there's not much Anki can do about it.

@dae (Member) commented Dec 21, 2020

When I tested it in the past, orjson was significantly faster than the stock json library for bulk DB operations. You could patch the compatibility wrapper for the latter back into Anki if you want to avoid the pre-built orjson binary on PyPI, but your users would be getting a slower version of Anki.

Doesn't look like orjson's in a hurry to move away from nightly unfortunately: ijl/orjson#108

@timokau commented Dec 21, 2020

You could patch the compatibility wrapper for the latter back into Anki if you want to avoid the pre-built orjson binary on PyPI

Thanks for the suggestion. Pre-built binaries can be problematic, not just from a purity standpoint but also from a pragmatic one (supporting different architectures for example). Slowness is probably better than being stuck on an old version. Can the standard library version be used as a drop-in replacement for Anki's purposes?

@dae (Member) commented Dec 21, 2020

Yes: b0aedd6

@timokau commented Dec 21, 2020

Great, thank you!

@dae (Member) commented Jan 6, 2021

I've added the wrapper back to the latest git: f9e939a

@timokau commented Jan 7, 2021

Thanks again :)
