panic: resource temporarily unavailable on SunOS Elle 5.11 tribblix-m35 i86pc i386 i86pc #168
Comments
commenting here so i get notified for this thread, and because it's a frankly fascinating setup :p though i would not be surprised if wazero ended up being the limiting factor in this situation, rather than SQLite |
we're the user doing terrifying things with Solaris! Yeah, it seemed to work for a while, then eventually something went horribly wrong and everything detonated. Also got "slow database query" warnings in the GTS logs. |
So is this Solaris 11, or an illumos OS? The problem with Solaris is that:
Given that I'm personally uninterested in implementing POSIX locks, the result will always be rather limited. I've tried to come up with some possible simplifications, but POSIX locks simply defy all logic.
That said, this doesn't appear to be a locking issue, rather an
How much memory can a process map? The default SQLite configuration tries to
That said, given the locking issues above, if this is Solaris, you can't have more than one connection. I don't know what configuration knobs GoToSocial makes available. |
I was wrong, this is trying to allocate the full 4GB. Still not sure if I can/should do something "auto-magically". By default, SQLite asks us to allocate the minimum (hardcoded to 320KB) resizable to a maximum (4GB). This is per connection. Something else could be configured by GoToSocial. This would set the maximum to 64MB (a page is 64KB, 1024 pages are 64MB): Line 14 in 368c900
This would set both minimum and maximum to 64MB: go-sqlite3/internal/testcfg/testcfg.go Lines 19 to 21 in 368c900
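The page arithmetic quoted above can be checked with a short sketch. This is not the code at the referenced lines; the helper names `pagesToBytes` and `bytesToPages` are made up for illustration, and the only assumption is the 64KB page size stated in the comment.

```go
package main

import "fmt"

// wasmPageSize is the page size quoted in the thread: 64 KiB per page.
const wasmPageSize = 64 << 10

// pagesToBytes converts a page count into a byte count (hypothetical helper).
func pagesToBytes(pages uint64) uint64 { return pages * wasmPageSize }

// bytesToPages converts a byte count into pages, rounding up (hypothetical helper).
func bytesToPages(bytes uint64) uint64 {
	return (bytes + wasmPageSize - 1) / wasmPageSize
}

func main() {
	fmt.Println(pagesToBytes(1024)>>20, "MiB for a 1024-page limit")
	fmt.Println(bytesToPages(320<<10), "pages for the 320KB minimum")
	fmt.Println(bytesToPages(4<<30), "pages for the 4GB maximum")
}
```

So a 1024-page limit is 64MB per connection, against 65536 pages (4GB) of address space per connection by default.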
SQLite can probably live with 64MB in 99% of cases (the default cache size is 2MB, so 8MB would cover a lot). But degenerate cases may need a lot more (e.g. inserting large BLOBs, huge WALs, etc). The initial call reserves a lot of address space, but tries not to commit it to memory (which is how it works on Linux, but Solaris may think differently): go-sqlite3/internal/alloc/alloc_unix.go Lines 23 to 28 in 368c900
This ensures we don't need to move memory later, and it allows us to then remap portions of this for shared memory WAL. Not using
As for Solaris, this is mostly irrelevant, because it doesn't support decent file locks, and so can't use shared memory WAL.
One thing you can try @Here-Be-Saoirse is building with the |
@ncruces This is on Tribblix, an Illumos distro. Apologies, we should have been more clear; we use the term Solaris for that whole thing |
Oh, that makes a lot more sense. Unlike Solaris, illumos actually has working BSD locks, and they're even compatible with POSIX locks, just like a BSD. They even claim to have OFD locks, which would be better than the BSDs, but it's a bit deceiving, since they span the entire file (so are simply equivalent to, and likely implemented on top of,
So, I assume the issue is just that illumos (or the VM it's running on, etc) simply doesn't like the idea of reserving 4GB of address space for each connection. Which, tbh, isn't all that surprising. There's a reason I run my parallel tests with the 64MB limit: GitHub Actions doesn't love me having dozens of connections each eating 4GB of contiguous address space (even on Linux, which is unusually liberal about this).
This actually validates one of wazero's controversial design decisions: implement sandboxing with bounds checks instead of mapping even more memory and using page faults to sandbox (what browsers do).
So I guess the fix here might be for me to change the default maximum address space to reserve, for example:
And then enable shared memory WAL on 32-bit platforms, too. Wdyt @tsmethurst? That's a rather simple change I can commit to the repo, if it makes it easier for y'all to test, but I'd rather release it only after SQLite 3.47 comes out, as both are “major” changes (this should be in a couple of weeks, there's a release candidate, which passes all my tests). |
@ncruces That should work. We'd love to test this on actual Solaris, specifically, SPARC64 D Solaris 10u11, but.... golang. Golang is incredibly unfriendly to ports to as-yet unsupported CPU architectures. There doesn't seem to be a from-zero bootstrap process, you seem to need golang to build golang, which...... is certainly a choice. Bloody modern developers |
SPARC64 Solaris 10u11, even. The 'D' there was unintended |
I'll get a gotosocial pr up in a few :) |
To configure
What that won't do is solve WAL on 32-bit platforms for you. Unless you want me to fast track that? I'd rather avoid release notes mentioning breaking compatibility every week. 😬 |
Solaris proper, even if it had a working toolchain, has the file locking issue. It's really insane, IMO, that just opening and closing a file releases locks acquired through another file descriptor. As if I controlled what files other parts of the process open. |
@ncruces That's janky as hell. We're guessing that's something Illumos fixed, that <company whose name we shall not speak because of what they did to Sun Microsystems> were too up the ass of their own Linux cloud offerings to bother fixing? |
here we are! superseriousbusiness/gotosocial#3441 hopefully we'll be able to sneak in another release-candidate of gotosocial so we can test this final change before release of v0.17.0. don't worry about fast-tracking 32bit WAL support. i'm not even sure it's a platform that we're going to officially support given we only officially support our WebAssembly embedded ffmpeg, and the performance for that in interpreted mode of wazero is not very serviceable :') |
That's actually working as specified for POSIX locks. If you have time to waste:
I finally tracked down why this insane behavior was standardized by the POSIX committee by talking to long-time BSD hacker and POSIX standards committee member Kirk McKusick (he of the BSD daemon artwork). As he recalls, AT&T brought the current behavior to the standards committee as a proposal for byte-range locking, as this was how their current code implementation worked. The committee asked other ISVs if this was how locking should be done. The ISVs who cared about byte range locking were the large database vendors such as Oracle, Sybase and Informix (at the time). All of these companies did their own byte range locking within their own applications, none of them depended on or needed the underlying operating system to provide locking services for them. So their unanimous answer was "we don't care". In the absence of any strong negative feedback on a proposal, the committee added it "as-is", and took as the desired behavior the specifics of the first implementation, the brain-dead one from AT&T. |
oh goooods that's jank. That's...... bad. that's extremely not good. |
Very interesting reading, thanks for the investigations! Kim knows way more about the way we use wazero in GtS so I'll defer to her :) But yeah as she said, please don't fast track anything for our sake, it's not necessary! |
One thing that could be interesting to test here is changing wazero to support the compiler on illumos. The requirements for the compiler aren't that big. It hasn't been opened up to other platforms more for lack of testing than it being hard. It's probably a matter of checking out the code, looking for places that have a
I bet that OpenBSD wouldn't work (because “security” that's going to place restrictions on JIT, and us not wanting to deal with that), but other BSDs and illumos might. I can't say I'm sure I'd be able to convince the other wazero maintainers to merge something like that, but “I tested this on actual hardware, it works, and I'll keep testing it” can't hurt. |
This is released, and tested in CI. |
A GoToSocial user is trying some wacky deployment types and is encountering some interesting issues trying to run GtS on the abovementioned operating system. They managed to get some logs out of it which seem pretty interesting!
Do you have any idea what might be causing this? Is there maybe some config change they could make?