panic: resource temporarily unavailable on SunOS Elle 5.11 tribblix-m35 i86pc i386 i86pc #168

Closed
tsmethurst opened this issue Oct 13, 2024 · 20 comments · Fixed by #170
Labels
bug Something isn't working

Comments

@tsmethurst

A GoToSocial user is trying some wacky deployment types and is encountering some interesting issues trying to run GtS on the abovementioned operating system. They managed to get some logs out of it which seem pretty interesting!

panic: resource temporarily unavailable
 
goroutine 6 [running]:
github.com/ncruces/go-sqlite3/internal/alloc.Virtual(0x3?, 0x100000000)
        github.com/ncruces/[email protected]/internal/alloc/alloc_unix.go:27 +0xd2
github.com/tetratelabs/wazero/experimental.MemoryAllocatorFunc.Allocate(0x260e660?, 0x40df01?, 0xc0336c3250?)
        github.com/tetratelabs/[email protected]/experimental/memory.go:28 +0x1c
github.com/tetratelabs/wazero/internal/wasm.NewMemoryInstance(0xc001cc8110, {0x2f33a60?, 0x2ab5e78?}, {0x2f4dce8, 0xc0334c76c0})
        github.com/tetratelabs/[email protected]/internal/wasm/memory.go:78 +0xa8
github.com/tetratelabs/wazero/internal/wasm.(*ModuleInstance).buildMemory(0xc0337a0600, 0xc0016fa4e0, {0x2f33a60?, 0x2ab5e78?})
        github.com/tetratelabs/[email protected]/internal/wasm/module.go:653 +0x4c
github.com/tetratelabs/wazero/internal/wasm.(*Store).instantiate(0xc000116fc0, {0x2f3f738, 0xc0337dbd40}, 0xc0016fa4e0, {0x0, 0x0}, 0xc033b8e2c0, {0xc001f0c000, 0x3a, 0x3a})
        github.com/tetratelabs/[email protected]/internal/wasm/store.go:370 +0x365
github.com/tetratelabs/wazero/internal/wasm.(*Store).Instantiate(0xc000116fc0, {0x2f3f738, 0xc0337dbd40}, 0x30?, {0x0?, 0x30?}, 0xc00006e908?, {0xc001f0c000, 0x3a, 0x3a})
        github.com/tetratelabs/[email protected]/internal/wasm/store.go:327 +0x5a
github.com/tetratelabs/wazero.(*runtime).InstantiateModule(0xc0016f8720, {0x2f3f738, 0xc0337dbd40}, {0x2f48bd0, 0xc001f0a040}, {0x2f56a40, 0xc0051e8d20})
        github.com/tetratelabs/[email protected]/runtime.go:318 +0x1e5
github.com/ncruces/go-sqlite3.instantiateSQLite()
        github.com/ncruces/[email protected]/sqlite.go:99 +0x2be
github.com/ncruces/go-sqlite3.newConn({0x2f3f700, 0x60d8460}, {0xc000dc25a0, 0x9b}, 0x46)
        github.com/ncruces/[email protected]/conn.go:76 +0xb4
github.com/ncruces/go-sqlite3.OpenContext(...)
        github.com/ncruces/[email protected]/conn.go:49
github.com/ncruces/go-sqlite3/driver.(*connector).Connect(0xc0016bae10, {0x2f3f700, 0x60d8460})
        github.com/ncruces/[email protected]/driver/driver.go:212 +0xeb
github.com/superseriousbusiness/gotosocial/internal/db/sqlite.(*sqliteConnector).Connect(0x2e5337dbad0?, {0x2f3f700?, 0x60d8460?})
        github.com/superseriousbusiness/gotosocial/internal/db/sqlite/driver.go:64 +0x2a
database/sql.(*DB).conn(0xc0016d8c30, {0x2f3f700, 0x60d8460}, 0x1)
        database/sql/sql.go:1415 +0x71e
database/sql.(*DB).query(0xc0016d8c30, {0x2f3f700, 0x60d8460}, {0xc033b9a000, 0x19}, {0x0, 0x0, 0x0}, 0x0?)
        database/sql/sql.go:1749 +0x57
database/sql.(*DB).QueryContext.func1(0xb0?)
        database/sql/sql.go:1732 +0x4f
database/sql.(*DB).retry(0x411c01?, 0xc0336c3b90)
        database/sql/sql.go:1566 +0x42
database/sql.(*DB).QueryContext(0x0?, {0x2f3f700?, 0x60d8460?}, {0xc033b9a000?, 0x0?}, {0x0?, 0x2f3f700?, 0x60d8460?})
        database/sql/sql.go:1731 +0xc5
github.com/uptrace/bun.(*baseQuery).scan(0xc033ab83c0, {0x2f3f700?, 0x60d8460?}, {0x2f40128?, 0xc033ab83c0?}, {0xc033b9a000, 0x19}, {0x2f368a0, 0xc033753720}, 0x1)
        github.com/uptrace/[email protected]/query_base.go:562 +0x10c
github.com/uptrace/bun.(*SelectQuery).Scan(0xc033ab83c0, {0x2f3f700, 0x60d8460}, {0xc03379b150?, 0x25aca40?, 0x58?})
        github.com/uptrace/[email protected]/query_select.go:885 +0x165
github.com/superseriousbusiness/gotosocial/internal/db/bundb.(*applicationDB).GetAllTokens(0xc0053c7370, {0x2f3f700, 0x60d8460})
        github.com/superseriousbusiness/gotosocial/internal/db/bundb/application.go:142 +0x42d
github.com/superseriousbusiness/gotosocial/internal/oauth.(*tokenStore).sweep(0xc0000558a0, {0x2f3f700, 0x60d8460})
        github.com/superseriousbusiness/gotosocial/internal/oauth/tokenstore.go:71 +0x3b
github.com/superseriousbusiness/gotosocial/internal/oauth.newTokenStore.func1({0x2f3f700, 0x60d8460}, 0xc0000558a0)
        github.com/superseriousbusiness/gotosocial/internal/oauth/tokenstore.go:58 +0xfb
created by github.com/superseriousbusiness/gotosocial/internal/oauth.newTokenStore in goroutine 1
        github.com/superseriousbusiness/gotosocial/internal/oauth/tokenstore.go:49 +0xf1

Do you have any idea what might be causing this? Is there maybe some config change they could make?

@NyaaaWhatsUpDoc
Contributor

commenting here so i get notified for this thread, and because it's a frankly fascinating setup :p

though i would not be surprised if wazero ended up being the limiting factor in this situation, rather than SQLite

@Here-Be-Saoirse

we're the user doing terrifying things with Solaris! Yeah, it seemed to work for a while, then eventually something went horribly wrong and everything detonated. Also got "slow database query" warnings in the GTS logs.

@ncruces
Owner

ncruces commented Oct 13, 2024

So is this a Solaris 11, or an illumos OS?

The problem with Solaris is that:

  1. it doesn't support OFD locks (to be frank, only Linux does, and on macOS they're a private API)
  2. their BSD flock “compatibility” implementation is broken.

Given that I'm personally uninterested in implementing POSIX locks, the result will always be rather limited. I've tried to come up with some possible simplifications, but POSIX locks simply defy all logic.

That said this doesn't appear to be a locking issue, rather an mmap issue.

How much memory can a process map? The default SQLite configuration tries to mmap 4GB per connection, which is probably excessive. It's uncommitted memory, so it just reserves address space, which works out great on 64-bit Linux (even with overcommit disabled), but maybe it breaks other Unix OSes.

That said, given the locking issues above, if this is Solaris, you can't have more than one connection. I don't know what configuration knobs GoToSocial makes available.

@ncruces
Owner

ncruces commented Oct 13, 2024

If I understand this correctly, this is trying to allocate 256MB, and mmap is returning EAGAIN:

github.com/ncruces/go-sqlite3/internal/alloc.Virtual(0x3?, 0x100000000)

So there's already some configuration in place (otherwise we would be allocating the full 4GB).

I'm not sure there's anything I can do here, besides maybe disabling the use of mmap on Solaris (it seems to be of little benefit).

@ncruces
Owner

ncruces commented Oct 14, 2024

I was wrong, this is trying to allocate the full 4GB. Still not sure if I can/should do something "auto-magically".

By default, SQLite asks us to allocate the minimum (hardcoded to 320KB) resizable to a maximum (4GB). This is per connection. Something else could be configured by GoToSocial.

This would set the maximum to 64MB (a page is 64KB, 1024 pages are 64MB):

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().WithMemoryLimitPages(1024)

This would set both minimum and maximum to 64MB:

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().
	WithMemoryCapacityFromMax(true).
	WithMemoryLimitPages(1024)

SQLite can probably live with 64MB in 99% of cases (the default cache size is 2MB, so 8MB would cover a lot). But degenerate cases may need a lot more (e.g. inserting large BLOBs, huge WALs, etc).
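The page arithmetic behind those limits can be checked with a small sketch. `pagesForBytes` is a hypothetical helper, not part of wazero or go-sqlite3; it only assumes the 64KB page size stated above, and its result is the kind of value one would pass to `WithMemoryLimitPages`.

```go
package main

import "fmt"

// wasmPageSize is the WebAssembly page size mentioned above: 64 KiB.
const wasmPageSize = 64 << 10

// pagesForBytes returns how many Wasm pages are needed to hold n bytes,
// rounding up. Hypothetical helper, not a wazero or go-sqlite3 API.
func pagesForBytes(n uint64) uint32 {
	return uint32((n + wasmPageSize - 1) / wasmPageSize)
}

func main() {
	fmt.Println(pagesForBytes(64 << 20))    // 64MB limit: prints 1024
	fmt.Println(pagesForBytes(0x100000000)) // 4GB, the size in the stack trace: prints 65536
}
```

The second call uses the `0x100000000` argument from the stack trace, which is the full 4GB default.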

The initial call reserves a lot of address space, but tries not to commit it to memory (which is how it works on Linux, but Solaris may think differently):

// Reserve max bytes of address space, to ensure we won't need to move it.
// A protected, private, anonymous mapping should not commit memory.
b, err := unix.Mmap(-1, 0, int(max), unix.PROT_NONE, unix.MAP_PRIVATE|unix.MAP_ANON)
if err != nil {
	panic(err)
}

This ensures we don't need to move memory later, and it allows us to then remap portions of this for shared memory WAL. Not using mmap means we can't use shared memory WAL, as does moving the memory. Figuring out a better strategy would allow (e.g.) Linux 32-bit to support shared memory WAL (fixing many warnings here, @tsmethurst, @NyaaaWhatsUpDoc).

As for Solaris, this is mostly irrelevant, because it doesn't support decent file locks, and so can't use shared memory WAL. One thing you can try, @Here-Be-Saoirse, is building with the sqlite3_nosys tag and reporting back.

@ncruces ncruces added the enhancement New feature or request label Oct 14, 2024
@Here-Be-Saoirse

@ncruces This is on Tribblix, an Illumos distro. Apologies, we should have been more clear; we use the term Solaris for that whole thing

@ncruces
Owner

ncruces commented Oct 14, 2024

@ncruces This is on Tribblix, an Illumos distro. Apologies, we should have been more clear; we use the term Solaris for that whole thing

Oh, that makes a lot more sense.

Unlike Solaris, illumos actually has working BSD locks, and they're even compatible with POSIX locks, just like a BSD. They even claim to have OFD locks, which would be better than the BSDs, but it's a bit deceiving, since they span the entire file (so are simply equivalent to, and likely implemented on top of, flock).

So, I assume the issue is just that illumos (or the VM it's running on, etc) simply doesn't like the idea of reserving 4GB of address space for each connection. Which, tbh, isn't all that surprising. There's a reason I run my parallel tests with the 64MB limit: GitHub Actions doesn't love me having dozens of connections each eating 4GB of contiguous address space (even on Linux, which is unusually liberal about this).

This actually validates one of wazero's controversial design decisions: implement sandboxing with bounds checks instead of mapping even more memory and using page faults to sandbox (what browsers do).

So I guess the fix here might be for me to change the default maximum address space to reserve, for example:

  • 256MB on 64-bit platforms;
  • 32MB on 32-bit platforms.

And then enable shared memory WAL on 32-bit platforms, too. Wdyt @tsmethurst?
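The proposed defaults could be selected by word size roughly like this. It is a sketch of the proposal as written in this comment, not the code that was eventually released; `defaultMaxBytes` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// defaultMaxBytes returns the proposed maximum address space to reserve
// per connection for a platform with the given word size in bits.
// Hypothetical helper: the figures come from the proposal above.
func defaultMaxBytes(wordBits int) uint64 {
	if wordBits >= 64 {
		return 256 << 20 // 256MB on 64-bit platforms
	}
	return 32 << 20 // 32MB on 32-bit platforms
}

func main() {
	// bits.UintSize is 32 or 64, depending on the build target.
	fmt.Println(defaultMaxBytes(bits.UintSize))
}
```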

That's a rather simple change I can commit to the repo, if it makes it easier for y'all to test, but I'd rather release it only after SQLite 3.47 comes out, as both are “major” changes (this should be in a couple of weeks, there's a release candidate, which passes all my tests).

@Here-Be-Saoirse

@ncruces That should work. We'd love to test this on actual Solaris, specifically, SPARC64 D Solaris 10u11, but.... golang. Golang is incredibly unfriendly to ports to as-yet unsupported CPU architectures. There doesn't seem to be a from-zero bootstrap process, you seem to need golang to build golang, which...... is certainly a choice. Bloody modern developers

@Here-Be-Saoirse

SPARC64 Solaris 10u11, even. The 'D' there was unintended

@NyaaaWhatsUpDoc
Contributor

I'll get a gotosocial pr up in a few :)

@ncruces
Owner

ncruces commented Oct 14, 2024

I'll get a gotosocial pr up in a few :)

To configure WithMemoryLimitPages()? That's a good idea; I'll probably do it in go-sqlite3 too (as I wrote above), but that's more immediate for you and I'll love the feedback.

What that won't do is solve WAL on 32-bit platforms for you. Unless you want me to fast track that? I'd rather avoid release notes mentioning breaking compatibility every week. 😬

@ncruces
Owner

ncruces commented Oct 14, 2024

SPARC64 Solaris 10u11, even. The 'D' there was unintended

Solaris proper, even if it had a working toolchain, has the file locking issue.

It's really insane, IMO, that just opening and closing a file releases locks acquired through another file descriptor. As if I controlled what files other parts of the process open.

@Here-Be-Saoirse

@ncruces That's janky as hell. We're guessing that's something Illumos fixed, that <company whose name we shall not speak because of what they did to Sun Microsystems> were too up the ass of their own Linux cloud offerings to bother fixing?

@NyaaaWhatsUpDoc
Contributor

I'll get a gotosocial pr up in a few :)

To configure WithMemoryLimitPages()? That's a good idea; I'll probably do it in go-sqlite3 too (as I wrote above), but that's more immediate for you and I'll love the feedback.

What that won't do is solve WAL on 32-bit platforms for you. Unless you want me to fast track that? I'd rather avoid release notes mentioning breaking compatibility every week. 😬

here we are! superseriousbusiness/gotosocial#3441

hopefully we'll be able to sneak in another release-candidate of gotosocial so we can test this final change before release of v0.17.0.

don't worry about fast-tracking 32bit WAL support. i'm not even sure it's a platform that we're going to officially support given we only officially support our WebAssembly embedded ffmpeg, and the performance for that in interpreted mode of wazero is not very serviceable :')

@ncruces
Owner

ncruces commented Oct 15, 2024

@ncruces That's janky as hell. We're guessing that's something Illumos fixed, that <company whose name we shall not speak because of what they did to Sun Microsystems> were too up the ass of their own Linux cloud offerings to bother fixing?

That's actually working as specified for POSIX locks. If you have time to waste:
https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html

I finally tracked down why this insane behavior was standardized by the POSIX committee by talking to long-time BSD hacker and POSIX standards committee member Kirk McKusick (he of the BSD daemon artwork). As he recalls, AT&T brought the current behavior to the standards committee as a proposal for byte-range locking, as this was how their current code implementation worked. The committee asked other ISVs if this was how locking should be done. The ISVs who cared about byte range locking were the large database vendors such as Oracle, Sybase and Informix (at the time). All of these companies did their own byte range locking within their own applications, none of them depended on or needed the underlying operating system to provide locking services for them. So their unanimous answer was "we don't care". In the absence of any strong negative feedback on a proposal, the committee added it "as-is", and took as the desired behavior the specifics of the first implementation, the brain-dead one from AT&T.

@Here-Be-Saoirse

oh goooods that's jank. That's...... bad. that's extremely not good.

@tsmethurst
Author

Very interesting reading, thanks for the investigations! Kim knows way more about the way we use wazero in GtS so I'll defer to her :) But yeah as she said, please don't fast track anything for our sake, it's not necessary!

@ncruces
Owner

ncruces commented Oct 15, 2024

One thing that could be interesting to test here is changing wazero to support the compiler on illumos.

The requirements for the compiler aren't that big. It hasn't been opened up to other platforms more for lack of testing than it being hard. It's probably a matter of checking out the code, looking for places that have a freebsd build tag, adding illumos and testing.

I bet that OpenBSD wouldn't work (because “security” that's going to place restrictions on JIT, and us not wanting to deal with that), but other BSDs and illumos might.

I can't say I'm sure I'd be able to convince the other wazero maintainers to merge something like that, but “I tested this in actual hardware, it works, and I'll keep testing it” can't hurt.

@ncruces ncruces added bug Something isn't working and removed enhancement New feature or request labels Oct 17, 2024
@ncruces
Owner

ncruces commented Oct 30, 2024

SPARC64 Solaris 10u11, even. The 'D' there was unintended

If you ever get Go running, you can try #179 (and, maybe later, #180) on Solaris.

The only dependency for sqlite3_dotlk is an atomic Mkdir, and it may eventually support single process “shared memory” WAL.

@ncruces
Owner

ncruces commented Nov 6, 2024

This is released, and tested in CI.
