-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: thread has overflowed its stack #816
Comments
Could you check where is the stack overflowing? Depending on where it is, it could mean different things:
In the first case we should report it to LLVM and wait until it gets solved (or try to solve it ourselves, of course). The latter case is just a limitation of the current architectures and there's not much more we can do other than just invoking the contract with a bigger stack, but you could end up with gigabytes of stack depending on the program. Some lore about this: Sierra doesn't have the concept of loops; all loops are implemented as recursive functions. Every iteration of the recursive function will use part of the stack. We implemented a tail recursion to iteration conversion for tail-recursive functions so that we can run loop with more than a few hundred/thousand iterations, but this transformation only works for tail-recursive functions. Non-tail-recursive functions are still implemented normally. More information in A workaround for tests which are known to fail with stack overflow can be to run the test in a separate thread with a custom size stack. Rust provides an API to do that. |
@azteca1998 im not super familiar with debugging rust panics like this: what's the usual process to find where the stack is overflowing? |
ok - more update on this: I ran the test with More details: this specific test has a lot of recursions, running 1024 nested ethereum calls, and is quite heavy. Thus I think that's why we hit the default rust stack limits. |
Oh, I see. If I understood correctly it's not that there's a non-tail-recursive loop but rather that there's a lot of nested contract calls, right? We hadn't seen any crash because of that yet. This is (more or less) the same case as a stack overflow due to non-tail-recursive loop; that is, we can't do much more than increase the stack size. There is one difference though: we can do something about it. Once rust stabilizes coroutines we could make the runner use the coroutine, which should have its own separate stack instead of the main one. A different approach would be to start them in threads but they're not as cheap. |
Actually, there's only a single Starknet contract call to the Kakarot main contract. However, there are roughly 1024 EVM contract calls executed inside this test, which is perform a lot of operations. So in the end it's just a very heavy computation |
Then it might be that the VM's main loop is not tail recursive. |
I have a question, if the VM is more resource intensive than Native (thinking on the fact that segments can only grow), shouldn't VM execution crash as well? |
Not really. As you said, the VM uses segments which are just This is not the case in native because we're using the CPU's stack, which is just a pointer. The program doesn't know how much stack it has remaining, and even if we know I'm not sure how we could manipulate the MLIR to check that and avoid crashes. What could work is hooking into the |
Repro:
scarb 2.8.3
clone https://github.com/kkrt-labs/kakarot-ssj on main
scarb build -p contracts
- builds the required contract with some debug print logsclone https://github.com/kkrt-labs/ef-tests
ensure correct branch
git checkout feat/cairo-native
make
make setup-kakarot
copy the contracts built previously:
cp ../kakarot-ssj/target/dev/contracts_* build/v1
cargo test test_baseFeeDiffPlaces_d34g0v0_Cancun --features v1,native -- --nocapture
output:
The text was updated successfully, but these errors were encountered: