PRuntime from multiple nodes crashed at the same time #1087

Closed
jimiflowers opened this issue Dec 28, 2022 · 5 comments

@jimiflowers

jimiflowers commented Dec 28, 2022

Hi. I have 4 solo-mining nodes, all running on the same network (behind NAT). Last night the PRuntime process crashed on all of them at about the same time (within a 5-minute window, between 23:55:00 and 23:59:59). The error is the same on all of them:

[2022-12-27T23:56:04.132283Z INFO  enclaveapp] pRPC status code: 200, data len: 5
[2022-12-27T23:56:04.141889Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:04.141935Z INFO  rocket::server] Response succeeded.
[2022-12-27T23:56:04.205252Z INFO  rocket::server] POST /prpc/PhactoryAPI.DispatchBlocks:
[2022-12-27T23:56:04.205277Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:04.217149Z INFO  phactory::prpc_service] Dispatching request: PhactoryAPI.DispatchBlocks
[2022-12-27T23:56:04.226339Z INFO  phactory::prpc_service] dispatch_block from=Some(2995149) to=Some(2995149)
[2022-12-27T23:56:04.226389Z INFO  phactory::prpc_service] Dispatching block: 2995149
[2022-12-27T23:56:04.303587Z INFO  phactory::prpc_service] State synced
[2022-12-27T23:56:04.304596Z INFO  phactory] Taking checkpoint...
memory allocation of 272629776 bytes failed
[2022-12-27T23:56:04.314467Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:04.314611Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:04.314655Z INFO  rocket::server] Response succeeded.
thread 'bench-0' panicked at 'Run benchmark 0 failed', src/main.rs:703:25
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/core/src/panicking.rs:106:14
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'bench-3' panicked at 'Run benchmark 3 failed', src/main.rs:703:25
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/core/src/panicking.rs:106:14
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'bench-4' panicked at 'Run benchmark 4 failed', src/main.rs:703:25
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/core/src/panicking.rs:106:14
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2022-12-27T23:56:06.484756Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:06.484783Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:06.486811Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:06.486963Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:06.487026Z INFO  rocket::server] Response succeeded.
thread 'bench-1' panicked at 'Run benchmark 1 failed', src/main.rs:703:25
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/core/src/panicking.rs:106:14
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2022-12-27T23:56:08.571979Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:08.572005Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:08.573535Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:08.573561Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:08.573601Z INFO  rocket::server] Response succeeded.
[2022-12-27T23:56:10.637287Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:10.637312Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:10.639062Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:10.639087Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:10.639127Z INFO  rocket::server] Response succeeded.
thread 'bench-2' panicked at 'Run benchmark 2 failed', src/main.rs:703:25
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/core/src/panicking.rs:106:14
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2022-12-27T23:56:12.710171Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:12.710195Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:12.712162Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:12.712209Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:12.712259Z INFO  rocket::server] Response succeeded.
[2022-12-27T23:56:14.787281Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:14.787304Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:14.788861Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:14.788883Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:14.788923Z INFO  rocket::server] Response succeeded.
[2022-12-27T23:56:16.898260Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:16.898283Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:16.899986Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!
[2022-12-27T23:56:16.900010Z INFO  rocket::server] Outcome: Success
[2022-12-27T23:56:16.900054Z INFO  rocket::server] Response succeeded.
[2022-12-27T23:56:18.972754Z INFO  rocket::server] POST /prpc/PhactoryAPI.GetInfo:
[2022-12-27T23:56:18.972782Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
[2022-12-27T23:56:18.974928Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!

Restarting only PRuntime had no effect. After restarting all processes, the nodes work fine for a few minutes, but then PRuntime crashes again with the same error.

Here are the logs from 2 of the 4 nodes.

node1_logs.tar.gz
node2_logs.tar.gz
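
For anyone checking their own logs for the same signature, here is a minimal sketch that looks for the failed allocation and reports whether it directly follows a "Taking checkpoint..." line, as it does in the excerpt above. The function name `summarize_crash` and the returned dictionary are hypothetical helpers, not part of pRuntime; the patterns are copied from the log lines in this report. The requested size, 272629776 bytes, is 260 MiB plus 16 bytes.

```python
import re

# Patterns copied from the log excerpt in this issue.
ALLOC_FAILED = re.compile(r"memory allocation of (\d+) bytes failed")
CHECKPOINT = "Taking checkpoint..."


def summarize_crash(lines):
    """Return details of the first failed allocation in a log, or None."""
    last_checkpoint = None
    for number, line in enumerate(lines, start=1):
        if CHECKPOINT in line:
            last_checkpoint = number
        match = ALLOC_FAILED.search(line)
        if match:
            size = int(match.group(1))
            return {
                "line": number,
                "bytes": size,
                "mib": size / (1024 * 1024),
                # True when the previous log line was the checkpoint message.
                "after_checkpoint": last_checkpoint == number - 1,
            }
    return None


sample = [
    "[2022-12-27T23:56:04.304596Z INFO  phactory] Taking checkpoint...",
    "memory allocation of 272629776 bytes failed",
    "[2022-12-27T23:56:04.314467Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!",
]
report = summarize_crash(sample)
print(report)
```

To run it against a real log, pass the file's lines instead of `sample`, for example `summarize_crash(open(path, errors="replace"))`, where `path` is wherever your pRuntime output is stored.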

@jimiflowers
Author

Hi. New error on the PRuntime process of one of my nodes:

phala-pruntime         | [2022-12-28T10:15:25.369070Z INFO  rocket::server] POST /prpc/PhactoryAPI.SyncParaHeader:
phala-pruntime         | [2022-12-28T10:15:25.369116Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
phala-pruntime         | [2022-12-28T10:15:25.415073Z INFO  phactory::prpc_service] Dispatching request: PhactoryAPI.SyncParaHeader
phala-pruntime         | [2022-12-28T10:15:25.415365Z INFO  phactory::prpc_service] sync_para_header from=Some(2998100) to=Some(2998100)
phala-pruntime         | [2022-12-28T10:15:25.415631Z INFO  enclaveapp] pRPC status code: 200, data len: 5
phala-pruntime         | [2022-12-28T10:15:25.443533Z INFO  rocket::server] Outcome: Success
phala-pruntime         | [2022-12-28T10:15:25.443630Z INFO  rocket::server] Response succeeded.
phala-pruntime         | [2022-12-28T10:15:26.300826Z INFO  rocket::server] POST /prpc/PhactoryAPI.DispatchBlocks:
phala-pruntime         | [2022-12-28T10:15:26.301075Z INFO  rocket::server] Matched: (prpc_proxy) POST /prpc/<method>
phala-pruntime         | [2022-12-28T10:15:26.357305Z INFO  phactory::prpc_service] Dispatching request: PhactoryAPI.DispatchBlocks
phala-pruntime         | [2022-12-28T10:15:26.387950Z INFO  phactory::prpc_service] dispatch_block from=Some(2998100) to=Some(2998100)
phala-pruntime         | [2022-12-28T10:15:26.388072Z INFO  phactory::prpc_service] Dispatching block: 2998100
phala-pruntime         | [2022-12-28T10:15:40.183191Z INFO  phactory::prpc_service] State synced
phala-pruntime         | [2022-12-28T10:15:40.704311Z INFO  phactory] Taking checkpoint...
phala-pruntime         | memory allocation of 272629776 bytes failed
phala-pruntime         | ./start_pruntime.sh: line 35:    23 Illegal instruction     (core dumped) STATE_FILE_PATH="$STATE_FILE_PATH" ./app --remove-corrupted-checkpoint $EXTRA_OPTS

@jasl
Collaborator

jasl commented Dec 28, 2022

Don't worry, this is our fault. The chain has recently accumulated too much unexpected data that needs to be cleaned up. We're preparing a mitigation release for pruntime, and after that we'll fix the chain itself.

@jimiflowers
Author

Ok, thanks mate!

@jasl
Collaborator

jasl commented Dec 29, 2022

Sorry, I was very busy yesterday...

phalanetwork/phala-pruntime:v0.2.5-1
DIGEST:sha256:4210f8b1978855b68bd21c65c350f89e1940f2cd99d173bfa99873c1c9a75488

is out. This release works around the memory usage problem.
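
To make sure a node runs exactly this build, the image can be referenced by digest instead of by tag. The sketch below builds such a reference in the `name@sha256:...` form that Docker accepts. It assumes the DIGEST value above is the registry digest of the image; the helper `pinned_reference` is hypothetical and only validates the format, it does not contact any registry.

```python
import re

# Values copied from the comment above.
IMAGE = "phalanetwork/phala-pruntime"
DIGEST = "sha256:4210f8b1978855b68bd21c65c350f89e1940f2cd99d173bfa99873c1c9a75488"


def pinned_reference(image, digest):
    """Build an image reference pinned to a digest (name@sha256:<64 hex chars>)."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{image}@{digest}"


reference = pinned_reference(IMAGE, DIGEST)
print(reference)
```

The printed string can be used wherever an image name is expected, for example as the argument to `docker pull` or as the `image:` value in a compose file.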

jasl closed this as completed Dec 29, 2022

@jimiflowers
Author

Hi, no worries mate. I pulled it yesterday afternoon and all nodes are running fine. Thanks!
