This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Custom substrate node does not start in a VM without --wasm-execution interpreted-i-know-what-i-do #12073

Closed
2 tasks done
gdethier opened this issue Aug 19, 2022 · 14 comments · Fixed by #12096
Labels
J2-unconfirmed Issue might be valid, but it’s not yet known.

Comments

@gdethier

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

After upgrading our node implementation's Substrate dependencies from polkadot-v0.9.25 to polkadot-v0.9.26, our nodes no longer start on some machines. The following error message is shown:

Cannot create a runtime error=Other("cannot create module: compilation settings are not compatible with the native host")

In this issue, it is mentioned that the problem could come from virtualization, which is indeed our case (we use a fully virtualized test infrastructure). Running the same executable on a bare-metal machine does not reproduce the problem, which supports this hypothesis.

A workaround is to pass the option --wasm-execution interpreted-i-know-what-i-do; the nodes then start up as usual and everything works as expected.

The issue above suggests that the problem comes from wasmtime, which was upgraded to 0.38.x at the end of June. Maybe another upgrade would help?

Steps to reproduce

  1. Build this version of our node
  2. Run it in a QEMU VM with the --dev flag
  3. The node does not start and an error message is displayed
@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Aug 19, 2022
@pepyakin
Contributor

Good to know that it can be reproduced this way, thanks!

The issue you linked was about our code stripping the wasmtime error details. If you use a recent enough version of Substrate that includes that fix, you should be able to see more information. I would be interested to see what wasmtime returns for this error. Is it there, and is there any chance you can post it here?

@bkchr
Member

bkchr commented Aug 19, 2022

For the record, the problem with qemu should be "solved": #11722

@pepyakin
Contributor

Hmm, I am a bit confused: how would it fix the problem at hand?

From what I see, this problem is about wasmtime bailing out early because it finds that the CPUID is somehow not compatible with wasmtime's requirements.

The problem in #11722 is a problem that only shows up at runtime and leads to unexpected results.

@bkchr
Member

bkchr commented Aug 19, 2022

The problem in #11722 is a problem that only shows up at runtime and leads to unexpected results.

Yeah good point! I should have thought more than 5 seconds about this :P So, we can just ignore what I said.

Back to this. Weird that it fails with QEMU. I'm also using QEMU on my laptop and it works.

@pepyakin
Contributor

pepyakin commented Aug 19, 2022

Yeah, qemu-x86_64 version 6.1.0 running wasmtime configured with defaults would report

Error: Unsupported feature: SIMD support requires SSE3, SSSE3, SSE4.1, and SSE4.2 on x86_64.

However, Substrate does not use SIMD, and disabling it via wasmtime's config makes it run just fine under qemu.
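Since the error above names the exact x86_64 extensions that wasmtime's SIMD support requires, one quick way to see which of them a VM's virtual CPU actually exposes is Rust's standard `is_x86_feature_detected!` macro. The following is a minimal, std-only sketch; the function name `missing_simd_prereqs` is hypothetical, and it only reports what the running process detects, not what wasmtime itself decides.

```rust
/// Returns the SSE extensions (from the list in wasmtime's error message)
/// that the current CPU does NOT report. Inside a VM this reflects the CPU
/// model the hypervisor exposes, not the physical host CPU.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn missing_simd_prereqs() -> Vec<&'static str> {
    // Runtime CPUID-based detection for each required extension.
    let checks = [
        ("sse3", is_x86_feature_detected!("sse3")),
        ("ssse3", is_x86_feature_detected!("ssse3")),
        ("sse4.1", is_x86_feature_detected!("sse4.1")),
        ("sse4.2", is_x86_feature_detected!("sse4.2")),
    ];
    checks
        .iter()
        .filter(|(_, present)| !*present)
        .map(|(name, _)| *name)
        .collect()
}

/// On non-x86 targets these extensions do not apply, so nothing is reported.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn missing_simd_prereqs() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    let missing = missing_simd_prereqs();
    if missing.is_empty() {
        println!("CPU reports SSE3, SSSE3, SSE4.1 and SSE4.2");
    } else {
        println!("CPU is missing: {}", missing.join(", "));
    }
}
```

Running this inside the affected VM and on bare metal would show whether the virtual CPU model hides any of these extensions.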

I cannot run Substrate itself, though, because of a SIGILL coming from qemu itself.

so IOW: works for me.

Let's hope that the logs from @gdethier shed some light on this one.

@gdethier
Author

@pepyakin I guess that if I upgrade the dependencies to polkadot-v0.9.27, I'll get the wasmtime error-reporting fix? I'll do that and post the logs by tomorrow.

@gdethier
Author

@pepyakin After upgrading to polkadot-v0.9.27, here are the node's startup logs. I guess the steps to reproduce I provided are actually incomplete, as the problem is linked to the hypervisor's hardware/configuration rather than to QEMU itself.

without --wasm-execution interpreted-i-know-what-i-do

2022-08-23 07:33:01 logion Node    
2022-08-23 07:33:01 ✌️  version 4.0.0-c0495d1cdb3    
2022-08-23 07:33:01 ❤️  by Logion Team <https://github.com/logion-network>, 2017-2022    
2022-08-23 07:33:01 📋 Chain specification: logion Testnet    
2022-08-23 07:33:01 🏷  Node name: Charlie    
2022-08-23 07:33:01 👤 Role: AUTHORITY    
2022-08-23 07:33:01 💾 Database: RocksDb at ./data/chains/logion_test_testnet/db/full    
2022-08-23 07:33:01 ⛓  Native runtime: logion-116 (logion-2.tx5.au1)    
Error: Service(Client(RuntimeApiError(Application(VersionInvalid("cannot create module: compilation settings are not compatible with the native host: compilation setting \"has_sse41\" is enabled, but not available on the host")))))
2022-08-23 07:33:01 Cannot create a runtime error=Other("cannot create module: compilation settings are not compatible with the native host: compilation setting \"has_sse41\" is enabled, but not available on the host")

with --wasm-execution interpreted-i-know-what-i-do

2022-08-23 07:37:49 logion Node    
2022-08-23 07:37:49 ✌️  version 4.0.0-c0495d1cdb3    
2022-08-23 07:37:49 ❤️  by Logion Team <https://github.com/logion-network>, 2017-2022    
2022-08-23 07:37:49 📋 Chain specification: logion Testnet    
2022-08-23 07:37:49 🏷  Node name: Charlie    
2022-08-23 07:37:49 👤 Role: AUTHORITY    
2022-08-23 07:37:49 💾 Database: RocksDb at ./data/chains/logion_test_testnet/db/full    
2022-08-23 07:37:49 ⛓  Native runtime: logion-116 (logion-2.tx5.au1)    
2022-08-23 07:37:49 Using default protocol ID "sup" because none is configured in the chain specs    
2022-08-23 07:37:49 🏷  Local node identity is: 12D3KooWJvyP3VJYymTqG7eH4PM5rN4T2agk5cdNCfNymAqwqcvZ    
2022-08-23 07:37:49 🔍 Discovered new external address for our node: --EDITED--
2022-08-23 07:37:49 💻 Operating system: linux    
2022-08-23 07:37:49 💻 CPU architecture: x86_64    
2022-08-23 07:37:49 💻 Target environment: gnu    
2022-08-23 07:37:49 💻 CPU: QEMU Virtual CPU version 2.5+    
2022-08-23 07:37:49 💻 CPU cores: 4    
2022-08-23 07:37:49 💻 Memory: 7951MB    
2022-08-23 07:37:49 💻 Kernel: 5.15.0-39-generic    
2022-08-23 07:37:49 💻 Linux distribution: Ubuntu 21.04    
2022-08-23 07:37:49 💻 Virtual machine: yes

@bkchr
Member

bkchr commented Aug 23, 2022

Looks like it could be fixed by this: bytecodealliance/wasmtime#4231?

@gdethier
Author

Someone pointed me to this issue: AstarNetwork/Astar#727

Long story short, the reporter got the same error message (compilation setting \"has_sse41\" is enabled, but not available on the host) after upgrading a node running in a Proxmox VM. The problem was fixed by changing the VM's CPU type (i.e. enabling the SSE4.1 instruction set).

In our case, however, we do not have access to this setting (the VM is provided by our hosting provider).
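Even without access to the hypervisor settings, one can check from inside a Linux guest whether the VM's CPU model exposes SSE4.1. On x86 Linux, the `flags` line of `/proc/cpuinfo` lists CPU features, with SSE4.1 shown as `sse4_1`, SSE4.2 as `sse4_2`, SSSE3 as `ssse3` and SSE3 as `pni`. The following is a minimal, std-only sketch under those assumptions; the helper `cpuinfo_has_flag` is hypothetical and only inspects the first `flags` line.

```rust
use std::fs;

/// Returns true if the first "flags" line of a /proc/cpuinfo dump lists `flag`.
/// Returns false if there is no such line (e.g. non-x86 kernels use "Features").
fn cpuinfo_has_flag(cpuinfo: &str, flag: &str) -> bool {
    cpuinfo
        .lines()
        .find(|line| line.starts_with("flags"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, flags)| flags.split_whitespace().any(|f| f == flag))
        .unwrap_or(false)
}

fn main() {
    // Linux-only: other platforms do not provide /proc/cpuinfo.
    match fs::read_to_string("/proc/cpuinfo") {
        Ok(text) => {
            for flag in ["pni", "ssse3", "sse4_1", "sse4_2"] {
                println!("{}: {}", flag, cpuinfo_has_flag(&text, flag));
            }
        }
        Err(e) => eprintln!("could not read /proc/cpuinfo: {}", e),
    }
}
```

If `sse4_1` is reported as missing inside the guest, that matches the `has_sse41` error above and points to the VM's CPU model rather than to the node binary.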

@bkchr
Member

bkchr commented Aug 23, 2022

@gdethier when I give you a Substrate branch, can you try this?

@gdethier
Author

@gdethier when I give you a Substrate branch, can you try this?

Sure!

@bkchr
Member

bkchr commented Aug 23, 2022

bkchr-upgrade-wasmtime-0.40

@bkchr
Member

bkchr commented Aug 23, 2022

Please report back @gdethier when you have tested it!

@gdethier
Author

@bkchr The code in your branch fixes the issue!
