Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a stack size slightly smaller than 1MB #935

Closed
wants to merge 8 commits into from
Closed

Conversation

zah
Copy link
Contributor

@zah zah commented Apr 26, 2020

This is motivated by discussions in #924

The stack size on Android and iOS is 1MB, so we want to detect
any possible stack overflow issues there while we are testing
our software on the desktop. The limit is 1 million bytes instead
of exactly 1MB, so we can be on the safe side.

This is motivated by discussions in #924

The stack size on Android and iOS is 1MB, so we want to detect
any possible stack overflow issues there while we are testing
our software on the desktop. The limit is 1 million bytes instead
of exactly 1MB, so we can be on the safe side.
config.nims Outdated
@@ -7,7 +7,7 @@ if defined(windows):
# disable timestamps in Windows PE headers - https://wiki.debian.org/ReproducibleBuilds/TimestampsInPEBinaries
switch("passL", "-Wl,--no-insert-timestamp")
# increase stack size
switch("passL", "-Wl,--stack,8388608")
switch("passL", "-Wl,--stack,1000000")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only on Windows. For POSIX systems see https://linux.die.net/man/2/setrlimit (with RLIMIT_STACK).

You can set the stack size as a define in "config.nims", so we don't duplicate that value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's only on Windows that CI's now failing -- coincidence, or?

- make the stack size limit a Nim define and use it in "config.nims" for
  the Windows C compiler flag and in "beacon_node.nim" for the POSIX
  runtime setting.
- warn when the open file limit is bellow 1024 (macOS, by default)
- increase the macOS open file limit in Travis
- add "-Wstack-usage=..." C compiler flag. No GCC warnings visible at the
  stack limit we set, but any would be visible by defining "cwarnings"
  in Nim.
@stefantalpalaru
Copy link
Contributor

Can you test if instantiating BeaconState as a stack variable anywhere triggers this warning?

Sure, with make V=1 NIMFLAGS="-d:cwarnings" ... but it's unlikely that single functions will trigger that warning, and it doesn't look like GCC goes deeper than that in its static analysis.

@zah
Copy link
Contributor Author

zah commented Apr 27, 2020

I'm not sure I understand. Why wouldn't a single function trigger this warning if a single BeaconState variable is all it takes to exceed the stack size limit?

Ideally, we want this warning to be enabled in all builds, it shouldn't be opt-in. If it can be an error, that's even better.

@arnetheduck
Copy link
Member

some stack sizes for commonly used stuff (mainnet, 0.11.1):

 2687408  state: BeaconStateObj
    1008  block: SignedBeaconBlock
     280  validator: Validator
     528  attestation: Attestation

@stefantalpalaru
Copy link
Contributor

Why wouldn't a single function trigger this warning if a single BeaconState variable is all it takes to exceed the stack size limit?

You're right, it would.

Ideally, we want this warning to be enabled in all builds, it shouldn't be opt-in. If it can be an error, that's even better.

I can do both, but I have to enable GCC warnings for that and I can't make them any less verbose than this: https://gist.githubusercontent.com/stefantalpalaru/527d234bf64e454c2fc4a91aa3dbe464/raw/708fc6f5de4b34696c2aea74d08d5dda4960b440/gcc_warnings.txt

(not without changing the standard library to add while(1); at the end of all functions marked with __attribute__((noreturn)) that GCC complains about)

@zah
Copy link
Contributor Author

zah commented Apr 27, 2020

Maybe some filtering of the output with grep would solve the problem, but I'm worried from the fact that the Nim compiler used to hang when the C compiler output becomes too large (due to the many warnings). Seems to have been fixed here:
nim-lang/Nim#8648

@zah
Copy link
Contributor Author

zah commented Apr 27, 2020

Grepping seems hard actually, but why is it not possible to disable all GCC warnings except the ones of interest?

@stefantalpalaru
Copy link
Contributor

why is it not possible to disable all GCC warnings except the ones of interest?

We don't have that option: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

I can either disable all warnings (including warnings as errors) which Nim does by default with -w or I can enable/disable individual types of warnings. The problem is that some warnings have no types.

- also make the stack-usage warning an error - only works with
  `-d:cwarnings`
@stefantalpalaru
Copy link
Contributor

The Windows failure is caused, indeed, by stack overflow, @tersec:

(gdb) bt
#0  ___chkstk_ms ()
    at ../../../../../src/gcc-8.1.0/libgcc/config/i386/cygwin.S:126
#1  0x0000000000472ae4 in readSszValue__UJhGUd6LyYItxQxtGKe36w (
    input=input@entry=0x2e30000 "\230!\035▒▒1\025▒@▒",
    inputLen_0=inputLen_0@entry=524288)
    at C:/Users/user/Desktop/status/nim-beacon-chain/beacon_chain/ssz/bytes_reader.nim:82
#2  0x0000000000472cbe in readSszValue__AyCdOhn6xshyfmu08fNg8w (
    input=input@entry=0x2e30000 "\230!\035▒▒1\025▒@▒", inputLen_0=524288)
    at C:/Users/user/Desktop/status/nim-beacon-chain/beacon_chain/ssz/bytes_reader.nim:96
#3  0x000000000047d01e in readValue__MPVNA1Q8O7kO9b5Hjukj9amw (
    r=r@entry=0x14edc0, val=val@entry=0x14ed48)
    at C:/Users/user/Desktop/status/nim-beacon-chain/beacon_chain/ssz.nim:277
#4  0x000000000045d75b in readValue__d87Ul2v3H9adnJQWwB1zFcw (
    reader=reader@entry=0x14edc0)
    at C:/Users/user/Desktop/status/nim-beacon-chain/vendor/nim-serialization/serialization.nim:45
#5  0x000000000048bed1 in checkSSZ__le3639bwyn7aLL9bTi3kV9cZA (
    dir=dir@entry=0x2d3c048, expectedHash=...)
    at C:/Users/user/Desktop/status/nim-beacon-chain/vendor/nim-serialization/serialization.nim:95
#6  0x000000000049035a in runSSZtests__oXtfhBNqA9bcwoDyWHI0BzA ()
    at C:/Users/user/Desktop/status/nim-beacon-chain/tests/official/test_fixture_ssz_consensus_objects.nim:95
#7  0x0000000000491a5d in NimMainModule ()
    at C:/Users/user/Desktop/status/nim-beacon-chain/tests/official/test_fixture_ssz_consensus_objects.nim:108
#8  0x000000000049166a in NimMain ()
    at C:/Users/user/Desktop/status/nim-beacon-chain/vendor/nim-metrics/metrics.nim:268
#9  0x00000000004a3f7d in main (argc=1, args=0x1e22f0, env=0x1e2580)
    at C:/Users/user/Desktop/status/nim-beacon-chain/vendor/nim-metrics/metrics.nim:275

I don't know how reliable GDB's backtrace is, when frames 8 and 9 have clearly wrong file/line info.

@stefantalpalaru
Copy link
Contributor

New Linux segfault:

./env.sh nim c -d:chronicles_log_level=TRACE -d:const_preset=mainnet -d:ETH2_SPEC="v0.11.3" -d:BLS_ETH2_SPEC="v0.11.x" --verbosity:0 --hints:off -d:usePcreHeader --passL:"-lpcre" -d:release -d:stack_size=1000000 --verbosity:0 --hints:off tests/official/test_fixture_ssz_consensus_objects.nim
gdb ./tests/official/test_fixture_ssz_consensus_objects
Reading symbols from ./tests/official/test_fixture_ssz_consensus_objects...
(gdb) r
Starting program: /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/test_fixture_ssz_consensus_objects 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Stack size successfully limited to 1000000 bytes.
[New Thread 0x7ffff7ab6700 (LWP 1193)]

[Suite] Official - SSZ consensus objects  [Preset: mainnet]
[OK]   Testing    ProposerSlashing
[OK]   Testing    Eth1Block
[OK]   Testing    BeaconBlockHeader
[OK]   Testing    SigningRoot
[OK]   Testing    SignedBeaconBlockHeader
[OK]   Testing    VoluntaryExit
[OK]   Testing    DepositMessage
[OK]   Testing    Checkpoint
[OK]   Testing    SignedVoluntaryExit
[OK]   Testing    AttesterSlashing
[OK]   Testing    BeaconState
[OK]   Testing    IndexedAttestation
[OK]   Testing    Deposit
[OK]   Testing    SignedBeaconBlock
[OK]   Testing    SignedAggregateAndProof
[OK]   Testing    DepositData
[OK]   Testing    Eth1Data
[OK]   Testing    AggregateAndProof
[OK]   Testing    Attestation

Thread 1 "test_fixture_ss" received signal SIGSEGV, Segmentation fault.
sszDecodeEntireInput__tboh6cXwmkbCnrkwUzKiYg (
    input=0x7ffff6d96040 " \243O%?\003y \371_\n\a\334\021\237m\354_\306ݩ\374t\262\240\357\020\213\002>O\353\351\025PÐ\334g\210\214\262y7\020\265W\311[\214\262cO\273\003\366\327p\240\364(J\244\263\363\v\251\316\bT\021w\205Κ\"]\302\375\003\254\242\375q^;mc!`\312\067F\036Ԋ'\321\366\250\255k\312:\276\370V\035\034\347\037\260V\354\005.\265{\353\266H\237\bf\375\273\227\317", <incomplete sequence \327>, inputLen_0=524288)
    at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/fixtures_utils.nim:60
60	proc sszDecodeEntireInput*(input: openarray[byte], Decoded: type): Decoded =
(gdb) bt
#0  sszDecodeEntireInput__tboh6cXwmkbCnrkwUzKiYg (
    input=0x7ffff6d96040 " \243O%?\003y \371_\n\a\334\021\237m\354_\306ݩ\374t\262\240\357\020\213\002>O\353\351\025PÐ\334g\210\214\262y7\020\265W\311[\214\262cO\273\003\366\327p\240\364(J\244\263\363\v\251\316\bT\021w\205Κ\"]\302\375\003\254\242\375q^;mc!`\312\067F\036Ԋ'\321\366\250\255k\312:\276\370V\035\034\347\037\260V\354\005.\265{\353\266H\237\bf\375\273\227\317", <incomplete sequence \327>, inputLen_0=524288)
    at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/fixtures_utils.nim:60
#1  0x000055555561a87a in checkSSZ__le3639bwyn7aLL9bTi3kV9cZA (dir=<optimized out>, expectedHash=...)
    at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/test_fixture_ssz_consensus_objects.nim:61
#2  0x0000555555623706 in runSSZtests__oXtfhBNqA9bcwoDyWHI0BzA_2 ()
    at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/test_fixture_ssz_consensus_objects.nim:113
#3  0x0000555555628f65 in runSuite__oXtfhBNqA9bcwoDyWHI0BzA ()
    at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/test_fixture_ssz_consensus_objects.nim:131
#4  0x000055555562981e in NimMainModule () at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/testutil.nim:80
#5  0x0000555555629712 in NimMain () at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/vendor/nim-metrics/metrics.nim:301
#6  0x000055555555770d in main (argc=<optimized out>, args=<optimized out>, env=<optimized out>)

@stefantalpalaru
Copy link
Contributor

Good hint from bt full:

#0  sszDecodeEntireInput__tboh6cXwmkbCnrkwUzKiYg (
    input=0x7ffff6d96040 " \243O%?\003y \371_\n\a\334\021\237m\354_\306ݩ\374t\262\240\357\020\213\002>O\353\351\025PÐ\334g\210\214\262y7\020\265W\311[\214\262cO\273\003\366\327p\240\364(J\244\263\363\v\251\316\bT\021w\205Κ\"]\302\375\003\254\242\375q^;mc!`\312\067F\036Ԋ'\321\366\250\255k\312:\276\370V\035\034\347\037\260V\354\005.\265{\353\266H\237\bf\375\273\227\317", <incomplete sequence \327>, inputLen_0=524288)
    at /mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/tests/official/fixtures_utils.nim:60
        result = <error reading variable result (value of type `tyObject_HistoricalBatch__JrmWpYnHATrx9cZpeGzM7xQ' requires 524288 bytes, which is more than max-value-size)>
        stream = {s = 0x0}
        reader = {stream = 0x0}
        T1_ = <optimized out>

beacon_chain/spec/datatypes.nim:407

  HistoricalBatch* = object
    block_roots* : array[SLOTS_PER_HISTORICAL_ROOT, Eth2Digest]
    state_roots* : array[SLOTS_PER_HISTORICAL_ROOT, Eth2Digest]

@zah zah changed the base branch from devel to unstable February 8, 2021 12:47
@stefantalpalaru
Copy link
Contributor

Mostly implemented in other PRs, except the runtime stack limiting. We might want to pick that one too, in the future.

@arnetheduck arnetheduck force-pushed the unstable branch 2 times, most recently from 657f9d5 to a4667d1 Compare January 6, 2022 16:14
@arnetheduck
Copy link
Member

closing as obsolete - compile-time filtering is there and introducing runtime testing would require a fresh PR / approach anyway as most things in this PR have changed since

@arnetheduck arnetheduck deleted the smaller-stack branch February 16, 2022 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants