feat: add runtime code layout to initcode #3584

Merged

Member

@charles-cooper charles-cooper commented Sep 4, 2023

this commit adds the runtime code layout to the initcode payload (as a suffix), so that the runtime code can be analyzed without source code. this is particularly important for disassemblers, which need demarcations for where the data sections (added in #3496) start as distinct from the runtime code segment itself.

note the specific format for the CBOR payload was chosen to avoid changing the last 13 bytes of the signature. that is, the last 13 bytes still look like b"\xa1evyper\x83...". this works because the version dict is the last item in a list, and its encoding as a list item is identical to its encoding as the only dict in the payload.
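
to make the point about the trailing bytes concrete, here is a small sketch. the regex, the helper name and the sample byte strings are made up for illustration (they are not taken from any particular tool), and the sample values 98, [6], 32 are arbitrary. the pattern only handles version components below 24, which is where CBOR stops encoding an integer in a single byte.

```python
import re

# hypothetical pattern in the spirit of the regexes used on older vyper
# bytecode: a one-entry map {"vyper": [major, minor, patch]}.
# DOTALL matters because a patch version of 10 is the byte 0x0a (newline).
VYPER_SIG = re.compile(rb"\xa1\x65vyper\x83(.)(.)(.)", re.DOTALL)

# the dict {"vyper": [0, 3, 10]} encoded on its own (11 bytes)
version_dict = bytes.fromhex("a165") + b"vyper" + bytes.fromhex("8300030a")

# old style: the dict is the whole payload, followed by two length bytes
old_footer = version_dict + len(version_dict).to_bytes(2, "big")

# new style: a 4-item list whose last item is the same dict
# 84 = list of 4, 18 62 = 98, 81 06 = [6], 18 20 = 32
new_payload = bytes.fromhex("84186281061820") + version_dict
new_footer = new_payload + (len(new_payload) + 2).to_bytes(2, "big")


def find_version(code: bytes):
    """Return (major, minor, patch) if the signature pattern is found."""
    match = VYPER_SIG.search(code)
    return tuple(group[0] for group in match.groups()) if match else None
```

both `find_version(old_footer)` and `find_version(new_footer)` find the same version, since the dict bytes sit directly in front of the two length bytes in either format.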

this commit also changes the meaning of the two footer bytes: they now indicate the length of the entire footer (including the two length bytes themselves). the sole purpose is to make them more intuitive: the two footer bytes now give the offset from the end of the code at which the CBOR-encoded metadata starts, rather than the length of the CBOR payload (without the two length bytes).
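
as a rough sketch of the difference between the two meanings (the function name and the `new_style` flag are hypothetical, introduced only for this example):

```python
def metadata_start(code: bytes, new_style: bool = True) -> int:
    """Return the index in `code` where the CBOR metadata begins."""
    n = int.from_bytes(code[-2:], "big")
    if new_style:
        # new meaning: n covers the CBOR payload plus the two length bytes,
        # so it is directly the offset from the end
        return len(code) - n
    # old meaning: n is the length of the CBOR payload only,
    # so the two length bytes have to be skipped separately
    return len(code) - n - 2
```

with the new meaning, `code[-n:]` is exactly the footer and no extra arithmetic is needed.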

lastly, this commit renames the internal insert_vyper_signature= kwarg to insert_compiler_metadata= as the metadata includes more than just the vyper version now.

What I did

How I did it

How to verify it

Commit message

this commit adds the runtime code layout to the initcode payload (as a
suffix), so that the runtime code can be analyzed without source code.
this is particularly important for disassemblers, which need
demarcations for where the data section starts as distinct from the
runtime code segment itself.

the layout is:

CBOR-encoded list:
  runtime code length
  [<length of data section> for data section in runtime data sections]
  immutable section length
  {"vyper": (major, minor, patch)}
length of CBOR-encoded list + 2, encoded as two big-endian bytes.

note the specific format for the CBOR payload was chosen to avoid
changing the last 13 bytes of the signature compared to previous
versions of vyper. that is, the last 13 bytes still look like
b"\xa1evyper\x83...". this works because the version dict is the last
item in a list, so its encoding does not change compared to being the
only dict in the payload.

this commit also changes the meaning of the two footer bytes: they now
indicate the length of the entire footer (including the two bytes
indicating the footer length). the sole purpose of this is to be more
intuitive: the two footer bytes now give the offset from the end at
which the CBOR-encoded metadata starts, rather than the length of the
CBOR payload (without the two length bytes).

lastly, this commit renames the internal `insert_vyper_signature=` kwarg
to `insert_compiler_metadata=` as the metadata includes more than just
the vyper version now.
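
the layout described in the commit message can be sketched in python using only the standard library. the tiny encoder and the name `build_footer` are hypothetical and written for illustration; this is not the compiler's implementation, and in practice a CBOR library would normally do the encoding. the encoder only covers the types the layout needs (unsigned ints, text strings, lists, dicts).

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def cbor_encode(obj) -> bytes:
    """Encode unsigned ints, text strings, lists/tuples and dicts."""
    if isinstance(obj, int) and not isinstance(obj, bool) and obj >= 0:
        return cbor_head(0, obj)
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return cbor_head(3, len(raw)) + raw
    if isinstance(obj, (list, tuple)):
        return cbor_head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
        return cbor_head(5, len(obj)) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def build_footer(runtime_len, data_section_lens, immutables_len, version) -> bytes:
    """CBOR list followed by (its length + 2) as two big-endian bytes."""
    payload = cbor_encode(
        [runtime_len, list(data_section_lens), immutables_len, {"vyper": list(version)}]
    )
    return payload + (len(payload) + 2).to_bytes(2, "big")


footer = build_footer(98, [6], 32, (0, 3, 10))
```

for these sample values the payload is 18 bytes, so the two footer bytes encode 20.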


@banteg banteg left a comment


usage example

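import cbor2  # third-party CBOR library

# the last two bytes hold the footer length, i.e. the offset from the end
offset = int.from_bytes(code[-2:], 'big')
# decoding stops at the end of the CBOR list, before the two length bytes
signature = cbor2.loads(code[-offset:])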

for vyper 0.3.10 the encoded values are:

  1. length of runtime code (code without immutables, data sections, and signature), ex. 4096
  2. list of lengths of data sections, ex. [64, 128]
  3. total length of immutables, ex. 384
  4. compiler version, ex. {'vyper': [0, 3, 10]}

so you could even do this:

names = ['runtime_size', 'data_sizes', 'immutable_size', 'compiler']
dict(zip(names, signature))
# {'runtime_size': 98, 'data_sizes': [6], 'immutable_size': 32, 'compiler': {'vyper': [0, 3, 10]}}

the number of data sections may differ when compiled with --optimize codesize.
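
once the lengths are decoded, they could be used to slice a blob of bytecode into its parts. the sketch below is only an illustration: the function name is hypothetical, and it assumes the bytes are laid out as runtime code, then the data sections in order, then the immutables. that ordering is an assumption here and should be checked against the compiler before relying on it.

```python
def split_sections(blob: bytes, runtime_size: int, data_sizes, immutable_size: int) -> dict:
    """Slice `blob` using the decoded lengths.

    ASSUMPTION: runtime code comes first, then each data section,
    then the immutables section.
    """
    total = runtime_size + sum(data_sizes) + immutable_size
    if total > len(blob):
        raise ValueError("declared lengths exceed the size of the blob")

    pos = runtime_size
    sections = {"runtime": blob[:pos], "data": []}
    for size in data_sizes:
        sections["data"].append(blob[pos:pos + size])
        pos += size
    sections["immutables"] = blob[pos:pos + immutable_size]
    return sections
```

a disassembler would then only decode `sections["runtime"]` as instructions and treat the rest as raw data.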

format rationale

a justification of why this format was chosen instead of just adding items to the dict.

vyper 0.3.4 added a cbor signature from which you can read the compiler version (#2860).

vyper 0.3.5 added a suffix with the section length, following solidity (#3009).

the implementation was a bit flawed: the suffix didn't always come at the very end and could be followed by immutables, rendering the length suffix useless.

as a result, everyone resorted to just using a regex. we spent some time understanding cbor so that we don't break this compatibility.

the way cbor encodes fixed-size lists suits us well. for example [1, 2, 3] is encoded as bytes 83 01 02 03: in the first byte 0x83, the top three bits (the 8) denote a list and the low five bits (the 3) denote its size. after that all items come in their normal encodings with no terminator, so the regex for old vypers would just work.
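
the same example, picked apart by hand (the helper name is made up for this sketch):

```python
def describe_head(byte: int):
    """Split a CBOR initial byte into (major type, additional info)."""
    major = byte >> 5    # top three bits: 4 means "array"
    info = byte & 0x1F   # low five bits: the length, when it is below 24
    return major, info


encoded = bytes.fromhex("83010203")
major, count = describe_head(encoded[0])
# small unsigned ints (below 24) are encoded as a single byte equal to the value
items = list(encoded[1:1 + count])
```

here `major` is 4, `count` is 3 and `items` is `[1, 2, 3]`, with nothing after the last item to mark the end of the list.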

we have also changed the cbor size suffix to offset, so you can simply read the suffix and then decode the metadata from code[-offset:].

note that you don't need to write code[-offset:-2]: the cbor list declares its own item count up front, so the decoder knows where the payload ends without any help from the slice.
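
to see why the trailing two bytes don't matter, here is a minimal stdlib-only decoder sketch that reports how many bytes it consumed. it is a stand-in written for illustration (not cbor2 itself), it only handles unsigned ints, text strings, lists and maps, and the sample bytecode is made up.

```python
def decode(buf: bytes, pos: int = 0):
    """Decode one CBOR item starting at buf[pos]; return (value, next_pos)."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("unsupported additional info")

    if major == 0:  # unsigned int
        return arg, pos
    if major == 3:  # text string
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:  # list with `arg` items
        items = []
        for _ in range(arg):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:  # map with `arg` key/value pairs
        out = {}
        for _ in range(arg):
            key, pos = decode(buf, pos)
            out[key], pos = decode(buf, pos)
        return out, pos
    raise ValueError("unsupported major type")


# made-up code: two arbitrary bytes followed by a 20-byte footer
code = b"\x60\x00" + bytes.fromhex("84186281061820a165") + b"vyper" + bytes.fromhex("8300030a0014")
offset = int.from_bytes(code[-2:], "big")
tail = code[-offset:]
signature, consumed = decode(tail)
```

the decoder stops after the fourth list item, so `consumed` is `len(tail) - 2` and the two length bytes are simply left unread.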

@codecov-commenter

codecov-commenter commented Sep 4, 2023

Codecov Report

Merging #3584 (1f79864) into master (2c21eab) will decrease coverage by 0.05%.
Report is 1 commits behind head on master.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #3584      +/-   ##
==========================================
- Coverage   89.05%   89.01%   -0.05%     
==========================================
  Files          85       85              
  Lines       11378    11390      +12     
  Branches     2586     2590       +4     
==========================================
+ Hits        10133    10139       +6     
- Misses        821      825       +4     
- Partials      424      426       +2     
Files Changed               Coverage Δ
vyper/compiler/output.py    91.70% <ø> (ø)
vyper/compiler/phases.py    92.30% <100.00%> (ø)
vyper/ir/compile_ir.py      92.42% <100.00%> (+0.13%) ⬆️

... and 1 file with indirect coverage changes


apparently, the byteorder argument (to int.to_bytes / int.from_bytes) is required in python 3.10 but not in 3.11
@charles-cooper charles-cooper merged commit 96d2042 into vyperlang:master Sep 5, 2023
@charles-cooper charles-cooper deleted the feat/initcode-layout branch September 5, 2023 23:05