Propose RETURNDATACOPY and RETURNDATASIZE. #211
👍 |
|
The peak memory consumption is no higher, though - all that memory is allocated under the current system during the second call's lifetime.
Correct. Effectively, this just keeps around the memory (or part thereof) of subcalls for longer.
Good point. |
👍 |
This doubles either the memory requirement or the copying needed to return data.
Gas pricing (and implementations) would either need heuristics for both policies or would be rather inefficient in one situation or the other. Either way, gas pricing will have to change. |
Space-complexity is always much easier to handle than time-complexity: We can easily compute an upper bound on the required memory given a block gas limit, while it is not that easy to come up with a max import time for a block. Because of that, I would opt to gauge the gas costs assuming the memory of the callee is kept alive (in practice, this of course translates to a recommended memory size for a node given the current block gas limit). Anyway, discarding the callee's memory might be a good thing if we are low on memory, but in general, the gas costs do not pay for "memory * time" but only for memory, so it should not make a difference asymptotically. |
Note: to preserve existing peak-memory characteristics, it might be reasonable to define any memory-resizing operation as clearing the |
What is the semantic of both instructions when the return buffer was never assigned or has been cleared already? |
Never assigned should clearly be empty, and I guess it should be the same for the other case. @gavofyork could you explain the reasoning behind clearing a bit, please? |
EIPS/returndatacopy.md
Outdated
|
||
`RETURNDATASIZE`: `0xd` | ||
|
||
Pushes the size of the return data (or the failure return data, see EIP [206](https://github.com/ethereum/EIPs/pull/206)) of the previous call onto the stack. If there was no previous call, pushes zero. |
Does "previous call" mean "previous call in the current transaction" or "previous call in the current message call/contract creation"?
Previous call made from the current call frame, i.e. the EVM execution that shares the same memory with the current executing opcode - not sure if there is a proper name for that somewhere.
I don't find any. Sometimes the Yellow Paper says "this execution" or "message-call or contract-creation". Maybe "in the current machine state" is good enough.
Does this apply only to And which cases would 'clear' the returndata-memory? The same ones that it applies to ? How about if I have my data in my |
To all of them I would assume.
My understanding was each subsequent opcode which writes to it resets it.
I assume there is a "return data buffer" for each instance, e.g. the caller of |
Any call-like opcode resets the buffer, even failed calls reset (even due to not enough funds or because the callee went out of gas). To simplify the implementation, I would say that also create resets the buffer. So at any time, there is at most one non-empty return data buffer across all stack frames. To summarize: Any opcode that attempts to create a new call stack frame resets the buffer of the current stack frame right before it executes, even if that opcode fails. |
I like that, and I think it maybe should be clarified in the EIP proposal. |
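To make the reset rule summarized above concrete, here is a minimal Python sketch of a per-frame return data buffer. It is only an illustration of the rule as stated in this thread, not taken from any client: the names `Frame`, `OutOfGas`, `foo` and `bar` are hypothetical, callees are modelled as plain functions, and failures that carry data (`REVERT`) are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class OutOfGas(Exception):
    """Hypothetical stand-in for any failure that returns no data."""


@dataclass
class Frame:
    """One call frame; only the return data buffer is modelled."""
    return_data: bytes = b""  # created empty for each new frame

    def call(self, callee: Optional[Callable[["Frame"], bytes]]) -> bool:
        """Model a call-like opcode issued from this frame."""
        # The buffer is reset right before the opcode executes,
        # even if the call then fails.
        self.return_data = b""
        if callee is None:
            # No new frame was instantiated (e.g. insufficient funds).
            return False
        child = Frame()
        try:
            output = callee(child)
        except OutOfGas:
            # The callee failed without data: the buffer stays empty.
            return False
        self.return_data = output
        return True


def bar(frame: Frame) -> bytes:
    return b"\x2a"  # returns 42


def foo(frame: Frame) -> bytes:
    frame.call(bar)
    assert frame.return_data == b"\x2a"  # visible inside foo's frame only
    raise OutOfGas()


outer = Frame()
outer.call(bar)          # the outer buffer now holds 42
ok = outer.call(foo)     # foo fails after its own call to bar
```

After the failed call to `foo`, `outer.return_data` is empty: the value 42 that `bar` returned lived in `foo`'s buffer, which disappears together with that frame.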
EIPS/returndatacopy.md
Outdated
|
||
This opcode has similar semantics to `CALLDATACOPY`, but instead of copying data from the call data, it copies data from the return data of the previous call. If the return data is accessed beyond its length, it is considered to be filled with zeros. If there was no previous call, copies zeros. | ||
Gas costs: `3 + 3 * ceil(amount / 32)` (same as `CALLDATACOPY`) | ||
|
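As a worked example of the gas formula and the zero-fill behaviour in the draft text quoted above, here is a small Python sketch. The function names are made up for illustration, memory expansion costs are not included, and the zero padding reflects only this outdated draft; a later comment in this thread recommends treating reads past the end of the return data as an error instead.

```python
def returndatacopy_gas(amount: int) -> int:
    """Gas per the quoted draft: 3 + 3 * ceil(amount / 32)."""
    words = (amount + 31) // 32  # integer ceiling of amount / 32
    return 3 + 3 * words


def returndatacopy(return_data: bytes, offset: int, amount: int) -> bytes:
    """Bytes copied to memory under the draft's zero-fill semantics."""
    chunk = return_data[offset:offset + amount]
    # Anything read beyond the end of the buffer is treated as zeros.
    return chunk + b"\x00" * (amount - len(chunk))


cost_33_bytes = returndatacopy_gas(33)             # two words: 3 + 3 * 2
padded = returndatacopy(b"\x01\x02", 1, 4)          # one real byte, three zeros
```

With these definitions, copying 33 bytes costs 9 gas, and reading 4 bytes at offset 1 from a 2-byte buffer yields `02 00 00 00`.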
I need something like this:
In a machine state, the return data of the previous call is maintained as follows. When a new machine state is launched, the return data of the previous call is defined to be the empty byte sequence. When the program counter reaches `CALL`, `CREATE`, `CALLCODE`, `DELEGATECALL` or `STATICCALL`, the return data of the previous call is reset to the empty byte sequence. When this instruction gives return data, the resultant data becomes the return data of the previous call.
In particular, it is currently impossible to guess whether `CREATE` counts as a previous call.
What happens in the following scenario:
- Call `foo()`
- Call `bar()` -> returns `42`
- `RETURNDATA` is now `42`
- Error (e.g. `oog` or `invalid jump`)
- Call
- What does `RETURNDATA` give now? Was it cleared when going up a level?
@holiman I'm not sure I fully understand. In your example, the call to the `foo` contract signals a failure, correct? The `RETURNDATA` is always cleared when going up a level unless the call frame returns data using `return` or `revert`. In that case, it is set to that data.
It's cleared at Call `foo()` and stays empty, regardless of what happens in other call stacks. It is not cleared when going up. The `RETURNDATA` in different machine states do not interfere with each other.
In this scenario, at least two machine states are involved. The machine state that calls `foo()` and the machine state that calls `bar()`. The machine state that calls `foo()` has `RETURNDATA` reset at Call `foo()`.
In the Yellow Paper (9.4.1. "The Machine State"), a machine state is defined to be a tuple containing the program counter.
@chriseth, have you changed your answer to my old question:
> Does "previous call" mean "previous call in the current transaction" or "previous call in the current message call/contract creation"?
> Previous call made from the current call frame, i.e. the EVM execution that shares the same memory with the current executing opcode - not sure if there is a proper name for that somewhere.

Now your description reads as if the `RETURNDATA` buffer belongs to the transaction, not to the machine state.
@pirapira it is kind of a different viewpoint on the same thing. As mentioned in another comment, over all call stack frames, only one return data buffer has nonzero size at any point in time. Because of that, you can also think of a single return data buffer for the whole transaction. But I think that viewpoint (one buffer for the whole transaction) might just be useful for implementations. The specification is probably easier to understand when talking about one buffer per call stack frame.
> Because of that, you can also think of a single return data buffer for the whole transaction
It was in that mode of thinking that my question about the clearing above came about. Ok!
I'm interested because I want to know if I should change the formulation in YP ethereum/yellowpaper#264 (currently a new buffer is added to the machine state; adding a transaction-wide buffer is also doable).
@chriseth OK. I'll try to follow your choice in the EIP text.
This change is according to ethereum/EIPs#211 (comment)
if A calls B which allocates 1MB (and returns some portion of it), then, afterwards, A, in some independent memory-resizing operation, extends memory to be 1MB before calling RETURNDATACOPY over some portion of the existing 1MB, then the peak allocation is 2MB; prior to this PR it is only 1MB. Peak usage is only preserved if there are no memory resizing operations prior to RETURNDATACOPY. (it might be reasonable to define any memory-resizing operation as clearing the RETURNDATA return buffer) |
Summary of discussing the above issue with @chriseth: We’ve considered a few possible cases for an implementation that keeps around all of the memory allocated by the callee available for the caller.
There should be a clarification to @gavofyork’s proposal that if memory expansion happens to be caused by |
After some more thought, it looks like the peak memory consumption is actually not a problem. Some extract of the following text should probably be added to the EIP itself at some point.

Let me try to formalize this a bit so we know what we are talking about. The promise of the evm is as follows: For any "reasonable" implementation

Of course all of this has to be taken with a grain of salt. In particular, "reasonable" codifies the tradeoff between implementation complexity and runtime performance.

As far as protocol changes are concerned, you can come up with a function

In the specific example above, we consider a certain contract and notice that its memory consumption changes with the protocol change. But what we have to consider instead is the function that maps gas to max memory consumption.

So we have situation X: A uses 0 memory, then calls B, B allocates 10kb and returns 1kb. A allocates 8kb after the contract returns and then accesses return data.

Now consider situation Y: A first allocates 9kb memory, then calls B, B allocates 10kb and returns. A accesses return data. This should consume roughly the same gas as X but requires 19kb of memory before and after the change.

So there is a contract that uses the same gas before and after the change but its max allocation is the same before and after the change and it is equal to the first example's max allocation after the change. Because of that, the first example is not a counter example.

I think this can be generalized: For any contract execution

Actual memory consumption due to fragmentation and contiguous memory might still be an issue, but I'm not sure if it makes any difference here. |
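The numbers in situations X and Y can be checked with a few lines of Python. This is only a rough sketch under stated assumptions: `peak_memory_kb` is a hypothetical helper, memory is counted in whole kilobytes, and in situation X the caller is assumed to end up with 9kb (the 8kb it allocates plus 1kb for the copied return data), which is what makes it comparable to the 9kb allocated up front in situation Y.

```python
def peak_memory_kb(before_call: int, callee: int, after_call: int,
                   keep_callee_alive: bool) -> int:
    """Peak memory in kb for a caller that makes a single call.

    before_call: caller memory allocated before the call
    callee: memory allocated by the callee
    after_call: caller memory allocated after the call returns
    keep_callee_alive: True models an implementation that keeps the
        callee's memory around until the return data is consumed
    """
    during_call = before_call + callee
    retained = callee if keep_callee_alive else 0
    after_return = before_call + after_call + retained
    return max(during_call, after_return)


# Situation X: A starts with 0kb, B uses 10kb, A grows to 9kb afterwards
# (assumption: 8kb of its own plus 1kb of copied return data).
x_before_change = peak_memory_kb(0, 10, 9, keep_callee_alive=False)
x_after_change = peak_memory_kb(0, 10, 9, keep_callee_alive=True)

# Situation Y: A allocates 9kb first, then calls B, which uses 10kb.
y_before_change = peak_memory_kb(9, 10, 0, keep_callee_alive=False)
y_after_change = peak_memory_kb(9, 10, 0, keep_callee_alive=True)
```

Under these assumptions X peaks at 10kb before the change and 19kb after it, while Y peaks at 19kb in both cases, which is the sense in which X is not a counterexample.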
Although I agree with @chriseth's reasoning that the max memory per gas remains the same either way, I think that clearing like this has another advantage: it would allow EVM implementations to treat the memory as a single contiguous block of virtual memory for the entire transaction. Each time a contract calls another, the new contract's memory starts off at the end of the previous contract's memory, just like how stack allocation works in languages like C. This is currently possible, but the EIP as originally proposed would require each contract to have its own memory buffer(s) instead. With @gavofyork's proposed variation, this would again be a possible optimisation. |
I would also like to make the recommendation that a There's no sensible reason to copy past the end of return data, and we should treat it as the error it almost certainly is, rather than silently filling with zeroes. |
In the @arkpar's example (2), after the call to |
I moved the file. I still need to read it again to approve.
aa083d0
to
c22ad19
Compare
EIPS/eip-211.md
Outdated
|
||
## Motivation | ||
|
||
In some situations, it is vital for a function to be able to return data whose length cannot be anticipated before the call. In principle, this can be solved without alterations to the EVM, for example by splitting the call into two calls where the first is used to compute only the size. All of these mechanisms, though, are very expensive in at least some situations. A very useful example of such a worst-case situation is a generic forwarding contract: A contract that takes call data, potentially makes some checks and then forwards it as is to another contract. The return data should of course be transferred in a similar way to the original caller. Since the contract is generic and does not know about the contract it calls, there is no way to determine the size of the output without adapting the called contract accordingly or trying a logarithmic number of calls. |
I think `A` after the colon should be small.
EIPS/eip-211.md
Outdated
|
||
Note that the EVM implementation needs to keep the return data until the next call or the return from the current call. Since this resource was already paid for as part of the memory of the callee, it should not be a problem. Implementations may either choose to keep the full memory of the callee alive until the next call or copy only the return data to a special memory area. | ||
|
||
Keeping the memory of the callee until the next call-like opcode does not increase the peak memory usage in the following sense: Any memory allocation in the caller's frame that happens after the return from the call can be moved before the call without a change in gas costs, but will add this allocation to the peak allocation. |
Perhaps `; any` instead of `: Any`
EIPS/eip-211.md
Outdated
Author: Christian Reitwiessner <[email protected]> | ||
Type: Standard Track | ||
Category Core | ||
Status: Draft |
Status should be `Final` by now.
EIPS/eip-211.md
Outdated
|
||
If `block.number >= BYZANTIUM_FORK_BLKNUM`, add two new opcodes and amend the semantics of any opcode that creates a new call frame (like `CALL`, `CREATE`, `DELEGATECALL`, ...) called call-like opcodes in the following. It is assumed that the EVM (to be more specific: an EVM call frame) has a new internal buffer of variable size, called the return data buffer. This buffer is created empty for each new call frame. Upon executing any call-like opcode, the buffer is cleared (its size is set to zero). After executing a call-like opcode, the complete return data (or failure data, see [EIP-140](./eip-140.md)) of the call is stored in the return data buffer (of the caller), and its size changed accordingly. As an exception, `CREATE` and `CREATE2` are considered to return the empty buffer in the success case and the failure data in the failure case. If the call-like opcode is executed but does not really instantiate a call frame (for example due to insufficient funds for a value transfer or if the called contract does not exist), the return data buffer is empty. | ||
|
||
As an optimization, it is possible to share the return data buffer across call frames because only one will be non-empty at any time. |
Nitpick: this sounds like one is always non-empty. I would change it to `at most one will be non-empty at any time`.
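The sharing optimization mentioned in the quoted text can be sanity-checked with a small Python sketch that runs the same call tree against a per-frame buffer and against a single shared buffer, logging what `RETURNDATASIZE` would report at frame entry and after every call. Everything here is an assumption made for illustration: a contract is modelled as a tuple `(children, output)`, `None` as a child stands for a call-like opcode that instantiates no frame, and `None` as an output stands for a failure without data.

```python
from typing import List, Optional, Tuple

# Hypothetical contract model: (calls made in order, own output).
Contract = Tuple[list, Optional[bytes]]


def run_per_frame(contract: Contract, log: List[int]) -> Optional[bytes]:
    """Every frame owns a private return data buffer."""
    children, output = contract
    buf = b""                      # created empty for the new frame
    log.append(len(buf))
    for child in children:
        buf = b""                  # a call-like opcode clears the buffer
        if child is not None:
            result = run_per_frame(child, log)
            buf = result if result is not None else b""
        log.append(len(buf))
    return output


def run_shared(contract: Contract, log: List[int], state: dict) -> Optional[bytes]:
    """All frames share one buffer stored in state['buf']."""
    children, output = contract
    # The caller cleared the shared buffer right before this frame started.
    log.append(len(state["buf"]))
    for child in children:
        state["buf"] = b""
        if child is not None:
            result = run_shared(child, log, state)
            state["buf"] = result if result is not None else b""
        log.append(len(state["buf"]))
    return output


leaf: Contract = ([], b"\x2a")          # returns one byte
failing: Contract = ([leaf], None)      # calls leaf, then fails without data
top: Contract = ([leaf, failing, None, leaf], b"")

per_frame_log: List[int] = []
shared_log: List[int] = []
run_per_frame(top, per_frame_log)
run_shared(top, shared_log, {"buf": b""})
```

Both logs come out identical for this tree, which is consistent with the observation that at most one buffer is non-empty at any time. It is a spot check on one example, not a proof.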
Looks good to me.
For reference, eWASM was proposing the same with ewasm/design#12 |
This change is according to ethereum/EIPs#211 (comment)
Copy of summary:
A mechanism to allow returning arbitrary-length data inside the EVM has been requested for quite a while now. Existing proposals always had very intricate problems associated with charging gas. This proposal solves the same problem while at the same time, it has a very simple gas charging mechanism and requires minimal changes to the call opcodes. Its workings are very similar to the way calldata is handled already: After a call, return data is kept inside a virtual buffer from which the caller can copy it (or parts thereof) into memory. At the next call, the buffer is overwritten. This mechanism is 100% backwards compatible.