-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggested modifications of EIP758 #963
Comments
I'd suggest making this an issue or a PR on @pirapira's repository. |
It looks like his fork of the EIPs repo hasn't been updated since May 2017, while the original EIP758 this is referencing was created in Nov 2017 and merged in here in Jan 2018. But if you're sure that's the best place, I'll go ahead and add it there. |
To me, it sounds like @reductionista and @tinybike need to talk. |
We've been talking--he's on board with the first one, but hasn't really had time to think about the rest yet. He was the one who suggested I open this Issue. |
I would suggest that the For example,
Both fields would be optional. |
@reductionista I'm on board with #2. Should we add from and to fields to the new subscription as well? I think it's good to have comparable functionality for http rpc and pubsub. The subID thing isn't a bad idea but I agree that we shouldn't modify sendtx and sendrawtx just for this. Let's hold off on that IMO. I like the "new completed transaction filter" proposal, and I think a hasReturnData flag sounds great. @adisney is this all reasonable on your end? Trying to avoid moving goalposts on you guys! |
#2 makes sense to me. I agree the filtering should be available in both rpc and pubsub. I think the hasReturnData flag makes sense and I have no opinion regarding the "new completed transaction filter" rename. That's a fairly straightforward update. |
@tinybike I'd lean toward saying yes, we should include from and to fields with eth_subscribe. It's not really needed there since it already filters by rpc client--but it makes sense to me to make them as parallel as possible. |
Oh, and I have one more proposal for modifying the EIP, which I only thought of after opening this issue: I think when removed=true (signaling the transaction was removed from canonical chain), there is no reason to include the actual return data. It requires re-executing extra transactions again after the transaction was removed, which is a waste of CPU, plus wasting extra bandwidth sending information that was already sent to the client previously (when the transaction was mined and removed=false). The only new information is that the transaction was removed, so it's really just the removed=true flag that matters (and the tx hash and subscription id). My opinion is that the rpc client should be responsible for remembering what that return data was, if it needs to (although most of the time I would expect, there would be no reason for it to care about the return data any more, as the transaction was not even included in the canonical chain). |
I think that makes sense, re: not re-executing the transaction on removed=true. |
I don't know that we can assume that the data has already been sent to the client in the case of removed=false. I could see situations in which a client could take advantage of the removed data without being aware of the data as it was originally executed. FWIW, I've already implemented that portion of the EIP for Parity, so I'm biased. To ease the burden on the CPU I'm caching the return data after the block is replayed the first time. The additional CPU load could be a problem not only for replaying txns when the block is removed but also if multiple clients are connected and subscribed to return_data. In general, it seems like there are members of the Ethereum community who are pushing to have node client designs take into account that not everyone will be running their own node. That means there will be more public nodes out there (e.g. Infura) and so node implementations should take into account that many clients may be connected to a single node. This is why Parity asked me to add caching to my EIP pull request. Incidentally, that caching addresses the CPU load concern in the case of a block being removed from the canonical chain. |
@adisney These are both fair points. Okay, let's keep the |
@adisney Good point about caching--yes, that would avoid the extra CPU time. I guess I'm also biased having not implemented any transaction replays, but as long as the node is behaving according to the EIP758 spec, I don't see any case where a removed=true could arrive with no previous removed=false. The only transactions it gets notified about are ones which it submitted after the subscription begins. If the client submits a transaction, and then it subscribes to return data, it should not receive any notifications about that transaction--if it did, that would be in violation of the EIP758 spec. Would you agree? As I see it, the removed=true is a way for the node to signal the rpc client saying "please forget about that data I just sent you!" If it never received any return data in the first place, then there would be no reason to send this signal. "The additional CPU load could be a problem not only for replaying txns when the block is removed but also if multiple clients are connected and subscribed to return_data." Personally, I don't see a good reason why a tx would ever need to be replayed--so this has not been a relevant issue for me so far with geth (which has never had to replay transactions for any reason, so far). I should probably look at your Parity code to see how you accomplish this, but from my investigations into it so far it looks to me like it would be a complex and costly operation. You can't guarantee that the return data will be accurate unless you replay all prior transactions in the same block, which is often hundreds of transactions. So from my perspective, the problem isn't multiple vs single clients, it's that you shouldn't be re-running hundreds of transactions just to get the return data from a single transaction, when it could have just been sent as soon as the transaction completed originally. |
I don't believe I agree. My thinking is that we should send the return data for all txns that were on blocks that were added and send return data with Regarding txn replays, all of your points regarding replaying txns are valid. Unfortunately, in the case of Parity, the location at which the subscriptions and HTTP RPC functionality lives does not have access to the txns as they execute. In fact, it doesn't have access to the txn return data at all unless the txns are replayed. Since a txn may depend on state that was changed by a txn in the same block, getting the return_data for a specific txn requires executing all of the txns that came before it in that block. Or, at the very least, restoring the state the txn saw when it began execution. For that reason, replaying the txns is unavoidable without significant modification to Parity and the CPU load is a given, which is why the caching is so important. |
@adisney I think in your first paragraph, you are disagreeing with something that was stated twice in EIP-785 as it currently stands. Both times the word "after" was emphasized in italics, which to me indicates that this was considered an important requirement. First time:
and again later:
Given that the EIP as it stands is not fully implementable, we have to make some changes. So I'm not saying there's anything wrong with disagreeing with the current EIP. Just pointing out that what you're suggesting is a major change, and would change the original vision (which is what my implementation is based around). "It seems like that would simplify things as you don't need to track txns sent to the clients." Tracking transactions submitted by the rpc client is one of the central features of the way I designed mine, so I guess I'm biased in wanting to keep this; but it does seem useful to me. I saw it as a necessary requirement of the EIP, and already invested the time at the very beginning in designing it to work that way, but if @tinybike thinks it's not necessary we can change that I guess. "Unfortunately, in the case of Parity, the location at which the subscriptions and HTTP RPC functionality lives does not have access to the txns as they execute." The same is true in geth, rpc module is somewhat isolated from the blockchain itself. However, in geth there is an easy way to pass messages back and forth using Go channels, which is what I use to pass return data back to the rpc module. I assume there is some way for two different modules to communicate in Rust, but don't know if the architecture of Parity might make that more difficult. If so, I guess that explains why we've come at this project from such different angles. |
Fair point! Fortunately, for txns in newly added blocks, my implementation follows the spec as written. However, my implementation does send return_data for blocks being removed that may have initially been sealed prior to the subscription. If it's determined that the EIP should specify that return data be sent on removed blocks and it's important that the removed txn data not be sent for blocks sealed prior to the subscription then I will have to make some changes. If the EIP is updated so that return data need not be sent when a block is removed, then I won't have to change as much. |
I've revised the EIP with all of the changes we've come to agreement about here (from/to filtering, hasReturnData, and renaming ReturnData to CompletedTransaction and NewReturnDataFilter to NewCompletedTransactionFilter). I didn't make the removed=true change, as it seems we are still in disagreement about that. I'm still in favor of it, but @tinybike has yet to respond to my latest arguments for it. |
Copying over here an important point I made in recent Discord discussions: Here's my philosophy about what these events should mean... feel free to disagree with any of this. Purpose of this subscription or filter is for the client to get notified of new completed transaction events. Purpose of the removed=true version of the event is to say "oh wait, nevermind about that last one I just notified you about... please disregard!". So if the client does receive a nevermind message without the original, it's outside of the intended behavior, and should just ignore it anyway. But I do think the "never send a removed=true without first sending a removed=false" rule should be enforced as much as possible on the server side also. It's already enforced in my implementation for eth_subscribe, and I don't think it would be difficult to ensure it's also enforced for polling, if we care about that. Original EIP does stipulate this is enforced, but we can decide to relax that if we want. This was in response to the argument that we should include return data for removed=true in case the client never received a removed=false. |
Update regarding the last remaining issue: @tinybike has convinced me now that it's better to include ReturnData with removed=true events, in order to be consistent with existing log filter/notification behavior. While I think it probably would be better if they both didn't include the Data field, I withdraw my suggestion to change it for EIP758--let's just keep it as is. I think everything is settled now as far as the specs! But I'll leave this Issue open until my latest revision of the EIP gets merged, in case review of it brings up any other further concerns. |
All changes have been merged now--closing issue. |
Having made it most of the way through in implementing EIP758, I've run into a few places where I'd recommend some slight modifications of the original proposal. 3 of them have question marks next to them because I'm unsure whether they are the right solution but it's an idea.
eth_newReturnDataFilter
The first is to maintain a consistent JSON-RPC interface, by using
eth_newReturnDataFilter
instead ofeth_newFilter
withparams=["ReturnData"]
.Presently eth_newFilter creates a log filter,
eth_newBlockFilter
creates a block filter, andeth_newPendingTransactionFilter
creates a new pending transaction filter. Ideally,eth_newFilter
would be namedeth_newLogFilter
but I assume we are stuck there due to backwards compatibility requirements with older versions where logs were the only filter type you could create. Since there are no such concerns with the new return data filter type, it seems pretty straightforward that it should be namedeth_newReturnDataFilter
(or perhapseth_newTransactionFilter
, see below)Add
from
(andto
?) param toeth_newReturnDataFilter
?The biggest issue I see with the present EIP758 is that it assumes the node can tell which client sent which transactions over http (so that only the return data from those transactions would be returned upon calling
eth_getFilterChanges
). However, this is not the case--at least, not unless other unspecified modifications to the JSON-RPC interface are added.One solution to this problem is to add a
from
param toeth_newReturnDataFilter
that specifies a list of wallet addresses the client controls and might send from. Then the node would know to associate any transactions originating from those addresses with this subscription id. It would save return data only from transactions with a from address matching one in the list, and store them for retrieval witheth_getFilterChanges
. If the client really wants to get return data from all transactions that gets added to the blockchain, it could simply leave off the from param or set it to 0. Or, we could decide not to allow that as it might have greater potential for mis-use and make it easier to launch a DoS attack aimed at wasting memory resources on the node.Instead add
subID
param toeth_SendTransaction
ð_SendRawTransaction
?This could either be an alternative to the "from" param or in addition to it. My instinct is that we should avoid modifying these two JSON-RPC commands simply to get a new command to work properly. But if it's important to have
eth_newReturnDataFilter
function likeeth_subscribe
, where the set of transactions being filtered for is determined by which client sent them rather than the from address, I think this may be the only way.Add
noEmpty
param toeth_newReturnDataFilter
I propose we add a
noEmpty
param (suggestions for other names welcome) toeth_newReturnDataFilter
. This would be very easy to add to the current implementation, and provide more general functionality. If noEmpty is set, the node would notify the client only about transactions where the return data is non-empty. Presently, it returns return data for any transaction sent from the client, even if there is no data returned from that transaction (in which case the data field is just []). It's easy to imagine some cases where a client would want to know when any of the transactions it submits complete, and other cases where it would only care about ones that have non-empty return data.NewTransactionFilter
orNewCompletedTransactionFilter
?If we do decide to add the
noEmpty
flag, then it seems logical to explore taking this one step further. Since the functionality is then more general, we could renameeth_newReturnDataFilter
toeth_NewTransactionFilter
. This would reflect the more general use case where a client wants to get notified of any transaction it submits as soon as it completes execution. The internal naming of the kind of data structure returned by the node would also make more sense if they were changed too. For example, in geth instead of aReturnData struct
, which includes a tx hash, a flag indicating whether the transaction was removed due to a chain reorg, and any return data associated with the completed transaction, it could simply be namedCompletedTransaction struct
. (I assume the same could be said for parity.) This puts subscription to pending transactions (NewPendingTransactionFilter
) and completed transactions (NewTransactionFilter
or perhapsNewCompletedTransactionFilter
) on the same footing. You can make a filter for either, and subscribe to either, and the code is already basically written to do this.Also, if we opt for this idea, then it would probably make sense to call the
noEmpty
flag something likehasReturnData
instead.The text was updated successfully, but these errors were encountered: