Compensation Request Proposals not deleted as expected #3610

Open
julianknutsen opened this issue Nov 15, 2019 · 1 comment

Comments

@julianknutsen
Contributor

In this issue, I will explain the source of the bug that resulted in removed Compensation Request proposals persisting through the Proposal phase and being broadcast during the Blind Vote phase. At the end, I will throw out a design idea, but we should have a design discussion on what the options are and the best path forward to fix this in a safe way.

Repro

  1. Start a regtest cluster with Alice, Bob, and a SeedNode.
  2. Alice publishes a compensation request through the UI.
  3. Add a block to bitcoind so the tx is broadcast and the proposal is now seen in Bob's UI.
  4. Kill Bob's Bisq instance.
  5. Alice removes the compensation request (it is now removed from Alice's UI).
  6. Restart Bob's Bisq instance (the request is still visible in Bob's UI).
  7. Add blocks until the phase switches to Blind Vote.
  8. The compensation request now shows up again in Alice's UI (it is now visible on both).

Root Cause

TempProposalPayload objects implement the ProtectedStoragePayload and PersistablePayload interfaces, which means they are written to disk. But the current system design does not handle removes of persistable entries that occur while a node is offline.
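As a rough illustration of why that combination is problematic (stand-in types and field names below, not the actual Bisq source):

```java
// Stand-ins for the two marker interfaces (illustrative, not Bisq source):
interface ProtectedStoragePayload { }  // replicated via the P2P network
interface PersistablePayload { }       // also serialized to disk on shutdown

// Because the payload implements both, a proposal that was removed from the
// network while a node was offline can still survive on that node's disk.
class TempProposalPayloadSketch implements ProtectedStoragePayload, PersistablePayload {
    private final byte[] proposalBytes;  // hypothetical field

    TempProposalPayloadSketch(byte[] proposalBytes) {
        this.proposalBytes = proposalBytes;
    }
}
```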

When Bob's Bisq instance is shut down, the TempProposal object is written to disk. When it restarts, it reads the on-disk instances and recreates the TempProposal that is consumed by the ProposalService.

Removes are never rebroadcast after the initial RemoveData message, and the RequestData/GetData path handles only adds, not removes. So Bob will never learn that a RemoveData message was sent for the TempProposal he recreated. At this point, it is only a local problem.
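A simplified model of why the GetData path is add-only (all names below are illustrative stand-ins, not the actual P2PDataStorage code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DataStoreSketch {
    static class Hash { }          // stand-in for a payload hash
    static class StorageEntry { }  // stand-in for a protected storage entry

    final Map<Hash, StorageEntry> liveEntries = new HashMap<>();
    final Map<Hash, Integer> sequenceNumberMap = new HashMap<>();

    void remove(Hash hash, int seqNr) {
        liveEntries.remove(hash);            // the entry leaves the live map...
        sequenceNumberMap.put(hash, seqNr);  // ...but its latest seqNr is kept
    }

    // A GetData reply is built from live entries only, i.e. adds. A remove
    // leaves no trace here, so a peer that was offline never learns of it.
    Set<StorageEntry> buildGetDataResponse() {
        return new HashSet<>(liveEntries.values());
    }
}
```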

However, once blocks are added and the phase moves to Blind Vote, maybePublishToAppendOnlyDataStore() is called on Bob's node. It sees that the AppendOnlyDataStore does not contain the proposal, so the proposal is broadcast to the entire network as a PersistableNetworkPayload.

So it takes only one node missing the RemoveData message for every node on the network to receive the stale proposal as append-only data.
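The publish step at the phase change behaves roughly like this (only the method name maybePublishToAppendOnlyDataStore comes from the code; everything else in this sketch is an assumption):

```java
import java.util.HashSet;
import java.util.Set;

class BlindVotePhaseSketch {
    // Append-only store, keyed here by payload hash for simplicity.
    final Set<String> appendOnlyDataStore = new HashSet<>();

    void maybePublishToAppendOnlyDataStore(String proposalHash, Runnable broadcast) {
        // Bob's stale, recreated TempProposal was never added to the
        // append-only store, so this check passes and the removed proposal
        // is broadcast to the whole network as append-only data.
        if (appendOnlyDataStore.add(proposalHash)) {
            broadcast.run();
        }
    }
}
```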

Risks & Mitigation

For now, this bug is mitigated by publishing the relevant proposal tx id with the compensation request on the GitHub issue in the proposals repo. It is up to voters to vote NO on any duplicates. This seems like an acceptable solution for the current cycle.

Designs moving forward

  1. Change TempProposals to be non-persistable
    This would require the network to stay healthy throughout the proposal phase until the Blind Vote phase, at which point all non-persistent proposals are added as append-only data.

Work would be required to bootstrap this and to remove any TempProposals already saved on existing nodes.
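Conceptually the change is just dropping the persistence marker interface, sketched below with stand-in types (the real class also has a protobuf mapping that would need migration work):

```java
// Stand-ins for the marker interfaces (illustrative):
interface ProtectedStoragePayload { }
interface PersistablePayload { }

// Before: persisted to disk, so a removed proposal can survive a restart.
class TempProposalPayloadToday implements ProtectedStoragePayload, PersistablePayload { }

// After: still replicated over the P2P network, but never written to disk,
// so a restarted node only holds proposals it receives from live peers.
class TempProposalPayloadProposed implements ProtectedStoragePayload { }
```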

  2. Add remove tracking to GetData/RequestData
    This is a larger design and probably worthy of its own design doc and discussion, but here is an idea off the top of my head:

Add fields to the GetData message handshake which includes hashes that have been removed:

  • A node sends the set of known hashes and the last-seen sequence number for each hash to the seed node. (knownHashes = set<pair<hash, latest_seqnr>>)
  • If the seed node recognizes that a known hash has actually been removed (it exists in the seq nr map, but not in the hash map), it sends back the seq nr of the remove operation. (removedEntries = set<pair<hash, latest_seqnr>>)
  • The node post-processes each entry in this set as a remove().

This requires a new field on the GetData message type, and the interop between old and new nodes would require thorough testing.
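A sketch of the seed-node side of that handshake, under the assumption that removed hashes remain in the sequence number map (all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class RemoveTrackingSketch {
    static Map<String, Integer> removedEntries(Map<String, Integer> knownHashes,  // hash -> peer's latest seqNr
                                               Map<String, Integer> seqNrMap,     // seed's full sequence number map
                                               Set<String> liveHashes) {          // hashes still in the seed's hash map
        Map<String, Integer> removed = new HashMap<>();
        knownHashes.forEach((hash, peerSeqNr) -> {
            Integer seedSeqNr = seqNrMap.get(hash);
            // "exists in seq nr map, but not hash map" => the entry was removed;
            // reply with the remove's seqNr so the peer can apply it as a remove().
            if (seedSeqNr != null && seedSeqNr > peerSeqNr && !liveHashes.contains(hash))
                removed.put(hash, seedSeqNr);
        });
        return removed;
    }
}
```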

@chimp1984
Contributor

Thanks for looking into that!
I looked into this a while back and, as far as I remember, came to the same conclusion about the cause. As it is not trivial to solve, I postponed trying to fix it...

Regarding your suggestions:
Re 1:
I agree that TempProposals should not be persisted (I have not thought about it in depth and don't remember if there was an important reason why we persist them).
I think it would not lead to DAO consensus issues, as the TempProposals are not relevant for DAO consensus and will be converted to Proposal objects, which are then used for consensus. So old nodes should not have an issue if they do not recognize the new TempProposal version.
We have the policy that active DAO participants need to update when there are important changes, so we can ignore the problem that old nodes would not see any new TempProposals.

Maybe you can try that out and see if it causes any problems. It will be important to also test the mixed cases with an old and a new client (I usually have 2 IDEA instances open, one with the old version and one with the new version). There might be a problem if an old node creates a TempProposal and later converts it to a Proposal. I am not sure whether that is a critical issue, beyond the node running the new version not seeing the old TempProposal.

If we change the TempProposal, we likely do not need to support the old version (maintaining 2 classes and lists), as it is not part of consensus and resyncing the DAO should not have any effect on it.

We have a version flag for all DAO txs (in the Version class) as well as the disableDaoBelowVersion field in Filter (a filter message can be sent out by Christoph, and all nodes apply it as long as they have not set the ignoreDevFlags option to true). I am a bit scared to use the version flags for the DAO txs, as that was never tested much (or not tested at all). But using the Filter flag should be low risk. With that, no one can use the DAO without updating, so it is a partially enforced update for those who want to use the DAO. We can also use activation blocks to give more time, so that the majority have already updated when we deploy it.
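A rough sketch of how such a filter-based gate could work (this is not the actual Filter-handling code; versions are reduced to ints for the sketch):

```java
class DaoVersionGateSketch {
    static boolean daoUsable(int localVersion, Integer disableDaoBelowVersion,
                             boolean ignoreDevFlags) {
        if (ignoreDevFlags || disableDaoBelowVersion == null)
            return true;
        // Nodes below the filtered minimum cannot use the DAO until they
        // update: a partially enforced update that only affects DAO users.
        return localVersion >= disableDaoBelowVersion;
    }
}
```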

Maybe we should make a summary of all the features we have for dealing with backward compatibility. It's quite complex, and there are already several mechanisms available.

Re 2:
In that case, the requesting node would trust the seed node that the remove was verified, as it does not have the signature from the object owner. I am not sure if that is a problem, as we already trust seeds to a certain degree with adding objects and other things, though extending the required trust in seeds does not feel so good. But maybe we can alter your idea so that all nodes keep the RemoveDataMessages, and if a requesting node has an older seqNr, the peer sends the RemoveDataMessage in the reply so that the node does the verification itself? The node only needs to keep the latest RemoveDataMessage, and we could use the TTL for purging old data. RemoveDataMessages are not very frequent anyway, so it should not cause any resource/performance issues.
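On the replying side, that altered idea could look roughly like this: the stored RemoveDataMessage, which carries the owner's signature, is replayed so the requester verifies it itself (all names below are assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RemoveReplaySketch {
    static class RemoveDataMessage {
        final String hash;
        final int seqNr;
        final byte[] ownerSignature;  // signed by the object owner, not the seed

        RemoveDataMessage(String hash, int seqNr, byte[] ownerSignature) {
            this.hash = hash;
            this.seqNr = seqNr;
            this.ownerSignature = ownerSignature;
        }
    }

    // Replying side: only the latest RemoveDataMessage per hash is kept
    // (purged via TTL); replay those the requesting peer has not yet seen.
    static List<RemoveDataMessage> removesToReplay(Map<String, Integer> peerKnownHashes,
                                                   Map<String, RemoveDataMessage> storedRemoves) {
        List<RemoveDataMessage> reply = new ArrayList<>();
        peerKnownHashes.forEach((hash, peerSeqNr) -> {
            RemoveDataMessage remove = storedRemoves.get(hash);
            if (remove != null && remove.seqNr > peerSeqNr)
                reply.add(remove);  // the peer verifies ownerSignature itself
        });
        return reply;
    }
}
```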

But here too, the main challenge will be backward compatibility.
For the TempProposal issue, I think it would not cause many problems: old nodes could still get the old, removed TempProposal if another old node republished it, but new nodes would get the RemoveDataMessage and would ignore such a republished TempProposal, as its seqNr would be lower.
The GetDataRequest would get additional fields, which does not break backward compatibility. We just need to deal with the valid case that such fields can be null if the peer is not updated. We cannot use the capability feature here, as the GetDataRequest is the first message and the capabilities are not known at that moment. Though if it turns out to be absolutely needed, we could work on a capability exchange protocol, which is needed at some point anyway (exchanging capabilities would then be the very first step). Currently the capability feature is a bit unreliable, as it depends on the context whether the peers have exchanged it.
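Handling the optional fields could then be as simple as a null fallback on the receiving side (illustrative):

```java
import java.util.Collections;
import java.util.Map;

class BackwardCompatSketch {
    // An old peer omits the new field entirely, so null must mean
    // "no remove tracking supported", not an error.
    static Map<String, Integer> knownHashesOrEmpty(Map<String, Integer> knownHashesField) {
        return knownHashesField != null ? knownHashesField : Collections.emptyMap();
    }
}
```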
