Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tx object hash as tx identifier #9269

Closed
Tracked by #9261
spalladino opened this issue Oct 17, 2024 · 8 comments · Fixed by #11155
Closed
Tracked by #9261

Use tx object hash as tx identifier #9269

spalladino opened this issue Oct 17, 2024 · 8 comments · Fixed by #11155
Assignees
Labels
A-security Area: Relates to security. Something is insecure. C-protocol-circuits Component: Protocol circuits (kernel & rollup) team-turing Leila's team
Milestone

Comments

@spalladino
Copy link
Collaborator

Today we use the hash of the tx request (ie the hash of the first private function call in the tx, including the entrypoint contract address) as an identifier, which gets emitted as the first nullifier of the tx by private kernel init. However, since private oracles introduce indeterminism, we could have two txs with the same tx request that yield different private-land effects (note hashes, nullifiers, logs, etc). In other words, we end up with two different txs with the same identifier. This messes up wallets who want to check the status of a tx using their hash as identifier, or the p2p layer who identifies txs by hash, or the prover node who requests tx client ivc proofs using the hash.

We need a better identifier. We propose using the hash of the tx object (excluding the proof) as the identifier of a tx. This means a user cannot pre-compute the identifier of their tx just based on the tx request, and they need to wait until simulation is completed. This doesn't seem to be an issue for any current use case.

We need to change the tx object used by the nodes so that the hash is computed based on all its fields, and is not just the first nullifier. We also need to change the tx effect object (ie result of public function execution of a tx object) so that it has a new "tx hash" field, since the original tx object gets lost. And we need to post the tx hash as a new field to L1 as well, which means changing the L1 contract and the archiver.

We also need to address replay protection. Given a tx with no nullifiers, we need to prevent it from being included multiple times:

  • One option is to have the base rollup circuit inject the tx hash as an additional nullifier in the tx, similar to how it adds an extra public data read/write to handle fees, and add a validator for duplicate hashes.
  • Another option is to force all txs to have at least one nullifier. This can be enforced by the private kernel tail. The account contract would be responsible for emitting this nullifier, computed over the nonce for instance (like we handle cancellations today).
  • Another option is to have the private kernel tail inject a nullifier if none has been injected yet. This can be computed over the tx request as the kernel init does today.

The first two options require an extra field (compared to the status quo) to be always posted to L1. The third one doesn't, and it only emits an extra field if the account contract wants to do fancy nonce management (eg cancellations, incremental nonces, etc), which is what we have today.

@just-mitch just-mitch added this to the TestNet milestone Oct 21, 2024
@just-mitch just-mitch added C-protocol-circuits Component: Protocol circuits (kernel & rollup) A-security Area: Relates to security. Something is insecure. labels Oct 21, 2024
@LeilaWang LeilaWang added the team-turing Leila's team label Oct 21, 2024
@spalladino
Copy link
Collaborator Author

Note that we're currently using the tx hash to guarantee uniqueness of the note hashes emitted, since every note hash is mixed with the tx hash and its index within the tx. With this change, the tx hash won't be known until the note hashes are emitted, since the note hashes are used for computing the tx hash itself. We may need to move the siloing of note hashes from the kernel to the private kernel tail or the base rollup.

@sirasistant
Copy link
Collaborator

sirasistant commented Dec 4, 2024

Discussed in a huddle with @iAmMichaelConnor and @spalladino
We agreed in the following points:

  • Currently we use TX request hash both as a TX id and a protocol mandated non revertible nullifier
  • Note hash uniqueness siloing requires a non revertible nullifier to exist when the phase change from non revertible to revertible happens
    • This is because if a note is created in non revertible and nullfied in revertible, its nullifier must be computed using the exact note hash that will reach the tree
  • The TX request hash is valid for note hash uniqueness siloing (since any nullifier known at the phase change would do the role just fine) but is not valid for TX identification, since the same TX request hash can lead to different TX objects that reach the p2p network, altough only one of those could be included in the chain.
  • The protocol also needs at least one non revertible nullifier to be in every TX for replay protection
  • Whatever piece of data used as TX id needs to be included in DA for tx identification when syncing, otherwise we'd need two ids (an id for before the tx is included and one for after it has been included) which is not great.

We discussed a solution to this issue: We can remove the role of TX identification from the TX request hash. Instead, TXs would be identified by the hash of the TxObject. The TX id needs then to be published on DA. It doesn't need to be included in the nullifier tree, since it already is hashing at least one nullifier that will reach the tree. We can keep the TX request hash as a protocol mandated non revertible nullifier in every TX, used for replay protection and note hash uniqueness siloing.

We agreed also on an optimization to the protocol: we don't need to always inject the tx request hash as a non revertible nullifier, if the applications emit a non revertible nullifier, that works just fine for note hash uniqueness siloing and replay protection. We could make that injection of the tx request hash conditional.

@iAmMichaelConnor
Copy link
Contributor

One small spanner: Similarly to how we've realised a TxReceipt isn't necessarily unique, a TxObject isn't necessarily unique. We could have a user (possibly maliciously anarchistic, possibly not) submit two identical TxObjects, but with different accompanying "proof" objects, where the proof objects differ in the randomness that is used to give the "zk" property. Would these duplicate TxObjects (with different proofs) break the p2p network in a similar way to how usage of hash(TxReceipt) as a txId was discovered to break the p2p network?
We'd need to hash the entire proof object in with the TxObject to obtain a truly unique TxId. A proof is a big thing. Many tens of KBs. Is it acceptable to be hashing something so large? I guess it's already being hashed as part of proof verification, but we'd need the sequencer to hash it again.
Maybe we don't care about duplicate TxObjects with differing proofs, as long as both proofs are valid?

Here I've understood a "TxObject" to not include the "proof"; correct me if I'm wrong, and then you can ignore this entire message.

@iAmMichaelConnor
Copy link
Contributor

iAmMichaelConnor commented Dec 5, 2024

Is this a fair summary: (Edit after writing: this is not a summary, but a meandering stream of thoughts)

TxObject doesn't actually appear in the codebase; it's PrivateKernelTailCircuitPublicInputs:. I've expanded it here (yes, manually... it was a useful exercise for me to catch up with what you've been doing :) ).

PrivateKernelTailCircuitPublicInputs {
    constants: TxConstantData {
        historicalHeader: Header {
            lastArchive: AppendOnlyTreeSnapshot {
                root: Fr,
                nextAvailableLeafIndex: UInt32,
            },
            contentCommitment: ContentCommitment { // terrible name
                numTxs: Fr,
                txsEffectsHash: Buffer,
                inHash: Buffer,
                outHash: Buffer,
            },
            state: StateReference { // terrible name
                l1ToL2MessageTree: AppendOnlyTreeSnapshot,
                partial: PartialStateReference { // terrible name
                    noteHashTree: AppendOnlyTreeSnapshot,
                    nullifierTree: AppendOnlyTreeSnapshot,
                    publicDataTree: AppendOnlyTreeSnapshot,
                },
            }
            globalVariables: GlobalVariables {
                chainId: Fr,
                version: Fr,
                blockNumber: Fr,
                slotNumber: Fr,
                timestamp: Fr,
                coinbase: EthAddress,
                feeRecipient: AztecAddress,
                gasFees: GasFees { // Should be called GasPrices or FeesPerGas
                    feePerDaGas: Fr,
                    feePerL2Gas: Fr,
                },
            },
            totalFees: Fr,
            totalManaUsed: Fr,
        }
        txContext: TxContext: {
            chainId: Fr, // Duplicate? Already in Header.GlobalVariables
            version: Fr, // Duplicate? Already in Header.GlobalVariables
            gasSettings: GasSettings {
                gasLimits: Gas {
                    daGas: UInt32,
                    l2Gas: UInt32,
                },
                teardownGasLimits: Gas {
                    daGas: UInt32,
                    l2Gas: UInt32,
                },
                maxFeesPerGas: GasFees { // Should be called GasPrices or FeesPerGas
                    feePerDaGas: Fr,
                    feePerL2Gas: Fr,
                },
            },
        },
        vkTreeRoot: Fr,
        protocolContractTreeRoot: Fr,
    },
    rollupValidationRequests: RollupValidationRequests
    gasUsed: Gas {
        daGas: UInt32,
        l2Gas: UInt32,
    },
    feePayer: AztecAddress,
    forPublic?: PartialPrivateTailPublicInputsForPublic {
        nonRevertibleAccumulatedData: PrivateToPublicAccumulatedData {
            noteHashes: Tuple<Fr, typeof MAX_NOTE_HASHES_PER_TX>,
            nullifiers: Tuple<Fr, typeof MAX_NULLIFIERS_PER_TX>,
            l2ToL1Msgs: Tuple<ScopedL2ToL1Message, typeof MAX_L2_TO_L1_MSGS_PER_TX>,
            privateLogs: Tuple<PrivateLog, typeof MAX_PRIVATE_LOGS_PER_TX>,
            contractClassLogsHashes: Tuple<ScopedLogHash, typeof MAX_CONTRACT_CLASS_LOGS_PER_TX>,
            publicCallRequests: Tuple<PublicCallRequest, typeof MAX_ENQUEUED_CALLS_PER_TX>,
        },
        revertibleAccumulatedData: PrivateToPublicAccumulatedData {
            noteHashes: Tuple<Fr, typeof MAX_NOTE_HASHES_PER_TX>,
            nullifiers: Tuple<Fr, typeof MAX_NULLIFIERS_PER_TX>,
            l2ToL1Msgs: Tuple<ScopedL2ToL1Message, typeof MAX_L2_TO_L1_MSGS_PER_TX>,
            privateLogs: Tuple<PrivateLog, typeof MAX_PRIVATE_LOGS_PER_TX>,
            contractClassLogsHashes: Tuple<ScopedLogHash, typeof MAX_CONTRACT_CLASS_LOGS_PER_TX>,
            publicCallRequests: Tuple<PublicCallRequest, typeof MAX_ENQUEUED_CALLS_PER_TX>,
        },
        publicTeardownCallRequest: PublicCallRequest {
            msgSender: AztecAddress,
            contractAddress: AztecAddress,
            functionSelector: FunctionSelector,
            isStaticCall: boolean,
            argsHash: Fr,
        },
    },
    forRollup?: PartialPrivateTailPublicInputsForRollup {
        end: CombinedAccumulatedData {
            noteHashes: Tuple<Fr, typeof MAX_NOTE_HASHES_PER_TX>,
            nullifiers: Tuple<Fr, typeof MAX_NULLIFIERS_PER_TX>,
            l2ToL1Msgs: Tuple<ScopedL2ToL1Message, typeof MAX_L2_TO_L1_MSGS_PER_TX>,
            privateLogs: Tuple<PrivateLog, typeof MAX_PRIVATE_LOGS_PER_TX>,
            unencryptedLogsHashes: Tuple<ScopedLogHash, typeof MAX_UNENCRYPTED_LOGS_PER_TX>,
            contractClassLogsHashes: Tuple<ScopedLogHash, typeof MAX_CONTRACT_CLASS_LOGS_PER_TX>,
            unencryptedLogPreimagesLength: Fr,
            contractClassLogPreimagesLength: Fr,
            publicDataWrites: Tuple<PublicDataWrite, typeof MAX_PUBLIC_DATA_UPDATE_REQUESTS_PER_TX>,
        },
    },
}
  • TxId = hash(TxObject)
  • The sequencer must be constrained to compute TxId correctly.
    • (This worries me, because it's a lot of data to provably hash, and a bit wasteful; especially since most of it is already being selectively-hashed in places in the rollup circuits already).
  • The sequencer must submit TxId to DA.
  • A private kernel circuit will check whether the non-revertible phase included a nullifier; if not, it will include a new one.
    • A hash of the TxRequest is an obvious choice for this new nullifier, BUT: the TxRequest is only available to the initial private kernel circuit. So, if the non-revertible phase spans multiple function calls, then the TxRequest will not be available to the inner private kernel that handles the transition from non-revertible to revertible. Perhaps we'd need to pass a hash of the TxRequest from kernel to kernel, just in case? That's a bit ugly, isn't it!

WAIT:

Notes that are created in the private, non-revertible phase need a nonce! I think this nonce needs to be known by the end of the non-revertible phase, so that revertible nullifiers can be computed from the "final" note_hash. (Non-rev nullifiers which nullify a non-rev note can be squashed). So: if the non-rev phase emitted a nullifier, we can use that in nonce computations for non-rev notes; else we can use the hash(TxRequest). This makes simulation fiddly, doesn't it! Within aztec.nr, if we want to nullify a non-rev note in the rev phase, we need to know that the note came from the non-rev phase, and we would need to know its nonce, meaning we would need to know the nullifier that was used to compute its nonce. The function then needs to convey to the kernel that the "final" note hash is correct.
Is there ever a possibility (an arrangement of execution frames) where a rev function could be processed by a kernel before a non-rev function?
And what about a function during which we transition from the non-rev to the rev phase?

What about tx nonces?

An account contract can still emit a nullifier which acts as a nonce, to enable things like cancellation.

@spalladino
Copy link
Collaborator Author

Maybe we don't care about duplicate TxObjects with differing proofs, as long as both proofs are valid?

Good catch! I checked how Ethereum does it, and indeed, they include the signature in the computation of the txHash to avoid this issue. I think it's safe to omit the proof from the hash computation, since the p2p layer validates that the proof is correct before forwarding the tx object. It does make me nervous that we can have two different things relying on the same identifier, but I don't see how it can be used to break things.

(This worries me, because it's a lot of data to provably hash, and a bit wasteful; especially since most of it is already being selectively-hashed in places in the rollup circuits already).

Can we leverage that "selective hashing" somehow? Eg if the rollup circuits are already hashing the logs into a "logsHash", maybe we can define the txHash to be computed from the logsHash, instead of hashing everything (logs included) as a plain buffer.

Is there ever a possibility (an arrangement of execution frames) where a rev function could be processed by a kernel before a non-rev function?

IIRC (I may be wrong here) there is no such thing as a revertible or non revertible function. What we have is a "revertible marker" that can be emitted halfway through a function. So it's perfectly possible that the following happens:

  • Entrypoint calls function foo
  • Entrypoint emits revertible marker
  • Entrypoint does stuff

The kernel will then process first entrypoint and then foo, where everything in foo is non-revertible, but some stuff in entrypoint is revertible.

@sirasistant
Copy link
Collaborator

I'm not too worried about hashing the Tail public inputs in public for the tx hash, since it should be something like 50k gates, which is not many for a sequencer

Regarding the conditional adding of the first nullifier:

  • During simulation of the TX the PXE computes nonces for non revertible notes the moment that we change the phase from non revertible to revertible. At that point we can know if we have a non revertible nullifier or not
  • During kernel execution it's more tricky, but I think it might be doable too. Init kernel can have a private input that is "should I add a nonrevertible nullifier?" that the pxe can choose to turn on or off. We just verify in tail that the first nullifier is nonempty and nonrevertible. The PXE is responsible to turn on that flag if no nonrevertible nullifiers were created

@sirasistant
Copy link
Collaborator

I'm going to do it in two phases, first the decoupling of the first nullifier from the tx hash, making the tx hash the hash of the tail public inputs. When that is done I'll implement the conditional adding of the first nullifier

sirasistant added a commit that referenced this issue Jan 8, 2025
As preparation for
#9269 remove
legacy stuff from the tail public inputs
sirasistant added a commit that referenced this issue Jan 10, 2025
Implements #9269
Separates the role of the first nullifier and the transaction hash. The
transaction hash is now the hash of the tail public inputs. The first
nullifier is still used for note uniqueness and replayability protection
@sirasistant
Copy link
Collaborator

Tail outputs hash as tx hash is merged. Currently working on optional protocol nullifier

AztecBot pushed a commit to AztecProtocol/barretenberg that referenced this issue Jan 11, 2025
Implements AztecProtocol/aztec-packages#9269
Separates the role of the first nullifier and the transaction hash. The
transaction hash is now the hash of the tail public inputs. The first
nullifier is still used for note uniqueness and replayability protection
@github-project-automation github-project-automation bot moved this from Todo to Done in A3 Jan 14, 2025
AztecBot pushed a commit to AztecProtocol/aztec-nr that referenced this issue Jan 15, 2025
Resolves AztecProtocol/aztec-packages#9269
Avoids inserting the tx request hash as nullifier (protocol nullifier)
when the tx already generates a nonrevertible nullifier. The simulator's
note cache detects this while simulating the private functions and
chooses a nonce generating nullifier appropiately.

Circuits flow:
- Private kernel init receives a claim on what will be the first
nullifier (or zero indicating that it needs to generate and append the
protocol nullifier)
- That claim gets passed through the databus
- Reset uses the claimed value to make unique note hashes
- Tail verifies that the first nullifier is the claimed one. Tail to
public additionally verifies that it's non revertible
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-security Area: Relates to security. Something is insecure. C-protocol-circuits Component: Protocol circuits (kernel & rollup) team-turing Leila's team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants