Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FRAME core] Clawback PoV Weights For Dispatchables #209

Closed
ggwpez opened this issue Mar 1, 2023 · 36 comments
Closed

[FRAME core] Clawback PoV Weights For Dispatchables #209

ggwpez opened this issue Mar 1, 2023 · 36 comments
Assignees
Labels
D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework.

Comments

@ggwpez
Copy link
Member

ggwpez commented Mar 1, 2023

The PoV benchmarking aims to estimate the worst-case scenario, which is unlikely to occur on average.
Too large over-estimations hurt block utilization. There is currently no way to retroactively measure the actual used proof size.
Having such a retroactive measurement in place would enable a multi iteration approach; alternating between dispatch and assessment phase to reach maximum block utilization.

It would therefore be nice to have a way to measure the weight of a batch of dispatchables after their execution. For example in:

  • Scheduler Task dispatch
  • XCM dispatch
  • Transaction dispatch
  • MessageQueue processing

Basically in every situation where we need to fit a batch of Dispatchables into a weight limit, and assume that the pre- and post-dispatch PoV weight differ greatly.

@ggwpez ggwpez changed the title Retrofitted PoV Weights For Dispatchables Retrofitting PoV Weights For Dispatchables Mar 1, 2023
@ggwpez ggwpez changed the title Retrofitting PoV Weights For Dispatchables Clawback PoV Weights For Dispatchables Apr 12, 2023
@ggwpez ggwpez changed the title Clawback PoV Weights For Dispatchables [FRAME core] Clawback PoV Weights For Dispatchables Apr 27, 2023
@kianenigma
Copy link
Contributor

@paritytech/sdk-node would be good to get some input here how the proof recording is currently working in the client side, and what else can be done to assist this.

For example, how expensive is it to know the accurate proof as we are building the block? Can the client measure and report back the actual post-dispatch (proof_size) weight?

@xlc
Copy link
Contributor

xlc commented May 17, 2023

we can’t realistically enable proof size benchmarks as current without big regression on the block size. this is a major blocking issue.

@bkchr
Copy link
Member

bkchr commented May 17, 2023

For example, how expensive is it to know the accurate proof as we are building the block? Can the client measure and report back the actual post-dispatch (proof_size) weight?

Relative "cheap": https://github.com/paritytech/substrate/blob/5b253f8757de20cd95f274eb35531a5b0a683987/primitives/trie/src/recorder.rs#L155-L157

However, we currently don't take the compressed size into account.

@ggwpez
Copy link
Member Author

ggwpez commented May 17, 2023

We could start with the compacted proof, that should already bring advantages in cases that keys are accessed repetitively, and i think is quite cheap to compute.
Otherwise we need some stream compression? I remember reading an issue/discussion about it.

@bkchr
Copy link
Member

bkchr commented May 17, 2023

I think the better solution and much easier is to come up with some kind of factor between the storage proof and the compressed storage proof. @cheme was always arguing that the compression isn't bringing that much as we have binary data that can not be compressed that much anymore. So, maybe we should collect all the factors of storage proof vs compressed in the benchmarking and then taking the worst factor to calculate the compressed storage proof size based on the storage proof size returned by the host function.

@xlc
Copy link
Contributor

xlc commented May 17, 2023

Some of the data can be compressed well such as Balance and some don't (AccountId) so the compress ratio could vary by the usage.

But I think the main contributor of big size is difference between MaxEncodedLen and actual average item size

@bkchr
Copy link
Member

bkchr commented May 17, 2023

But I think the main contributor of big size is difference between MaxEncodedLen and actual average item size

This is not really relevant to what I said?

Some of the data can be compressed well such as Balance and some don't (AccountId) so the compress ratio could vary by the usage.

Yeah for sure, but when you access some balance, there are probably more trie nodes that contribute to the storage proof size than the actual balance. Even if we only allow to read 5MiB of "uncompressed data" it should be good enough, assuming that the compression ratio is somewhere between 1-2 in average.

@cheme
Copy link
Contributor

cheme commented May 18, 2023

For example, how expensive is it to know the accurate proof as we are building the block? Can the client measure and report back the actual post-dispatch (proof_size) weight?

Relative "cheap":

https://github.com/paritytech/substrate/blob/5b253f8757de20cd95f274eb35531a5b0a683987/primitives/trie/src/recorder.rs#L155-L157

However, we currently don't take the compressed size into account.

This is also not estimating the compacted proof (I did try to have primitive for compacted proof, it means recording parent-child relation too in order to reduce estimated size, it was not too simple or cheap so I drop the branch).

For compression, it does not compress much all trie nodes (mostly crypto hash), but the values in the proof can be compressed (not all value of course, but it depends on the chain here). But there might be something to do by compressing the value only for estimation (at least a proof format could be design with two files: one with trie material and one with indexed values, then we only compress the second one).

@bkchr
Copy link
Member

bkchr commented May 22, 2023

I would propose that we check the current storage proof size after applying an extrinsic. Then we would return the unused (worst_case - actual) proof size as we return the ref time. This should be implemented in the following way:

We introduce a custom host function into Cumulus:

/// Returns the current uncompressed storage proof size.
///
///  The storage proof size is composed by the size of all trie nodes/values read from the state.
fn storage_proof_size() -> u32; 

In Substrate this will mean that we will introduce a new Extension that will hold a reference to the StorageProof to call estimate_encoded_size(). When this extension isn't registered, we the host function should return 0.

While the host function isn't used by Polkadot itself, we should still create some RFC for adding it.

There are four different "stages" in which this host function will be called.

Block building

When we are building a block, we already collecting the storage proof. So, we will just need to register the extension and it should work.

Block import

Currently we don't collect a storage proof while importing a block, as we don't need it. However, we would need to start doing this to have storage_proof_size return the correct values while importing a block. This would be the simplest solution to the problem and should also be very easy to implement for light clients (They would just need to return the size of the storage proof they have fetched so far, when they actually need to execute a block).

Another more complex solution would be to alter FRAME to have some new hook on_final_finalize (name is a joke, naming is hard :P). The idea of this hook would be that you are only allowed to do a really small subset of actions that require extra caution, but you could return a vec of custom Calls that can be attached to the block. This would run after on_finalize and before we calculate the storage root etc. These custom Calls would then be added as unsigned Extrinsic to the block (would require to change the BlockBuilder runtime api to also have these extrinsic being returned). Before executing the block in execute_block we would then remove these extrinsics from the block and setup certain things based on them. In our use case here we would store the return values of storage_proof_size as a vector in a call. When we would then call execute_block it would take the vec of values and setup of storage_proof_size to not call into the host and stay in the runtime and return always the next value in the vector. (The way of adding a custom calls/extrinsics by the runtime to the block is also something I would always had imagined for: #82)

Block validation

Depending how we implement the solution for block import, block validation would do it similar. If we enable storage proof recording while block import, it would require to add some new field to ParachainsBlockData. This field would be a Vec<u32> and each entry corresponds to the return value of estimate_encoded_size while building the block. We would then use this vec to "replay" the same return values as we are running the validation. This will be done by overwriting the storage_proof_size host function as we do it for the storage host functions etc.

If we choose the other solution as outlined above, block validation will not require any kind of special handling and will just work™

Offchain calls

For any kind of offchain call that maybe is hitting storage_proof_size it should return 0.

We now only need to discuss on which block import solution to implement.

I would propose @skunert as the one implementing the changes to make the host function work. The actual FRAME integration should then be relative easy.

@ggwpez
Copy link
Member Author

ggwpez commented May 22, 2023

This is also not estimating the compacted proof (I did try to have primitive for compacted proof, it means recording parent-child relation too in order to reduce estimated size, it was not too simple or cheap so I drop the branch).

I thought the compacted proof is just de-duplicating node entries from the proof? Or is that already happening?
AFIAK this decreased the PoV size quite a lot (will run some tests to check again).

/// Returns the current uncompressed storage proof size.
///
/// The storage proof size is composed by the size of all trie nodes/values read from the state.
fn storage_proof_size() -> u32;

Is there an easy way to keep host functions extensible?
Maybe we could add an enum as argument to distinguish between RawEncoded, Compact, CompactCompressed. Then we could later-on add compaction and compression if we want to, but for now the enum only has RawEncoded.
I am just thinking on how to get the most out of adding that host function, since once we add it, there is no changing.
Or we return a Map, that contains all of these values. Basically a weakly typed struct.

We now only need to discuss on which block import solution to implement.

The issue states multiple situations where this PoV clawback would be good. I assume this host-function is transactional aware and could just called within something like Utility::batch to clawback the unused PoV of each call when looping over them?
The MQ pallet would also profit from this, so that it can reclaim unused PoV and process more messages.

@bkchr
Copy link
Member

bkchr commented May 22, 2023

I thought the compacted proof is just de-duplicating node entries from the proof? Or is that already happening?

We don't have duplicated nodes by default. This is by design. Compaction removes an intermediate node when we have all the childs available to recreate it.

Is there an easy way to keep host functions extensible?

I'm not really convinced that we need this. Especially if we would add a new enum variant, we should also bump the host function version.

Stuff like compaction could probably be implemented using some kind of iterative algorithm. However, we would then still need to compress the compact result which can not be iterative, as we could have removed a node in between using compaction.

I assume this host-function is transactional aware

The storage recorder is transactional, but this isn't working as you expect it. We can only discard a layer if the extrinsic isn't part of the block. The moment the transaction is part of the block, we will need to record all the accessed nodes and can not throw away anything. Otherwise the validation will not work.

could just called within something like Utility::batch to clawback the unused PoV of each call when looping over them?

It should work, but I don't see the reason? You can just do it after batch is finished? As before you run batch you need to take the static weight any way to decide if the extrinsic/calls fits into the block. In this moment you already remove all the static weight and can refund later. No need to return in between.

The MQ pallet would also profit from this, so that it can reclaim unused PoV and process more messages.

That makes sense. I think in general you didn't understand what I meant with these "stages". These are seen from the node side, the runtime doesn't care and can call the host function as it likes.

@cheme
Copy link
Contributor

cheme commented May 22, 2023

Stuff like compaction could probably be implemented using some kind of iterative algorithm. However, we would then still need to compress the compact result which can not be iterative, as we could have removed a node in between using compaction.

I did try to implement it inline (when writing a new node we pass its parent reference then we can remove one hash size if the parent in the proof still have this hash unaccessed).
But it was a bit impactfull on the trie crate, and I expect it to be a bit too costly.

Actually, right now I just realize that every time you include a new trie node in the node proof set, it has a parent, so if it is its first insert, you can say that we remove one hash from the size of the compact proof, so I think it is fairly ok to do (just realize that , maybe there is a caveat): the only cost is the insert in the proof recorder need to be able to check if the node is already present (so I think we are plain good here with the existing node hashmap of the current recorder).

Algo being simply on insert if node did not exists, remove ~32 from proof size (except the very first node (root)).

Edit: actually can also be calculated simply from the number of nodes currently recorded: (nb_node recorded - 1) * 32 to remove from current proof size.

@ggwpez
Copy link
Member Author

ggwpez commented May 22, 2023

It should work, but I don't see the reason? You can just do it after batch is finished?

Ah yea, that was a stupid example 🙈

The MQ pallet would also profit from this, so that it can reclaim unused PoV and process more messages.

That makes sense. I think in general you didn't understand what I meant with these "stages". These are seen from the node side, the runtime doesn't care and can call the host function as it likes.

Okay. I was only concerned that it would not update the PoV estimation immediately after a storage access in the runtime, but it seems to be the case.

@bkchr
Copy link
Member

bkchr commented May 22, 2023

The PovRecorder records as you access the storage and the POV size estimation is also always updated.

@gavofyork
Copy link
Member

gavofyork commented May 23, 2023

So it's not just extrinics, but also on_initialize/on_idle code across pallets, the scheduler and the message-queue, as well as any ecosystem pallets which also do dynamic stuff in on_initialize/on_idle. So we should be careful not to over-specialise for the blockbuilder/extrinsic use-case.

Jacob mentioned correctly that we don't need to correct-down the weight tracker on every operation (extrinsic/task/message), but only when we believe we have reached some limit. At this point and only here shoudl we correct-down to the actual amount of weight used and recheck against the limit properly.

In the case of scheduler to which a chain might allocate 80% of block space, the first five tasks might use, say, 75%, with the sixth task estimated at 10%. It would only be at this point that the weight is corrected-down to, say, 45% and the sixth task executed.

Block-builder will need to be properly aware, but that's not really the concern of Frame. From the Frame team's point of view, the host function is really all that is needed.

@gavofyork
Copy link
Member

@bkchr I presume "Block import" is what happens in the collator node, whereas "Block validation" is what happens in the validator node?

Seems sensible to go the Vec<u8> way - but why would it not be done at the point of block builder and distributed alongside the block?

@bkchr
Copy link
Member

bkchr commented May 29, 2023

@bkchr I presume "Block import" is what happens in the collator node, whereas "Block validation" is what happens in the validator node?

Yeah, sorry! I should have been more clear. By block import I mean when any node imports this block to its database. By block validation I meant the relay chain running validate_block for a given PoV.

Seems sensible to go the Vec<u8> way

What you mean by the Vec<u8> way?

but why would it not be done at the point of block builder and distributed alongside the block?

We don't support distributing any data alongside the block, at least not currently. In the database we store the parts of the block separately, aka the Header and the Extrinsics. Thus, my idea was that we add the information to "reconstruct" storage_proof_size into a transaction. However, this extrinsic needs to be generated inside the runtime to ensure that the extrinsics_root is correct and it also needs to be generated at a point at which no one is accessing this functionality anymore.

@bkchr
Copy link
Member

bkchr commented May 29, 2023

I briefly talked with @gavofyork.

We came to the following conclusion/idea:

  • We record all the accesses to the storage_proof_size host function on the host side.
  • After we applied all the extrinsics there will be a new "inherent phase". So, we call into the runtime again, passing it data to construct inherent extrinsics. These inherents are then pushed at the end of the block. These inherents should not do any kind of computation, only contain our data.
  • The data is then shared alongside the block. When importing the block or validating the PoV we will then have the data available and can use it to reconstruct the calls to the host function.
  • It will not be allowed to call storage_proof_size as part of on_finalize as this would not be recorded.
  • We will have some chicken/egg problem as the data we collect to construct this inherent actually also contributes to the PoV size. But it shouldn't be such a big problem and we can just reserve some space for this data in our calculations.

@gavofyork
Copy link
Member

gavofyork commented Jun 1, 2023

@bkchr we also briefly discussed the possibility of (re)evaluating the storage proof size during execution on later validation/execution by marking the parts of the proof which get used as they get entered into a cache. This would obviate the need for this extra inherent phase, the new requirements of finalize_block and the "data extrinsic".

Is this viable?

@bkchr
Copy link
Member

bkchr commented Jun 1, 2023

The idea was that in validate_block we could use the TrieCache to find out what the proof size is. As the cache represents all the de-duplicated data that we have read, requesting its size should work as return value of storage_proof_size.

The only question would be on what to do with execute_block on full nodes. The best and most easiest solution would probably to enable storage proof recording to be able to answer the storage_proof_size requests. I think we can go with this.

@kianenigma
Copy link
Contributor

Once we have this, we can do some kind of a "auto-refund" scheme for all dispatchable's proof_size dimension. This won't be make sense for ref_time, but if we know the actual proof size post dispatch, we can automatically refund it.

This will help increase the block utilization, and slightly reduce fees for end users.

@pgherveou
Copy link
Contributor

The pallet-contracts team is looking into this issue, as decreasing PoV size is high on our list too.
I like the solution described so far, but I have a couple of points I would like to add to the conversation:

  • How about exploring a pure Runtime solution so we can ship this with a forkless update?. We could track all storage reads from the runtime in a global Hashset and counter? It would add some extra memory pressure, and be an overestimation compared to getting the TrieCache size, but do you think it's a viable solution?

  • in Eliminating pre-dispatch weight, we talk about shifting from pre-dispatch weight towards dynamically metering the PoV. This could take us a step closer to freeing up devs from benchmarking the PoV weight component (see other on-going research to move to dynamic ref_time metering here mentioned in the post).

One idea discussed in this thread is to use systemic over-estimation, using strategies like imposing max PoV weight of 1/10th of a block's capacity, ensuring we can fill up to 90% of available block space.

Would love to hear your thoughts on this.

@bkchr
Copy link
Member

bkchr commented Aug 28, 2023

  • We could track all storage reads from the runtime in a global Hashset and counter?

This would not include the trie nodes and thus, we would underestimate the size of the PoV.

  • in Eliminating pre-dispatch weight, we talk about shifting from pre-dispatch weight towards dynamically metering the PoV. This could take us a step closer to freeing up devs

This would require to let the node abort the execution if any storage access would use over the maximum pov weight.

@pgherveou
Copy link
Contributor

pgherveou commented Aug 29, 2023

This would not include the trie nodes and thus, we would underestimate the size of the PoV.

isn't counting key + value size an over-estimation, since trie path are repeated in each key?

This would require to let the node abort the execution if any storage access would use over the maximum pov weight.

Can't we just exclude the transaction when this happens? ref_time is already protecting us from unlimited resource usage

@bkchr
Copy link
Member

bkchr commented Aug 29, 2023

isn't counting key + value size an over-estimation, since trie path are repeated in each key?

No. Trie nodes are consisting up to 16 * 32byte hashes and you don't know how many are there in between the root and your value from the runtime.

Can't we just exclude the transaction when this happens? ref_time is already protecting us from unlimited resource usage

It is about the block producer, who will find out about this while building a block and not before like it is right now. You need to abort the transaction, but the block producer still would need to get paid for this as he has done work.

@skunert skunert added this to SDK Node Aug 30, 2023
@skunert skunert moved this from backlog to in progress in SDK Node Aug 30, 2023
@github-project-automation github-project-automation bot moved this to backlog in SDK Node Aug 30, 2023
@pgherveou
Copy link
Contributor

Make sense, thanks for your answers.

I think the important thing here is being able to correct down the weight after each extrinsic.
I guess the argument being discussed in the forum on systemic over-estimation, is that once we do that, we could then avoid complex benchmark and Instead, simply set a generous pre-dispatch weight (for instance, 1/10th of block weight), ensuring that blocks are at least 90% full.

@pgherveou
Copy link
Contributor

We (pallet-contracts) are really interested in this issue and would love to contribute. @bkchr @skunert let me know how I can collaborate with you on it.

@skunert
Copy link
Contributor

skunert commented Aug 30, 2023

I have started looking into this issue some time ago already and my current draft mostly follows what has been sketched earlier in this issue. I plan to start opening PRs for this soon.

Outline of the current draft:

  • In cumulus there will be an externalities extension that has a reference to the proof recorder and is able to return the currently estimated proof size.
  • A new host function that returns the currently estimated proof size to the runtime.
  • Extension can be passed down to the block-builder on authoring and to the client for block import.
  • For block import, recording will be enabled so that a proof recorder is available for the extension.
  • Currently looking into the best option for block validation. Initially suggested in this issue was to use the TrieCache to get a size, but I would prefer to use the same recorder mechanism that we use in the other places.
  • When this extension and host function are available, the runtime can reclaim some storage proof weight by calling this host function and calculating the diff between the benchmarked weight and the actual one.

@crystalin
Copy link

  • When this extension and host function are available, the runtime can reclaim some storage proof weight by calling this host function and calculating the diff between the benchmarked weight and the actual one.

That is indeed going to be very useful. Having done recently many benchmark, I think some worst case scenario (specially the system.account or locks....) are too far from reality (due to the worst case estimation of the tree branches).

@CanonicalJP
Copy link

I have started looking into this issue some time ago already and my current draft mostly follows what has been sketched earlier in this issue. I plan to start opening PRs for this soon.

Following-up on the pending PR, as it's expected to fix srlabs_findings/issues/276, @skunert please let me know when you have the PR and its expected ETA for merge. Thanks in advance!

@skunert
Copy link
Contributor

skunert commented Sep 26, 2023

I have started looking into this issue some time ago already and my current draft mostly follows what has been sketched earlier in this issue. I plan to start opening PRs for this soon.

Following-up on the pending PR, as it's expected to fix srlabs_findings/issues/276, @skunert please let me know when you have the PR and its expected ETA for merge. Thanks in advance!

The PR is available here #1462, I am currently waiting on some more reviews (I know some are in progress). However, I can sadly not access the issue you linked, so not sure what it is about.

@skunert
Copy link
Contributor

skunert commented Nov 10, 2023

I would like to discuss the reclaiming mechanism on the runtime side. I think there are two main scenarios worth discussing.

Extrinsics

When we apply extrinsics we can just perform a check of the proof size before and after the application. The diff between the values tells us how much proof size was actually consumed. The difference between the consumed proof size and the benchmarked one can be reclaimed.

I have a simple SignedExtension that will do the above and perform the reclaiming. You can find it in this draft PR: #2270

Hooks

For reclaiming in hooks the situation is a bit more complicated.

  • on_initialize: This hook returns the weight that is used by itself plus whatever should be consumed by on_finalize. So after this is run, the storage proof size is basically "out of sync" and we can not use the pre and post-dispatch diff technique we use for the extrinsics.
  • on_idle: Here we could compare pre and post-run storage proof weight and reclaim accordingly. However, users will probably want to maximize their usage of the remaining weight and will want to manually reclaim inside their implementation so they can get the maximum possible work done.
  • on_finalize: There should be nothing to do here since its weight has been accounted for and nothing should happen after this anyway.

One way to allow easy manual reclaiming would be to provide a cumulus-supplied WeightMeter wrapper that has the standard WeightMeter functionality but is enhanced by knowledge about proof reclaiming. A user would instantiate such a ProofAwareWeightMeter and it would internally fetch the proof size. User could at any time they wish call a proof_refund method to reclaim differences in weight.

Alternatives

In an ideal world, we could just set the current weight to the proof size directly, without the diff calculation steps. However, this would require a change in how we handle on_initialize, since it is returning weights for on_finalize too. We would need to refactor this to be able to differentiate between the two.

These are my thoughts about it, but I am not too experienced with frame, so happy to hear further opinions.
cc @ggwpez @pgherveou

@pgherveou
Copy link
Contributor

on_idle: Here we could compare pre and post-run storage proof weight and reclaim accordingly. However, users will probably want to maximize their usage of the remaining weight and will want to manually reclaim inside their implementation so they can get the maximum possible work done.

Agree with that.
As an example, our custom multi-block migration framework on pallet-contracts, run the migration on_idle.
It tries to make as much progress as possible given the remaining weight.

In a world with Clawback PoV, I would probably adjust the logic, and add an extra check, where I would call the client api to get the adjusted PoV, before trying one more time to make progress and eventually break from the loop.

Alternatively, if the client call is pretty cheap, we could also just call it every time we mutate the remaining_weight, to get the PoV component adjusted to it's real value.

fn on_idle(_block: BlockNumberFor<T>, mut remaining_weight: Weight) -> Weight {
use migration::MigrateResult::*;
loop {
let (result, weight) = Migration::<T>::migrate(remaining_weight);
remaining_weight.saturating_reduce(weight);
match result {
// There is not enough weight to perform a migration, or make any progress, we
// just return the remaining weight.
NoMigrationPerformed | InProgress { steps_done: 0 } => return remaining_weight,
// Migration is still in progress, we can start the next step.
InProgress { .. } => continue,
// Either no migration is in progress, or we are done with all migrations, we
// can do some more other work with the remaining weight.
Completed | NoMigrationInProgress => break,
}
}

@ggwpez
Copy link
Member Author

ggwpez commented Nov 13, 2023

I think for the hooks we can for now let the implementer do the metering. The big user-facing part are the Transactions.

on_idle [...] However, users will probably want to maximize their usage of the remaining weight and will want to manually reclaim inside their implementation so they can get the maximum possible work done.

This is fine and i dont see the problem. In this case, "manually reclaiming" would just mean to call the new HostFn after each loop iter. For example:

fn on_idle(limit: Weight) -> Weight {
    for item in work {
        item();
        if new_host_call() > limit.proof {
            break;
        }
    }

    return // Whatever we estimate.
}

The current System::BlockWeight should not be modified by that hook but by the caller: Executive who can then consume the correct weight.

With on_initialize and on_finalize i would do it similar. Just trust the developer for now that they benchmark these well enough. I think the important part is just to have more block utilization via Transaction Proof metering.
Both hooks are also going to be deprecated soon in favour of unpaired hooks, who dont have this weird entanglement for their weight returns.

For the signed extension: I think it must be in a specific order, so that the Weight SE deduces the correct amount?
We can also hard-code this into executive directly. Not sure what is better.

@skunert
Copy link
Contributor

skunert commented Nov 16, 2023

Okay, so you guys agree that manual reclaiming for the hooks is the way to go.

This is fine and i dont see the problem. In this case, "manually reclaiming" would just mean to call the new HostFn after each loop iter.

Yeah something like that, but since we are getting the remaining block weight passed into on_idle, we would need multiple host calls. One in the beginning to check the baseline proof size and then after some work done do check how much weight was actually consumed. To make this easier I was proposing the weightmeter, would look something like this:

fn on_idle(limit: Weight) -> Weight {
  // The naming tbd
  // ProofAwareWeightMeter internally stores the storage limit as of now by calling hf
  let meter = ProofAwareWeightMeter::new(limit);
  for item in work {
        // Some checking logic that the item actually fits into the remaining weight via benchmark
        ...
        item();
        // This would downcorrect the consumed storage weight on the meter based on diff of now vs initialization time.
        meter.refund_proof_size()
   }

   return meter.consumed()
}

For the signed extension: I think it must be in a specific order, so that the Weight SE deduces the correct amount?

What scenarios do you have in mind? You mean other extension already modify the proof weight and it could lead to wrong results? Was not on my radar until now.

We can also hard-code this into executive directly. Not sure what is better.

From what I saw until now, I think its attractive to have this in a SE. It stays nicely modular and uses the same mechanism as the CheckWeight SE. But needs to be separate because this whole thing will be parachain only.

skunert added a commit that referenced this issue Nov 30, 2023
This PR provides the infrastructure for the pov-reclaim mechanism
discussed in #209. The goal is to provide the current proof size to the
runtime so it can be used to reclaim storage weight.

## New Host Function
- A new host function is provided
[here](https://github.com/skunert/polkadot-sdk/blob/5b317fda3be205f4136f10d4490387ccd4f9765d/cumulus/primitives/pov-reclaim/src/lib.rs#L23).
It returns the size of the current proof size to the runtime. If
recording is not enabled, it returns 0.

## Implementation Overview
- Implement option to enable proof recording during import in the
client. This is currently enabled for `polkadot-parachain`,
`parachain-template` and the cumulus test node.
- Make the proof recorder ready for no-std. It was previously only
enabled for std environments, but we need to record the proof size in
`validate_block` too.
- Provide a recorder implementation that only the records the size of
incoming nodes and does not store the nodes itself.
- Fix benchmarks that were broken by async backing changes
- Provide new externalities extension that is registered by default if
proof recording is enabled.
- I think we should discuss the naming, pov-reclaim was more intuitive
to me, but we could also go with clawback like in the issue.

## Impact of proof recording during import
With proof recording: 6.3058 Kelem/s
Without proof recording: 6.3427 Kelem/s

The measured impact on the importing performance is quite low on my
machine using the block import benchmark. With proof recording I am
seeing a performance hit of 0.585%.

---------

Co-authored-by: command-bot <>
Co-authored-by: Davide Galassi <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
@skunert
Copy link
Contributor

skunert commented Mar 8, 2024

#3002

@skunert skunert closed this as completed Mar 8, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in Benchmarking and Weights Mar 8, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in Runtime / FRAME Mar 8, 2024
@github-project-automation github-project-automation bot moved this from Draft to Closed in Parity Roadmap Mar 8, 2024
@github-project-automation github-project-automation bot moved this from in progress to done in SDK Node Mar 8, 2024
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this issue Mar 25, 2024
This PR provides the infrastructure for the pov-reclaim mechanism
discussed in paritytech#209. The goal is to provide the current proof size to the
runtime so it can be used to reclaim storage weight.

## New Host Function
- A new host function is provided
[here](https://github.com/skunert/polkadot-sdk/blob/5b317fda3be205f4136f10d4490387ccd4f9765d/cumulus/primitives/pov-reclaim/src/lib.rs#L23).
It returns the size of the current proof size to the runtime. If
recording is not enabled, it returns 0.

## Implementation Overview
- Implement option to enable proof recording during import in the
client. This is currently enabled for `polkadot-parachain`,
`parachain-template` and the cumulus test node.
- Make the proof recorder ready for no-std. It was previously only
enabled for std environments, but we need to record the proof size in
`validate_block` too.
- Provide a recorder implementation that only the records the size of
incoming nodes and does not store the nodes itself.
- Fix benchmarks that were broken by async backing changes
- Provide new externalities extension that is registered by default if
proof recording is enabled.
- I think we should discuss the naming, pov-reclaim was more intuitive
to me, but we could also go with clawback like in the issue.

## Impact of proof recording during import
With proof recording: 6.3058 Kelem/s
Without proof recording: 6.3427 Kelem/s

The measured impact on the importing performance is quite low on my
machine using the block import benchmark. With proof recording I am
seeing a performance hit of 0.585%.

---------

Co-authored-by: command-bot <>
Co-authored-by: Davide Galassi <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework.
Projects
Status: Done
Status: Done
Status: done
Development

No branches or pull requests