
Compile time blowup when calling push_back repeatedly #4395

Closed
spalladino opened this issue Feb 16, 2024 · 35 comments · Fixed by #4643
Labels: aztec.nr, bug, optimization

@spalladino (Contributor)

Aim

Given a large enough array (about 8k elements), copying it into another array by iteratively calling push_back increases compile time to several minutes (over 5 minutes).

This happens when declaring a very large array as an argument to an Aztec contract, since arguments get passed into the Hasher, which copies each element into its own array using push_back.

https://github.com/AztecProtocol/aztec-packages/blob/f14474571210a46e7159cb9d2f0bc9374a837d3d/noir-projects/aztec-nr/aztec/src/hasher.nr#L22-L26

Expected Behavior

Compile time should not blow up.

Bug

The following vanilla Noir snippet reproduces the issue:

struct Hasher {
    fields: [Field],
}

impl Hasher {
    pub fn new() -> Self {
        Self { fields: [] }
    }

    pub fn add(&mut self, field: Field) {
        self.fields = self.fields.push_back(field);
    }

    pub fn add_multiple<N>(&mut self, fields: [Field; N]) {
        for i in 0..N {
            self.fields = self.fields.push_back(fields[i]);
        }
    }
}

fn main(expected: pub Field, first: Field, input: [Field;8192]) {
    let mut hasher = Hasher::new();
    hasher.add(first);
    hasher.add_multiple(input);
    assert(hasher.fields[0] == expected);
}

To Reproduce

No response

Project Impact

None

Impact Context

This makes it very annoying to add large arrays as arguments to Aztec contracts.

Workaround

None

Workaround Description

We can work around it by injecting the array through an oracle call. Still, if there is an equivalent operation to push_back that would yield better results, it'd be most welcome!
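(For context, a minimal sketch of what injecting an array through an oracle call can look like in vanilla Noir. The oracle name and shape here are illustrative, not the actual Aztec mechanism; the host side has to register a resolver for the oracle, and anything returned this way is unconstrained until it is checked in-circuit.)

#[oracle(get_large_input)]
unconstrained fn get_large_input_oracle() -> [Field; 8192] {}

// Oracles can only be called from unconstrained functions, so wrap it.
unconstrained fn get_large_input() -> [Field; 8192] {
    get_large_input_oracle()
}

fn main(expected: pub Field) {
    let input = get_large_input();
    // The oracle result is unconstrained; constrain whatever property
    // the circuit actually relies on.
    assert(input[0] == expected);
}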

Additional Context

No response

Installation Method

Binary (noirup default)

Nargo Version

0.23.0+5be9f9d7e2f39ca228df10e5a530474af0331704

NoirJS Version

No response

Would you like to submit a PR for this Issue?

None

Support Needs

No response

@spalladino spalladino added the bug Something isn't working label Feb 16, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Noir Feb 16, 2024
@Savio-Sou Savio-Sou added the aztec.nr Helpful for development of Aztec.nr the smart contract framework label Feb 16, 2024
PhilWindle added a commit to AztecProtocol/aztec-packages that referenced this issue Feb 27, 2024
Updates the `ContractDeployer` to use the new deployment flow. Instead
of creating txs flagged as `isDeployment`, the deployer now creates a
batch call that registers the contract class (if needed), publicly
deploys the contract via the canonical `InstanceDeployer`, and then
initializes it by calling the constructor. This batch call is then sent
via an account contract entrypoint and executed in a single tx. Check
out `e2e_deploy` _publicly deploys and initializes a contract_ to see it
in action.

However, since this requires sending txs through an account entrypoint,
it breaks for deploying accounts. So we still keep the old deployment
flow around for those scenarios, under the `LegacyDeployMethod` name.
We'll need to tweak account contract constructors so they can be
initialized and optionally deployed in the same single direct call, and
also pay fees for it. Alternatively, we can also add a canonical
`MultiCall` contract, which could take care of this, though fee payment
is unclear.

This PR also changes the `ClassRegisterer` API so that bytecode is
provided via a capsule instead of an arg, since having args of size over
~2k elements was causing a loop in the Hasher that massively increased
build times (see noir-lang/noir#4395).

**Breaking changes:** Since deployments are now done from an account,
deploying a contract via `Contract.deploy` now requires a `Wallet`
instance instead of a `PXE`, and deploying a contract via the CLI now
requires a `--private-key`.

---------

Co-authored-by: PhilWindle <[email protected]>
@jfecher (Contributor) commented Mar 19, 2024

We optimized the implementation of as_slice() recently for converting arrays to slices. Is that applicable here?

You could, for example, rewrite the example to use as_slice and push to the resulting slice for possibly better performance:

fn main(expected: pub Field, first: Field, input: [Field;8192]) {
    let mut hasher_slice = input.as_slice();
    hasher_slice = hasher_slice.push_front(first);
    assert(hasher_slice[0] == expected);
}

takes 0.644 seconds for me on nargo compiled in debug mode.

push_front can be hard to use if this code is auto-generated from a macro, since it works somewhat backwards. If that is the case, then for now I think you can replace:

pub fn add_multiple<N>(&mut self, fields: [Field; N]) {
    for i in 0..N {
        self.fields = self.fields.push_back(fields[i]);
    }
}

with the existing append in the stdlib:

pub fn add_multiple<N>(&mut self, fields: [Field; N]) {
    self.fields = self.fields.append(fields.as_slice());
}

This won't give you a speedup yet, but when we eventually optimize append you'll then see that speedup automatically too.
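(For concreteness, a sketch of the original reproduction with only add_multiple swapped for the append-based version; as noted above, this shouldn't be expected to be faster yet until append itself is optimized.)

struct Hasher {
    fields: [Field],
}

impl Hasher {
    pub fn new() -> Self {
        Self { fields: [] }
    }

    pub fn add(&mut self, field: Field) {
        self.fields = self.fields.push_back(field);
    }

    // One stdlib append instead of N separate push_back calls.
    pub fn add_multiple<N>(&mut self, fields: [Field; N]) {
        self.fields = self.fields.append(fields.as_slice());
    }
}

fn main(expected: pub Field, first: Field, input: [Field; 8192]) {
    let mut hasher = Hasher::new();
    hasher.add(first);
    hasher.add_multiple(input);
    assert(hasher.fields[0] == expected);
}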

@vezenovm (Contributor)

takes 0.644 seconds for me on nargo compiled in debug mode

Execution was ~0.9s for me as well using this update.

@zac-williamson

@jfecher is anyone working on this bug? I appreciate that a workaround exists for @spalladino, but other developers will also encounter this issue.

@jfecher (Contributor) commented Mar 20, 2024

@zac-williamson I spoke with @vezenovm about adding a builtin for append yesterday. It doesn't seem too difficult but nobody is working on adding it at this moment.

@spalladino (Contributor, Author)

@jfecher I've just tried the approach you suggested (apologies for the delay!) in this branch in aztec-packages, but I'm running into a "should have slice capacity set for value v776 being stored at v764" error. Any hints? Note that I have modified the compiler to allow skipping code autogeneration for the hasher, but that should've affected only the aztec macros.

@jfecher (Contributor) commented Mar 25, 2024

@spalladino what version of the compiler are you using? There was a bug related to .as_slice() calls, fixed a couple of days ago, that triggered that error message. You may want to try updating to the latest version of master.

@spalladino (Contributor, Author)

@jfecher assuming that fix was #4610, then yeah, it wasn't synced to the aztec-packages codebase yet. It landed just earlier today. I'll retry and report back!

@spalladino (Contributor, Author)

@jfecher no luck, same error as before (though the index has moved a little). I've pushed the latest to the same branch as before: AztecProtocol/aztec-packages#5410.

The application panicked (crashed).
Message:  ICE: should have slice capacity set for value v700 being stored at v688
Location: compiler/noirc_evaluator/src/ssa/opt/flatten_cfg/capacity_tracker.rs:134

@jfecher (Contributor) commented Mar 26, 2024

Another issue. On master, there's been a large regression. The compiler now panics for the original code example with:

The application panicked (crashed).
Message:  assertion `left == right` failed
  left: 2
 right: 1
Location: compiler/noirc_evaluator/src/ssa/opt/inlining.rs:420

I think the issue stems from the slice created by the new function not having an associated capacity returned as well:

Initial SSA:
acir fn main f0 {
  b0(v0: Field, v1: Field, v2: [Field; 8192]):
    inc_rc v2
    v4, v5 = call f1()
    ...
}
acir fn new f1 {
  b0():
    inc_rc []
    return []
}

In the SSA above, v4, v5 = call f1() expects two return values, but f1 only returns a single [] instead of Field 0, [].

Edit: This is due to a bug in the array/slice split. new returns an array even though the field should only accept a slice. Changing new to return &[] fixes this.

Edit 2: I think the array/slice split just exposed this bug rather than introduced it. It looks like the coercion code hasn't been changed in quite a while. Before the split, the array in this issue would just be inferred to be a slice instead of being coerced into one.
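(A sketch of that change applied to the Hasher from the issue body, assuming the post-split slice literal &[] is also accepted in the struct initializer; not taken verbatim from any PR.)

struct Hasher {
    fields: [Field],
}

impl Hasher {
    pub fn new() -> Self {
        // An explicit slice literal, so no empty-array-to-slice coercion
        // is needed and new returns the expected Field 0, [] pair in SSA.
        Self { fields: &[] }
    }
}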

@vezenovm (Contributor)

Edit: This is due to a bug in the array/slice split. new returns an array even though the field should only accept a slice. Changing new to return &[] fixes this.

This can probably be resolved with an improved type check. I am looking at resolving the initial bug, but I can take a look at this one as well.

@jfecher (Contributor) commented Mar 26, 2024

@vezenovm I fixed it and am putting up a PR for it now, after I make a new issue. It looks like the array-to-slice coercion has always been broken, which makes me question how it ever worked.

@vezenovm (Contributor)

It looks like the array-to-slice coercion has always been broken, which makes me question how it ever worked.

I don't think we ever allowed empty arrays before. So if you wrote [], it was always a slice, so you would have Field 0, [].

@jfecher (Contributor) commented Mar 26, 2024

We allowed empty arrays, but arrays and slices were polymorphic before, so in that case it'd just be inferred to be a slice instead of coerced into one.

@jfecher (Contributor) commented Mar 26, 2024

@spalladino may be worth testing again after #4639 is merged

@vezenovm (Contributor)

@spalladino may be worth testing again after #4639 is merged

He didn't use the Hasher struct in his PR. I'm looking into why his PR might be failing.

@jfecher (Contributor) commented Mar 26, 2024

@vezenovm I was assuming there may still be an as_slice coercion inserted somewhere by the aztec macro for hashing.

@vezenovm (Contributor)

@vezenovm I was assuming there may still be an as_slice coercion inserted somewhere by the aztec macro for hashing.

That PR doesn't use the macro which hashes the args. I have a fix.

github-merge-queue bot pushed a commit that referenced this issue Mar 26, 2024
# Description

## Problem\*

Resolves
#4395 (comment)

## Summary\*

We were not accurately accounting for situations where the slice
capacity tracker was expecting a capacity from slice intrinsic results.
I have added a case for handling the slice capacity of `AsSlice`.

I also tested this against the `aztec-packages` work and we can
successfully compile now.

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Noir Mar 26, 2024
@spalladino (Contributor, Author)

Thanks for looking into this, and apologies as I got a bit lost in the discussion. Does this mean that the version of the code that uses slices and push_front (i.e. the one shared here) should compile fast, as long as #4643 is merged into aztec-packages?

@vezenovm (Contributor) commented Mar 26, 2024

Does this mean that the version of the code that uses slices and push_front (i.e. the one shared here) should compile fast, as long as #4643 is merged into aztec-packages?

Yes, it should compile fast now, but until we have a builtin append you will have to use push_front.

@spalladino (Contributor, Author)

@jfecher @vezenovm I can confirm compile time has gone down from 7 minutes to just 30 seconds using push_front here.

However, given a recent increase in max bytecode size, we now have to increase the ARGS_HASH_CHUNK_LENGTH and ARGS_HASH_CHUNK_COUNT used in the hashing loop here from 64 to 128 each:

https://github.com/AztecProtocol/aztec-packages/pull/5410/files#diff-047179fc6de7179088a0fad53d28c70b04f85c09a5dd1271bf341cf54006d9e2R36-R52

This means going from hashing 4k field elements to hashing 16k, and that causes compile times to spike again, going from 30s to 5 minutes. Is this expected, or is there something further we can do to improve those times?

@spalladino (Contributor, Author)

Commenting out the pedersen_hash inside the loop does not speed things up, so it's the copying that's taking up time.

https://github.com/AztecProtocol/aztec-packages/pull/5410/files#diff-047179fc6de7179088a0fad53d28c70b04f85c09a5dd1271bf341cf54006d9e2R48

@vezenovm (Contributor)

Commenting out the pedersen_hash inside the loop does not speed things up, so it's the copying that's taking up time.

@jfecher I'm suspicious this is a case of #4621, as the copying is done inside an if statement that checks the dynamic args.len().

@jfecher (Contributor) commented Mar 27, 2024

@vezenovm possibly. When investigating #4621 further for the hashmap use case, though, I found that it didn't actually apply there: the last_uses check already applied to every array set and made them all mutable, even with dynamic indices. I think I'll push a PR from a branch I made which turns the last_use analysis into an actual SSA pass, to make it clearer for debugging which instructions it applies to.

I'm also not sure whether adding reference counting to ACIR is plausible in general, or whether we'd need a static analysis like the last_use we have now.

If there were a minimal repro for the aztec-packages PR, we could test to see how many array sets are made mutable.

@vezenovm (Contributor)

This looks to just be an issue when flattening with such a large loop bound. I made the following vanilla Noir program:

use dep::std::hash::pedersen_hash;
global ARGS_HASH_CHUNK_COUNT = 64;
global ARGS_HASH_CHUNK_LENGTH = 64;
fn main(expected: pub Field, first: Field, input: [Field; 20]) {
    let mut hasher_slice = input.as_slice();
    hasher_slice = hasher_slice.push_front(first);
    assert(hasher_slice[0] == expected);

    let hash = hash_args_slice(hasher_slice);
    println(hash);
}

fn hash_args_slice(args: [Field]) -> Field {
    if args.len() == 0 {
        0
    } else {
        let mut chunks_hashes = [0; ARGS_HASH_CHUNK_COUNT];
        for i in 0..ARGS_HASH_CHUNK_COUNT {
            let mut chunk_hash = 0;
            let start_chunk_index = i * ARGS_HASH_CHUNK_LENGTH;
            if start_chunk_index < args.len() {
                let mut chunk_args = [0; ARGS_HASH_CHUNK_LENGTH];
                for j in 0..ARGS_HASH_CHUNK_LENGTH {
                    let item_index = i * ARGS_HASH_CHUNK_LENGTH + j;
                    if item_index < args.len() {
                        chunk_args[j] = args[item_index];
                    }
                }
                chunk_hash = pedersen_hash(chunk_args);
            }
            chunks_hashes[i] = chunk_hash;
        }
        pedersen_hash(chunks_hashes)
    }
}

Running `time nargo compile --force` on master gives me ~10s, no matter the size of input. I tried sizes of 20, 2000, and 15000. This makes sense, as we are bound by the loop, not the input, but I just wanted to try it to be sure.

After switching the loop bound constants to 128, the compile time jumped to ~2 minutes.

I also made an example without any if statements in the loop.

fn hash_args_slice_no_if(args: [Field]) -> Field {
    if args.len() == 0 {
        0
    } else {
        let mut chunks_hashes = [0; ARGS_HASH_CHUNK_COUNT];
        for i in 0..ARGS_HASH_CHUNK_COUNT {
            let mut chunk_hash = 0;
            let mut chunk_args = [0; ARGS_HASH_CHUNK_LENGTH];
            for j in 0..ARGS_HASH_CHUNK_LENGTH {
                let item_index = i * ARGS_HASH_CHUNK_LENGTH + j;
                chunk_args[j] = args[item_index];
            }
            chunk_hash = pedersen_hash(chunk_args);
            chunks_hashes[i] = chunk_hash;
        }
        pedersen_hash(chunks_hashes)
    }
}

I then made input equal to [Field; ARGS_HASH_CHUNK_COUNT * ARGS_HASH_CHUNK_LENGTH - NUM_PUSH_FRONTS]. This took ~1s to compile with the loop bounds at 64 and ~4s with the bounds at 128.

@vezenovm (Contributor)

It would be good to add some benchmarking on certain tests to see how much time we are spending on each compilation pass.

@jfecher (Contributor) commented Mar 27, 2024

It worries me if the reason for this slowdown is just a large nested loop, since that isn't really fixable. We can always try to optimize these passes further, but even if we got the entire compiler running in half the time, that'd just mean we'd compile just as slowly for a loop bound 2x higher.

@vezenovm (Contributor)

It worries me if the reason for this slowdown is just a large nested loop, since that isn't really fixable. We can always try to optimize these passes further, but even if we got the entire compiler running in half the time, that'd just mean we'd compile just as slowly for a loop bound 2x higher.

I did some basic timing using std::time::Instant and got this breakdown for loop bounds of 64 in the example above (with the if statements in the loop):

After Defunctionalization:: 0
After Removing Paired rc_inc & rc_decs:: 0
After Inlining:: 0
After Mem2Reg:: 0
After Simplifying:: 3
After Flattening:: 1658
After Removing Bit Shifts:: 16
After Mem2Reg:: 3957
After Constant Folding:: 906
After Constraint Folding:: 1240
After Dead Instruction Elimination:: 273
ACIR process: 2180
nargo compile --force  8.65s user 8.85s system 159% cpu 10.960 total

And then loop bounds of 128:

After Defunctionalization:: 0
After Removing Paired rc_inc & rc_decs:: 0
After Inlining:: 0
After Mem2Reg:: 0
After Simplifying:: 17
After Flattening:: 15480
After Removing Bit Shifts:: 140
After Mem2Reg:: 45459
After Constant Folding:: 12410
After Constraint Folding:: 11968
After Dead Instruction Elimination:: 2604
ACIR process: 26105
nargo compile --force  92.16s user 29.98s system 105% cpu 1:56.29 total

@jfecher (Contributor) commented Mar 27, 2024

I'm assuming you changed both loop bounds to 128. Can you change only one so that we're comparing 64 * 64 to 64 * 128? That way we can see whether they grow linearly or not.

@vezenovm (Contributor) commented Mar 27, 2024

Changing just ARGS_HASH_CHUNK_LENGTH to 128 (the inner loop bound):

After Defunctionalization:: 0
After Removing Paired rc_inc & rc_decs:: 0
After Inlining:: 0
After Mem2Reg:: 0
After Simplifying:: 7
After Flattening:: 7161
After Removing Bit Shifts:: 93
After Mem2Reg:: 21578
After Constant Folding:: 5410
After Constraint Folding:: 5063
After Dead Instruction Elimination:: 1172
ACIR process: 9440
nargo compile --force  41.29s user 15.25s system 110% cpu 51.085 total

Changing just ARGS_HASH_CHUNK_COUNT to 128 (the outer loop bound):

After Defunctionalization:: 0
After Removing Paired rc_inc & rc_decs:: 0
After Inlining:: 0
After Mem2Reg:: 0
After Simplifying:: 15
After Flattening:: 5662
After Removing Bit Shifts:: 60
After Mem2Reg:: 15511
After Constant Folding:: 2237
After Constraint Folding:: 2696
After Dead Instruction Elimination:: 596
ACIR process: 4155
nargo compile --force  27.18s user 6.29s system 103% cpu 32.465 total

@jfecher (Contributor) commented Mar 27, 2024

Looks like both are growing faster than linearly. mem2reg and flattening take up the most time, with mem2reg growing a bit faster. Generally the passes take longer as the SSA grows towards the end after loop unrolling and flattening.

Is loop unrolling included in another pass or just excluded from these timings?

@vezenovm (Contributor) commented Mar 27, 2024

Is loop unrolling included in another pass or just excluded from these timings?

Ah, good catch. I was just printing inside of run_pass, not try_run_pass. Let me just make a general PR for this.

@vezenovm (Contributor)

I did it quickly and loop unrolling is only ~200 ms in both cases above.

@spalladino (Contributor, Author)

I also made an example without any if statements in the loop. (...) I then made input equal to [Field; ARGS_HASH_CHUNK_COUNT * ARGS_HASH_CHUNK_LENGTH - NUM_PUSH_FRONTS]. This took ~1s to compile with the loop bounds at 64 and ~4s with the bounds at 128.

Hold on, so the main culprit is the if statements? Given that I know the size of the args array at compile time, do you think I can somehow work around this? If I changed the input from a slice to a [Field; N], would those conditionals be erased at compile time based on the value of N? However, if I did that, I'd be back at square one, since I'd need to copy each argument into the "args" array, which was the original problem that we fixed with slices...

@vezenovm (Contributor)

If I changed the input from a slice to a [Field; N], would those conditionals be erased at compile time based on the value of N?

Yeah, they should be. Since you do know the size of the args array at compile time, another option is that we could add the builtin append function we mentioned earlier, and then you would simply attach extra dummy data to args to fill MAX_ARGS_LENGTH. The copying would at least be simplified then. However, this is not super user-friendly, and like you said, we are essentially back to square one.
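(A hedged sketch of the fixed-size variant being discussed, i.e. the earlier hash_args_slice with the slice parameter replaced by [Field; N]; the function name and generic are illustrative, and the outer args.len() == 0 guard is dropped since it too becomes a compile-time check. Because args.len() is then the compile-time constant N, the bound checks should constant-fold after unrolling rather than turning into flattened branches.)

use dep::std::hash::pedersen_hash;

global ARGS_HASH_CHUNK_COUNT = 64;
global ARGS_HASH_CHUNK_LENGTH = 64;

fn hash_args_fixed<N>(args: [Field; N]) -> Field {
    let mut chunks_hashes = [0; ARGS_HASH_CHUNK_COUNT];
    for i in 0..ARGS_HASH_CHUNK_COUNT {
        let mut chunk_hash = 0;
        let start_chunk_index = i * ARGS_HASH_CHUNK_LENGTH;
        // args.len() is N, a compile-time constant, so after unrolling this
        // comparison is between two constants and the branch should fold away.
        if start_chunk_index < args.len() {
            let mut chunk_args = [0; ARGS_HASH_CHUNK_LENGTH];
            for j in 0..ARGS_HASH_CHUNK_LENGTH {
                let item_index = i * ARGS_HASH_CHUNK_LENGTH + j;
                if item_index < args.len() {
                    chunk_args[j] = args[item_index];
                }
            }
            chunk_hash = pedersen_hash(chunk_args);
        }
        chunks_hashes[i] = chunk_hash;
    }
    pedersen_hash(chunks_hashes)
}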

@spalladino (Contributor, Author)

However, this is not super user-friendly, and like you said, we are essentially back to square one.

Just to be clear, the issue is not the user-friendliness of the syntax, but the increase in the compilation time.

TomAFrench pushed a commit that referenced this issue Apr 3, 2024