Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved fast-path detection in HLS #13070

Merged
merged 6 commits into from
Sep 11, 2024
Merged

improved fast-path detection in HLS #13070

merged 6 commits into from
Sep 11, 2024

Conversation

alexanderivrii
Copy link
Contributor

Summary

This tiny PR improves the performance of the HighLevelSynthesis transpiler pass in the case that it does not need to do anything, for instance when every gate in the circuit is either already supported or can be handled by basis translator. And this is actually quite a common use-case in practice. In fact, this "fast-handling-when-there-is-nothing-to-do" was already supported at some point in the past, but got broken due to some recent changes.

The first change is that when deciding whether synthesizing a gate can be skipped we should not abort when we see a CX or a CZ gate, even though these are controlled gates.

The second change is that we should not spend time calculating sets of clean/dirty ancillas unless there is something to do.

On the utility-scale circuits, the runtime is improved from 0.2 s to 0.02 s.

Unfortunately, this causes HLS to examine the circuit 3 times: the first time it checks if it may need to do anything at all, the second time it synthesizing objects in a DAG while keeping track of clean/dirty ancillas, and the third time it builds a new DAG performing the substitutions.

Any additional suggestions for improvements are welcome.

P.S. Not sure if this deserves a release note.

@alexanderivrii alexanderivrii added the Changelog: Bugfix Include in the "Fixed" section of the changelog label Sep 2, 2024
@alexanderivrii alexanderivrii added this to the 1.3 beta milestone Sep 2, 2024
@alexanderivrii alexanderivrii requested review from ShellyGarion and a team as code owners September 2, 2024 11:25
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@coveralls
Copy link

coveralls commented Sep 2, 2024

Pull Request Test Coverage Report for Build 10813390132

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 9 of 9 (100.0%) changed or added relevant lines in 1 file are covered.
  • 24 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.002%) to 88.864%

Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 3 92.23%
crates/accelerate/src/sabre/layer.rs 5 96.69%
crates/accelerate/src/sabre/route.rs 16 96.14%
Totals Coverage Status
Change from base Build 10802845750: -0.002%
Covered Lines: 73118
Relevant Lines: 82281

💛 - Coveralls

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sasha.

I think we're ok to add an extra examination of the DAG, especially if we expect that a decent number of calls to HighLevelSynthesis will use the fast path, and so only do one iteration. If we don't take the fast path, we already examine the DAG loads of times throughout the transpiler, and I'd expect the extra iteration to be dwarfed by the costs of the synthesis / substitution.

Comment on lines 284 to 296
all_skipped = True
for node in dag.topological_op_nodes():
qubits = tuple(dag.find_bit(q).index for q in node.qargs)
if not (
dag.has_calibration_for(node)
or len(node.qargs) < self._min_qubits
or node.is_directive()
or self._definitely_skip_node(node, qubits)
):
all_skipped = False
break
if all_skipped:
return dag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to do this, but in case you're interested, Python has a very uncommon control-flow construction that handles this precise case without an auxiliary variable:

for node in dag.topological_op_nodes():
    qubits = ...
    if not self._should_skip(dag, node, qubits):
        break
else:
    return dag

(The else is attached to the for loop.) Totally up to you if you want to use it - there's plenty of reason not to, since most people don't know what the for/else pattern in Python means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, I like this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0962be4.

Comment on lines 287 to 292
if not (
dag.has_calibration_for(node)
or len(node.qargs) < self._min_qubits
or node.is_directive()
or self._definitely_skip_node(node, qubits)
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This set of checks is the exact same set that we do during the loop below - should they be folded into _definitely_skip_node so it's shorter to read and everything stays in sync?

Looking quickly, it seems like the internal logic of HLS has changed slightly since I wrote that fast path within the loop, so the logic around having a separate _definitely_skip_node might be a bit different to when I first wrote it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done in 0962be4.

@@ -280,6 +280,21 @@ def run(self, dag: DAGCircuit) -> DAGCircuit:
return self._run(dag, tracker)

def _run(self, dag: DAGCircuit, tracker: QubitTracker) -> DAGCircuit:
# Check if HighLevelSynthesis can be skipped.
all_skipped = True
for node in dag.topological_op_nodes():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We technically don't need to waste time getting a topological order for this function, we can just look at every op node in arbitrary order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done in 0962be4.

Comment on lines 660 to 662
# At the moment we don't consider fast-path handling for controlled gates over 3 or
# more controlled qubits. However, we should not abort early for CX/CZ/etc.
and not (node.is_controlled_gate() and node.num_qubits >= 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this comment has too many negatives in it? I think we do abort early for CX/CZ, right?

Or maybe the comment is referring to "abort _definitely_skip" and I'm reading it as "abort synthesis for this gate", in which case we might want to tweak the words, since it wasn't clear to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reworded the comment. I have indeed meant that the function _definitely_skip should not immediately return False when seeing a CX-gate or a CZ-gate, even though these are controlled gates.

all_skipped = True
for node in dag.topological_op_nodes():
qubits = tuple(dag.find_bit(q).index for q in node.qargs)
if not (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to cache this result so we can just look it up in the second iteration over the DAG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean hashing whether a particular node can be skipped? I briefly tried this, but in the main use-case, where only a single pass is required, this extra hashing makes the runtime a bit slower.

@Cryoris
Copy link
Contributor

Cryoris commented Sep 2, 2024

As you said you're already looking at porting parts of the HLS into Rust, we could maybe rewrite this as:

  1. Skip (in Rust): Check whether all nodes are skipped.
  2. Analysis (still in Rust): Compute auxiliary qubits for each node we want to synthesize -- but don't synthesize the node yet, just return {node: (clean aux, dirty aux)}.
  3. Rebuilding (Python): For each node to synthesize, synthesize it in Python and append it onto the DAG.

The advantage here is that we can stay in Rust space for 1+2 and only have to move back into Python for step 3.

Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the improvement!

As discussed offline, let's include the suggested restructure from above in the Rust improvement to follow 🙂

@Cryoris Cryoris enabled auto-merge September 11, 2024 14:12
@Cryoris Cryoris added this pull request to the merge queue Sep 11, 2024
Merged via the queue into Qiskit:main with commit 81063a7 Sep 11, 2024
15 checks passed
@alexanderivrii alexanderivrii deleted the hls branch September 11, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: Bugfix Include in the "Fixed" section of the changelog
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

5 participants