Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dedicated functions for memory marginalization #8051

Merged
merged 17 commits into from
Jun 22, 2022

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented May 11, 2022

Summary

This commit adds dedicated functions for memory marginalization.
Previously, the marginal_counts() function had support for marginalizing
memory in a Results object, but this can be inefficient especially if
your memory list is outside a Results object. The new functions added in
this commit are implemented in Rust and multithreaded. Additionally the
marginal_counts() function is updated to use the same inner Rust
functions.

Details and comments

TODO:

@mtreinish mtreinish added the on hold Can not fix yet label May 11, 2022
@mtreinish mtreinish requested a review from a team as a code owner May 11, 2022 15:40
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

@mtreinish mtreinish added the Changelog: New Feature Include in the "Added" section of the changelog label May 11, 2022
@mtreinish mtreinish changed the title Add dedicated functions for memory marginalization [WIP] Add dedicated functions for memory marginalization May 11, 2022
@jakelishman jakelishman added the Rust This PR or issue is related to Rust code in the repository label May 11, 2022
@coveralls
Copy link

coveralls commented May 11, 2022

Pull Request Test Coverage Report for Build 2543893476

  • 20 of 24 (83.33%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.001%) to 84.288%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/result/utils.py 19 23 82.61%
Totals Coverage Status
Change from base Build 2543833521: -0.001%
Covered Lines: 54987
Relevant Lines: 65237

💛 - Coveralls

to 4 threads.

Args:
memory: The input memory list, this is a list of hexadecimal strings to be marginalized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not cover IQ data or is that outside the scope of this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't factoring that in when I wrote this, but we should try to support that in this function. I can add it to this PR if you have an example for what the input and output look like with IQ data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, here is what the input looks like for a single three-qubit circuit under meas level 1 and with 5 single-shots

memory=[
    # qubit 0                    qubit 1                     qubit 2
    [[-12974255.0, -28106672.0], [ 15848939.0, -53271096.0], [-18731048.0, -56490604.0]],  #shot 1
    [[-18346508.0, -26587824.0], [-12065728.0, -44948360.0], [14035275.0, -65373000.0]],  # shot 2
    [[ 12802274.0, -20436864.0], [-15967512.0, -37575556.0], [15201290.0, -65182832.0]],   # ...
    [[ -9187660.0, -22197716.0], [-17028016.0, -49578552.0], [13526576.0, -61017756.0]], 
    [[  7006214.0, -32555228.0], [ 16144743.0, -33563124.0], [-23524160.0, -66919196.0]]
]

you can see sth like this by running job_1ts = backend.run(circ, meas_level=1, memory=True, meas_return="single", shots=5). If I want to marginalize over some of the qubits then I need to remove their slots. For instance keeping qubits 0 and 2 would result in

memory=[
    [[-12974255.0, -28106672.0], [-18731048.0, -56490604.0]],  #shot 1
    [[-18346508.0, -26587824.0], [14035275.0, -65373000.0]],  # shot 2
    [[ 12802274.0, -20436864.0], [15201290.0, -65182832.0]],   # ...
    [[ -9187660.0, -22197716.0], [13526576.0, -61017756.0]], 
    [[  7006214.0, -32555228.0], [-23524160.0, -66919196.0]]
]

If we are dealing with average IQ data then the input memory looks like so (again three qubits, one circuit, five shots but now they are averaged).

memory=[[-1059254.375, -26266612.0], [-9012669.0, -41877468.0], [6027076.0, -54875060.0]]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added support for this in: e709ce8 I still need to add testing to cover all the paths. But let me know if that interface works for you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That table is what I based e709ce8 but yeah the lack of explicit typing was annoying and why I needed an avg_data kwarg to differentiate between single level 1 and avg level 0 because I couldn't figure out a way to reliably detect the difference without an explicit input type. I was trying to make this function work independently of the Results object which is the only place I think that metadata would be stored.

qiskit/result/utils.py Outdated Show resolved Hide resolved
qiskit/result/utils.py Outdated Show resolved Hide resolved
qiskit/result/utils.py Outdated Show resolved Hide resolved
Comment on lines 13 to 18
#[inline]
pub fn hex_char_to_bin(c: char) -> &'static str {
match c {
'0' => "0000",
'1' => "0001",
'2' => "0010",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we cover cases where we are, e.g., working with the third level of the Transmon? I.e. we have states 0, 1 and 2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is outside the scope of this PR (though I like the idea). Because current result object doesn't define basis in metadata, i.e. it always assumes binary, it is difficult to select proper output basis (binary or ternary) only with input hex numbers. Having basis argument in marginal function seems to me an overkill. But we can implement such ternary memory in experiment data processor, where we always need marginalization of IQ numbers to run custom ternary discriminator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can save this for a follow on, I think having the marginization functions work with this would be useful but there is probably enough to this PR just working with binary to start.

mtreinish and others added 4 commits May 16, 2022 22:07
This commit adds dedicated functions for memory marginalization.
Previously, the marginal_counts() function had support for marginalizing
memory in a Results object, but this  can be inefficient especially if
your memory list is outside a Results object. The new functions added in
this commit are implemented in Rust and multithreaded. Additionally the
marginal_counts() function is updated to use the same inner Rust
functions.
Co-authored-by: Daniel J. Egger <[email protected]>
@mtreinish
Copy link
Member Author

I spent some time to find the default value for the parallel threshold. I ran different number of shots in parallel and serial to find where parallel is faster. Based on these graphs I'm going to increase the parallel threshold to 1000.

marginal_mem_hex_return
marginal_mem_int_return
marginal_mem
(note these are all log log plots)

Which were generated by plotting:

random.seed(412123)

size = []
serial_times = []
parallel_times = []
for i in np.logspace(1, 8, dtype=int):
    memory = [
        hex(random.randint(0, 999999999999999999999999999999999999999999999999999900000))
        for _ in range(i)
    ]
    size.append(i)
    print(i)
    start = time.perf_counter()
    res = marginal_memory(memory, indices=[0,3,6,7], parallel_threshold=i-2)
    stop = time.perf_counter()
    parallel_times.append(stop - start)
    start = time.perf_counter()
    res = marginal_memory(memory, indices=[0,3,6,7], parallel_threshold=i+2)
    stop = time.perf_counter()
    serial_times.append(stop - start)

def marginal_memory(
memory: List[str],
indices: Optional[List[int]] = None,
int_return: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a curiosity. Is the list of integer more efficient in memory footprint than binary ndarray? Given we use memory information to run restless analysis in qiskit experiment, it should take memory efficient representation to run a parallel experiment in 100Q+ device.

Copy link
Member Author

@mtreinish mtreinish May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like storing the shot memory as a 2d array where each row has n elements for each bit or something else? The list of ints here will be more memory efficient than that in the rust side because I'm using a Vec<BigUint> (whch is just a Vec of digits internally) and it will not be fixed with for each shot. The python side I expect would be similar since the Python integer class is very similar to BigUint (a byte array of digits). (although list isn't necessarily as contiguous as a Vec<T>/ndarray). I think it would be best to test this though to be sure and settle on a common way to represent large results values in a non-string type.

As an aside I only used a list here because numpy doesn't have support for arbitrary large integers (outside of using a object dtype, which ends up just being a pointer to the python heap, for python ints) and I was worried about the

Copy link
Contributor

@nkanazawa1989 nkanazawa1989 May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Sounds like current implementation is reasonable (I just worried about storing 2**100 for "10000...0", in binary array it's just 100 binary element).

@mtreinish mtreinish changed the title [WIP] Add dedicated functions for memory marginalization Add dedicated functions for memory marginalization May 31, 2022
@mtreinish mtreinish removed the on hold Can not fix yet label May 31, 2022
@mtreinish mtreinish added this to the 0.21 milestone Jun 6, 2022
@kevinhartman kevinhartman self-assigned this Jun 21, 2022
Copy link
Contributor

@kevinhartman kevinhartman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Will approve once comments are addressed.

qiskit/result/utils.py Outdated Show resolved Hide resolved
qiskit/result/utils.py Outdated Show resolved Hide resolved

#[inline]
pub fn hex_char_to_bin(c: char) -> &'static str {
match c {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but it might be a really fun opportunity to write a constant expression LUT generator function :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leveraged lazy_static to generate a static lookup table in: 5c81510 I need to benchmark it and look into expanding it to support larger chunks. But this might be a good enough start as it should eliminate most of the runtime branching.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried playing with adding chunking in groups of 4 and it was slower than doing this. I think this is coming down to needing to do intermediate allocations in how I could get it to work. At this point I think just doing a single element lookup table is probably sufficient. If we end up hitting bottlenecks down the road I feel like we can revisit this easily enough as it's not a big deal to improve the internal implementation down the road.

For benchmarking I compared prior to 5c81510 with 5c81510 and my local chunked implementation using:

import time
import random
from qiskit.result import marginal_memory

random.seed(42)

memory = [hex(random.randint(0, 4096)) for _ in range(500000)]
start = time.perf_counter()
res = marginal_memory(memory, indices=[0, 3, 5, 9])
stop = time.perf_counter()
print(stop - start)

The geometric mean of each implementation over 10 trials each was:

match: 0.08678476060453334
LUT: 0.08359493472968436
Chunked LUT: 0.10288708564573844

Ok(out_mem
.iter()
.map(|x| BigUint::parse_bytes(x.as_bytes(), 2).unwrap())
.collect::<Vec<BigUint>>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay turbo fish! ::<<>>

src/results/marginalization.rs Show resolved Hide resolved
@mtreinish mtreinish requested a review from kevinhartman June 22, 2022 16:18
@mergify mergify bot merged commit 8ee4ac8 into Qiskit:main Jun 22, 2022
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jun 22, 2022
In the recently merged Qiskit#8051 we create a lookup table in Rust to speed
up the hex->bin conversion used internally as part of the
marginal_memory() function. This was previously done using the
lazy_static crate which is used to lazily evaluate dynamic code to
create a static at runtime on the first access. The typical use case for
this is to create a static Vec or HashMap. However for the
marginal_counts() usage we didn't need to do this because we were
creating a fixed size array so the static can be evaulated at compile
time assuming the array is constructed with a const function. This
commit removes the lazy_static usage and switches to a true static to
further improve the performance of the lookup table by avoiding the
construction overhead.
mergify bot added a commit that referenced this pull request Jun 23, 2022
#8223)

* Refactor marginal_memory() hex to bin lookup table to be a true static

In the recently merged #8051 we create a lookup table in Rust to speed
up the hex->bin conversion used internally as part of the
marginal_memory() function. This was previously done using the
lazy_static crate which is used to lazily evaluate dynamic code to
create a static at runtime on the first access. The typical use case for
this is to create a static Vec or HashMap. However for the
marginal_counts() usage we didn't need to do this because we were
creating a fixed size array so the static can be evaulated at compile
time assuming the array is constructed with a const function. This
commit removes the lazy_static usage and switches to a true static to
further improve the performance of the lookup table by avoiding the
construction overhead.

* Reduce number of empty entries in LUT

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@mtreinish mtreinish deleted the marginal-memory branch November 16, 2022 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants