Add dedicated functions for memory marginalization #8051
```diff
@@ -18,6 +18,8 @@
 from collections import Counter
 from copy import deepcopy

 import numpy as np

 from qiskit.exceptions import QiskitError
 from qiskit.result.result import Result
 from qiskit.result.counts import Counts
```
```diff
@@ -87,12 +89,9 @@ def marginal_counts(
         sorted_indices = sorted(
             indices, reverse=True
         )  # same convention as for the counts
-        bit_strings = [_hex_to_bin(s) for s in experiment_result.data.memory]
-        marginal_bit_strings = [
-            "".join([s[-idx - 1] for idx in sorted_indices if idx < len(s)]) or "0"
-            for s in bit_strings
-        ]
-        experiment_result.data.memory = [_bin_to_hex(s) for s in marginal_bit_strings]
+        experiment_result.data.memory = results_rs.marginal_memory(
+            experiment_result.data.memory, sorted_indices, return_hex=True
+        )
         return result
     else:
         marg_counts = _marginalize(result, indices)
```
```diff
@@ -127,14 +126,60 @@ def _adjust_creg_sizes(creg_sizes, indices):
     return new_creg_sizes
```
The new public function added by the diff:

```python
def marginal_memory(
    memory: List[str],
    indices: Optional[List[int]] = None,
    int_return: bool = False,
    hex_return: bool = False,
    parallel_threshold: int = 50,
) -> Union[List[str], np.ndarray]:
    """Marginalize memory

    This function is multithreaded and will launch a thread pool with threads equal to the number
    of CPUs by default. You can tune the number of threads with the ``RAYON_NUM_THREADS``
    environment variable. For example, setting ``RAYON_NUM_THREADS=4`` would limit the thread pool
    to 4 threads.

    Args:
        memory: The input memory list; this is a list of hexadecimal strings to be marginalized.
        indices: The bit positions of interest to marginalize over. If
            ``None`` (default), do not marginalize at all.
        int_return: If set to ``True`` the output will be a list of integers.
            By default the return type is a bit string. This and ``hex_return``
            are mutually exclusive and can not be specified at the same time.
        hex_return: If set to ``True`` the output will be a list of hexadecimal
            strings. By default the return type is a bit string. This and
            ``int_return`` are mutually exclusive and can not be specified
            at the same time.
        parallel_threshold: The number of elements in ``memory`` at which this function
            starts running in multiple threads. If ``len(memory)`` is >= this value the
            work is split across threads. By default this is set to 50.

    Returns:
        marginal_memory: The list of marginalized memory

    Raises:
        ValueError: if both ``int_return`` and ``hex_return`` are set to ``True``
    """
    if int_return and hex_return:
        raise ValueError("Either int_return or hex_return can be specified but not both")
    return results_rs.marginal_memory(
        memory,
        indices,
        return_int=int_return,
        return_hex=hex_return,
        parallel_threshold=parallel_threshold,
    )
```

Review discussion (on the ``memory`` argument):

Reviewer: Does this not cover IQ data, or is that outside the scope of this PR?

Author: I wasn't factoring that in when I wrote this, but we should try to support it in this function. I can add it to this PR if you have an example of what the input and output look like with IQ data.

Reviewer: Sure, here is what the input looks like for a single three-qubit circuit under meas level 1 and with 5 single shots: … (you can see something like this by running …). If we are dealing with average IQ data then the input memory looks like so (again three qubits, one circuit, five shots, but now they are averaged): …

Author: I added support for this in e709ce8. I still need to add testing to cover all the paths, but let me know if that interface works for you.

Reviewer: I really don't like how it is implemented without explicit typing.

Author: That table is what I based e709ce8 on, but yeah, the lack of explicit typing was annoying and why I needed an …
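As a sanity check on the documented semantics, here is a hand-rolled, pure-Python sketch of the marginalization for the ``int_return=True`` case. The name ``marginalize_shot`` and the sample shots are illustrative, not part of the PR; it assumes bit ``i`` is the ``i``-th least significant bit of each shot and that ``indices[0]`` maps to the least significant output bit:

```python
# Illustrative sketch only: mirrors the documented semantics of
# marginal_memory with int_return=True, not the actual Rust code path.
def marginalize_shot(hexstring, indices):
    value = int(hexstring, 16)
    out = 0
    # indices[0] becomes the least significant bit of the marginalized value
    for pos, idx in enumerate(indices):
        out |= ((value >> idx) & 1) << pos
    return out

shots = ["0x5", "0x3", "0x7"]  # binary: 101, 011, 111
print([marginalize_shot(s, [0, 2]) for s in shots])  # [3, 1, 3]
```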
```diff
 def marginal_distribution(
     counts: dict, indices: Optional[List[int]] = None, format_marginal: bool = False
 ) -> Dict[str, int]:
     """Marginalize counts from an experiment over some indices of interest.

     Unlike :func:`~.marginal_counts` this function respects the order of
-    the input ``indices``. If the input ``indices`` list is specified, the order
-    the bit indices will be the output order of the bitstrings
+    the input ``indices``. If the input ``indices`` list is specified then the order
+    the bit indices are specified will be the output order of the bitstrings
     in the marginalized output.

     Args:
```
New file, ``@@ -0,0 +1,45 @@``:

```rust
// This code is part of Qiskit.
//
// (C) Copyright IBM 2022
//
// This code is licensed under the Apache License, Version 2.0. You may
// obtain a copy of this license in the LICENSE.txt file in the root directory
// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
//
// Any modifications or derivative works of this code must retain this
// copyright notice, and modified files need to carry a notice indicating
// that they have been altered from the originals.

#[inline]
pub fn hex_char_to_bin(c: char) -> &'static str {
    match c {
        '0' => "0000",
        '1' => "0001",
        '2' => "0010",
        '3' => "0011",
        '4' => "0100",
        '5' => "0101",
        '6' => "0110",
        '7' => "0111",
        '8' => "1000",
        '9' => "1001",
        'A' => "1010",
        'B' => "1011",
        'C' => "1100",
        'D' => "1101",
        'E' => "1110",
        'F' => "1111",
        'a' => "1010",
        'b' => "1011",
        'c' => "1100",
        'd' => "1101",
        'e' => "1110",
        'f' => "1111",
        _ => "",
    }
}

#[inline]
pub fn hex_to_bin(hex: &str) -> String {
    hex[2..].chars().map(hex_char_to_bin).collect()
}
```

Review discussion (on ``hex_char_to_bin``):

Reviewer: This is fine, but it might be a really fun opportunity to write a constant expression LUT generator function :)

Author: I leveraged lazy_static to generate a static lookup table in 5c81510. I need to benchmark it and look into expanding it to support larger chunks, but this might be a good enough start, as it should eliminate most of the runtime branching.

Author (follow-up): I tried adding chunking in groups of 4 and it was slower than doing this. I think this comes down to the intermediate allocations needed in how I could get it to work. At this point a single-element lookup table is probably sufficient; if we end up hitting bottlenecks down the road we can revisit this easily enough, as it's not a big deal to improve the internal implementation later. For benchmarking I compared the code prior to 5c81510, 5c81510 itself, and my local chunked implementation using:

```python
import time
import random

from qiskit.result import marginal_memory

random.seed(42)
memory = [hex(random.randint(0, 4096)) for _ in range(500000)]
start = time.perf_counter()
res = marginal_memory(memory, indices=[0, 3, 5, 9])
stop = time.perf_counter()
print(stop - start)
```

The geometric mean of each implementation over 10 trials was: …

Review discussion (on qutrit support):

Reviewer: Can we cover cases where we are, e.g., working with the third level of the transmon, i.e. we have states 0, 1, and 2?

Second reviewer: I think this is outside the scope of this PR (though I like the idea). Because the current result object doesn't define a basis in its metadata, i.e. it always assumes binary, it is difficult to select the proper output basis (binary or ternary) from input hex numbers alone. Having …

Author: I think we can save this for a follow-on. Having the marginalization functions work with this would be useful, but there is probably enough in this PR just working with binary to start.
```diff
@@ -10,8 +10,12 @@
 // copyright notice, and modified files need to carry a notice indicating
 // that they have been altered from the originals.

 use super::converters::hex_to_bin;
 use crate::getenv_use_multiple_threads;
 use hashbrown::HashMap;
 use num_bigint::BigUint;
 use pyo3::prelude::*;
 use rayon::prelude::*;

 fn marginalize<T: std::ops::AddAssign + Copy>(
     counts: HashMap<String, T>,

@@ -68,3 +72,83 @@ pub fn marginal_distribution(
 ) -> HashMap<String, f64> {
     marginalize(counts, indices)
 }
```

The new helper added to this file:

```rust
#[inline]
fn map_memory(
    hexstring: &str,
    indices: &Option<Vec<usize>>,
    clbit_size: usize,
    return_hex: bool,
) -> String {
    let out = match indices {
        Some(indices) => {
            let bitstring = hex_to_bin(hexstring);
            let bit_array = bitstring.as_bytes();
            indices
                .iter()
                .map(|bit| {
                    let index = clbit_size - *bit - 1;
                    match bit_array.get(index) {
                        Some(bit) => *bit as char,
                        None => '0',
                    }
                })
                .rev()
                .collect()
        }
        None => hex_to_bin(hexstring),
    };
    if return_hex {
        format!("0x{:x}", BigUint::parse_bytes(out.as_bytes(), 2).unwrap())
    } else {
        out
    }
}
```
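A pure-Python mirror of this routine may make the indexing easier to follow. This is an illustrative sketch only: ``clbit_size`` is the bit width derived from the first shot, bits are indexed from the right of the expanded bit string, out-of-range bits default to ``'0'``, and the picked bits are reversed so the last requested index becomes the most significant output bit, just as in the Rust version:

```python
# Illustrative Python mirror of the Rust map_memory helper.
def map_memory(hexstring, indices, clbit_size, return_hex=False):
    # Expand hex to bits the same way hex_to_bin does (4 bits per digit).
    bits = "".join(format(int(c, 16), "04b") for c in hexstring[2:])
    if indices is None:
        out = bits
    else:
        picked = []
        for bit in indices:
            index = clbit_size - bit - 1
            # Out-of-range bits are treated as '0'.
            picked.append(bits[index] if 0 <= index < len(bits) else "0")
        # Reverse so the last requested index is the most significant bit.
        out = "".join(reversed(picked))
    return hex(int(out, 2)) if return_hex else out

print(map_memory("0x5", [0, 2], clbit_size=4))                   # 11
print(map_memory("0x3", [0, 2], clbit_size=4, return_hex=True))  # 0x1
```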
The Python-facing entry point:

```rust
#[pyfunction(return_int = "false", return_hex = "false", parallel_threshold = "50")]
pub fn marginal_memory(
    py: Python,
    memory: Vec<String>,
    indices: Option<Vec<usize>>,
    return_int: bool,
    return_hex: bool,
    parallel_threshold: usize,
) -> PyResult<PyObject> {
    let run_in_parallel = getenv_use_multiple_threads();
    let first_elem = memory.get(0);
    if first_elem.is_none() {
        let res: Vec<String> = Vec::new();
        return Ok(res.to_object(py));
    }

    let clbit_size = hex_to_bin(first_elem.unwrap()).len();

    let out_mem: Vec<String> = if memory.len() < parallel_threshold || !run_in_parallel {
        memory
            .iter()
            .map(|x| map_memory(x, &indices, clbit_size, return_hex))
            .collect()
    } else {
        memory
            .par_iter()
            .map(|x| map_memory(x, &indices, clbit_size, return_hex))
            .collect()
    };
    if return_int {
        if out_mem.len() < parallel_threshold || !run_in_parallel {
            Ok(out_mem
                .iter()
                .map(|x| BigUint::parse_bytes(x.as_bytes(), 2).unwrap())
                .collect::<Vec<BigUint>>()
                .to_object(py))
        } else {
            Ok(out_mem
                .par_iter()
                .map(|x| BigUint::parse_bytes(x.as_bytes(), 2).unwrap())
                .collect::<Vec<BigUint>>()
                .to_object(py))
        }
    } else {
        Ok(out_mem.to_object(py))
    }
}
```

Reviewer (on the ``collect::<Vec<BigUint>>()`` call): Yay turbo fish!
New test file, ``@@ -0,0 +1,55 @@``:

```python
# This code is part of Qiskit.
#
# (C) Copyright IBM 2017, 2019.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
#
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.

"""Test marginal_memory() function."""

from qiskit.test import QiskitTestCase
from qiskit.result import marginal_memory


class TestMarginalMemory(QiskitTestCase):
    """Result operations methods."""

    def test_marginalize_memory(self):
        """Test that memory marginalizes correctly."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, indices=[0])
        self.assertEqual(res, [bin(ii % 2)[2:] for ii in range(8)])

    def test_marginalize_memory_int(self):
        """Test that memory marginalizes correctly to int output."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, indices=[0], int_return=True)
        self.assertEqual(res, [ii % 2 for ii in range(8)])

    def test_marginalize_memory_hex(self):
        """Test that memory marginalizes correctly to hex output."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, indices=[0], hex_return=True)
        self.assertEqual(res, [hex(ii % 2) for ii in range(8)])

    def test_marginal_counts_result_memory_indices_None(self):
        """Test that memory marginalizes correctly with indices=None."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, hex_return=True)
        self.assertEqual(res, memory)

    def test_marginalize_memory_in_parallel(self):
        """Test that memory marginalizes correctly multithreaded."""
        memory = [hex(ii) for ii in range(15)]
        res = marginal_memory(memory, indices=[0], parallel_threshold=1)
        self.assertEqual(res, [bin(ii % 2)[2:] for ii in range(15)])

    def test_error_on_multiple_return_types(self):
        """Test that ValueError is raised if multiple return types are requested."""
        with self.assertRaises(ValueError):
            marginal_memory([], int_return=True, hex_return=True)
```
Review discussion (on the list-of-integers return type):

Reviewer: This is just a curiosity. Is the list of integers more efficient in memory footprint than a binary ndarray? Given we use memory information to run restless analysis in qiskit-experiments, it should use a memory-efficient representation to run a parallel experiment on a 100Q+ device.

Author: Do you mean storing the shot memory as a 2D array where each row has n elements, one per bit, or something else? The list of ints here will be more memory efficient than that on the Rust side because I'm using a ``Vec<BigUint>`` (which is just a Vec of digits internally) and it is not fixed width for each shot. The Python side I expect would be similar, since the Python integer class is very similar to ``BigUint`` (a byte array of digits), although a list isn't necessarily as contiguous as a ``Vec<T>``/``ndarray``. I think it would be best to test this to be sure and settle on a common way to represent large result values in a non-string type. As an aside, I only used a list here because numpy doesn't support arbitrarily large integers (outside of using an ``object`` dtype, which for Python ints ends up just being a pointer to the Python heap) and I was worried about the …

Reviewer: Thanks. Sounds like the current implementation is reasonable (I was just worried about storing 2**100 for "10000...0"; in a binary array it's just 100 binary elements).
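The footprint question above can be probed directly. A rough sketch, with the caveat that exact object sizes are CPython implementation details and vary by version; the point is that one arbitrary-precision integer packs bits far more densely than one byte per bit:

```python
import sys

# A 100-bit all-ones measurement outcome as one arbitrary-precision int...
shot = (1 << 100) - 1
int_size = sys.getsizeof(shot)

# ...versus a dense one-byte-per-bit row (e.g. one uint8 ndarray row),
# ignoring container overhead.
row_size = 100

print(int_size, row_size)  # the big int is typically well under 100 bytes
```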