Add dedicated functions for memory marginalization #8051

Merged — 17 commits, merged Jun 22, 2022
Changes from 4 commits
13 changes: 13 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -18,10 +18,11 @@
 rand_distr = "0.4.3"
 indexmap = "1.8.1"
 ahash = "0.7.6"
 num-complex = "0.4"
+num-bigint = "0.4"

 [dependencies.pyo3]
 version = "0.16.5"
-features = ["extension-module", "hashbrown", "num-complex"]
+features = ["extension-module", "hashbrown", "num-complex", "num-bigint"]

 [dependencies.ndarray]
 version = "^0.15.0"
2 changes: 2 additions & 0 deletions qiskit/result/__init__.py
@@ -25,6 +25,7 @@
    Counts
    marginal_counts
    marginal_distribution
+   marginal_memory

 Distributions
 =============
@@ -50,6 +51,7 @@
 from .exceptions import ResultError
 from .utils import marginal_counts
 from .utils import marginal_distribution
+from .utils import marginal_memory
 from .counts import Counts

 from .distributions.probability import ProbDistribution
61 changes: 53 additions & 8 deletions qiskit/result/utils.py
@@ -18,6 +18,8 @@
 from collections import Counter
 from copy import deepcopy

+import numpy as np
+
 from qiskit.exceptions import QiskitError
 from qiskit.result.result import Result
 from qiskit.result.counts import Counts
@@ -87,12 +89,9 @@ def marginal_counts(
             sorted_indices = sorted(
                 indices, reverse=True
             )  # same convention as for the counts
-            bit_strings = [_hex_to_bin(s) for s in experiment_result.data.memory]
-            marginal_bit_strings = [
-                "".join([s[-idx - 1] for idx in sorted_indices if idx < len(s)]) or "0"
-                for s in bit_strings
-            ]
-            experiment_result.data.memory = [_bin_to_hex(s) for s in marginal_bit_strings]
+            experiment_result.data.memory = results_rs.marginal_memory(
+                experiment_result.data.memory, sorted_indices, return_hex=True
+            )
         return result
     else:
         marg_counts = _marginalize(result, indices)
@@ -127,14 +126,60 @@ def _adjust_creg_sizes(creg_sizes, indices):
     return new_creg_sizes


def marginal_memory(
    memory: List[str],
    indices: Optional[List[int]] = None,
    int_return: bool = False,
Contributor:
This is just a curiosity: is the list of integers more efficient in memory footprint than a binary ndarray? Given that we use memory information to run restless analysis in qiskit-experiments, it should take a memory-efficient representation to run a parallel experiment on a 100Q+ device.

Member Author (@mtreinish, May 19, 2022):
Do you mean storing the shot memory as a 2D array where each row has n elements, one for each bit, or something else? The list of ints here will be more memory efficient than that on the Rust side, because I'm using a Vec<BigUint> (which is just a Vec of digits internally) and it will not be fixed-width for each shot. I expect the Python side would be similar, since the Python integer class is very similar to BigUint (a byte array of digits), although a list isn't necessarily as contiguous as a Vec<T>/ndarray. I think it would be best to test this to be sure and settle on a common way to represent large result values in a non-string type.

As an aside, I only used a list here because numpy doesn't have support for arbitrarily large integers (outside of using an object dtype for Python ints, which ends up just being a pointer to the Python heap) and I was worried about the […]

Contributor (@nkanazawa1989, May 20, 2022):
Thanks. Sounds like the current implementation is reasonable (I just worried about storing 2**100 for "10000...0"; in a binary array it is just 100 binary elements).

    hex_return: bool = False,
    parallel_threshold: int = 50,
) -> Union[List[str], np.ndarray]:
"""Marginalize memory

This function is multithreaded and will launch a thread pool with threads equal to the number
of CPUs by default. You can tune the number of threads with the ``RAYON_NUM_THREADS``
environment variable. For example, setting ``RAYON_NUM_THREADS=4`` would limit the thread pool
to 4 threads.

Args:
memory: The input memory list, this is a list of hexadecimal strings to be marginalized
Contributor:
Does this not cover IQ data, or is that outside the scope of this PR?

Member Author:
I wasn't factoring that in when I wrote this, but we should try to support it in this function. I can add it to this PR if you have an example of what the input and output look like with IQ data.

Contributor:
Sure, here is what the input looks like for a single three-qubit circuit at meas level 1 with 5 single shots:

memory=[
    # qubit 0                    qubit 1                     qubit 2
    [[-12974255.0, -28106672.0], [ 15848939.0, -53271096.0], [-18731048.0, -56490604.0]],  # shot 1
    [[-18346508.0, -26587824.0], [-12065728.0, -44948360.0], [ 14035275.0, -65373000.0]],  # shot 2
    [[ 12802274.0, -20436864.0], [-15967512.0, -37575556.0], [ 15201290.0, -65182832.0]],  # ...
    [[ -9187660.0, -22197716.0], [-17028016.0, -49578552.0], [ 13526576.0, -61017756.0]],
    [[  7006214.0, -32555228.0], [ 16144743.0, -33563124.0], [-23524160.0, -66919196.0]],
]

You can see something like this by running job_1ts = backend.run(circ, meas_level=1, memory=True, meas_return="single", shots=5). If I want to marginalize over some of the qubits, I need to remove their slots. For instance, keeping qubits 0 and 2 would result in:

memory=[
    [[-12974255.0, -28106672.0], [-18731048.0, -56490604.0]],  # shot 1
    [[-18346508.0, -26587824.0], [ 14035275.0, -65373000.0]],  # shot 2
    [[ 12802274.0, -20436864.0], [ 15201290.0, -65182832.0]],  # ...
    [[ -9187660.0, -22197716.0], [ 13526576.0, -61017756.0]],
    [[  7006214.0, -32555228.0], [-23524160.0, -66919196.0]],
]

If we are dealing with averaged IQ data, then the input memory looks like this (again three qubits, one circuit, five shots, but now averaged):

memory=[[-1059254.375, -26266612.0], [-9012669.0, -41877468.0], [6027076.0, -54875060.0]]

Member Author:
I added support for this in e709ce8. I still need to add testing to cover all the paths, but let me know if that interface works for you.

Contributor:
[comment did not render]

Member Author:
That table is what I based e709ce8 on, but yeah, the lack of explicit typing was annoying and is why I needed an avg_data kwarg to differentiate between single level 1 and avg level 0, because I couldn't figure out a way to reliably detect the difference without an explicit input type. I was trying to make this function work independently of the Result object, which is the only place I think that metadata would be stored.
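The slot removal described in the thread above is just an index along the qubit axis. A minimal numpy sketch follows; the helper name `marginal_iq_memory` and its `avg_data` flag are invented here to illustrate the idea and may not match the interface added in e709ce8:

```python
import numpy as np


def marginal_iq_memory(memory, indices, avg_data=False):
    """Keep only the IQ slots for the qubits listed in ``indices``."""
    arr = np.asarray(memory)
    if avg_data:
        # averaged level-1 data: shape (qubits, 2), one averaged (I, Q) pair per qubit
        return arr[indices, :]
    # single-shot level-1 data: shape (shots, qubits, 2)
    return arr[:, indices, :]
```

For the 5-shot example above, `marginal_iq_memory(memory, [0, 2])` keeps the qubit-0 and qubit-2 pairs of every shot, reproducing the marginalized array shown in the thread.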

        indices: The bit positions of interest to marginalize over. If
            ``None`` (default), do not marginalize at all.
        int_return: If set to ``True`` the output will be a list of integers.
            By default the return type is a bit string. This and ``hex_return``
            are mutually exclusive and cannot be specified at the same time.
        hex_return: If set to ``True`` the output will be a list of hexadecimal
            strings. By default the return type is a bit string. This and
            ``int_return`` are mutually exclusive and cannot be specified
            at the same time.
        parallel_threshold: The number of elements in ``memory`` at which to start running in
            multiple threads. If ``len(memory)`` is >= this value this function will run in
            multiple threads. By default this is set to 50.

    Returns:
        marginal_memory: The list of marginalized memory

    Raises:
        ValueError: if both ``int_return`` and ``hex_return`` are set to ``True``
    """
    if int_return and hex_return:
        raise ValueError("Either int_return or hex_return can be specified but not both")
    return results_rs.marginal_memory(
        memory,
        indices,
        return_int=int_return,
        return_hex=hex_return,
        parallel_threshold=parallel_threshold,
    )


def marginal_distribution(
    counts: dict, indices: Optional[List[int]] = None, format_marginal: bool = False
) -> Dict[str, int]:
    """Marginalize counts from an experiment over some indices of interest.

     Unlike :func:`~.marginal_counts` this function respects the order of
-    the input ``indices``. If the input ``indices`` list is specified, the order
-    the bit indices will be the output order of the bitstrings
+    the input ``indices``. If the input ``indices`` list is specified then the order
+    the bit indices are specified will be the output order of the bitstrings
     in the marginalized output.
Args:
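The order-respecting behavior described in the docstring above can be illustrated with a small pure-Python model. This is a hypothetical sketch under an assumed convention (bit i is counted from the right of each bitstring, and the first listed index becomes the rightmost bit of the output key); it shows what "respecting the order of ``indices``" means, not Qiskit's exact implementation:

```python
def marginal_distribution_model(counts, indices):
    """Toy order-respecting marginalization over a counts dict."""
    out = {}
    for bitstring, count in counts.items():
        # bit i is the i-th character from the right; the first index listed
        # lands in the rightmost position of the output key
        key = "".join(bitstring[-i - 1] for i in reversed(indices))
        out[key] = out.get(key, 0) + count
    return out
```

Under this model, `indices=[1, 0]` and `indices=[0, 1]` produce keys with the two bits swapped, which is exactly the ordering sensitivity the docstring contrasts with `marginal_counts`.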
45 changes: 45 additions & 0 deletions src/results/converters.rs
@@ -0,0 +1,45 @@
// This code is part of Qiskit.
//
// (C) Copyright IBM 2022
//
// This code is licensed under the Apache License, Version 2.0. You may
// obtain a copy of this license in the LICENSE.txt file in the root directory
// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
//
// Any modifications or derivative works of this code must retain this
// copyright notice, and modified files need to carry a notice indicating
// that they have been altered from the originals.

#[inline]
pub fn hex_char_to_bin(c: char) -> &'static str {
    match c {
Contributor:
This is fine, but it might be a really fun opportunity to write a constant-expression LUT generator function :)

Member Author:
I leveraged lazy_static to generate a static lookup table in 5c81510. I need to benchmark it and look into expanding it to support larger chunks, but this might be a good enough start, as it should eliminate most of the runtime branching.

Member Author:
I tried adding chunking in groups of 4 and it was slower than doing this. I think it comes down to the intermediate allocations needed in how I could get it to work. At this point a single-element lookup table is probably sufficient; if we end up hitting bottlenecks down the road we can revisit this easily enough, as it's not a big deal to improve the internal implementation later.

For benchmarking I compared the commit prior to 5c81510, 5c81510 itself, and my local chunked implementation using:

import time
import random
from qiskit.result import marginal_memory

random.seed(42)

memory = [hex(random.randint(0, 4096)) for _ in range(500000)]
start = time.perf_counter()
res = marginal_memory(memory, indices=[0, 3, 5, 9])
stop = time.perf_counter()
print(stop - start)

The geometric mean of each implementation over 10 trials was:

match: 0.08678476060453334
LUT: 0.08359493472968436
Chunked LUT: 0.10288708564573844
        '0' => "0000",
        '1' => "0001",
        '2' => "0010",
Contributor:
Can we cover cases where we are, e.g., working with the third level of the transmon, i.e. we have states 0, 1, and 2?

Contributor:
I think this is outside the scope of this PR (though I like the idea). Because the current Result object doesn't define the basis in metadata, i.e. it always assumes binary, it is difficult to select the proper output basis (binary or ternary) from only the input hex numbers. Having a basis argument in the marginal function seems to me an overkill. But we can implement such ternary memory in the experiment data processor, where we always need marginalization of IQ numbers to run a custom ternary discriminator.

Member Author:
I think we can save this for a follow-on. Having the marginalization functions work with this would be useful, but there is probably enough in this PR just working with binary to start.

        '3' => "0011",
        '4' => "0100",
        '5' => "0101",
        '6' => "0110",
        '7' => "0111",
        '8' => "1000",
        '9' => "1001",
        'A' => "1010",
        'B' => "1011",
        'C' => "1100",
        'D' => "1101",
        'E' => "1110",
        'F' => "1111",
        'a' => "1010",
        'b' => "1011",
        'c' => "1100",
        'd' => "1101",
        'e' => "1110",
        'f' => "1111",
        _ => "",
    }
}

#[inline]
pub fn hex_to_bin(hex: &str) -> String {
    hex[2..].chars().map(hex_char_to_bin).collect()
}
84 changes: 84 additions & 0 deletions src/results/marginalization.rs
@@ -10,8 +10,12 @@
 // copyright notice, and modified files need to carry a notice indicating
 // that they have been altered from the originals.

+use super::converters::hex_to_bin;
+use crate::getenv_use_multiple_threads;
 use hashbrown::HashMap;
+use num_bigint::BigUint;
 use pyo3::prelude::*;
+use rayon::prelude::*;

 fn marginalize<T: std::ops::AddAssign + Copy>(
     counts: HashMap<String, T>,
@@ -68,3 +72,83 @@ pub fn marginal_distribution(
 ) -> HashMap<String, f64> {
     marginalize(counts, indices)
 }

#[inline]
fn map_memory(
    hexstring: &str,
    indices: &Option<Vec<usize>>,
    clbit_size: usize,
    return_hex: bool,
) -> String {
    let out = match indices {
        Some(indices) => {
            let bitstring = hex_to_bin(hexstring);
            let bit_array = bitstring.as_bytes();
            indices
                .iter()
                .map(|bit| {
                    let index = clbit_size - *bit - 1;
                    match bit_array.get(index) {
                        Some(bit) => *bit as char,
                        None => '0',
                    }
                })
                .rev()
                .collect()
        }
        None => hex_to_bin(hexstring),
    };
    if return_hex {
        format!("0x{:x}", BigUint::parse_bytes(out.as_bytes(), 2).unwrap())
    } else {
        out
    }
}

#[pyfunction(return_int = "false", return_hex = "false", parallel_threshold = "50")]
pub fn marginal_memory(
    py: Python,
    memory: Vec<String>,
    indices: Option<Vec<usize>>,
    return_int: bool,
    return_hex: bool,
    parallel_threshold: usize,
) -> PyResult<PyObject> {
    let run_in_parallel = getenv_use_multiple_threads();
    let first_elem = memory.get(0);
    if first_elem.is_none() {
        let res: Vec<String> = Vec::new();
        return Ok(res.to_object(py));
    }

    let clbit_size = hex_to_bin(first_elem.unwrap()).len();

    let out_mem: Vec<String> = if memory.len() < parallel_threshold || !run_in_parallel {
        memory
            .iter()
            .map(|x| map_memory(x, &indices, clbit_size, return_hex))
            .collect()
    } else {
        memory
            .par_iter()
            .map(|x| map_memory(x, &indices, clbit_size, return_hex))
            .collect()
    };
    if return_int {
        if out_mem.len() < parallel_threshold || !run_in_parallel {
            Ok(out_mem
                .iter()
                .map(|x| BigUint::parse_bytes(x.as_bytes(), 2).unwrap())
                .collect::<Vec<BigUint>>()
Contributor:
Yay turbo fish! ::<<>>
                .to_object(py))
        } else {
            Ok(out_mem
                .par_iter()
                .map(|x| BigUint::parse_bytes(x.as_bytes(), 2).unwrap())
                .collect::<Vec<BigUint>>()
                .to_object(py))
        }
    } else {
        Ok(out_mem.to_object(py))
    }
}
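Because the function round-trips through arbitrary-precision integers (Vec&lt;BigUint&gt; on the Rust side, plain int on the Python side), registers wider than 64 bits pose no problem. A quick Python illustration of why the string/int conversions here are lossless for wide registers:

```python
# A 120-bit register value: too wide for a u64, fine for Python ints / BigUint.
shot = "0x" + "f" * 30
bits = bin(int(shot, 16))[2:]      # hex -> 120-character bit string
assert len(bits) == 120
assert hex(int(bits, 2)) == shot   # bit string -> hex round-trips losslessly
```

This mirrors the `BigUint::parse_bytes(..., 2)` / `format!("0x{:x}", ...)` pair used above, which is why the same marginalization code works unchanged on 100Q+ results.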
2 changes: 2 additions & 0 deletions src/results/mod.rs
@@ -10,6 +10,7 @@
 // copyright notice, and modified files need to carry a notice indicating
 // that they have been altered from the originals.

+pub mod converters;
 pub mod marginalization;

 use pyo3::prelude::*;
@@ -19,5 +20,6 @@ use pyo3::wrap_pyfunction;
 pub fn results(_py: Python, m: &PyModule) -> PyResult<()> {
     m.add_wrapped(wrap_pyfunction!(marginalization::marginal_counts))?;
     m.add_wrapped(wrap_pyfunction!(marginalization::marginal_distribution))?;
+    m.add_wrapped(wrap_pyfunction!(marginalization::marginal_memory))?;
     Ok(())
 }
55 changes: 55 additions & 0 deletions test/python/result/test_memory_marginalization.py
@@ -0,0 +1,55 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2017, 2019.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
#
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.

"""Test marginal_memory() function."""

from qiskit.test import QiskitTestCase
from qiskit.result import marginal_memory


class TestMarginalMemory(QiskitTestCase):
    """Result operations methods."""

    def test_marginalize_memory(self):
        """Test that memory marginalizes correctly."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, indices=[0])
        self.assertEqual(res, [bin(ii % 2)[2:] for ii in range(8)])

    def test_marginalize_memory_int(self):
        """Test that memory marginalizes correctly int output."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, indices=[0], int_return=True)
        self.assertEqual(res, [ii % 2 for ii in range(8)])

    def test_marginalize_memory_hex(self):
        """Test that memory marginalizes correctly hex output."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, indices=[0], hex_return=True)
        self.assertEqual(res, [hex(ii % 2) for ii in range(8)])

    def test_marginal_counts_result_memory_indices_None(self):
        """Test that a memory marginalizes correctly with indices=None."""
        memory = [hex(ii) for ii in range(8)]
        res = marginal_memory(memory, hex_return=True)
        self.assertEqual(res, memory)

    def test_marginalize_memory_in_parallel(self):
        """Test that memory marginalizes correctly multithreaded."""
        memory = [hex(ii) for ii in range(15)]
        res = marginal_memory(memory, indices=[0], parallel_threshold=1)
        self.assertEqual(res, [bin(ii % 2)[2:] for ii in range(15)])

    def test_error_on_multiple_return_types(self):
        """Test that ValueError is raised if multiple return types are requested."""
        with self.assertRaises(ValueError):
            marginal_memory([], int_return=True, hex_return=True)