Type of strings and bytes inconsistent iterating manually or in .tolist() #43

Closed
lgray opened this issue Jan 5, 2020 · 1 comment
Labels
bug The problem described is something that must be fixed

Comments

lgray commented Jan 5, 2020

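import awkward1 as ak  # assumed import; the snippet omits it (the package was named awkward1 at the time)

check = ak.Array([{"x": "b"}, {"x": "b"}, {"x": "a"}, {"x": "b"}])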
print(check.layout)

print(ak.tolist(check))
print(ak.tolist(check.layout))

results in:

<RecordArray>
    <type>{"x": string}</type>
    <field index="0" key="x">
        <ListOffsetArray64>
            <type>string</type>
            <offsets><Index64 i="[0 1 2 3 4]" offset="0" at="0x7fb342145600"/></offsets>
            <content><NumpyArray format="B" shape="4" data="0x 62626162" at="0x7fb342034800">
                <type>utf8</type>
            </NumpyArray></content>
        </ListOffsetArray64>
    </field>
</RecordArray>
[{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]
[{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]

The same with bytes values:

check = ak.Array([{"x": b"b"}, {"x": b"b"}, {"x": b"a"}, {"x": b"b"}])
print(check.layout)

print(ak.tolist(check))
print(ak.tolist(check.layout))

results in:

<RecordArray>
    <type>{"x": bytes}</type>
    <field index="0" key="x">
        <ListOffsetArray64>
            <type>bytes</type>
            <offsets><Index64 i="[0 1 2 3 4]" offset="0" at="0x7fb3431d6e00"/></offsets>
            <content><NumpyArray format="B" shape="4" data="0x 62626162" at="0x7fb34314e200">
                <type>byte</type>
            </NumpyArray></content>
        </ListOffsetArray64>
    </field>
</RecordArray>
[{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]
[{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]
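
For reference, the integers in both outputs are just the byte values of the characters (98 is "b", 97 is "a"). A small plain-Python sketch of that mapping, using only built-ins and not Awkward itself; the variable names are illustrative:

```python
# The integers printed by ak.tolist are the raw byte values of each string.
text_values = ["b", "b", "a", "b"]
as_ints = [list(s.encode("utf-8")) for s in text_values]
print(as_ints)  # [[98], [98], [97], [98]]

# Reversing the mapping recovers bytes, and str for UTF-8 encoded data.
as_bytes = [bytes(v) for v in as_ints]
as_text = [b.decode("utf-8") for b in as_bytes]
print(as_bytes)  # [b'b', b'b', b'a', b'b']
print(as_text)   # ['b', 'b', 'a', 'b']
```

So no information is lost in the content buffer; only the string/bytes interpretation is dropped on the way out.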

However:

for out in check:
    print(out["x"])

results in (similar for strings):

b'b'
b'b'
b'a'
b'b'

but:

for out in check.layout:
    print(out["x"])

results in:

<NumpyArray format="B" shape="1" data="0x 62" at="0x7fb3439c8200">
    <type>byte</type>
</NumpyArray>
<NumpyArray format="B" shape="1" data="0x 62" at="0x7fb3439c8200">
    <type>byte</type>
</NumpyArray>
<NumpyArray format="B" shape="1" data="0x 61" at="0x7fb3439c8200">
    <type>byte</type>
</NumpyArray>
<NumpyArray format="B" shape="1" data="0x 62" at="0x7fb3439c8200">
    <type>byte</type>
</NumpyArray>

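It looks like the underlying data is stored correctly, but the handling is inconsistent: iterating over the ak.Array yields bytes (or str) objects, while ak.tolist and iterating over check.layout expose the raw byte values instead.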
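
To check that the stored data is right, the offsets and content from the layout dump above can be sliced by hand. This is only a plain-Python sketch of how I read the ListOffsetArray64 structure (consecutive offsets delimit each item in the content buffer); `slice_items` is a made-up helper, not part of the library:

```python
# Values copied from the layout dump above.
content = bytes.fromhex("62626162")  # the NumpyArray data, shown as "0x 62626162"
offsets = [0, 1, 2, 3, 4]            # the Index64 offsets


def slice_items(content, offsets):
    """Return one bytes object per item, delimited by consecutive offsets."""
    return [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


items = slice_items(content, offsets)
print(items)                                     # [b'b', b'b', b'a', b'b']
print([item.decode("utf-8") for item in items])  # ['b', 'b', 'a', 'b']
```

Both the string and the bytes arrays carry the same offsets and the same content bytes, so the difference in behavior has to come from how the type is applied on output.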

lgray changed the title from "Type of strings and bytes not preserved iterating manually or in .tolist()" to "Type of strings and bytes inconsistent iterating manually or in .tolist()" on Jan 5, 2020
lgray commented Jan 5, 2020

The shortest reproducer is:

import awkward1 as ak  # assumed import; the package was named awkward1 at the time

check = ak.Array([{"x": b"b"}, {"x": b"b"}, {"x": b"a"}, {"x": b"b"}])
print(check)
print(ak.tolist(check))
print(ak.tolist(ak.fromiter(ak.tolist(check))))

which gives:

[{x: b'b'}, {x: b'b'}, {x: b'a'}, {x: b'b'}]
[{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]
[{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]
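
Until this is fixed, the ak.tolist output can be patched up after the fact if you already know which fields hold bytes or strings. A rough sketch, assuming flat records like the ones above; `restore_field` is a hypothetical helper and not a library function:

```python
def restore_field(records, field, encoding=None):
    """Rebuild bytes (or str, if an encoding is given) for one record field.

    Hypothetical workaround: the caller must name the field, because a list
    of small integers cannot be told apart from real integer data.
    """
    restored = []
    for record in records:
        fixed = dict(record)       # shallow copy, leave the input untouched
        raw = bytes(fixed[field])  # [98] -> b'b'
        fixed[field] = raw.decode(encoding) if encoding is not None else raw
        restored.append(fixed)
    return restored


# Output of ak.tolist(check) copied from above.
listed = [{'x': [98]}, {'x': [98]}, {'x': [97]}, {'x': [98]}]
print(restore_field(listed, "x"))
# [{'x': b'b'}, {'x': b'b'}, {'x': b'a'}, {'x': b'b'}]
print(restore_field(listed, "x", encoding="utf-8"))
# [{'x': 'b'}, {'x': 'b'}, {'x': 'a'}, {'x': 'b'}]
```

Having to pass the field name is the point: once the type is dropped, the round trip through ak.fromiter has no way to know the lists were ever bytes, which matches the third line of output above.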

jpivarski added a commit that referenced this issue Jan 6, 2020
jpivarski added a commit that referenced this issue Jan 6, 2020
jpivarski added the bug label (The problem described is something that must be fixed) on Feb 3, 2020