-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Base.unzip() #13942
Comments
We definitely need an |
Or even simpler unzip(a) = zip(a...) |
Trying the zip(...) thing on any non-toy problem shows that it's not so good pretty fast, unfortunately. |
I just ran into this issue when I absentmindedly used the Python-esque |
Like almost all other performance tips in julia, doing so is perfectly fine and sometimes wanted for various reasons. (OK, splatting 6000 elements is a bit crazy, but splatting unknown size can be useful....). The only thing that matters is to not put it in performance critical path. It looks like this is not covered in the performance tip? |
Yeah, a warning is a bad idea.
I think this should also be marked under differences from other languages. People coming from Python are probably used to the |
You can dispatch only on arrays of tuples:
|
Bump, we still need and unzip function. |
BUMP if anyone is bored... |
function unzip(zipped::Base.Iterators.Zip2{Array{T1,1}, Array{T2,1}}) where {T1,T2}
n = length(zipped)
vec1 = Vector{T1}(n)
vec2 = Vector{T2}(n)
i = 1
for pair in enumerate(zipped)
vec1[i], vec2[i] = pair[2]
i = i+1
end
return vec1, vec2
end |
maybe <= 1.5x is function unzip(zipped::Base.Iterators.Zip2{Array{T1,1}, Array{T2,1}}) where {T1,T2}
return map(first,zipped), map(last,zipped)
end |
unzip(zipped::Base.Iterators.Zip2) = (zipped.a, zipped.b) But all those differ in how they treat This trivial case apart, the desired semantics are not absolutely clear to me: Given an iterator Think about unzip(f(n) for n in Base.Iterators.countfrom(0)) for different |
There seems to be multiple use cases for an unzip, with different requirements. A common case is involves Arrays of Tuples, in which the size is known and array shape must be preserved. Some suggestions here do not obey this constraint. I have come up with a solution for this specific case using |
Can we get this for generalized iterators and not just arrays? Basically just a collect_many modeled after collect? |
I could give it a stab? The trickiest bit (I think) will be getting an equivalent of default_eltype for zero length eltype unknown iterators |
Go for it, @bramtayl! |
Tada, this seems to work. Great excuse not to finish my grading. struct Unzip{Iterator}
iterator::Iterator
end
iterate(iterator::Unzip) = iterate(iterator.iterator)
iterate(iterator::Unzip, state) = iterate(iterator.iterator, state)
IteratorEltype(iterator::Unzip) = IteratorEltype(iterator.iterator)
eltype(iterator::Unzip) = eltype(iterator.iterator)
IteratorSize(iterator::Unzip) = IteratorSize(iterator.iterator)
axes(iterator::Unzip) = axes(iterator.iterator)
size(iterator::Unzip) = size(iterator.iterator)
length(iterator::Unzip) = length(iterator.iterator)
struct ModelArray{ElementType, NumberOfDimensions, Model, Rest <: Tuple} <:
AbstractArray{ElementType, NumberOfDimensions}
model::Model
rest::Rest
end
model_array(model, rest...) =
ModelArray{
Tuple{eltype(model), eltype.(rest)...},
ndims(model),
typeof(model),
typeof(rest)
}(model, rest)
axes(array::ModelArray) = axes(array.model)
size(array::ModelArray) = size(array.model)
IndexStyle(array::ModelArray) = IndexLinear()
@propagate_inbounds getindex(array::ModelArray, index::Int) = (
getindex(array.model, index),
getindex.(array.rest, index)...
)
@propagate_inbounds setindex!(array::ModelArray, value::Tuple, index::Int) = (
setindex!(array.model, value[1], index),
setindex!.(array.rest, tail(value), index)...
)
push!(array::ModelArray, value::Tuple) = (
push!(array.model, value[1]),
push!.(array.rest, tail(value))
)
# TODO: use fieldtypes instead of value_field_types in a future with more constant propagation
value_field_types(a_type) =
ntuple(index -> Val(fieldtype(a_type, index)), fieldcount(a_type))
function similar(
array::Unzip,
::Type{ElementType},
dims::Dims
) where {ElementType, NumberOfDimensions}
inner_array(::Val{InnerElementType}) where InnerElementType =
Array{InnerElementType}(undef, dims)
model_array(inner_array.(value_field_types(ElementType))...)
end
# Unzip needs to replicate AbstractArray similar machinery
similar(a::Unzip, ::Type{T}) where {T} = similar(a, T, to_shape(axes(a)))
similar(a::Unzip{T}, dims::Tuple) where {T} = similar(a, T, to_shape(dims))
similar(a::Unzip{T}, dims::DimOrInd...) where {T} = similar(a, T, to_shape(dims))
similar(a::Unzip, ::Type{T}, dims::DimOrInd...) where {T} = similar(a, T, to_shape(dims))
similar(a::Unzip, ::Type{T}, dims::Tuple{Union{Integer, OneTo}, Vararg{Union{Integer, OneTo}}}) where {T} = similar(a, T, to_shape(dims))
similar(array::ModelArray, ::Type{ElementType}, dims::Dims) where ElementType =
similar(Unzip(nothing), ElementType, dims)
_similar_for(c::Unzip{Nothing}, T, itr, ::SizeUnknown) =
similar(c, T, 0)
_similar_for(c::Unzip{Nothing}, T, itr, ::HasLength) =
similar(c, T, Int(length(itr)::Integer))
_similar_for(c::Unzip{Nothing}, T, itr, ::HasShape) =
similar(c, T, axes(itr))
collect(iterator::Unzip) = _collect(
Unzip(nothing),
iterator,
IteratorEltype(iterator),
IteratorSize(iterator)
)
# Examples
Generator(x -> (x, x + 1.0), [1]) |> Unzip |> collect
Generator(x -> (x, x + 1.0), [1, missing]) |> Unzip |> collect
zip([1], [1.0]) |> Unzip |> collect
[(1, 1.0)] |> Unzip |> collect
Filter(x -> x[1] == x[2], [(1, 1.0)]) |> Unzip |> collect
`` |
The above strategy turns out to be extremely sub-optimal for EltypeUnknown iterators, but it still works. I'm not sure I have the programming wherewithall to design a better system. |
I'm thinking it would probably just need specialized ::ModelArray methods for grow_to! and collect_to! to be efficient, I'll see what I can cook up |
I took at stab at simplifying the code in my gist. I made a little headway but not much. If it's any comfort, many of the functions (map_unrolled, partial_map, zip_missing) would fit nicely into a unexported mini-module for unrolling tuple operations. I've been kicking around a plan for that for a long time. |
It would be really nice to get this solved for 1.6. |
@StefanKarpinski I remember that you had a full grasp of this -- is anything of deep import holding it back? |
Only performance, Iirc. |
@JeffreySarnoff, I may be misremembering some things, but I think there were some unresolved questions aside from performance:
Maybe some other things too, but it's been a while since I read through the entire thread. Note that these issues only really arise in |
I wrote up https://github.com/bramtayl/Unzip.jl and it should be registered 3 days from now. |
afaic a clear candidate after looking at
unzip(x::Base.Iterators.Zip) = x.is |
Feel free to open up a PR over there, but I'm not so sure about that method. The behavior in my version |
How is this obvious? It's one possible way to define |
(it works better with Certainly, the unzip of a To forgo that very high throughput advancing some computation because it provides unequal length vectors iff unequal length vectors were given to |
@zzyxyzz I meant "obvious" in the sense that this is what |
good point |
Maybe |
@JeffreySarnoff, I'm not saying that making |
Maybe this issue should be renamed, following Julia tradition, "Taking unzip seriously". 😛 |
Registered! |
For |
My personal take on this: In any decent implementation,
|
No post in 2021 ? So let's give it a try now! HNY 2022 Unzip ! Multidispatch-friendly zen of unzip in julia
OK FWIW, my good-enough-poor-man, but usable in prod, isn't it, unzip considering than
using Test
import Base.Iterators as _I
unzip(s...) = unzip(collect(s))
unzip(vs::Vector{<:Vector}) =
let M=length(vs), N=mapfoldl(length, min, vs); # todo remove me when SVector is in Base
([vs[i][j] for i in 1:M] for j in 1:N)
end
unzip(a::Vector{<:Pair}) = [k for (k,_) in a], [v for (_,v) in a] TESTimport Base.Iterators as _I
using Test
# zipdata(M,N) = let v=collect(1:M), vt=ntuple(N) do _; copy(v) end; vt end
data(M,N) = ntuple(M) do i; fill(i,N) end
data(N) = let ks=_I.take(_I.cycle('a':'z'), N), vs=(1:N...,); (k=>v for (k,v) in zip(ks,vs)) end VALIDITY TEST# unzip of vector
@test data(5,3) == ([1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5])
@test unzip(data(5,3)...) |> collect == ([1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]) |> collect
# unzip of pair vector
@test data(5) |> collect == ('a'=>1, 'b'=>2, 'c'=>3, 'd'=>4, 'e'=>5) |> collect
@test unzip(data(5) |> collect) |> collect == (['a','b','c','d','e'], [1,2,3,4,5]) |> collect SPEED TEST
|
Hmm... @o314, that's not working for me: julia> unzip(zip(1:10, 1:10))
Base.Generator{UnitRange{Int64}, var"#7#9"{Vector{Vector{Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}}}, Int64}}(var"#7#9"{Vector{Vector{Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}}}, Int64}(Vector{Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}}[[zip(1:10, 1:10)]], 1), 1:1)
julia> for i in unzip(zip(1:10, 1:10))
println(i)
end
Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}[zip(1:10, 1:10)]
julia> x = collect(zip(1:5, 2:2:10))
5-element Vector{Tuple{Int64, Int64}}:
(1, 2)
(2, 4)
(3, 6)
(4, 8)
(5, 10)
julia> collect(unzip(x))
5-element Vector{Vector{Tuple{Int64, Int64}}}:
[(1, 2)]
[(2, 4)]
[(3, 6)]
[(4, 8)]
[(5, 10)] |
Per 1.7.2, the
Hell I miss an |
Someday |
note that since #50435 was merged, there is precedent to throw on follow-up operations when zipped iterators have unequal lengths. so many of the previous concerns upthread about the desire to & controversy of defining w.r.t. returning the original iterators or copies, I definitely prefer not to copy. all |
Hi there,
apologies if this has already been addressed somewhere, but is there a reason that there is no
unzip()
function in Base?Ideally this would be a function that would take a
Vector{Tuple{ ... }}
and return aTuple{Vector, ..., Vector}
for output. E.g.A naive implementation might be something like
The text was updated successfully, but these errors were encountered: