Revamp of Dataset and YAXArray saving #132
Conversation
@felixcremer it would be great if you could experiment a bit, in particular to see whether this is useful for rechunking or whether you encounter problems.
src/Cubes/Cubes.jl
Outdated
  cleaner::Vector{CleanMe}
- function YAXArray(axes, data, properties, cleaner)
+ function YAXArray(axes, data, properties, chunks, cleaner)
Isn't this change breaking? Because I can't do YAXArray(axes, data, properties, cleaner) anymore.
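For illustration, a backwards-compatible fallback could look roughly like the sketch below. This is not the PR's actual code; it assumes the five-argument inner constructor from the diff above and that a default chunking can be derived from the data with DiskArrays.eachchunk (one chunk for plain in-memory arrays).

```julia
# Hypothetical fallback: keep the old four-argument call working by deriving
# the chunks from the data itself. Not the actual code of this PR.
import DiskArrays: eachchunk

YAXArray(axes, data, properties, cleaner::Vector{CleanMe}) =
    YAXArray(axes, data, properties, eachchunk(data), cleaner)
```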
Trying to save a cube I get the following error:
julia> forestcor
YAXArray with the following dimensions
Lon Axis with 401 Elements from -670000.0 to -662000.0
Lat Axis with 300 Elements from -390015.0 to -395995.0
Polarisation Axis with 2 elements: VH VV
IMF Axis with 8 elements: IMF 1 IMF 2 IMF 3 IMF 4 IMF 5 IMF 6 Residual Original
Variable Axis with 4 elements: Ta_200 SM_10 Ta_10 SM_20
Total size: 29.37 MB
julia> savecube(forestcor, "/home/fcremer/Documents/Hypersense/bexis_hainich/forestcorsome.zarr")
(bufnow, outcs) = ((8.203725928756583e102, 6.765734799625517e102, -1.129464429664537e103, -0.0), (401, 300, 2, 8))
(rat, buf, sout) = (2.045816939839547e100, 8.203725928756583e102, 401)
ERROR: InexactError: trunc(Int64, 2.045816939839547e100)
Stacktrace:
[1] trunc
@ ./float.jl:805 [inlined]
[2] round
@ ./float.jl:369 [inlined]
[3] outalign(buf::Float64, sout::Int64)
@ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/Rechunker.jl:26
[4] _broadcast_getindex_evalf
@ ./broadcast.jl:670 [inlined]
[5] _broadcast_getindex
@ ./broadcast.jl:643 [inlined]
[6] #29
@ ./broadcast.jl:1075 [inlined]
[7] macro expansion
@ ./ntuple.jl:74 [inlined]
[8] ntuple
@ ./ntuple.jl:69 [inlined]
[9] copy
@ ./broadcast.jl:1075 [inlined]
[10] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(YAXArrays.Cubes.outalign), Tuple{NTuple{4, Float64}, NTuple{4, Int64}}})
@ Base.Broadcast ./broadcast.jl:860
[11] get_copy_buffer_size(incube::SubArray{Union{Missing, Float32}, 4, Array{Union{Missing, Float32}, 5}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, outcube::ZArray{Union{Missing, Float32}, 4, Zarr.BloscCompressor, DirectoryStore}; writefac::Float64, maxbuf::Float64, align_output::Bool)
@ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/Rechunker.jl:56
[12] copy_diskarray(incube::SubArray{Union{Missing, Float32}, 4, Array{Union{Missing, Float32}, 5}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, outcube::ZArray{Union{Missing, Float32}, 4, Zarr.BloscCompressor, DirectoryStore}; writefac::Float64, maxbuf::Float64, align_output::Bool)
@ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/Rechunker.jl:72
[13] copydataset!(diskds::Dataset, ds::Dataset; writefac::Float64, maxbuf::Float64)
@ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:392
[14] savedataset(ds::Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton_only::Bool, backend::Symbol, driver::Symbol, max_cache::Float64)
@ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:516
[15] savecube(c::YAXArray{Union{Missing, Float32}, 5, Array{Union{Missing, Float32}, 5}, Vector{CubeAxis}}, path::String; name::String, datasetaxis::String, max_cache::Float64, backend::Symbol, driver::Symbol, chunks::Nothing, overwrite::Bool, append::Bool, skeleton_only::Bool)
@ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:538
[16] savecube(c::YAXArray{Union{Missing, Float32}, 5, Array{Union{Missing, Float32}, 5}, Vector{CubeAxis}}, path::String)
@ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:534
[17] top-level scope
@ REPL[152]:1
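For context, the InexactError at the bottom of that trace is an integer overflow: the buffer-size heuristic in Rechunker.jl produced a Float64 around 2e100 (see the rat/buf debug output above), and rounding that to Int64 cannot succeed. A minimal sketch of the same failure, using the value from the debug line:

```julia
# The rechunking heuristic computed rat ≈ 2.05e100; anything larger than
# typemax(Int64) throws InexactError when rounded to an integer.
rat = 2.045816939839547e100
typemax(Int64)        # 9223372036854775807, far smaller than rat
# round(Int64, rat)   # throws InexactError: trunc(Int64, 2.045816939839547e100)
```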
This is a cube that could lead to the error above:
julia> a = YAXArray([RangeAxis(:Lon, 1:400), RangeAxis(:Lat, 1:300), CategoricalAxis("Polarisation", 1:2), CategoricalAxis("IMF", 1:8), CategoricalAxis("Variable", string.(1:4))], rand(400, 300, 2, 8, 4))
We had different errors depending on the name of the "Variable" dimension and also depending on the length of the strings in the axis values.
I think I found the cause of the non-aligned chunks error: the two cubes that I want to concatenate have the same chunk sizes but different offsets:
julia> DiskArrays.eachchunk(setchunks(permutedims(haidescsub, (3,1,2,4)), chunks))[1,1,:,1]
15-element Vector{NTuple{4, UnitRange{Int64}}}:
(1:358, 1:401, 1:20, 1:1)
(1:358, 1:401, 21:40, 1:1)
(1:358, 1:401, 41:60, 1:1)
(1:358, 1:401, 61:80, 1:1)
(1:358, 1:401, 81:100, 1:1)
(1:358, 1:401, 101:120, 1:1)
(1:358, 1:401, 121:140, 1:1)
(1:358, 1:401, 141:160, 1:1)
(1:358, 1:401, 161:180, 1:1)
(1:358, 1:401, 181:200, 1:1)
(1:358, 1:401, 201:220, 1:1)
(1:358, 1:401, 221:240, 1:1)
(1:358, 1:401, 241:260, 1:1)
(1:358, 1:401, 261:280, 1:1)
(1:358, 1:401, 281:299, 1:1)
julia> DiskArrays.eachchunk(sub1)[1,1,:,1]
16-element Vector{NTuple{4, UnitRange{Int64}}}:
(1:358, 1:401, 1:10, 1:1)
(1:358, 1:401, 11:30, 1:1)
(1:358, 1:401, 31:50, 1:1)
(1:358, 1:401, 51:70, 1:1)
(1:358, 1:401, 71:90, 1:1)
(1:358, 1:401, 91:110, 1:1)
(1:358, 1:401, 111:130, 1:1)
(1:358, 1:401, 131:150, 1:1)
(1:358, 1:401, 151:170, 1:1)
(1:358, 1:401, 171:190, 1:1)
(1:358, 1:401, 191:210, 1:1)
(1:358, 1:401, 211:230, 1:1)
(1:358, 1:401, 231:250, 1:1)
(1:358, 1:401, 251:270, 1:1)
(1:358, 1:401, 271:290, 1:1)
(1:358, 1:401, 291:299, 1:1)
Unfortunately I have no idea how to move forward from here.
julia> concatenatecubes([sub1, setchunks(permutedims(haidescsub, (3,1,2,4)), chunks)], catax)
(i, eachchunk(cl[i])) = (2, [(1:358, 1:401, 1:20, 1:1);;; (1:358, 1:401, 21:40, 1:1);;; (1:358, 1:401, 41:60, 1:1);;; (1:358, 1:401, 61:80, 1:1);;; (1:358, 1:401, 81:100, 1:1);;; (1:358, 1:401, 101:120, 1:1);;; (1:358, 1:401, 121:140, 1:1);;; (1:358, 1:401, 141:160, 1:1);;; (1:358, 1:401, 161:180, 1:1);;; (1:358, 1:401, 181:200, 1:1);;; (1:358, 1:401, 201:220, 1:1);;; (1:358, 1:401, 221:240, 1:1);;; (1:358, 1:401, 241:260, 1:1);;; (1:358, 1:401, 261:280, 1:1);;; (1:358, 1:401, 281:299, 1:1);;;; (1:358, 1:401, 1:20, 2:2);;; (1:358, 1:401, 21:40, 2:2);;; (1:358, 1:401, 41:60, 2:2);;; (1:358, 1:401, 61:80, 2:2);;; (1:358, 1:401, 81:100, 2:2);;; (1:358, 1:401, 101:120, 2:2);;; (1:358, 1:401, 121:140, 2:2);;; (1:358, 1:401, 141:160, 2:2);;; (1:358, 1:401, 161:180, 2:2);;; (1:358, 1:401, 181:200, 2:2);;; (1:358, 1:401, 201:220, 2:2);;; (1:358, 1:401, 221:240, 2:2);;; (1:358, 1:401, 241:260, 2:2);;; (1:358, 1:401, 261:280, 2:2);;; (1:358, 1:401, 281:299, 2:2)])
(i, chunks) = (2, [(1:358, 1:401, 1:10, 1:1);;; (1:358, 1:401, 11:30, 1:1);;; (1:358, 1:401, 31:50, 1:1);;; (1:358, 1:401, 51:70, 1:1);;; (1:358, 1:401, 71:90, 1:1);;; (1:358, 1:401, 91:110, 1:1);;; (1:358, 1:401, 111:130, 1:1);;; (1:358, 1:401, 131:150, 1:1);;; (1:358, 1:401, 151:170, 1:1);;; (1:358, 1:401, 171:190, 1:1);;; (1:358, 1:401, 191:210, 1:1);;; (1:358, 1:401, 211:230, 1:1);;; (1:358, 1:401, 231:250, 1:1);;; (1:358, 1:401, 251:270, 1:1);;; (1:358, 1:401, 271:290, 1:1);;; (1:358, 1:401, 291:299, 1:1);;;; (1:358, 1:401, 1:10, 2:2);;; (1:358, 1:401, 11:30, 2:2);;; (1:358, 1:401, 31:50, 2:2);;; (1:358, 1:401, 51:70, 2:2);;; (1:358, 1:401, 71:90, 2:2);;; (1:358, 1:401, 91:110, 2:2);;; (1:358, 1:401, 111:130, 2:2);;; (1:358, 1:401, 131:150, 2:2);;; (1:358, 1:401, 151:170, 2:2);;; (1:358, 1:401, 171:190, 2:2);;; (1:358, 1:401, 191:210, 2:2);;; (1:358, 1:401, 211:230, 2:2);;; (1:358, 1:401, 231:250, 2:2);;; (1:358, 1:401, 251:270, 2:2);;; (1:358, 1:401, 271:290, 2:2);;; (1:358, 1:401, 291:299, 2:2)])
size(chunks) = (1, 1, 16, 2)
size(eachchunk(cl[i])) = (1, 1, 15, 2)
ERROR: Trying to concatenate cubes with different chunk sizes. Consider manually setting a common chunk size using `setchunks`.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] concatenatecubes(cl::Vector{YAXArray{Union{Missing, Float32}, 4, A, Vector{CubeAxis}} where A<:AbstractArray{Union{Missing, Float32}, 4}}, cataxis::CategoricalAxis{String, :IMF, Vector{String}})
@ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/TransformedCubes.jl:48
[3] top-level scope
@ REPL[103]:1
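A possible workaround along the lines of the error message would be to force both inputs onto the same (purely virtual) chunk grid before concatenating. The sketch below reuses the variables from the snippets above; the common chunk tuple is an assumption on my part and untested on this data:

```julia
# Sketch: give both cubes identical chunk sizes so concatenatecubes sees
# matching chunk grids; (358, 401, 20, 1) mirrors the first cube's chunks above.
common_chunks = (358, 401, 20, 1)
c1 = setchunks(sub1, common_chunks)
c2 = setchunks(permutedims(haidescsub, (3, 1, 2, 4)), common_chunks)
merged = concatenatecubes([c1, c2], catax)
```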
It doesn't find the backend from the filename. Maybe this only happens for .zarr files.
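If so, passing the driver explicitly (the keyword shows up in the savecube, savedataset and open_dataset signatures in the stack traces above) might work as a stopgap; the path and the :zarr symbol below are illustrative assumptions:

```julia
# Workaround sketch: name the backend instead of relying on filename detection.
savecube(forestcor, "out.zarr"; driver = :zarr)
ds = open_dataset("out.zarr"; driver = :zarr)
```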
Saving a dataset and then reopening it leads to the following error:
julia> ds2 = open_dataset("data/concatedds.zarr")
ERROR: MethodError: no method matching typemax(::Type{Union{Missing, Float32}})
Closest candidates are:
typemax(::Union{DateTime, Type{DateTime}}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Dates/src/types.jl:453
typemax(::Union{Date, Type{Date}}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Dates/src/types.jl:455
typemax(::Union{Dates.Time, Type{Dates.Time}}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Dates/src/types.jl:457
...
Stacktrace:
[1] DiskArrayTools.CFDiskArray(a::ZArray{Union{Missing, Float32}, 3, Zarr.BloscCompressor, DirectoryStore}, attr::Dict{String, Any})
@ DiskArrayTools ~/.julia/packages/DiskArrayTools/rEAgw/src/DiskArrayTools.jl:164
[2] open_dataset(g::String; driver::Symbol)
@ YAXArrays.Datasets ~/Documents/SeasFire/dev/YAXArrays/src/DatasetAPI/Datasets.jl:272
[3] open_dataset(g::String)
@ YAXArrays.Datasets ~/Documents/SeasFire/dev/YAXArrays/src/DatasetAPI/Datasets.jl:242
[4] top-level scope
@ REPL[2]:1
@lazarusA are you on this PR branch or do you get the error on master?
@meggart yes, this is on this branch. Right @felixcremer?
Yes, this was on this branch. I also get this from the Gdalcube we optimized yesterday. You can open the cube when you use nonmissingtype in the DiskArrayTools function that throws the error, but I am not sure whether this is the right way to fix it.
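For reference, a sketch of the change described above (not the actual DiskArrayTools source): strip the Missing part of the element type before calling typemax.

```julia
# typemax is not defined for Union{Missing, Float32}, which is exactly the
# MethodError above; nonmissingtype (in Base since Julia 1.3) removes the
# Missing part first.
T = Union{Missing, Float32}
# typemax(T)                 # MethodError, as in the stack trace
typemax(nonmissingtype(T))   # Inf32
```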
Can you please send me a full path to a file you are trying to open?
The error happened with a local fluxcom dataset and after saving it with this branch.
I meant an MWE; it is really hard to reproduce without one.
We want the buffer size to be at most the length along the given dimension, so we penalize buffers that are larger than the dimension length. This also surfaces the writefac parameter to the savecube function.
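Assuming writefac is indeed surfaced as a keyword of savecube (the name is taken from the stack traces above; the values below are illustrative), usage would look roughly like:

```julia
# Hypothetical call: a larger writefac penalizes large write buffers more
# strongly, max_cache caps the copy buffer size in bytes.
savecube(c, "out.zarr"; writefac = 4.0, max_cache = 5.0e8)
```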
This is superseded by #150.
New features include:
- New savedataset function
- savecube now calls savedataset after transforming with to_dataset
- append option for savedataset to add variables to an existing store

In addition, I implemented something I had planned a long time ago: explicitly adding the chunks of a YAXArray as a field, plus an option for users to modify the chunking using setchunks. This way, to store a dataset with user-defined chunking, one just calls setchunks prior to saving the dataset. Another area of application is when map, concatenatecubes, mapCube or CubeTable and friends fail to find a good common chunking when operating on multiple cubes. Then the user always has the possibility to reset the chunks that YAXArray sees and thereby give hints on how to best access the data.