-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run test_count_call_alleles on Cubed (alternative approach) #1254
Conversation
One issue that this work has highlighted is whether we allow chunking in core dimensions, ploidy in this case. With Dask we do allow ploidy to be chunked, but this only works because we rely on Dask's map_blocks to transparently concatenate chunks along chunked core dimensions. The Dask documentation at https://docs.dask.org/en/stable/generated/dask.array.map_blocks.html actually warns against relying on this, and says
So I think we should explicitly disallow this case, and fail if the input dataset is chunked in the ploidy dimension. There is no reason to chunk in this dimension, and our VCF converters in sgkit and bio2zarr never do. I bring it up in this PR since Cubed will never concatenate chunks in this way, which exposed the issue. |
Agreed - there's no good reason to allow this. 2D chunks are more than enough complication here! |
Added a check for chunking in the ploidy dimension. |
Sorry for the radio silence - I was on leave. This approach looks good to me. There are a couple of cases where we use |
Thanks @timothymillar! That sounds good to me. My plan is to get the aggregation functions needed for QC working under Cubed next - |
See #908
This takes an alternative approach to the one in #1249, by using
map_blocks
on both Dask and Cubed, rather than Xarray'sapply_ufunc
. This avoids the possible issue of the Dask graph changing (originally from #871) that was seen in #1249 by keeping the Dask code path the same.I think this is the more pragmatic path forward as it allows us to experiment with Cubed by making minimal changes to Dask. (It also doesn't preclude using
apply_ufunc
in the future.)Would love to get your thoughts @jeromekelleher and @timothymillar.