Revamp of Dataset and YAXArray saving #132

Closed
meggart wants to merge 12 commits

meggart
Member

@meggart meggart commented May 6, 2022

New features include:

  • savedataset function
  • savecube now calls savedataset after transforming with to_dataset
  • append option for savedataset to add variables to an existing store
  • optimized writing, so this can be used for rechunking data
  • lots of examples in the docs

In addition, I implemented something I had planned for a long time: the chunks of a YAXArray are now stored explicitly as a field, and users can modify the chunking with setchunks. To store a dataset with user-defined chunking, one just calls setchunks before saving the dataset. Another use case is when map, concatenatecubes, mapCube, CubeTable and friends fail to find a good common chunking when operating on multiple cubes. The user can then always reset the chunks that the YAXArray sees and thereby give hints on how best to access the data.
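
To make the idea of a user-defined chunking concrete, here is a minimal Base-only sketch of a regular chunk grid along a single axis. The helper `chunk_ranges` and its `offset` keyword are hypothetical names used only for illustration; they are not part of YAXArrays or DiskArrays, and the real setchunks works on whole arrays rather than on one axis.

```julia
# Hypothetical helper (not YAXArrays API): index ranges of a regular chunk grid
# along one axis of length `len`. A non-zero `offset` shortens the first chunk.
function chunk_ranges(len::Int, chunksize::Int; offset::Int=0)
    0 <= offset < chunksize || throw(ArgumentError("offset must be in 0:chunksize-1"))
    ranges = UnitRange{Int}[]
    start = 1
    stop = chunksize - offset
    while start <= len
        push!(ranges, start:min(stop, len))   # last chunk may be shorter
        start = stop + 1
        stop += chunksize
    end
    return ranges
end

aligned = chunk_ranges(299, 20)            # 1:20, 21:40, ..., 281:299
shifted = chunk_ranges(299, 20; offset=5)  # 1:15, 16:35, ...

println(length(aligned), " chunks, last one is ", last(aligned))
println("same chunk size, same grid? ", aligned == shifted)
```

Two grids with the same chunk size but different offsets do not line up, which is one way in which a common chunking for several cubes can fail to exist and a manual hint becomes useful.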

@github-actions
Contributor

github-actions bot commented May 6, 2022

Benchmark result

Judge result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmarks:
    • Target: 6 May 2022 - 09:39
    • Baseline: 6 May 2022 - 09:40
  • Package commits:
    • Target: 578c1c
    • Baseline: 2ecc94
  • Julia commits:
    • Target: bf5349
    • Baseline: bf5349
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID | time ratio | memory ratio
["mapslices", "small"] | 0.01 (5%) ✅ | 0.19 (1%) ✅

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Target

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz       1270 s          2 s        203 s        876 s          0 s
       #2  2394 MHz       1030 s          1 s        182 s       1160 s          0 s
       
  Memory: 6.783607482910156 GB (3274.8125 MB free)
  Uptime: 241.19 sec
  Load Avg:  1.5  1.03  0.44
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

Baseline

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz       1639 s          2 s        213 s        954 s          0 s
       #2  2394 MHz       1110 s          1 s        186 s       1532 s          0 s
       
  Memory: 6.783607482910156 GB (3467.26171875 MB free)
  Uptime: 286.97 sec
  Load Avg:  1.24  1.03  0.47
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

Target result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmark: 6 May 2022 - 9:39
  • Package commit: 578c1c
  • Julia commit: bf5349
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapslices", "small"] | 217.121 ms (5%) | 6.947 ms | 325.11 MiB (1%) | 41585

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz       1270 s          2 s        203 s        876 s          0 s
       #2  2394 MHz       1030 s          1 s        182 s       1160 s          0 s
       
  Memory: 6.783607482910156 GB (3274.8125 MB free)
  Uptime: 241.19 sec
  Load Avg:  1.5  1.03  0.44
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

Baseline result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmark: 6 May 2022 - 9:40
  • Package commit: 2ecc94
  • Julia commit: bf5349
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapslices", "small"] | 15.482 s (5%) | 904.747 ms | 1.70 GiB (1%) | 26921309

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz       1639 s          2 s        213 s        954 s          0 s
       #2  2394 MHz       1110 s          1 s        186 s       1532 s          0 s
       
  Memory: 6.783607482910156 GB (3467.26171875 MB free)
  Uptime: 286.97 sec
  Load Avg:  1.24  1.03  0.47
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Stepping:                        2
CPU MHz:                         2394.453
BogoMIPS:                        4788.90
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        30 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Vendor :Intel
Architecture :Haswell
Model Family: 0x06, Model: 0x3f, Stepping: 0x02, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 256, 30720) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 256 bit = 32 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft
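
As a quick sanity check, the ratios in the judge table of this report can be reproduced from the target and baseline results it lists. This is plain arithmetic on the reported numbers; the variable names are made up for this sketch, and 1 GiB is taken as 1024 MiB.

```julia
# Values copied from the first benchmark report in this thread.
target_time_s    = 217.121e-3    # 217.121 ms
baseline_time_s  = 15.482        # 15.482 s
target_mem_mib   = 325.11        # 325.11 MiB
baseline_mem_mib = 1.70 * 1024   # 1.70 GiB expressed in MiB

time_ratio = target_time_s / baseline_time_s
mem_ratio  = target_mem_mib / baseline_mem_mib
speedup    = baseline_time_s / target_time_s

println("time ratio:   ", round(time_ratio, digits=2))   # 0.01
println("memory ratio: ", round(mem_ratio, digits=2))    # 0.19
println("speedup:      ", round(speedup, digits=1), "x")
```

A time ratio of about 0.01 means the target commit ran the "mapslices"/"small" benchmark in roughly one seventieth of the baseline time.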

@meggart
Member Author

meggart commented May 6, 2022

@felixcremer it would be great if you could experiment with this a bit, in particular to check whether it is useful for rechunking or whether you run into problems.

@github-actions
Contributor

github-actions bot commented May 6, 2022

Benchmark result

Judge result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmarks:
    • Target: 6 May 2022 - 09:57
    • Baseline: 6 May 2022 - 09:58
  • Package commits:
    • Target: c63bff
    • Baseline: 2ecc94
  • Julia commits:
    • Target: bf5349
    • Baseline: bf5349
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID | time ratio | memory ratio
["mapslices", "small"] | 0.01 (5%) ✅ | 0.19 (1%) ✅

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Target

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: 
              speed         user         nice          sys         idle          irq
       #1  2793 MHz        866 s          2 s        130 s       2010 s          0 s
       #2  2793 MHz       1033 s          1 s        186 s       1802 s          0 s
       
  Memory: 6.783607482910156 GB (3178.9921875 MB free)
  Uptime: 305.49 sec
  Load Avg:  1.42  0.75  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-server)

Baseline

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: 
              speed         user         nice          sys         idle          irq
       #1  2793 MHz        903 s          2 s        137 s       2330 s          0 s
       #2  2793 MHz       1370 s          1 s        193 s       1822 s          0 s
       
  Memory: 6.783607482910156 GB (3194.96875 MB free)
  Uptime: 342.05 sec
  Load Avg:  1.22  0.78  0.34
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-server)

Target result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmark: 6 May 2022 - 9:57
  • Package commit: c63bff
  • Julia commit: bf5349
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapslices", "small"] | 155.179 ms (5%) | 2.547 ms | 325.11 MiB (1%) | 41585

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: 
              speed         user         nice          sys         idle          irq
       #1  2793 MHz        866 s          2 s        130 s       2010 s          0 s
       #2  2793 MHz       1033 s          1 s        186 s       1802 s          0 s
       
  Memory: 6.783607482910156 GB (3178.9921875 MB free)
  Uptime: 305.49 sec
  Load Avg:  1.42  0.75  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-server)

Baseline result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmark: 6 May 2022 - 9:58
  • Package commit: 2ecc94
  • Julia commit: bf5349
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapslices", "small"] | 12.056 s (5%) | 676.031 ms | 1.70 GiB (1%) | 26911547

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: 
              speed         user         nice          sys         idle          irq
       #1  2793 MHz        903 s          2 s        137 s       2330 s          0 s
       #2  2793 MHz       1370 s          1 s        193 s       1822 s          0 s
       
  Memory: 6.783607482910156 GB (3194.96875 MB free)
  Uptime: 342.05 sec
  Load Avg:  1.22  0.78  0.34
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-server)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Stepping:                        6
CPU MHz:                         2793.437
BogoMIPS:                        5586.87
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       96 KiB
L1i cache:                       64 KiB
L2 cache:                        2.5 MiB
L3 cache:                        48 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Vendor :Intel
Architecture :UnknownIntel
Model Family: 0x06, Model: 0x6a, Stepping: 0x06, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (48, 1280, 49152) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@github-actions
Contributor

github-actions bot commented May 6, 2022

Benchmark result

Judge result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmarks:
    • Target: 6 May 2022 - 16:15
    • Baseline: 6 May 2022 - 16:15
  • Package commits:
    • Target: 6acd4f
    • Baseline: 818088
  • Julia commits:
    • Target: bf5349
    • Baseline: bf5349
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID | time ratio | memory ratio
["mapslices", "small"] | 0.02 (5%) ✅ | 0.19 (1%) ✅

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Target

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1128 s          1 s        174 s        993 s          0 s
       #2  2593 MHz        728 s          2 s        141 s       1448 s          0 s
       
  Memory: 6.783607482910156 GB (3266.65234375 MB free)
  Uptime: 235.04 sec
  Load Avg:  1.44  0.8  0.33
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1421 s          1 s        181 s       1056 s          0 s
       #2  2593 MHz        793 s          2 s        145 s       1741 s          0 s
       
  Memory: 6.783607482910156 GB (3442.6875 MB free)
  Uptime: 271.28 sec
  Load Avg:  1.3  0.84  0.36
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmark: 6 May 2022 - 16:15
  • Package commit: 6acd4f
  • Julia commit: bf5349
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapslices", "small"] | 201.169 ms (5%) | 3.759 ms | 325.11 MiB (1%) | 41585

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1128 s          1 s        174 s        993 s          0 s
       #2  2593 MHz        728 s          2 s        141 s       1448 s          0 s
       
  Memory: 6.783607482910156 GB (3266.65234375 MB free)
  Uptime: 235.04 sec
  Load Avg:  1.44  0.8  0.33
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Baseline result

Benchmark Report for /home/runner/work/YAXArrays.jl/YAXArrays.jl

Job Properties

  • Time of benchmark: 6 May 2022 - 16:15
  • Package commit: 818088
  • Julia commit: bf5349
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapslices", "small"] | 12.007 s (5%) | 849.965 ms | 1.69 GiB (1%) | 26903036

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapslices"]

Julia versioninfo

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1022-azure #26~20.04.1-Ubuntu SMP Thu Apr 7 19:42:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1421 s          1 s        181 s       1056 s          0 s
       #2  2593 MHz        793 s          2 s        145 s       1741 s          0 s
       
  Memory: 6.783607482910156 GB (3442.6875 MB free)
  Uptime: 271.28 sec
  Load Avg:  1.3  0.84  0.36
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping:                        7
CPU MHz:                         2593.908
BogoMIPS:                        5187.81
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

 cleaner::Vector{CleanMe}
-function YAXArray(axes, data, properties, cleaner)
+function YAXArray(axes, data, properties, chunks, cleaner)
Member

Isn't this change breaking? I can no longer call YAXArray(axes, data, properties, cleaner).
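
One common way to avoid this kind of break is to keep the old four-argument signature as a method that forwards to the new five-argument one. The sketch below shows the pattern on a toy type: `ToyCube` and `default_chunks` are hypothetical stand-ins, and the default chosen here (one chunk spanning the whole array) is an assumption, not what the PR does.

```julia
# Toy stand-in for the real type; only the argument order mirrors the diff above.
struct ToyCube
    axes
    data
    properties
    chunks
    cleaner
end

# Hypothetical default: treat the whole array as a single chunk.
default_chunks(data) = size(data)

# Old four-argument call kept alive by forwarding to the five-argument constructor.
ToyCube(axes, data, properties, cleaner) =
    ToyCube(axes, data, properties, default_chunks(data), cleaner)

old_style = ToyCube([:x, :y], zeros(4, 3), Dict(), [])
new_style = ToyCube([:x, :y], zeros(4, 3), Dict(), (2, 3), [])

println(old_style.chunks)   # (4, 3)
println(new_style.chunks)   # (2, 3)
```

With such a forwarding method, existing callers of the four-argument form keep working while new code can pass chunks explicitly.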

@felixcremer
Member

felixcremer commented May 10, 2022

Trying to save a cube I get the following error:

julia> forestcor
YAXArray with the following dimensions
Lon                 Axis with 401 Elements from -670000.0 to -662000.0
Lat                 Axis with 300 Elements from -390015.0 to -395995.0
Polarisation        Axis with 2 elements: VH VV 
IMF                 Axis with 8 elements: IMF 1 IMF 2 IMF 3 IMF 4 IMF 5 IMF 6 Residual Original 
Variable            Axis with 4 elements: Ta_200 SM_10 Ta_10 SM_20 
Total size: 29.37 MB

julia> savecube(forestcor, "/home/fcremer/Documents/Hypersense/bexis_hainich/forestcorsome.zarr")
(bufnow, outcs) = ((8.203725928756583e102, 6.765734799625517e102, -1.129464429664537e103, -0.0), (401, 300, 2, 8))
(rat, buf, sout) = (2.045816939839547e100, 8.203725928756583e102, 401)
ERROR: InexactError: trunc(Int64, 2.045816939839547e100)
Stacktrace:
  [1] trunc
    @ ./float.jl:805 [inlined]
  [2] round
    @ ./float.jl:369 [inlined]
  [3] outalign(buf::Float64, sout::Int64)
    @ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/Rechunker.jl:26
  [4] _broadcast_getindex_evalf
    @ ./broadcast.jl:670 [inlined]
  [5] _broadcast_getindex
    @ ./broadcast.jl:643 [inlined]
  [6] #29
    @ ./broadcast.jl:1075 [inlined]
  [7] macro expansion
    @ ./ntuple.jl:74 [inlined]
  [8] ntuple
    @ ./ntuple.jl:69 [inlined]
  [9] copy
    @ ./broadcast.jl:1075 [inlined]
 [10] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(YAXArrays.Cubes.outalign), Tuple{NTuple{4, Float64}, NTuple{4, Int64}}})
    @ Base.Broadcast ./broadcast.jl:860
 [11] get_copy_buffer_size(incube::SubArray{Union{Missing, Float32}, 4, Array{Union{Missing, Float32}, 5}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, outcube::ZArray{Union{Missing, Float32}, 4, Zarr.BloscCompressor, DirectoryStore}; writefac::Float64, maxbuf::Float64, align_output::Bool)
    @ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/Rechunker.jl:56
 [12] copy_diskarray(incube::SubArray{Union{Missing, Float32}, 4, Array{Union{Missing, Float32}, 5}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, outcube::ZArray{Union{Missing, Float32}, 4, Zarr.BloscCompressor, DirectoryStore}; writefac::Float64, maxbuf::Float64, align_output::Bool)
    @ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/Rechunker.jl:72
 [13] copydataset!(diskds::Dataset, ds::Dataset; writefac::Float64, maxbuf::Float64)
    @ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:392
 [14] savedataset(ds::Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton_only::Bool, backend::Symbol, driver::Symbol, max_cache::Float64)
    @ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:516
 [15] savecube(c::YAXArray{Union{Missing, Float32}, 5, Array{Union{Missing, Float32}, 5}, Vector{CubeAxis}}, path::String; name::String, datasetaxis::String, max_cache::Float64, backend::Symbol, driver::Symbol, chunks::Nothing, overwrite::Bool, append::Bool, skeleton_only::Bool)
    @ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:538
 [16] savecube(c::YAXArray{Union{Missing, Float32}, 5, Array{Union{Missing, Float32}, 5}, Vector{CubeAxis}}, path::String)
    @ YAXArrays.Datasets ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/DatasetAPI/Datasets.jl:534
 [17] top-level scope
    @ REPL[152]:1

@felixcremer
Member

This is a cube that could lead to the error above:

julia> a = YAXArray([RangeAxis(:Lon, 1:400), RangeAxis(:Lat, 1:300), CategoricalAxis("Polarisation", 1:2), CategoricalAxis("IMF", 1:8), CategoricalAxis("Variable", string.(1:4))], rand(400,300,2, 8,4))

We got different errors depending on the name of the "Variable" dimension and also on the length of the strings in the axis values.

@felixcremer
Member

I think I found the cause of the error about the non-aligned chunks: the two cubes that I want to concatenate have the same chunk sizes but different offsets:

julia> DiskArrays.eachchunk(setchunks(permutedims(haidescsub, (3,1,2,4)), chunks))[1,1,:,1]
15-element Vector{NTuple{4, UnitRange{Int64}}}:
 (1:358, 1:401, 1:20, 1:1)
 (1:358, 1:401, 21:40, 1:1)
 (1:358, 1:401, 41:60, 1:1)
 (1:358, 1:401, 61:80, 1:1)
 (1:358, 1:401, 81:100, 1:1)
 (1:358, 1:401, 101:120, 1:1)
 (1:358, 1:401, 121:140, 1:1)
 (1:358, 1:401, 141:160, 1:1)
 (1:358, 1:401, 161:180, 1:1)
 (1:358, 1:401, 181:200, 1:1)
 (1:358, 1:401, 201:220, 1:1)
 (1:358, 1:401, 221:240, 1:1)
 (1:358, 1:401, 241:260, 1:1)
 (1:358, 1:401, 261:280, 1:1)
 (1:358, 1:401, 281:299, 1:1)

julia> DiskArrays.eachchunk(sub1)[1,1,:,1]
16-element Vector{NTuple{4, UnitRange{Int64}}}:
 (1:358, 1:401, 1:10, 1:1)
 (1:358, 1:401, 11:30, 1:1)
 (1:358, 1:401, 31:50, 1:1)
 (1:358, 1:401, 51:70, 1:1)
 (1:358, 1:401, 71:90, 1:1)
 (1:358, 1:401, 91:110, 1:1)
 (1:358, 1:401, 111:130, 1:1)
 (1:358, 1:401, 131:150, 1:1)
 (1:358, 1:401, 151:170, 1:1)
 (1:358, 1:401, 171:190, 1:1)
 (1:358, 1:401, 191:210, 1:1)
 (1:358, 1:401, 211:230, 1:1)
 (1:358, 1:401, 231:250, 1:1)
 (1:358, 1:401, 251:270, 1:1)
 (1:358, 1:401, 271:290, 1:1)
 (1:358, 1:401, 291:299, 1:1)
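The offset effect shown in the two listings above can be reproduced without any package. The following is a minimal sketch in plain Julia, not the YAXArrays or DiskArrays implementation: `chunk_ranges` is a hypothetical helper, and the values 299, 20 and 10 are read off the third dimension in the output above.

```julia
# Hypothetical helper (not part of YAXArrays or DiskArrays): chunk boundaries
# along one dimension of length `len`, with regular chunks of size `cs` whose
# grid is shifted by `offset` elements (assumes 0 <= offset < cs).
function chunk_ranges(len::Int, cs::Int, offset::Int=0)
    # Chunk start indices; with a non-zero offset the first chunk is shortened.
    starts = offset == 0 ? collect(1:cs:len) : vcat(1, collect(offset+1:cs:len))
    # Each chunk ends right before the next one starts; the last ends at `len`.
    stops = vcat(starts[2:end] .- 1, len)
    return [a:b for (a, b) in zip(starts, stops)]
end

aligned = chunk_ranges(299, 20)      # same chunk size, no offset
shifted = chunk_ranges(299, 20, 10)  # same chunk size, grid shifted by 10

println(length(aligned), " chunks vs. ", length(shifted), " chunks")
println(first(shifted), " ... ", last(shifted))
```

Under these assumptions the chunk size is identical in both cases, yet the shifted grid has one more chunk along the dimension, so the two chunk grids have different shapes and cannot be matched element by element.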

@felixcremer
Member

Unfortunately, I have no idea how to move forward from here.
For posterity, this is the error that I get when I try to concatenate two cubes, even though I already use setchunks to set the same chunks as the first cube:

julia> concatenatecubes([sub1, setchunks(permutedims(haidescsub, (3,1,2,4)), chunks)], catax)
(i, eachchunk(cl[i])) = (2, [(1:358, 1:401, 1:20, 1:1);;; (1:358, 1:401, 21:40, 1:1);;; (1:358, 1:401, 41:60, 1:1);;; (1:358, 1:401, 61:80, 1:1);;; (1:358, 1:401, 81:100, 1:1);;; (1:358, 1:401, 101:120, 1:1);;; (1:358, 1:401, 121:140, 1:1);;; (1:358, 1:401, 141:160, 1:1);;; (1:358, 1:401, 161:180, 1:1);;; (1:358, 1:401, 181:200, 1:1);;; (1:358, 1:401, 201:220, 1:1);;; (1:358, 1:401, 221:240, 1:1);;; (1:358, 1:401, 241:260, 1:1);;; (1:358, 1:401, 261:280, 1:1);;; (1:358, 1:401, 281:299, 1:1);;;; (1:358, 1:401, 1:20, 2:2);;; (1:358, 1:401, 21:40, 2:2);;; (1:358, 1:401, 41:60, 2:2);;; (1:358, 1:401, 61:80, 2:2);;; (1:358, 1:401, 81:100, 2:2);;; (1:358, 1:401, 101:120, 2:2);;; (1:358, 1:401, 121:140, 2:2);;; (1:358, 1:401, 141:160, 2:2);;; (1:358, 1:401, 161:180, 2:2);;; (1:358, 1:401, 181:200, 2:2);;; (1:358, 1:401, 201:220, 2:2);;; (1:358, 1:401, 221:240, 2:2);;; (1:358, 1:401, 241:260, 2:2);;; (1:358, 1:401, 261:280, 2:2);;; (1:358, 1:401, 281:299, 2:2)])
(i, chunks) = (2, [(1:358, 1:401, 1:10, 1:1);;; (1:358, 1:401, 11:30, 1:1);;; (1:358, 1:401, 31:50, 1:1);;; (1:358, 1:401, 51:70, 1:1);;; (1:358, 1:401, 71:90, 1:1);;; (1:358, 1:401, 91:110, 1:1);;; (1:358, 1:401, 111:130, 1:1);;; (1:358, 1:401, 131:150, 1:1);;; (1:358, 1:401, 151:170, 1:1);;; (1:358, 1:401, 171:190, 1:1);;; (1:358, 1:401, 191:210, 1:1);;; (1:358, 1:401, 211:230, 1:1);;; (1:358, 1:401, 231:250, 1:1);;; (1:358, 1:401, 251:270, 1:1);;; (1:358, 1:401, 271:290, 1:1);;; (1:358, 1:401, 291:299, 1:1);;;; (1:358, 1:401, 1:10, 2:2);;; (1:358, 1:401, 11:30, 2:2);;; (1:358, 1:401, 31:50, 2:2);;; (1:358, 1:401, 51:70, 2:2);;; (1:358, 1:401, 71:90, 2:2);;; (1:358, 1:401, 91:110, 2:2);;; (1:358, 1:401, 111:130, 2:2);;; (1:358, 1:401, 131:150, 2:2);;; (1:358, 1:401, 151:170, 2:2);;; (1:358, 1:401, 171:190, 2:2);;; (1:358, 1:401, 191:210, 2:2);;; (1:358, 1:401, 211:230, 2:2);;; (1:358, 1:401, 231:250, 2:2);;; (1:358, 1:401, 251:270, 2:2);;; (1:358, 1:401, 271:290, 2:2);;; (1:358, 1:401, 291:299, 2:2)])
size(chunks) = (1, 1, 16, 2)
size(eachchunk(cl[i])) = (1, 1, 15, 2)
ERROR: Trying to concatenate cubes with different chunk sizes. Consider manually setting a common chunk size using `setchunks`.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] concatenatecubes(cl::Vector{YAXArray{Union{Missing, Float32}, 4, A, Vector{CubeAxis}} where A<:AbstractArray{Union{Missing, Float32}, 4}}, cataxis::CategoricalAxis{String, :IMF, Vector{String}})
   @ YAXArrays.Cubes ~/Documents/papers_wip/EMDAmazonas/dev/YAXArrays/src/Cubes/TransformedCubes.jl:48
 [3] top-level scope
   @ REPL[103]:1

@felixcremer
Member

It doesn't find the backend from the filename. Maybe this only happens for .zarr files.

@lazarusA
Collaborator

Saving a dataset and then reopening it leads to the following error:

julia> ds2 = open_dataset("data/concatedds.zarr")
ERROR: MethodError: no method matching typemax(::Type{Union{Missing, Float32}})
Closest candidates are:
  typemax(::Union{DateTime, Type{DateTime}}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Dates/src/types.jl:453
  typemax(::Union{Date, Type{Date}}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Dates/src/types.jl:455
  typemax(::Union{Dates.Time, Type{Dates.Time}}) at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Dates/src/types.jl:457
  ...
Stacktrace:
 [1] DiskArrayTools.CFDiskArray(a::ZArray{Union{Missing, Float32}, 3, Zarr.BloscCompressor, DirectoryStore}, attr::Dict{String, Any})
   @ DiskArrayTools ~/.julia/packages/DiskArrayTools/rEAgw/src/DiskArrayTools.jl:164
 [2] open_dataset(g::String; driver::Symbol)
   @ YAXArrays.Datasets ~/Documents/SeasFire/dev/YAXArrays/src/DatasetAPI/Datasets.jl:272
 [3] open_dataset(g::String)
   @ YAXArrays.Datasets ~/Documents/SeasFire/dev/YAXArrays/src/DatasetAPI/Datasets.jl:242
 [4] top-level scope
   @ REPL[2]:1

@meggart
Member Author

meggart commented May 20, 2022

@lazarusA are you on this PR branch or do you get the error on master?

@lazarusA
Collaborator

@meggart yes, this is on this branch. Right @felixcremer?
We were trying, without success, to save a new cube with attributes attached to new variables, such as units and long_name, i.e. the attributes that you normally have in .nc files.

@felixcremer
Member

Yes, this was on this branch. I also get this from the Gdalcube we optimized yesterday. You can open the cube when you use nonmissingtype in the DiskArrayTools function that throws the error, but I am not sure whether this is the right way to fix this.
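To illustrate the workaround mentioned here, this is a minimal sketch in plain Julia; it does not reproduce the DiskArrayTools code. It only shows that `nonmissingtype` (a Base function) strips `Missing` from the element type, so that `typemax` has a method to dispatch to. The failing call is wrapped in `try`/`catch` because the MethodError is what the stack trace above reports on Julia 1.7.

```julia
T = Union{Missing, Float32}

# Per the stack trace above, typemax has no method for the Union type.
# We only record whether the call throws a MethodError instead of assuming it.
threw_method_error = try
    typemax(T)
    false
catch err
    err isa MethodError
end
println("typemax(T) threw a MethodError: ", threw_method_error)

# nonmissingtype removes Missing from the Union and leaves the concrete type.
S = nonmissingtype(T)
println(S)
println(typemax(S))
```

Whether stripping `Missing` at that point is the right fix for the package is a separate question, as noted above; the sketch only shows why the call stops throwing.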

@meggart
Member Author

meggart commented May 20, 2022

Can you please send me a full path to a file you are trying to open?

@felixcremer
Member

The error happened with a local fluxcom dataset and after saving it with this branch.

@meggart
Member Author

meggart commented May 20, 2022

I meant an MWE; it is really hard to reproduce without one.

meggart and others added 11 commits June 3, 2022 08:41
We want the buffer size to be at most the length along the given dimension.
Therefore, we penalize buffers that are larger than the dimension length.
This also exposes the writefac parameter in the savecube function.
@meggart meggart mentioned this pull request Jul 7, 2022
@felixcremer
Member

This is superseded by #150.
