
Support writing nested lists to parquet #1746

Merged · 10 commits · May 27, 2022

Conversation

@tustvold (Contributor) commented May 25, 2022

Which issue does this PR close?

Closes #1744.

Rationale for this change

The existing logic to handle nested types was incredibly convoluted, performed a lot of unnecessary work when writing non-nested types, and it was unclear how it could be made to handle more than one layer of nested lists.

What changes are included in this PR?

This PR therefore reworks the levels computation logic to hopefully be easier to understand, and support arbitrary nesting of lists.

Are there any user-facing changes?

No
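
For context on what this enables, here is a minimal sketch (not from the PR itself) of writing a doubly-nested list column through the ArrowWriter. It assumes the arrow/parquet 15-era builder APIs, in which the append methods return Result; the column and file names are arbitrary:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Builder, ListBuilder};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn write_nested_list() -> Result<(), Box<dyn std::error::Error>> {
    // Build a List<List<Int32>> column with values [[1, 2], [3]], null, [[]]
    let mut builder = ListBuilder::new(ListBuilder::new(Int32Builder::new(0)));

    builder.values().values().append_value(1)?;
    builder.values().values().append_value(2)?;
    builder.values().append(true)?; // close inner [1, 2]
    builder.values().values().append_value(3)?;
    builder.values().append(true)?; // close inner [3]
    builder.append(true)?; // close outer [[1, 2], [3]]

    builder.append(false)?; // a null outer element

    builder.values().append(true)?; // an empty inner list
    builder.append(true)?; // close outer [[]]

    let array = Arc::new(builder.finish()) as ArrayRef;
    let batch = RecordBatch::try_from_iter([("col", array)])?;

    // Writing a list nested more than one level deep previously failed (#1744)
    let file = File::create("nested.parquet")?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```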

@github-actions bot added the parquet label (Changes to the parquet crate) May 25, 2022
@tustvold (Contributor Author)

This actually has an issue with nulls in struct arrays with non-null slices in child list arrays... Working on fixing...

@tustvold force-pushed the rework-writer-levels branch 2 times, most recently from 94b30f0 to b50ba82 on May 26, 2022 15:41
@tustvold force-pushed the rework-writer-levels branch from b50ba82 to 9ede644 on May 26, 2022 15:42
@tustvold marked this pull request as ready for review May 26, 2022 15:46
@tustvold (Contributor Author)

I think this is now ready for review. I'm going to work on adding a few more esoteric tests, but the meat of it is ready.

@tustvold changed the title from "Support writing arbitrarily nested arrow arrays" to "Support writing nested lists to parquet" May 26, 2022

#[test]
fn test_calculate_array_levels_twitter_example() {
@tustvold (Contributor Author)

I'm currently working through porting these tests

@codecov-commenter commented May 26, 2022

Codecov Report

Merging #1746 (8b3c37f) into master (722fcfc) will increase coverage by 0.18%.
The diff coverage is 97.49%.

❗ Current head 8b3c37f differs from pull request most recent head abe5c36. Consider uploading reports for the commit abe5c36 to get more accurate results

@@            Coverage Diff             @@
##           master    #1746      +/-   ##
==========================================
+ Coverage   83.27%   83.46%   +0.18%     
==========================================
  Files         195      196       +1     
  Lines       55896    55895       -1     
==========================================
+ Hits        46549    46653     +104     
+ Misses       9347     9242     -105     
Impacted Files Coverage Δ
parquet/src/arrow/levels.rs 96.69% <95.90%> (+11.89%) ⬆️
parquet/src/arrow/arrow_writer.rs 97.77% <97.56%> (+0.11%) ⬆️
arrow/src/compute/kernels/aggregate.rs 75.26% <100.00%> (+2.03%) ⬆️
arrow/src/compute/kernels/concat_elements.rs 100.00% <100.00%> (ø)
arrow/src/ipc/reader.rs 90.74% <100.00%> (+2.01%) ⬆️
arrow/src/datatypes/datatype.rs 65.42% <0.00%> (-0.38%) ⬇️
parquet_derive/src/parquet_field.rs 65.75% <0.00%> (-0.23%) ⬇️
arrow/src/array/transform/mod.rs 86.85% <0.00%> (+0.11%) ⬆️
parquet/src/encodings/encoding.rs 93.65% <0.00%> (+0.19%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 722fcfc...abe5c36.

@alamb (Contributor) left a comment

That certainly seems to have made the code simpler, as measured by lines ;) Thank you for the efforts @tustvold

I also read all the code carefully. While I can't say I got every line, I would feel much better about debugging and maintaining the code after this PR than before.

There appear to be some non-trivial changes to the tests and the cases that are covered -- I wonder if this was intentional or if you are still working to port the remaining tests?

Also, I don't really have a sense of what end-to-end testing we have. For example, do we have tests that round-trip StructArrays to and from parquet? That might be worth writing if we don't have them yet (as a follow-on PR?).

cc @nevi-me and @helgikrs in case you have time or interest in reviewing this code

I wonder if @ahmedriza has time to try his reproducer out with the code in this PR?
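
For the round-trip testing suggested above, a minimal sketch of what such a test could look like, assuming the parquet 15-era reader API (SerializedFileReader plus ParquetFileArrowReader); the batch size and the final equality check are illustrative:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use parquet::arrow::{ArrowReader, ArrowWriter, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

fn round_trip(batch: RecordBatch, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Write the batch (which may contain StructArrays, nested lists, etc.)
    let file = File::create(path)?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;

    // Read it back
    let reader = SerializedFileReader::new(File::open(path)?)?;
    let mut arrow_reader = ParquetFileArrowReader::new(Arc::new(reader));
    let read: Vec<RecordBatch> = arrow_reader
        .get_record_reader(1024)?
        .collect::<Result<Vec<_>, _>>()?;

    // What is read out should be logically equal to what was written in
    assert_eq!(read.len(), 1);
    assert_eq!(read[0], batch);
    Ok(())
}
```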

let indices = levels.filter_array_indices();
// Slice array according to computed offset and length
let column = column.slice(levels.offset, levels.length);
// TODO: Avoid filtering if no need
Contributor
Should this be a follow-on ticket to track? It doesn't seem like this PR is any worse in this department.

@tustvold (Contributor Author) · May 26, 2022

This is actually a holdover from an intermediate version of this PR. You can only avoid doing the "take" operation if there are no nulls, which is relatively rare; will remove.
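
For reference, a hypothetical sketch of the optimisation being discussed (skip the gather when the definition levels show every leaf slot is populated); max_def_level and the u32 index type are assumptions made for illustration, not the crate's actual internals:

```rust
use arrow::array::{ArrayRef, UInt32Array};
use arrow::compute::take;
use arrow::error::Result;

fn filter_leaves(
    column: ArrayRef,   // already sliced to levels.offset / levels.length
    def_levels: &[i16],
    max_def_level: i16,
    indices: Vec<u32>,  // e.g. the result of filter_array_indices()
) -> Result<ArrayRef> {
    if def_levels.iter().all(|&d| d == max_def_level) {
        // No nulls or masked slots anywhere: write the sliced column as-is
        Ok(column)
    } else {
        // Some slots are null/masked: gather only the values to be written
        take(column.as_ref(), &UInt32Array::from(indices), None)
    }
}
```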

/// i.e. a leaf array with no children
fn is_leaf(data_type: &DataType) -> bool {
matches!(
data_type,
Contributor

I wonder if this would be more future-proof if it were !matches!(DataType::Dictionary....) -- aka explicitly list out the types that are handled in the rest of this module rather than trying to enumerate the converse.

I am just thinking about what would happen if someone added a new non-leaf type to arrow's DataType.

@tustvold (Contributor Author) · May 26, 2022

I'm not sure I follow? The intent is that we can match just dictionaries where the value type is a leaf, as this can be handled transparently. We don't support complex dictionary value types (and for what it is worth neither does Arrow C++). We used to simply assume that dictionary types were primitive; this is completing a TODO I encountered in the code: https://github.com/apache/arrow-rs/pull/1746/files/b104aba76a6868e4296eb0f5a7b4fd0b8960eb62#diff-4b715628e2e0ae6f66e590227d9587cd5f2155055a59ad6b7b0dc7b1914ad8edL316

Edit: Oh I see what you're saying, I think if a new arrow type were added we would likely need additional work in the parquet writer to support it, and so I think not automatically supporting it is probably safer.

Contributor

> Edit: Oh I see what you're saying, I think if a new arrow type were added we would likely need additional work in the parquet writer to support it, and so I think not automatically supporting it is probably safer.

Yes, thank you -- this is what I was trying to say 💯

Contributor

> We don't support complex dictionary value types (and for what it is worth neither does Arrow C++)

I didn't realize that. TIL!

/// A helper to construct [`LevelInfo`] from a potentially nested [`Field`]
enum LevelInfoBuilder {
Primitive(LevelInfo),
List(Box<LevelInfoBuilder>, LevelContext),
Contributor

It might help to add some docstrings here about what the Box<LevelInfoBuilder> and Vec<LevelBuilder> represent

Comment on lines 312 to 314
for _ in 0..len {
def_levels.push(ctx.def_level - 1);
}
Contributor

I wonder if this would be more "idiomatic" if it were written like:

def_levels.extend(std::iter::repeat(ctx.def_level - 1).take(len))

? Though I admit the current code seems more readable 👍

let mut last_non_null_idx = None;
let mut last_null_idx = None;

// TODO: BitChunkIterator
Contributor

Is this a TODO for this PR or for a follow-on?

};
let array_offsets = vec![0, 2, 2, 4, 8, 11];
let array_mask = vec![true, false, true, true, true];
let leaf_array = Int32Array::from_iter([0, 0, 2, 2, 3, 3, 3, 3, 4, 4, 4]);
Contributor

we seem to have lost a substantial number of verifications in this test. Is that intended?

@tustvold (Contributor Author)

The diff is being unhelpful; once I've ported the remaining tests it should sort itself out.

Contributor

Will check it again when you are done porting the test

offset: 0,
length: 5,
fn test_struct_mask_list() {
// Test a struct array masking a list
Contributor

I don't understand what "masking a list" means in this case

@tustvold (Contributor Author)

I've reworded to hopefully make it clearer

@nevi-me self-requested a review May 26, 2022 19:43
@ahmedriza commented May 26, 2022

This is fantastic. Checked my test cases against this PR and they work perfectly now. Also checked a few other combinations and they are all handled correctly now.

  • Empty arrays in bids column:
+---+----------------------+----+
|id |prices                |bids|
+---+----------------------+----+
|t1 |[[GBP, 3.14, [[LON]]]]|[]  |
|t2 |[[USD, 4.14, [[NYC]]]]|[]  |
+---+----------------------+----+
  • Nulls in bids column:
+---+----------------------+----+
|id |prices                |bids|
+---+----------------------+----+
|t1 |[[GBP, 3.14, [[LON]]]]|null|
|t2 |[[USD, 4.14, [[NYC]]]]|null|
+---+----------------------+----+
  • Mixture of non-null and null in bids column
+---+----------------------+----------------------+
|id |prices                |bids                  |
+---+----------------------+----------------------+
|t1 |[[GBP, 3.14, [[LON]]]]|[[GBP, 3.04, [[LON]]]]|
|t2 |[[USD, 4.14, [[NYC]]]]|null                  |
+---+----------------------+----------------------+

Amazing work. Thanks for the quick fix.

// 2 3 [4] are 0
// 4 5 6 7 [8] are 1 (defined at level 1 only)
// 8 9 10 [11] are 2 (defined at both levels)
definition: vec![0, 0, 1, 0, 0, 3, 3, 3, 3, 3, 3, 3],
@tustvold (Contributor Author) · May 26, 2022

This test is actually wrong; the levels are incoherent. This can be clearly seen from the first two levels.

The definition levels are [0, 0], implying it is null at the root for both, and yet it has repetition levels of [0, 1], which suggests a non-null slice of two elements; these are clearly not compatible.

I suspect the old code is not correctly handling a struct array masking a non-empty list slice of a child, and is producing a level for each entry in the child slice, instead of a single level for the struct's null.
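
To make the point concrete, here is a dependency-free toy sketch (with its own assumed schema of optional struct<optional list<optional int>>, not the crate's algorithm) showing that a null struct contributes a single (def, rep) pair regardless of how many values its masked child list holds:

```rust
// Assumed def levels for optional struct { optional list { optional int } }:
//   0 = struct null, 1 = list null, 2 = list empty, 3 = value null, 4 = value
fn levels(rows: &[Option<Option<Vec<i32>>>]) -> (Vec<i16>, Vec<i16>) {
    let (mut defs, mut reps) = (Vec::new(), Vec::new());
    for row in rows {
        match row {
            // Null struct: one pair for the whole row, however many values
            // the (masked) child list buffer happens to contain
            None => {
                defs.push(0);
                reps.push(0);
            }
            Some(None) => {
                defs.push(1);
                reps.push(0);
            }
            Some(Some(list)) if list.is_empty() => {
                defs.push(2);
                reps.push(0);
            }
            Some(Some(list)) => {
                for (i, _value) in list.iter().enumerate() {
                    defs.push(4);
                    reps.push(if i == 0 { 0 } else { 1 });
                }
            }
        }
    }
    (defs, reps)
}

fn main() {
    // Row 0: null struct (its child list buffer may still hold two values)
    // Row 1: struct { list: [2, 2] }
    let (defs, reps) = levels(&[None, Some(Some(vec![2, 2]))]);
    assert_eq!(defs, vec![0, 4, 4]);
    assert_eq!(reps, vec![0, 0, 1]);
}
```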

Contributor

What I was trying to express here was the list's values even though the struct is null at that point. Hence you see d: [0, 0], r: [0, 1] to say that the value has 2 slots, which are all null due to the struct being null at that point.

Perhaps my oversight was not considering that this might not matter at all, and it's safer/better to do d: [0], r: [0].
What limited me a lot was my indexing logic, because given a child array of 11 values, producing fewer than 11 repetition levels would lose the offset information from the Arrow side.

Looking at your solution, you have 10 values vs my 12 (11 values and an extra value to encode the offset [2, 2] in the list). When slicing into the child list doesn't cause issues, your solution is fine/better.

I suppose the root problem is Arrow's logical null rules. You could have a 1GB record with all sorts of values, but if you wrap that against a null struct (e.g. struct<all the values>), then you get different behaviour:

  • IPC roundtrip would preserve the values
  • Writing to and reading from Parquet would drop all those null values.

I was trying to preserve data, but maybe while looking at the minor details, I ended up missing the bigger picture.

@tustvold (Contributor Author)

Yeah, handling structs masking lists tripped me up the first time also, I had to completely rewrite my rewrite to support it 😆

I think the important property is that what you read out is "logically equal" to what you wrote in, even if it doesn't have the exact same data in the backing buffers. StructArrays are one example where this would occur, but a sliced array would be another where the underlying arrow buffers are not preserved. This is in fact critical to how the ArrowWriter batches row groups by slicing arrays.

> IPC roundtrip would preserve the values

TBH I view this as a potentially undesirable implementation quirk rather than a desirable property; see #208.

> I suppose the root problem is Arrow's logical null rules

I think you could be more general and say arrow has a relaxed approach to the backing representation of an array. This allows it to avoid copies, but comes at the cost of things like the never-ending bug-source that is ArrayData::offset() 😆

Contributor

Yup, that's true. We also had a lot more logical-null bugs (esp slicing) at the time that I wrote the level logic. So it was harder to avoid the undesirable quirks at the time.

If one has to materialise data anyway, they might as well remove data that's been made null due to masking, so the new approach is more desirable.
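
As an aside on the "logically equal" framing above, a tiny illustration (plain arrow, no parquet involved) of why a sliced array's backing buffers differ from what a reader would reconstruct, even though the logical values are the same:

```rust
use arrow::array::{Array, Int32Array};

fn main() {
    let parent = Int32Array::from(vec![1, 2, 3, 4, 5]);
    let sliced = parent.slice(1, 3); // logically [2, 3, 4], offset 1 into the parent's buffer

    assert_eq!(sliced.len(), 3);
    assert_eq!(sliced.offset(), 1);
    // After a parquet round trip one would expect to read back [2, 3, 4]
    // with offset 0: different buffers, the same logical content.
}
```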

// 3: [[108, 109], [110, 111], [112, 113], [114, 115]]
// 4: [[116, 117], [118, 119], [120, 121]]
definition: vec![
0, 0, 0, 0, 1, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
@tustvold (Contributor Author)

Again, these definition levels are wrong. In the structure above the only non-max definition level will come from the empty slice.

@tustvold (Contributor Author)

OK, I think I've ported all the tests. There are some omissions because we no longer compute the levels for non-leaf arrays (whatever that actually means), and test_calculate_array_levels_2 was just wrong, but otherwise everything matches 🎉

@tustvold merged commit 8e1666a into apache:master May 27, 2022
@alamb (Contributor) commented May 27, 2022

🎉 arrow 15 is going to be a great release

@nevi-me (Contributor) commented May 27, 2022

I was still going through it. It looks great, and I hadn't seen anything that stood out, so I'm happy with the merge.

@tustvold (Contributor Author)

Oh sorry, please let me know if you see anything and I'll be more than happy to address it in a follow-on PR 😅

@nevi-me (Contributor) commented May 27, 2022

> Oh sorry, please let me know if you see anything and I'll be more than happy to address it in a follow-on PR 😅

I only added a comment explaining why the def/rep for <struct<list>> appeared the way they did in the old test. Thanks for fixing the logic :D

@nevi-me (Contributor) commented May 27, 2022

Worth advertising, @tustvold @alamb: there is at least a 30% improvement in the writer due to this PR. At least that's on an M1 Pro (now I'm curious about x86).

write_batch primitive/4096 values primitive                                                                             
                        time:   [1.1237 ms 1.1270 ms 1.1303 ms]
                        thrpt:  [156.08 MiB/s 156.54 MiB/s 156.99 MiB/s]
                 change:
                        time:   [-29.494% -28.991% -28.501%] (p = 0.00 < 0.05)
                        thrpt:  [+39.863% +40.828% +41.831%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
write_batch primitive/4096 values primitive non-null                                                                            
                        time:   [979.35 us 982.76 us 986.33 us]
                        thrpt:  [178.86 MiB/s 179.51 MiB/s 180.13 MiB/s]
                 change:
                        time:   [-30.346% -29.722% -29.002%] (p = 0.00 < 0.05)
                        thrpt:  [+40.849% +42.291% +43.567%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
write_batch primitive/4096 values bool                                                                            
                        time:   [62.749 us 63.655 us 64.860 us]
                        thrpt:  [17.527 MiB/s 17.859 MiB/s 18.116 MiB/s]
                 change:
                        time:   [-33.845% -32.730% -31.281%] (p = 0.00 < 0.05)
                        thrpt:  [+45.520% +48.655% +51.159%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
write_batch primitive/4096 values bool non-null                                                                             
                        time:   [40.799 us 40.848 us 40.901 us]
                        thrpt:  [27.793 MiB/s 27.830 MiB/s 27.863 MiB/s]
                 change:
                        time:   [-45.958% -45.620% -45.294%] (p = 0.00 < 0.05)
                        thrpt:  [+82.796% +83.891% +85.043%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
write_batch primitive/4096 values string                                                                            
                        time:   [627.92 us 634.23 us 642.61 us]
                        thrpt:  [123.85 MiB/s 125.49 MiB/s 126.75 MiB/s]
                 change:
                        time:   [-23.875% -23.318% -22.771%] (p = 0.00 < 0.05)
                        thrpt:  [+29.486% +30.408% +31.362%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe
write_batch primitive/4096 values string non-null                                                                            
                        time:   [617.66 us 618.60 us 619.56 us]
                        thrpt:  [128.46 MiB/s 128.66 MiB/s 128.86 MiB/s]
                 change:
                        time:   [-26.558% -25.760% -25.150%] (p = 0.00 < 0.05)
                        thrpt:  [+33.601% +34.699% +36.162%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
write_batch nested/4096 values primitive list                                                                             
                        time:   [1.2680 ms 1.2695 ms 1.2709 ms]
                        thrpt:  [128.87 MiB/s 129.01 MiB/s 129.16 MiB/s]
                 change:
                        time:   [-33.645% -33.420% -33.197%] (p = 0.00 < 0.05)
                        thrpt:  [+49.694% +50.195% +50.705%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
Benchmarking write_batch nested/4096 values primitive list non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
write_batch nested/4096 values primitive list non-null                                                                             
                        time:   [1.6240 ms 1.6347 ms 1.6475 ms]
                        thrpt:  [117.49 MiB/s 118.41 MiB/s 119.19 MiB/s]
                 change:
                        time:   [-29.010% -28.463% -27.890%] (p = 0.00 < 0.05)
                        thrpt:  [+38.676% +39.788% +40.866%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

@nevi-me (Contributor) commented May 27, 2022

amd64, 3900X on a PCIe 4 SSD; less dramatic but still great.

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.0s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values primitive
                        time:   [1.6179 ms 1.6298 ms 1.6462 ms]
                        thrpt:  [107.17 MiB/s 108.24 MiB/s 109.04 MiB/s]
                 change:
                        time:   [-31.768% -19.351% -7.9981%] (p = 0.00 < 0.05)
                        thrpt:  [+8.6934% +23.994% +46.559%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking write_batch primitive/4096 values primitive non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values primitive non-null
                        time:   [1.3544 ms 1.3650 ms 1.3777 ms]
                        thrpt:  [128.05 MiB/s 129.25 MiB/s 130.25 MiB/s]
                 change:
                        time:   [-20.261% -14.076% -10.278%] (p = 0.00 < 0.05)
                        thrpt:  [+11.455% +16.382% +25.409%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
write_batch primitive/4096 values bool
                        time:   [98.617 us 99.677 us 101.01 us]
                        thrpt:  [11.254 MiB/s 11.405 MiB/s 11.527 MiB/s]
                 change:
                        time:   [-42.532% -30.107% -17.429%] (p = 0.00 < 0.05)
                        thrpt:  [+21.108% +43.076% +74.009%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
write_batch primitive/4096 values bool non-null
                        time:   [51.438 us 51.798 us 52.269 us]
                        thrpt:  [21.749 MiB/s 21.946 MiB/s 22.100 MiB/s]
                 change:
                        time:   [-36.532% -34.935% -33.116%] (p = 0.00 < 0.05)
                        thrpt:  [+49.513% +53.693% +57.560%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
write_batch primitive/4096 values string
                        time:   [771.00 us 790.87 us 832.66 us]
                        thrpt:  [95.586 MiB/s 100.64 MiB/s 103.23 MiB/s]
                 change:
                        time:   [-8.6599% -2.6996% +8.1085%] (p = 0.73 > 0.05)
                        thrpt:  [-7.5003% +2.7745% +9.4810%]
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
write_batch primitive/4096 values string non-null
                        time:   [780.70 us 785.66 us 793.50 us]
                        thrpt:  [100.30 MiB/s 101.30 MiB/s 101.95 MiB/s]
                 change:
                        time:   [-11.689% -9.2061% -6.3254%] (p = 0.00 < 0.05)
                        thrpt:  [+6.7525% +10.140% +13.236%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 50.
write_batch nested/4096 values primitive list
                        time:   [1.5257 ms 1.5327 ms 1.5410 ms]
                        thrpt:  [106.28 MiB/s 106.86 MiB/s 107.35 MiB/s]
                 change:
                        time:   [-18.698% -10.026% +0.4958%] (p = 0.04 < 0.05)
                        thrpt:  [-0.4934% +11.143% +22.998%]
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
write_batch nested/4096 values primitive list non-null
                        time:   [1.9924 ms 2.0106 ms 2.0334 ms]
                        thrpt:  [95.194 MiB/s 96.274 MiB/s 97.150 MiB/s]
                 change:
                        time:   [-11.857% -10.947% -9.8362%] (p = 0.00 < 0.05)
                        thrpt:  [+10.909% +12.292% +13.452%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

Linked issue: Parquet write failure (from record batches) when data is nested two levels deep