Put BufWriter into TrackedWrite #3361

Merged
merged 4 commits into from
Dec 19, 2022

Conversation

viirya
Member

@viirya viirya commented Dec 17, 2022

Which issue does this PR close?

Closes #3366.

Rationale for this change

To improve writing performance.

write_batch primitive/4096 values primitive      
                        time:   [817.22 µs 822.51 µs 828.48 µs]                                                    
                        thrpt:  [213.07 MiB/s 214.61 MiB/s 216.00 MiB/s]                                           
                 change:                                                                                           
                        time:   [-65.811% -65.543% -65.248%] (p = 0.00 < 0.05)
                        thrpt:  [+187.76% +190.22% +192.49%]                                                       
                        Performance has improved.                                                                  
Found 2 outliers among 100 measurements (2.00%)                                                                    
  2 (2.00%) high severe                          
write_batch primitive/4096 values primitive with bloom filter                 
                        time:   [6.8151 ms 6.8616 ms 6.9182 ms]
                        thrpt:  [25.516 MiB/s 25.726 MiB/s 25.901 MiB/s]
                 change:                                                                                           
                        time:   [-13.979% -12.918% -11.881%] (p = 0.00 < 0.05)                                                                                                                                                         
                        thrpt:  [+13.483% +14.834% +16.251%]
                        Performance has improved.                                                                  
Found 13 outliers among 100 measurements (13.00%)                                                                  
  5 (5.00%) high mild                                                                                              
  8 (8.00%) high severe                                                                                            
write_batch primitive/4096 values primitive non-null                                                               
                        time:   [699.61 µs 704.04 µs 708.41 µs]
                        thrpt:  [244.35 MiB/s 245.87 MiB/s 247.43 MiB/s]
                 change:                         
                        time:   [-69.195% -68.820% -68.475%] (p = 0.00 < 0.05)
                        thrpt:  [+217.21% +220.71% +224.63%]                                                       
                        Performance has improved.                                                                                                                                                                                      
Found 1 outliers among 100 measurements (1.00%)                                                                    
  1 (1.00%) high mild                                                                                              
write_batch primitive/4096 values primitive non-null with bloom filter                                             
                        time:   [6.7241 ms 6.7664 ms 6.8179 ms]               
                        thrpt:  [25.390 MiB/s 25.583 MiB/s 25.744 MiB/s]      
                 change:                                                                                           
                        time:   [-12.194% -11.531% -10.773%] (p = 0.00 < 0.05)
                        thrpt:  [+12.074% +13.034% +13.887%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)                                                                    
  1 (1.00%) high mild                                                                                                                                                                                                                  
  8 (8.00%) high severe                                                                                            
write_batch primitive/4096 values bool                                                                             
                        time:   [109.06 µs 111.13 µs 113.51 µs]         
                        thrpt:  [10.149 MiB/s 10.367 MiB/s 10.564 MiB/s]
                 change:                                                                                           
                        time:   [-70.096% -69.467% -68.691%] (p = 0.00 < 0.05)
                        thrpt:  [+219.39% +227.51% +234.40%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)                                                                    
  1 (1.00%) high mild                               
  3 (3.00%) high severe                                                                                            
write_batch primitive/4096 values bool non-null  
                        time:   [84.165 µs 84.416 µs 84.695 µs]
                        thrpt:  [7.8371 MiB/s 7.8629 MiB/s 7.8864 MiB/s]
                 change:   
                        time:   [-76.492% -75.147% -74.298%] (p = 0.00 < 0.05)
                        thrpt:  [+289.07% +302.36% +325.38%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  9 (9.00%) high severe
write_batch primitive/4096 values string
                        time:   [396.50 µs 397.06 µs 397.79 µs]
                        thrpt:  [200.19 MiB/s 200.56 MiB/s 200.85 MiB/s]
                 change:
                        time:   [-56.044% -55.808% -55.529%] (p = 0.00 < 0.05)
                        thrpt:  [+124.87% +126.28% +127.50%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
write_batch primitive/4096 values string with bloom filter
                        time:   [2.3735 ms 2.3988 ms 2.4264 ms]
                        thrpt:  [32.820 MiB/s 33.199 MiB/s 33.551 MiB/s]
                 change:
                        time:   [-3.0282% -1.8635% -0.6440%] (p = 0.00 < 0.05)
                        thrpt:  [+0.6482% +1.8989% +3.1228%]
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
write_batch primitive/4096 values string dictionary
                        time:   [231.39 µs 231.85 µs 232.38 µs]
                        thrpt:  [207.16 MiB/s 207.64 MiB/s 208.06 MiB/s]
                 change:
                        time:   [-57.620% -55.943% -54.979%] (p = 0.00 < 0.05)
                        thrpt:  [+122.12% +126.98% +135.96%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking write_batch primitive/4096 values string dictionary with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
write_batch primitive/4096 values string dictionary with bloom filter
                        time:   [1.1996 ms 1.2117 ms 1.2253 ms]
                        thrpt:  [39.289 MiB/s 39.731 MiB/s 40.131 MiB/s]
                 change:
                        time:   [-7.5271% -6.1918% -4.7965%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0381% +6.6005% +8.1398%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  9 (9.00%) high severe
write_batch primitive/4096 values string non-null
                        time:   [474.16 µs 479.55 µs 484.84 µs]
                        thrpt:  [162.24 MiB/s 164.03 MiB/s 165.89 MiB/s]
                 change:
                        time:   [-50.271% -49.696% -49.133%] (p = 0.00 < 0.05)
                        thrpt:  [+96.591% +98.792% +101.09%]
                        Performance has improved.
write_batch primitive/4096 values string non-null with bloom filter
                        time:   [2.6152 ms 2.6373 ms 2.6627 ms]
                        thrpt:  [29.541 MiB/s 29.826 MiB/s 30.078 MiB/s]
                 change:
                        time:   [-0.2535% +1.2207% +2.7549%] (p = 0.11 > 0.05)
                        thrpt:  [-2.6811% -1.2059% +0.2541%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
write_batch nested/4096 values primitive list
                        time:   [1.1079 ms 1.1170 ms 1.1322 ms]
                        thrpt:  [144.75 MiB/s 146.71 MiB/s 147.93 MiB/s]
                 change:
                        time:   [-41.536% -41.148% -40.713%] (p = 0.00 < 0.05)
                        thrpt:  [+68.672% +69.916% +71.045%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking write_batch nested/4096 values primitive list non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
write_batch nested/4096 values primitive list non-null
                        time:   [1.3208 ms 1.3322 ms 1.3435 ms]
                        thrpt:  [141.83 MiB/s 143.04 MiB/s 144.27 MiB/s]
                 change:
                        time:   [-36.827% -36.343% -35.860%] (p = 0.00 < 0.05)
                        thrpt:  [+55.909% +57.092% +58.296%]
                        Performance has improved.
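
Each `change` block above reports both a time change and a throughput change. The two are reciprocals of each other, which is why a 65.5% drop in time appears as a 190% gain in throughput. A minimal sketch of that arithmetic follows; the function name `throughput_change` is hypothetical and is not part of Criterion.

```rust
/// Converts a relative change in time per iteration into the corresponding
/// relative change in throughput, assuming the amount of data is unchanged.
/// Both values are fractions: -0.5 means "50% less".
/// (Hypothetical helper for illustration; not a Criterion API.)
fn throughput_change(time_change: f64) -> f64 {
    // throughput = bytes / time, so new/old throughput = old/new time.
    1.0 / (1.0 + time_change) - 1.0
}
```

For example, `throughput_change(-0.65543)` is approximately `1.9022`, which matches the +190.22% reported for the first benchmark.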

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 17, 2022
Comment on lines +318 to +319
let path = env::temp_dir().join("arrow_writer.temp");
let file = File::create(path).unwrap();
Member Author

I changed the benchmark to write batches to a file.
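
A minimal sketch of why a file target changes the picture: every small write to a bare `File` is its own system call, while a `BufWriter` batches them. The helper names, the chunk size, and the temp-file name below are illustrative assumptions, not code from this PR.

```rust
use std::env;
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};

/// Writes `count` chunks of 8 bytes to `writer` and flushes it.
/// Returns the total number of bytes written.
fn write_chunks<W: Write>(mut writer: W, count: usize) -> io::Result<usize> {
    let chunk = [0u8; 8];
    for _ in 0..count {
        writer.write_all(&chunk)?;
    }
    writer.flush()?;
    Ok(count * chunk.len())
}

/// Writes the same data to a temp file, optionally through a `BufWriter`,
/// and returns the resulting file length in bytes.
fn write_to_temp_file(buffered: bool) -> io::Result<u64> {
    // Hypothetical file name; the process id reduces the chance of collisions.
    let name = format!("tracked_write_sketch_{}_{}.temp", std::process::id(), buffered);
    let path = env::temp_dir().join(name);
    let file = File::create(&path)?;
    if buffered {
        // Small writes are collected in memory and reach the file in batches.
        write_chunks(BufWriter::new(file), 4096)?;
    } else {
        // Each 8-byte write goes to the file individually.
        write_chunks(file, 4096)?;
    }
    let len = fs::metadata(&path)?.len();
    fs::remove_file(&path)?;
    Ok(len)
}
```

Both variants produce an identical file; only the number of writes reaching the operating system differs.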

@@ -45,16 +45,17 @@ use crate::schema::types::{

 /// A wrapper around a [`Write`] that keeps track of the number
 /// of bytes that have been written
-pub struct TrackedWrite<W> {
-    inner: W,
+pub struct TrackedWrite<W: Write> {
Contributor

Perhaps we could also update the doc comment to mention the addition of a BufWriter.

Contributor

@tustvold tustvold left a comment

When writing to memory, does the additional BufWriter have an appreciable performance cost? That would be my only concern, otherwise this seems like a substantial win 👍

I also double-checked that large writes, such as when writing an entire page, will skip the buffer - https://doc.rust-lang.org/src/std/io/buffered/bufwriter.rs.html#364
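
The bypass can be observed with a small sketch that uses only the standard library. `CountingSink` is a hypothetical helper that records how many bytes have reached it; the 64-byte capacity and the write sizes are arbitrary choices for the illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that records how many bytes have actually reached it.
#[derive(Default)]
struct CountingSink {
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `len` bytes through a fresh `BufWriter` with a 64-byte buffer and
/// returns how many bytes reached the sink *before* any explicit flush.
fn bytes_reaching_sink_before_flush(len: usize) -> io::Result<usize> {
    let mut writer = BufWriter::with_capacity(64, CountingSink::default());
    writer.write_all(&vec![0u8; len])?;
    // Inspect the sink without flushing the BufWriter.
    Ok(writer.get_ref().bytes)
}
```

A 4-byte write stays in the buffer, so the sink has seen 0 bytes, while a 1024-byte write is passed straight through and the sink has seen all 1024 bytes.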

@viirya
Member Author

viirya commented Dec 19, 2022

> When writing to memory, does the additional BufWriter have an appreciable performance cost? That would be my only concern, otherwise this seems like a substantial win 👍

Yes, on the benchmark that writes to an in-memory writer, the additional BufWriter is slower. However, I think the file writer should take priority: this is the Parquet writer, so I assume practical usage is file-based, while in-memory writing is mainly for testing.

@tustvold
Contributor

IOx writes to memory prior to flushing to object storage... How bad is the regression?
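
For the in-memory case, a sketch of what the extra layer means in practice: small writes are first copied into the `BufWriter`'s buffer and then again into the `Vec<u8>`, and the bytes are only guaranteed to be in the `Vec` after a flush or `into_inner`. The function name is hypothetical, and this is not IOx code.

```rust
use std::io::{self, BufWriter, Write};

/// Writes `chunks` small chunks into memory through a `BufWriter` and
/// returns the finished buffer, e.g. to hand off to object storage.
fn write_to_memory(chunks: usize) -> io::Result<Vec<u8>> {
    let mut writer = BufWriter::new(Vec::new());
    for i in 0..chunks {
        // Each small write is staged in the BufWriter before reaching the Vec.
        writer.write_all(&[(i % 256) as u8; 4])?;
    }
    // `into_inner` flushes any staged bytes and returns the underlying Vec.
    writer.into_inner().map_err(|e| e.into_error())
}
```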

@viirya
Member Author

viirya commented Dec 19, 2022

For the benchmarks without a bloom filter, the regression is slightly less than 6.9% at worst, while the bool benchmarks improve by 50% to 143%.

For the benchmarks with a bloom filter, the worst regression is about 54% (string non-null with bloom filter).

write_batch primitive/4096 values primitive                                                                        
                        time:   [645.50 µs 646.45 µs 647.66 µs]                                                    
                        thrpt:  [272.55 MiB/s 273.06 MiB/s 273.46 MiB/s]                                           
                 change:
                        time:   [+3.2843% +3.6252% +3.9598%] (p = 0.00 < 0.05)                                     
                        thrpt:  [-3.8090% -3.4984% -3.1799%]                                                       
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)                                                                    
  1 (1.00%) high mild                                                                                              
write_batch primitive/4096 values primitive with bloom filter
                        time:   [4.7123 ms 4.7255 ms 4.7396 ms]               
                        thrpt:  [37.244 MiB/s 37.355 MiB/s 37.459 MiB/s]
                 change:                                                                                           
                        time:   [-3.1886% +0.7072% +4.7452%] (p = 0.73 > 0.05)                  
                        thrpt:  [-4.5302% -0.7023% +3.2936%]                                                                                                                                                                           
                        No change in performance detected.                    
Found 4 outliers among 100 measurements (4.00%)                                                                    
  3 (3.00%) high mild                                                                                              
  1 (1.00%) high severe                                                                                            
write_batch primitive/4096 values primitive non-null                                                                                                                                                                                   
                        time:   [518.44 µs 520.65 µs 522.66 µs]
                        thrpt:  [331.20 MiB/s 332.48 MiB/s 333.89 MiB/s]                                           
                 change:                                                                                           
                        time:   [-3.3975% -2.9062% -2.4188%] (p = 0.00 < 0.05)
                        thrpt:  [+2.4787% +2.9932% +3.5170%]                                                       
                        Performance has improved.                                                                  
Found 38 outliers among 100 measurements (38.00%)                                                                  
  15 (15.00%) low severe                       
  4 (4.00%) low mild                                                                                               
  1 (1.00%) high mild                                                                                              
  18 (18.00%) high severe                             
write_batch primitive/4096 values primitive non-null with bloom filter
                        time:   [4.5171 ms 4.5242 ms 4.5318 ms]
                        thrpt:  [38.198 MiB/s 38.262 MiB/s 38.321 MiB/s]
                 change:
                        time:   [+14.987% +15.610% +16.218%] (p = 0.00 < 0.05)
                        thrpt:  [-13.955% -13.503% -13.033%]
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
write_batch primitive/4096 values bool
                        time:   [55.116 µs 55.555 µs 56.063 µs]
                        thrpt:  [20.549 MiB/s 20.737 MiB/s 20.902 MiB/s]
                 change:
                        time:   [-33.749% -33.206% -32.669%] (p = 0.00 < 0.05)
                        thrpt:  [+48.519% +49.713% +50.941%]
                        Performance has improved.
write_batch primitive/4096 values bool non-null
                        time:   [30.195 µs 30.200 µs 30.205 µs]
                        thrpt:  [21.975 MiB/s 21.979 MiB/s 21.983 MiB/s]
                 change:
                        time:   [-58.990% -58.969% -58.951%] (p = 0.00 < 0.05)
                        thrpt:  [+143.61% +143.72% +143.84%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
write_batch primitive/4096 values string
                        time:   [288.85 µs 291.28 µs 294.04 µs]
                        thrpt:  [270.83 MiB/s 273.40 MiB/s 275.70 MiB/s]
                 change:
                        time:   [+6.2048% +6.8701% +7.5217%] (p = 0.00 < 0.05)
                        thrpt:  [-6.9955% -6.4285% -5.8423%]
                        Performance has regressed.
write_batch primitive/4096 values string with bloom filter
                        time:   [1.1640 ms 1.1804 ms 1.1949 ms]
                        thrpt:  [66.645 MiB/s 67.463 MiB/s 68.413 MiB/s]
                 change:
                        time:   [+30.161% +32.152% +34.602%] (p = 0.00 < 0.05)
                        thrpt:  [-25.707% -24.330% -23.172%]
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild
write_batch primitive/4096 values string dictionary
                        time:   [153.55 µs 154.83 µs 156.01 µs]
                        thrpt:  [308.57 MiB/s 310.93 MiB/s 313.52 MiB/s]
                 change:
                        time:   [+0.9110% +1.8998% +2.9084%] (p = 0.00 < 0.05)
                        thrpt:  [-2.8262% -1.8644% -0.9028%]
                        Change within noise threshold.
write_batch primitive/4096 values string dictionary with bloom filter
                        time:   [561.37 µs 561.54 µs 561.71 µs]
                        thrpt:  [85.705 MiB/s 85.731 MiB/s 85.757 MiB/s]
                 change:
                        time:   [+13.056% +13.385% +13.678%] (p = 0.00 < 0.05)
                        thrpt:  [-12.032% -11.805% -11.548%]
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
write_batch primitive/4096 values string non-null
                        time:   [329.27 µs 329.37 µs 329.49 µs]
                        thrpt:  [238.73 MiB/s 238.82 MiB/s 238.89 MiB/s]
                 change:
                        time:   [-0.6052% -0.3595% -0.0651%] (p = 0.00 < 0.05)
                        thrpt:  [+0.0651% +0.3608% +0.6088%]
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  10 (10.00%) high severe
Benchmarking write_batch primitive/4096 values string non-null with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 60.
write_batch primitive/4096 values string non-null with bloom filter
                        time:   [1.3000 ms 1.3275 ms 1.3563 ms]
                        thrpt:  [57.995 MiB/s 59.253 MiB/s 60.506 MiB/s]
                 change:
                        time:   [+52.898% +54.765% +57.194%] (p = 0.00 < 0.05)
                        thrpt:  [-36.384% -35.386% -34.597%]
                        Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
  1 (1.00%) high mild
  21 (21.00%) high severe

write_batch nested/4096 values primitive list
                        time:   [958.15 µs 964.37 µs 970.71 µs]
                        thrpt:  [168.82 MiB/s 169.93 MiB/s 171.04 MiB/s]
                 change:
                        time:   [+3.4025% +4.0153% +4.6257%] (p = 0.00 < 0.05)
                        thrpt:  [-4.4212% -3.8603% -3.2905%]
                        Performance has regressed.
Benchmarking write_batch nested/4096 values primitive list non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 60.
write_batch nested/4096 values primitive list non-null
                        time:   [1.0984 ms 1.0990 ms 1.0999 ms]
                        thrpt:  [173.25 MiB/s 173.38 MiB/s 173.48 MiB/s]
                 change:
                        time:   [+1.2422% +1.3060% +1.3765%] (p = 0.00 < 0.05)
                        thrpt:  [-1.3578% -1.2892% -1.2269%]
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

@viirya
Member Author

viirya commented Dec 19, 2022

Updated: after implementing some additional methods of Write, most regressions disappear.

A regression of roughly 25% remains for: string with bloom filter, string dictionary with bloom filter, and string non-null with bloom filter.
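
A hedged sketch of what forwarding extra `Write` methods can look like. The default `Write::write_all` loops over `write`, and the default `write_vectored` writes only the first non-empty slice, so forwarding them explicitly lets the inner `BufWriter` use its own implementations. Which methods the PR actually overrides is not shown in this conversation; the choice of `write_all` and `write_vectored`, the `bytes_written` field, and the constructor are assumptions for illustration.

```rust
use std::io::{self, BufWriter, IoSlice, Write};

/// Sketch of a byte-counting wrapper (names are illustrative).
pub struct TrackedWrite<W: Write> {
    inner: BufWriter<W>,
    bytes_written: usize, // assumed field name
}

impl<W: Write> TrackedWrite<W> {
    pub fn new(inner: W) -> Self {
        Self { inner: BufWriter::new(inner), bytes_written: 0 }
    }

    /// Bytes accepted so far, whether still buffered or already flushed.
    pub fn bytes_written(&self) -> usize {
        self.bytes_written
    }
}

impl<W: Write> Write for TrackedWrite<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let bytes = self.inner.write(buf)?;
        self.bytes_written += bytes;
        Ok(bytes)
    }

    // Forward instead of relying on the default, which loops over `write`.
    fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
        self.inner.write_all(buf)?;
        self.bytes_written += buf.len();
        Ok(())
    }

    // Forward so several slices can be handled in one call.
    fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
        let bytes = self.inner.write_vectored(bufs)?;
        self.bytes_written += bytes;
        Ok(bytes)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

The counter is updated in every forwarded method, so `bytes_written()` stays accurate regardless of which entry point a caller uses.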

write_batch primitive/4096 values primitive
                        time:   [625.94 µs 626.93 µs 628.10 µs]
                        thrpt:  [281.04 MiB/s 281.57 MiB/s 282.01 MiB/s]
                 change:                                                                                           
                        time:   [-0.0703% +0.2373% +0.5201%] (p = 0.12 > 0.05)                                     
                        thrpt:  [-0.5174% -0.2367% +0.0704%]                  
                        No change in performance detected.  
Found 7 outliers among 100 measurements (7.00%)                                                                    
  6 (6.00%) high mild                                                                                              
  1 (1.00%) high severe                                                                                            
write_batch primitive/4096 values primitive with bloom filter                               
                        time:   [3.9652 ms 3.9857 ms 4.0075 ms]                                                                                                                                                                        
                        thrpt:  [44.048 MiB/s 44.289 MiB/s 44.518 MiB/s]
                 change:                                                                                           
                        time:   [-3.7486% -3.1900% -2.6823%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7562% +3.2951% +3.8946%]
                        Performance has improved.                                                                  
Found 15 outliers among 100 measurements (15.00%)                                                                  
  15 (15.00%) high mild                               
write_batch primitive/4096 values primitive non-null
                        time:   [537.44 µs 540.51 µs 543.10 µs]
                        thrpt:  [318.73 MiB/s 320.26 MiB/s 322.09 MiB/s]
                 change:
                        time:   [-2.6202% -1.7441% -0.7530%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7587% +1.7751% +2.6907%]
                        Change within noise threshold.
write_batch primitive/4096 values primitive non-null with bloom filter
                        time:   [3.8815 ms 3.8926 ms 3.9050 ms]
                        thrpt:  [44.328 MiB/s 44.470 MiB/s 44.597 MiB/s]
                 change:
                        time:   [-2.3063% -1.9639% -1.6055%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6317% +2.0032% +2.3608%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
write_batch primitive/4096 values bool
                        time:   [51.376 µs 52.002 µs 52.541 µs]
                        thrpt:  [21.926 MiB/s 22.154 MiB/s 22.424 MiB/s]
                 change:
                        time:   [-40.319% -39.705% -39.077%] (p = 0.00 < 0.05)
                        thrpt:  [+64.143% +65.851% +67.558%]
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)                                                                  
  23 (23.00%) high mild                                                                                            
write_batch primitive/4096 values bool non-null                                                                                                                                                                                        
                        time:   [30.089 µs 30.094 µs 30.099 µs]
                        thrpt:  [22.053 MiB/s 22.056 MiB/s 22.060 MiB/s]
                 change:
                        time:   [-59.123% -59.067% -58.987%] (p = 0.00 < 0.05)
                        thrpt:  [+143.83% +144.30% +144.64%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
write_batch primitive/4096 values string
                        time:   [275.84 µs 278.10 µs 280.72 µs]
                        thrpt:  [283.68 MiB/s 286.35 MiB/s 288.70 MiB/s]
                 change:
                        time:   [-0.0601% +1.0398% +2.2510%] (p = 0.08 > 0.05)
                        thrpt:  [-2.2014% -1.0291% +0.0602%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  12 (12.00%) high severe
write_batch primitive/4096 values string with bloom filter
                        time:   [966.97 µs 988.71 µs 1.0155 ms]
                        thrpt:  [78.417 MiB/s 80.545 MiB/s 82.356 MiB/s]
                 change:
                        time:   [+20.632% +24.001% +27.467%] (p = 0.00 < 0.05)
                        thrpt:  [-21.548% -19.356% -17.103%]
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe
write_batch primitive/4096 values string dictionary
                        time:   [150.19 µs 151.71 µs 153.53 µs]
                        thrpt:  [313.56 MiB/s 317.32 MiB/s 320.54 MiB/s]
                 change:
                        time:   [+2.4272% +3.6178% +4.7690%] (p = 0.00 < 0.05)
                        thrpt:  [-4.5519% -3.4915% -2.3697%]
                        Performance has regressed.
write_batch primitive/4096 values string dictionary with bloom filter
                        time:   [549.69 µs 559.42 µs 567.68 µs]
                        thrpt:  [84.805 MiB/s 86.056 MiB/s 87.579 MiB/s]
                 change:
                        time:   [+23.015% +26.986% +31.032%] (p = 0.00 < 0.05)
                        thrpt:  [-23.683% -21.251% -18.709%]
                        Performance has regressed.
write_batch primitive/4096 values string non-null
                        time:   [339.09 µs 342.80 µs 346.21 µs]
                        thrpt:  [227.20 MiB/s 229.46 MiB/s 231.97 MiB/s]
                 change:
                        time:   [+1.3603% +2.0031% +2.7986%] (p = 0.00 < 0.05)
                        thrpt:  [-2.7224% -1.9638% -1.3421%]
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  23 (23.00%) high mild
Benchmarking write_batch primitive/4096 values string non-null with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 60.
write_batch primitive/4096 values string non-null with bloom filter
                        time:   [1.1739 ms 1.1742 ms 1.1746 ms]
                        thrpt:  [66.968 MiB/s 66.988 MiB/s 67.004 MiB/s]
                 change:
                        time:   [+21.914% +25.172% +28.517%] (p = 0.00 < 0.05)
                        thrpt:  [-22.189% -20.110% -17.975%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

write_batch nested/4096 values primitive list
                        time:   [935.44 µs 942.05 µs 948.25 µs]
                        thrpt:  [172.82 MiB/s 173.96 MiB/s 175.19 MiB/s]
                 change:
                        time:   [-0.6966% -0.0169% +0.6661%] (p = 0.96 > 0.05)
                        thrpt:  [-0.6617% +0.0169% +0.7015%]
                        No change in performance detected.
Benchmarking write_batch nested/4096 values primitive list non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
write_batch nested/4096 values primitive list non-null
                        time:   [1.1124 ms 1.1230 ms 1.1339 ms]
                        thrpt:  [168.05 MiB/s 169.68 MiB/s 171.30 MiB/s]
                 change:
                        time:   [+0.2012% +1.2888% +2.3253%] (p = 0.02 < 0.05)
                        thrpt:  [-2.2725% -1.2724% -0.2008%]
                        Change within noise threshold.

Contributor

@tustvold tustvold left a comment

I think we can live with a slight regression when writing to memory. I am still somewhat surprised by it, as the page data should be large enough to skip the buffer, but perhaps the benchmarks are just unfortunately sized.

@tustvold tustvold merged commit e2abb4b into apache:master Dec 19, 2022
@ursabot

ursabot commented Dec 19, 2022

Benchmark runs are scheduled for baseline = 2cf4abb and contender = e2abb4b. e2abb4b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb alamb mentioned this pull request Dec 29, 2022
Development

Successfully merging this pull request may close these issues.

Speed up TrackedWrite
3 participants