cat performance #3645

Closed
ViralBShah opened this issue Jul 7, 2013 · 16 comments
Labels
performance Must go faster

Comments

@ViralBShah
Member

The cat performance benchmark compares various concatenation functions against a simple setindex-based implementation. The results suggest that something is not quite right, since the setindex versions are often much faster (e.g., small_catnd 581.283 vs. 143.262, large_hvcat 11.106 vs. 4.726):

small_hvcat          57.877
small_hvcat_setind   55.922
large_hvcat          11.106
large_hvcat_setind    4.726
small_hcat           21.654
small_hcat_setind    35.115
large_hcat            4.293
large_hcat_setind     4.306
small_vcat           64.364
small_vcat_setind    56.346
large_vcat            4.761
large_vcat_setind     4.710
small_catnd         581.283
small_catnd_setind  143.262
large_catnd           7.422
large_catnd_setind    4.897
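For context, a "setindex-based" concatenation preallocates the result and copies each input into a block of it with `setindex!`. The sketch below assumes Julia 1.x syntax and inputs that share one element type; `vcat_setind` and `hcat_setind` are hypothetical names, not the functions defined in perf.jl.

```julia
# Hypothetical setindex-based vcat: stack matrices row-block by row-block.
function vcat_setind(As::AbstractMatrix{T}...) where {T}
    ncols = size(As[1], 2)
    nrows = sum(A -> size(A, 1), As)
    out = Matrix{T}(undef, nrows, ncols)
    row = 0
    for A in As
        size(A, 2) == ncols || throw(DimensionMismatch("column counts differ"))
        r = size(A, 1)
        out[row+1:row+r, :] = A   # setindex! into a block of rows
        row += r
    end
    return out
end

# Hypothetical setindex-based hcat: place matrices column-block by column-block.
function hcat_setind(As::AbstractMatrix{T}...) where {T}
    nrows = size(As[1], 1)
    ncols = sum(A -> size(A, 2), As)
    out = Matrix{T}(undef, nrows, ncols)
    col = 0
    for A in As
        size(A, 1) == nrows || throw(DimensionMismatch("row counts differ"))
        c = size(A, 2)
        out[:, col+1:col+c] = A   # setindex! into a block of columns
        col += c
    end
    return out
end
```

Because the output size is computed once up front, each input is copied exactly once, which is the baseline the built-in `vcat`/`hcat` timings are being compared against.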
@ViralBShah ViralBShah added this to the 0.4 milestone Apr 27, 2014
@ViralBShah
Member Author

Things are better now:

julia,hvcat_small,5.841653,28.443581,7.240933,1.619755
julia,hvcat_large,5.855422,12.409402,8.069710,1.251207
julia,hvcat_setind_small,4.436304,8.289098,5.479964,0.786027
julia,hvcat_setind_large,5.882567,11.567854,8.076142,1.444297
julia,hcat_small,3.399357,6.442293,4.299652,0.643592
julia,hcat_large,4.687004,10.086189,7.417788,1.377123
julia,hcat_setind_small,4.385627,7.691682,5.325854,0.777748
julia,hcat_setind_large,5.387160,11.101184,7.654127,1.433939
julia,vcat_small,18.599786,27.051015,22.315713,1.944881
julia,vcat_large,5.777338,11.221460,8.061492,1.234265
julia,vcat_setind_small,4.661357,8.321572,5.696195,0.790171
julia,vcat_setind_large,5.738162,12.823408,8.011272,1.470064
julia,catnd_small,182.341470,196.423850,188.391843,4.648532
julia,catnd_large,11.016256,18.077194,14.226727,1.661805
julia,catnd_setind_small,30.878567,40.430500,35.517463,2.441230
julia,catnd_setind_large,5.563092,10.719209,8.023162,1.131493

@ViralBShah
Member Author

Cc: @timholy @Jutho

@timholy
Member

timholy commented Feb 1, 2015

If I'm reading it correctly, there's still a big gap, e.g., with vcat_small. Have you profiled it?

@Jutho
Contributor

Jutho commented Feb 1, 2015

catnd also seems pretty bad. Where can I find this benchmark code?

@jiahao
Member

jiahao commented Feb 1, 2015

julia/test/perf/cat/perf.jl

@Jutho
Contributor

Jutho commented Feb 1, 2015

Well, I can easily bring down the timings of catnd_small by a factor of 3, and make those of catnd_large approximately equal to the setind version, by replacing the internal cat_t function with a mutating cat! function and writing the latter as a stagedfunction.
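The staged cat! itself is not shown in this thread. As a rough, non-staged sketch of the mutating idea (assuming Julia 1.x; `cat_into!` is a hypothetical name), the function below copies each input into a preallocated `dest` along dimension `d`. A staged version would additionally generate the index tuple per output dimensionality `N` instead of building it at runtime.

```julia
# Hypothetical mutating concatenation: fill `dest` with the inputs along dimension d.
function cat_into!(dest::AbstractArray{T,N}, d::Integer, As::AbstractArray...) where {T,N}
    offset = 0
    for A in As
        n = size(A, d)                  # size(A, d) is 1 when d > ndims(A)
        rng = offset+1:offset+n         # slab of dest occupied by A along d
        inds = ntuple(i -> i == d ? rng : axes(dest, i), N)
        dest[inds...] = A               # setindex! tolerates singleton dims
        offset += n
    end
    offset == size(dest, d) ||
        throw(DimensionMismatch("inputs do not fill dest along dimension $d"))
    return dest
end
```

The caller is responsible for allocating `dest` with the right shape, which is what allows the copy loop itself to avoid intermediate allocations.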

@simonster
Member

@Jutho I'm not sure we necessarily want to use a staged function here, at least not always. Someone might write cat(3, X...) where length(X) is very large (and possibly also varies across iterations of a loop, etc.). In that case taking a hit at runtime is probably better than generating specialized code.

@Jutho
Contributor

Jutho commented Feb 1, 2015

The point of the stagedfunction is to generate code specialized on the dimensionality of the output array, not on the number of arguments. However, since I don't know how a stagedfunction handles a varargs argument, I can't say what effect this would have...

@simonster
Member

I believe stagedfunctions are always fully specialized on varargs, or at least it seemed that way when I tested this.

@Jutho
Contributor

Jutho commented Feb 1, 2015

Well, certainly not beyond 8 arguments, but that might of course change with #8974. But is that really different from the amount of specialization on varargs in normal functions? I have no idea.

@simonster
Member

I don't think they get type inference for >8 arguments, but a different function seems to be compiled for any number of varargs, e.g.:

julia> stagedfunction f(x...)
       :($(length(x)))
       end
f (generic function with 1 method)

julia> code_llvm(f, NTuple{100,Int})

define %jl_value_t* @julia_anonymous_134264(%jl_value_t*, %jl_value_t**, i32) {
top:
  %3 = alloca [3 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !603
  store %jl_value_t* inttoptr (i64 2 to %jl_value_t*), %jl_value_t** %.sub, align 8
  %5 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !603
  %6 = load %jl_value_t*** @jl_pgcstack, align 8, !dbg !603
  %.c = bitcast %jl_value_t** %6 to %jl_value_t*, !dbg !603
  store %jl_value_t* %.c, %jl_value_t** %5, align 8, !dbg !603
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !603
  store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !603
  %7 = load %jl_value_t** %5, align 8, !dbg !603
  %8 = getelementptr inbounds %jl_value_t* %7, i64 0, i32 0, !dbg !603
  store %jl_value_t** %8, %jl_value_t*** @jl_pgcstack, align 8, !dbg !603
  ret %jl_value_t* inttoptr (i64 140626142645440 to %jl_value_t*), !dbg !603
}

julia> unsafe_pointer_to_objref(convert(Ptr{Void}, 140626142645440))
100

(I'm not sure that GC root is necessary, but the code just returns 100.)

Ordinarily, I think varargs functions are not specialized at all. See #5402, although f there is now inlined, so you need to add a dummy loop to see the suboptimal calling convention in code_llvm.
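In later Julia versions, `stagedfunction` was replaced by the `@generated function` syntax. A sketch of the same experiment in Julia 1.x (the name `nargs_generated` is hypothetical): inside the generator body, `x` is bound to the tuple of argument types, so `length(x)` is known when the method is specialized and the returned expression is just a constant. Running `@code_llvm nargs_generated(ntuple(identity, 100)...)` should then show code that returns that constant, though the exact IR depends on the Julia version.

```julia
# Generated function: the body runs at specialization time on the argument types.
@generated function nargs_generated(x...)
    # `x` here is a tuple of types, so its length is the number of arguments.
    return :($(length(x)))
end
```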

@vtjnash
Member

vtjnash commented Aug 24, 2015

That PR eventually became #10338 before being subsumed by #7128. I'm going to assume that needs to be resolved first and remove the target milestone from this issue in the meantime.

@vtjnash vtjnash removed this from the 0.4.x milestone Aug 24, 2015
@Jutho
Contributor

Jutho commented Aug 24, 2015

Indeed.

@ViralBShah
Member Author

ViralBShah commented Jul 17, 2017

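There seems to be a general slowdown across the board in 0.6 compared with earlier versions. The times are minimum, maximum, mean, and standard deviation (the fourth value is below the minimum in every row, so it cannot be a median).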

julia,hvcat_small,7.336272,83.110470,9.151902,8.535927
julia,hvcat_large,6.195210,75.650329,9.320567,4.827748
julia,hvcat_setind_small,6.520121,11.430050,7.255832,0.744990
julia,hvcat_setind_large,6.038093,78.406094,9.554000,5.253685
julia,hcat_small,10.091909,22.847853,12.177417,2.105768
julia,hcat_large,5.449313,82.616243,9.762316,5.675814
julia,hcat_setind_small,5.350112,14.789836,7.732790,1.713924
julia,hcat_setind_large,5.640484,102.515004,9.175704,6.632492
julia,vcat_small,18.460564,24.171685,21.647538,1.211606
julia,vcat_large,6.181153,72.471150,9.499165,4.639837
julia,vcat_setind_small,5.896819,9.408016,7.532382,0.676082
julia,vcat_setind_large,6.327230,73.536199,9.397235,4.706905
julia,catnd_small,306.599528,318.999051,314.978989,5.774449
julia,catnd_large,8.290890,93.080390,12.043643,6.528367
julia,catnd_setind_small,34.817327,41.600884,37.610669,1.792527
julia,catnd_setind_large,5.847496,73.403590,9.057344,4.637332
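A minimal sketch of how rows in this format could be produced from a vector of timings. It assumes the fourth column is the sample standard deviation (the values are smaller than the minimum, which rules out a median); `format_row` is a hypothetical helper, not the code in perf.jl.

```julia
using Printf, Statistics

# Format one benchmark row as: julia,<name>,<min>,<max>,<mean>,<std>
function format_row(name::AbstractString, times::AbstractVector{<:Real})
    return @sprintf("julia,%s,%f,%f,%f,%f", name,
                    minimum(times), maximum(times), mean(times), std(times))
end
```

For example, `format_row("x", [1.0, 2.0, 3.0])` gives `"julia,x,1.000000,3.000000,2.000000,1.000000"`.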

@ViralBShah
Member Author

Compared with my own report above from 2015, catnd_small is now nearly twice as slow as before (minimum of 306.6 vs. 182.3).

@ViralBShah
Member Author

Things appear significantly improved here, though this may partly be because I am probably using a newer computer. I think we will need more targeted benchmarks if there is anything left to do here.

julia,hvcat_small,4.396699,16.297507,6.608667,2.066157
julia,hvcat_large,3.235825,45.255710,4.303054,2.729414
julia,hvcat_setind_small,3.508096,7.443855,3.734079,0.333990
julia,hvcat_setind_large,2.954424,45.062366,4.296051,2.754234
julia,hcat_small,1.585352,4.087416,1.865025,0.304821
julia,hcat_large,2.568260,44.529903,3.658572,2.554980
julia,hcat_setind_small,3.553062,5.410028,3.776157,0.243217
julia,hcat_setind_large,2.678956,48.617825,4.238269,2.884385
julia,vcat_small,3.770044,6.230312,4.076772,0.331431
julia,vcat_large,3.296056,45.546644,4.428436,2.800709
julia,vcat_setind_small,3.533616,5.267436,3.708259,0.175622
julia,vcat_setind_large,2.655591,44.732889,4.285145,2.764478
julia,catnd_small,169.010907,178.360217,170.545414,2.971016
julia,catnd_large,5.419363,48.499889,6.802585,3.402186
julia,catnd_setind_small,13.439711,16.501161,14.084918,0.389261
julia,catnd_setind_large,3.996800,46.777139,5.237512,2.980763
