3x slowdown after 1st run of readcsv #3441
Are you garbage collecting between trials? The fact that the first run is fast and the later runs are slower seems like a memory allocation issue.
It does indeed seem to be a memory allocation issue.

```julia
function timing()
    gc_disable()
    for i = 1:5
        @time readcsv("sales2.csv")
    end
    gc_enable()
end
```

Results:

```julia
julia> timing()
elapsed time: 3.064959336 seconds
elapsed time: 2.929288328 seconds
elapsed time: 3.025489937 seconds
elapsed time: 3.119300652 seconds
elapsed time: 3.113658087 seconds
```
Can we close this?
What would be the solution here? I guess I'm confused about how a user is supposed to get consistent performance without knowing about the gc_enable/gc_disable functions.
Why would we close this? The issue doesn't seem addressed at all. Using mmap might help here.
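A rough sketch of the mmap approach suggested above, written against the current `Mmap` stdlib (which postdates this thread; the entry points differed in 2013-era Julia). The file name is the one from this thread:

```julia
using Mmap

# Map the file's bytes into memory instead of read()-ing them all at
# once, so the OS pages data in lazily and the GC never has to track
# one huge transient copy.
open("sales2.csv") do io            # file name taken from this thread
    bytes = Mmap.mmap(io, Vector{UInt8}, filesize(io))
    # Count rows without allocating a String per line.
    nrows = count(==(UInt8('\n')), bytes)
    println("rows: ", nrows)
end
```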
It's well known that disabling the GC will get you better performance right up until you run out of memory and crash.
This would be typical of any operation that taxes the gc by allocating and releasing lots of memory/objects. Memory mapping may help here, as @StefanKarpinski pointed out. If the cells are numeric, it may also help to clean the data and explicitly set the data type.
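Setting the element type explicitly can be sketched with `readdlm`'s type argument (shown here via the `DelimitedFiles` stdlib, where these functions live in later Julia versions; the file name is from this thread):

```julia
using DelimitedFiles   # readcsv/readdlm moved here in later Julia

# If every cell is numeric, passing the element type up front gives a
# concrete Matrix{Float64} instead of a Matrix{Any} of boxed values
# that the GC must track one by one.
data = readdlm("sales2.csv", ',', Float64)   # file name from this thread
println(size(data), " ", eltype(data))
```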
I think I can do a better job tuning the gc. There are some knobs to twiddle.
@karbarcca, it may not be applicable to your case here, but I've also found the profiler can be extremely useful for figuring out gc issues (of course, maybe you're already doing that...). For example, just today I was writing some code that, over the course of a couple hours, I managed to make 300x faster. Some of the bottlenecks were gc's triggered from unexpected places. For example, the majority of the time in my code was spent multiplying lots of 3x3 matrices times vectors, so it proved to be completely worth writing my own specialized function:
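The snippet itself did not survive extraction; a hypothetical reconstruction in the spirit described (an unrolled 3x3 matrix-vector product writing into a preallocated output; the name `matvec3!` is made up) might look like:

```julia
# Hypothetical sketch: unrolled 3x3 matrix-vector product that writes
# into a caller-supplied buffer, so the hot loop allocates nothing and
# never triggers a collection.
function matvec3!(out::Vector{Float64}, A::Matrix{Float64}, v::Vector{Float64})
    out[1] = A[1,1]*v[1] + A[1,2]*v[2] + A[1,3]*v[3]
    out[2] = A[2,1]*v[1] + A[2,2]*v[2] + A[2,3]*v[3]
    out[3] = A[3,1]*v[1] + A[3,2]*v[2] + A[3,3]*v[3]
    return out
end
```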
In my initial version of this function, it lacked the last line (returning |
These kinds of things always give me a mix of horror and delight – horror that you have to write out that 3x3 matmul like that, but delight that you can do that and get it to be as fast as you need. We'll get there.
We do need more small-matrix optimizations. One problem especially is that the logic and code flow in matmul.jl has gotten really complex, and there are several layers of function calls.
I fall mostly on the delight side. It took me, what, 2 minutes to write and wire into the various places in my code? Understanding the gc on the return value was more subtle and cost me more time, but nothing outrageous. The workflow "quickly write the simple version first" (which is really pleasant thanks to Julia's design and the nice library that everyone has been contributing to) -> "run it" -> "ugh, too slow" -> "profile it" -> "fix a few problems in places where it actually matters" -> "ah, that's much nicer!" is quite satisfying. If I had to write everything in C from the ground up it would be far more tedious. And doing a little hand-optimization gives us geeks something to make us feel like we're earning our keep. The only time I get frustrated is when I can't get the performance I know I deserve :-). But that's getting rarer all the time. I guess the major negative is that knowing what to do to solve performance problems does take some experience. Think of it as training: it makes us all stronger. I know I've gotten better thanks to the example code in julia and the help from the community.
Well, that's a pleasant view of it. I guess the key points are that
|
Back to the original issue, does anyone besides me think it's suspicious that it's a 6s increment after the first run? In saying this, I'm presuming that:
If those two assumptions are true, why doesn't the second run take 0.1s longer, rather than 6s longer, than the first? I'm expecting that the first call to |
It has a lot to do with the GC's heap growth schedule, which can introduce a sliver of super-linear run time behavior. If the heap grows too slowly, it has to do many unnecessary collections as this very large data structure is filled in. |
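On the user side, the cost of growing a huge structure incrementally can be sidestepped by reserving capacity before filling it in; a small sketch (the size is made up):

```julia
# Reserving capacity once avoids the repeated grow-and-copy cycles
# (and the GC pressure they cause) while a large array is filled in.
n = 1_000_000                # made-up size
v = Float64[]
sizehint!(v, n)              # one big reservation instead of many grows
for i in 1:n
    push!(v, i)
end
println(length(v))
```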
- make it an error to resize an array with shared data (fixes #3430)
- now able to use realloc to grow arrays (part of #3440, helps #3441)
- the new scheme is simpler: one Array owns the data, instead of tracking the buffers separately as mallocptr_t
- Array data can be allocated inline, with malloc, or from a pool
#261 does look awesome. Aside from the (much more important) performance improvements, presumably it would also make many finalizers run at more predictable times, which perhaps could be a good thing. Now, it may not sound relevant, but in cases where "slivers" of poor performance appear for very specific parameter combinations, is it worth asking whether introducing some element of randomness could be your friend?
@karbarcca can you try this again and see if there's any improvement? |
Unfortunately I can't get it to work:

```julia
julia> @time readcsv("C:/Users/karbarcca/Google Drive/Dropbox/Dropbox/Sears/Teradata/Sears-Julia/sales2.csv")
symbol could not be found fcntl (-1): The specified procedure could not be found.
symbol could not be found lseek (-1): The specified procedure could not be found.
symbol could not be found lseek (-1): The specified procedure could not be found.
symbol could not be found pwrite (-1): The specified procedure could not be found.
symbol could not be found lseek (-1): The specified procedure could not be found.
symbol could not be found mmap (-1): The specified procedure could not be found.
ERROR: ccall: could not find function fcntl
 in mmap_stream_settings at mmap.jl:74 (repeats 26666 times)
```

I'll try to dig into it some more to get it to run.
I think this issue should be closed, and the mmap problem should be a new issue.
Ok, got it to work with the following and with multiple runs, I get consistent times. Is the
|
It seems like |
I've been trying out the new `readcsv` changes this morning and noticed something funny: subsequent reads after the 1st are 3x slower; anyone know what's going on here? @tanmaykm?
This is great progress though. In my personal benchmarks, R's `read.csv` bare is 1.9x us, and with certain optimizations to `read.csv`, we're still 1.4x. Great stuff!