
3x slowdown after 1st run of readcsv #3441

Closed
quinnj opened this issue Jun 18, 2013 · 21 comments
Labels
performance Must go faster

Comments

@quinnj
Member

quinnj commented Jun 18, 2013

I've been trying out the new readcsv changes this morning and noticed something funny:

julia> timing()
elapsed time: 3.146338111 seconds
elapsed time: 9.017399806 seconds
elapsed time: 9.482192787 seconds
elapsed time: 9.330088505 seconds
elapsed time: 9.368527561 seconds
elapsed time: 9.520508683 seconds
elapsed time: 9.536091891 seconds
elapsed time: 9.311094323 seconds
elapsed time: 9.302004193 seconds
elapsed time: 9.412307084 seconds

Runs subsequent to the 1st are 3x slower; anyone know what's going on here? @tanmaykm?

This is great progress though. In my personal benchmarks, R's plain read.csv takes 1.9x as long as we do, and even with certain optimizations to read.csv, we're still 1.4x faster. Great stuff!

@dmbates
Member

dmbates commented Jun 18, 2013

Are you garbage collecting between trials? The fact that the first run is fast and the later runs are slower seems like a memory allocation issue.
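
A minimal sketch of that check (hypothetical driver code, not from the thread): force a full collection between trials so each run starts from a clean heap. In 2013 this was plain gc(); in modern Julia it is GC.gc().

```julia
# Hypothetical benchmark driver: collect between trials so garbage from one
# run can't inflate the timing of the next.
function timing_with_gc(work, n)
    for i = 1:n
        GC.gc()       # full collection; was gc() in 2013-era Julia
        @time work()
    end
end

# Example workload standing in for readcsv("sales2.csv")
timing_with_gc(() -> sum(rand(10^6)), 3)
```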

@quinnj
Member Author

quinnj commented Jun 18, 2013

It does indeed seem to be a memory allocation issue.

function timing()
    gc_disable()
    for i = 1:5
        @time readcsv("sales2.csv")
    end
    gc_enable()
end

Results

julia> timing()
elapsed time: 3.064959336 seconds
elapsed time: 2.929288328 seconds
elapsed time: 3.025489937 seconds
elapsed time: 3.119300652 seconds
elapsed time: 3.113658087 seconds

@ViralBShah
Member

Can we close this?

@quinnj
Member Author

quinnj commented Jun 18, 2013

What would be the solution here? I guess I'm confused how a user is supposed to consistently have performance without being aware of the gc_enable/disable functions.

@StefanKarpinski
Member

Why would we close this? The issue doesn't seem addressed at all. Using mmap might help this.

@JeffBezanson
Member

It's well known that disabling the GC will get you better performance, right up until you run out of memory and crash.

@tanmaykm
Member

This would be typical of any operation that taxes the gc by allocating and releasing lots of memory/objects. Memory mapping may help here, as @StefanKarpinski pointed out. If the cells are numeric, it may also help to clean the data and explicitly set the data type in readcsv.
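
A sketch of the explicit-type suggestion (assumed, not code from the thread): with a purely numeric file, passing the element type lets readdlm fill a Float64 matrix directly instead of inferring a type per cell. The thread's sales2.csv isn't available here, so a tiny stand-in file is used.

```julia
using DelimitedFiles   # a stdlib in modern Julia; readdlm lived in Base in 2013

# Tiny numeric stand-in for the thread's sales2.csv
path = tempname() * ".csv"
write(path, "1.0,2.0\n3.0,4.0\n")

# Explicit eltype: no per-cell type inference, and the result is a dense
# Float64 matrix rather than a heterogeneous Any matrix.
data = readdlm(path, ',', Float64)
```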

@JeffBezanson
Member

I think I can do a better job tuning the gc. There are some knobs to twiddle.

@timholy
Member

timholy commented Jun 18, 2013

@karbarcca, it may not be applicable to your case here, but I've also found the profiler can be extremely useful for figuring out gc issues (of course, maybe you're already doing that...). For example, just today I was writing some code that, over the course of a couple hours, I managed to make 300x faster. Some of the bottlenecks were gc's triggered from unexpected places. The majority of the time in my code was spent multiplying lots of 3x3 matrices by vectors, so it proved to be completely worth writing my own specialized function:

function gemv3!(res, A, x)
    res[1] = A[1]*x[1] + A[4]*x[2] + A[7]*x[3]
    res[2] = A[2]*x[1] + A[5]*x[2] + A[8]*x[3]
    res[3] = A[3]*x[1] + A[6]*x[2] + A[9]*x[3]
    res
end

In my initial version of this function, it lacked the last line (returning res), and surprisingly gc was getting called. That showed up as excessive time spent on the third row multiplication; since the profiler drills down into the C code it was easy to see that gc was responsible. Inserting the return of res eliminated gc from being executed on this line; by itself this one optimization netted a sizable performance boost. Eventually I managed to expunge all gc calls from the code (as well as make other important improvements).
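
That workflow can be sketched roughly as follows (hypothetical driver code around the gemv3! from the comment above; the Profile module is a stdlib in modern Julia, while in 2013 profiling lived in Base):

```julia
using Profile

# gemv3! from the comment above: 3x3 matrix-vector product into res
function gemv3!(res, A, x)
    res[1] = A[1]*x[1] + A[4]*x[2] + A[7]*x[3]
    res[2] = A[2]*x[1] + A[5]*x[2] + A[8]*x[3]
    res[3] = A[3]*x[1] + A[6]*x[2] + A[9]*x[3]
    res
end

M = rand(3, 3); x = rand(3); res = zeros(3)
@profile for i = 1:10^6
    gemv3!(res, M, x)
end
Profile.print()   # GC frames appear in the C-level part of the call tree
```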

@StefanKarpinski
Member

These kinds of things always give me a mix of horror and delight – horror that you have to write out that 3x3 matmul like that, but delight that you can do that and get it to be as fast as you need. We'll get there.

@JeffBezanson
Member

We do need more small-matrix optimizations. One problem especially is that the logic and code flow in matmul.jl has gotten really complex, and there are several layers of function calls.

@timholy
Member

timholy commented Jun 19, 2013

I fall mostly on the delight side. It took me, what, 2 minutes to write and wire into the various places in my code? Understanding the gc on the return value was more subtle and cost me more time, but nothing outrageous. The workflow "quickly write the simple version first" (which is really pleasant thanks to Julia's design and the nice library that everyone has been contributing to) -> "run it" -> "ugh, too slow" -> "profile it" -> "fix a few problems in places where it actually matters" -> "ah, that's much nicer!" is quite satisfying. If I had to write everything in C from the ground up it would be far more tedious. And doing a little hand-optimization gives us geeks something to make us feel like we're earning our keep.

The only time I get frustrated is when I can't get the performance I know I deserve :-). But that's getting rarer all the time.

I guess the major negative is that knowing what to do to solve performance problems does take some experience. Think of it as training---it makes us all stronger. I know I've gotten better thanks to the example code in julia and the help from the community.

@StefanKarpinski
Member

Well, that's a pleasant view of it. I guess the key points are that

  1. you can write the easy version that works, and
  2. you can usually get it fast with a bit more work.

@timholy
Member

timholy commented Jun 19, 2013

Back to the original issue, does anyone besides me think it's suspicious that it's a 6s increment after the first run? In saying this, I'm presuming that:

  • a single pass of gc() takes vastly less than 6s to run after the first iteration (although it might explain everything if it did take 6s). For the purposes of being able to make a specific argument, let's say one call to gc() takes < 0.1s.
  • any call to gc() clears all the garbage accumulated up to the current time.

If those two assumptions are true, why doesn't the second run take 0.1s longer, rather than 6s longer, than the first? I'm expecting that the first call to gc() that occurs after the 2nd call to readcsv should effectively take us back to the original state. Are we seeing some kind of "destructive interference" in the precise timing with which the gc gets automatically called?
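
One way to test the first assumption (a sketch with a made-up workload, not code from the thread): allocate something large, drop the reference, and time a single full collection. If one pass is well under 0.1s, a lone collection can't account for the 6s gap. (gc() from 2013 is GC.gc() in modern Julia.)

```julia
# Stand-in for the heap state left behind by a readcsv run
tmp = [rand(100) for _ in 1:10^4]
tmp = nothing                 # drop the reference so the arrays are garbage
t = @elapsed GC.gc()          # time one full collection
println("one full collection took $t seconds")
```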

@JeffBezanson
Member

It has a lot to do with the GC's heap growth schedule, which can introduce a sliver of super-linear run time behavior. If the heap grows too slowly, it has to do many unnecessary collections as this very large data structure is filled in.
#261 will help a lot with this.

JeffBezanson added a commit that referenced this issue Jun 20, 2013
- make it an error to resize an array with shared data (fixes #3430)
- now able to use realloc to grow arrays (part of #3440, helps #3441)
- the new scheme is simpler. one Array owns the data, instead of
  tracking the buffers separately as mallocptr_t
- Array data can be allocated inline, with malloc, or from a pool
@timholy
Member

timholy commented Jun 20, 2013

#261 does look awesome. Aside from the (much more important) performance improvements, presumably it would also make many finalizers run at more predictable times, which perhaps could be a good thing.

Now it doesn't sound relevant, but in cases where "slivers" of poor performance appear for very specific parameter combinations, is it worth asking whether introducing some element of randomness could be your friend?

@JeffBezanson
Member

@karbarcca can you try this again and see if there's any improvement?

@quinnj
Member Author

quinnj commented Jul 2, 2013

Unfortunately I can't get readcsv to run now; I think it has to do with some mmap stuff that was added? I'm on Windows 8 64-bit native build as of this morning (cd35600701).

julia> @time readcsv("C:/Users/karbarcca/Google Drive/Dropbox/Dropbox/Sears/Teradata/Sears-Julia/sales2.csv")
symbol could not be found fcntl (-1): The specified procedure could not be found.
symbol could not be found lseek (-1): The specified procedure could not be found.
symbol could not be found lseek (-1): The specified procedure could not be found.
symbol could not be found pwrite (-1): The specified procedure could not be found.
symbol could not be found lseek (-1): The specified procedure could not be found.
symbol could not be found mmap (-1): The specified procedure could not be found.
ERROR: ccall: could not find function fcntl
 in mmap_stream_settings at mmap.jl:74 (repeats 26666 times)

I'll try to dig into it some more to get it to run.

@ViralBShah
Member

I think this issue should be closed, and the mmap should be a new issue.

@quinnj
Member Author

quinnj commented Jul 2, 2013

Ok, got it to work with the following, and with multiple runs I get consistent times. Is use_mmap supposed to default to true? It seems like it might be better to default to false until it's working on all platforms.

@time readdlm("C:/Users/karbarcca/Google Drive/Dropbox/Dropbox/Sears/Teradata/Sears-Julia/sales2.csv",',',Float64;use_mmap=false)

@tanmaykm
Member

tanmaykm commented Jul 2, 2013

It seems like mmap support is not complete on Windows. Can you try passing use_mmap=false to readcsv?

@quinnj quinnj closed this as completed Jul 2, 2013

7 participants