-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added rfft! and irfft! functionality through PaddedRFFTArray type. #54
base: master
Are you sure you want to change the base?
Conversation
Due to the rapid changes on Julia now, I'll wait for the release of v0.7 to finish up this PR. |
Bump! Could you finish this up with v0.7-dev? It would be nice if this will release before v0.7 official release. |
I think all to major changes on Julia have been made. I'll work on it again. |
I managed to avoid using unsafe views and this PR is ready for technical review now. If it is acceptable, it will only be missing docstrings. julia> versioninfo()
Julia Version 0.7.0-DEV.4702
Commit 524710f60d (2018-03-26 19:44 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin17.4.0)
CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:
JULIA_NUM_THREADS = 2
julia> using FFTW
julia> a = PaddedRFFTArray(rand(8,8))
5×8 PaddedRFFTArray{Float64,2,false}:
0.712979+0.28724im 0.509384+0.0177532im … 0.958592+0.884517im 0.685513+0.401894im
0.361746+0.369876im 0.968775+0.193823im 0.0521561+0.815999im 0.924453+0.528454im
0.835827+0.403952im 0.775697+0.187061im 0.106656+0.0442893im 0.321887+0.859492im
0.784735+0.141277im 0.436122+0.198054im 0.580609+0.22239im 0.266962+0.252452im
0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im
julia> rfft!(a)
5×8 PaddedRFFTArray{Float64,2,false}:
32.2515+0.0im -2.36603+0.797907im … 2.8308-1.49447im -2.36603-0.797907im
0.521591+1.17158im -0.190847-0.681372im -0.334622+1.19408im -2.46524-3.4627im
1.14537-2.0369im 0.383983-0.265645im -1.52369-1.83912im 0.0772888+2.13619im
1.39388-1.77041im 1.46133+3.4213im -1.50413+0.570226im 0.00780449-0.57811im
2.6889+0.0im 3.28556-0.875104im 1.79742+1.84643im 3.28556+0.875104im
julia> b = Array(a)
5×8 Array{Complex{Float64},2}:
32.2515+0.0im -2.36603+0.797907im … 2.8308-1.49447im -2.36603-0.797907im
0.521591+1.17158im -0.190847-0.681372im -0.334622+1.19408im -2.46524-3.4627im
1.14537-2.0369im 0.383983-0.265645im -1.52369-1.83912im 0.0772888+2.13619im
1.39388-1.77041im 1.46133+3.4213im -1.50413+0.570226im 0.00780449-0.57811im
2.6889+0.0im 3.28556-0.875104im 1.79742+1.84643im 3.28556+0.875104im
julia> using BenchmarkTools
julia> @benchmark @inbounds getindex($(b),1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.542 ns (0.00% GC)
median time: 1.549 ns (0.00% GC)
mean time: 1.552 ns (0.00% GC)
maximum time: 5.031 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark @inbounds getindex($(a.c),1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 15.493 ns (0.00% GC)
median time: 15.699 ns (0.00% GC)
mean time: 15.729 ns (0.00% GC)
maximum time: 50.897 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
julia> @benchmark @inbounds getindex($(a),1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.846 ns (0.00% GC)
median time: 1.853 ns (0.00% GC)
mean time: 1.856 ns (0.00% GC)
maximum time: 9.709 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark sum($b)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 34.567 ns (0.00% GC)
median time: 34.638 ns (0.00% GC)
mean time: 34.957 ns (0.00% GC)
maximum time: 91.601 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 993
julia> @benchmark sum($(a.c))
BenchmarkTools.Trial:
memory estimate: 208 bytes
allocs estimate: 6
--------------
minimum time: 865.900 ns (0.00% GC)
median time: 875.400 ns (0.00% GC)
mean time: 1.011 μs (9.66% GC)
maximum time: 705.103 μs (99.77% GC)
--------------
samples: 10000
evals/sample: 60
julia> @benchmark sum($(a))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 35.123 ns (0.00% GC)
median time: 35.186 ns (0.00% GC)
mean time: 35.756 ns (0.00% GC)
maximum time: 70.551 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 992 |
Did I mess this pull request up with incorrect rebase/merges stuff? Or this is the way to go? Should I start a new clean PR with those final changes? |
I consider this finished now. Julia v0.7 probably won't have any other disruptive change. |
d = data(A) | ||
i = $ip | ||
I = $t | ||
@boundscheck checkbounds(d,i,I...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it also check bounds with i-1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess since ip=2*I2[1]
, if it is in-bounds then i-1
must also be in-bounds.
src/rfft!.jl
Outdated
(1 in region) || throw(ArgumentError("The first dimension must always be transformed")) | ||
if flags&PRESERVE_INPUT != 0 | ||
a = similar(X) | ||
return rFFTWPlan{Complex{T},BACKWARD,true,N}(complex_view(a), real(a), region, flags,timelimit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here? IIRC, PRESERVE_INPUT
only affects out-of-place plans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my... That is reminiscent from early stage development, when I didn't notice that inplace r2c and c2r were already exposed on Julia. I tried hacking out-of-place plans to work inplace, thus this PRESERVE_INPUT flag.
I'll change this.
test/test_rfft!.jl
Outdated
@testset "Read binary file to PaddedRFFTArray" begin | ||
for s in ((8,4,4),(9,4,4),(8,),(9,)) | ||
aa = rand(Float64,s) | ||
f = Base.Filesystem.tempname() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should be able to test i/o with an IOBuffer
rather than a file.
src/rfft!.jl
Outdated
else | ||
x = similar(X) | ||
p = rFFTWPlan{T,FORWARD,true,N}(real(x), complex_view(x), region, flags, timelimit) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you are trying to avoid overwriting X
during planning, but I don't think it's a good idea:
-
plan_fft
is documented to overwrite the input during non-ESTIMATE planning, so we should be consistent. -
Creating the plan for
x = similar(X)
could result in the plan not being applicable toX
in the unlikely event that theX
array is not 16-byte aligned.
Sorry for this late change. But I just observed that performance is slightly better if we use julia> a = rand(128,128,128);
julia> b = view(a,1:126,:,:);
julia> c = view(a,1:126,1:128,1:128);
julia> using BenchmarkTools
julia> @benchmark sum($b)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 4.533 ms (0.00% GC)
median time: 4.620 ms (0.00% GC)
mean time: 4.689 ms (0.00% GC)
maximum time: 6.707 ms (0.00% GC)
--------------
samples: 1064
evals/sample: 1
julia> @benchmark sum($c)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.150 ms (0.00% GC)
median time: 5.208 ms (0.00% GC)
mean time: 5.267 ms (0.00% GC)
maximum time: 7.113 ms (0.00% GC)
--------------
samples: 947
evals/sample: 1
julia> @benchmark reduce($(*),$b)
BenchmarkTools.Trial:
memory estimate: 560 bytes
allocs estimate: 14
--------------
minimum time: 4.531 ms (0.00% GC)
median time: 4.608 ms (0.00% GC)
mean time: 4.678 ms (0.00% GC)
maximum time: 6.404 ms (0.00% GC)
--------------
samples: 1067
evals/sample: 1
julia> @benchmark reduce($(*),$c)
BenchmarkTools.Trial:
memory estimate: 560 bytes
allocs estimate: 14
--------------
minimum time: 5.161 ms (0.00% GC)
median time: 5.227 ms (0.00% GC)
mean time: 5.291 ms (0.00% GC)
maximum time: 7.780 ms (0.00% GC)
--------------
samples: 943
evals/sample: 1
julia> versioninfo()
Julia Version 0.7.0-beta2.0
Commit b145832402* (2018-07-13 19:54 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin17.6.0)
CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
JULIA_NUM_THREADS = 2 |
src/rfft!.jl
Outdated
rsize = (nx,rrsize[2:end]...) | ||
r = view(rr,(1:l for l in rsize)...) | ||
return new{T, N, N === 1 ? true : false}(rr,r,c) | ||
r = view(rr, 1:nx, ntuple(i->Colon(),Nm1)...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass Val{Nm1}()
here to get a compiler-unrolled ntuple
on 0.7. On 0.6 you need Val{Nm1}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Val(n)
is a pure function now I can just put Val(Nm1)
instead of Val{Nm1}()
, right?
LGTM, now just needs docs. |
Would be good to get documentation so that we can merge this. |
Just to give some update. I' currently finishing my thesis, so I don't think I'll be able to work on this until a month from... |
@favba news on this PR? Seems neat and it would be very nice to have it merged in FFTW.jl |
I would also very appreciate that feature! |
I took a stab at reviving this, but it has some issues. My motivation was to allow simply replacing There's little difference in performance because most of the time is spent in the C routines, but there is less memory churn. My version can be found here: https://github.com/BioTurboNick/FFTW.jl/tree/in-place-rfft I don't know the API that well, so I've hit a stumbling block in a couple places. Maybe someone else can pick up from here. |
The only missing piece for this PR is documentation, correct? @BioTurboNick, I didn't understand what is the missing functionality you are adding in your PR. Could you try clarifying that to me? Maybe it can be done here. |
A few issues:
|
Also, not sure if all the special indexing methods in this PR are necessary anymore, as those are noted as needed due to inefficiencies back in Julia v0.7. |
Both issues
and
are related to the fact that there is not a one-to-one mapping between the size of the complex array and the corresponding real array after the inverse transform. The Now, in order to be able to use And I agree that
|
Unfortunately, it seems those issues are still present on Julia 1.6.3 julia> versioninfo()
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.5.0)
CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_CUDA_SILENT = true
JULIA_MPI_PATH = /usr/local/Cellar/open-mpi/4.0.3
JULIA_NUM_THREADS = 4
julia> a = rand(256,256);
julia> b = reinterpret(Complex{Float64},a);
julia> c = Array(b);
julia> using BenchmarkTools
julia> @benchmark sum($c)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 9.504 μs … 55.548 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.745 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.203 μs ± 2.793 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▅▁ ▁
████▇▅▅▆▄▁▁▃▅▁▁▁▄▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▅▄▁▃▁▃█▇▇▅▄▃▃▃▆ █
9.5 μs Histogram: log(frequency) by time 28.5 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sum($b)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 20.482 μs … 130.839 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 21.765 μs ┊ GC (median): 0.00%
Time (mean ± σ): 23.524 μs ± 5.070 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▇██▅▃ ▁ ▃▃▄▄▄▄▄▄▂ ▁▁ ▁ ▂
▆██████████████████▇▅▆▅▅▃▁▅▁▃▃▁▃▄▁▃▁▄▄▁▁▄▆██▇▅▅▄███▇▇▇▆▇▇▆▆▆ █
20.5 μs Histogram: log(frequency) by time 44.6 μs <
Memory estimate: 0 bytes, allocs estimate: 0. |
Does my PR fail in some way? Because it implements what I suggested with respect to being able to use |
Ok, I see it now... Since you kept the I think that can be easily incorporated into this PR, keeping different signatures for And I think I know a way to make a = rand(256,256)
b = rfft!(irfft!(rfft(a),size(a,1))) # same as rfft(a) @BioTurboNick , do you mind If I incorporate those features you provided in your PR to this one? or would you rather keep working on that one? |
Absolutely, go for it! |
Maintaining good index performance while adding those features is trickier than I thought it would be... |
Ah, dang. Fair enough. Wonder what would be necessary to resolve the underlying slowness... I have a tendency to poke around on problems like that. |
If it's a julia issue can you open an issue there? Maybe it can be fixed |
I took a look and it appears to be a fairly low-level issue. At the bottom is one or more I was about to suggest It sounds like the main issue is you want to ensure consumers can't normally get a reference to the wrapped data. Thus, it could be have a Something like:
|
Try Also, in case it's helpful: https://github.com/HolyLab/RFFT.jl. Steal anything you like, keeping in mind that was originally written back in the Julia 0.2 days or so and only minimally updated since then. (Prior to docstrings, Documenter.jl, my current understanding of fast & inferrable tuple handling, keyword arguments, etc.) |
I think the issue is at a lower level than that... see my benchmark here. It doesn't seem to involve any reshaping
That is my plan... to fiddle with references and/or pointers again. I'd put the "unsafe" indexing routines inside this WrappedArray (in the case it is initialized with a complex array), so it is hidden in this extra layer, and maybe keep the current indexing strategy for the complex view of the array. a = rand(256,256)
irff!(rfft(a),256) should work. |
And that's the problem. julia> a = rand(256,256); ars = rand(2, 128, 256);
julia> b = reinterpret(Complex{Float64},a); brs = reinterpret(reshape, Complex{Float64}, ars);
julia> @btime sum($b);
20.360 μs (0 allocations: 0 bytes)
julia> @btime sum($brs);
11.583 μs (0 allocations: 0 bytes)
julia> @btime sum($c);
11.362 μs (0 allocations: 0 bytes) |
The new implementation is here. |
This PR addresses #52. This is a rewrite of my code with fixes to address possible problems with usage of
unsafe_wrap
raised here. I don't fully understand those issues, so an expert review is needed to see if the changes I introduced indeed solved the problem.The code is already functional. The test I wrote originally for my package are passing. The code still needs documentation of the new type and new functions.