I've used Julia for some time, but I still know little about it, especially about parallel computing. I want to build a new array from existing data. Since the array is very large, I want to do it in parallel, and my code is as follows:
ψ1 = Array{Complex}(undef, D)
ψ = rand(Complex{Float64},D)
Threads.@threads for k = 1:D
ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])
end
I run it with julia -t 4, but it turns out to be very slow compared to the following non-parallel code:
ψ1 = [@views sum(ψ[GetBasis(k - 1, N)]) for k=1:D]
I have no idea why this happens. GetBasis() is just a function that generates an Array{Int64,1} of length N.
I would like to ask how I could improve the first code, or whether there is some way to modify the second code so it also runs in parallel. The array can be very large, and I want to find a way to speed this up...
Thanks a lot, and I look forward to your replies!
The complete code is as follows:
function GetBasis(n, N)
xxN = collect(0:N-1)
BI = BitFlip.(n,xxN).+1
end
function BitFlip(n, i)
n = n ⊻ (1 << i)
end
N=24
D=2^N
ψ1 = Array{Complex}(undef, D)
ψ = rand(Complex{Float64},D)
Threads.@threads for k = 1:D
ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])
end
ψ2 = [@views sum(ψ[GetBasis(k - 1, N)]) for k=1:D]
When everything is done properly, the multithreaded code is indeed faster.
function GetBasis(n, N)
BI = BitFlip.(n,0:N-1).+1
end
function BitFlip(n, i)
n = n ⊻ (1 << i)
end
const N=24
const D=2^N
const ψ = rand(Complex{Float64},D)
const ψ1 = Vector{Complex{Float64}}(undef, D)
using BenchmarkTools
Threads.nthreads() # should return 4 or more
# set JULIA_NUM_THREADS environment variable
Now testing:
julia> GC.gc()
julia> @btime ψ2 = [@views sum(ψ[GetBasis(k - 1, N)]) for k=1:D];
5.591 s (16777218 allocations: 4.50 GiB)
julia> GC.gc()
julia> @btime Threads.@threads for k = 1:D
@inbounds ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])
end
2.293 s (16777237 allocations: 4.25 GiB)
Note the amount of memory allocated by this code: you need to run the garbage collector before each benchmark, and the results will be less meaningful if your machine has less than 16 GB of RAM.
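The benchmarks above run in global scope, which is why the consts matter. As a sketch (repeating the BitFlip/GetBasis definitions so the snippet is self-contained), the same threaded loop can be wrapped in a function; this makes it type stable without any const globals:

```julia
# Bit-flip helpers, as in the original code
BitFlip(n, i) = n ⊻ (1 << i)
GetBasis(n, N) = BitFlip.(n, 0:N-1) .+ 1

# Wrapping the threaded loop in a function gives type stability
# without needing `const` globals.
function threaded_sum(ψ, N)
    D = length(ψ)
    ψ1 = Vector{Complex{Float64}}(undef, D)
    Threads.@threads for k in 1:D
        @inbounds ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])
    end
    return ψ1
end
```

Called as threaded_sum(ψ, N), this returns the same result as the comprehension version, and @btime threaded_sum(ψ, N) can then be benchmarked without the globals.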
Thank you very much! The performance is much better now! May I ask why I should define ψ and ψ1 as const? Also, I tried studying N=26, and in that case the parallel code is again slower than the non-parallel one. I'm not sure why; my machine has 16 GB of RAM.
For N=26 you are running into garbage collection. You would need 64 GB of RAM to test the performance for N=26.
Regarding const: if you are benchmarking code that lives outside of a function, you need to make it type stable. When you put that code inside a function, you can remove the consts. See docs.julialang.org/en/v1/manual/performance-tips for details.
Then, does it mean that, in case the RAM of my machine is not enough, the parallel code may be slower than the non-parallel one? If so, is there some way to improve it? I also want to run it on a GPU, but I could not find a function similar to @threads in Julia's CUDA package...
For GPU computing, look at juliagpu.github.io/CUDA.jl/stable/usage/array. Again, you will run into memory problems. "The parallel code may be slower than the non-parallel one": it depends on how the garbage collector behaves. You would need to benchmark, but in that case the benchmarks would need to be run many hundreds of times, due to the high variance caused by the garbage collector.
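One way to reduce the garbage-collection pressure itself (a sketch, not benchmarked here) is to avoid allocating the index vector at all: accumulate the N bit-flipped entries directly inside the loop, which computes the same thing as indexing with GetBasis but without materializing a temporary Array{Int64,1} on every iteration:

```julia
# Allocation-free variant: instead of building the index vector from
# GetBasis and indexing with it, sum the N bit-flipped entries directly.
# This removes the per-iteration Vector{Int64} allocation, so the
# garbage collector has far less work to do.
function threaded_sum_noalloc(ψ, N)
    D = length(ψ)
    ψ1 = Vector{Complex{Float64}}(undef, D)
    Threads.@threads for k in 1:D
        n = k - 1
        s = zero(Complex{Float64})
        for i in 0:N-1
            # (n ⊻ (1 << i)) + 1 is exactly the index GetBasis produces
            @inbounds s += ψ[(n ⊻ (1 << i)) + 1]
        end
        @inbounds ψ1[k] = s
    end
    return ψ1
end
```

On the GPU side, the CUDA.jl array interface linked above should support indexing a CuArray with an array of indices (a "gather"), so a similar per-bit accumulation can be expressed with broadcasts over CuArrays, though the same memory limits apply there as well.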