
Calculate Array in a Parallel Way in Julia

Posted on 2020-11-28 17:53:24

I've used Julia for some time but still know little about it, especially parallel computing. I want to build a new array from existing data, and because the array is very large I want to do it in parallel. The code is as follows:

ψ1 = Array{Complex}(undef, D)
ψ = rand(Complex{Float64},D)
Threads.@threads for k = 1:D   
    ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])  
end

I run it with julia -t 4, but it turns out to be very slow compared to the non-parallel code:

ψ1 = [@views sum(ψ[GetBasis(k - 1, N)]) for k=1:D]

I have no idea why this happens. GetBasis() is just a function that generates an array of N indices, of type Array{Int64,1}.

I would like to ask how I could improve the first version, or whether there is some way to modify the second version so that it also runs in parallel. The array can be very large, so I want to find a way to speed this up.

Thanks a lot; I look forward to your replies!

The complete code is as follows:

function GetBasis(n, N)
    xxN = collect(0:N-1)
    BI = BitFlip.(n,xxN).+1
end


function BitFlip(n, i)
    n = n ⊻ (1 << i)
end

N=24
D=2^N

ψ1 = Array{Complex}(undef, D)
ψ = rand(Complex{Float64},D)
Threads.@threads for k = 1:D   
    ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])  
end

ψ2 = [@views sum(ψ[GetBasis(k - 1, N)]) for k=1:D]

Questioner
wangjz87
Przemyslaw Szufel 2020-11-29 08:07:24

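When everything is set up properly, the multithreaded code is indeed faster. Compared with the question's code, the version below declares the globals as const, gives ψ1 a concrete element type (Vector{Complex{Float64}} rather than Array{Complex}), and broadcasts over the range 0:N-1 instead of a collected array: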

function GetBasis(n, N)
    BI = BitFlip.(n,0:N-1).+1
end

function BitFlip(n, i)
    n = n ⊻ (1 << i)
end

const N=24
const D=2^N

const ψ = rand(Complex{Float64},D)
const ψ1 = Vector{Complex{Float64}}(undef, D)

using BenchmarkTools

Threads.nthreads() # should return 4 or more; if it does not, set the
                   # JULIA_NUM_THREADS environment variable or start with julia -t 4
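
To illustrate why the element type of the output array matters, here is a minimal sketch comparing the question's Array{Complex} with Vector{Complex{Float64}}. The variable names abstract_vec and concrete_vec are made up for this illustration; the type queries are from Julia's Base.

```julia
# Complex without a type parameter is not a concrete type, so an
# Array{Complex} stores references to individually allocated elements.
abstract_vec = Array{Complex}(undef, 4)

# Complex{Float64} is concrete and is a plain-bits type, so the elements
# can be stored inline in one contiguous buffer.
concrete_vec = Vector{Complex{Float64}}(undef, 4)

println(isconcretetype(eltype(abstract_vec)))  # false
println(isconcretetype(eltype(concrete_vec)))  # true
println(isbitstype(eltype(concrete_vec)))      # true
```

With the non-concrete element type, each assignment ψ1[k] = ... most likely has to allocate a boxed value, which adds garbage-collector pressure inside the threaded loop.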

Now testing:

julia> GC.gc()

julia> @btime ψ2 = [@views sum(ψ[GetBasis(k - 1, N)]) for k=1:D];
  5.591 s (16777218 allocations: 4.50 GiB)

julia> GC.gc()

julia> @btime Threads.@threads for k = 1:D
           @inbounds ψ1[k] = @views sum(ψ[GetBasis(k - 1, N)])
       end
  2.293 s (16777237 allocations: 4.25 GiB)

Note the amount of memory this code allocates: you need to run the garbage collector before each benchmark, and the test will be less meaningful if your machine has less than 16 GB of RAM.
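
Most of that memory comes from GetBasis building a fresh index array on every iteration. As a possible follow-up (not part of the benchmark above, and not timed here), the sum can be accumulated directly over the flipped indices so that no temporary array is needed. This is only a sketch: flip_sum and flip_sums! are hypothetical names, and the demo size is kept small on purpose.

```julia
# Sum ψ over the N indices obtained by flipping each of the low N bits
# of n, without materialising the index array that GetBasis returns.
function flip_sum(ψ, n, N)
    s = zero(eltype(ψ))
    for i in 0:N-1
        s += ψ[(n ⊻ (1 << i)) + 1]   # +1 converts to 1-based indexing
    end
    return s
end

# Fill `out` in place, one element per loop iteration, spread over the
# available threads. Each iteration writes a different out[k].
function flip_sums!(out, ψ, N)
    Threads.@threads for k in eachindex(out)
        out[k] = flip_sum(ψ, k - 1, N)
    end
    return out
end

# Small demo (assumed sizes; the question uses N = 24)
N_demo = 10
ψ_demo = rand(Complex{Float64}, 2^N_demo)
ψ1_demo = similar(ψ_demo)
flip_sums!(ψ1_demo, ψ_demo, N_demo)
```

Wrapping the loop in a function that takes ψ and the output as arguments also avoids relying on global variables inside the threaded loop.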