朱莉娅：向量化与非向量化代码

Question

朱莉娅：向量化与非向量化代码

3

据我所了解，Julia旨在使for循环更快，与向量化操作一样快。我编写了一个简单函数的三个版本，使用for循环和向量化操作分别计算距离，以及使用DataFrames实现向量化操作：

x = rand(500)
y = rand(500)
a = rand()
b = rand()

function devect()
    dist = Array(Float64, 0)
    twins = Array(Float64, 0,2)

    for i in 1:500
        dist = [dist; sqrt((x[i] - a)^2 + (y[i] - b)^2)]
        if dist[end] < 0.05
            twins = [twins; [x y][end,:]]
        end
    end

    return twins
end

function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end

using DataFrames

function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)

    return df[dist .< 0.05,:]
end

n = 10^3

@time for i in [1:n] devect() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end

输出：

elapsed time: 4.308049576 seconds (1977455752 bytes allocated, 24.77% gc time)
elapsed time: 0.046759167 seconds (37295768 bytes allocated, 54.36% gc time)
elapsed time: 0.052463997 seconds (30359752 bytes allocated, 49.44% gc time)

为什么向量化版本执行速度更快？

- Anarcho-Chossid

这可能会有用：https://software.intel.com/en-us/articles/vectorization-in-julia - user3467349

1

这可能是一个全局作用域问题。尝试重写你的函数，使用本地变量而不是全局变量。这通常会影响时间。 - ptb

@ptb 我最初就是这样做的（在每个函数内部声明a、b、x和y）。没有帮助。 - Anarcho-Chossid

使用inbounds和simd并不能显著加速devect()函数。 - Anarcho-Chossid

3

仔细观察您在devect()函数中的代码，巨大的减速很可能是由于您构造返回值的方式。它们一开始是零元素数组，然后在每次迭代时重新赋值（即重新分配）为新数组。 - ptb

3个回答

7

关于在devect中构建解决方案的方法，我想跟进我的评论。这是我的代码：

julia> x, y, a, b = rand(500), rand(500), rand(), rand()

julia> function devect{T}(x::Vector{T}, y::Vector{T}, a::T, b::T)
       res = Array(T, 0)
       dim1 = 0
       for i = 1:size(x,1)
           if sqrt((x[i]-a)^2+(y[i]-b)^2) < 0.05
               push!(res, x[i])
               push!(res, y[i])
               dim1 += 1
           end
       end
       reshape(res, (2, dim1))'
   end
devect (generic function with 1 method)

julia> function vect{T}(x::Vector{T}, y::Vector{T}, a::T, b::T)
       d = sqrt((x-a).^2+(y-b).^2)
       [x y][d.<0.05, :]
   end
vect (generic function with 1 method)

julia> @time vect(x, y, a, b)
elapsed time: 3.7118e-5 seconds (37216 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time vect(x, y, a, b)
elapsed time: 7.1977e-5 seconds (37216 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time devect(x, y, a, b)
elapsed time: 1.7146e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time devect(x, y, a, b)
elapsed time: 1.3065e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time devect(x, y, a, b)
elapsed time: 1.8059e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

可能有更快的方法来实现去向量化解决方案，但请注意分配的字节数差异。如果一个去向量化的解决方案分配的内存比一个向量化的解决方案更多，那么它很可能是错误的（至少在Julia中是如此）。

- ptb

谢谢！我从函数中删除了a = [a; b]方法，现在我的devect()比vect()运行速度快了几倍。顺便问一下，有没有办法将一个向量推到另一个向量的末尾？ - Anarcho-Chossid

append! 函数可能是你正在寻找的。 - ptb

其实，我无法理解你的函数如何只分配了376个字节。我复制并粘贴了你的函数，但结果是：耗时：0.012100854秒（分配了197856个字节） - Anarcho-Chossid

我只计时了一次执行，但在你的代码中，你累积了1000次的时间。输入数组也是随机的，所以我不会预期得到相同的结果。 - ptb

好的，我运行了几次（每次n=1），大多数情况下，它会给出一些小字节数，就像你的例子一样，但是偶尔，新的devect()函数的数字会比vect()函数的数字高。 - Anarcho-Chossid

3

你的取消向量化的代码不是很高效。我做了以下更改：

Made all the global variables constant
I preallocated the output vector, rather than appending each time

I show two different ways you could have devectorized the output in a more straightforward way

const x = rand(500)
const y = rand(500)
const a = rand()
const b = rand()

function devect()
    dist = Array(Float64, 500)

    for i in 1:500
        dist[i] = sqrt((x[i] - a)^2 + (y[i] - b)^2)
    end

    return [x y][dist .< 0.05,:]
end

function devect2()
    pairs = Array(Float64, 500, 2)

    for i in 1:500
        dist = sqrt((x[i] - a)^2 + (y[i] - b)^2)
        if dist < 0.05
            pairs[i,:] = [x[i], y[i]]
        end
    end

    return pairs
end

function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end

using DataFrames

function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)

    return df[dist .< 0.05,:]
end

const n = 10^3

@time for i in [1:n] devect() end
@time for i in [1:n] devect2() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end

输出结果为：

elapsed time: 0.009283872 seconds (16760064 bytes allocated)
elapsed time: 0.003116157 seconds (8456064 bytes allocated)
elapsed time: 0.050070483 seconds (37248064 bytes allocated, 44.50% gc time)
elapsed time: 0.0566218 seconds (30432064 bytes allocated, 40.35% gc time)

- Rob Donnelly

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- StefanKarpinski · Accepted Answer

https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-global-variables

你的代码中随处使用非常量全局变量，这意味着你基本上回到了解释语言的性能领域，因为在编译时无法保证它们的类型。为了快速提高速度，只需在所有全局变量赋值前加上const前缀。