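As far as I understand, Julia is supposed to make for loops fast, as fast as vectorized operations. I wrote three versions of a simple function that computes distances: one using a for loop, one using vectorized operations, and one using vectorized operations on a DataFrame: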
x = rand(500)
y = rand(500)
a = rand()
b = rand()
function devect()
    dist = Array(Float64, 0)
    twins = Array(Float64, 0,2)
    for i in 1:500
        # append the distance from point i to (a, b)
        dist = [dist; sqrt((x[i] - a)^2 + (y[i] - b)^2)]
        if dist[end] < 0.05
            # keep point i, so the result matches vect()
            twins = [twins; [x y][i,:]]
        end
    end
    return twins
end
function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end
using DataFrames
function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)
    return df[dist .< 0.05,:]
end
n = 10^3
@time for i in [1:n] devect() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end
Output:
elapsed time: 4.308049576 seconds (1977455752 bytes allocated, 24.77% gc time)
elapsed time: 0.046759167 seconds (37295768 bytes allocated, 54.36% gc time)
elapsed time: 0.052463997 seconds (30359752 bytes allocated, 49.44% gc time)
Why does the vectorized version run so much faster?