如何使Julia代码更加高效?目前它的性能甚至不如Python。

4
我是一个新手,刚开始接触Julia,并且听说它有惊人的性能。但我还没有体验到那些承诺的性能。我尝试了许多增强性能的方法,这些方法在《JULIA HIGH PERFORMANCE》一书中有描述,但这使得代码的可读性稍微降低了一点。但是,至少在基准测试案例中,我的Python代码比我的Julia代码快3倍以上。 要么是我的代码做错了什么,在Julia中犯了罪,要么就是Julia无法胜任。请证明后者是错误的。
在代码中,我正在尝试将不同的球分配到不同的盒子中,并且每个盒子的容量都有最大和最小限制。放置球的顺序也很重要。我需要在最短的时间内生成所有可能的分配,并满足给定的约束条件。
PYTHON CODE:
import itertools
import time

max_balls = 5
min_balls = 0

def get_assignments(balls, boxes, assignments=[[]]):
    all_assignments = []
    upper_ball_limit = len(balls)
    if upper_ball_limit > max_balls:
        upper_ball_limit = max_balls

    n_boxes = len(boxes)
    lower_ball_limit = len(balls) - upper_ball_limit * (n_boxes - 1)
    if lower_ball_limit < min_balls:
        lower_ball_limit = min_balls

    if len(boxes) == 0:
        raise Exception("No delivery boys found")

    elif len(boxes) == 1:
        for strategy in itertools.permutations(balls, upper_ball_limit):
            #             valid = evaluate_strategy(strategy, db_id)
            for subplan in assignments:
                subplan_copy = subplan[:]
                box_id = boxes[0]
                subplan_copy.append((box_id, strategy))
                all_assignments.append(subplan_copy)
        return all_assignments
    else:
        box_id = boxes[0]
        for i in range(lower_ball_limit, upper_ball_limit+ 1):
            for strategy in itertools.permutations(balls, i):
                temp_plans = []
                for subplan in assignments:
                    subplan_copy = subplan[:]
                    subplan_copy.append((box_id, strategy))
                    temp_plans.append(subplan_copy)
                    remaining_balls = set(balls).difference(strategy)
                    remaining_boxes = list(set(boxes).difference([box_id]))
                    if remaining_balls:
                        all_assignments.extend(get_assignments(remaining_balls, remaining_boxes, temp_plans))
                    else:
                        all_assignments.extend(temp_plans)
        return all_assignments

balls = range(1, 9)
boxes = [1, 2, 3, 4]

t = time.time()
all_assignments = get_assignments(balls, boxes)

print('Number of assignments: %s' % len(all_assignments))
print('Time taken: %s' % (time.time()-t))

以下是我尝试编写上述内容的JULIA代码:
#!/usr/bin/env julia

using Combinatorics

const max_balls=5
const min_balls=0

function plan_assignments(balls::Vector{Int32}, boxes ; plans=[Vector{Tuple{Int32,Array{Int32,1}}}(length(boxes))])
const n_boxes = length(boxes)
const n_balls = length(balls)
const n_plans = length(plans)

if n_boxes*max_balls < n_balls
  print("Invalid Inputs: Number of balls exceed the number of boxes.")
end

all_plans = Vector{Tuple{Int32,Array{Int32,1}}}[]
upper_box_limit = n_balls
if upper_box_limit > max_balls
    upper_box_limit = max_balls
end
lower_box_limit = n_balls - upper_box_limit * (n_boxes-1)

if lower_box_limit < min_balls
    lower_box_limit = min_balls
end

if n_boxes == 1
    box_id = boxes[1]
    @inbounds for strategy in Combinatorics.permutations(balls, upper_box_limit)
        @inbounds for subplan in plans
            subplan = subplan[:]
            subplan[tn_boxes - n_boxes + 1] = (box_id, strategy)
            all_plans = push!(all_plans, subplan)
        end
    end
    return all_plans

else
    box_id = boxes[1]
    @inbounds for i in lower_box_limit:upper_box_limit
        @inbounds for strategy in Combinatorics.permutations(balls, i)
            temp_plans = Array{Vector{Tuple{Int32,Array{Int32,1}}},1}(n_plans)
            # temp_plans = []

            @inbounds for (i,subplan) in zip(1:n_plans, plans)
                subplan = subplan[:]
                subplan[tn_boxes - n_boxes + 1] = (box_id, strategy)
                temp_plans[i] = subplan
                # subplan = push!(subplan, (db_id, strategy))
                # temp_plans = push!(temp_plans, subplan)

                remaining_balls = filter((x) -> !(x in strategy), balls)
                remaining_boxes = filter((x) -> x != box_id , boxes)

                if length(remaining_balls) > 0
                  @inbounds for plan in plan_assignments(remaining_balls, remaining_boxes, plans=temp_plans)
                    push!(all_plans, plan)
                  end
                    # append!(all_plans, plan_assignments(remaining_orders, remaining_delivery_boys, plans=temp_plans))
                else
                  @inbounds for plan in temp_plans
                    push!(all_plans, plan)
                  end
                    # append!(all_plans, temp_plans)

                end
            end
        end
    end
    return all_plans
end
end

balls = Int32[1,2,3,4,5,6,7,8]
boxes = Int32[1,2,3,4]

const tn_boxes = length(boxes)

@timev all_plans = plan_assignments(balls, boxes)
print(length(all_plans))

我的基准测试时间如下:
对于Python:

Number of assignments: 5040000
Time taken: 22.5003659725

对于Julia语言:(这是在不考虑编译时间的情况下。)
76.986338 seconds (122.94 M allocations: 5.793 GB, 77.01% gc time)
elapsed time (ns): 76986338257
gc time (ns):      59287603693
bytes allocated:   6220111360
pool allocs:       122932049
non-pool GC allocs:10587
malloc() calls:    11
realloc() calls:   18
GC pauses:         270
full collections:  28

6
77%的时间被用于垃圾回收...看起来问题在于你创建了大量的对象。CPython通过使用引用计数和GC一起工作来避免这样的大开销。请注意,77秒中的23%约为17秒,因此如果不计算在gc上花费的时间,Julia将比CPython更快...不幸的是我不知道Julia,所以我无法告诉你为什么gc会花费如此多的时间。 - Bakuriu
你需要在 subplan = subplan[:] 中复制 subplan 吗? - Alexander Morley
1
你有没有查看过像[1,2,3]和[1,2,3]这样的简单示例的数据结构内容?我看到了一些#undef。输出应该是什么样子的?另外,你所拥有的数据结构似乎太复杂了。 - David P. Sanders
1
过度分配和垃圾回收器使用大量时间是类型不稳定的预警信号(因为所有事物都必须被装箱,导致出现许多临时变量)。 @code_warntype 的输出是什么?您能够使用 ProfileView 显示瓶颈在哪里吗? - Chris Rackauckas
不相关的:在“if”、“elif”中应该用“n_boxes”代替“len(boxes)”。 - user
显示剩余9条评论
2个回答

7
这是 Julia 的另一个版本,更符合惯用语法,修改以避免递归和一些内存分配:
using Iterators
using Combinatorics

histograms(n,m) = [diff([0;x;n+m]).-1 for x in combinations([1:n+m-1;],m-1)]

good_histograms(n,m,minval,maxval) = 
  [h for h in histograms(n,m) if all(maxval.>=h.>=minval)]

typealias PlanGrid Matrix{SubArray{Int,1,Array{Int,1},Tuple{UnitRange{Int}},true}}

function assignmentplans(balls,boxes,minballs,maxballs)
  nballs, nboxes = length(balls),length(boxes)
  nperms = factorial(nballs)
  partslist = good_histograms(nballs,nboxes,minballs,maxballs)
  plans = PlanGrid(nboxes,nperms*length(partslist))
  permutationvector = vec([balls[p[i]] for i=1:nballs,p in permutations(balls)])
  i1 = 1
  for parts in partslist
    i2 = 0
    breaks = [(x[1]+1:x[2]) for x in partition(cumsum([0;parts]),2,1)]
    for i=1:nperms
      for j=1:nboxes
        plans[j,i1] = view(permutationvector,breaks[j]+i2)
      end
      i1 += 1
      i2 += nballs
    end
  end
  return plans
end

关于时间,我们得到:

julia> assignmentplans([1,2],[1,2],0,2)   # a simple example
2×6 Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},2}:
 Int64[]  Int64[]  [1]  [2]  [1,2]    [2,1]  
 [1,2]    [2,1]    [2]  [1]  Int64[]  Int64[]

julia> @time plans = assignmentplans([1:8;],[1:4;],0,5);
  8.279623 seconds (82.28 M allocations: 2.618 GB, 14.07% gc time)

julia> size(plans)
(4,5040000)

julia> plans[:,1000000]                    # sample ball configuration
4-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
 Int64[]    
 [7,3,8,2,5]
 [4]        
 [6,1]

当然,时间会因设置的不同而有所变化,但这应该会更快。虽然它并不是完全一致的比较,但它计算的是相同的内容。欢迎在评论中分享海报作者(或其他人)的机器上的时间。


1
感谢您提供的解决方案。我想从中得出的结论可能是,只有以正确的方式编写,Julia才能快速运行。它并不是普遍快速的,无论你如何编写它。虽然我仍然需要编写您提供的解决方案,并使用Python进行比较,以得出最终结论。顺便说一下,我在我的Macbook Pro 2015上使用16 GB内存和i7处理器,获得了“14.367281秒(82.29 M分配:2.618 GB,52.42%gc时间)”。不知道为什么您只有14%而我的却有这么多。有任何线索吗? - manofsins
5
@manofsins 是的,我认为其他要点是语言之间的翻译应该寻找习惯用语等同而非字面转换,并且改善算法可以产生比尝试加速低效算法更多的好处。 (不过,我仍然认为你的原始版本做了太多的分配...) - daycaster
5
"只有用正确的方式编写,Julia才能快速执行"这种说法有一定道理。任何语言中,糟糕的代码都会遇到瓶颈,而所有语言都存在有效的、非标准化的、特定于语言的优化方式。有一篇著名的博客文章使用大量的奇技淫巧在Python中“击败”了Julia。Julia的观点是,在比较标准的Julia与标准的Python/Matlab代码(无论是否矢量化)时,标准的未矢量化代码更快(实际上是建议这么做!)。但是,诸如避免昂贵的重新分配等*标准的良好实践仍然很重要,而故意低效的代码也不会神奇地“加速”。 - Tasos Papastylianou
2
新的代碼在我的電腦上需要7秒。在編寫permutationscombinationspartition 的穩定版本並將一些推導中的顯式類型添加到代碼中之後,我可以獲得1.5秒的運行時間,約佔50%的GC時間。貼這麼多行代碼可以嗎?也許,我應該打開一兩個拉取請求。 - tim
@tim 很好。permutations的类型不稳定确实很烦人。你能否将代码放在pastebin(http://pastebin.com)中并在评论中添加链接,或者只需添加另一个答案,因为这很有趣且相关。 - Dan Getz
@tim请分享具有类型稳定性的代码版本。 - manofsins

5

我对Dan Getz的代码进行了一些微小的修改,以使其类型稳定。主要问题是Combinatorics.permutationsIterators.partition不是类型稳定的,因此我还必须编写类型稳定的版本(我对Combinatorics.combinations的更改实际上是不必要的)。

import Base: start, next, done, eltype, length, size
import Base: iteratorsize, SizeUnknown, IsInfinite, HasLength

import Combinatorics: factorial, combinations

# Copied from Combinatorics
# Only one small change in the `nextpermutation` method

immutable Permutations{T}
    a::T
    t::Int
end

eltype{T}(::Type{Permutations{T}}) = Vector{eltype(T)}

length(p::Permutations) = (0 <= p.t <= length(p.a))?factorial(length(p.a), length(p.a)-p.t):0

"""
Generate all permutations of an indexable object. Because the number of permutations can be very large, this function returns an iterator object. Use `collec
t(permutations(array))` to get an array of all permutations.
"""
permutations(a) = Permutations(a, length(a))

"""
Generate all size t permutations of an indexable object.
"""
function permutations(a, t::Integer)
    if t < 0
        t = length(a) + 1
    end
    Permutations(a, t)
end

start(p::Permutations) = [1:length(p.a);]
next(p::Permutations, s) = nextpermutation(p.a, p.t ,s)

function nextpermutation(m, t, state)
    s = copy(state)                     # was s = copy(s) a few lines down
    perm = [m[s[i]] for i in 1:t]
    n = length(s)
    if t <= 0
        return(perm, [n+1])
    end
    if t < n
        j = t + 1
        while j <= n &&  s[t] >= s[j]; j+=1; end
    end
    if t < n && j <= n
        s[t], s[j] = s[j], s[t]
    else
        if t < n
            reverse!(s, t+1)
        end
        i = t - 1
        while i>=1 && s[i] >= s[i+1]; i -= 1; end
        if i > 0
            j = n
            while j>i && s[i] >= s[j]; j -= 1; end
            s[i], s[j] = s[j], s[i]
            reverse!(s, i+1)
        else
            s[1] = n+1
        end
    end
    return(perm, s)
end

done(p::Permutations, s) = !isempty(s) && max(s[1], p.t) > length(p.a) ||  (isempty(s) && p.t > 0)  

# Copied from Iterators
# Returns an `Array` of `Array`s instead of `Tuple`s now

immutable Partition{I}
    xs::I
    step::Int
    n::Int

end

iteratorsize{T<:Partition}(::Type{T}) = SizeUnknown()

eltype{I}(::Type{Partition{I}}) = Vector{eltype(I)}

function partition{I}(xs::I, n::Int, step::Int = n)
    Partition(xs, step, n)
end

function start{I}(it::Partition{I})
    N = it.n
    p = Vector{eltype(I)}(N)
    s = start(it.xs)
    for i in 1:(N - 1)
        if done(it.xs, s)
            break
        end
        (p[i], s) = next(it.xs, s)
    end
    (s, p)
end

function next{I}(it::Partition{I}, state)
    N = it.n
    (s, p0) = state
    (x, s) = next(it.xs, s)
    ans = p0; ans[end] = x

    p = similar(p0)
    overlap = max(0, N - it.step)
    for i in 1:overlap
        p[i] = ans[it.step + i]
    end

    # when step > n, skip over some elements
    for i in 1:max(0, it.step - N)
        if done(it.xs, s)
            break
        end
        (x, s) = next(it.xs, s)
    end

    for i in (overlap + 1):(N - 1)
        if done(it.xs, s)
            break
        end

        (x, s) = next(it.xs, s)
        p[i] = x
    end

    (ans, (s, p))
end

done(it::Partition, state) = done(it.xs, state[1])

# Copied from the answer of Dan Getz
# Added types to comprehensions and used Vector{Int} instead of Int in vcat    

typealias PlanGrid Matrix{SubArray{Int,1,Array{Int,1},Tuple{UnitRange{Int}},true}}

histograms(n,m) = [diff(vcat([0],x,[n+m])).-1 for x in combinations([1:n+m-1;],m-1)]

good_histograms(n,m,minval,maxval) = 
  Vector{Int}[h for h in histograms(n,m) if all(maxval.>=h.>=minval)]

minballs = 0
maxballs = 5

function assignmentplans(balls,boxes,minballs,maxballs)
  nballs, nboxes = length(balls),length(boxes)
  nperms = factorial(nballs)
  partslist = good_histograms(nballs,nboxes,minballs,maxballs)
  plans = PlanGrid(nboxes,nperms*length(partslist))
  permutationvector = vec([balls[p[i]] for i=1:nballs,p in permutations(balls)])
  i1 = 1
  for parts in partslist
    i2 = 0
    breaks = UnitRange{Int64}[(x[1]+1:x[2]) for x in partition(cumsum(vcat([0],parts)),2,1)]
    for i=1:nperms
      for j=1:nboxes
        plans[j,i1] = view(permutationvector,breaks[j]+i2)
      end
      i1 += 1
      i2 += nballs
    end
  end
  return plans
end

@time plans = assignmentplans(1:8, 1:4, 0, 5)

第一次运行的结果是(由于垃圾回收机制的影响,时间会有很大差异)
  1.589867 seconds (22.02 M allocations: 1.127 GB, 46.50% gc time)
4×5040000 Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},2}:
 Int64[]      Int64[]      Int64[]      Int64[]      Int64[]      Int64[]      …  [8,7,6,5,4]  [8,7,6,5,4]  [8,7,6,5,4]  [8,7,6,5,4]  [8,7,6,5,4]
 Int64[]      Int64[]      Int64[]      Int64[]      Int64[]      Int64[]         [1,3,2]      [2,1,3]      [2,3,1]      [3,1,2]      [3,2,1]    
 [1,2,3]      [1,2,3]      [1,2,3]      [1,2,3]      [1,2,3]      [1,2,3]         Int64[]      Int64[]      Int64[]      Int64[]      Int64[]    
 [4,5,6,7,8]  [4,5,6,8,7]  [4,5,7,6,8]  [4,5,7,8,6]  [4,5,8,6,7]  [4,5,8,7,6]     Int64[]      Int64[]      Int64[]      Int64[]      Int64[] 

我没有彻底测试这些更改。此外,我不理解为什么 s = copy(s)combinations 中有效,但在 permutations 中无效。有趣的是,如果我尝试使原始版本类型稳定(仍然 >40秒,gc 时间为 85%),只会有微不足道的改进。


谢谢Tim。是否有将类型稳定的组合数学成为标准包的计划?感觉这是非常重要的事情。 - Dan Getz
如果您能编写一些测试来展示您的版本相对于当前标准的优越性,那么请考虑在Combinatorics.jl存储库中提交拉取请求! - Kevin L. Keys
我只是个用户,对于Julia的开发并不了解太多。我猜,增加更多功能也很重要,并且一些当前的类型不稳定性问题可能会随着基础的改进而消失,例如,已经有一个修复了混合参数类型的vcat的PR。至于其他问题,报告问题甚至作为用户进行修复可能是个好主意。我刚刚发现使用PkgDev.submit(pkgname)来进行修复非常方便。 - tim

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接