如何使Julia代码更加高效？目前它的性能甚至不如Python。

Question

如何使Julia代码更加高效？目前它的性能甚至不如Python。

4

我是一个新手，刚开始接触Julia，并且听说它有惊人的性能。但我还没有体验到那些承诺的性能。我尝试了许多增强性能的方法，这些方法在《JULIA HIGH PERFORMANCE》一书中有描述，但这使得代码的可读性稍微降低了一点。但是，至少在基准测试案例中，我的Python代码比我的Julia代码快3倍以上。要么是我的代码做错了什么，在Julia中犯了罪，要么就是Julia无法胜任。请证明后者是错误的。

在代码中，我正在尝试将不同的球分配到不同的盒子中，并且每个盒子的容量都有最大和最小限制。放置球的顺序也很重要。我需要在最短的时间内生成所有可能的分配，并满足给定的约束条件。

PYTHON CODE:

import itertools
import time

max_balls = 5
min_balls = 0

def get_assignments(balls, boxes, assignments=[[]]):
    all_assignments = []
    upper_ball_limit = len(balls)
    if upper_ball_limit > max_balls:
        upper_ball_limit = max_balls

    n_boxes = len(boxes)
    lower_ball_limit = len(balls) - upper_ball_limit * (n_boxes - 1)
    if lower_ball_limit < min_balls:
        lower_ball_limit = min_balls

    if len(boxes) == 0:
        raise Exception("No delivery boys found")

    elif len(boxes) == 1:
        for strategy in itertools.permutations(balls, upper_ball_limit):
            #             valid = evaluate_strategy(strategy, db_id)
            for subplan in assignments:
                subplan_copy = subplan[:]
                box_id = boxes[0]
                subplan_copy.append((box_id, strategy))
                all_assignments.append(subplan_copy)
        return all_assignments
    else:
        box_id = boxes[0]
        for i in range(lower_ball_limit, upper_ball_limit+ 1):
            for strategy in itertools.permutations(balls, i):
                temp_plans = []
                for subplan in assignments:
                    subplan_copy = subplan[:]
                    subplan_copy.append((box_id, strategy))
                    temp_plans.append(subplan_copy)
                    remaining_balls = set(balls).difference(strategy)
                    remaining_boxes = list(set(boxes).difference([box_id]))
                    if remaining_balls:
                        all_assignments.extend(get_assignments(remaining_balls, remaining_boxes, temp_plans))
                    else:
                        all_assignments.extend(temp_plans)
        return all_assignments

balls = range(1, 9)
boxes = [1, 2, 3, 4]

t = time.time()
all_assignments = get_assignments(balls, boxes)

print('Number of assignments: %s' % len(all_assignments))
print('Time taken: %s' % (time.time()-t))

以下是我尝试编写上述内容的JULIA代码：

#!/usr/bin/env julia

using Combinatorics

const max_balls=5
const min_balls=0

function plan_assignments(balls::Vector{Int32}, boxes ; plans=[Vector{Tuple{Int32,Array{Int32,1}}}(length(boxes))])
const n_boxes = length(boxes)
const n_balls = length(balls)
const n_plans = length(plans)

if n_boxes*max_balls < n_balls
  print("Invalid Inputs: Number of balls exceed the number of boxes.")
end

all_plans = Vector{Tuple{Int32,Array{Int32,1}}}[]
upper_box_limit = n_balls
if upper_box_limit > max_balls
    upper_box_limit = max_balls
end
lower_box_limit = n_balls - upper_box_limit * (n_boxes-1)

if lower_box_limit < min_balls
    lower_box_limit = min_balls
end

if n_boxes == 1
    box_id = boxes[1]
    @inbounds for strategy in Combinatorics.permutations(balls, upper_box_limit)
        @inbounds for subplan in plans
            subplan = subplan[:]
            subplan[tn_boxes - n_boxes + 1] = (box_id, strategy)
            all_plans = push!(all_plans, subplan)
        end
    end
    return all_plans

else
    box_id = boxes[1]
    @inbounds for i in lower_box_limit:upper_box_limit
        @inbounds for strategy in Combinatorics.permutations(balls, i)
            temp_plans = Array{Vector{Tuple{Int32,Array{Int32,1}}},1}(n_plans)
            # temp_plans = []

            @inbounds for (i,subplan) in zip(1:n_plans, plans)
                subplan = subplan[:]
                subplan[tn_boxes - n_boxes + 1] = (box_id, strategy)
                temp_plans[i] = subplan
                # subplan = push!(subplan, (db_id, strategy))
                # temp_plans = push!(temp_plans, subplan)

                remaining_balls = filter((x) -> !(x in strategy), balls)
                remaining_boxes = filter((x) -> x != box_id , boxes)

                if length(remaining_balls) > 0
                  @inbounds for plan in plan_assignments(remaining_balls, remaining_boxes, plans=temp_plans)
                    push!(all_plans, plan)
                  end
                    # append!(all_plans, plan_assignments(remaining_orders, remaining_delivery_boys, plans=temp_plans))
                else
                  @inbounds for plan in temp_plans
                    push!(all_plans, plan)
                  end
                    # append!(all_plans, temp_plans)

                end
            end
        end
    end
    return all_plans
end
end

balls = Int32[1,2,3,4,5,6,7,8]
boxes = Int32[1,2,3,4]

const tn_boxes = length(boxes)

@timev all_plans = plan_assignments(balls, boxes)
print(length(all_plans))

我的基准测试时间如下：

对于Python：

Number of assignments: 5040000
Time taken: 22.5003659725

对于Julia语言：（这是在不考虑编译时间的情况下。）

76.986338 seconds (122.94 M allocations: 5.793 GB, 77.01% gc time)
elapsed time (ns): 76986338257
gc time (ns):      59287603693
bytes allocated:   6220111360
pool allocs:       122932049
non-pool GC allocs:10587
malloc() calls:    11
realloc() calls:   18
GC pauses:         270
full collections:  28

- manofsins

6

77%的时间被用于垃圾回收...看起来问题在于你创建了大量的对象。CPython通过使用引用计数和GC一起工作来避免这样的大开销。请注意，77秒中的23%约为17秒，因此如果不计算在gc上花费的时间，Julia将比CPython更快...不幸的是我不知道Julia，所以我无法告诉你为什么gc会花费如此多的时间。 - Bakuriu

你需要在 subplan = subplan[:] 中复制 subplan 吗？ - Alexander Morley

1

你有没有查看过像[1,2,3]和[1,2,3]这样的简单示例的数据结构内容？我看到了一些#undef。输出应该是什么样子的？另外，你所拥有的数据结构似乎太复杂了。 - David P. Sanders

1

过度分配和垃圾回收器使用大量时间是类型不稳定的预警信号（因为所有事物都必须被装箱，导致出现许多临时变量）。 @code_warntype 的输出是什么？您能够使用 ProfileView 显示瓶颈在哪里吗？ - Chris Rackauckas

不相关的：在“if”、“elif”中应该用“n_boxes”代替“len(boxes)”。 - user

显示剩余9条评论

2个回答

5

我对Dan Getz的代码进行了一些微小的修改，以使其类型稳定。主要问题是Combinatorics.permutations和Iterators.partition不是类型稳定的，因此我还必须编写类型稳定的版本（我对Combinatorics.combinations的更改实际上是不必要的）。

import Base: start, next, done, eltype, length, size
import Base: iteratorsize, SizeUnknown, IsInfinite, HasLength

import Combinatorics: factorial, combinations

# Copied from Combinatorics
# Only one small change in the `nextpermutation` method

immutable Permutations{T}
    a::T
    t::Int
end

eltype{T}(::Type{Permutations{T}}) = Vector{eltype(T)}

length(p::Permutations) = (0 <= p.t <= length(p.a))?factorial(length(p.a), length(p.a)-p.t):0

"""
Generate all permutations of an indexable object. Because the number of permutations can be very large, this function returns an iterator object. Use `collec
t(permutations(array))` to get an array of all permutations.
"""
permutations(a) = Permutations(a, length(a))

"""
Generate all size t permutations of an indexable object.
"""
function permutations(a, t::Integer)
    if t < 0
        t = length(a) + 1
    end
    Permutations(a, t)
end

start(p::Permutations) = [1:length(p.a);]
next(p::Permutations, s) = nextpermutation(p.a, p.t ,s)

function nextpermutation(m, t, state)
    s = copy(state)                     # was s = copy(s) a few lines down
    perm = [m[s[i]] for i in 1:t]
    n = length(s)
    if t <= 0
        return(perm, [n+1])
    end
    if t < n
        j = t + 1
        while j <= n &&  s[t] >= s[j]; j+=1; end
    end
    if t < n && j <= n
        s[t], s[j] = s[j], s[t]
    else
        if t < n
            reverse!(s, t+1)
        end
        i = t - 1
        while i>=1 && s[i] >= s[i+1]; i -= 1; end
        if i > 0
            j = n
            while j>i && s[i] >= s[j]; j -= 1; end
            s[i], s[j] = s[j], s[i]
            reverse!(s, i+1)
        else
            s[1] = n+1
        end
    end
    return(perm, s)
end

done(p::Permutations, s) = !isempty(s) && max(s[1], p.t) > length(p.a) ||  (isempty(s) && p.t > 0)  

# Copied from Iterators
# Returns an `Array` of `Array`s instead of `Tuple`s now

immutable Partition{I}
    xs::I
    step::Int
    n::Int

end

iteratorsize{T<:Partition}(::Type{T}) = SizeUnknown()

eltype{I}(::Type{Partition{I}}) = Vector{eltype(I)}

function partition{I}(xs::I, n::Int, step::Int = n)
    Partition(xs, step, n)
end

function start{I}(it::Partition{I})
    N = it.n
    p = Vector{eltype(I)}(N)
    s = start(it.xs)
    for i in 1:(N - 1)
        if done(it.xs, s)
            break
        end
        (p[i], s) = next(it.xs, s)
    end
    (s, p)
end

function next{I}(it::Partition{I}, state)
    N = it.n
    (s, p0) = state
    (x, s) = next(it.xs, s)
    ans = p0; ans[end] = x

    p = similar(p0)
    overlap = max(0, N - it.step)
    for i in 1:overlap
        p[i] = ans[it.step + i]
    end

    # when step > n, skip over some elements
    for i in 1:max(0, it.step - N)
        if done(it.xs, s)
            break
        end
        (x, s) = next(it.xs, s)
    end

    for i in (overlap + 1):(N - 1)
        if done(it.xs, s)
            break
        end

        (x, s) = next(it.xs, s)
        p[i] = x
    end

    (ans, (s, p))
end

done(it::Partition, state) = done(it.xs, state[1])

# Copied from the answer of Dan Getz
# Added types to comprehensions and used Vector{Int} instead of Int in vcat    

typealias PlanGrid Matrix{SubArray{Int,1,Array{Int,1},Tuple{UnitRange{Int}},true}}

histograms(n,m) = [diff(vcat([0],x,[n+m])).-1 for x in combinations([1:n+m-1;],m-1)]

good_histograms(n,m,minval,maxval) = 
  Vector{Int}[h for h in histograms(n,m) if all(maxval.>=h.>=minval)]

minballs = 0
maxballs = 5

function assignmentplans(balls,boxes,minballs,maxballs)
  nballs, nboxes = length(balls),length(boxes)
  nperms = factorial(nballs)
  partslist = good_histograms(nballs,nboxes,minballs,maxballs)
  plans = PlanGrid(nboxes,nperms*length(partslist))
  permutationvector = vec([balls[p[i]] for i=1:nballs,p in permutations(balls)])
  i1 = 1
  for parts in partslist
    i2 = 0
    breaks = UnitRange{Int64}[(x[1]+1:x[2]) for x in partition(cumsum(vcat([0],parts)),2,1)]
    for i=1:nperms
      for j=1:nboxes
        plans[j,i1] = view(permutationvector,breaks[j]+i2)
      end
      i1 += 1
      i2 += nballs
    end
  end
  return plans
end

@time plans = assignmentplans(1:8, 1:4, 0, 5)

第一次运行的结果是（由于垃圾回收机制的影响，时间会有很大差异）

  1.589867 seconds (22.02 M allocations: 1.127 GB, 46.50% gc time)
4×5040000 Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},2}:
 Int64[]      Int64[]      Int64[]      Int64[]      Int64[]      Int64[]      …  [8,7,6,5,4]  [8,7,6,5,4]  [8,7,6,5,4]  [8,7,6,5,4]  [8,7,6,5,4]
 Int64[]      Int64[]      Int64[]      Int64[]      Int64[]      Int64[]         [1,3,2]      [2,1,3]      [2,3,1]      [3,1,2]      [3,2,1]    
 [1,2,3]      [1,2,3]      [1,2,3]      [1,2,3]      [1,2,3]      [1,2,3]         Int64[]      Int64[]      Int64[]      Int64[]      Int64[]    
 [4,5,6,7,8]  [4,5,6,8,7]  [4,5,7,6,8]  [4,5,7,8,6]  [4,5,8,6,7]  [4,5,8,7,6]     Int64[]      Int64[]      Int64[]      Int64[]      Int64[]

我没有彻底测试这些更改。此外，我不理解为什么 s = copy(s) 在 combinations 中有效，但在 permutations 中无效。有趣的是，如果我尝试使原始版本类型稳定（仍然 >40秒，gc 时间为 85%），只会有微不足道的改进。

- tim

谢谢Tim。是否有将类型稳定的组合数学成为标准包的计划？感觉这是非常重要的事情。 - Dan Getz

如果您能编写一些测试来展示您的版本相对于当前标准的优越性，那么请考虑在Combinatorics.jl存储库中提交拉取请求！ - Kevin L. Keys

我只是个用户，对于Julia的开发并不了解太多。我猜，增加更多功能也很重要，并且一些当前的类型不稳定性问题可能会随着基础的改进而消失，例如，已经有一个修复了混合参数类型的vcat的PR。至于其他问题，报告问题甚至作为用户进行修复可能是个好主意。我刚刚发现使用PkgDev.submit(pkgname)来进行修复非常方便。 - tim

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Dan Getz · Accepted Answer

这是 Julia 的另一个版本，更符合惯用语法，修改以避免递归和一些内存分配：

using Iterators
using Combinatorics

histograms(n,m) = [diff([0;x;n+m]).-1 for x in combinations([1:n+m-1;],m-1)]

good_histograms(n,m,minval,maxval) = 
  [h for h in histograms(n,m) if all(maxval.>=h.>=minval)]

typealias PlanGrid Matrix{SubArray{Int,1,Array{Int,1},Tuple{UnitRange{Int}},true}}

function assignmentplans(balls,boxes,minballs,maxballs)
  nballs, nboxes = length(balls),length(boxes)
  nperms = factorial(nballs)
  partslist = good_histograms(nballs,nboxes,minballs,maxballs)
  plans = PlanGrid(nboxes,nperms*length(partslist))
  permutationvector = vec([balls[p[i]] for i=1:nballs,p in permutations(balls)])
  i1 = 1
  for parts in partslist
    i2 = 0
    breaks = [(x[1]+1:x[2]) for x in partition(cumsum([0;parts]),2,1)]
    for i=1:nperms
      for j=1:nboxes
        plans[j,i1] = view(permutationvector,breaks[j]+i2)
      end
      i1 += 1
      i2 += nballs
    end
  end
  return plans
end

关于时间，我们得到：

julia> assignmentplans([1,2],[1,2],0,2)   # a simple example
2×6 Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},2}:
 Int64[]  Int64[]  [1]  [2]  [1,2]    [2,1]  
 [1,2]    [2,1]    [2]  [1]  Int64[]  Int64[]

julia> @time plans = assignmentplans([1:8;],[1:4;],0,5);
  8.279623 seconds (82.28 M allocations: 2.618 GB, 14.07% gc time)

julia> size(plans)
(4,5040000)

julia> plans[:,1000000]                    # sample ball configuration
4-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
 Int64[]    
 [7,3,8,2,5]
 [4]        
 [6,1]

当然，时间会因设置的不同而有所变化，但这应该会更快。虽然它并不是完全一致的比较，但它计算的是相同的内容。欢迎在评论中分享海报作者（或其他人）的机器上的时间。