Cart-Pole Python performance comparison

Question

6

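I am comparing a cart-pole simulation in Python 3.7 and Julia 1.2. In Python the simulation is written as a class, shown below; in Julia it is just a function. The Julia version takes about 0.2 seconds to run, which is much slower than Python. I don't know Julia well enough to understand why. My guess is that it has something to do with compilation or with how the loop is set up.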

import math
import random
from collections import namedtuple

RAD_PER_DEG = 0.0174533
DEG_PER_RAD = 57.2958

State = namedtuple('State', 'x x_dot theta theta_dot')

class CartPole:
    """ Model for the dynamics of an inverted pendulum
    """
    def __init__(self):
        self.gravity   = 9.8
        self.masscart  = 1.0
        self.masspole  = 0.1
        self.length    = 0.5   # actually half the pole's length
        self.force_mag = 10.0
        self.tau       = 0.02  # seconds between state updates

        self.x         = 0
        self.x_dot     = 0
        self.theta     = 0
        self.theta_dot = 0

    @property
    def state(self):
        return State(self.x, self.x_dot, self.theta, self.theta_dot)

    def reset(self, x=0, x_dot=0, theta=0, theta_dot=0):
        """ Reset the model of a cartpole system to its initial conditions
            theta is in radians
        """
        self.x         = x
        self.x_dot     = x_dot
        self.theta     = theta
        self.theta_dot = theta_dot

    def step(self, action):
        """ Move the state of the cartpole simulation forward one time unit
        """
        total_mass      = self.masspole + self.masscart
        pole_masslength = self.masspole * self.length

        force           = self.force_mag if action else -self.force_mag
        costheta        = math.cos(self.theta)
        sintheta        = math.sin(self.theta)

        temp = (force + pole_masslength * self.theta_dot ** 2 * sintheta) / total_mass

        # theta acceleration
        theta_dotdot = (
            (self.gravity * sintheta - costheta * temp)
            / (self.length *
               (4.0/3.0 - self.masspole * costheta * costheta /
                total_mass)))

        # x acceleration
        x_dotdot = temp - pole_masslength * theta_dotdot * costheta / total_mass

        self.x         += self.tau * self.x_dot
        self.x_dot     += self.tau * x_dotdot
        self.theta     += self.tau * self.theta_dot
        self.theta_dot += self.tau * theta_dotdot

        return self.state

The simulation is run with the following code:

from cartpole import CartPole
import time
cp = CartPole()
start = time.time()
for i in range(100000):
      cp.step(True)
end = time.time()
print(end-start)

This is the Julia code:

function cartpole(state, action)
"""Cart and Pole simulation in discrete time
Inputs: cartpole( state, action )
state: 1X4 array [cart_position, cart_velocity, pole_angle, pole_velocity]
action: Boolean True or False where true is a positive force and False is a negative force
"""

gravity   = 9.8
masscart  = 1.0
masspole  = 0.1
l    = 0.5   # actually half the pole's length
force_mag = 10.0
tau       = 0.02  # seconds between state updates

# x         = 0
# x_dot     = 0
# theta     = 0
# theta_dot = 0

x         = state[1]
x_dot     = state[2]
theta     = state[3]
theta_dot = state[4]


total_mass = masspole + masscart
pole_massl = masspole * l

if action == 0
 force = force_mag
else
 force = -force_mag
end

costheta = cos(theta)
sintheta = sin(theta)

temp = (force + pole_massl * theta_dot^2 * sintheta) / total_mass

# theta acceleration
theta_dotdot = (gravity * sintheta - costheta * temp)/ (l *(4.0/3.0 - masspole * costheta * costheta / total_mass))

# x acceleration
x_dotdot = temp - pole_massl * theta_dotdot * costheta / total_mass

x         += tau * x_dot
x_dot     += tau * x_dotdot
theta     += tau * theta_dot
theta_dot += tau * theta_dotdot

new_state = [x x_dot theta theta_dot]

return new_state

end

It is run with the following code:

@time include("cartpole.jl")


function run_sim()
"""Runs the cartpole simulation
No inputs or outputs
Use with @time run_sim() for timing purposes.
"""
 state = [0 0 0 0]
 for i = 1:100000
  state = cartpole( state, 0)
  #print(state)
  #print("\n")
end
end

@time run_sim()

- SneakyPanda 40

1

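When I run your Julia code, the first run (which includes compilation time) takes 0.171631 seconds, and the second run (which does not) takes 0.021331 seconds. By comparison, your Python code takes 0.278 seconds on my machine, although that is on Python 3.6. As others have mentioned, there are also many further performance optimizations available to you. - Mason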

3 Answers

4

OK, I just ran your Python and Julia code and got different results: 10 million iterations take 1.41 seconds in Julia and 25.5 seconds in Python. That makes Julia about 18 times faster!

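I think the problem may be that @time is not accurate when run in global scope; you need timings on the order of several seconds for the result to be accurate enough. You can use the BenchmarkTools package to get accurate timings for small functions.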

- Jakob Nissen

3

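@time may add a slight overhead in global scope, but the times involved here are long (for a computer, 1.41 seconds is an eternity) and the overhead is not large. The problem is that the first run includes JIT compilation. If you increase the number of iterations, the JIT time stays the same while the execution time of course increases. (This is why we don't count JIT in benchmarks.) - StefanKarpinski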

2

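The standard performance tips apply: use dots to avoid allocations and to fuse loops. Also, for small-array computations like this one, consider StaticArrays (https://github.com/JuliaArrays/StaticArrays.jl), which is faster. For more information, see https://docs.julialang.org/en/v1/manual/performance-tips/index.html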

- Antoine Levitt

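Don't time just once: call @time several times to amortize the compilation time, or (better) use BenchmarkTools. With StaticArrays and multiple @time calls I get an execution time of 0.006 seconds. - Antoine Levitt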
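
The same advice carries over to the Python side of the comparison. As a minimal sketch, the standard-library `timeit` module can repeat a measurement and report the best run. The `work` function below is a hypothetical stand-in for the 100,000-step loop, not the cart-pole code itself.

```python
import math
import timeit

def work(n=100_000):
    """Hypothetical stand-in for the simulation loop: n cheap float updates."""
    theta = 0.0
    for _ in range(n):
        theta += 0.02 * math.cos(theta)
    return theta

# Run the workload 5 separate times, one call per run, and keep every timing.
timings = timeit.repeat(work, number=1, repeat=5)

# The minimum is the run least disturbed by other processes and one-off costs.
best = min(timings)
print(f"best of {len(timings)} runs: {best:.4f} s")
```

Replacing `work` with a function that wraps the `cp.step(True)` loop would give a repeated measurement of the original Python code.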


- StefanKarpinski · Accepted Answer

Your Python version takes 0.21 seconds on my laptop. Here are the timings for the original Julia version on the same system:

julia> @time run_sim()
  0.222335 seconds (654.98 k allocations: 38.342 MiB)

julia> @time run_sim()
  0.019425 seconds (100.00 k allocations: 10.681 MiB, 37.52% gc time)

julia> @time run_sim()
  0.010103 seconds (100.00 k allocations: 10.681 MiB)

julia> @time run_sim()
  0.012553 seconds (100.00 k allocations: 10.681 MiB)

julia> @time run_sim()
  0.011470 seconds (100.00 k allocations: 10.681 MiB)

julia> @time run_sim()
  0.025003 seconds (100.00 k allocations: 10.681 MiB, 52.82% gc time)

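The first run includes JIT compilation and takes about 0.2 seconds, whereas every run after that takes roughly 10-20 ms. That breaks down into about 10 ms of actual compute time and about 10 ms of garbage collection time, which is triggered every four calls or so. This means Julia is about 10-20 times faster than Python, not counting JIT compilation time, which is not bad for a straight port.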
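
The speedup estimate can be checked with a few lines of arithmetic. The sketch below only uses the timings quoted above (0.21 seconds for Python, and the five Julia runs after the first one); the variable names are illustrative.

```python
from statistics import median

python_seconds = 0.21  # Python timing quoted above

# Julia timings quoted above, excluding the first run (which includes JIT)
julia_seconds = [0.019425, 0.010103, 0.012553, 0.011470, 0.025003]

fastest = min(julia_seconds)   # a run with no garbage collection
slowest = max(julia_seconds)   # a run that triggered garbage collection
typical = median(julia_seconds)

print(f"speedup range: {python_seconds / slowest:.1f}x "
      f"to {python_seconds / fastest:.1f}x")
print(f"median speedup: {python_seconds / typical:.1f}x")
```

The slowest runs are the ones that report gc time, which is why the lower end of the range sits near the bottom of the 10-20x estimate.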

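Why don't we count JIT time in benchmarks? Because you don't actually care how long it takes to run fast programs such as benchmarks. You time small benchmark problems in order to extrapolate how long it will take to solve larger problems where speed really matters. JIT compilation time is proportional to the amount of code you compile, not to the problem size. So when you solve the larger problems you actually care about, JIT compilation will still take only 0.2 seconds, which is negligible compared to the execution time of a large problem.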
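
This argument can be made concrete with a simple cost model: a fixed one-time compile cost plus a cost proportional to the number of steps. The sketch below assumes roughly 0.2 seconds of JIT time and roughly 10 ms per 100,000 steps, both rounded from the timings above; the function name `total_seconds` is hypothetical.

```python
def total_seconds(compile_seconds, per_step_seconds, steps):
    """Fixed one-time compile cost plus a cost proportional to problem size."""
    return compile_seconds + per_step_seconds * steps

compile_seconds = 0.2               # approximate first-run overhead quoted above
per_step_seconds = 0.01 / 100_000   # roughly 10 ms per 100,000 steps

for steps in (100_000, 10_000_000, 1_000_000_000):
    total = total_seconds(compile_seconds, per_step_seconds, steps)
    share = compile_seconds / total  # fraction of the run spent compiling
    print(f"{steps:>13,} steps: {total:8.2f} s total, JIT share {share:.1%}")
```

Under these assumptions the compile cost dominates the 100,000-step benchmark but becomes a small fraction of the total once the problem is a few orders of magnitude larger.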

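Now let's see how to make the Julia code faster. It is actually very simple: use a tuple instead of a row vector for the state. So initialize the state as state = (0, 0, 0, 0) and then update the state in the same way: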

new_state = (x, x_dot, theta, theta_dot)

That's it; otherwise the code is exactly the same. For this version, the timings are:

julia> @time run_sim()
  0.132459 seconds (479.53 k allocations: 24.020 MiB)

julia> @time run_sim()
  0.008218 seconds (4 allocations: 160 bytes)

julia> @time run_sim()
  0.007230 seconds (4 allocations: 160 bytes)

julia> @time run_sim()
  0.005379 seconds (4 allocations: 160 bytes)

julia> @time run_sim()
  0.008773 seconds (4 allocations: 160 bytes)

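The first run still includes JIT time. Subsequent runs now take only 5-10 ms, which is about 25-40 times faster than the Python version. Note that there are almost no allocations: the small, fixed number of allocations is only for the return value, and it will not trigger GC if this function is called in a loop from other code.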
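
For readers more familiar with Python, the same design (state passed in and returned as an immutable tuple instead of being mutated in place) can be sketched as below. This is an illustrative port of the question's dynamics, not code from the question or the answer, and the name `cartpole_step` is hypothetical. It makes the two versions structurally comparable but says nothing about Julia's allocation behaviour, since CPython allocates a new tuple object on every step.

```python
import math
from typing import NamedTuple

class State(NamedTuple):
    x: float
    x_dot: float
    theta: float
    theta_dot: float

GRAVITY, MASSCART, MASSPOLE = 9.8, 1.0, 0.1
LENGTH = 0.5      # half the pole's length
FORCE_MAG = 10.0
TAU = 0.02        # seconds between state updates

def cartpole_step(state: State, action: bool) -> State:
    """Return the next state; the input state is left untouched."""
    x, x_dot, theta, theta_dot = state
    total_mass = MASSPOLE + MASSCART
    pole_masslength = MASSPOLE * LENGTH
    force = FORCE_MAG if action else -FORCE_MAG
    costheta, sintheta = math.cos(theta), math.sin(theta)

    temp = (force + pole_masslength * theta_dot ** 2 * sintheta) / total_mass
    theta_dotdot = (GRAVITY * sintheta - costheta * temp) / (
        LENGTH * (4.0 / 3.0 - MASSPOLE * costheta * costheta / total_mass))
    x_dotdot = temp - pole_masslength * theta_dotdot * costheta / total_mass

    # Same explicit Euler update as the original code
    return State(x + TAU * x_dot,
                 x_dot + TAU * x_dotdot,
                 theta + TAU * theta_dot,
                 theta_dot + TAU * theta_dotdot)

state = State(0.0, 0.0, 0.0, 0.0)
for _ in range(100_000):
    state = cartpole_step(state, True)
```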