How do I get started with Julia reinforcement learning experiments?


I am new to Julia and JuliaReinforcementLearning and just want to run the experiments provided at https://juliareinforcementlearning.org/docs/experiments/.

So I created a file that looks like this:

using ReinforcementLearning
using StableRNGs
using Flux
using Flux.Losses

function RL.Experiment(
    ::Val{:JuliaRL},
    ::Val{:BasicDQN},
    ::Val{:CartPole},
    ::Nothing;
    seed = 123,
)
    rng = StableRNG(seed)
    env = CartPoleEnv(; T = Float32, rng = rng)
    ns, na = length(state(env)), length(action_space(env))

    policy = Agent(
        policy = QBasedPolicy(
            learner = BasicDQNLearner(
                approximator = NeuralNetworkApproximator(
                    model = Chain(
                        Dense(ns, 128, relu; init = glorot_uniform(rng)),
                        Dense(128, 128, relu; init = glorot_uniform(rng)),
                        Dense(128, na; init = glorot_uniform(rng)),
                    ) |> gpu,
                    optimizer = ADAM(),
                ),
                batch_size = 32,
                min_replay_history = 100,
                loss_func = huber_loss,
                rng = rng,
            ),
            explorer = EpsilonGreedyExplorer(
                kind = :exp,
                ϵ_stable = 0.01,
                decay_steps = 500,
                rng = rng,
            ),
        ),
        trajectory = CircularArraySARTTrajectory(
            capacity = 1000,
            state = Vector{Float32} => (ns,),
        ),
    )
    stop_condition = StopAfterStep(10_000, is_show_progress=!haskey(ENV, "CI"))
    hook = TotalRewardPerEpisode()
    Experiment(policy, env, stop_condition, hook, "# BasicDQN <-> CartPole")
end

I named the file "JuliaRL_BasicDQN_CartPole.jl".

There is also a second file, as follows:

include("JuliaRL_BasicDQN_CartPole.jl")
using Plots
pyplot() 
ex = E`JuliaRL_BasicDQN_CartPole`
run(ex)
plot(ex.hook.rewards)
savefig("assets/JuliaRL_BasicDQN_CartPole.png") #hide

named "test.jl". (-> One question: what exactly does the E`...` syntax mean?)
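(Side note on that syntax: in Julia, a non-standard command literal such as E`...` is lowered to a call to a macro named @E_cmd; ReinforcementLearning.jl apparently uses such a macro to look up the Experiment defined above by its name. Below is only a self-contained toy sketch of that mechanism, with invented helper names rather than the package's real implementation; don't paste it into a session where ReinforcementLearning is already loaded.)

# Toy illustration only -- not ReinforcementLearning.jl's actual code.
# E`JuliaRL_BasicDQN_CartPole` is sugar for @E_cmd "JuliaRL_BasicDQN_CartPole".
function parse_experiment(name::AbstractString)
    source, method, env = split(name, "_")      # "JuliaRL", "BasicDQN", "CartPole"
    return (Val(Symbol(source)), Val(Symbol(method)), Val(Symbol(env)))
end

macro E_cmd(str)
    :(parse_experiment($str))
end

E`JuliaRL_BasicDQN_CartPole`   # -> (Val{:JuliaRL}(), Val{:BasicDQN}(), Val{:CartPole}())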

The experiment seems to start; it prints this text:

BasicDQN <-> CartPole
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

but then it stops with this error message:

LoadError: UndefVarError: params not defined
Stacktrace:
  [1] update!(learner::BasicDQNLearner{NeuralNetworkApproximator{Chain{Tuple{Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Adam}, typeof(huber_loss), StableRNGs.LehmerRNG}, batch::NamedTuple{(:state, :action, :reward, :terminal, :next_state), Tuple{Matrix{Float32}, Vector{Int64}, Vector{Float32}, Vector{Bool}, Matrix{Float32}}})
    @ ReinforcementLearningZoo ~/.julia/packages/ReinforcementLearningZoo/tvfq9/src/algorithms/dqns/basic_dqn.jl:78
  [2] update!(learner::BasicDQNLearner{NeuralNetworkApproximator{Chain{Tuple{Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Adam}, typeof(huber_loss), StableRNGs.LehmerRNG}, traj::CircularArraySARTTrajectory{NamedTuple{(:state, :action, :reward, :terminal), Tuple{CircularArrayBuffers.CircularArrayBuffer{Float32, 2, Matrix{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Int64, Vector{Int64}}, CircularArrayBuffers.CircularVectorBuffer{Float32, Vector{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Bool, Vector{Bool}}}}})
    @ ReinforcementLearningZoo ~/.julia/packages/ReinforcementLearningZoo/tvfq9/src/algorithms/dqns/basic_dqn.jl:65
  [3] update!
    @ ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/policies/q_based_policies/learners/abstract_learner.jl:35 [inlined]
  [4] update!
    @ ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/policies/q_based_policies/q_based_policy.jl:67 [inlined]
  [5] (::Agent{QBasedPolicy{BasicDQNLearner{NeuralNetworkApproximator{Chain{Tuple{Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Adam}, typeof(huber_loss), StableRNGs.LehmerRNG}, EpsilonGreedyExplorer{:exp, false, StableRNGs.LehmerRNG}}, CircularArraySARTTrajectory{NamedTuple{(:state, :action, :reward, :terminal), Tuple{CircularArrayBuffers.CircularArrayBuffer{Float32, 2, Matrix{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Int64, Vector{Int64}}, CircularArrayBuffers.CircularVectorBuffer{Float32, Vector{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Bool, Vector{Bool}}}}}})(stage::PreActStage, env::CartPoleEnv{Base.OneTo{Int64}, Float32, Int64, StableRNGs.LehmerRNG}, action::Int64)
    @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/policies/agents/agent.jl:78
  [6] _run(policy::Agent{QBasedPolicy{BasicDQNLearner{NeuralNetworkApproximator{Chain{Tuple{Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Adam}, typeof(huber_loss), StableRNGs.LehmerRNG}, EpsilonGreedyExplorer{:exp, false, StableRNGs.LehmerRNG}}, CircularArraySARTTrajectory{NamedTuple{(:state, :action, :reward, :terminal), Tuple{CircularArrayBuffers.CircularArrayBuffer{Float32, 2, Matrix{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Int64, Vector{Int64}}, CircularArrayBuffers.CircularVectorBuffer{Float32, Vector{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Bool, Vector{Bool}}}}}}, env::CartPoleEnv{Base.OneTo{Int64}, Float32, Int64, StableRNGs.LehmerRNG}, stop_condition::StopAfterStep{ProgressMeter.Progress}, hook::TotalRewardPerEpisode)
    @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/core/run.jl:29
  [7] run(policy::Agent{QBasedPolicy{BasicDQNLearner{NeuralNetworkApproximator{Chain{Tuple{Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(relu), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Adam}, typeof(huber_loss), StableRNGs.LehmerRNG}, EpsilonGreedyExplorer{:exp, false, StableRNGs.LehmerRNG}}, CircularArraySARTTrajectory{NamedTuple{(:state, :action, :reward, :terminal), Tuple{CircularArrayBuffers.CircularArrayBuffer{Float32, 2, Matrix{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Int64, Vector{Int64}}, CircularArrayBuffers.CircularVectorBuffer{Float32, Vector{Float32}}, CircularArrayBuffers.CircularVectorBuffer{Bool, Vector{Bool}}}}}}, env::CartPoleEnv{Base.OneTo{Int64}, Float32, Int64, StableRNGs.LehmerRNG}, stop_condition::StopAfterStep{ProgressMeter.Progress}, hook::TotalRewardPerEpisode)
    @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/core/run.jl:10
  [8] run(x::Experiment; describe::Bool)
    @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/core/experiment.jl:56
  [9] run(x::Experiment)
    @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/yeRLW/src/core/experiment.jl:54
 [10] top-level scope
    @ ~/Documents/julia/reinforcement/test.jl:9
 [11] include(fname::String)
    @ Base.MainInclude ./client.jl:476
 [12] top-level scope
    @ REPL[6]:1
 [13] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52
in expression starting at /home/std/Documents/julia/reinforcement/test.jl:9

So which parameters do I still need to define in order to run the experiment?

Thanks!


The most likely reason for UndefVarError: params is that the code was written for an older version of Flux, which exported that symbol. Adding using Flux: params should fix it. - mcabbott
Thanks. That helps, but the next error is: "LoadError: UndefVarError: Dense not defined" (-> line 18). - Mike75
You probably replaced the existing using Flux line following mcabbott's suggestion, but please add the new import as a separate line. That is, you need both using Flux and using Flux: params. - Sundar R
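Spelled out, that suggestion keeps the existing import and adds the explicit one on its own line, so the top of JuliaRL_BasicDQN_CartPole.jl would read:

using ReinforcementLearning
using StableRNGs
using Flux
using Flux: params   # explicit import, per the comment above
using Flux.Losses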
Exactly, I had replaced using Flux instead of adding to it. I have now tried both variants: using Flux followed by using Flux: params, and using Flux: params followed by using Flux, but the first error "ERROR: LoadError: UndefVarError: params not defined" comes back in both cases. - Mike75
One thing I noticed when running the script three times: (1) with the using Flux line commented out, I get the Dense error; (2) with the using Flux line uncommented, I get the params error; and (3) with the using Flux line commented out again, the params error appears again. (-> Julia is run in the Ubuntu console via include("test.jl").) The first include takes noticeably longer and shows a progress bar at the start, which no longer appears on the second and later runs. - Mike75
Flux.params is referenced (as Flux.params) in neural_network_approximator.jl. - Mike75
1 Answer


Guess what, I found the answer here: https://juliareinforcementlearning.org/ (under "Get started in 3 lines!"):

So the first step to get the demo running is:

]add ReinforcementLearningExperiments
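(The leading ] switches the REPL into Pkg mode; equivalently, the package can be installed from ordinary Julia code:)

using Pkg
Pkg.add("ReinforcementLearningExperiments")   # same effect as ]add in the Pkg REPL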

and then to add

using ReinforcementLearningExperiments

back in as the first line of the file JuliaRL_BasicDQN_CartPole.jl.

The second step was to look at the file ReinforcementLearningExperiments.jl in .julia/packages/ReinforcementLearningExperiments/dWZym/src.

It imports/uses the following modules:

using ReinforcementLearning
using Requires
using StableRNGs
using Flux
using Flux.Losses
using Setfield
using Dates
using TensorBoardLogger
using Logging
using Distributions
using IntervalSets
using BSON

For the DQN demo, a shorter form of this list is sufficient. The corrected version of the first file, "JuliaRL_BasicDQN_CartPole.jl", now looks like this:

using ReinforcementLearningExperiments
using ReinforcementLearning
using StableRNGs
using Flux
using Flux.Losses
using Dates
using Logging


function RL.Experiment(
    ::Val{:JuliaRL},
    ::Val{:BasicDQN},
    ::Val{:CartPole},
    ::Nothing;
    seed = 123,
)
    rng = StableRNG(seed)
    env = CartPoleEnv(; T = Float32, rng = rng)
    ns, na = length(state(env)), length(action_space(env))

    policy = Agent(
        policy = QBasedPolicy(
            learner = BasicDQNLearner(
                approximator = NeuralNetworkApproximator(
                    model = Chain(
                        Dense(ns, 128, relu; init = glorot_uniform(rng)),
                        Dense(128, 128, relu; init = glorot_uniform(rng)),
                        Dense(128, na; init = glorot_uniform(rng)),
                    ) |> gpu,
                    optimizer = ADAM(),
                ),
                batch_size = 32,
                min_replay_history = 100,
                loss_func = huber_loss,
                rng = rng,
            ),
            explorer = EpsilonGreedyExplorer(
                kind = :exp,
                ϵ_stable = 0.01,
                decay_steps = 500,
                rng = rng,
            ),
        ),
        trajectory = CircularArraySARTTrajectory(
            capacity = 1000,
            state = Vector{Float32} => (ns,),
        ),
    )
    stop_condition = StopAfterStep(10_000, is_show_progress=!haskey(ENV, "CI"))
    hook = TotalRewardPerEpisode()
    Experiment(policy, env, stop_condition, hook, "# BasicDQN <-> CartPole")
end

With these modifications, the experiment runs the simulation.
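The unchanged test.jl from the question then runs through as well. For a quick check without the plotting part, something along these lines should also work (same file layout assumed; the reward indexing is just for illustration):

include("JuliaRL_BasicDQN_CartPole.jl")

ex = E`JuliaRL_BasicDQN_CartPole`   # build the Experiment defined above
run(ex)                             # train BasicDQN on CartPole for 10_000 steps
println(last(ex.hook.rewards, 5))   # total reward of the last five episodes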


----------> Note: I just found a detailed description of all aspects of reinforcement learning and of the Julia reinforcement learning experiments in these notebooks:

https://github.com/JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl/tree/master/notebooks

My Julia version 1.8.2 runs these notebooks without problems, and the notebook annotations contain a lot of explanatory text that clarifies all questions about how to start the experiments and how to use the environments.

