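I'm having trouble running a Python script on a computing cluster; apologies in advance if this is a naive mistake. I'm not sure whether the problem comes from a misconfigured conda virtual environment, but either way it reproduces whenever I run the following command: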
srun -p use-everything --pty python test.py
I get the following error:
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    from acme.agents.tf import dqn
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/agents/tf/dqn/__init__.py", line 18, in <module>
    from acme.agents.tf.dqn.agent import DQN
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/agents/tf/dqn/agent.py", line 20, in <module>
    from acme import datasets
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/datasets/__init__.py", line 17, in <module>
    from acme.datasets.reverb import make_reverb_dataset
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/datasets/reverb.py", line 22, in <module>
    from acme.adders import reverb as adders
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/adders/reverb/__init__.py", line 21, in <module>
    from acme.adders.reverb.base import DEFAULT_PRIORITY_TABLE
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/adders/reverb/base.py", line 26, in <module>
    import reverb
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/reverb/__init__.py", line 27, in <module>
    from reverb import item_selectors as selectors
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/reverb/item_selectors.py", line 19, in <module>
    from reverb import pybind
  File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/reverb/pybind.py", line 1, in <module>
    import tensorflow as _tf; from .libpybind import *; del _tf
ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
srun: error: node014: task 0: Exited with exit code 1
On my local machine I hit the same problem when running inside the virtual environment, and I fixed it simply with
sudo apt-get install libpython3.7
Here is some other information that may help.
$which libpython
/usr/bin/which: no libpython in (/om2/user/armas/anaconda/envs/dist_rl/bin:/om2/user/armas/anaconda/bin:/om2/user/armas/anaconda/condabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
$echo $PATH
/om2/user/armas/anaconda/envs/dist_rl/bin:/om2/user/armas/anaconda/bin:/om2/user/armas/anaconda/condabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
$echo $LD_LIBRARY_PATH
/om2/user/armas/anaconda/bin/
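Note that which only searches PATH for executables, so it would not find a shared library either way. A check along the following lines might be more telling. This is only a minimal sketch, assuming a Unix-like CPython build where sysconfig reports LIBDIR and INSTSONAME; find_libpython is just an illustrative name, not part of any library.

```python
import os
import sysconfig


def find_libpython():
    """Return (path, exists) for the libpython file this interpreter's build config points to.

    Assumption: LIBDIR and INSTSONAME (or LDLIBRARY) are set, which is
    typical on Linux but not guaranteed on every platform or build.
    """
    libdir = sysconfig.get_config_var("LIBDIR") or ""
    soname = (
        sysconfig.get_config_var("INSTSONAME")
        or sysconfig.get_config_var("LDLIBRARY")
        or ""
    )
    path = os.path.join(libdir, soname)
    return path, os.path.isfile(path)


if __name__ == "__main__":
    path, exists = find_libpython()
    print("expected libpython:", path)
    print("present on disk:", exists)
    # The dynamic loader consults this variable before its default directories.
    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "(unset)"))
```

Running it through the same srun command as test.py should show what the compute node sees, which may differ from the login node.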
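When I change my LD_LIBRARY_PATH, i.e. export LD_LIBRARY_PATH=/om2/user/armas/anaconda/lib:$LD_LIBRARY_PATH, and run the script, my anaconda thinks I don't have jax installed. I ran pip install dm-acme[jax], and now when I run the script it says there is no module named atari_py. I think this is leading me down a dependency chain. I installed acme following this link, but inside a conda environment. My system administrator says this may be because acme doesn't work with anaconda. If that is the case, why?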
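Rather than discovering the missing packages one ImportError at a time, something like the sketch below could list them in one pass. The module names are only my guess at what the chain involves (taken from the errors above), and missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed candidates, based on the traceback and the later error messages.
    candidates = ["tensorflow", "reverb", "acme", "jax", "atari_py"]
    print("not importable:", missing_modules(candidates))
```

This only checks that each module can be located; a package can still fail at import time if a shared library it links against is missing, as with libpython3.7m.so.1.0 above.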
If I've left out any information, please let me know and I'll be sure to add it. Thanks again!