设置PyCharm远程conda解释器

Question

设置PyCharm远程conda解释器

16

我正在尝试在MacOS Mojave上为Anaconda的PyCharm 2019.1.2 Pro设置远程conda解释器，但无法使其正常工作。我的现有远程conda环境（v4.5.12）运行在一个Ubuntu 16 EC2机器上，该机器是从Amazon的Deep Learning AMI实例化的。

我尝试了设置ssh解释器，并将其指向：/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python这是我的conda环境。然后我尝试在此解释器上运行简单的Tensorflow GPU测试，并收到以下消息，强烈表明环境未被激活：（服务器的IP地址和公司名称被故意隐藏）。

ssh://ubuntu@xx.xx.xx.xx:22/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python -u /home/ubuntu/company/DeepLearning_copy/apps/test_gpu.py
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/company/DeepLearning_copy/apps/test_gpu.py", line 1, in <module>
    import tensorflow as tf
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

Process finished with exit code 1

当通过SSH进入服务器并运行conda activate tensorflow_p36，然后运行python gpu_test.py时，代码可以完美运行。

我希望有任何解决方法来允许使用现有的远程conda环境进行远程调试。同时，我已经向JetBrains的问题反馈页面和Anaconda社区组提出了问题。

编辑：请参见JetBrains问题页面中的一个潜在解决方法。

- Assif

请提供上下文，说明您尝试了什么，哪些不起作用。 - Diane M

强烈建议检查环境是否已激活。回溯信息表明使用了虚拟环境中的包，即错误是由venv的Tensorflow引起的。看起来错误出现在Tensorflow而不是Python安装中。您可能遇到了Tensorflow / cuda兼容性问题，就像其他一些用户一样（https://github.com/tensorflow/tensorflow/issues/26209）。 - Diane M

感谢@ArthurHavlicek。我相信这种行为与设置PATH方式一致，就像这样，但不运行source activate tensorflow_p36。 - Assif

我还没有验证这一点，但我怀疑（因为您正在使用AWS AMI），这是因为AWS编译了一个优化版本的Tensorflow，在您第一次conda activate您的环境时安装（例如conda activate tensorflow_p36）。也许您可以尝试从pip重新安装tensorflow-gpu并尝试？ - tauculator

4个回答

0

不需要指定“python”路径，我可以像这样指定“activate”路径：

ssh [host] "source ~/anaconda3/bin/activate [name of conda env] ; cd [pick a dir] ; [command]"

对于 [command]，尝试使用 "conda env list" 命令查看已激活的环境。或者您可以执行 "python foo.py" 命令。

您可能需要调整路径 "~/anaconda3/bin/activate"。

- mepster

-1

楼主，可能是有人在你的环境中做了一些事情，导致CUDA安装出现问题，就像其他几个人提到的那样。

我刚在AWS上配置了一个新的深度学习AMI实例 - 这对你来说是一个可行的选择吗？

无论如何，在通过ssh连接到（新配置的）服务器后，我执行了以下步骤：

初始激活

$ conda activate tensorflow_p36
WARNING: First activation might take some time (1+ min).
Installing TensorFlow optimized for your Amazon EC2 instance......
Env where framework will be re-installed: tensorflow_p36
Instance p2.xlarge is identified as a GPU instance, removing tensorflow-serving-cpu
Installation complete.

场景1：在tensorflow_p36 conda环境中运行GPU测试：

按照OP的场景执行以下操作，以确保Tensorflow正常工作。

$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> # Creates a graph.
... a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
... sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
>>> # Runs the op.
... print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]

场景2：停用环境，并调用与环境内相同的python可执行文件。

这应该与配置远程解释器使用特定的python解释器相同。请注意，在sess = tf.Session(...)之后有更多的输出，但仍然可以正常运行。

$ conda deactivate
$ /home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python

Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> # Creates a graph.
... a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
... sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-05-31 07:14:23.840474: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-31 07:14:23.841300: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55ec160ca020 executing computations on platform CUDA. Devices:
2019-05-31 07:14:23.841334: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-05-31 07:14:23.843647: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300060000 Hz
2019-05-31 07:14:23.843845: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55ec16131af0 executing computations on platform Host. Devices:
2019-05-31 07:14:23.843870: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-31 07:14:23.844965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.11GiB
2019-05-31 07:14:23.844992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-31 07:14:23.845991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-31 07:14:23.846013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-31 07:14:23.846020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-31 07:14:23.846577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-05-31 07:14:23.847176: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7

>>> # Runs the op.
... print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:14:25.478310: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:14:25.478383: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:14:25.478413: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]

场景3：尝试使用特定的conda环境解释器作为Jetbrains PyCharm中的远程解释器，在PyCharm Python控制台中使用

请注意，输出基本与上述场景2相同，但Tensorflow GPU测试正常工作，不会抛出任何错误。

ssh://ubuntu@XX.XX.XX.XX:22/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python -u /home/ubuntu/.pycharm_helpers/pydev/pydevconsole.py --mode=server

Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 6.4.0
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux

import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
2019-05-31 07:17:03.883169: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-31 07:17:03.883577: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55be28eef280 executing computations on platform CUDA. Devices:
2019-05-31 07:17:03.883609: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-05-31 07:17:03.886035: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300060000 Hz
2019-05-31 07:17:03.886752: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55be28f56d50 executing computations on platform Host. Devices:
2019-05-31 07:17:03.886777: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-31 07:17:03.886983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 508.38MiB
2019-05-31 07:17:03.887009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-31 07:17:03.887658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-31 07:17:03.887681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-31 07:17:03.887697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-31 07:17:03.887881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 283 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-05-31 07:17:03.889133: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:17:03.890673: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:17:03.890718: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-05-31 07:17:03.890750: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]

- tauculator

谢谢！不幸的是，当我停用并尝试使用该环境运行时，它无法工作（我得到与问题中相同的错误）。因此，只有在激活环境时才能运行，而在未激活时无法运行。 - Assif

@Assif - 如果您提供了一个新实例，它是否有效？在另一个环境中，我遇到了与您相同的问题，但在新提供的实例上一切正常运行。您是否在提供后更改了实例类型，例如p2.xlarge -> 非GPU实例？我怀疑这可能会影响CUDA安装... - tauculator

谢谢@user5042861，我在实例创建后没有更改实例类型。当我创建一个新的实例时，问题仍然存在。 - Assif

-2

我认为这是一个CUDA错误。CUDA没有正确配置。你是否在使用tensorflow-gpu？

- Dhruv Rajkotia

谢谢！我确信环境已经设置好了，有两个原因：（1）它是通过AWS的DL AMI设置的（2）当通过SSH登录到服务器时，运行“conda activate tensorflow_p36”，然后运行“python gpu_test.py”时，代码可以完美地运行。 - Assif

确实，我正在使用tensorflow-gpu。 - Assif

所以，我确定这是一个错误，它是CUDA错误。请重新配置它。 - Dhruv Rajkotia

只有在我通过 PyCharm SSH 解释器运行脚本时才会出现此问题。如果我 SSH 进入服务器，激活环境，然后运行脚本 - 它就可以完美运行。因此，我确信环境已经正确配置，但是在远程脚本执行时，PyCharm 没有激活它。 - Assif

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Kamil Pokora · Accepted Answer

你可以做以下几件事情：

进入“运行/调试配置”
在“环境”下，你可以看到“环境变量”
你需要设置正确的cuda路径。在我的情况下是：“LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64”

我也很失望，这不是JetBrains团队默认完成的。