Process finished with exit code -1073741571 (0xC00000FD) TensorFlow


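I know this question gets asked a lot, but my case is a bit strange. I just bought an RTX 3080 and tried to install TensorFlow following a tutorial I found on Reddit. I followed the steps exactly as the tutorial described: install Anaconda --> Python 3.8 --> TF-nightly v. 2.5.0 --> Visual Studio C++ --> CUDA 11.1.0 --> cuDNN 8.0.4 --> add the paths --> restart the PC. At first everything seemed to work. I tried running the following commands: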

import tensorflow as tf
tf.config.list_physical_devices()

As you can see from the output, this code runs without any errors:

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/test.py
2021-01-16 00:40:45.043205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.676446: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:40:46.699117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:40:46.699285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.713523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:40:46.713626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:40:46.717017: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:40:46.718013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:40:46.725508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:40:46.728010: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:40:46.728534: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:40:46.728660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0

Process finished with exit code 0
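This output only shows that the CUDA and cuDNN DLLs load. To compare the CUDA 11.1.0 / cuDNN 8.0.4 installed above with the versions the installed TensorFlow binary was actually built against, a small check like the one below can help. It is a sketch, assuming TF 2.3 or newer, where tf.sysconfig.get_build_info() exists. The keys in that dict can vary by platform and build, hence the .get() calls. The helper name describe_tf_build is illustrative, and the ImportError fallback only keeps the snippet from crashing on a machine without TensorFlow.

```python
# Sketch: report which CUDA/cuDNN versions this TensorFlow build expects.
# Assumption: TF >= 2.3, where tf.sysconfig.get_build_info() is available.
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run on a machine without TensorFlow
    tf = None


def describe_tf_build():
    """Return a summary dict of the TF build, or an empty dict if TF is missing."""
    if tf is None:
        return {}
    info = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),    # CUDA the binary was built with
        "cudnn_version": info.get("cudnn_version"),  # cuDNN the binary was built with
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in describe_tf_build().items():
        print(f"{key}: {value}")
```

If the reported cuda_version or cudnn_version differs from the toolkit on the PATH, that mismatch is worth ruling out before looking further.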

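I'm currently trying to train the Seq2Seq model from the TensorFlow tutorial. The code is almost identical, except that I use PyCharm instead of Jupyter and have wrapped everything in a class. The code itself is the same. My full code is available on GitHub. When I try to train the model, the process ends with "Process finished with exit code -1073741571 (0xC00000FD)". No actual error is shown. The program simply terminates with this exit code: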

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/train_model.py
2021-01-16 00:50:34.337791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.873698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:50:36.894834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.895004: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.909453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:36.909542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:36.912954: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:50:36.914024: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:50:36.921476: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:50:36.924059: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:50:36.924660: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:36.924807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:36.925280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-16 00:50:36.926213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.926418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:37.388811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-16 00:50:37.388901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306]      0 
2021-01-16 00:50:37.388947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0:   N 
2021-01-16 00:50:37.389134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1446] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7447 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:2d:00.0, compute capability: 8.6)
2021-01-16 00:50:38.006971: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:38.586194: I tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Loaded cuDNN version 8004
2021-01-16 00:50:38.709516: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:39.312210: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:39.313013: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

Process finished with exit code -1073741571 (0xC00000FD)
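The exit code itself carries information. -1073741571 is the signed 32-bit view of the Windows NTSTATUS value 0xC00000FD, STATUS_STACK_OVERFLOW. This means a thread's stack was exhausted, which is different from running out of RAM or GPU memory. A native stack overflow kills the process before Python can raise an exception, which would be consistent with no traceback being printed. The sketch below converts such exit codes back to hex. The lookup table holds only two well-known codes and the helper name decode_exit_code is illustrative.

```python
# Map a few well-known Windows NTSTATUS codes to their names.
# Assumption: only two common values are listed; extend as needed.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def decode_exit_code(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned NTSTATUS hex."""
    unsigned = code & 0xFFFFFFFF  # e.g. -1073741571 -> 0xC00000FD
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(decode_exit_code(-1073741571))  # 0xC00000FD (STATUS_STACK_OVERFLOW)
```

Any other exit code PyCharm reports can be passed in the same way. Values not in the table are still printed as hex.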

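So I tried to find the line where the program crashes. I found that it crashes as soon as I initialize the "BahdanauAttention" class (as shown in the screenshot).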
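Instead of bisecting by hand, another way to locate a crash like this is the standard-library faulthandler module. It dumps the Python traceback when the interpreter dies from a fatal error, and on Windows (Python 3.6+) it also installs a handler for native exceptions. Whether it can still write a report once the stack is completely exhausted is not guaranteed, so treat this as a diagnostic attempt rather than a sure fix. Below is a minimal sketch for the very top of train_model.py. The log file name crash_trace.log is a hypothetical choice.

```python
import faulthandler

# Hypothetical log file; writing to a file keeps the dump even if the
# console output disappears when the process dies.
crash_log = open("crash_trace.log", "w")

# Dump the Python traceback of every thread if a fatal error occurs.
# The file object must stay open for the lifetime of the process.
faulthandler.enable(file=crash_log, all_threads=True)

# ... the rest of train_model.py (model construction, training) follows here.
```

The same effect is available without editing the script by running python -X faulthandler train_model.py or by setting the environment variable PYTHONFAULTHANDLER=1. In that case the dump goes to stderr.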

After hours of testing, I can confirm the following:

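  • I can run ordinary (non-TensorFlow) code in this virtual environment without getting this error
  • I am not running out of memory (at most 17 GB of my 32 GB of RAM is used)
  • I am not running any programs that could cause conflicts (such as NVIDIA Broadcast or JupyterLab)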

What I have tried to fix the problem:

  • Reinstalled Conda
  • Created a new virtual environment
  • Reinstalled TF and all NVIDIA drivers
  • Tried a different Python version (3.7 instead of 3.8)
  • Restarted my PC

I'm at my wit's end now. Does anyone know how to fix this?


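Could you try a different, stable version of TensorFlow? The latest one is currently 2.4. - user11530462
Sorry, that's not possible, because the RTX 30 series isn't compatible with any stable release. - Daniel
1 Answer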

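You can upgrade TensorFlow to the latest stable release: starting with TensorFlow 2.4, the new NVIDIA Ampere architecture (which the RTX 30 series is based on) and CUDA 11 are supported. See this table for details and follow the guide to install it:
https://www.tensorflow.org/install/source_windows#tested_build_configurations
As for memory usage on the GPU, you can enable memory growth at the beginning of your code, as described here.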
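Below is a minimal sketch of the memory-growth setting mentioned above, assuming TF 2.1 or newer (tf.config.list_physical_devices and tf.config.experimental.set_memory_growth). By default TensorFlow reserves most of the GPU's memory up front (the log above shows a 7447 MB device being created). With memory growth enabled it allocates on demand instead. The call has to run before any tensor or model touches the GPU. The helper name enable_memory_growth is illustrative, and the ImportError fallback only keeps the snippet from crashing on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run on a machine without TensorFlow
    tf = None


def enable_memory_growth():
    """Enable on-demand GPU memory allocation; return how many GPUs were configured."""
    if tf is None:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialised, otherwise TF raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print(f"memory growth enabled on {configured} GPU(s)")
```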

Content provided by Stack Overflow, translated from the English original.