While trying to train a LeNet model for multiclass classification in H2O Deep Water with the mxnet backend, I got the following error:
Loading H2O mxnet bindings.
Found CUDA_HOME or CUDA_PATH environment variable, trying to connect to GPU devices.
Loading CUDA library.
Loading mxnet library.
Loading H2O mxnet bindings.
Done loading H2O mxnet bindings.
Constructing model.
Done constructing model.
Building network.
mxnet data input shape: (32,100)
[10:40:16] /home/jenkins/slave_dir_from_mr-0xb1/workspace/deepwater-master/thirdparty/mxnet/dmlc-core/include/dmlc/logging.h:235: [10:40:16] src/operator/./convolution-inl.h:349: Check failed: (dshape.ndim()) == (4) Input data should be 4D in batch-num_filter-y-x
[10:40:16] src/symbol.cxx:189: Check failed: (MXSymbolInferShape(GetHandle(), keys.size(), keys.data(), arg_ind_ptr.data(), arg_shape_data.data(), &in_shape_size, &in_shape_ndim, &in_shape_data, &out_shape_size, &out_shape_ndim, &out_shape_data, &aux_shape_size, &aux_shape_ndim, &aux_shape_data, &complete)) == (0)
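As far as I can tell from the log, the convolution layer receives a 2D input of shape (32,100), that is, 32 rows of 100 flat features, while it checks for a 4D batch-num_filter-y-x shape. The sketch below only illustrates that mismatch in plain Python. It is not Deep Water or mxnet code: both helper names are hypothetical, and treating the 100 features as a single-channel 10x10 image is purely an assumption for the example.

```python
import math


def is_valid_conv_input(shape):
    """Mirror the check from the log: convolution input must have 4 dimensions."""
    return len(shape) == 4


def infer_square_image_shape(batch_size, num_features, channels=1):
    """Hypothetical helper: map a flat (batch, features) shape to
    (batch, channels, height, width), assuming square images."""
    per_channel, remainder = divmod(num_features, channels)
    if remainder != 0:
        raise ValueError("feature count is not divisible by the channel count")
    side = math.isqrt(per_channel)
    if side * side != per_channel:
        raise ValueError("features per channel do not form a square image")
    return (batch_size, channels, side, side)


flat_shape = (32, 100)  # the shape reported in the log
print(is_valid_conv_input(flat_shape))

# Assumption: each row is one single-channel square image.
image_shape = infer_square_image_shape(*flat_shape)
print(image_shape, is_valid_conv_input(image_shape))
```

If the columns are ordinary tabular features rather than pixels, no such reshaping is meaningful, and a convolutional network like LeNet may simply not fit the data.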
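My setup is as follows:
Ubuntu: 16.04
RAM: 12GB
Graphics card: Nvidia 920mx
Driver version: 384.90
Cuda: 8.0.61
Cudnn: 6.0
R version: 3.4.3
H2O version: 3.15.0.393; h2o R package: 3.16.0.2
mxnet: 0.11.0
Training data size: 400MB (about 822MB once converted to an H2O frame)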
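Here is what I have done so far:
1. Allocated enough memory to the Java heap when starting the H2O cluster (java -Xmx9g -jar h2o.jar).
2. Built mxnet from source with GPU support.
3. Monitored the GPU and the system with nvidia-smi and the system monitor. Neither used up all of its memory before the error, so this does not look like an out-of-memory problem; about 2-3GB was still free when the error occurred.
4. Tried tensorflow-gpu built from source. pip list confirms it is installed, but creating the model in R fails with: Error: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
5. The only way I have been able to get H2O Deep Water working with all backends, with and without a GPU, is the Docker setup provided in the installation tutorial.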
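I would like to get the same functionality on my laptop that I get with Docker. Also, is there a way to run Deep Water on the CPU only? The linked question "Is it possible to build Deep Water/TensorFlow models in H2O without CUDA" did not provide any useful answer. Any help or suggestions would be greatly appreciated!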