这是运行检查Tensorflow是否工作的脚本后收到的消息:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
我注意到它提到了SSE4.2和AVX,
- SSE4.2和AVX是什么?
- SSE4.2和AVX如何改善Tensorflow任务的CPU计算?
- 如何使Tensorflow使用这两个库进行编译?
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
在 Xeon E5 v3 上,与官方版本相比,8k矩阵乘法的CPU速度提高了3倍(0.35 -> 1.05 T ops/sec)。 - Yaroslav Bulatov