I am running a pyspark 2.2.0 job in Apache Spark local mode and I see the following warning:
WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
What is the reason for this warning? Should I be concerned about it, or can I safely ignore it?
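For reference, the job is essentially a grouped aggregation; below is a minimal sketch of the kind of pipeline that surfaces the warning for me (the column names and sizes are illustrative, not my actual data):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

# A grouped aggregation builds hash-map batches (RowBasedKeyValueBatch)
# internally; under memory pressure Spark asks those batches to spill.
df = spark.range(0, 10000000).withColumn("k", F.col("id") % 1000)
df.groupBy("k").agg(F.avg("id"), F.max("id")).show()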
I think this message is worse than a simple warning: it is on the verge of being an error.
Take a look at the source code:
/**
 * Sometimes the TaskMemoryManager may call spill() on its associated MemoryConsumers to make
 * space for new consumers. For RowBasedKeyValueBatch, we do not actually spill and return 0.
 * We should not throw OutOfMemory exception here because other associated consumers might spill
 */
public final long spill(long size, MemoryConsumer trigger) throws IOException {
  logger.warn("Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.");
  return 0;
}
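In other words, the warning fires when the TaskMemoryManager is under memory pressure and asks this consumer to free memory, which it cannot do. If your job still completes, the warning can be ignored; if it floods the logs or the job is struggling, giving the driver more memory in local mode relieves the pressure. A minimal sketch, assuming a fresh process (the 4g value is only an example to tune to your data):

from pyspark.sql import SparkSession

# spark.driver.memory only takes effect if it is set before the JVM starts,
# i.e. before the first SparkSession/SparkContext is created in this process.
spark = (SparkSession.builder
         .master("local[*]")
         .config("spark.driver.memory", "4g")  # example value
         .getOrCreate())

# Alternatively, raise the log level to hide the WARN (along with other
# warnings), which is acceptable when jobs succeed anyway.
spark.sparkContext.setLogLevel("ERROR")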
To add to the above: I received this warning when running the jupyter/scipy-notebook Docker image (importing PySpark separately on top of it). Switching to the jupyter/pyspark-notebook image resolved the issue.