Spark 2.3 Executor Memory Leak Issue

I am getting a managed memory leak warning. In theory this is a Spark bug that dates back to version 1.6 and was supposedly resolved.

Mode: Standalone. IDE: PyCharm. Spark version: 2.3. Python version: 3.6.

Below is the warning output -

2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3148
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3152
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3151
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3150
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3149
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3153
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3154
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3158
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3155
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3157
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3160
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3161
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3156
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3159
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3165
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3163
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3162
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3166

Why does this happen, even though my job completes successfully?

Edit: Many have flagged this as a duplicate of a two-year-old question, but the answer there calls it a Spark bug, while the Spark JIRA shows that bug as resolved.

The question is: after so many releases, why am I still hitting the same problem in Spark 2.3? If there is a valid or reasonable answer to my question, I will certainly delete it.


You may have forgotten to close some resources, such as database connections or open files. - Ramesh Maharjan
That's not the case here, Ramesh. - Aakash Basu
I am seeing something similar. I even see the exact same byte value (262144 bytes), although I am using Scala. Any luck debugging this? - turtlemonvh
@Aakash Basu, did you ever solve this issue? - Sidhom
1 Answer

According to SPARK-14168, the warning is caused by not consuming the entire iterator. I hit the same error when taking n elements from an RDD in the Spark shell.
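For illustration, here is a minimal PySpark sketch of that pattern, assuming a local standalone session like the question's (the app name and sample data are hypothetical, not from the original post). Whether this exact snippet emits the warning depends on the Spark build and on which operators allocate execution memory, but it shows the partially consumed iterator the answer describes:

from pyspark.sql import SparkSession

# Hypothetical minimal sketch; names and data are illustrative only.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("managed-memory-leak-demo")
         .getOrCreate())
sc = spark.sparkContext

# Sorting shuffles the data and can allocate managed (execution) memory.
rdd = sc.parallelize(range(1000000), 16).sortBy(lambda x: -x)

# take(5) stops reading each partition's iterator early; the task then
# finishes with memory it never released itself, which is what the
# "Managed memory leak detected" WARN message reports.
print(rdd.take(5))

# Fully consuming the data (e.g. count()) drains the iterators and
# should not leave such memory behind.
print(rdd.count())

spark.stop()

Note that the warning is benign in this situation: the executor reclaims the unreleased memory itself when the task ends and merely logs that the task did not free it, which is consistent with the job still completing successfully.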
