Flink集群配置问题-没有可用的插槽

Question

Flink集群配置问题-没有可用的插槽

11

我已经部署了配置为以下并行度的Flink集群：

jobmanager.heap.mb: 2048
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 5
parallelism.default: 2

但是，如果我尝试运行任何示例或jar文件，即使使用-p标志，我仍会收到以下错误：

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Task to schedule: < Attempt #1 (Source: Custom Source -> Sink: Unnamed (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 22f48c24254702e4d1674069e455c81a > in sharing group < SlotSharingGroup [22f48c24254702e4d1674069e455c81a] >. Resources available to scheduler: 
Number of instances=0, total number of slots=0, available slots=0
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:255)
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
        at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:303)
        at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:453)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:326)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:742)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:889)
        at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.call(FixedDelayRestartStrategy.java:80)
        at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

正如仪表板显示的那样，这应该不会让人感到惊讶：

我尝试了几次重新启动集群，但似乎没有使用配置。

- Tomasz Sosiński

你在主菜单的“任务管理器”部分看到任务管理器了吗？似乎你没有正在运行的任务管理器或者它们的端口被防火墙阻止了。因此，请尝试检查任务管理器的日志（在<flink>/log目录中）并检查防火墙设置。 - Maxim

实际上我在“任务管理器”部分没有看到任何管理器。任务管理器的日志显示以下错误：

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/flink/runtime/leaderretrieval/LeaderRetrievalListener : Unsupported major.minor version 51.0

- Tomasz Sosiński

你在任务管理器节点上使用的JRE/JDK版本是哪个？看起来似乎小于7（在内部格式中为51）。 - Maxim

事实证明，我的JobManager机器上安装了Java 1.8，而TaskManager机器上安装了1.6。在每台机器上更新到Java 1.8后，Flink集群正常工作（感谢maxd）。然而，仪表板UI没有跟随新的集群及其配置，我能否无论如何重新启动它？ - Tomasz Sosiński

您是否按照文档所述，在 <flink>/conf/masters 和 <flink>/conf/slaves 文件中设置了有效的IP地址参考链接？ - Maxim

好的，一切似乎都在工作，我仍然无法访问用户界面，但是当我从本地主机使用curl时，它返回适当的值。看起来我有一个路由iptables的问题。感谢您的帮助，maxd！ - Tomasz Sosiński

3个回答

0

异常通常意味着没有任务管理器，因此没有可用的插槽来运行作业。任务管理器崩溃的原因可能有很多，例如运行时异常或配置错误。只需检查日志以获取确切原因。您需要重新启动集群，并在仪表板中任务管理器可用时再次运行作业。您可以在配置中定义适当的重启策略，例如固定延迟重启，以便在出现真正故障的情况下重试作业。

- Mudit bhaintwal

0

最终我在我的 FLINK 问题上找到了解决方案。首先，我将解释根本原因，然后再解释解决方案。

根本原因：无法创建Java虚拟机。

请检查Flink日志并跟踪task-executor日志

tail -500f flink-root-taskexecutor-3-osboxes.out 找到以下日志。

Invalid maximum direct memory size: -XX:MaxDirectMemorySize=8388607T
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

为什么会出现这个问题呢？因为Java版本不正确。操作系统是64位的，但我安装了32位的jdk。

解决方案： 1. 安装正确的JDK-1.8 64位 [安装后，任务执行器中的错误消失了]

编辑flink-conf.yaml文件，更新 taskmanager.numberOfTaskSlots: 10 parallelism.default: 1

我的问题得到了解决，在本地和云端完美运行Flink集群。

- Rajeev Rathor

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- GoForce5500 · Accepted Answer

我遇到了相同的问题。我记得当 Spark 遇到问题时，那是因为我新安装了 JDK11 进行测试，它改变了我的环境变量 JAVA_HOME 成为 /Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home.

所以我将 JAVA_HOME 设置回 JDK8，使用以下命令： export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home

然后一切都运行正常了。这个路径是针对我的 Mac，你可以找到自己的 JAVA_HOME。
希望这会有所帮助。