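Please note: I have tagged this as JClouds because, if you read the whole question and the comments that follow, I believe this is either a bug in JClouds or a misuse of that library.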
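I have an executable JAR that runs for a while, does its work without throwing any errors/exceptions, and then hangs forever at the point where it should exit. I profiled it with VisualVM (paying attention to the running threads) and also added logging statements at the point where the application hangs (the end of the main() method). Here is the last part of my main method:
Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
for(Thread t : threadSet) {
    String daemon = (t.isDaemon()? "Yes" : "No");
    System.out.println("The ${t.getName()} thread is currently running; is it a daemon? ${daemon}.");
}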
When my JAR executes this code, I see the following output:
The com.google.inject.internal.util.Finalizer thread is currently running; is it a daemon? Yes.
The Signal Dispatcher thread is currently running; is it a daemon? Yes.
The RMI Scheduler(0) thread is currently running; is it a daemon? Yes.
The Attach Listener thread is currently running; is it a daemon? Yes.
The user thread 3 thread is currently running; is it a daemon? No.
The Finalizer thread is currently running; is it a daemon? Yes.
The RMI TCP Accept-0 thread is currently running; is it a daemon? Yes.
The main thread is currently running; is it a daemon? No.
The RMI TCP Connection(1)-10.10.99.8 thread is currently running; is it a daemon? Yes.
The Reference Handler thread is currently running; is it a daemon? Yes.
The JMX server connection timeout 24 thread is currently running; is it a daemon? Yes.
I don't think I need to worry about daemon threads (correct me if I'm wrong), so filtering that list down to the non-daemon threads:
The user thread 3 thread is currently running; is it a daemon? No.
The main thread is currently running; is it a daemon? No.
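The filtering step above can also be done in code rather than by eye. Here is a minimal Java sketch, assuming nothing beyond the JDK; the class name NonDaemonThreads and the method liveNonDaemonThreads are illustrative names, not part of the original program. It also prints each thread's stack, which is often enough to see what a stuck thread is waiting on without a separate thread dump.

```java
import java.util.ArrayList;
import java.util.List;

public class NonDaemonThreads {

    // Collects every live non-daemon thread; only these can keep the JVM from exiting.
    static List<Thread> liveNonDaemonThreads() {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Thread t : liveNonDaemonThreads()) {
            System.out.println("The " + t.getName()
                    + " thread is a non-daemon thread in state " + t.getState() + ".");
            // Print the frames so the blocking call is visible right in the log.
            for (StackTraceElement frame : t.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

Calling liveNonDaemonThreads() at the very end of main() should list main itself plus whatever else is still holding the JVM open.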
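Clearly the main thread is still running because something is preventing it from exiting. Hmm, user thread 3 looks interesting. What does VisualVM tell us? Here is the thread view at the point where the application hangs (what is happening when the console output above is printed). Hmm, user thread 3 looks even more suspicious! So before killing the application I took a thread dump. Here is the stack trace for user thread 3:
"user thread 3" prio=6 tid=0x000000000dfd4000 nid=0x2360 waiting on condition [0x00000000114ff000]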
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000782cba410> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Locked ownable synchronizers:
- None
I have never had to analyze anything like this before, so it means nothing to me (but perhaps not to a trained eye!).
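For readers in the same position: the frames ThreadPoolExecutor.getTask and LinkedBlockingQueue.take in the trace above are what an idle thread-pool worker looks like when it is waiting for its next task. The sketch below tries to reproduce that state with a plain JDK fixed thread pool. It is an assumption that the pool behind user thread 3 is configured similarly; the class ParkedWorkerDemo and the thread name "demo worker" are made up for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ParkedWorkerDemo {
    final AtomicReference<Thread> worker = new AtomicReference<>();

    // Fixed pool (backed by a LinkedBlockingQueue) with one named, non-daemon worker.
    final ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
        Thread t = new Thread(r, "demo worker"); // hypothetical name
        t.setDaemon(false);
        worker.set(t);
        return t;
    });

    // Runs one trivial task, then waits (up to 2 s) for the idle worker to park.
    Thread.State runTaskAndObserveWorker() {
        try {
            pool.submit(() -> { }).get();
            long deadline = System.currentTimeMillis() + 2000;
            while (worker.get().getState() != Thread.State.WAITING
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return worker.get().getState();
    }

    // Asks the pool to stop; idle workers then leave take() and exit.
    boolean shutdownAndWait() {
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ParkedWorkerDemo demo = new ParkedWorkerDemo();
        System.out.println("worker state after the task: " + demo.runTaskAndObserveWorker());
        for (StackTraceElement frame : demo.worker.get().getStackTrace()) {
            System.out.println("    at " + frame);
        }
        // Comment out the next line and the JVM never exits, even though main() returns.
        System.out.println("pool terminated: " + demo.shutdownAndWait());
    }
}
```

The point of the sketch is that the task itself finished long ago; what keeps the JVM alive is the pool that was never shut down, not a stuck piece of work.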
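After killing the application, VisualVM's timeline stopped ticking/incrementing every second, and I could scroll horizontally back through the timeline to the point where user thread 3 was created and began its life as a pesky thread:
However, I can't figure out how to tell where in the code user thread 3 was created. So I ask:
- How can I determine who created user thread 3, and where (especially since I suspect it is a thread created by a third-party OSS library)?
- How do I diagnose and fix this thread hang?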
Update:
Here is my code that fires at around the time user thread 3 is created:
ExecutorService myExecutor = Executors.newCachedThreadPool();
for(Node node : nodes) {
    BootstrapAndKickTask bootAndKickTask = new BootstrapAndKickTask(node, ctx);
    myExecutor.execute(bootAndKickTask);
}
myExecutor.shutdown();
if(!myExecutor.awaitTermination(15, TimeUnit.MINUTES)) {
    TimeoutException toExc = new TimeoutException("Hung after the 15 minute timeout was reached.");
    log.error(toExc);
    throw toExc;
}
Here is also my GitHub Gist containing the full thread dump.
shutdown, terminate, close methods. - Thilo
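To illustrate the kind of shutdown/close method the comment above refers to, here is a hedged sketch. LibraryContext is a hypothetical stand-in for a third-party context object that owns its own thread pool; it is not the JClouds API, and whether the ctx object in the question exposes such a method should be checked against that library's documentation. The idea is only that an object owning non-daemon worker threads has to be closed explicitly, and try-with-resources is one way to guarantee that.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CloseOwnedPool {

    // Hypothetical stand-in for a third-party context that owns its own thread pool.
    static class LibraryContext implements Closeable {
        private final ExecutorService userThreads = Executors.newFixedThreadPool(2);

        // Pretends to do some per-node work on the library's own threads.
        Future<String> bootstrap(String node) {
            return userThreads.submit(() -> "bootstrapped " + node);
        }

        boolean isShutdown() {
            return userThreads.isShutdown();
        }

        @Override
        public void close() {
            // Releases the idle, non-daemon workers so they no longer block JVM exit.
            userThreads.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        try (LibraryContext ctx = new LibraryContext()) {
            System.out.println(ctx.bootstrap("node-1").get());
        } // close() runs here, even if the body throws
    }
}
```

Shutting down myExecutor alone would not help in a case like this, because the lingering worker belongs to a different pool than the one the caller created.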