Debugging a hang in java.lang.UNIXProcess.forkAndExec


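I'm running into a problem that is almost identical to this SO question, except for the operating system and Java version (the answer there appears to be a Solaris-specific fix, and I'm on Linux). Under certain conditions, when I try to run a process from Java, it hangs completely in java.lang.UNIXProcess.forkAndExec.

Situation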

$ uname -a
Linux localhost.localdomain 2.6.33.9-rt31.75.el6rt.x86_64 #1 SMP PREEMPT RT Tue Sep 13 11:24:45 CEST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)

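(I can't reproduce this with the Sun 1.6.0_27-b07 JDK.)

The child process I'm launching is just ps with a few arguments. I print what I'm about to run before running it, and after the hang, when I try the exact same command in a shell, ps runs fine.

The hang is intermittent: it shows up in maybe 1 out of 500 runs.

If I launch the child process shortly after startup, it doesn't hang. It only happens, and then only occasionally, when the child is launched after other work has been done (e.g. string manipulation, opening/communicating over/closing sockets wrapped in ObjectInputStream/ObjectOutputStream, and reading from a small file).

Because the hang is so infrequent, it's hard to narrow the problem down precisely, but by running the program in a Bash while-true loop until it hangs, I can reliably reproduce it within about 10 minutes.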
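For anyone trying to reproduce something similar, here is a minimal sketch of an in-process probe, as an alternative to an outer Bash loop. It is not the program from the question: the class name ForkHangProbe, the method startsWithin, the 5-second timeout and the 1000-iteration limit are all assumptions made for illustration. It runs ProcessBuilder.start() on a daemon worker thread so that a blocked start() can be detected with a timeout instead of freezing the whole program. Since the hang described above only appeared after other work had been done in the JVM, a bare loop like this may well not trigger it.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical probe; all names here are illustrative, not from the question.
class ForkHangProbe {

    /**
     * Returns true if ProcessBuilder.start() returned within timeoutMillis,
     * false if it was still blocked when the timeout expired.
     */
    static boolean startsWithin(List<String> command, long timeoutMillis) {
        // Daemon thread: a worker stuck in native code must not keep the JVM alive.
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fork-probe");
            t.setDaemon(true);
            return t;
        });
        try {
            // Send the child's output to /dev/null so it can never block on a full pipe.
            Callable<Process> launch = () -> new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(new File("/dev/null"))
                    .start();
            Future<Process> started = worker.submit(launch);
            Process child = started.get(timeoutMillis, TimeUnit.MILLISECONDS);
            child.waitFor(); // reap the child; only start() is subject to the timeout
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("could not launch " + command, e);
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: java ForkHangProbe <command> [args...]");
            return;
        }
        List<String> command = Arrays.asList(args);
        for (int i = 1; i <= 1000; i++) {
            if (!startsWithin(command, 5000)) {
                System.err.println("start() still blocked after 5 s on iteration " + i);
                // Stay alive so pstack or jstack can be pointed at this JVM.
                Thread.sleep(Long.MAX_VALUE);
            }
        }
        System.out.println("no hang in 1000 iterations");
    }
}
```

Invoked as, say, `java ForkHangProbe ps aux`, it stops at the first iteration where start() does not return in time and leaves the JVM alive for inspection.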

Also, I rebooted my machine yesterday, so this is unrelated to the leap second bug.

Symptoms

When it hangs, the stack trace looks like this:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.1-b03 mixed mode):

"process reaper" daemon prio=10 tid=0x00007f42904dc000 nid=0x14bf waiting on condition [0x00007f427aa2f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007c229fbf0> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

"Service Thread" daemon prio=10 tid=0x00007f42900f1000 nid=0x14bc runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f42900ee000 nid=0x14bb waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f42900eb000 nid=0x14ba waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f42900e8000 nid=0x14b9 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f429009b800 nid=0x14b8 in Object.wait() [0x00007f427b8f6000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007c00057f0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x00000007c00057f0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x00007f4290099000 nid=0x14b7 in Object.wait() [0x00007f427b9f8000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007c0005370> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000007c0005370> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f4290009000 nid=0x14b1 runnable [0x00007f4299077000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
    at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:68)
    at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:99)
    at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
    at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
    at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:64)
    at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:147)
    at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
    at mypackage.Main.main(Main.scala)

"VM Thread" prio=10 tid=0x00007f4290091800 nid=0x14b6 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f4290017000 nid=0x14b2 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f4290018800 nid=0x14b3 runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f429001a800 nid=0x14b4 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f429001c800 nid=0x14b5 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007f42900f5800 nid=0x14bd waiting on condition 

If I run jstack -m or jstack -F on the process ID once it has hung, I get the following output (it is exactly the same with -m and with -F):
$ jstack -m 3199
Attaching to process ID 3199, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
    at sun.tools.jstack.JStack.main(JStack.java:84)
Caused by: java.lang.RuntimeException: Type "nmethodBucket*", referenced in VMStructs::localHotSpotVMStructs in the remote VM, was not present in the remote VMStructs::localHotSpotVMTypes table (should have been caught in the debug build of that VM). Can not continue.
    at sun.jvm.hotspot.HotSpotTypeDataBase.lookupOrFail(HotSpotTypeDataBase.java:362)
    at sun.jvm.hotspot.HotSpotTypeDataBase.readVMStructs(HotSpotTypeDataBase.java:253)
    at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:87)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
    at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
    ... 6 more

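Summary

I don't think java.lang.UNIXProcess.forkAndExec should ever hang. If there were a race condition in my code, it would hang somewhere else; if the child process itself hung, my Java process would hang in waitFor, not in java.lang.UNIXProcess.forkAndExec. This looks like a JVM bug to me, but I'm not sure how to narrow it down to a reproducible test case. Any suggestions?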
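The distinction drawn above (stuck while launching the child versus stuck waiting for it) can be checked mechanically from a thread's stack frames. The following is only a sketch: HangSite, Where and classify are hypothetical names, and matching on the method names forkAndExec and waitFor inside java.lang classes is a heuristic based on the stack trace shown earlier, not an official API.

```java
import java.util.Map;

// Hypothetical helper; names are illustrative.
class HangSite {

    enum Where { LAUNCHING, WAITING_FOR_CHILD, ELSEWHERE }

    /** Classifies a thread by the java.lang process-related frames on its stack. */
    static Where classify(StackTraceElement[] frames) {
        for (StackTraceElement frame : frames) {
            if (!frame.getClassName().startsWith("java.lang.")) {
                continue; // ignore application and library frames
            }
            if ("forkAndExec".equals(frame.getMethodName())) {
                return Where.LAUNCHING;         // still inside the native fork/exec call
            }
            if ("waitFor".equals(frame.getMethodName())) {
                return Where.WAITING_FOR_CHILD; // child was started; we are waiting on it
            }
        }
        return Where.ELSEWHERE;
    }

    public static void main(String[] args) {
        // Prints one line per live thread in this JVM.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.out.println(e.getKey().getName() + ": " + classify(e.getValue()));
        }
    }
}
```

To be useful, classify would have to be called from a watchdog thread inside the affected JVM; applied to the "main" thread in the dump above, it would report LAUNCHING.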

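Edit

When the hang occurs, I see two identical Java processes in top. When I press Ctrl-C, the main process exits, but the other one stays around and can't be killed until I use kill -9. Both processes use 0% CPU.

Edit

When I run kill -QUIT on the child process, it does nothing.

When I run pstack on the child process, I get: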

#0  0x0000003c2fa0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003c2fa09388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003c2fa09257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fb79c4c8a30 in __oo_rwlock_unlock_write_slow () from /usr/lib64/libonload.so
#4  0x00007fb79c4990be in citp_netif_child_fork_hook () from /usr/lib64/libonload.so
#5  0x0000003c2f2abb76 in fork () from /lib64/libc.so.6
#6  0x00007fb79b0219f6 in startChild () from /usr/java/jdk1.7.0_05/jre/lib/amd64/libjava.so
#7  0x00007fb79b0220eb in Java_java_lang_UNIXProcess_forkAndExec () from /usr/java/jdk1.7.0_05/jre/lib/amd64/libjava.so
#8  0x00007fb791011f90 in ?? ()
#9  0x00007fb700000000 in ?? ()
#10 0x0000000000000000 in ?? ()

When I run pstack on the parent process, I get:
Thread 17 (Thread 0x7fb79b544700 (LWP 8437)):
#0  0x0000003c2fa0e54d in read () from /lib64/libpthread.so.0
#1  0x00007fb79c488816 in read () from /usr/lib64/libonload.so
#2  0x00007fb79b022126 in Java_java_lang_UNIXProcess_forkAndExec () from /usr/java/jdk1.7.0_05/jre/lib/amd64/libjava.so
#3  0x00007fb791011f90 in ?? ()
#4  0x00007fb700000000 in ?? ()
#5  0x0000000000000000 in ?? ()
Thread 16 (Thread 0x7fb79ab11700 (LWP 8438)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bdfbb2b in GangWorker::loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7fb79aa10700 (LWP 8439)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bdfbb2b in GangWorker::loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7fb79a90f700 (LWP 8440)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bdfbb2b in GangWorker::loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7fb79a80e700 (LWP 8441)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bdfbb2b in GangWorker::loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7fb798100700 (LWP 8442)):
#0  0x0000003c2fa0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc85cd7 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4d26e in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79b93b498 in ConcurrentMarkSweepThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7fb78e2c9700 (LWP 8443)):
#0  0x0000003c2fa0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc85cd7 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4d26e in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bded430 in VMThread::loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bded970 in VMThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#7  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7fb78e1c8700 (LWP 8444)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc76c5c in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79baebc81 in JVM_MonitorWait () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb791011f90 in ?? ()
#5  0x00000007e7802b98 in ?? ()
#6  0x00007fb794123000 in ?? ()
#7  0x00007fb78e1c7160 in ?? ()
#8  0x00007fb78e1c7108 in ?? ()
#9  0x0000000000000000 in ?? ()
Thread 9 (Thread 0x7fb78e0c7700 (LWP 8445)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc76c5c in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79baebc81 in JVM_MonitorWait () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb791011f90 in ?? ()
#5  0x00007fb78e0c6160 in ?? ()
#6  0x00007fb7910124ea in ?? ()
#7  0x0000000000000000 in ?? ()
Thread 8 (Thread 0x7fb78dfc6700 (LWP 8446)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d7c6 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79b910cfe in SurrogateLockerThread::loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bda52e8 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x00007fb79bda5438 in JavaThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#7  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#8  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fb78dec5700 (LWP 8447)):
#0  0x0000003c2fa0d720 in sem_wait () from /lib64/libpthread.so.0
#1  0x00007fb79bc858ca in check_pending_signals(bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc7f9f5 in signal_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bda52e8 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bda5438 in JavaThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fb78ddc4700 (LWP 8448)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d7c6 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79b903458 in CompileQueue::get() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79b90616a in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x00007fb79bda52e8 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#7  0x00007fb79bda5438 in JavaThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#8  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#9  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fb78dcc3700 (LWP 8449)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d7c6 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79b903458 in CompileQueue::get() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79b90616a in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x00007fb79bda52e8 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#7  0x00007fb79bda5438 in JavaThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#8  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#9  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fb78dbc2700 (LWP 8450)):
#0  0x0000003c2fa0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc811b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bc4cfbf in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc4d74e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79bd0c9c8 in ServiceThread::service_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bda52e8 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x00007fb79bda5438 in JavaThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#7  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#8  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fb78dac1700 (LWP 8451)):
#0  0x0000003c2fa0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb79bc85cd7 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#2  0x00007fb79bda2ae7 in WatcherThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fb78c11f700 (LWP 8501)):
#0  0x0000003c2fa0e84d in accept () from /lib64/libpthread.so.0
#1  0x00007fb79c489f34 in onload_accept () from /usr/lib64/libonload.so
#2  0x00007fb79b7c5171 in LinuxAttachListener::dequeue() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#3  0x00007fb79b7c530b in AttachListener::dequeue() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#4  0x00007fb79b7c3d5f in attach_listener_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#5  0x00007fb79bda52e8 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#6  0x00007fb79bda5438 in JavaThread::run() () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#7  0x00007fb79bc870a0 in java_start(Thread*) () from /usr/java/jdk1.7.0_05/jre/lib/amd64/server/libjvm.so
#8  0x0000003c2fa07851 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c2f2e76dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fb79c264da0 (LWP 8413)):
#0  0x0000003c2fa080ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fb79c2754d5 in ContinueInNewThread0 () from /usr/java/jdk1.7.0_05/bin/../jre/lib/amd64/jli/libjli.so
#2  0x00007fb79c26a4fa in ContinueInNewThread () from /usr/java/jdk1.7.0_05/bin/../jre/lib/amd64/jli/libjli.so
#3  0x00007fb79c26d085 in JLI_Launch () from /usr/java/jdk1.7.0_05/bin/../jre/lib/amd64/jli/libjli.so
#4  0x00000000004006a6 in main ()

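The question you linked refers to this bug. The symptom description matches. Maybe they "forgot" to fix it for Linux? - A.H.
Does the child process respond to kill -QUIT or other attempts to get a thread dump? I'm wondering whether that process can give any information about what it is (or isn't) doing. - Christopher Schultz
A.H., good point, although one thing that makes me think this is unrelated is that in my case it never fails on Java 6 but occasionally fails on Java 7, so it looks like a regression in Java 7 (unless the bug has always been there and Java 7 just happens to have different timing somewhere that makes it more likely to show up). - Mike
Also, can you run pstack against the child (or parent) process? - Christopher Schultz
Have you tried IBM's Thread Dump Analyzer? https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=2245aa39-fa5c-4475-b891-14c205f7333c - Kowser
Christopher: I'm editing the question to show the results of kill -QUIT and pstack. - Mike
2 Answers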

1

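I found some bugs on Oracle's site about forkAndExec on Linux and SunOS, but they have already been fixed in your Java 1.7.

Judging by the stack trace, I get the feeling a thread is waiting on some condition. Please check whether your code uses either of the following methods:

get()

or

poll()

since these methods put the thread into the TIMED_WAITING state.

I suggest monitoring in more detail with the jvisualvm tool shipped in JDK/bin. I'm sure you'll be able to find the root cause.

You can use the same tool to monitor both the child and parent threads. Let me know if you run into any problems launching or profiling the program.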


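I'm not sure how any of my code could affect the stuck thread, since it is RUNNABLE inside a native method, not TIMED_WAITING. My code is also single-threaded at this point, so there are no child or parent threads, just the main thread that ends up in forkAndExec. - Mike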

1
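Thanks to Christopher Schultz for suggesting running pstack on the child process.
It looks like libonload was causing the problem. I can't reproduce it when running without onload, and I also can't reproduce it after upgrading onload to the latest version (although nothing in the changelog looks related).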
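Since the hang depends on whether onload is loaded into the JVM, it can help to confirm that from inside the process. Below is a minimal sketch that scans /proc/self/maps (Linux only) for mapped files whose name contains a given string. PreloadCheck and mappedLibraries are hypothetical names, and the default search string "libonload" is simply taken from the pstack output above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; names are illustrative.
class PreloadCheck {

    /** Returns the distinct mapped file paths whose file name contains needle. */
    static Set<String> mappedLibraries(List<String> mapsLines, String needle) {
        Set<String> found = new TreeSet<>();
        for (String line : mapsLines) {
            // Format: address perms offset dev inode pathname
            int slash = line.indexOf('/');
            if (slash < 0) {
                continue; // anonymous mapping or a pseudo-entry such as [stack]
            }
            String path = line.substring(slash).trim();
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.contains(needle)) {
                found.add(path);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String needle = args.length > 0 ? args[0] : "libonload";
        List<String> maps = Files.readAllLines(
                Paths.get("/proc/self/maps"), StandardCharsets.UTF_8);
        System.out.println(mappedLibraries(maps, needle));
    }
}
```

An empty result means no matching library is mapped into this JVM; a non-empty one lists the paths in use, which also shows which installed copy was picked up.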

Can you provide more details? I'm running into the same problem :( - Geek
@Geek, which version of onload are you using? Have you tried EF_VFORK_MODE=2? - Mike
