Intermittent Hang in ProducerConsumerLoops on Windows AArch64

Earlier this year I dug into intermittent test hangs in the ProducerConsumerLoops test when using the Windows AArch64 JDK 25 build. I decided to test it on JDK 17 and JDK 21 to see how far back the hang went:

export JAVA_HOME17=/c/java/binaries/jdk/aarch64/2025-10/windows-jdk17u/jdk-17.0.17+10
export JAVA_HOME21=/c/java/binaries/jdk/aarch64/2025-10/windows-jdk21u/jdk-21.0.9+10

date; time $JAVA_HOME17/bin/java -Xcomp -XX:-TieredCompilation ProducerConsumerLoops
date; time $JAVA_HOME21/bin/java -Xcomp -XX:-TieredCompilation ProducerConsumerLoops
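
Because the hang is intermittent, a single passing run proves little. One way to repeat the runs unattended is a small wrapper around a time limit, sketched below. The name run_until_hang and the limits passed to it are hypothetical, not part of the test or the JDK, and the sketch assumes the timeout utility from GNU coreutils (as shipped with Cygwin) is able to terminate the java process.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, each under a time LIMIT.
# Returns 0 if every run finished in time, 1 at the first run that timed out.
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
run_until_hang() {
  attempts=$1
  limit=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
      echo "run $i: still running after $limit, treating as a hang"
      return 1
    fi
    echo "run $i: exited with status $status"
    i=$((i + 1))
  done
  return 0
}
```

For example, run_until_hang 20 10m "$JAVA_HOME21/bin/java" -Xcomp -XX:-TieredCompilation ProducerConsumerLoops would try up to 20 runs with a 10 minute limit each; both numbers are arbitrary and would need tuning to the machine.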

The test passed on jdk17u but hung on jdk21u. The hang did not happen with -Xint or with -Xcomp -XX:TieredStopAtLevel=1, and the jstack command did not report any deadlock. I couldn’t find a Windows AArch64 jdk19u build to test, so I had to build the sources myself. I suspected that I would need a jdk18u build for the jdk19u boot JDK, so I started by building jdk18u. See Building OpenJDK 18 for Windows AArch64 for details on the errors I ran into and how I worked around them. I needed a jdk18u build for the boot JDK when building jdk19u because configure rejected the jdk17u boot JDK:

configure: Found potential Boot JDK using configure arguments
configure: Potential Boot JDK found at /cygdrive/d/java/binaries/jdk/x64/2025-10/windows-jdk17u/jdk-17.0.17+10 is incorrect JDK version (openjdk version "17.); ignoring-Bit Server VM Microsoft-12574423 (build 17.0.17+10-LTS, mixed mode, sharing)
configure: (Your Boot JDK version must be one of: 18 19)
configure: error: The path given by --with-boot-jdk does not contain a valid Boot JDK
configure exiting with result code 1

Turns out I didn’t need to build jdk18u since there was a jdk19u build available for Windows x64 at Latest Releases | Adoptium (Temurin 19.0.2+7 – 01/20/2023). I successfully built openjdk/jdk at jdk-19-ga and found that the ProducerConsumerLoops test didn’t fail. I thought that openjdk/jdk at jdk-20-ga would be the next natural tag to try building, but its GitHub page says that “This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.” Therefore, I built and tested openjdk/jdk at jdk-20+35 instead (8301863: ObjectInputFilter example incorrectly calls rejectUndecidedC… · openjdk/jdk@6f460e4). jdk20u was still on google/googletest at release-1.8.1. The build failed with a warning that was treated as an error:

ERROR: Build failed for target 'images' in configuration 'windows-aarch64-server-release' (exit code 2) 
Stopping javac server

=== Output from failing command(s) repeated here ===
* For target support_native_jdk.jdwp.agent_libjdwp_debugInit.obj:
debugInit.c
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(248): error C2220: the following warning is treated as an error
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(248): warning C5287: operands are different enum types '<unnamed-enum-JVMTI_VERSION_1>' and '<unnamed-enum-JVMTI_VERSION_MASK_INTERFACE_TYPE>'; use an explicit cast to silence this warning
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(248): note: to simplify migration, consider the temporary use of /Wv:18 flag with the version of the compiler with which you used to build without warnings
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(250): warning C5287: operands are different enum types '<unnamed-enum-JVMTI_VERSION_1>' and '<unnamed-enum-JVMTI_VERSION_MASK_INTERFACE_TYPE>'; use an explicit cast to silence this warning
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(250): note: to simplify migration, consider the temporary use of /Wv:18 flag with the version of the compiler with which you used to build without warnings
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(252): warning C5287: operands are different enum types '<unnamed-enum-JVMTI_VERSION_1>' and '<unnamed-enum-JVMTI_VERSION_MASK_INTERFACE_TYPE>'; use an explicit cast to silence this warning
d:\java\forks\openjdk\jdk\src\jdk.jdwp.agent\share\native\libjdwp\debugInit.c(252): note: to simplify migration, consider the temporary use of /Wv:18 flag with the version of the compiler with which you used to build without warnings
   ... (rest of output omitted)

* All command lines available in /cygdrive/d/java/forks/openjdk/jdk/build/windows-aarch64-server-release/make-support/failure-logs.
=== End of repeated output ===

I deleted the build directory and reran configure with the --disable-warnings-as-errors flag to work around this and found that the test passed on the jdk20u (release configuration) build.

Next was openjdk/jdk at jdk-21-ga but it also had the “commit does not belong to any branch on this repository” message. openjdk/jdk at jdk-21+26 was the first tag that didn’t. I bisected between 8306841: Generational ZGC: NMT reports Java heap size larger than max… · openjdk/jdk@bb377b2 and 8301863: ObjectInputFilter example incorrectly calls rejectUndecidedC… · openjdk/jdk@6f460e4. The half-way point (calendar-wise) was 6441827: Documentation mentions nonexistent NullReferenceException · openjdk/jdk@ec9d816.

git checkout ec9d816abf29efe1eb6af46c394fafa7f75e3d7b
checking for gtest... /cygdrive/d/repos/googletest
configure: error: gtest version is too old, at least version 1.13.0 is required
configure exiting with result code 1

I fixed the gtest tag and the test passed on that Windows AArch64 build. It looked like the test was passing on every build, so I decided to build the end tag of the search: jdk-21+26. The test passed there too! I downloaded the 21.0.9+10 build from Adoptium and it failed, so I needed to continue bisecting in the jdk21u repo. I had a fork of openjdk/jdk21u-dev locally, so I started by searching (by JBS ID) for the last commit that passed the test on tip:

$ git log --grep='8306841'
commit bb377b26730f3d9da7c76e0d171517e811cef3ce (tag: jdk-22+0, tag: jdk-21+26)
Author: Stefan Karlsson <stefank@openjdk.org>
Date:   Thu Jun 8 14:06:27 2023 +0000

    8306841: Generational ZGC: NMT reports Java heap size larger than max heap size

    Reviewed-by: eosterlund, stuefe

I used these commands to start the bisection with master (HEAD) at 8366694: Test JdbStopInNotificationThreadTest.java timed out after 60… · openjdk/jdk21u-dev@7d35866:

# show the log of the commits after bb377b2, up to HEAD
git log bb377b2..HEAD

# show just the summary line of each commit in that range
git show -s --oneline bb377b2..HEAD

# count the commits in the range (outputs 1727)
git show -s --oneline bb377b2..HEAD | wc -l

# show the commit halfway between the two ends (line 863 of the list)
git show -s --oneline bb377b2..HEAD | head -n 863 | tail -n 1

# outputs 8ac431347fd 8324723: GHA: Upgrade some actions to avoid deprecated Node 16

git checkout 8ac431347fd
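
The count-then-pick steps above can be folded into one helper. This is only a sketch: middle_line is a hypothetical name, and it rounds down the same way the manual steps do (1727 lines gives line 863). On a history with merges, the middle line of the log is just an approximation of the midpoint, since git bisect chooses its midpoints from the commit graph rather than from the list order.

```shell
# Hypothetical helper: print the middle line of stdin, using the same
# round-down rule as the manual steps (1727 lines -> line 863).
middle_line() {
  awk '{ line[NR] = $0 }
       END {
         if (NR == 0) exit 1        # empty range: nothing to print
         mid = int(NR / 2)
         if (mid < 1) mid = 1       # a single line is its own midpoint
         print line[mid]
       }'
}
```

With this in place, git log --oneline bb377b2..HEAD | middle_line prints the candidate commit directly.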

I updated make/autoconf/toolchain_microsoft.m4 with the working Windows SDK version (see Building OpenJDK 18 for Windows AArch64) and built that commit. The test hung on 8324723: GHA: Upgrade some actions to avoid deprecated Node 16 · openjdk/jdk21u-dev@8ac4313. I found the next commit with:

# count the commits in the narrowed range (outputs 865)
git show -s --oneline bb377b2..8ac4313 | wc -l

# show the commit halfway between the two ends (line 432 of the list)
git show -s --oneline bb377b2..8ac4313 | head -n 432 | tail -n 1

# outputs a4e78f30fce Merge remote-tracking branch 'jdk21u/master'

a4e78f30fce was a merge commit with a lot going on around it so I went with commit 9ca8761 instead: 8323086: Shenandoah: Heap could be corrupted by oom during evacuation · openjdk/jdk21u-dev@9ca8761

$ git checkout 9ca8761
error: short object ID 9ca8761 is ambiguous
hint: The candidates are:
hint:   9ca87615550 commit 2024-01-17 - 8323086: Shenandoah: Heap could be corrupted by oom during evacuation
hint:   9ca8761a221 tree
error: pathspec '9ca8761' did not match any file(s) known to git

$ git checkout 9ca87615550eba5493dde94e6204e58ca8cc1119

The test failed on the 9ca8761 build. Next was 8317987: C2 recompilations cause high memory footprint · openjdk/jdk21u-dev@0af96a8:

git checkout 0af96a899391b586fd990cdf8ef61575e58bd9bd

The test passed on this build but failed on 8320798: Console read line with zero out should zero out underlying b… · openjdk/jdk21u-dev@e8cf56c, 8315920: C2: “control input must dominate current control” assert fai… · openjdk/jdk21u-dev@cd4ce01, and 8316659: assert(LockingMode != LM_LIGHTWEIGHT || flag == CCR0) failed… · openjdk/jdk21u-dev@3552d5a. I was down to about 40 commits, which was small enough for me to inspect the middle region and find what looked like the culprit: 8301341: LinkedTransferQueue does not respect timeout for poll() · openjdk/jdk21u-dev@bbc5ad7! Sure enough, the test did not hang on its parent (8312126: NullPointerException in CertStore.getCRLs after 8297955 · openjdk/jdk21u-dev@69adcc6).
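
The pass/fail checks in this search follow the pattern that git bisect run automates: the script it runs reports each commit as good (exit 0), bad (exit 1 to 127, except 125), or untestable (exit 125, which makes bisect skip the commit). A sketch of that mapping is below. The name bisect_verdict is hypothetical, and treating every non-zero test status as bad (including the 124 that GNU timeout returns when a time limit is hit) is an assumption. Several commits here needed manual toolchain and gtest fixes before they would build, so a fully automated run may not have been practical.

```shell
# Hypothetical mapping from (build status, test status) to the exit code
# that a script run by `git bisect run` should return for the current commit.
bisect_verdict() {
  build_status=$1
  test_status=$2
  if [ "$build_status" -ne 0 ]; then
    echo 125   # build failed: commit cannot be tested, ask bisect to skip it
  elif [ "$test_status" -eq 0 ]; then
    echo 0     # test finished normally: good
  else
    echo 1     # timed out or failed some other way: bad
  fi
}
```

A script passed to git bisect run would build the commit, run the test under a time limit, and finish with exit "$(bisect_verdict "$build_status" "$test_status")". The bisection itself would be started with git bisect start 8ac4313 bb377b2, which names the bad commit first and the good commit second.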

I confirmed that the hang was introduced in tip by 8301341: LinkedTransferQueue does not respect timeout for poll() by DougLea · Pull Request #14317 · openjdk/jdk (for JBS issue [JDK-8301341] LinkedTransferQueue does not respect timeout for poll() – Java Bug System). This test passed reliably on my local jdk26u macosx-aarch64 build, so I concluded that this was likely a Windows AArch64 bug. It was most likely the same bug behind the intermittent failures we had been seeing in GetStackTraceSuspendedStressTest and GetStackTraceNotSuspendedStressTest. All these tests use a SynchronousQueue. Note, though, that unlike the others, the ProducerConsumerLoops test does not involve virtual threads. It was a relief to finally catch the culprit causing this hang.
