Categories: MATLAB/Octave

Building Octave on Windows

I recently created a simple script for Matrix Multiplication Tests in Octave (or Matlab). I was curious about how to build it. Get Involved (octave.org) links to the Mercurial SCM (mercurial-scm.org) website. It has been more than a decade since I last used Mercurial. The TortoiseHg 6.2.3 MSI installer brings KDiff3, which I haven’t seen in ages either.

cd \dev\repos
hg clone https://hg.octave.org/octave

Developer FAQ – Octave has a Discourse forum (just like LLVM) and I’m realizing I need to jump into these forums and at least hear what’s happening. Instructions for building on Windows have a separate page 😀 Building on Microsoft Windows – Octave. MSYS2 is used for building natively, and since I used it to build Elmer, I might as well see how well it works for Octave.

pacman -S base-devel mingw-w64-x86_64-autotools mingw-w64-x86_64-cc mingw-w64-x86_64-gcc-fortran mingw-w64-x86_64-lapack mingw-w64-x86_64-openblas mingw-w64-x86_64-pcre mingw-w64-x86_64-arpack mingw-w64-x86_64-curl mingw-w64-x86_64-fftw mingw-w64-x86_64-fltk mingw-w64-x86_64-gl2ps mingw-w64-x86_64-glpk mingw-w64-x86_64-ghostscript mingw-w64-x86_64-gnuplot mingw-w64-x86_64-graphicsmagick mingw-w64-x86_64-hdf5 mingw-w64-x86_64-libsndfile mingw-w64-x86_64-portaudio mingw-w64-x86_64-qhull mingw-w64-x86_64-qrupdate mingw-w64-x86_64-qscintilla mingw-w64-x86_64-qt5 mingw-w64-x86_64-rapidjson mingw-w64-x86_64-suitesparse mingw-w64-x86_64-sundials git mercurial mingw-w64-x86_64-ccache mingw-w64-x86_64-icoutils mingw-w64-x86_64-librsvg texinfo unzip zip

The vastness of qt5 is the first thing that confronts me when I run this command. The install size of all the packages is about 3.5 GB. Installation takes about 7.5 minutes.

37 Members in Group mingw-w64-x86_64-qt5
cd /c/dev/repos/octave
./bootstrap
mkdir -p .build
cd .build
../configure --disable-docs ac_cv_search_tputs=-ltermcap

Digging into Configure Failures

Configure fails on my machine with this error the first time I run it:

configure: loading site script /etc/config.site
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether UID '197630' is supported by ustar format... yes
checking whether GID '197630' is supported by ustar format... yes
checking how to create a ustar tar archive... gnutar
checking whether make supports nested variables... (cached) yes
checking build system type... x86_64-w64-mingw32
checking host system type... x86_64-w64-mingw32
checking whether make supports the include directive... yes (GNU style)
checking for gcc... no
checking for cc... no
checking for cl.exe... no
checking for clang... no
configure: error: in `/c/dev/repos/octave/.build':
configure: error: no acceptable C compiler found in $PATH
See `config.log' for more details

I try updating the .bash_profile as suggested but this doesn’t help:

echo "export PERL5SHELL=\"bash -l -c\"" >> ~/.bash_profile

The Stack Overflow post linux – configure: error: no acceptable C compiler found in $PATH makes me realize that gcc is not installed. It also reminds me of how I installed dependencies when Investigating how to Build Elmer on Windows. I added instructions to the readme at ElmerCSC/elmerfem: Official git repository of Elmer FEM software (github.com) (see this PR). This command from that readme should install the compiler tools I think will be necessary:

pacman -S --noconfirm --needed base-devel mingw-w64-x86_64-toolchain mingw64/mingw-w64-x86_64-cmake mingw64/mingw-w64-x86_64-openblas mingw64/mingw-w64-x86_64-qt5 mingw64/mingw-w64-x86_64-qwt-qt5 mingw64/mingw-w64-x86_64-nsis git

Unfortunately, gcc is still not found. I verified that gcc is indeed on disk C:\dev\software\msys64\mingw64\bin\gcc.exe using the path structure at Package: mingw-w64-x86_64-gcc – MSYS2 Packages. The top answer did suggest modifying the PATH but I’m perplexed at how the compiler was found in the ElmerFEM build environment since it also cannot find the gcc command.

export PATH=${PATH}:c/dev/software/msys64/mingw64/bin

So, the culprit turns out to be the fact that I was using the UCRT shell instead of the MINGW64 shell. I think this bit me with Elmer as well. Should have carefully reviewed that post (see the Custom Generator in MSYS section).

Building the Code

Using the correct MSYS terminal allows configure to work. It takes 4m:45s on my machine. Here is the summary after all the flags are displayed. This piques my curiosity about where Java methods are called by Octave but I’ll ignore it for now!

  Default pager:                 less
  gnuplot:                       gnuplot

  Build Octave Qt GUI:                  yes (version: 5)
  Build Java interface:                 no
  Build static libraries:               no
  Build shared libraries:               yes
  Dynamic Linking API:                  LoadLibrary
  Include support for GNU readline:     yes
  Use push parser in command line REPL: yes
  64-bit array dims and indexing:       yes
  64-bit BLAS array dims and indexing:  no
  OpenMP SMP multithreading:            yes
  Truncate intermediate FP results:     yes
  Build cross tools:                    no
  Build docs:                           no

configure: WARNING: JAVA_HOME environment variable not initialized.  Auto-detection will proceed but is unreliable.
configure: WARNING: No Java executable found.  Octave will not be able to call Java methods.
configure: WARNING: building documentation disabled; make dist will fail.
configure:
configure: NOTE: Libraries or auxiliary programs may be skipped if they are not found
configure: NOTE: OR if they are missing required features on your system.

Build Octave by running make with these options. The last few lines of output from a successful build are shown below as well. The build took 21min on my new desktop.

make all -j8
...
  GEN      libinterp/dldfcn/gzip.oct
  GEN      doc/interpreter/doc-cache

Octave successfully built.  Now choose from the following:

   ./run-octave    - to run in place to test before installing
   make check      - to run the tests
   make install    - to install (PREFIX=/mingw64)

   HG ID for this build is "5744dac88986"

make[2]: Leaving directory '/c/dev/repos/octave/.build'
make[1]: Leaving directory '/c/dev/repos/octave/.build'

./run-octave launches the Octave command line. Since we built with Qt, we can launch the Octave GUI using this command:

./src/octave-gui --gui

Unfortunately, there is a segmentation fault when I close Octave! TODO: Why?


Categories: MATLAB/Octave

Matrix Multiplication Tests in Octave (or Matlab)

I was recently implementing matrix multiplication on the GPU (using CUDA). For my application, I was generating random numbers and generating statistics about the performance of matrix multiplication variants (e.g. using shared memory vs naive multiplication). Some of the results tended to differ from the CPU’s results. Therefore, I decided to use deterministic matrices for the inputs to ensure my algorithm is correct. What I needed was a neutral (3rd party) matrix multiplication algorithm. This seems like a job for MATLAB. Unfortunately, my license expired a few years ago. My robotics professor at the University of Washington was a fan of Octave because it is open source and free. Here is the script I created to generate matrices with the positive integers.

A = 1:10000;
B = 10001:20000;

A = reshape(A, [100,100]);
B = reshape(B, [100,100]);

A = transpose(A);
B = transpose(B);

C = A * B;

# format short;

save 'mmult100x100.txt' C;

Backstory

It has been a while since I used MATLAB. Here are the searches I used to create the script.

  1. array 1 2 3 4 5 matlab – Google Search
  2. display full precision matlab – Google Search
  3. write matlab array to file – Google Search
  4. write matlab matrix to file – Google Search
  5. My matrix multiplication algorithms use floats so comparisons fail. use float32 in matlab – Search (bing.com)
  6. check matlab matrix datatype – Google Search
  7. create matrix of float32 matlab – Google Search
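Search 5 above notes that comparisons failed because the GPU code uses floats. One likely contributor (my assumption, not something measured in this post) is that the entries of C are integers well above 2^24 = 16,777,216, the point past which a 32-bit float can no longer represent every integer. The Java sketch below is only an illustration; the class and method names are hypothetical. It rebuilds the same deterministic matrices, assuming reshape followed by transpose yields a row-major fill, and evaluates a single entry of C in both precisions.

```java
// Hypothetical helper for illustration; not part of the Octave script.
final class FloatVsDouble {
    // With 0-based indices, A(i,j) = i*n + j + 1 and B(i,j) = n*n + i*n + j + 1,
    // i.e. 1..n*n and n*n+1..2*n*n laid out row by row.
    static double entryDouble(int n, int row, int col) {
        double sum = 0.0;
        for (int k = 0; k < n; k++) {
            double a = row * n + k + 1;          // A(row, k)
            double b = n * n + k * n + col + 1;  // B(k, col)
            sum += a * b;
        }
        return sum;
    }

    // Same dot product, but every value and the accumulator are 32-bit floats.
    static float entryFloat(int n, int row, int col) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++) {
            float a = row * n + k + 1;
            float b = n * n + k * n + col + 1;
            sum += a * b;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("double: " + entryDouble(100, 0, 0));
        System.out.println("float:  " + entryFloat(100, 0, 0));
    }
}
```

For the 100x100 case the first entry of C is 83,835,050. Floats in that range are spaced 8 apart and this value is not a multiple of 8, so no float accumulation can reproduce it exactly. Comparing float results against a double reference therefore needs a tolerance rather than an equality check.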

Categories: CUDA, Performance

Profiling nVidia CUDA Kernels

This process of using Nsight Compute to profile CUDA kernels is documented in detail at Nsight Compute :: Nsight Compute Documentation (nvidia.com). Here are the screenshots with the “quick start” steps without all the verbosity of the documentation.


Categories: Java, OpenJDK

Java’s Foreign Function API vs the Windows AArch64 ABI

I just opened PR 8295290: Add Windows ARM64 ABI support to the Foreign Function & Memory API · Pull Request #754 · openjdk/panama-foreign (github.com) (almost) completing some work that Bernhard had started to properly support the Windows ARM64 ABI in the JDK’s Foreign Function & Memory API. This post documents how I learned about the feature and its implementation. I picked up from where Bernhard left off… here is how my investigation proceeded.

I need to understand what happens if we build the jdk master branch (at commit 18cd16d2 when I started) without any ABI-specific changes. To do so, we need JDK 18 or later as a boot JDK to build the latest code, e.g. Oracle’s JDK 18 Windows x64 Installer. Here are the commands I used in Cygwin:

git clone https://github.com/swesonga/jdk
cd jdk

bash configure --openjdk-target=aarch64-unknown-cygwin --with-debug-level=slowdebug --with-boot-jdk=/cygdrive/d/dev/repos/java/infra/binaries/jdk-18.0.2

make images LOG=debug > build/abi-20220802-1500.txt
make build-test-jdk-jtreg-native LOG=debug > build/test-20220802-1500.txt

Once the build completes, create the artifacts for an AArch64 Windows device. These build and archive steps are available as the build-aarch64.sh script.

cd build/windows-aarch64-server-slowdebug/jdk
zip -qru jdk-20220802-1500-master.zip .
mv jdk-20220802-1500-master.zip ..

cd ..
zip -qru test-jdk-20220802-1500-master.zip support/test

Copy the two zip files to the 64-bit ARM device (e.g. by sharing folders or using OneDrive). I used a Surface Pro X device running Windows 11 build 22000.795. I unzipped the 2 files into these paths:

C:\dev\java\abi\master\jdk\
C:\dev\java\abi\master\support\test\..

I later discovered that unzip is available in the Git Bash terminal! These commands can be used to unzip the files:

mkdir -p /c/dev/java/abi/devbranch/jdk
cd /c/dev/java/abi/devbranch/jdk
unzip -q /c/dev/java/builds/debug/jdk-20220802-1500-devbranch.zip
cd ..
unzip -q test-jdk-20220802-1500-master.zip

I also downloaded jtreg and placed it in this path (note that it might be easier to extract the .tar.gz on the Windows x64 build machine then share it).

C:\dev\java\jtreg\

Finish setting up the Windows AArch64 device to run the ABI jtreg tests by cloning the OpenJDK repo onto it. The jtreg tests will be run from the root of the OpenJDK repo.

cd \dev\java\repos\forks
git clone https://github.com/swesonga/jdk
cd jdk

We’ll run VaListTest.java to see how it fails on Windows AArch64.

C:\dev\java\abi\master\jdk\bin\java.exe -jar C:\dev\java\jtreg\lib\jtreg.jar -agentvm -timeoutFactor:4 -concurrency:4 -verbose:fail,error,summary -nativepath:C:\dev\java\abi\master\support\test\jdk\jtreg\native\lib test/jdk/java/foreign/valist/VaListTest.java

Test fails:

--------------------------------------------------
TEST: java/foreign/valist/VaListTest.java
TEST JDK: C:\dev\java\abi\master\jdk

ACTION: build -- Passed. All files up to date
REASON: Named class compiled on demand
TIME:   0.069 seconds
messages:
command: build VaListTest
reason: Named class compiled on demand
elapsed time (seconds): 0.069

ACTION: testng -- Failed. Execution failed: `main' threw exception: org.testng.TestNGException: An error occurred while instantiating class VaListTest: null
REASON: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED VaListTest
TIME:   12.557 seconds
messages:
command: testng --enable-native-access=ALL-UNNAMED VaListTest
reason: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED VaListTest
Mode: othervm [/othervm specified]
Additional options from @modules: --add-modules java.base --add-exports java.base/jdk.internal.foreign=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.x64=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.x64.sysv=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.x64.windows=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.aarch64=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.aarch64.linux=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.aarch64.macos=ALL-UNNAMED --add-exports java.base/jdk.internal.foreign.abi.aarch64.windows=ALL-UNNAMED
elapsed time (seconds): 12.557
configuration:
Boot Layer
  add modules: java.base
  add exports: java.base/jdk.internal.foreign                     ALL-UNNAMED
               java.base/jdk.internal.foreign.abi                 ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.aarch64         ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.aarch64.linux   ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.aarch64.macos   ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.aarch64.windows ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.x64             ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.x64.sysv        ALL-UNNAMED
               java.base/jdk.internal.foreign.abi.x64.windows     ALL-UNNAMED

STDOUT:
STDERR:
WARNING: package jdk.internal.foreign.abi.aarch64.windows not in java.base
org.testng.TestNGException:
An error occurred while instantiating class VaListTest: null
        at org.testng.internal.InstanceCreator.createInstanceUsingObjectFactory(InstanceCreator.java:123)
        at org.testng.internal.InstanceCreator.createInstance(InstanceCreator.java:79)
...

I expected Bernhard’s code to be the one introducing Windows AArch64 ABI clean-up code. So why are there failures about the aarch64.windows foreign abi package missing? This requirement is from VaListTest.java and was introduced by the Foreign Function & Memory API (Preview) PR (it added the java.base/jdk.internal.foreign.abi.aarch64.windows package to the @modules list of the failing test, which is why the run above warns that the package is not in java.base).

Porting the Changes

I worked on porting Bernhard’s code on a Windows x64 machine.

# Switch to the OpenJDK repo directory
cd jdk

# This was the tip of the upstream master branch
# git checkout 18cd16d2eae2ee624827eb86621f3a4ffd98fe8c

git switch -c WinAArch64ABI
git remote add lewurm https://github.com/lewurm/openjdk
git fetch lewurm
git switch foreign-windows-aarch64
git rebase WinAArch64ABI

The files he modified have been deleted in the current repo:

Files with Conflicts

Find when a file was deleted in Git – Stack Overflow has the command to view when these files were deleted. Turns out to be the same Foreign Function & Memory API (Preview) PR that added the aarch64.windows foreign abi package to VaListTest.java.

$ git log --full-history -2 -- src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java
commit 2c5d136260fa717afa374db8b923b7c886d069b7

Author: Maurizio Cimadamore <mcimadamore@openjdk.org>
Date:   Thu May 12 16:17:45 2022 +0000

    8282191: Implementation of Foreign Function & Memory API (Preview)

    Reviewed-by: erikj, jvernee, psandoz, dholmes, mchung

The deleted files moved to src/java.base/share/classes/jdk/internal/foreign. Bernhard’s changes are small enough that I manually port them (copy/paste) into the files in the new locations in the tree. It’s interesting seeing the newer Java language features in use, e.g. the permits keyword. Now build the changes using the build-aarch64.sh script:

bash configure --openjdk-target=aarch64-unknown-cygwin --with-debug-level=slowdebug --with-boot-jdk=/cygdrive/d/dev/repos/java/infra/binaries/jdk-18.0.2

/cygdrive/d/dev/repos/scratchpad/scripts/java/cygwin/build-aarch64.sh

The newly added files are packed as .class files.

$ find build/windows-aarch64-server-slowdebug/jdk/ -name "WindowsAArch64CallArranger*"
...
build/windows-aarch64-server-slowdebug/jdk/modules/java.base/jdk/internal/foreign/abi/aarch64/windows/WindowsAArch64CallArranger.class

# Verify last modification time

$ ls -l build/windows-aarch64-server-slowdebug/jdk/./modules/java.base/jdk/internal/foreign/abi/aarch64/windows/WindowsAArch64CallArranger.class

Need to create a WindowsAArch64CallArranger to match the current structure of the foreign ABI. With these changes, VaListTest.java now passes. However, StdLibTest.java and TestVarArgs.java fail.

TEST: java/foreign/StdLibTest.java
TEST JDK: C:\dev\java\abi\devbranch\jdk

ACTION: build -- Passed. All files up to date
REASON: Named class compiled on demand
TIME:   0.039 seconds
messages:
command: build StdLibTest
reason: Named class compiled on demand
elapsed time (seconds): 0.039

ACTION: testng -- Failed. Unexpected exit from test [exit code: -1073741819]
REASON: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED StdLibTest
TIME:   15.02 seconds
messages:
command: testng --enable-native-access=ALL-UNNAMED StdLibTest
reason: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED StdLibTest
Mode: othervm [/othervm specified]
elapsed time (seconds): 15.02
configuration:
STDOUT:
test StdLibTest.test_printf([STRING]): failure
java.lang.AssertionError: expected [11] but found [14]
        at org.testng.Assert.fail(Assert.java:99)
        ...
        at org.testng.Assert.assertEquals(Assert.java:917)
        at StdLibTest.test_printf(StdLibTest.java:135)
        ...
        at org.testng.TestNG.run(TestNG.java:1037)
        ...
        at java.base/java.lang.Thread.run(Thread.java:1589)
test StdLibTest.test_printf(java.util.ArrayList@5499b7af): success
test StdLibTest.test_printf([DOUBLE, DOUBLE, CHAR]): success
TEST: java/foreign/TestVarArgs.java
TEST JDK: C:\dev\java\abi\devbranch\jdk

ACTION: build -- Passed. All files up to date
REASON: Named class compiled on demand
TIME:   0.031 seconds
messages:
command: build TestVarArgs
reason: Named class compiled on demand
elapsed time (seconds): 0.031

ACTION: testng -- Failed. Unexpected exit from test [exit code: 1]
REASON: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED -Dgenerator.sample.factor=17 TestVarArgs
TIME:   17.52 seconds
messages:
command: testng --enable-native-access=ALL-UNNAMED -Dgenerator.sample.factor=17 TestVarArgs
reason: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED -Dgenerator.sample.factor=17 TestVarArgs
Mode: othervm [/othervm specified]
elapsed time (seconds): 17.52
configuration:
STDOUT:
test TestVarArgs.testVarArgs(0, "f0_V__", VOID, [], []): success
STDERR:
java.lang.RuntimeException: java.lang.IllegalStateException: java.lang.AssertionError: expected [24.0] but found [8.135772792034E-312]
        at TestVarArgs.check(TestVarArgs.java:134)
        ...
        at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:758)
        at TestVarArgs.testVarArgs(TestVarArgs.java:104)
        ...
        at org.testng.TestNG.runSuites(TestNG.java:1069)
        at org.testng.TestNG.run(TestNG.java:1037)
        ...

The data for these tests is supplied by a testng dataProvider that returns an array of arrays of objects. As per the dataProvider docs, the first dimension’s size is the number of times the test method will be invoked and the second dimension size contains an array of objects that must be compatible with the parameter types of the test method.
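To make that shape concrete, here is a small sketch in plain Java with no TestNG dependency. The class, the rows, and the format method are all hypothetical stand-ins. The loop in runAll plays the role TestNG plays: one call of the test method per row of the outer array, with the row's elements as the arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a TestNG data provider and the test method that consumes it.
final class DataProviderShape {
    // First dimension: number of invocations. Second dimension: arguments for one invocation.
    static Object[][] rows() {
        return new Object[][] {
            { "%d", 42 },
            { "%s", "hello" },
            { "%c", 'x' },
        };
    }

    // Stand-in for the test method; each row must match (String, Object).
    static String format(String spec, Object arg) {
        return String.format(spec, arg);
    }

    // What the framework does conceptually: invoke the method once per row.
    static List<String> runAll() {
        List<String> results = new ArrayList<>();
        for (Object[] row : rows()) {
            results.add(format((String) row[0], row[1]));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```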

Java Concepts in the Tests

  1. As per the article Enum Types, enums implicitly extend java.lang.Enum and cannot extend anything else because Java does not support multiple inheritance. The Enum class docs also point out that all the constants of an enum class can be obtained by calling the implicit public static T[] values() method of that class and that more information about enums, including descriptions of the implicitly declared methods synthesized by the compiler, can be found in section 8.9 of The Java Language Specification. Section 8.9 explains that an enum constant may be followed by arguments, which are passed to the constructor of the enum when the constant is created during class initialization as described later in this section. The constructor to be invoked is chosen using the normal rules of overload resolution (§15.12.2). If the arguments are omitted, an empty argument list is assumed. This is helpful for understanding all the code I’m seeing in the PrintfArg enum!
  2. The printfArgs dataProvider permutes the values of the PrintfArg enum. The implementation uses streams, which are new to me since I last wrote Java before JDK 8 was released. The overview of streams on Oracle’s technical resources website is helpful in coming up to speed with streams. TODO: the implementation of the permutation is mysterious to me, need to study it closely. It uses List.of(), Set.of(), and Collections.shuffle().
  3. A try block without catch or finally blocks is a try-with-resources statement. The resources it declares are closed automatically when the block exits, which helps prevent leaks of native resources.
  4. StdLibTest.java uses functionality from JEP 424: Foreign Function & Memory API (Preview). This JEP provides a good overview of why we need a supported API for accessing off-heap data (i.e. foreign memory) designed from the ground up to be safe and with JIT optimizations in mind.
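The first three concepts can be seen together in a small sketch. Everything here is hypothetical (the Arg enum is a much simplified stand-in for PrintfArg, and Tracker is an invented resource); it is not the code from StdLibTest.java, and the seeded shuffle is just one simple way to reorder the constants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

final class EnumAndResources {
    // Each constant passes arguments to the enum constructor (JLS 8.9).
    enum Arg {
        INT("%d", 7), STRING("%s", "hi"), CHAR("%c", 'c');

        final String spec;
        final Object value;

        Arg(String spec, Object value) {
            this.spec = spec;
            this.value = value;
        }
    }

    // Minimal resource that records whether close() ran.
    static final class Tracker implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    // Streams: build the format string and the argument array from the enum constants.
    static String formatAll(List<Arg> args) {
        String spec = args.stream().map(a -> a.spec).collect(Collectors.joining(" "));
        Object[] values = args.stream().map(a -> a.value).toArray();
        return String.format(spec, values);
    }

    // values() returns all constants; a seeded shuffle gives a repeatable reordering.
    static List<Arg> shuffled(long seed) {
        List<Arg> copy = new ArrayList<>(Arrays.asList(Arg.values()));
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // try-with-resources: no catch or finally, yet close() is guaranteed to run.
    static boolean closedAfterTry() {
        Tracker tracker = new Tracker();
        try (Tracker inUse = tracker) {
            // work with the resource here
        }
        return tracker.closed;
    }

    public static void main(String[] args) {
        System.out.println(formatAll(shuffled(1L)));
        System.out.println(closedAfterTry());
    }
}
```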

JEP 424 Concepts via vprintf

The StdLibTest passes when run with the test_printf test commented out. This implies that test_vprintf works as expected, making it a good candidate for reviewing JEP 424: Foreign Function & Memory API (Preview). This test

  1. Creates a confined closeable MemorySession on line 311. Confined memory sessions support strong thread-confinement guarantees as per the MemorySession docs.
  2. Creates a memory segment on line 312 using the allocateUtf8String method of the MemorySession’s SegmentAllocator base interface. This method “converts a Java string into a UTF-8 encoded, null-terminated C string, storing the result into a memory segment.”
  3. Creates a variable argument list using the VaList.make() method. This invokes SharedUtils.newVaList, which we modified to support Windows on AArch64.
  4. Invokes the native vprintf function via its method handle: final static MethodHandle vprintf = abi.downcallHandle(abi.defaultLookup().lookup("vprintf").get(), FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER));.

The value of the abi variable is computed by the SharedUtils.getSystemLinker method, hence the need for creating a WindowsAArch64Linker here. As explained at JEP 424: Foreign Function & Memory API (Preview), abi.defaultLookup() “creates a default lookup, which locates all the symbols in libraries that are commonly used on the OS and processor combination associated with the Linker instance.” defaultLookup() returns a SymbolLookup on which the lookup(“vprintf”) method is invoked. Note that Optional<T>.get() will throw a NoSuchElementException if no value is present. Otherwise, it will return the zero-length MemorySegment whose base address indicates the address of the vprintf function.
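The Optional.get() behavior is easy to check in isolation. The sketch below uses a hypothetical in-memory symbol table in place of a real SymbolLookup, and the address in it is made up; only the java.util.Optional calls are real API.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical stand-in for a symbol lookup; no native code is involved.
final class LookupSketch {
    private static final Map<String, Long> SYMBOLS = Map.of("vprintf", 0x1000L);

    // Mirrors the shape of a lookup that returns Optional: empty when the symbol is unknown.
    static Optional<Long> lookup(String name) {
        return Optional.ofNullable(SYMBOLS.get(name));
    }

    // get() on an empty Optional throws NoSuchElementException.
    static boolean getThrowsWhenMissing(String name) {
        try {
            lookup(name).get();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // orElseThrow with a supplier lets the failure name the missing symbol.
    static long require(String name) {
        return lookup(name).orElseThrow(
                () -> new IllegalStateException("symbol not found: " + name));
    }

    public static void main(String[] args) {
        System.out.println(require("vprintf"));
        System.out.println(getThrowsWhenMissing("no_such_symbol"));
    }
}
```

In a test, a bare get() is fine because a missing symbol should fail loudly anyway; in other code, orElseThrow with a message is usually easier to diagnose.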

As per JEP 424, the Linker interface enables both downcalls (calls from Java code to native code) and upcalls (calls from native code back to Java code). The MemorySegment associated with the address of the vprintf function and a FunctionDescriptor (created by the static FunctionDescriptor.of method) are passed to Linker.downcallHandle to create a MethodHandle which can be used to call vprintf. The arguments to FunctionDescriptor.of are the MemoryLayouts representing the return type (int), the format string, and the format arguments. MethodHandle.invoke() is how the native vprintf gets, well, invoked, with the format string and the variable argument list. Here’s the Java vprintf method.

int vprintf(String format, List<PrintfArg> args) throws Throwable {
    try (MemorySession session = MemorySession.openConfined()) {
        MemorySegment formatStr = session.allocateUtf8String(format);
        VaList vaList = VaList.make(b -> args.forEach(
            a -> a.accept(b, session)), session);
        return (int)vprintf.invoke(formatStr, vaList);
    }
}

Reviewing test_printf

Inlining the code invoked by test_printf here for easy reference. See the docs for the printf function and the printf format specification for additional information about printf. Line 20 of specializedPrintf creates a MethodType for a method returning an int and taking a single pointer (MemoryAddress). appendParameterTypes is used to add all the other printf parameter types to the MethodType. The MemoryLayouts of the arguments are also accumulated into a list. It doesn’t look like we do anything with the method type (mt) though! Looks like dead code from this PR.

final static FunctionDescriptor printfBase = FunctionDescriptor.of(C_INT, C_POINTER);

...

int printf(String format, List<PrintfArg> args) throws Throwable {
    try (MemorySession session = MemorySession.openConfined()) {
        MemorySegment formatStr = session.allocateUtf8String(format);
        return (int)specializedPrintf(args).invoke(formatStr,
                args.stream().map(a -> a.nativeValue(session)).toArray());
    }
}

private MethodHandle specializedPrintf(List<PrintfArg> args) {
    //method type
    MethodType mt = MethodType.methodType(int.class, MemoryAddress.class);
    FunctionDescriptor fd = printfBase;
    List<MemoryLayout> variadicLayouts = new ArrayList<>(args.size());
    for (PrintfArg arg : args) {
        mt = mt.appendParameterTypes(arg.carrier);
        variadicLayouts.add(arg.layout);
    }
    MethodHandle mh = abi.downcallHandle(printfAddr,
            fd.asVariadic(variadicLayouts.toArray(new MemoryLayout[args.size()])));
    return mh.asSpreader(1, Object[].class, args.size());
}

That PR also changed from invokeExact to invoke. Why?

As an aside, notice that the test_time test (and every other test) passed when we disabled test_printf. test_time calls gmtime, which returns a tm struct so that side of things is working fine.

The question is what is all this spreading about? The asSpreader docs explain it as follows:

Makes an array-spreading method handle, which accepts an array argument at a given position and spreads its elements as positional arguments in place of the array. The new method handle adapts, as its target, the current method handle. The type of the adapter will be the same as the type of the target, except that the arrayLength parameters of the target’s type, starting at the zero-based position spreadArgPos, are replaced by a single array parameter of type arrayType.

MethodHandle.asSpreader

Therefore, the test is essentially converting all the printf arguments into positional arguments.
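A standalone sketch of the same asSpreader(1, Object[].class, n) call shape, using an ordinary static method as the target instead of a native downcall handle. The join method and class name are hypothetical; the java.lang.invoke calls are the standard ones.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class SpreaderSketch {
    // Hypothetical target: one leading argument followed by two trailing ones,
    // loosely like a format string followed by its arguments.
    static String join(String prefix, Object first, Object second) {
        return prefix + ":" + first + "," + second;
    }

    static String callThroughSpreader() {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    SpreaderSketch.class, "join",
                    MethodType.methodType(String.class, String.class, Object.class, Object.class));
            // The 2 parameters starting at position 1 are replaced by one Object[] parameter,
            // so the adapter's type is (String, Object[])String.
            MethodHandle spreader = target.asSpreader(1, Object[].class, 2);
            // The array elements become the positional arguments first and second.
            return (String) spreader.invoke("args", new Object[] { 1, 2.5 });
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughSpreader());
    }
}
```

This is what lets the test's printf helper pass a single array built from the PrintfArg list, however many arguments the list holds.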

Question: how is the translation from all this to native code actually done? PR 8282191: Implementation of Foreign Function & Memory API (Preview) · openjdk/jdk@2c5d136 (github.com) changes some of the hotspot code, which might make it easier to explore the related code.

Looking at the ABIDescriptor in the AArch64 CallArranger, there is a shadow space entry with the value of 0. windows – What is the ‘shadow space’ in x64 assembly? – Stack Overflow explains what shadow space is.

CallArranger.getBindings seems like an interesting place – it uses the abstract method varArgsOnStack() on line 145 and calls SharedUtils.isVarargsIndex(). Notice that the FunctionDescriptor has a firstVariadicArgumentIndex() method that returns -1. This is why specializedPrintf calls FunctionDescriptor.asVariadic(). VariadicFunction sets the firstVariadicIndex to the size of the argumentLayouts of the FunctionDescriptor.

CallArranger.classifyLayout() will return either INTEGER, FLOAT, or POINTER for the case I’m interested in. These cases in UnboxBindingCalculator.getBindings call storageCalculator.nextStorage. Diving into that implementation reveals that we don’t want adjustForVarArgs() to be called! Hmm, after looking at the optimized code in my post on “Building & Disassembling ARM64 Code using Visual C++”, I notice FMOV being used to load general purpose registers x1-x3 with the IEEE double! This looks different from the getBindings implementation, which gets the next storage for FLOATs from the vector registers! Et voila! The contradiction I’ve been waiting for: now the addendum on variadic functions at Overview of ARM64 ABI conventions makes sense.
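A tiny Java sketch of what moving a double into a general purpose register amounts to: the register simply holds the raw 64-bit IEEE 754 pattern. The class and method names are hypothetical. The second half is my own reading of the earlier TestVarArgs failure, not something the test output proves: any 64-bit pattern whose top 12 bits are zero reads back as a subnormal double, which is the kind of value (8.135772792034E-312) the test found instead of 24.0.

```java
// Hypothetical illustration; only the java.lang.Double calls are real API.
final class DoubleBits {
    // The 64 bits an integer register would hold if a double were moved into it unchanged.
    static long bitsOf(double d) {
        return Double.doubleToRawLongBits(d);
    }

    // Reinterpret 64 arbitrary bits as a double, as a callee reading the wrong register would.
    static double fromBits(long bits) {
        return Double.longBitsToDouble(bits);
    }

    // Subnormal: nonzero but smaller in magnitude than the smallest normal double.
    static boolean isSubnormal(double d) {
        return d != 0.0 && Math.abs(d) < Double.MIN_NORMAL;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(bitsOf(24.0)));
        System.out.println(isSubnormal(fromBits(12345L)));
        System.out.println(isSubnormal(8.135772792034E-312));
    }
}
```

The pattern for 24.0 is 0x4038000000000000, so a callee that finds the argument in the expected register gets 24.0 back; a callee that looks in a register holding unrelated integer data typically sees a tiny subnormal instead.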

Tests still fail with my change.

Creating a Narrow Test Case

Get a Windows x64 JDK 19 nightly build from Adoptium. Create a Java Project in Eclipse and change the JRE System Library to jdk-19+34. See MinimizedStdLibTest.java. We will use hsdis to explore this testcase. See Blog Theme – Details (oracle.com) and the post on the hsdis LLVM backend for Windows ARM64 for more info. Here is the updated configure command.

bash configure --openjdk-target=aarch64-unknown-cygwin --with-debug-level=slowdebug --with-boot-jdk=/cygdrive/d/dev/repos/java/infra/binaries/jdk-18.0.2  --with-hsdis=llvm --with-llvm=/cygdrive/d/dev/software/llvm-aarch64/

After running the build-aarch64.sh script, we can now disassemble the code on the host:

C:\dev\java\abi\devbranch4\jdk\bin\javac.exe -g --enable-preview --release 20 MinimizedStdLibTest.java

C:\dev\java\abi\devbranch4\jdk\bin\java.exe --enable-preview -XX:+PrintAssembly MinimizedStdLibTest > MinimizedStdLibTest.asm

Inspecting Disassembly using JitWatch

Found this blog post while looking up hsdis: Developers disassemble! Use Java and hsdis to see it all. (oracle.com)

Clone the JitWatch repo. Download the mvn binaries. Set JAVA_HOME to the path of our custom JDK (with hsdis) then start JitWatch. Errors running it though.

No Windows AArch64 binaries at Adoptium or Oracle though.

Let’s just try on x64. Might gain some insight:

cd /d/dev/repos/java/AdoptOpenJDK/jitwatch
/d/dev/repos/java/infra/binaries/jdk-19+34/bin/java --enable-preview -jar ./ui/target/jitwatch-ui-shaded.jar

Looking at these options, I wonder if manually setting the Compile Threshold could show more disassembly:

Update JitWatch to support preview features then change JAVA_HOME. This doesn’t make mvn clean package use my latest JDK…

$ echo $JAVA_HOME
C:\Program Files\Microsoft\jdk-17.0.1.12-hotspot\

$ JAVA_HOME=/d/dev/repos/java/infra/binaries/jdk-19+34/

I can get the JIT to assemble for the main method. Why doesn’t this work on Windows for ARM64? Perhaps I should try a non-debug configuration by configuring as follows before running the build-aarch64.sh script:

bash configure --openjdk-target=aarch64-unknown-cygwin --with-boot-jdk=/cygdrive/d/dev/repos/java/infra/binaries/jdk-18.0.2  --with-hsdis=llvm --with-llvm=/cygdrive/d/dev/software/llvm-aarch64/

I get the same results with the release build – no native code for my printf function! I wonder about downloading something heavier and seeing if anything interesting gets compiled to native code. How about Eclipse? Interestingly, there is no Eclipse build for Windows on ARM64!

Reexamining the Source Code

Desperation leads me to force java native code compilation at DuckDuckGo and java – Can I force the JVM to natively compile a given method? – Stack Overflow. At this point, a review of the java command options leads me to -XX:-Inline and -XX:CompileOnly=MinimizedStdLibTest.printf. This at least reduces the volume of the hsdis output from hundreds of thousands of lines to just under 5500 lines.

C:\...\devbranch-rel\jdk\bin\javac.exe -g --enable-preview --release 20 MinimizedStdLibTest.java

C:\...\devbranch-rel\jdk\bin\java.exe --enable-preview -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:-Inline -XX:CompileOnly=MinimizedStdLibTest.printf MinimizedStdLibTest > MinimizedStdLibTestOnlyPrintf.asm

Examining this reduced output now helps me realize that the double keyword is what I should have been looking for all along! Look at this snippet with arguments that look similar to my modified test case (where I call with a char, a double, and an integer).

[Verified Entry Point]
  # {method} {0x000001dd8f866158} 'linkToStatic' '(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;IDILjava/lang/invoke/MemberName;)I' in 'java/lang/invoke/MethodHandle'
  # parm0:    c_rarg1:c_rarg1 
                        = 'java/lang/Object'
  # parm1:    c_rarg2:c_rarg2 
                        = 'java/lang/Object'
  # parm2:    c_rarg3:c_rarg3 
                        = 'java/lang/Object'
  # parm3:    c_rarg4   = int
  # parm4:    v0:v0     = double
  # parm5:    c_rarg5   = int
  # parm6:    c_rarg6:c_rarg6 
                        = 'java/lang/invoke/MemberName'
  #           [sp+0x0]  (sp of caller)
  0x000001dd87ae6080:   	nop
  0x000001dd87ae6084:   	ldr	w12, [x6, #0x24]
  0x000001dd87ae6088:   	lsl	x12, x12, #3
  0x000001dd87ae608c:   	ldr	x12, [x12, #0x10]
  0x000001dd87ae6090:   	cbz	x12, #0xc
  0x000001dd87ae6094:   	ldr	x8, [x12, #0x40]
  0x000001dd87ae6098:   	br	x8
  0x000001dd87ae609c:   	b	#-0x56729c          ;   {runtime_call AbstractMethodError throw_exception}

I’m still unsure what the parm fields mean but I’m assuming that the double is still being passed in a vector register! Sure enough, I changed the BoxBindingCalculator instead of the UnboxBindingCalculator. Fixed that then reran the test:

C:\dev\java\abi\devbranch-rel2\jdk\bin\java.exe --enable-preview -jar C:\dev\java\jtreg\lib\jtreg.jar -agentvm -timeoutFactor:4 -concurrency:4 -verbose:fail,error,summary -nativepath:C:\dev\java\abi\devbranch-rel2\support\test\jdk\jtreg\native\lib test/jdk/java/foreign/StdLibTest.java

The test fails but this time there is a fatal error! Feels like progress.

Note: C:\dev\repos\java\forks\jdk\test\jdk\java\foreign\StdLibTest.java uses preview features of Java SE 20.
Note: Recompile with -Xlint:preview for details.

ACTION: testng -- Failed. Unexpected exit from test [exit code: 1]
REASON: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED StdLibTest
TIME:   4.783 seconds
messages:
command: testng --enable-native-access=ALL-UNNAMED StdLibTest
reason: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED StdLibTest
Mode: othervm [/othervm specified]
elapsed time (seconds): 4.783
configuration:
STDOUT:
test StdLibTest.test_printf([INTEGRAL, STRING, CHAR, CHAR]): success
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (assembler_aarch64.hpp:253), pid=11060, tid=5996
#  guarantee(val < (1ULL << nbits)) failed: Field too big for insn
#
# JRE version: OpenJDK Runtime Environment (20.0) (build 20-internal-adhoc.sawesong.jdk)
# Java VM: OpenJDK 64-Bit Server VM (20-internal-adhoc.sawesong.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, windows-aarch64)
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\dev\repos\java\forks\jdk\JTwork\scratch\0\hs_err_pid11060.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
hello(42,str,h,h)

Searching for the string “C1-compiled” (which shows up in the hsdis output) reveals its source: nmethod.cpp. The compilation summary is generated by nmethod::print. For an explanation of how to interpret hsdis output, see PrintAssembly output explained! | It’s All Relative (jpbempel.github.io)

Inspecting the Core Dump

Since the fatal error in the JRE states that Minidumps are not enabled by default on client versions of Windows, I enabled collection of dump files using the enable-crash-dumps.bat script. Now we see a minidump written to disk:

C:\dev\java\abi\devbranch5\jdk\bin\java.exe --enable-preview MinimizedStdLibTest
WARNING: A restricted method in java.lang.foreign.Linker has been called
WARNING: java.lang.foreign.Linker::nativeLinker has been called by the unnamed module
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for this module

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=\vmreg_aarch64.hpp:48
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (c:\dev\repos\java\forks\jdk\src\hotspot\cpu\aarch64\vmreg_aarch64.hpp:48), pid=14728, tid=11380
#  assert(is_FloatRegister() && is_even(value())) failed: must be
#
# JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.sawesong.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.sawesong.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, windows-aarch64)
# Core dump will be written. Default location: C:\dev\java\abi\tests\hs_err_pid14728.mdmp
#
# An error report file with more information is saved as:
# C:\dev\java\abi\tests\hs_err_pid14728.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

We can now open the dump file using WinDbg.

0:000> k
 # Child-SP          RetAddr               Call Site
00 00000096`df4ff310 00007ffe`72a05408     ntdll!NtWaitForSingleObject+0x4
01 00000096`df4ff310 00007ffe`6aa90c84     KERNELBASE!WaitForSingleObjectEx+0x88
02 00000096`df4ff3a0 00007ffe`6aa8ae18     jli!CallJavaMainInNewThread+0xac [c:\dev\repos\java\forks\jdk\src\java.base\windows\native\libjli\java_md.c @ 809] 
03 00000096`df4ff3d0 00007ffe`6aa90dc8     jli!ContinueInNewThread+0xd0 [c:\dev\repos\java\forks\jdk\src\java.base\share\native\libjli\java.c @ 2278] 
04 00000096`df4ff4d0 00007ffe`6aa89c18     jli!JVMInit+0x48 [c:\dev\repos\java\forks\jdk\src\java.base\windows\native\libjli\java_md.c @ 974] 
05 00000096`df4ff510 00007ff6`50751408     jli!JLI_Launch+0x360 [c:\dev\repos\java\forks\jdk\src\java.base\share\native\libjli\java.c @ 340] 
06 00000096`df4ff8d0 00007ff6`507517c4     java_exe!main+0x408 [c:\dev\repos\java\forks\jdk\src\java.base\share\native\launcher\main.c @ 166] 
07 (Inline Function) --------`--------     java_exe!invoke_main+0x24
08 00000096`df4ff980 00007ff6`50751850     java_exe!__scrt_common_main_seh+0x124
09 (Inline Function) --------`--------     java_exe!__scrt_common_main+0x8
0a 00000096`df4ff9c0 00007ffe`740b84a8     java_exe!mainCRTStartup+0x10
0b 00000096`df4ff9d0 00007ffe`76fc3108     kernel32!BaseThreadInitThunk+0x38
0c 00000096`df4ffa10 00000000`00000000     ntdll!RtlUserThreadStart+0x48

Running in WinDbg

I decide to run java under the debugger and see what happens.

  1. Launch WinDbg and go to File > Open Executable…
  2. Browse to the java.exe path.
  3. Specify the starting directory containing the compiled MinimizedStdLibTest file.
  4. Specify these arguments: --enable-preview MinimizedStdLibTest then click Open.
  5. Press F5 to start the program.

After a few breaks due to unhandled exceptions, I decide to look up the warnings in the text on-screen when a foreign function API is invoked. These messages are from Reflection.ensureNativeAccess and are called by …

WARNING: A restricted method in java.lang.foreign.Linker has been called
WARNING: java.lang.foreign.Linker::nativeLinker has been called by the unnamed module
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for this module

Debugging in Visual Studio 2019

Create a C++ Console Application then open its Configuration Properties. On the Debug page, change the command, command arguments, and working directory to that of the newly built java.exe. Here are some interesting methods based on exploring after setting breakpoints in methodHandles.cpp:

  1. InterpreterRuntime::resolve_from_cache
  2. MethodHandles::resolve_MemberName
  3. JavaCallArguments (from InstanceKlass.cpp:1163)
  4. InterpreterRuntime::prepare_native_call
  5. NativeLookup::lookup, which reveals the -verbose:jni flag to me.

C:\dev\repos\java\forks\dups\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\javac.exe -g --enable-preview --release 20 MinimizedStdLibTest.java

C:\dev\repos\java\forks\dups\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\java.exe --enable-preview -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=jit_compiler.log -verbose:jni MinimizedStdLibTest
...
[2.232s][debug][jni,resolve] [Dynamic-linking native method sun.nio.ch.FileDispatcherImpl.size0 ... JNI]
[2.592s][debug][jni,resolve] [Dynamic-linking native method jdk.internal.foreign.abi.NativeEntryPoint.registerNatives ... JNI]
[2.592s][debug][jni,resolve] [Registering JNI native method jdk.internal.foreign.abi.NativeEntryPoint.makeDowncallStub]
[2.592s][debug][jni,resolve] [Registering JNI native method jdk.internal.foreign.abi.NativeEntryPoint.freeDowncallStub0]
hello(h,1.2345,42)

There is a NativeEntryPoint.java and NativeEntryPoint.cpp. Other interesting methods:

  1. DowncallLinker::make_downcall_stub creates a CodeBuffer on line 98, which is initialized by CodeBuffer::initialize.

There are threads with native code (such as the methods above) but no method info. I think those are Java methods. I end up stepping through the code on x64 to gain a better understanding of how the native code stubs are generated. VZEROUPPER motivates a quick detour into AVX-512 just to get a better feel of what it’s about. The instruction set reference (from Intel® 64 and IA-32 Architectures Software Developer Manuals) explains that in 64-bit mode, VZEROUPPER zeroes the bits in positions 128 and higher in YMM0-YMM15 and ZMM0-ZMM15.

Reexamining the Assembly

I decide to find a way to compile everything to assembly. java – Can I force the JVM to natively compile a given method? – Stack Overflow suggests the -Xcomp flag, which works wonders!

javac.exe -g --enable-preview --release 20 MinimizedStdLibTest.java

java.exe --enable-preview -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:-Inline -XX:CompileOnly=MinimizedStdLibTest.printf -Xcomp MinimizedStdLibTest > MinimizedStdLibTestAsmForPrintfOnly.asm

I end up updating the test to have a single MethodHandle.invoke() call on its own line to simplify narrowing down the call in the disassembly. To simplify debugging even further, I create another test (MinimizedStdLibTest20Args) with 20 arguments (most of them doubles) that need to be formatted. This should make it easier to identify the code I am interested in and how these arguments are passed. I have a better grasp of the x86-64 architecture, so that seems like a better place to start examining how this native call is handled.

amd64 Disassembly

There are several verified entry points with this many parameters. Why? Here’s the last one on my Intel(R) Xeon(R) W-2133 CPU.

[Verified Entry Point]
  # {method} {0x000002876ccd2e30} 'linkToSpecial' '(Ljava/lang/Object;JJIDDIDDDDDDDDDDDDDDDDDLjava/lang/invoke/MemberName;)I' in 'java/lang/invoke/MethodHandle'
  # parm0:    rdx:rdx   = 'java/lang/Object'
  # parm1:    r8:r8     = long
  # parm2:    r9:r9     = long
  # parm3:    rdi       = int
  # parm4:    xmm0:xmm0   = double
  # parm5:    xmm1:xmm1   = double
  # parm6:    rsi       = int
  # parm7:    xmm2:xmm2   = double
  # parm8:    xmm3:xmm3   = double
  # parm9:    xmm4:xmm4   = double
  # parm10:   xmm5:xmm5   = double
  # parm11:   xmm6:xmm6   = double
  # parm12:   xmm7:xmm7   = double
  # parm13:   [sp+0x0]   = double  (sp of caller)
  # parm14:   [sp+0x8]   = double
  # parm15:   [sp+0x10]   = double
  # parm16:   [sp+0x18]   = double
  # parm17:   [sp+0x20]   = double
  # parm18:   [sp+0x28]   = double
  # parm19:   [sp+0x30]   = double
  # parm20:   [sp+0x38]   = double
  # parm21:   [sp+0x40]   = double
  # parm22:   [sp+0x48]   = double
  # parm23:   [sp+0x50]   = double
  # parm24:   rcx:rcx   = 'java/lang/invoke/MemberName'
 ;; verify_klass {
  0x000002875655e580:   	testq	%rcx, %rcx
  0x000002875655e583:   	je	0x40
  0x000002875655e589:   	pushq	%rdi
  0x000002875655e58a:   	pushq	%r10
  0x000002875655e58c:   	movl	0x8(%rcx), %edi
  0x000002875655e58f:   	movabsq	$0x800000000, %r10
  0x000002875655e599:   	addq	%r10, %rdi
  0x000002875655e59c:   	movabsq	$0x7ffc8959c6a0, %r10;   {external_word}
  0x000002875655e5a6:   	cmpq	(%r10), %rdi
  0x000002875655e5a9:   	je	0x36
  0x000002875655e5af:   	movq	0x40(%rdi), %rdi
  0x000002875655e5b3:   	movabsq	$0x7ffc8959c6a0, %r10;   {external_word}
  0x000002875655e5bd:   	cmpq	(%r10), %rdi
  0x000002875655e5c0:   	je	0x1f
  0x000002875655e5c6:   	popq	%r10
  0x000002875655e5c8:   	popq	%rdi
 ;; MemberName required for invokeVirtual etc.
  0x000002875655e5c9:   	movabsq	$0x7ffc88f3a110, %rcx;   {external_word}
  0x000002875655e5d3:   	andq	$-0x10, %rsp
  0x000002875655e5d7:   	movabsq	$0x7ffc88127ef0, %r10;   {runtime_call MacroAssembler::debug64}
  0x000002875655e5e1:   	callq	*%r10
  0x000002875655e5e4:   	hlt
 ;; L_ok:
  0x000002875655e5e5:   	popq	%r10
  0x000002875655e5e7:   	popq	%rdi
 ;; } verify_klass
.
.
.

The string “MemberName required for invokeVirtual etc” looks like a unique string and is therefore a reasonable one to use to find the code that set up the entry point. It comes from the generate_method_handle_dispatch method. Placing a breakpoint here reveals an interesting stack:

jvm.dll!MethodHandles::generate_method_handle_dispatch(MacroAssembler * _masm, vmIntrinsicID iid, RegisterImpl * receiver_reg, RegisterImpl * member_reg, bool for_compiler_entry) Line 364	C++
 	jvm.dll!gen_special_dispatch(MacroAssembler * masm, const methodHandle & method, const BasicType * sig_bt, const VMRegPair * regs) Line 1508	C++
 	jvm.dll!SharedRuntime::generate_native_wrapper(MacroAssembler * masm, const methodHandle & method, int compile_id, BasicType * in_sig_bt, VMRegPair * in_regs, BasicType ret_type) Line 1572	C++
 	jvm.dll!AdapterHandlerLibrary::create_native_wrapper(const methodHandle & method) Line 3159	C++
 	jvm.dll!SystemDictionary::find_method_handle_intrinsic(vmIntrinsicID iid, Symbol * signature, JavaThread * __the_thread__) Line 2017	C++
 	jvm.dll!LinkResolver::lookup_polymorphic_method(const LinkInfo & link_info, Handle * appendix_result_or_null, JavaThread * __the_thread__) Line 446	C++
 	jvm.dll!LinkResolver::resolve_method(const LinkInfo & link_info, Bytecodes::Code code, JavaThread * __the_thread__) Line 756	C++
 	jvm.dll!LinkResolver::linktime_resolve_static_method(const LinkInfo & link_info, JavaThread * __the_thread__) Line 1106	C++
 	jvm.dll!LinkResolver::resolve_static_call(CallInfo & result, const LinkInfo & link_info, bool initialize_class, JavaThread * __the_thread__) Line 1072	C++
 	jvm.dll!MethodHandles::resolve_MemberName(Handle mname, Klass * caller, int lookup_mode, bool speculative_resolve, JavaThread * __the_thread__) Line 777	C++
 	jvm.dll!MHN_resolve_Mem(JNIEnv_ * env, _jobject * igcls, _jobject * mname_jh, _jclass * caller_jh, long lookup_mode, unsigned char speculative_resolve) Line 1252	C++
 	0000020a0a26fb92()	Unknown
 	0000020a0058eb00()	Unknown
 	0000005f992fd040()	Unknown
 	0000005f992fd010()	Unknown

This is essentially all the interesting action I have been searching for! Especially AdapterHandlerLibrary::create_native_wrapper, which calls SharedRuntime::java_calling_convention and SharedRuntime::generate_native_wrapper. The latter are exactly what I’ve been seeking!
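The register assignments in the linkToSpecial parm listing earlier can be reproduced with a small model. This is only a sketch inferred from that listing, not HotSpot's actual SharedRuntime::java_calling_convention: the integer register order (rdx, r8, r9, rdi, rsi, rcx), the eight xmm registers, and the shared 8-byte stack slots are assumptions read off the Windows x64 output, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class JavaCallingConventionSketch {
    // Order inferred from the parm listing (Windows x64); an assumption, not HotSpot source.
    static final String[] INT_REGS = {"rdx", "r8", "r9", "rdi", "rsi", "rcx"};
    static final int FLOAT_REG_COUNT = 8; // xmm0..xmm7

    // Takes one type letter per parameter: I, J, D, F, or L (any reference).
    static List<String> assign(String types) {
        List<String> locations = new ArrayList<>();
        int nextInt = 0;
        int nextFloat = 0;
        int stackOffset = 0;
        for (char t : types.toCharArray()) {
            boolean isFloat = (t == 'D' || t == 'F');
            if (isFloat && nextFloat < FLOAT_REG_COUNT) {
                locations.add("xmm" + nextFloat++);
            } else if (!isFloat && nextInt < INT_REGS.length) {
                locations.add(INT_REGS[nextInt++]);
            } else {
                // Out of registers: spill to the next 8-byte stack slot.
                locations.add("[sp+0x" + Integer.toHexString(stackOffset) + "]");
                stackOffset += 8;
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // linkToSpecial from the listing: Object, 2 longs, int, 2 doubles, int, 17 doubles, MemberName.
        String types = "LJJIDDI" + "D".repeat(17) + "L";
        List<String> locations = assign(types);
        for (int i = 0; i < locations.size(); i++) {
            System.out.println("parm" + i + ": " + locations.get(i));
        }
    }
}
```

Running it against the 25-parameter signature is a quick way to compare the model with the hsdis listing line by line; any mismatch would mean the assumed register order is wrong.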

What does the new_native_nmethod implementation actually do? It ends up calling this nmethod constructor that reveals the existence of the PrintNativeNMethods flag.

javac.exe -g --enable-preview --release 20 MinimizedStdLibTest20Args.java

java.exe --enable-preview -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:-Inline -XX:CompileOnly=MinimizedStdLibTest20Args.printf -Xcomp MinimizedStdLibTest20Args > MinimizedStdLibTest20ArgsAsmForPrintfOnly.asm

Some things I learn from inspecting the verify_klass method:

  1. You can have spaces after the -> operator. See the expansion of __
  2. You can have #defines inside the class itself since they are processed before the compiler is invoked. See c++ – Is it possible to use #define inside a function? – Stack Overflow.

The VerifyOops flag is off by default so the verify_oop doesn’t generate any code. The testptr is therefore the first MacroAssembler code to be generated. Notice that the code jumps to the MemberName required for invokeVirtual etc label if rcx is zero – that must be error-handling code. The jz mnemonic would be preferable to je (see assembly – Difference between JE/JNE and JZ/JNZ – Stack Overflow) but they are identical opcodes. Here is the listing with links to the methods that generated them.

...
  # parm24:   rcx:rcx   = 'java/lang/invoke/MemberName'
 ;; verify_klass {
  0x000002875655e580:   	testq	%rcx, %rcx
  0x000002875655e583:   	je	0x40
  0x000002875655e589:   	pushq	%rdi
  0x000002875655e58a:   	pushq	%r10
  0x000002875655e58c:   	movl	0x8(%rcx), %edi
  0x000002875655e58f:   	movabsq	$0x800000000, %r10
  0x000002875655e599:   	addq	%r10, %rdi
  0x000002875655e59c:   	movabsq	$0x7ffc8959c6a0, %r10;   {external_word}
  0x000002875655e5a6:   	cmpq	(%r10), %rdi
  0x000002875655e5a9:   	je	0x36 // L_ok
  0x000002875655e5af:   	movq	0x40(%rdi), %rdi
  0x000002875655e5b3:   	movabsq	$0x7ffc8959c6a0, %r10;   {external_word}
  0x000002875655e5bd:   	cmpq	(%r10), %rdi
  0x000002875655e5c0:   	je	0x1f // L_ok
  0x000002875655e5c6:   	popq	%r10
  0x000002875655e5c8:   	popq	%rdi
 ;; MemberName required for invokeVirtual etc.
  0x000002875655e5c9:   	movabsq	$0x7ffc88f3a110, %rcx;   {external_word}
  0x000002875655e5d3:   	andq	$-0x10, %rsp
  0x000002875655e5d7:   	movabsq	$0x7ffc88127ef0, %r10;   {runtime_call MacroAssembler::debug64}
  0x000002875655e5e1:   	callq	*%r10
  0x000002875655e5e4:   	hlt
 ;; L_ok:
  0x000002875655e5e5:   	popq	%r10
  0x000002875655e5e7:   	popq	%rdi
 ;; } verify_klass
.
.
.

The movl is a 32-bit mov of the klass* into edi – see gcc – The difference between mov and movl instruction in X86? – Stack Overflow. The offset of 8 is the klass offset in bytes. This klass offset is computed using the offsetof macro. From the beginning of the oopDesc class definition below, the klass offset is 8 to accommodate the markWord.

class oopDesc {
  friend class VMStructs;
  friend class JVMCIVMStructs;
 private:
  volatile markWord _mark;
  union _metadata {
    Klass*      _klass;
    narrowKlass _compressed_klass;
  } _metadata;

The first movabsq instruction loads (int64_t)CompressedKlassPointers::base() into the temporary register r10. As per NarrowPtrStruct._base, this is the base address for oop-within-java-object materialization. Not yet exactly sure whether that means an offset to add to the klass* to get the virtual address of the object since this base is added to the klass* in rdi. That addition ends the MacroAssembler::load_klass call.
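As a rough model of what the movl / movabsq / addq sequence computes, here is a minimal sketch. It assumes a zero shift (the listing shows no shift instruction), the shift parameter is only there for generality, and the narrowKlass value and all names are made up for illustration.

```java
class NarrowKlassDecodeSketch {
    // Mirrors: movl 0x8(%rcx), %edi ; movabsq $base, %r10 ; addq %r10, %rdi.
    // The shift parameter is an assumption; the listing above applies no shift.
    static long decode(int narrowKlass, long base, int shift) {
        // A 32-bit mov zero-extends into the 64-bit register, hence toUnsignedLong.
        return base + (Integer.toUnsignedLong(narrowKlass) << shift);
    }

    public static void main(String[] args) {
        long base = 0x800000000L;  // value loaded by the first movabsq in the listing
        int narrow = 0x00c0ffee;   // made-up narrowKlass value for illustration
        System.out.printf("0x%x%n", decode(narrow, base, 0));
    }
}
```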

The 2nd movabsq instruction loads the external klass address of the klass with vmClassID java_lang_invoke_MemberName. This value is then compared with the computed klass address in rdi (the cmpq reads through the pointer in r10). If these 2 values are equal, then all is well and the CPU will branch to L_ok. If this branch is not taken, then the super_check_offset of the MemberName Klass is computed by Klass::super_check_offset. This offset indicates where to look to observe a supertype. So for my purposes, everything in the ;; verify_klass {... ;; } verify_klass section can be ignored since it is MemberName validation.

Without looking at the rest of the assembly code, the key thing to notice is that rcx was assumed to have a MemberName, meaning that by the time all these instructions execute, all the arguments I passed to printf are already in registers/on the stack. A quick detour into the method header is in order though. Here’s the first instance of that signature.

-------------------------- Assembly (native nmethod) ---------------------------

Compiled method (n/a)   16155  119     n 0       java.lang.invoke.MethodHandle::linkToNative(JJIDDIDDDDDDDDDDDDDDDDDL)I (native)
 total in heap  [0x0000021b87aea310,0x0000021b87aea488] = 376
 main code      [0x0000021b87aea480,0x0000021b87aea487] = 7
 stub code      [0x0000021b87aea487,0x0000021b87aea488] = 1

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Verified Entry Point]
  # {method} {0x0000021b978bb868} 'linkToNative' '(JJIDDIDDDDDDDDDDDDDDDDDLjava/lang/Object;)I' in 'java/lang/invoke/MethodHandle'
  # parm0:    rdx:rdx   = long
  # parm1:    r8:r8     = long
  # parm2:    r9        = int
  # parm3:    xmm0:xmm0   = double
  # parm4:    xmm1:xmm1   = double
  # parm5:    rdi       = int
  # parm6:    xmm2:xmm2   = double
  # parm7:    xmm3:xmm3   = double
  # parm8:    xmm4:xmm4   = double
  # parm9:    xmm5:xmm5   = double
  # parm10:   xmm6:xmm6   = double
  # parm11:   xmm7:xmm7   = double
  # parm12:   [sp+0x0]   = double  (sp of caller)
  # parm13:   [sp+0x8]   = double
  # parm14:   [sp+0x10]   = double
  # parm15:   [sp+0x18]   = double
  # parm16:   [sp+0x20]   = double
  # parm17:   [sp+0x28]   = double
  # parm18:   [sp+0x30]   = double
  # parm19:   [sp+0x38]   = double
  # parm20:   [sp+0x40]   = double
  # parm21:   [sp+0x48]   = double
  # parm22:   [sp+0x50]   = double
  # parm23:   rsi:rsi   = 'java/lang/Object'
 ;; jump_to_native_invoker {
  0x0000021b87aea480:   	movq	0x10(%rsi), %r10
  0x0000021b87aea484:   	jmpq	*%r10
[Stub Code]
 ;; } jump_to_native_invoker
  0x0000021b87aea487:   	hlt
--------------------------------------------------------------------------------
[/Disassembly]

What outputs the parm\d+ strings after the method header? These are from nmethod::print_nmethod_labels. This method also calls Method::print_value_on, which outputs the JJIDDIDDDDDDDDDDDDDDDDDL stuff in the method header. That is the method signature. Some digging around on SO, e.g. Compute a Java function’s signature – Stack Overflow and L, Z and V in Java method signature – Stack Overflow leads me to Java Native Interface Specification: 3 – JNI Types and Data Structures (oracle.com), which explains the types represented by each letter. Inspecting these signatures actually leads me to discover that there are double entries for the ‘linkToNative’ native methods. The difference is the Compiled method (n/a) line.
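The letters in the signature can also be decoded mechanically. Here is a small sketch using the standard MethodType.fromMethodDescriptorString API on the full linkToNative descriptor from the listing above; the class and helper names are made up, and the descriptor is built with repeat(17) only to avoid miscounting the doubles.

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    static MethodType parse(String descriptor) {
        // A null loader means the system class loader resolves any class names.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        // J = long, I = int, D = double, L...; = object reference; the trailing I is the return type.
        String descriptor = "(JJIDDI" + "D".repeat(17) + "Ljava/lang/Object;)I";
        MethodType type = parse(descriptor);
        for (int i = 0; i < type.parameterCount(); i++) {
            System.out.println("parm" + i + " = " + type.parameterType(i).getName());
        }
        System.out.println("returns " + type.returnType().getName());
    }
}
```

The printed parameter indexes line up with the parm0 to parm23 labels in the hsdis output.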

The string ;; jump_to_native_invoker { comes from MethodHandles::jump_to_native_invoker. I’m pleasantly surprised to see only 2 instances in the disassembly since that will simplify breaking in that code. jump_to_native_invoker mentions NEP, which takes me back to NativeEntryPoint.java and the fact that JVM_RegisterNativeEntryPointMethods gets called after the program starts. Is this because NativeEntryPoint’s static initializer calls the native method registerNatives? This prompts a review of how the Java code gets into all this native code.

Java Code Going Native

The test’s printf function calls Linker.downcallHandle on line 119. The implementation of Linker.downcallHandle in my first port goes to AbstractLinker::downcallHandle. That implementation calls the abstract method arrangeDowncall. The AbstractLinker subclass I created (WindowsAArch64Linker) is similar to LinuxAArch64Linker and MacOsAArch64Linker in that it delegates arrangeDowncall to CallArranger.arrangeDowncall. This method in turn creates a new DowncallLinker and calls its getBoundMethodHandle method.
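The delegation chain described above is a plain template-method arrangement. The sketch below only illustrates that shape: every class here is a simplified, hypothetical stand-in (strings instead of method handles and function descriptors), not the JDK's actual code.

```java
class LinkerDelegationSketch {
    // Stand-in for AbstractLinker: the public entry point defers to an abstract hook.
    abstract static class AbstractLinkerSketch {
        final String downcallHandle(String functionDescriptor) {
            return arrangeDowncall(functionDescriptor);
        }

        abstract String arrangeDowncall(String functionDescriptor);
    }

    // Stand-in for a platform CallArranger, which would build the downcall linker.
    static class CallArrangerSketch {
        static String arrangeDowncall(String functionDescriptor) {
            return "bound handle for " + functionDescriptor;
        }
    }

    // Stand-in for a platform linker such as WindowsAArch64Linker.
    static class PlatformLinkerSketch extends AbstractLinkerSketch {
        @Override
        String arrangeDowncall(String functionDescriptor) {
            return CallArrangerSketch.arrangeDowncall(functionDescriptor);
        }
    }

    public static void main(String[] args) {
        System.out.println(new PlatformLinkerSketch().downcallHandle("(JID)I"));
    }
}
```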

getBoundMethodHandle calls NativeEntryPoint.make. I suspect that this is what causes NativeEntryPoint’s static initializer to be executed (and JVM_RegisterNativeEntryPointMethods and NEP_makeDowncallStub in turn). Also observe that once a NativeEntryPoint has been created, a method handle is created by JLIA.nativeMethodHandle. I think the actual implementation of this is in MethodHandleImpl, which defers to NativeMethodHandle. The makePreparedLambdaForm method has a reference to the ‘linkToNative’ method I’ve been seeing in the hsdis output.
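The suspicion rests on a general Java rule: a class's static initializer runs the first time one of its static methods is invoked. A minimal sketch of that rule follows, with a hypothetical EntryPoint class standing in for NativeEntryPoint and a list append standing in for the registerNatives call.

```java
import java.util.ArrayList;
import java.util.List;

class StaticInitSketch {
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical stand-in for a class such as NativeEntryPoint.
    static class EntryPoint {
        static {
            // Stand-in for a registerNatives() call made from the static initializer.
            EVENTS.add("static initializer");
        }

        static String make() {
            EVENTS.add("make");
            return "entry point";
        }
    }

    public static void main(String[] args) {
        System.out.println("before first use: " + EVENTS);
        EntryPoint.make();
        System.out.println("after first use: " + EVENTS);
    }
}
```

The initializer entry is recorded ahead of the make entry, which is the same ordering the debugger shows for the registration call and the stub creation.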

Here is a particularly interesting callstack showing how NEP_makeDowncallStub ends up calling the DowncallStubGenerator.

>	jvm.dll!DowncallStubGenerator::generate() Line 142	C++
 	jvm.dll!DowncallLinker::make_downcall_stub(BasicType * signature, int num_args, BasicType ret_bt, const ABIDescriptor & abi, const GrowableArray<VMRegImpl *> & input_registers, const GrowableArray<VMRegImpl *> & output_registers, bool needs_return_buffer) Line 101	C++
 	jvm.dll!NEP_makeDowncallStub(JNIEnv_ * env, _jclass * _unused, _jobject * method_type, _jobject * jabi, _jobjectArray * arg_moves, _jobjectArray * ret_moves, unsigned char needs_return_buffer) Line 77	C++
 	0000017244641db1()	Unknown
...

What is interesting about this? The DowncallStubGenerator is not only generating assembly instructions that are most likely what I have been searching for, it also has logging code that is being skipped. That looks like unified logging code! Therefore, using +PrintAssembly was not sufficient to generate the code I wanted to see! Here’s an updated command line after which downcall.txt will contain the results of argument shuffling.

javac.exe -g --enable-preview --release 20 MinimizedStdLibTest20Args.java

java.exe --enable-preview -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:-Inline -XX:CompileOnly=MinimizedStdLibTest20Args.printf -Xcomp -Xlog:foreign+downcall=trace:file=downcall.txt::filecount=0 MinimizedStdLibTest20Args > MinimizedStdLibTest20ArgsAsmForPrintfOnly.asm

Here is a stack revealing a bit more detail about how the arguments are set up.

jvm.dll!SharedRuntime::java_calling_convention(const BasicType * sig_bt, VMRegPair * regs, int total_args_passed) Line 505	C++
jvm.dll!JavaCallingConvention::calling_convention(BasicType * sig_bt, VMRegPair * regs, int num_args) Line 66	C++
jvm.dll!ArgumentShuffle::ArgumentShuffle(BasicType * in_sig_bt, int num_in_args, BasicType * out_sig_bt, int num_out_args, const CallingConventionClosure * input_conv, const CallingConventionClosure * output_conv, VMRegImpl * shuffle_temp) Line 328	C++
jvm.dll!DowncallStubGenerator::generate() Line 141	C++
jvm.dll!DowncallLinker::make_downcall_stub(BasicType * signature, int num_args, BasicType ret_bt, const ABIDescriptor & abi, const GrowableArray<VMRegImpl *> & input_registers, const GrowableArray<VMRegImpl *> & output_registers, bool needs_return_buffer) Line 101	C++
jvm.dll!NEP_makeDowncallStub(JNIEnv_ * env, _jclass * _unused, _jobject * method_type, _jobject * jabi, _jobjectArray * arg_moves, _jobjectArray * ret_moves, unsigned char needs_return_buffer) Line 77	C++
0000017244641db1()	Unknown

More questions about how all this works:

  1. What happens after all the hsdis code is executed? Is the final jump to the native code?
  2. Where is rbx loaded (since that’s what we’re jumping to)?

AArch64 Disassembly

Having now understood that I can log the downcall stubs using the unified logging flags, this is the stub I get on the Surface Pro X (generated by DowncallStubGenerator::generate):

Argument shuffle {
Move a double from ([-1137525940],[-1137525936]) to ([-1137525916],[-1137525912])
Move a double from ([-1137525948],[-1137525944]) to ([-1137525924],[-1137525920])
Move a double from ([-1137525956],[-1137525952]) to ([-1137525932],[-1137525928])
Move a double from ([-1137525964],[-1137525960]) to ([-1137525940],[-1137525936])
Move a double from ([-1137525972],[-1137525968]) to ([-1137525948],[-1137525944])
Move a double from ([-1137525980],[-1137525976]) to ([-1137525956],[-1137525952])
Move a double from ([-1137525988],[-1137525984]) to ([-1137525964],[-1137525960])
Move a double from ([-1137525996],[-1137525992]) to ([-1137525972],[-1137525968])
Move a double from ([-1137526004],[-1137526000]) to ([-1137525980],[-1137525976])
Move a double from ([-1137526012],[-1137526008]) to ([-1137525988],[-1137525984])
Move a double from (v7,v7) to ([-1137525996],[-1137525992])
Move a double from (v6,v6) to ([-1137526004],[-1137526000])
Move a double from (v5,v5) to ([-1137526012],[-1137526008])
Move a double from (v4,v4) to (c_rarg7,c_rarg7)
Move a double from (v3,v3) to (c_rarg6,c_rarg6)
Move a double from (v2,v2) to (c_rarg5,c_rarg5)
Move a long from (c_rarg1,c_rarg1) to (rscratch2,rscratch2)
Move a byte from (c_rarg3,BAD!) to (c_rarg1,BAD!)
Move a int from (c_rarg4,BAD!) to (c_rarg3,BAD!)
Move a double from (v1,v1) to (c_rarg4,c_rarg4)
Move a long from (c_rarg2,c_rarg2) to (c_rarg0,c_rarg0)
Move a double from (v0,v0) to (c_rarg2,c_rarg2)
Stack argument slots: 26
}

It is immediately evident that there are BAD! registers. Why isn’t there more output as one would expect from looking at the additional logging in DowncallStubGenerator::generate? Well, the JVM crash might have something to do with it…

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=\vmreg_aarch64.hpp:48
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (c:\dev\repos\java\forks\jdk\src\hotspot\cpu\aarch64\vmreg_aarch64.hpp:48), pid=11888, tid=18884
#  assert(is_FloatRegister() && is_even(value())) failed: must be
#
# JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.sawesong.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.sawesong.jdk, compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, windows-aarch64)
# Core dump will be written. Default location: C:\dev\repos\scratchpad\compilers\tests\aarch64\abi\printf\java\hs_err_pid11888.mdmp
#
# An error report file with more information is saved as:
# C:\dev\repos\scratchpad\compilers\tests\aarch64\abi\printf\java\hs_err_pid11888.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

The most likely culprit here is arg_shuffle.generate. It ends up in ArgumentShuffle::pd_generate which uses the MacroAssembler::double_move and float_move methods. However, addressing the BAD! registers is really the next step before dealing with the assertion failure.

NEP_makeDowncallStub calls ForeignGlobals::parse_vmstorage, which in turn defers to the architecture-specific ForeignGlobals::vmstorage_to_vmreg implementation. This code returns the BAD register if the VMStorage type does not match the register type! This must be the culprit! How do I log the asString output?

Reexamining the x64 foreign downcall log below, I notice the BAD registers there too! Perhaps this is not an oddity after all. Could it be NativeCallingConvention::calling_convention marking half slots as bad? Actually, notice that in both x64 and AArch64 logs, only the byte and int have these BAD! entries. This must be the other 32-bit slot for the arguments! This means that the AArch64 log is actually fine!

Argument shuffle {
Move a double from ([79203860],[79203864]) to ([79203908],[79203912])
Move a double from ([79203852],[79203856]) to ([79203900],[79203904])
Move a double from ([79203844],[79203848]) to ([79203892],[79203896])
Move a double from ([79203836],[79203840]) to ([79203884],[79203888])
Move a double from ([79203828],[79203832]) to ([79203876],[79203880])
Move a double from ([79203820],[79203824]) to ([79203868],[79203872])
Move a double from ([79203812],[79203816]) to ([79203860],[79203864])
Move a double from ([79203804],[79203808]) to ([79203852],[79203856])
Move a double from ([79203796],[79203800]) to ([79203844],[79203848])
Move a double from ([79203788],[79203792]) to ([79203836],[79203840])
Move a double from ([79203780],[79203784]) to ([79203828],[79203832])
Move a double from (xmm7,xmm7) to ([79203820],[79203824])
Move a double from (xmm6,xmm6) to ([79203812],[79203816])
Move a double from (xmm5,xmm5) to ([79203804],[79203808])
Move a double from (xmm4,xmm4) to ([79203796],[79203800])
Move a double from (xmm3,xmm3) to ([79203788],[79203792])
Move a double from (xmm2,xmm2) to ([79203780],[79203784])
Move a long from (rdx,rdx) to (r10,r10)
Move a byte from (r9,BAD!) to (rdx,BAD!)
Move a int from (rdi,BAD!) to (r9,BAD!)
Move a double from (xmm1,xmm1) to (xmm2,xmm2)
Move a long from (r8,r8) to (rcx,rcx)
Move a double from (xmm0,xmm0) to (r8,r8)
Stack argument slots: 34
}
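The half-slot reading of the log above can be sketched with a small model. This is not HotSpot code: SlotPair, its set1/set2 helpers, and the slot numbering are hypothetical stand-ins, assuming (as I understand it) that each 64-bit register is described as two 32-bit slots and that a 32-bit value only claims the first one.

```java
// Simplified, hypothetical model of a register pair made of two 32-bit slots.
// It is NOT the HotSpot VMRegPair class; names and numbering are assumptions.
final class SlotPair {
    static final int BAD = -1;   // marker for "no second slot"

    final int first;
    final int second;

    private SlotPair(int first, int second) {
        this.first = first;
        this.second = second;
    }

    // 32-bit values (byte, int, float) occupy only the first slot.
    static SlotPair set1(int slot) {
        return new SlotPair(slot, BAD);
    }

    // 64-bit values (long, double) occupy two adjacent slots.
    static SlotPair set2(int slot) {
        return new SlotPair(slot, slot + 1);
    }

    // Both slots of one register share a name, so a 64-bit value prints as
    // (rdx,rdx) while a 32-bit value prints as (rdi,BAD!).
    String describe(String[] regNames) {
        String hi = (second == BAD) ? "BAD!" : regNames[second / 2];
        return "(" + regNames[first / 2] + "," + hi + ")";
    }

    public static void main(String[] args) {
        String[] names = {"rdx", "rdi"};   // hypothetical two-register file
        System.out.println("long: " + set2(0).describe(names));
        System.out.println("int:  " + set1(2).describe(names));
    }
}
```

Under this model the BAD! entries are just the unused upper half of a 32-bit argument, which matches the pattern in the log where only the byte and int moves carry them.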

Back to the MacroAssembler’s double_move and float_move methods… I think the fmovd instruction I seek is this one with a general purpose register operand. After changing double_move to support fmovd between general purpose and floating point registers, rerunning the test on AArch64 does not give any additional output in the downcall log file. Very strange since I don’t see an assertion failure preventing the logging code from running…

I realize though that instead of trying to mess with WinDbg, I can simply write to the unified logging stream (to which output is already successfully being written). Making the LogStream creation unconditional enables me to verify that the code is indeed being executed. __ flush looks like AbstractAssembler::flush. It is only now that I realize that this is not flushing the output stream of the assembler – it is instead invalidating the CPU’s instruction cache! This is done by calling FlushInstructionCache on Windows.

So how do block comments get written to disk? AbstractAssembler::block_comment ends up passing the comments to an AsmRemarks. The inserted comments will be output by AsmRemarks::print. Turns out flags like PrintAssembly or UnlockDiagnosticVMOptions are required to output these comments. Once the downcall stub has been generated, this output should get written to the log file in DowncallLinker::make_downcall_stub.

After fixing the assertion failure by now checking the register types for fmovd, I get an OOM. There is lots of output in the hotspot.log as well. The hsdis output ends with this:

...
  0x000001c9479b721c:   	add	x8, x8, #0xd40
  0x000001c9479b7220:   	br	x8
[Stub Code]
  0x000001c9479b7224:   	udf	#0x0
--------------------------------------------------------------------------------
[/Disassembly]
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 18446743994480037248 bytes for Chunk::new
# An error report file with more information is saved as:
# C:\dev\repos\scratchpad\compilers\tests\aarch64\abi\printf\java\hs_err_pid11288.log
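
The requested size in this report is just below 2^64, which is usually a hint that a negative 64-bit value was interpreted as unsigned. A minimal check of that reading (the class and method names are mine; only the number comes from the log):

```java
final class AllocationSize {
    // Reinterpret an unsigned 64-bit decimal string as a signed long.
    static long asSigned(String unsignedDecimal) {
        return Long.parseUnsignedLong(unsignedDecimal);
    }

    public static void main(String[] args) {
        long signed = asSigned("18446743994480037248");
        System.out.println(signed);                          // signed view
        System.out.println(Long.toUnsignedString(signed));   // round trip
    }
}
```

The signed view is -79229514368, so this would be consistent with a size computation going negative before it reached the allocator, although the report alone does not prove where that happened.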

The Chunk::new string is from Chunk::operator new. Before debugging this, I try adding a delay to the NEP.make call to see if the logs I want will be written to disk before the process dies but I still get the OOM without additional logging output.

Next idea: terminate the program with an assertion failure to see if the output will be written to disk at termination. _wassert – Search (bing.com) -> c – Why is `_wassert` wrapped in `(..,0)`? – Stack Overflow. The hotspot asserts appear to be defines for the CRT _assert function. The latter calls abort, which on Windows lets a custom abort signal handler function run (enabling cleanup of resources or logging of information). Does the JVM use this?

I sprinkle DowncallLinker::generate with this logging code: ls.print_cr("Returning stub after %d", __LINE__); The output shows that the generate method completes executing successfully. However, I don’t get any output from logging calls one level below it in the callstack – in DowncallLinker::make_downcall_stub. Commenting out the creation of the new RuntimeStub (by using the aforementioned logging call then returning nullptr on the previous line) shows that execution makes it to that point successfully. That has got to be the culprit since logging messages after that stub do not appear in the logs. And now looking at the RuntimeStub class, it is evident that it has an operator new implementation!

Let’s take a look at what happens in WinDbg. The bp, bu, bm (Set Breakpoint) and x (Examine Symbols) commands are quite useful. x * shows the local variables and their values. I didn’t have the matching sources on the Surface Pro when trying to step into DowncallLinker::make_downcall_stub so I cleaned up all the custom logging, committed my changes, and rebuilt the JDK.

bp jvm!NEP_makeDowncallStub
g
x *

Surprisingly, the newly built JDK successfully passes the StdLibTest.java. Unfortunately, it regresses VaListTest.java and still fails TestVarArgs.java. The error from VaListTest is surprising since that was passing before I began but it looks like a compiler error:

--------------------------------------------------
TEST: java/foreign/valist/VaListTest.java
TEST JDK: C:\dev\java\abi\devbranch5\jdk

ACTION: build -- Failed. Compilation failed: Compilation failed
REASON: Named class compiled on demand
TIME:   32.591 seconds
messages:
command: build VaListTest
reason: Named class compiled on demand
Test directory:
  compile: VaListTest
elapsed time (seconds): 32.591

ACTION: compile -- Failed. Compilation failed: Compilation failed
REASON: .class file out of date or does not exist
TIME:   32.384 seconds
messages:
command: compile C:\dev\repos\java\forks\jdk\test\jdk\java\foreign\valist\VaListTest.java
reason: .class file out of date or does not exist
...
direct:
C:\dev\repos\java\forks\jdk\test\jdk\java\foreign\valist\VaListTest.java:153: error: cannot find symbol
            = (builder, scope) -> WindowsAArch64Linker.newVaList(builder, scope.scope());
                                                                               ^
  symbol:   method scope()
  location: variable scope of type MemorySession
Note: C:\dev\repos\java\forks\jdk\test\jdk\java\foreign\valist\VaListTest.java uses preview features of Java SE 20.
Note: Recompile with -Xlint:preview for details.
1 error
...

The rvalue in the failing assignment needs to match the other lines (simply replace it with WindowsAArch64Linker.newVaList). I then get this:

test VaListTest.testCopy(VaListTest$$Lambda$125/0x000000080013cb10@1156402a, i32): success
test VaListTest.testCopy(): failure
org.testng.internal.reflect.MethodMatcherException:
[public void VaListTest.testCopy(java.util.function.BiFunction,java.lang.foreign.ValueLayout$OfInt)] has no parameters defined but was found to be using a data provider (either explicitly specified or inherited from class level annotation).
Data provider mismatch
Method: testCopy([Parameter{index=0, type=java.util.function.BiFunction, declaredAnnotations=[]}, Parameter{index=1, type=java.lang.foreign.ValueLayout$OfInt, declaredAnnotations=[]}])
Arguments: [(VaListTest$$Lambda$120/0x000000080013c000) VaListTest$$Lambda$120/0x000000080013c000@6a8ce624,(java.lang.foreign.ValueLayout$OfInt) i32]
        at org.testng.internal.reflect.DataProviderMethodMatcher.getConformingArguments(DataProviderMethodMatcher.java:43)
        at org.testng.internal.Parameters.injectParameters(Parameters.java:905)
        at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:34)
        at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
        at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
        at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
        at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at org.testng.TestRunner.privateRun(TestRunner.java:764)
        at org.testng.TestRunner.run(TestRunner.java:585)
        at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
        at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
        at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
        at org.testng.SuiteRunner.run(SuiteRunner.java:286)
        at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
        at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
        at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
        at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
        at org.testng.TestNG.runSuites(TestNG.java:1069)
        at org.testng.TestNG.run(TestNG.java:1037)
        at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:93)
        at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:53)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
        at java.base/java.lang.Thread.run(Thread.java:1589)

This turns out to be a porting bug in which copy() used winAArch64VaListFactory instead of winAArch64VaListScopedFactory. Thankfully the test passes after this fix. Unfortunately, TestVarArgs.java still fails:

STDOUT:
test TestVarArgs.testVarArgs(0, "f0_V__", VOID, [], []): success
test TestVarArgs.testVarArgs(17, "f0_V_S_DI", VOID, [STRUCT], [DOUBLE, INT]): success
test TestVarArgs.testVarArgs(34, "f0_V_S_IDF", VOID, [STRUCT], [INT, DOUBLE, FLOAT]): success
test TestVarArgs.testVarArgs(51, "f0_V_S_FDD", VOID, [STRUCT], [FLOAT, DOUBLE, DOUBLE]): success
test TestVarArgs.testVarArgs(68, "f0_V_S_DDP", VOID, [STRUCT], [DOUBLE, DOUBLE, POINTER]): success
test TestVarArgs.testVarArgs(85, "f0_V_S_PPI", VOID, [STRUCT], [POINTER, POINTER, INT]): success
test TestVarArgs.testVarArgs(102, "f0_V_IS_FF", VOID, [INT, STRUCT], [FLOAT, FLOAT]): failure
java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
        at java.base/jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64CallArranger$StorageCalculator.regAlloc(WindowsAArch64CallArranger.java:230)
        at java.base/jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64CallArranger$UnboxBindingCalculator.getBindings(WindowsAArch64CallArranger.java:369)
        at java.base/jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64CallArranger.getBindings(WindowsAArch64CallArranger.java:150)
        at java.base/jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64CallArranger.arrangeDowncall(WindowsAArch64CallArranger.java:157)
        at java.base/jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64Linker.arrangeDowncall(WindowsAArch64Linker.java:85)
        at java.base/jdk.internal.foreign.abi.AbstractLinker.lambda$downcallHandle$0(AbstractLinker.java:53)
        at java.base/jdk.internal.foreign.abi.SoftReferenceCache$Node.get(SoftReferenceCache.java:52)
        at java.base/jdk.internal.foreign.abi.SoftReferenceCache.get(SoftReferenceCache.java:38)
        at java.base/jdk.internal.foreign.abi.AbstractLinker.downcallHandle(AbstractLinker.java:51)
        at java.base/java.lang.foreign.Linker.downcallHandle(Linker.java:221)
        at TestVarArgs.testVarArgs(TestVarArgs.java:97)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at ...
        at java.base/java.lang.Thread.run(Thread.java:1589)

test TestVarArgs.testVarArgs(119, "f0_V_IS_IFD", VOID, [INT, STRUCT], [INT, FLOAT, DOUBLE]): success
test TestVarArgs.testVarArgs(136, "f0_V_IS_FFP", VOID, [INT, STRUCT], [FLOAT, FLOAT, POINTER]): success
test TestVarArgs.testVarArgs(153, "f0_V_IS_DDI", VOID, [INT, STRUCT], [DOUBLE, DOUBLE, INT]): success
test TestVarArgs.testVarArgs(170, "f0_V_IS_PDF", VOID, [INT, STRUCT], [POINTER, DOUBLE, FLOAT]): success
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=\code/vmreg.hpp:147
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (c:\dev\repos\java\forks\jdk\src\hotspot\share\code/vmreg.hpp:147), pid=10580, tid=10896
#  assert(is_stack()) failed: Not a stack-based register
#
# JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.sawesong.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.sawesong.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, windows-aarch64)
# Core dump will be written. Default location: C:\dev\repos\java\forks\jdk\JTwork\scratch\0\hs_err_pid10580.mdmp
#
# An error report file with more information is saved as:
# C:\dev\repos\java\forks\jdk\JTwork\scratch\0\hs_err_pid10580.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

The problem turns out to be that I had removed the vector registers from the list of input registers but the HFA code expects these to exist. The Windows AArch64 ABI also expects these vector registers to be used in this scenario. Restoring them addresses this bug, getting us back to the original failure (before I made any changes):

--------------------------------------------------
TEST: java/foreign/TestVarArgs.java
TEST JDK: C:\dev\java\abi\devbranch6\jdk

ACTION: build -- Passed. All files up to date
REASON: Named class compiled on demand
TIME:   0.015 seconds
messages:
command: build TestVarArgs
reason: Named class compiled on demand
elapsed time (seconds): 0.015

ACTION: testng -- Failed. Unexpected exit from test [exit code: 1]
REASON: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED -Dgenerator.sample.factor=17 TestVarArgs
TIME:   18.911 seconds
messages:
command: testng --enable-native-access=ALL-UNNAMED -Dgenerator.sample.factor=17 TestVarArgs
reason: User specified action: run testng/othervm --enable-native-access=ALL-UNNAMED -Dgenerator.sample.factor=17 TestVarArgs
Mode: othervm [/othervm specified]
elapsed time (seconds): 18.911
configuration:
STDOUT:
test TestVarArgs.testVarArgs(0, "f0_V__", VOID, [], []): success
test TestVarArgs.testVarArgs(17, "f0_V_S_DI", VOID, [STRUCT], [DOUBLE, INT]): success
test TestVarArgs.testVarArgs(34, "f0_V_S_IDF", VOID, [STRUCT], [INT, DOUBLE, FLOAT]): success
test TestVarArgs.testVarArgs(51, "f0_V_S_FDD", VOID, [STRUCT], [FLOAT, DOUBLE, DOUBLE]): success
test TestVarArgs.testVarArgs(68, "f0_V_S_DDP", VOID, [STRUCT], [DOUBLE, DOUBLE, POINTER]): success
test TestVarArgs.testVarArgs(85, "f0_V_S_PPI", VOID, [STRUCT], [POINTER, POINTER, INT]): success
STDERR:
java.lang.RuntimeException: java.lang.IllegalStateException: java.lang.AssertionError: expected [12.0] but found [2.8E-45]
        at TestVarArgs.check(TestVarArgs.java:134)
        at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
        at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:758)
        at TestVarArgs.testVarArgs(TestVarArgs.java:104)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
        at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
        at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
        at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
        at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
        at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
        at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
        at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at org.testng.TestRunner.privateRun(TestRunner.java:764)
        at org.testng.TestRunner.run(TestRunner.java:585)
        at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
        at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
        at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
        at org.testng.SuiteRunner.run(SuiteRunner.java:286)
        at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
        at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
        at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
        at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
        at org.testng.TestNG.runSuites(TestNG.java:1069)
        at org.testng.TestNG.run(TestNG.java:1037)
        at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:93)
        at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:53)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
        at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.lang.IllegalStateException: java.lang.AssertionError: expected [12.0] but found [2.8E-45]
        at CallGeneratorHelper.lambda$initStruct$10(CallGeneratorHelper.java:443)
        at TestVarArgs.lambda$check$4(TestVarArgs.java:132)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at TestVarArgs.check(TestVarArgs.java:132)
        ... 32 more
Caused by: java.lang.AssertionError: expected [12.0] but found [2.8E-45]
        at org.testng.Assert.fail(Assert.java:99)
        at org.testng.Assert.failNotEquals(Assert.java:1037)
        at org.testng.Assert.assertEqualsImpl(Assert.java:140)
        at org.testng.Assert.assertEquals(Assert.java:122)
        at org.testng.Assert.assertEquals(Assert.java:617)
        at CallGeneratorHelper.lambda$makeArg$8(CallGeneratorHelper.java:413)
        at CallGeneratorHelper.lambda$initStruct$10(CallGeneratorHelper.java:441)
        ... 35 more
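
One way to read the "found" value in this failure: 2.8E-45 is a subnormal float, and its raw bits are simply the integer 2. A small check (class and method names are mine, not from the test):

```java
final class FloatBits {
    // Raw IEEE 754 bit pattern of a 32-bit float.
    static int rawBits(float value) {
        return Float.floatToRawIntBits(value);
    }

    public static void main(String[] args) {
        // The value the test found, and the value it expected.
        System.out.println(Integer.toHexString(rawBits(2.8E-45f)));
        System.out.println(Integer.toHexString(rawBits(12.0f)));
    }
}
```

The found value has bit pattern 0x2 while the expected 12.0 is 0x41400000. That suggests (but does not prove) that integer data, or data from the wrong register or slot, was read from the place where the float was supposed to be.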

Examining the test source shows that upcalls can also be traced using -XX:+TraceOptimizedUpcallStubs. I wonder how many other tests are failing though since I didn’t expect this failure. Rerunning them all results in these failures:

  1. TestIntrinsics.java
  2. TestUpcallHighArity.java
  3. TestVarArgs.java

TestIntrinsics.java appears to be an easier test to minimize. Perhaps sorting out the failure there will mean less work in the more complex test.

  1. Show assertion failure here
  2. Discuss how to get to the assertion failure using WinDbg
  3. Show command line with Xlog to get the downcall log
  4. Show how the other data types are shuffled in the downcall log
  5. Show how to step into float_move
  6. first() -> Token-pasting operator (##) | Microsoft Docs
  7. Viewing and Editing Memory in WinDbg – Windows drivers | Microsoft Docs
  8. d, da, db, dc, dd, dD, df, dp, dq, du, dw (Display Memory) – Windows drivers | Microsoft Docs

The bug is that reg2offset_out is called on a single physical register on line 5894! This happens because the src.is_single_phys_reg returns false. I break out the local variables to get an explicit breakdown in the debugger:

// A float arg may have to do float reg int reg conversion
void MacroAssembler::float_move(VMRegPair src, VMRegPair dst, Register tmp) {
  VMReg src_first = src.first();
  VMReg dst_first = dst.first();
  if (src_first->is_stack()) {
    if (dst_first->is_stack()) {
      ldrw(tmp, Address(rfp, reg2offset_in(src.first())));
      strw(tmp, Address(sp, reg2offset_out(dst_first)));
    } else {
      ldrs(dst.first()->as_FloatRegister(), Address(rfp, reg2offset_in(src_first)));
    }
  } else if (src_first != dst_first) {
    bool src_is_single_phys_reg = src.is_single_phys_reg();
    bool dst_is_single_phys_reg = dst.is_single_phys_reg();

    bool src_is_float_reg = src_first->is_FloatRegister();
    bool src_is_reg = src_first->is_Register();

    bool dst_is_float_reg = dst_first->is_FloatRegister();
    bool dst_is_reg = dst_first->is_Register();

    if (src_is_single_phys_reg && dst_is_single_phys_reg)
      fmovs(dst_first->as_FloatRegister(), src_first->as_FloatRegister());
    else
      strs(src_first->as_FloatRegister(), Address(sp, reg2offset_out(dst_first)));
  }
}

Interestingly, the src register is a floating point register but the name is c_arg0. It is confusing to me that the regName field in both the source’s _first and _second fields point to the same location as the destination’s _first and _second VMRegImpl::regName pointers. Looking at the source, this makes sense because the regName pointer is a static field (missed this in WinDbg) and is set by the static set_regName method.

Notice that ArgumentShuffle::ArgumentShuffle calls NativeCallingConvention::calling_convention, which in turn calls out_regs[i].set1(reg). The set1 method explicitly sets _second to BAD (which is first() – 1). set2() on the other hand sets _second to first() + 1. The solution is then to simply check whether the dst is a register since it will not be a single physical register in this scenario. This fix addresses the assertion failure. We should now be able to get downcall logging.
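
The shape of that fix can be sketched as a decision table over where the float currently lives and where it has to go. This is my own model in Java rather than the actual HotSpot patch: FloatMoveSketch, Loc, and the returned instruction strings are hypothetical, and the stack-to-general-purpose-register case is left out because the original code does not handle it either.

```java
// Hypothetical decision table for moving a 32-bit float argument.
// It mirrors the idea of checking the destination's register kind instead
// of relying on is_single_phys_reg(); it is not the real MacroAssembler.
final class FloatMoveSketch {
    enum Loc { STACK, FLOAT_REG, GP_REG }

    static String select(Loc src, Loc dst) {
        if (src == Loc.STACK) {
            if (dst == Loc.STACK) {
                return "ldrw tmp, [src]; strw tmp, [dst]";
            }
            if (dst == Loc.FLOAT_REG) {
                return "ldrs dst, [src]";
            }
            throw new IllegalArgumentException("stack to GP register is not modelled");
        }
        if (src != Loc.FLOAT_REG) {
            throw new IllegalArgumentException("GP register source is not modelled");
        }
        switch (dst) {
            case FLOAT_REG: return "fmov s, s";   // float register to float register
            case GP_REG:    return "fmov w, s";   // float register to general purpose register
            default:        return "str s, [dst]"; // float register to outgoing stack slot
        }
    }

    public static void main(String[] args) {
        System.out.println(select(Loc.FLOAT_REG, Loc.GP_REG));
        System.out.println(select(Loc.FLOAT_REG, Loc.STACK));
    }
}
```

The point of the table is that a float with a BAD second slot can still target a register, so only a true stack destination should reach the reg2offset_out path.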

java --enable-preview -Xlog:foreign+downcall=trace:file=downcall12.txt::filecount=0 MinimizedTestIntrinsics

MinimizedTestIntrinsics.java still fails with these errors:

java.lang.Exception: Expected 2 but found 4621819117588971520
java.lang.Exception: Expected 0 but found 2
java.lang.Exception: Expected 13 but found 0
java.lang.Exception: Expected a but found

4621819117588971520 is 0x4024000000000000, which does not look revealing at first glance. The native functions that were invoked must be invoke_high_arity2, invoke_high_arity4, invoke_high_arity5, and invoke_high_arity6 since they are the only ones that match those expected return values. I remove the loop to run invoke_high_arity2 only. Here’s a snippet of the downcall log:

Argument shuffle {
Move a int from (c_rarg2,BAD!) to (c_rarg0,BAD!)
Move a long from (c_rarg3,c_rarg3) to (c_rarg2,c_rarg2)
Move a float from (v1,BAD!) to (c_rarg3,BAD!)
Move a long from (c_rarg1,c_rarg1) to (rscratch2,rscratch2)
Move a double from (v0,v0) to (c_rarg1,c_rarg1)
Stack argument slots: 0
}
[CodeBlob (0x00000259e688df90)]
Framesize: 4
Runtime Stub (0x00000259e688df90): nep_invoker_blob
--------------------------------------------------------------------------------
Decoding CodeBlob, name: nep_invoker_blob, at  [0x00000259e688e040, 0x00000259e688e118]  216 bytes
  0x00000259e688e040:   	stp	x29, x30, [sp, #-0x10]!
  0x00000259e688e044:   	mov	x29, sp
  0x00000259e688e048:   	sub	sp, x29, #0x10
  0x00000259e688e04c:   	adr	x9, #0x0
  0x00000259e688e050:   	str	x9, [x28, #0x318]
  0x00000259e688e054:   	mov	x9, sp
  0x00000259e688e058:   	str	x9, [x28, #0x310]
  0x00000259e688e05c:   	str	x29, [x28, #0x320]
 ;; 0x4
  0x00000259e688e060:   	orr	x9, xzr, #0x4
  0x00000259e688e064:   	add	x10, x28, #0x3c4
  0x00000259e688e068:   	stlr	w9, [x10]
 ;; { argument shuffle
 ;; bt=int
  0x00000259e688e06c:   	sxtw	x0, w2
 ;; bt=long
  0x00000259e688e070:   	mov	x2, x3
 ;; bt=float
  0x00000259e688e074:   	fmov	w3, s1
 ;; bt=long
  0x00000259e688e078:   	mov	x9, x1
 ;; bt=double
  0x00000259e688e07c:   	fmov	x1, d0
 ;; } argument shuffle
  0x00000259e688e080:   	blr	x9

Notice that the instructions correctly load the registers x0-x3. The question now is where the return value is used after this function. Here are the rest of the instructions:

 ;; 0x5
  0x00000259e688e084:   	mov	x9, #0x5
  0x00000259e688e088:   	str	w9, [x28, #0x3c4]
  0x00000259e688e08c:   	dmb	ish
  0x00000259e688e090:   	add	x9, x28, #0x3c8
  0x00000259e688e094:   	ldar	x9, [x9]
  0x00000259e688e098:   	cmp	x29, x9
  0x00000259e688e09c:   	b.hi	#0x3c
  0x00000259e688e0a0:   	ldr	w9, [x28, #0x3c0]
  0x00000259e688e0a4:   	cbnz	w9, #0x34
 ;; 0x8
  0x00000259e688e0a8:   	orr	x9, xzr, #0x8
  0x00000259e688e0ac:   	add	x10, x28, #0x3c4
  0x00000259e688e0b0:   	stlr	w9, [x10]
 ;; reguard stack check
  0x00000259e688e0b4:   	ldrb	w9, [x28, #0x450]
  0x00000259e688e0b8:   	cmp	w9, #0x2
  0x00000259e688e0bc:   	b.eq	#0x3c
  0x00000259e688e0c0:   	str	xzr, [x28, #0x310]
  0x00000259e688e0c4:   	str	xzr, [x28, #0x320]
  0x00000259e688e0c8:   	str	xzr, [x28, #0x318]
  0x00000259e688e0cc:   	mov	sp, x29
  0x00000259e688e0d0:   	ldp	x29, x30, [sp], #0x10
  0x00000259e688e0d4:   	ret
 ;; { L_safepoint_poll_slow_path
  0x00000259e688e0d8:   	str	x0, [sp]
  0x00000259e688e0dc:   	mov	x0, x28
 ;; 0x7FFF1FB2A870
  0x00000259e688e0e0:   	mov	x9, #0xa870
  0x00000259e688e0e4:   	movk	x9, #0x1fb2, lsl #16
  0x00000259e688e0e8:   	movk	x9, #0x7fff, lsl #32
  0x00000259e688e0ec:   	blr	x9
  0x00000259e688e0f0:   	ldr	x0, [sp]
  0x00000259e688e0f4:   	b	#-0x4c
 ;; } L_safepoint_poll_slow_path
 ;; { L_reguard
  0x00000259e688e0f8:   	str	x0, [sp]
 ;; 0x7FFF1FFFAAD0
  0x00000259e688e0fc:   	mov	x9, #0xaad0
  0x00000259e688e100:   	movk	x9, #0x1fff, lsl #16
  0x00000259e688e104:   	movk	x9, #0x7fff, lsl #32
  0x00000259e688e108:   	blr	x9
  0x00000259e688e10c:   	ldr	x0, [sp]
  0x00000259e688e110:   	b	#-0x50
 ;; } L_reguard
  0x00000259e688e114:   	udf	#0x0

I needed to search for B.cond in the ARM Architecture Reference Manual for A-profile architecture PDF. The HI mnemonic in b.hi means unsigned higher and is equivalent to the condition flags C == 1 && Z == 0. This branch is to the safepoint poll slow path, which is the label immediately following the L_safepoint_poll_slow_path comment. I found it strange that 0x00000259e688e0a0 + #0x3c = 0x259E688E0DC, which is the 2nd instruction after the L_safepoint_poll_slow_path label. However, the B.cond documentation states that the program label to be conditionally branched to is given by an offset from the address of the branch instruction. The b.hi itself sits at 0x00000259e688e09c, so the target is 0x00000259e688e0d8, the first instruction after the label.
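
The offset rule is easy to check against the listing above. The sketch below (names are mine) applies each printed offset to the address of the branch instruction itself, using addresses and offsets copied from the disassembly:

```java
final class BranchTarget {
    // Target of a PC-relative branch: the offset is applied to the address
    // of the branch instruction itself, not to the following instruction.
    static long target(long branchAddress, long offset) {
        return branchAddress + offset;
    }

    public static void main(String[] args) {
        // b.hi #0x3c at ...e09c
        System.out.printf("b.hi -> 0x%x%n", target(0x259e688e09cL, 0x3c));
        // cbnz w9, #0x34 at ...e0a4
        System.out.printf("cbnz -> 0x%x%n", target(0x259e688e0a4L, 0x34));
        // b.eq #0x3c at ...e0bc
        System.out.printf("b.eq -> 0x%x%n", target(0x259e688e0bcL, 0x3c));
        // b #-0x4c at ...e0f4
        System.out.printf("b    -> 0x%x%n", target(0x259e688e0f4L, -0x4c));
    }
}
```

With this rule, b.hi and cbnz both land on the first instruction of the safepoint poll slow path, b.eq lands on the first instruction of L_reguard, and the backward branch returns to the instruction under the 0x8 comment.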

Looks like most of the above code is not relevant because it doesn’t touch x0. At this point, it seems like the problem could be in the native code we’re branching into. I set a breakpoint in invoke but the code doesn’t seem to make much sense:

bp intrinsics!invoke_high_arity2

Let us disassemble support/test/jdk/jtreg/native/lib/Intrinsics.dll and see what the compiler generated.

cd build\windows-aarch64-server-slowdebug\support\test\jdk\jtreg\native\support\libIntrinsics\
dumpbin /disasm /out:Intrinsics.asm libIntrinsics.obj
dumpbin /all /out:Intrinsics.txt libIntrinsics.obj

Here is the relevant code, which makes it apparent that libIntrinsics is not expecting floating point parameters in general purpose registers!


Dump of file libIntrinsics.obj

File Type: COFF OBJECT

empty:
  0000000000000000: D65F03C0  ret
  0000000000000004: 00000000
identity_bool:
  0000000000000008: D10043FF  sub         sp,sp,#0x10
  000000000000000C: 53001C08  uxtb        w8,w0
  0000000000000010: 390003E8  strb        w8,[sp]
  0000000000000014: 394003E0  ldrb        w0,[sp]
  0000000000000018: 910043FF  add         sp,sp,#0x10
  000000000000001C: D65F03C0  ret
identity_char:
  0000000000000020: D10043FF  sub         sp,sp,#0x10
  0000000000000024: 13001C08  sxtb        w8,w0
  0000000000000028: 390003E8  strb        w8,[sp]
  000000000000002C: 39C003E0  ldrsb       w0,[sp]
  0000000000000030: 910043FF  add         sp,sp,#0x10
  0000000000000034: D65F03C0  ret
...
identity_long:
  0000000000000068: D10043FF  sub         sp,sp,#0x10
  000000000000006C: F90003E0  str         x0,[sp]
  0000000000000070: F94003E0  ldr         x0,[sp]
  0000000000000074: 910043FF  add         sp,sp,#0x10
  0000000000000078: D65F03C0  ret
  000000000000007C: 00000000
identity_float:
  0000000000000080: D10043FF  sub         sp,sp,#0x10
  0000000000000084: BD0003E0  str         s0,[sp]
  0000000000000088: BD4003E0  ldr         s0,[sp]
  000000000000008C: 910043FF  add         sp,sp,#0x10
  0000000000000090: D65F03C0  ret
  0000000000000094: 00000000
identity_double:
  0000000000000098: D10043FF  sub         sp,sp,#0x10
  000000000000009C: FD0003E0  str         d0,[sp]
  00000000000000A0: FD4003E0  ldr         d0,[sp]
  00000000000000A4: 910043FF  add         sp,sp,#0x10
  00000000000000A8: D65F03C0  ret
  00000000000000AC: 00000000
...
invoke_high_arity2:
  0000000000000138: D10083FF  sub         sp,sp,#0x20
  000000000000013C: B9000BE0  str         w0,[sp,#8]
  0000000000000140: FD000FE0  str         d0,[sp,#0x18]
  0000000000000144: F9000BE1  str         x1,[sp,#0x10]
  0000000000000148: BD000FE1  str         s1,[sp,#0xC]
  000000000000014C: 13001C48  sxtb        w8,w2
  0000000000000150: 390003E8  strb        w8,[sp]
  0000000000000154: 13003C68  sxth        w8,w3
  0000000000000158: 790007E8  strh        w8,[sp,#2]
  000000000000015C: 13003C88  sxth        w8,w4
  0000000000000160: 79000BE8  strh        w8,[sp,#4]
  0000000000000164: F9400BE0  ldr         x0,[sp,#0x10]
  0000000000000168: 910083FF  add         sp,sp,#0x20
  000000000000016C: D65F03C0  ret
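
With this disassembly in hand (invoke_high_arity2 stores x1 and later returns it in x0), the earlier "found" value 4621819117588971520 is worth a second look. Read as raw IEEE 754 bits it is a perfectly ordinary double, which would be consistent with a double having been delivered in x1 by the argument shuffle. Which double the test actually passes is an assumption on my part; only the bit pattern is certain. The class and method names below are mine.

```java
final class DoubleBits {
    // Interpret a 64-bit integer as the raw bits of an IEEE 754 double.
    static double decode(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        long found = 4621819117588971520L;
        System.out.println(Long.toHexString(found));
        System.out.println(decode(found));
    }
}
```

The hex form is 4024000000000000 and the decoded value is 10.0, so the "wrong" long is most plausibly a double argument that ended up in the register the callee reads its long from.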

I update the WindowsAArch64CallArranger to specifically use general purpose registers for floating point data only for variadic FunctionDescriptors. This fixes both TestIntrinsics and TestUpcallHighArity but not TestVarArgs so I create a self-contained test for it: MinimizedTestVarArgs.
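
The difference between the two conventions can be sketched with a toy register allocator. This is not the WindowsAArch64CallArranger code: ArgRegisterSketch, Kind, and assign are hypothetical names, structs and HFAs are ignored, and anything beyond eight registers is simply reported as "stack". The argument list in main is my reading of the invoke_high_arity2 disassembly above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model: scalar arguments only, eight GP and eight FP argument registers.
final class ArgRegisterSketch {
    enum Kind { INT32, INT64, FLOAT, DOUBLE }

    static List<String> assign(List<Kind> args, boolean variadic) {
        List<String> out = new ArrayList<>();
        int nextGp = 0;   // x0..x7 / w0..w7
        int nextFp = 0;   // d0..d7 / s0..s7
        for (Kind k : args) {
            boolean isFp = (k == Kind.FLOAT || k == Kind.DOUBLE);
            boolean is64 = (k == Kind.INT64 || k == Kind.DOUBLE);
            if (isFp && !variadic) {
                // Non-variadic call: floating point goes to FP registers.
                out.add(nextFp < 8 ? (is64 ? "d" : "s") + nextFp++ : "stack");
            } else {
                // Integers always, and floating point in the variadic case,
                // go to general purpose registers.
                out.add(nextGp < 8 ? (is64 ? "x" : "w") + nextGp++ : "stack");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Kind> sig = Arrays.asList(Kind.INT32, Kind.DOUBLE, Kind.INT64,
                Kind.FLOAT, Kind.INT32, Kind.INT32, Kind.INT32);
        System.out.println("non-variadic: " + assign(sig, false));
        System.out.println("variadic:     " + assign(sig, true));
    }
}
```

In the non-variadic case the model reproduces the registers the compiled invoke_high_arity2 reads (w0, d0, x1, s1, w2, w3, w4), while the variadic case reproduces the x0 to x3 destinations seen in the failing downcall log. That is why restricting the general purpose register rule to variadic descriptors matters.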

TestVarArgs

This test depends on the native varargs.dll (built from libVarArgs.c). This DLL can be found in the build/windows-x86_64-server-slowdebug/support/test/jdk/jtreg/native/lib/ directory.

  1. How does the test work?
  2. It uses upcalls, how do they work?

Here’s how the native upcall linker is invoked to create an upcall stub:

  1. Test calls Linker.upcallStub
  2. AbstractLinker.upcallStub calls WindowsAArch64Linker.arrangeUpcall
  3. CallArranger.arrangeUpcall calls UpcallLinker.make
  4. UpcallLinker.make calls the native makeUpcallStub

bp varargs!varargs
bp UpcallLinker::make_upcall_stub

Finding Upcall Logs

I’m trying to see the logs for upcalls but realize that I only have the downcall logs! Here’s the updated command line:

java --enable-preview -Xlog:foreign+upcall=trace,foreign+downcall=trace:file=up-and-downcalls.txt::filecount=0 MinimizedTestIntrinsics

These logging options generate argument shuffling output only. I expected to see comments like on_entry.

[8.157s][trace][foreign,upcall] Argument shuffle {
[8.157s][trace][foreign,upcall] Move a long from (c_rarg1,c_rarg1) to (c_rarg3,c_rarg3)
[8.157s][trace][foreign,upcall] Move a int from (c_rarg0,BAD!) to (c_rarg2,BAD!)
[8.157s][trace][foreign,upcall] Stack argument slots: 0
[8.158s][trace][foreign,upcall] }
[8.860s][trace][foreign,downcall] Argument shuffle {
[8.860s][trace][foreign,downcall] Move a long from (c_rarg1,c_rarg1) to (rscratch2,rscratch2)
[8.860s][trace][foreign,downcall] Move a int from (c_rarg3,BAD!) to (c_rarg1,BAD!)
[8.860s][trace][foreign,downcall] Move a long from (c_rarg2,c_rarg2) to (c_rarg0,c_rarg0)
[8.862s][trace][foreign,downcall] Stack argument slots: 0
[8.862s][trace][foreign,downcall] }
[8.862s][trace][foreign,downcall] [CodeBlob (0x0000027b876f0810)]
[8.862s][trace][foreign,downcall] Framesize: 2
[8.862s][trace][foreign,downcall] Runtime Stub (0x0000027b876f0810): nep_invoker_blob
[8.862s][trace][foreign,downcall] --------------------------------------------------------------------------------
[8.862s][trace][foreign,downcall] Decoding CodeBlob, name: nep_invoker_blob, at  [0x0000027b876f08c0, 0x0000027b876f0980]  192 bytes
[8.879s][trace][foreign,downcall]   0x0000027b876f08c0:   	stp	x29, x30, [sp, #-0x10]!
[8.879s][trace][foreign,downcall]   0x0000027b876f08c4:   	mov	x29, sp
...

Turns out the UpcallLinker requires the TraceOptimizedUpcallStubs flag to log this information. TODO: improve the consistency of this logging. The Xlog option I’m using is not available in the non-debug product though!

java --enable-preview -XX:+TraceOptimizedUpcallStubs -Xlog:foreign+upcall=trace,foreign+downcall=trace:file=up-and-downcalls.txt::filecount=0 MinimizedTestIntrinsics

That is not sufficient though. It simply outputs this to the command prompt:

[CodeBlob (0x0000025291ffe090)]
Framesize: 0
UpcallStub (0x0000025291ffe090) used for upcall_stub_(Ljava/lang/Object;IJ)V
[CodeBlob (0x0000025291ffe090)]
Framesize: 0
UpcallStub (0x0000025291ffe090) used for upcall_stub_(Ljava/lang/Object;IJ)V
...

The UpcallStub constructor turns out to have the UpcallStub tracing code (notice the stub name “UpcallStub”). It expects the PrintStubCode flag. This outputs the disassembly as I expected but does so for just about everything – 10MB of text. The stub name can be used to narrow down the calls we’re interested in.

java --enable-preview -XX:+PrintStubCode -Xlog:foreign+upcall=trace,foreign+downcall=trace:file=up-and-downcalls.txt::filecount=0 MinimizedTestIntrinsics > upcallStub.asm

To see the corresponding native code, run dumpbin to generate libVarArgs.asm and libVarArgs.txt:

cd build\windows-aarch64-server-slowdebug\support\test\jdk\jtreg\native\support\libVarArgs\
dumpbin /disasm /out:libVarArgs.asm libVarArgs.obj
dumpbin /all /out:libVarArgs.txt libVarArgs.obj

Setting aside all this learning and simply reviewing the Overview of ARM64 ABI conventions, the statement that floating-point values are returned in s0, d0, or v0, as appropriate should be enough to track down the bug. The change I made to the CallArranger switched the floating point storage to a general purpose register whenever floating point storage was requested for a variadic function. However, this doesn’t fix the test, thereby showing the value of understanding exactly how things are flowing through registers!

Understanding libVarArgs

The varargs function does not return a value. Here is an interpretation of the disassembly:

;$LN2:
;;
;; i++
;;
  0000000000000044: B9400BE8  ldr         w8,[sp,#8]
  0000000000000048: 11000508  add         w8,w8,#1
  000000000000004C: B9000BE8  str         w8,[sp,#8]
$LN4:
;;
;; i < num
;;
  0000000000000050: B9401FE9  ldr         w9,[sp,#0x1C]
  0000000000000054: B9400BE8  ldr         w8,[sp,#8]
  0000000000000058: 6B09011F  cmp         w8,w9
  000000000000005C: 5400F66A  bge         $LN3
;;
;; x8 = info
;;
  0000000000000060: F9401FE8  ldr         x8,[sp,#0x38]
;;
;; x10 = &info->argids
;;
  0000000000000064: 9100210A  add         x10,x8,#8
;;
;; x9 = i * 4
;;
  0000000000000068: B9400BE8  ldr         w8,[sp,#8]
  000000000000006C: 93407D09  sxtw        x9,w8
  0000000000000070: D2800088  mov         x8,#4
  0000000000000074: 9B087D29  mul         x9,x9,x8
;;
;; Get the pointer from the call_info
;;
  0000000000000078: F9400148  ldr         x8,[x10]
;;
;; compute the offset of element [i]
;;
  000000000000007C: 8B090108  add         x8,x8,x9
;;
;; w8 = info->argids[i];
;;
  0000000000000080: B9400108  ldr         w8,[x8]
  0000000000000084: B90023E8  str         w8,[sp,#0x20]
  0000000000000088: B94023E8  ldr         w8,[sp,#0x20]
  000000000000008C: B9001BE8  str         w8,[sp,#0x18]
  0000000000000090: B9401BE8  ldr         w8,[sp,#0x18]
;;
;; There are 88 (0x58) enums.
;;
  0000000000000094: 71015D1F  cmp         w8,#0x57
;;
;; Go to default case if not one of the defined enums
;;
  0000000000000098: 5400F3E8  bhi         $LN95
;;
;; w10 = info->argids[i];
;;
  000000000000009C: B9401BEA  ldr         w10,[sp,#0x18]
;;
;; x9 = PC-relative address of $LN100
;;
  00000000000000A0: 1000F509  adr         x9,$LN100
;;
;; uxtw: unsigned word extend
;; load a signed offset from the table at $LN100
;; x8 = sign-extend([x9 + w10 * 4])
;;
  00000000000000A4: B8AA5928  ldrsw       x8,[x9,w10 uxtw #2]
;;
;; x9 = PC-relative address of $LN51 (half-way point in the switch/45th label from here)
;;
  00000000000000A8: 10007969  adr         x9,$LN51
;;
;; x8 = address of the case statement to jump to
;; why the left shift though?
;;
  00000000000000AC: 8B080928  add         x8,x9,x8,lsl #2
  00000000000000B0: D61F0100  br          x8

...
$LN95:
  0000000000001F14: 12800000  mov         w0,#-1
  0000000000001F18: 90000008  adrp        x8,__imp_exit
  0000000000001F1C: F9400108  ldr         x8,[x8,__imp_exit]
  0000000000001F20: D63F0100  blr         x8
$LN188:
  0000000000001F24: 17FFF848  b           $LN2

;; va_end(a_list);
;; This expands to ((void)(a_list = (va_list)0))
;;
$LN3:
  0000000000001F28: D2800008  mov         x8,#0
  0000000000001F2C: F90003E8  str         x8,[sp]

;;
;; cleanup before returning
;;
  0000000000001F30: 9132C3FF  add         sp,sp,#0xCB0
  0000000000001F34: 94000000  bl          __security_pop_cookie
  0000000000001F38: A8C47BFD  ldp         fp,lr,[sp],#0x40
  0000000000001F3C: D65F03C0  ret
$LN100:
  0000000000001F40: FFFFFC38
$LN101:
  0000000000001F44: FFFFFC49
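
A note on the left-shift question in the listing: one reading that fits the bytes shown is that each table entry is a signed offset counted in 4-byte instructions relative to $LN51, so lsl #2 turns it into a byte offset. The Java sketch below checks that reading. The class and method names are made up for illustration. The address of $LN51 is not printed in the listing; it is derived here by decoding the adr instruction word at 0xA8, with the adr at 0xA0 (whose target $LN100 is visible at 0x1F40) used as a sanity check.

```java
final class JumpTableDecode {
    /** Decodes the signed byte offset carried by an AArch64 ADR instruction word. */
    static long adrOffset(int insn) {
        long immlo = (insn >>> 29) & 0x3;      // bits 30:29
        long immhi = (insn >>> 5) & 0x7FFFF;   // bits 23:5
        long imm = (immhi << 2) | immlo;       // 21-bit immediate
        return (imm << 43) >> 43;              // sign-extend from 21 bits
    }

    /** Assumed rule: target = base + (entry << 2), i.e. entries count instructions. */
    static long caseTarget(long base, int tableEntry) {
        return base + ((long) tableEntry << 2);
    }

    public static void main(String[] args) {
        // Sanity check: the adr at 0xA0 should land on $LN100 (0x1F40 in the listing).
        long ln100 = 0xA0 + adrOffset(0x1000F509);
        // Derived, not printed in the listing: the address of $LN51.
        long ln51 = 0xA8 + adrOffset(0x10007969);
        System.out.println(String.format("$LN100 = 0x%X", ln100));
        System.out.println(String.format("$LN51  = 0x%X", ln51));
        // The two table entries visible at $LN100 and $LN101.
        System.out.println(String.format("entry 0 -> 0x%X", caseTarget(ln51, 0xFFFFFC38)));
        System.out.println(String.format("entry 1 -> 0x%X", caseTarget(ln51, 0xFFFFFC49)));
    }
}
```

Under this reading the first table entry resolves to 0xB4, which is where the $LN7 case shown later in this post begins, and the second resolves to 0xF8, the instruction right after that case ends.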

The unconditional branch to the address in x8 is to the upcall stub. Notice from the setup for the branch that the target is invoked by the blr.

Stepping through the code, I decide to look up the void* parameter that was passed into the upcall stub (just before the last instruction of preserve_callee_saved_regs, str d24, [sp, #0xd0]). Perhaps a more reasonable point would be at the end of the argument shuffle, but the values will be the same as the ones below:

0:004> k
 # Child-SP          RetAddr               Call Site
00 0000009b`fe3fe1a0 00007fff`97a31234     0x000001d7`4c436790
01 0000009b`fe3fe1a0 00007fff`97a31234     VarArgs!varargs+0x17c
02 0000009b`fe3fe2f0 000001d7`4c430680     VarArgs!varargs+0x17c
03 0000009b`fe3fefc0 00000000`00000000     0x000001d7`4c430680
0:004> r
 x0=0000000000000000   x1=0000009bfe3fe390   x2=4038000000000000   x3=0000000000000001
 x4=0000000712645c90   x5=00000000fffffffe   x6=000001d7432ecb10   x7=000000000000000e
 x8=000001d74c436740   x9=0000000000000008  x10=0000000000000002  x11=000001d75e0add98
x12=000001d75ede1550  x13=0000000000000000  x14=a2e64eada2e64ead  x15=000001d763b8a2b8
x16=0000679de851517d  x17=ffff04a2bd67b2d3  x18=0000000000000000  x19=0000009bfe3fefd0
x20=0000009bfe3feff0  x21=00007fff16390f90  x22=000001d75ed662b6  x23=000001d751f7d000
x24=0000009bfe3ff0d8  x25=000001d751f7d000  x26=000001d75ed66410  x27=0000000000000000
x28=000001d7432ecb10   fp=0000009bfe3fe2b0   lr=00007fff97a31234   sp=0000009bfe3fe1a0
 pc=000001d74c436790  psr=80000040 N--- EL0
000001d7`4c436790 fd006bf8 str         d24,[sp,#0xD0]
0:004> dq 9bfe3fe390
0000009b`fe3fe390  40380000`00000000 000001d7`64267890
0000009b`fe3fe3a0  0000009b`fe3fe3c0 00007fff`149502a8
0000009b`fe3fe3b0  000001d7`64267890 00007fff`1494b754

The 64-bit value is 0x4038000000000000. The program below confirms this value to be 24.0. Therefore, everything has been correctly set up for the upcall.

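#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Bit pattern seen in x2 and at the address held in x1. */
    unsigned long long bits = 0x4038000000000000ULL;
    double d;

    /* Copy the bytes rather than casting the pointer, which avoids aliasing problems. */
    memcpy(&d, &bits, sizeof d);
    printf("%f\n", d);
    return 0;
}
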
  1. Review earlier 0x4024 value.
  2. Review set of volatile registers defined by the ABI since that’s what ends up in the upcall stub.

Let us take another look at upcallStub.asm. The hex value at the beginning of the receiver section is the immediate value being loaded into the register on the next line. It is generated by MacroAssembler::movptr and is the pointer to the receiver jobject. The movptr method explains that since the AArch64 mode VA space is 48 bits in size, 3 instructions are sufficient to create a patchable instruction sequence that can reach anywhere. This helps me notice that the 3 mov instructions are recreating that immediate value in the comments.
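
To make the three-instruction claim concrete, here is a small Java sketch that splits a 48-bit address into the three 16-bit immediates a movz/movk/movk sequence would carry and then reassembles it. The class name, the sample address (a CodeBlob address borrowed from the log earlier in this post, not the actual receiver pointer), and the register x9 in the printed text are all placeholders.

```java
final class MovPtrSplit {
    /** Splits a 48-bit address into three 16-bit chunks, lowest chunk first. */
    static int[] chunks(long address) {
        if ((address >>> 48) != 0) {
            throw new IllegalArgumentException("address does not fit in 48 bits");
        }
        return new int[] {
            (int) (address & 0xFFFF),
            (int) ((address >>> 16) & 0xFFFF),
            (int) ((address >>> 32) & 0xFFFF)
        };
    }

    /** Reassembles the address the way movz followed by two movk instructions would. */
    static long rebuild(int[] c) {
        return (long) c[0] | ((long) c[1] << 16) | ((long) c[2] << 32);
    }

    public static void main(String[] args) {
        long sample = 0x0000027b876f0810L; // placeholder address taken from the log above
        int[] c = chunks(sample);
        System.out.println(String.format("movz x9, #0x%x", c[0]));
        System.out.println(String.format("movk x9, #0x%x, lsl #16", c[1]));
        System.out.println(String.format("movk x9, #0x%x, lsl #32", c[2]));
        System.out.println(String.format("rebuilt: 0x%x", rebuild(c)));
    }
}
```

Reading the three immediates from the last instruction to the first gives back the hex value printed in the comment, which is the pattern to look for in upcallStub.asm.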

  1. Need to figure out how to set a breakpoint only when i == 2 in the varargs C function.

Now that I can break just before the branch into Java code, the question is where does the Java calling convention expect arguments to be? jvm – What’s the calling convention for the Java code in Linux platform? – Stack Overflow gives me the hint that I should be looking at assembler_aarch64.hpp for this info. At this point, I realize that I should have compiled the Java code as well. Back to the fuller command line:

javac -g --enable-preview --release 20 MinimizedTestVarArgs.java
java --enable-preview -Xlog:foreign+upcall=trace,foreign+downcall=trace:file=up-and-downcalls-TestVarArgs-16-05.txt::filecount=0 -XX:+PrintAssembly -XX:+PrintStubCode -XX:-Inline MinimizedTestVarArgs > TestVarArgs.asm

There is a level of indirection that works against this idea: the stub uses an offset into the receiver to retrieve the method to call. That is not directly output in the disassembly!

  1. A good place to break is jvm!UpcallLinker::on_entry

Why don’t we review how these cases are handled in the native code? Here is the definition of va_arg from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.34.31823\include\vadefs.h:

#define __crt_va_arg(ap, t)                                                 \
   ((sizeof(t) > (2 * sizeof(__int64)))                                   \
       ? **(t**)((ap += sizeof(__int64)) - sizeof(__int64))               \
       : *(t*)((ap += _SLOTSIZEOF(t) + _APALIGN(t,ap)) - _SLOTSIZEOF(t)))

Below is the disassembly for the first case in libVarArgs.c. The 2nd definition of __crt_va_arg is used on ARM64. The _SLOTSIZEOF evaluates to 8 for both int and double. TODO: finish explaining this assembly.

$LN7:
  00000000000000B4: D2800009  mov         x9,#0
  00000000000000B8: F94003E8  ldr         x8,[sp]
  00000000000000BC: CB080128  sub         x8,x9,x8
  00000000000000C0: 92400508  and         x8,x8,#3
  00000000000000C4: 91002109  add         x9,x8,#8
  00000000000000C8: F94003E8  ldr         x8,[sp]
  00000000000000CC: 8B090108  add         x8,x8,x9
  00000000000000D0: F90003E8  str         x8,[sp]
  00000000000000D4: F94003E8  ldr         x8,[sp]
  00000000000000D8: D1002108  sub         x8,x8,#8
  00000000000000DC: B9400108  ldr         w8,[x8]
  00000000000000E0: B90027E8  str         w8,[sp,#0x24]
  00000000000000E4: 910093E1  add         x1,sp,#0x24
  00000000000000E8: B9400BE0  ldr         w0,[sp,#8]
  00000000000000EC: F9400BE8  ldr         x8,[sp,#0x10]
  00000000000000F0: D63F0100  blr         x8
  00000000000000F4: 1400078C  b           $LN188
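
Since the explanation of this assembly is still a TODO, here is a rough Java model of the pointer arithmetic it performs, offered as a sketch rather than a definitive reading. slotSizeOf mirrors _SLOTSIZEOF with an 8-byte slot. apAlign is inferred from the sub/and pair at 0xBC and 0xC0 (mask 3 for a 4-byte int), and generalising it to alignof(t) - 1 is my assumption. The class name and the starting addresses are arbitrary.

```java
final class VaArgWalk {
    static final long VA_ALIGN = 8; // slot granularity, matching _SLOTSIZEOF evaluating to 8

    /** Rounds a type size up to a whole number of slots. */
    static long slotSizeOf(long sizeOfT) {
        return (sizeOfT + VA_ALIGN - 1) & ~(VA_ALIGN - 1);
    }

    /** Padding needed to align ap, as computed by "sub x8,x9,x8" then "and x8,x8,#3". */
    static long apAlign(long alignOfT, long ap) {
        return (0 - ap) & (alignOfT - 1);
    }

    /** Returns {address the argument is read from, updated ap}. */
    static long[] vaArg(long ap, long sizeOfT, long alignOfT) {
        long slot = slotSizeOf(sizeOfT);
        long newAp = ap + slot + apAlign(alignOfT, ap); // ap += _SLOTSIZEOF(t) + _APALIGN(t, ap)
        return new long[] { newAp - slot, newAp };      // value lives at ap - _SLOTSIZEOF(t)
    }

    public static void main(String[] args) {
        long ap = 0x1000; // arbitrary, already aligned
        long[] first = vaArg(ap, 4, 4);   // an int
        long[] second = vaArg(first[1], 8, 8); // then a double
        System.out.println(String.format("int read at 0x%X, ap becomes 0x%X", first[0], first[1]));
        System.out.println(String.format("double read at 0x%X, ap becomes 0x%X", second[0], second[1]));
    }
}
```

In this model an int still consumes a full 8-byte slot, which is consistent with the listing: it adds 8 to the list pointer, then loads a 32-bit w8 from 8 bytes behind the new pointer.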

So why does TestUpcallArity pass? It does not use variadic functions! I update MinimizedTestVarArgs to show the function signature codes when it fails. From the resulting log, a struct is being passed to the downcall.

f0_V_S_F java.lang.Exception: Expected 12.0 but found 7.95336E-11
f0_V_S_D java.lang.Exception: Expected 24.0 but found 9.022351855793E-312
f0_V_S_FF java.lang.Exception: Expected 12.0 but found 2.2120472E-11
f0_V_S_FF java.lang.Exception: Expected 12.0 but found 5.96E-43
f0_V_S_DD java.lang.Exception: Expected 24.0 but found 9.02227530708E-312
f0_V_S_DD java.lang.Exception: Expected 24.0 but found 4.9E-324
f0_V_S_FFF java.lang.Exception: Expected 12.0 but found 2.384152E-12
f0_V_S_FFF java.lang.Exception: Expected 12.0 but found 5.96E-43
f0_V_S_FFF java.lang.Exception: Expected 12.0 but found 1.4E-45
f0_V_S_DDD java.lang.Exception: Expected 24.0 but found 9.020261611475E-312
f0_V_S_DDD java.lang.Exception: Expected 24.0 but found 9.02168631996E-312
f0_V_S_DDD java.lang.Exception: Expected 24.0 but found 1.8075E-319
f0_V_IS_F java.lang.Exception: Expected 12.0 but found 2.8E-45
f0_V_IS_D java.lang.Exception: Expected 24.0 but found 9.9E-324
f0_V_IS_FF java.lang.Exception: Expected 12.0 but found 2.8E-45
f0_V_IS_FF java.lang.Exception: Expected 12.0 but found 0.0
f0_V_IS_DD java.lang.Exception: Expected 24.0 but found 9.9E-324
f0_V_IS_DD java.lang.Exception: Expected 24.0 but found 2.08E-322
f0_V_IS_FFF java.lang.Exception: Expected 12.0 but found 2.8E-45
f0_V_IS_FFF java.lang.Exception: Expected 12.0 but found 0.0
f0_V_IS_FFF java.lang.Exception: Expected 12.0 but found 5.9E-44
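
One way to read the "found" values above: numbers this tiny are what small integers look like when their bits are reinterpreted as floating point. That is only an interpretation of the log, not something the test reports. The Java sketch below (class and method names are placeholders) converts in both directions using the standard Double and Float bit-conversion methods.

```java
final class BitPatterns {
    static double asDouble(long bits) { return Double.longBitsToDouble(bits); }
    static float asFloat(int bits) { return Float.intBitsToFloat(bits); }
    static long bitsOf(double value) { return Double.doubleToRawLongBits(value); }
    static int bitsOf(float value) { return Float.floatToRawIntBits(value); }

    public static void main(String[] args) {
        // The expected values and their bit patterns.
        System.out.println(asDouble(0x4038000000000000L)); // 24.0
        System.out.println(asFloat(0x41400000));           // 12.0

        // A few of the "found" values from the log, shown as raw integers.
        System.out.println(bitsOf(4.9E-324));   // double case
        System.out.println(bitsOf(9.9E-324));   // double case
        System.out.println(bitsOf(2.08E-322));  // double case
        System.out.println(bitsOf(1.4E-45f));   // float case
        System.out.println(bitsOf(2.8E-45f));   // float case
        System.out.println(bitsOf(5.9E-44f));   // float case
    }
}
```

Several of the reported values map back to the integers 1, 2 and 42, which hints that the callee is reading from a location that holds unrelated integer data rather than the floating point argument.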

These signatures remind me of seeing 24.0 in d0 when debugging. I didn’t think about this as much as I should have. Breaking on the branch to the address from the table is the best way to examine the state of the registers and notice 24.0 in d0. Interestingly, only the general purpose registers are shown. See r (Registers) – Windows drivers | Microsoft Docs for details on how to view and modify additional registers.

bp VarArgs!varargs+0xb0
r
rF

The pattern in the above failing signatures implies that the UnboxBindingCalculator is using the STRUCT_HFA case to place them in floating point registers. Changing the code to use the STRUCT_REGISTER case for these causes some of the cases to pass (updated MinimizedTestVarArgs as well). The last case doesn’t work though.

Starting test 6 for f0_V_S_F ... Finished test 6 for f0_V_S_F
Starting test 7 for f0_V_S_D ... Finished test 7 for f0_V_S_D
Starting test 14 for f0_V_S_FF ... Finished test 14 for f0_V_S_FF
Starting test 19 for f0_V_S_DD ... Finished test 19 for f0_V_S_DD
Starting test 46 for f0_V_S_FFF ... Finished test 46 for f0_V_S_FFF
Starting test 67 for f0_V_S_DDD ...

My initial hypothesis is that there weren’t enough registers, but if that’s the case then why does the 3 floats case work? The above bp command in the debugger shows that $LN73 of VarArgs.dll is executed and that the integer registers contain 5 copies of the floating point value (why 5 and not 3?). Turns out the reason the test failed to complete is that there was an AccessViolation when loading the pair x8 and x9 from [x10].

Breakpoint 0 hit
VarArgs!varargs+0xb0:
00007fff`8f0f1168 d61f0100 br          x8 {VarArgs!varargs+0x1784 (00007fff`8f0f283c)}
0:005> r
 x0=0000018ccf9f3440   x1=0000000000000001   x2=4038000000000000   x3=4038000000000000
 x4=4038000000000000   x5=4038000000000000   x6=4038000000000000   x7=00000004e51301d8
 x8=00007fff8f0f283c   x9=00007fff8f0f208c  x10=0000000000000042  x11=0000018cc926be58
x12=0000018ccb9df990  x13=0000000000000000  x14=a2e64eada2e64ead  x15=0000018ccf798b7a
x16=0000b28569b6ec1d  x17=ffff9f321223209b  x18=0000000000000000  x19=0000000718bfed10
x20=0000000718bfed30  x21=00007fff16390f90  x22=0000018ccb95b2ba  x23=0000018cbf929000
x24=0000000718bfee58  x25=0000018cbf929000  x26=0000018ccb95b410  x27=0000000000000000
x28=0000018caf8c9b10   fp=0000000718bfecc0   lr=00007fff8f0f10d0   sp=0000000718bfe000
 pc=00007fff8f0f1168  psr=80000000 N--- EL0
VarArgs!varargs+0xb0:
00007fff`8f0f1168 d61f0100 br          x8 {VarArgs!varargs+0x1784 (00007fff`8f0f283c)}
0:005> rF

 d0=    2.47032822921e-323   d1=    5.92454341027e-270
 d2=    -3.98809525708e-16   d3=    -3.98809525708e-16
 d4=                     0   d5=                     0
 d6=                     0   d7=                     0
 d8=                     0   d9=                     0
d10=                     0  d11=                     0
d12=                     0  d13=                     0
d14=                     0  d15=                     0
d16=    6.46572227901e+170  d17=                     0
d18=     1.3906500245e-309  d19=     2.25252634258e-23
d20=                     0  d21=                     0
d22=     2.25252634258e-23  d23=                     0
d24=                     0  d25=                     0
d26=                     0  d27=                     0
d28=                     0  d29=                     0
d30=                     0  d31=                     0
VarArgs!varargs+0xb0:
00007fff`8f0f1168 d61f0100 br          x8 {VarArgs!varargs+0x1784 (00007fff`8f0f283c)}
0:005>

At this point, my curiosity about the correct solution for these registers leads me to create a self-contained varargs test SimpleVarArgs.c. The disassembly of call_S_DDD shows the struct being placed on the stack and a pointer to it being passed to varargs.

Other Notes

double and long each use 2 slots and void uses 0 as per the type2size array.

Note that the targetAddrStorage field is used by the downcall linker to branch to the native function. The retBufAddrStorage field is used to pass the address of the return buffer to the native function being invoked. See jdk/foreignGlobals_aarch64.cpp for how the Java ABIDescriptor is parsed in native code into an ABIDescriptor struct. The only usage of the _integer_additional_volatile_registers field seems to be the ABIDescriptor::is_volatile_reg method. Same for the _vector_additional_volatile_registers field. The only usage of is_volatile_reg is in the upcall linker, which saves and restores the callee saved registers. See the compute_reg_save_area_size, preserve_callee_saved_registers, and restore_callee_saved_registers methods. The strange thing is that the Overview of ARM ABI Conventions | Microsoft Docs document does not define what a volatile register is. Here is the definition from the x64 ABI page.

Volatile registers are scratch registers presumed by the caller to be destroyed across a call. Nonvolatile registers are required to retain their values across a function call and must be saved by the callee if used.

x64 ABI conventions | Microsoft Docs

Just when I think I’m done fixing up the CallArranger so that all the Windows AArch64 floating point ABI changes are in there, I realize when going through the other changes in the PR I would open that I don’t understand exactly what WindowsAArch64VaList is used for. I based it on the MacOsAArch64VaList class but perhaps WinVaList would be more appropriate.

While reviewing all this, I take a peek at the CallArranger tests. All but one of them use CallArranger.LINUX. This means I need to create a test for Windows. After replacing LINUX with WINDOWS, I run the test on the Surface Pro X and it passes, even though it should definitely fail! Oh boy, this turns out to be a copy/paste issue – I hadn’t updated the @run testng ClassName to the new class name so a different test was running!

Structure of CallArranger Tests

testStructHFA1 creates a struct with 2 floats for a downcall. One of the arrays it passes to checkArgumentBindings starts off with the dup() binding, which “duplicates the value on the top of the operand stack (without popping it!), and pushes the duplicate onto the operand stack.”
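
As a toy illustration of what that quoted description means, here is a minimal Java model of an operand stack with a dup operation. It is not the JDK's Binding implementation; the class name and the string standing in for the struct are invented for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackDup {
    /** Pushes a second reference to the current top of the stack without popping it. */
    static void dup(Deque<Object> stack) {
        Object top = stack.peek();
        if (top == null) {
            throw new IllegalStateException("cannot dup an empty stack");
        }
        stack.push(top);
    }

    public static void main(String[] args) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push("struct{float,float}"); // stand-in for the struct argument
        dup(stack);
        // Two entries now refer to the same value, so two later bindings can each consume one.
        System.out.println(stack.size());
        System.out.println(stack.pop() == stack.pop());
    }
}
```

The point of duplicating first is that each subsequent binding consumes the top entry, so a struct that has to be read more than once needs one copy on the stack per read.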

Breaking Down WinVaList

As part of this port, I needed to implement VaList. Understanding the Windows x64 implementation (WinVaList) is helpful. The skip() method repeatedly calls MemorySegment.asSlice() to create a memory segment offset by VA_SLOT_SIZE_BYTES. WinVaList.Builder also uses VA_SLOT_SIZE_BYTES for each argument whereas MacOsAArch64VaList.Builder uses the sizeOf method to compute the slot sizes for the arguments. The definition of Utils.alignUp (shown below) is what I thought the builder was using but it is actually SharedUtils.alignUp.

// Utils.alignUp
public static long alignUp(long n, long alignment) {
    return (n + alignment - 1) & -alignment;
}

// SharedUtils.alignUp
public static long alignUp(long addr, long alignment) {
    return ((addr - 1) | (alignment - 1)) + 1;
}

// Compare these to _SLOTSIZEOF(t) in vadefs.h
#define _SLOTSIZEOF(t)  ((sizeof(t) + _VA_ALIGN - 1) & ~(_VA_ALIGN - 1))

This enables the AArch64 implementation to align up the size required for STRUCT_REGISTER and STRUCT_HFA layouts. This also matches the definition of Visual Studio’s __crt_va_arg in vadefs.h. The Builder.build() method uses MemorySegment.copyFrom().
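
The three formulas above look different but should round up identically when the alignment is a power of two. The Java sketch below checks that over a small range; the class and method names are mine, and the bodies are copied from the snippets above (with _SLOTSIZEOF rewritten as a Java method that takes _VA_ALIGN as a parameter).

```java
final class AlignUpCompare {
    // Body copied from Utils.alignUp above.
    static long utilsAlignUp(long n, long alignment) {
        return (n + alignment - 1) & -alignment;
    }

    // Body copied from SharedUtils.alignUp above.
    static long sharedUtilsAlignUp(long addr, long alignment) {
        return ((addr - 1) | (alignment - 1)) + 1;
    }

    // Java rendering of _SLOTSIZEOF(t) with sizeof(t) and _VA_ALIGN passed in.
    static long slotSizeOf(long size, long vaAlign) {
        return (size + vaAlign - 1) & ~(vaAlign - 1);
    }

    /** Returns true if all three agree for every n in [0, maxN]. */
    static boolean agree(long maxN, long alignment) {
        for (long n = 0; n <= maxN; n++) {
            long expected = utilsAlignUp(n, alignment);
            if (sharedUtilsAlignUp(n, alignment) != expected
                    || slotSizeOf(n, alignment) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(1024, 8));
        System.out.println(utilsAlignUp(12, 8));       // a 12-byte struct rounds up to 16
        System.out.println(sharedUtilsAlignUp(12, 8));
    }
}
```

The check only covers non-negative sizes and power-of-two alignments, which is the range that matters for slot sizes.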

Viewing Compilation Logs

Viewing sources in VS reveals that compilation logs can be saved. Java JIT compiler explained – Part 1 – The Bored Dev.

Applying Changes to Panama Repo

It’s only when I start preparing to engage the OpenJDK mailing lists about a PR that I discover that there’s a separate repo for the Foreign Function & Memory API development so I need to apply my changes onto my new fork of the panama-foreign repo.

git clone https://github.com/swesonga/panama-foreign
cd panama-foreign
git remote add myjdk https://github.com/swesonga/jdk
git fetch myjdk
git log -1 myjdk/WinAArch64ABI
git switch -c WinAArch64ABI
git cherry-pick 3f70c10369b297f15e53997f600a80680bfa698a

Interesting learning about the rev-parse command from How to find the hash of branch in Git? – Stack Overflow.

Changes from Panama

There were some conflicts to resolve after cherry-picking but nothing too bad. Looks like I didn’t have the commits starting from July when I was changing the TestAArch64CallArranger.

  1. 8289285: Use records for binding classes · openjdk/panama-foreign@37b7935 (github.com) removed the Addressable and MemorySegment parameters to the unboxAddress method.
  2. 8291473: Unify MemorySegment and MemoryAddress · openjdk/panama-foreign@8b1af9a (github.com) replaced the Addressable class with MemorySegment.
  3. 8275644: Replace VMReg in shuffling code with something more fine gra… · openjdk/panama-foreign@123463f (github.com) changed the AArch64Architecture.stackStorage method to accept a size in addition to an offset. The cast to short is necessary to avoid the error “incompatible types: possible lossy conversion from int to short”.
  4. Convert classes into records · openjdk/panama-foreign@5b63be8 (github.com) converted bindings from a class to a record so isInMemoryReturn and callingSequence now need to be method invocations to avoid the error “isInMemoryReturn has private access in Bindings”.
  5. 8292047: Consider ways to add linkage parameters to downcall handles · openjdk/panama-foreign@60a47cb (github.com) removed the asVariadic function that my tests were using and added the LinkerOption for specifying the first variadic index.

Now the interesting behavior I observe is that 3 of the tests I worked on earlier now have assertion failures that terminate the JVM: StdLibTest, TestIntrinsics, and TestVarArgs. This assertion failure was added by 8275644: Replace VMReg in shuffling code with something more fine gra… · openjdk/panama-foreign@123463f (github.com)

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=\foreignGlobals_aarch64.cpp:181
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (d:\dev\repos\java\forks\panama-foreign\src\hotspot\cpu\aarch64\foreignGlobals_aarch64.cpp:181), pid=18972, tid=18908
#  Error: ShouldNotReachHere()
#
# JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.sawesong.panama-foreign)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.sawesong.panama-foreign, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, windows-aarch64)
# Core dump will be written. Default location: C:\dev\repos\java\forks\panama-foreign\JTwork\scratch\0\hs_err_pid18972.mdmp
#
# An error report file with more information is saved as:
# C:\dev\repos\java\forks\panama-foreign\JTwork\scratch\0\hs_err_pid18972.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

The minimized tests I created are now out of date as well, e.g. History for test/jdk/java/foreign/TestIntrinsics.java – openjdk/panama-foreign (github.com) has 2 commits showing the changes I need to make in addition to copying the DLL from support\test\jdk\jtreg\native\lib. Surprisingly, WinDbg cannot open the executable as it did earlier. I’m launching it from C:\Program Files (x86)\Windows Kits\10\Debuggers\arm64\windbg.exe.

WinDbg could not create process

Perhaps it’s the wrong one for the current Windows version? Search for “debugger” in the store and install the WinDbg Preview app.

WinDbg Preview

Now we can set the breakpoint in foreignGlobals_aarch64.cpp:

bp jvm!move_v128
g
u jvm!move_v128

Here is the call stack when the breakpoint is hit:

0:004> k
 # Child-SP          RetAddr               Call Site
00 00000094`33efdee0 00007fff`370226a0     jvm!move_v128+0x20 [...src\hotspot\cpu\aarch64\foreignGlobals_aarch64.cpp @ 165] 
01 00000094`33efdfd0 00007fff`36fe846c     jvm!ArgumentShuffle::pd_generate+0x1a8 [...src\hotspot\cpu\aarch64\foreignGlobals_aarch64.cpp @ 200] 
02 00000094`33efe070 00007fff`36fe763c     jvm!ArgumentShuffle::generate+0x34 [...src\hotspot\share\prims\foreignGlobals.hpp @ 114] 
03 00000094`33efe0a0 00007fff`36fe7070     jvm!DowncallStubGenerator::generate+0x4e4 [...src\hotspot\cpu\aarch64\downcallLinker_aarch64.cpp @ 203] 
04 00000094`33efe920 00007fff`37566a04     jvm!DowncallLinker::make_downcall_stub+0x88 [...src\hotspot\cpu\aarch64\downcallLinker_aarch64.cpp @ 103] 
05 00000094`33efecc0 00000268`8fe255ec     jvm!NEP_makeDowncallStub+0x33c [...src\hotspot\share\prims\nativeEntryPoint.cpp @ 77] 
06 00000094`33efefd0 00000000`00000000     0x00000268`8fe255ec

The way the macro assembler is invoked to generate the vector-to-general purpose move was changed by 8275644: Replace VMReg in shuffling code with something more fine gra… · openjdk/panama-foreign@123463f (github.com).

  1. Clean up & validate callarranger tests
  2. clean up callarranger api
  3. Create test showing broken VaList
  4. Combine VaList implementations
  5. Why isn’t using fmovd only failing for some test using a floating point argument?
  6. Are my macroAssembler instructions really necessary?
  7. Where is a test showing these instructions in use? MinimizedTestIntrinsics (run above)

Building on macOS

A newer boot JDK is required once again, as explained by the error message when running bash configure. Download and install the macOS .pkg installer for JDK 19 from the Adoptium site.

checking for java... /usr/bin/java
configure: Found potential Boot JDK using java(c) in PATH
configure: Potential Boot JDK found at /usr is incorrect JDK version (openjdk version "17.0.1" 2021-10-19 LTS OpenJDK Runtime Environment Microsoft-28056 (build 17.0.1+12-LTS) OpenJDK 64-Bit Server VM Microsoft-28056 (build 17.0.1+12-LTS, mixed mode)); ignoring
configure: (Your Boot JDK version must be one of: 19 20)

Testing 4-Float HFAs

I was reviewing the tests I added and realized that I wasn’t testing the variadic HFAs. Sure enough, I couldn’t get the tests for variadic HFA structs with 4 floats to pass. My code was assigning 2 64-bit general purpose registers to such a struct. Why isn’t this caught by one of the existing tests? TestVarArgs appears to simply pass the struct to the native code in the downcall and the native code passes it back in the upcall. Shouldn’t there be additional validation? testFloatStruct in VaListTest also looks like it should catch this. Is the problem that it only uses structs on the stack? Disassemble libVaList to find out:

cd build\windows-aarch64-server-slowdebug\support\test\jdk\jtreg\native\support\libVaList\
dumpbin /disasm /out:libVaList.asm libVaList.obj
dumpbin /all /out:libVaList.txt libVaList.obj

TODO: discuss sumDoubles.asm and sumFloats.asm.

I also tried taking a look at how this code runs using WinDbg. These are the arguments I provided to WinDbg on my system:

  1. Executable: C:\dev\java\abi\devbranch30\jdk\bin\java.exe
  2. Arguments: -jar C:\dev\java\jtreg\lib\jtreg.jar -agentvm -timeoutFactor:4 -concurrency:4 -verbose:fail,error,summary -nativepath:C:\dev\java\abi\devbranch30\support\test\jdk\jtreg\native\lib test/jdk/java/foreign/valist/VaListTest.java
  3. Start directory: C:\dev\repos\java\forks\panama-foreign

When the debugger was done loading, I ran these commands to set a breakpoint in the native code invoked by VaListTest. Unfortunately, the breakpoint was not hit. Why this happens is still a mystery.

bp VaList!sumFloatStruct
g

Adding the HFA Field Values

The function descriptor for the downcall to the native sum_struct_hfa_floats function is created by calling FunctionDescriptor.of with C_FLOAT as the first argument. This allows the result of the invokeWithArguments method of the downcall’s MethodHandle to be cast to a float. Using C_INT, for example, results in this error: ClassCastException: java.lang.Integer cannot be cast to class java.lang.Float.
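
The ClassCastException comes from how invokeWithArguments hands back a boxed Object. The sketch below reproduces the same mechanism with ordinary java.lang.invoke method handles standing in for the downcall handle, so it runs without the foreign API or a native library. The class and the two sum methods are hypothetical stand-ins, not the test's code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ReturnTypeCast {
    static int sumAsInt(int a, int b) { return a + b; }
    static float sumAsFloat(float a, float b) { return a + b; }

    /** Looks up one of the static methods above and invokes it, returning the boxed result. */
    static Object call(String name, Class<?> type, Object... args) {
        try {
            MethodHandle handle = MethodHandles.lookup().findStatic(
                    ReturnTypeCast.class, name, MethodType.methodType(type, type, type));
            return handle.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Returns true if casting the boxed result to float throws ClassCastException. */
    static boolean castToFloatFails(Object result) {
        try {
            float ignored = (float) result; // checked cast to Float, then unboxing
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Declared float return type: the boxed Float casts cleanly.
        System.out.println(castToFloatFails(call("sumAsFloat", float.class, 1.0f, 2.0f)));
        // Declared int return type: the boxed Integer cannot be cast to Float.
        System.out.println(castToFloatFails(call("sumAsInt", int.class, 1, 2)));
    }
}
```

The declared return type decides which wrapper class comes back, which is why the descriptor's first argument has to match the cast applied to the result.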

Validating the HFA Field Values

Although the existing varargs tests passed, they looked like they checked round-tripping of a single value. Adding the components of the HFA seemed like a better idea because it verified that all the values were delivered correctly. This caught a bug in my implementation – when there aren’t enough registers for an HFA being passed to a variadic function, the struct was partially loaded into the available registers and then the rest of the struct was spilled onto the stack. This behavior differs from the macOS & Linux environments and wasn’t caught by any of the existing tests.

In the process of testing these changes, I deployed the locally built JDK to the Surface Pro X and got this cryptic error message:

C:\dev\java\abi\devbranch35\jdk\bin\java.exe --enable-preview SumVariadicStructHfa
WARNING: A restricted method in java.lang.foreign.Linker has been called
WARNING: java.lang.foreign.Linker::nativeLinker has been called by the unnamed module
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for this module

Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\dev\repos\scratchpad\compilers\tests\aarch64\abi\varargs\VarArgs.dll: Can't load ARM 64-bit .dll on a AMD 64-bit platform
        at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
        at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:331)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:197)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:139)
        at java.base/jdk.internal.loader.NativeLibraries.findFromPaths(NativeLibraries.java:259)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:251)
        at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2437)
        at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:873)
        at java.base/java.lang.System.loadLibrary(System.java:2047)
        at SumVariadicStructHfa.<clinit>(SumVariadicStructHfa.java:61)

Turns out I deployed x64 binaries to the Surface Pro X and launched Java in a folder containing the prior ARM64 varargs test DLL. The solution was to delete that DLL and copy the DLL from the new build. The test passed successfully and it’s only then that I realized that x64 binaries run successfully on this ARM64 platform. Getting the correct ARM64 binaries in place without replacing the x64 varargs DLL gives a similar error: Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\dev\repos\scratchpad\compilers\tests\aarch64\abi\varargs\VarArgs.dll: Can't load AMD 64-bit .dll on a ARM 64-bit platform.

Outstanding Questions

  1. Why invoke instead of invokeExact in the tests?
  2. What happens if we return the method handle without the .asSpreader call?
  3. Why do we need to shuffle the PrintfArgs?
  4. Remove dead code
  5. Show how to debug (VS/VS Code) into the native code (on Windows x64 first, then ARM64).
  6. Generate logs showing the wrong downcall registers in use without my changes
  7. Generate logs showing the wrong upcall registers in use without my changes
  8. Make foreign+upcalls log the upcall stub details as is done for the downcall stubs.
  9. Why does using r10 as the retBufAddrStorage field work on Windows? Is there no test for returning a struct?
  10. Create test that returns a 16-byte result and verify that it is in x1:x0 (no tests failed with this change).
  11. Create test that returns result in address stored in x8 – see Return Values: For types greater than 16 bytes, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine. The callee isn’t required to preserve the value stored in x8. How does this compare to the comments in assembler_aarch64.hpp, downcallLinker_aarch64.cpp, stubGenerator_aarch64.cpp?
  12. Create test that uses r16-r17 and v24 and verify that they really are volatile.
  13. Fix d24 not being a volatile register
  14. Why doesn’t any test fail without the cursor update in MacOsAArch64VaList.Builder.read?


Sharing Files with Ubuntu Guest on Hyper-V Host

Of the many ways to transfer files to an Ubuntu guest on Hyper-V, running these PowerShell commands (as admin) suffices for a one-off file transfer. See 4 Ways to Transfer Files to a Linux Hyper-V Guest (altaro.com) for more details about this approach.

Enable-VMIntegrationService -VMName 'Ubuntu 22.04 LTS' -Name 'Guest Service Interface'

Copy-VMFile -Name 'Ubuntu 22.04 LTS' -SourcePath 'dumpfile.gz' -DestinationPath '/home/saint/Downloads' -FileSource Host
Copy-VMFile in Action

Backstory

Yesterday I had a core dump from a Linux process that I wanted to specifically inspect in an Ubuntu VM. My host machine is a Windows 11 (10.0.22621.674) machine. The simple question of how to share files with my Ubuntu VM took me all over the map. Searching for hyper-v share files linux guest led me to Shared Folders over Hyper-V Ubuntu Guest (linuxhint.com). This had me enabling SMB 1.0/CIFS File Sharing Support (already had SMB Direct enabled) and Public folder sharing.

SMB Windows Features
Public Folder Sharing Settings

I then created an empty directory and turned on sharing on it as instructed. However, accessing it from Ubuntu turned out to be the problem. These are the suggested commands:

sudo apt install cifs-utils
mkdir ~/SharedFolder
sudo mount.cifs //<NAME OF YOUR PC>/<SHARED FOLDER NAME> ~/SharedFolder -o user=<YOUR WINDOWS USERNAME>

mount.cifs failed though.

saint@linuxvm:~$ sudo mount.cifs //DEVICENAME/virtual-machines ~/shared -o user=USERNAME
Password for USERNAME@//DEVICENAME/virtual-machines: ***
mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

There doesn’t seem to be anything particularly interesting at mount.cifs(8) – Linux man page (die.net). Running dmesg showed these messages:

[  425.318905] CIFS: Attempting to mount \\DEVICENAME\virtual-machines
[  425.318905] CIFS: Status code returned 0xc000006d STATUS_LOGON_FAILURE
[  425.318905] CIFS: VFS: \\DEVICENAME Send error in SessSetup = -13
[  425.318905] CIFS: VFS: cifs_mount failed w/return code = -13
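To make the two numbers in that output easier to read together, here is a small sketch that pulls the NTSTATUS value and the negative return code out of the dmesg text and maps the latter to its errno name. The summarize function and the regular expressions are my own illustration, written against the four lines above rather than against any documented log format.

```python
import errno
import re

SAMPLE = r"""
[  425.318905] CIFS: Attempting to mount \\DEVICENAME\virtual-machines
[  425.318905] CIFS: Status code returned 0xc000006d STATUS_LOGON_FAILURE
[  425.318905] CIFS: VFS: \\DEVICENAME Send error in SessSetup = -13
[  425.318905] CIFS: VFS: cifs_mount failed w/return code = -13
"""

# Patterns are assumptions based on the sample lines only.
STATUS_RE = re.compile(r"Status code returned (0x[0-9a-fA-F]+) (\w+)")
RETURN_RE = re.compile(r"failed w/return code = -(\d+)")


def summarize(log: str) -> dict:
    """Extract the SMB status and the kernel return code from CIFS log lines."""
    summary = {}
    status = STATUS_RE.search(log)
    if status:
        summary["ntstatus"] = int(status.group(1), 16)
        summary["ntstatus_name"] = status.group(2)
    ret = RETURN_RE.search(log)
    if ret:
        code = int(ret.group(1))
        summary["errno"] = code
        summary["errno_name"] = errno.errorcode.get(code, "unknown")
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The return code 13 is the same errno that mount.cifs reported as mount error(13): Permission denied.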

cifs status_logon_failure – Search (bing.com) leads to a comment at STATUS_LOGON_FAILURE (0xc000006d) · Issue #478 · hierynomus/smbj (github.com) stating that STATUS_LOGON_FAILURE means that your credentials were rejected. This error code (and others) is documented at [MS-CIFS]: SMB Error Classes and Codes | Microsoft Learn. The Windows event logs do not contain any entries related to this (surprisingly). So I pivot to the next result from my search for hyper-v share files linux guest.

4 Ways to Transfer Files to a Linux Hyper-V Guest (altaro.com) instructs you to enable the file copy guest service (either using PowerShell or the GUI). Apparently a power cycle of the VM is not necessary. See article for more info.

Enable-VMIntegrationService -VMName LinuxVM3 -Name 'Guest Service Interface'

Copy-VMFile -Name LinuxVM3 -SourcePath 'dumpfile.gz' -DestinationPath '/home/saint/Downloads' -FileSource Host

Unfortunately, Copy-VMFile fails. The VM is running Ubuntu 20.04.1 (x86_64) with kernel 5.15.0-52-generic.

It shouldn’t be this hard to just get a file into a guest VM. Looking through the docs again, Use local resources on Hyper-V virtual machine with VMConnect | Microsoft Learn suggests VMConnect, but it looks like enhanced session mode and Type Clipboard text are only available on VMs running a recent Windows OS. For Ubuntu, that article points to Changing Ubuntu Screen Resolution in a Hyper-V VM | Microsoft Learn. At this point, I decide to create a new VM using Hyper-V’s quick create in the hope that it will have the proper configuration for what I’m trying to do.

Creating an Ubuntu VM

Click on Hyper-V’s Quick Create… command to start creating a VM. Select the latest Ubuntu LTS (22.04). Unfortunately, the only options available are the VM name and the network switch to use. Clicking on Create Virtual machine creates a VM on the primary/OS disk. I was pleasantly surprised to find that the Ubuntu 22.04 VM appeared to support enhanced session mode when Hyper-V asked for the screen resolution when connecting to it:

Connecting to Ubuntu VM

The enhanced session gives this xrdp login window:

xrdp Login Window

The window disappears when I enter my credentials and nothing happens for some time. I used the “Basic Session” toolbar button to switch back to the normal mode I’m used to. These are some of the errors I encounter:

Oh no! Something has gone wrong.
Internal Error Details

The error report points out that I have obsolete packages, among them gnome-shell (which crashed). I run sudo apt upgrade and say yes to the 368 upgrades (826 MB of archives). That is not sufficient to address this rdp bug so I stay in Basic Session mode for the rest of the time.

This leads me back to the PowerShell commands I used above. Lo and behold, they work this time! This is despite the fact that there don’t appear to be any processes displayed by ps -u root | grep hyper as described at 4 Ways to Transfer Files to a Linux Hyper-V Guest (altaro.com).

Enable-VMIntegrationService -VMName 'Ubuntu 22.04 LTS' -Name 'Guest Service Interface'

Copy-VMFile -Name 'Ubuntu 22.04 LTS' -SourcePath 'dumpfile.gz' -DestinationPath '/home/saint/Downloads' -FileSource Host

This is when I discover that I do not have enough space on the VM to expand my .gz file.
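A rough pre-flight check could have caught this before copying the file over. The sketch below reads the ISIZE field from the gzip trailer and compares it against the free space in the destination directory. gzip_isize and has_room_for are hypothetical helper names. ISIZE stores the uncompressed size modulo 2**32, and only for the last member of the file, so for a core dump larger than 4 GiB it under-reports and should be treated as a hint only.

```python
import os
import shutil
import struct


def gzip_isize(path: str) -> int:
    """Return the ISIZE trailer field: uncompressed size modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE is the last 4 bytes, little-endian
        return struct.unpack("<I", f.read(4))[0]


def has_room_for(gz_path: str, dest_dir: str) -> bool:
    """Best-effort check that dest_dir can hold the decompressed file."""
    free = shutil.disk_usage(dest_dir).free
    return gzip_isize(gz_path) <= free


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print(has_room_for(sys.argv[1], sys.argv[2]))
```

Running gzip -l on the file reports the same trailer value, with the same 4 GiB caveat.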

Inspecting Disk Usage

Unfortunately, the disk for the VM is only 12 GB (confirmed by launching Ubuntu and running out of space). Therefore, once the installation completes, expand the disk from 12 GB to a more reasonable size (e.g. 127 GB). If the default drive Quick Create used for the VM’s virtual disk does not have sufficient space, you will need to move the virtual hard disk to another drive then expand the partition in Ubuntu to use the whole virtual disk.

Moving Ubuntu VM to a Bigger Disk

My main desktop has a 500 GB SSD that had only about 20 GB of space free. How unpleasant to then discover that Quick Create simply dumped the new VM on it AND created such a small disk to start with, all without asking. Turns out I’m not the only one that finds this behavior less than ideal: hyperv quick create disk size – Search (bing.com) pointed me to Hyper-V Ubuntu 18.04 Quick Create disk size is too small · Issue #82 · microsoft/linux-vm-tools (github.com). Unfortunately, it doesn’t look like there’s a resolution of this issue. My solution was to create a new virtual disk on my secondary 3.5 TB hard drive.

If the VM was still running, this error dialog will most likely be displayed.

After starting the VM again, I still didn’t have enough space to decompress my .gz file.

Inspecting Disk Usage

Fortunately, there is a useful site explaining how to Expand Ubuntu disk after Hyper-V Quick Create – Anton Karl Ingason (linguist.is):

sudo apt install cloud-guest-utils
sudo growpart /dev/sda 1
sudo resize2fs /dev/sda1

growpart failed the first time I ran it. The disk was still 12 GB!

I had to turn off the VM, wait for the disk “merging” status to go away, then go to edit the disk in Hyper-V:

Some scary warnings about data loss that I promptly ignored and marched forward since I didn’t yet have any critical data on that disk.

Once the expansion completes, the growpart command can be successfully executed as shown below.

Running growpart in Ubuntu

Open Questions

  1. Why does mount.cifs fail (on both VMs)?
  2. Why does Copy-VMFile work on Ubuntu 22 VM but not Ubuntu 20?

Categories: Java, Testing

Running OpenJDK Google Tests

I recently had to investigate an OpenJDK google test. To run the test locally, I needed to ensure that configure is aware of my intent. As documented at jdk/building.md · openjdk/jdk (github.com), we need to pass the --with-gtest option to configure. We first need to get the appropriate googletest sources, e.g. (in Git Bash):

cd /c/dev/repos
git clone -b release-1.8.1 https://github.com/google/googletest

Then in Cygwin:

cd /cygdrive/d/java/forks/jdk
bash configure --with-gtest=/cygdrive/c/dev/repos/googletest --with-debug-level=slowdebug

Once this is done, the OpenJDK repo can be built using this script. I use the time command to get statistics on how long the build took. I also only just discovered that the prompt can be configured to include the time.

time /cygdrive/d/dev/repos/scratchpad/scripts/java/cygwin/build-jdk.sh

The googletest launcher is in the images folder of the build configuration:

$ cd build/windows-x86_64-server-slowdebug
$ find . -name '*gtest*'
./hotspot/variant-server/libjvm/gtest
./hotspot/variant-server/libjvm/gtest/gtestLauncher.exe
...
./images/test/hotspot/gtest
./images/test/hotspot/gtest/server/gtestLauncher.exe
./images/test/hotspot/gtest/server/gtestLauncher.pdb

Use gtestLauncher.exe to run the JVM tests. Every test passed on my build.

/d/java/ms/openjdk-jdk17u/build/windows-x86_64-server-slowdebug/images/test/hotspot/gtest/server/gtestLauncher.exe -jdk:/d/java/ms/openjdk-jdk17u/build/windows-x86_64-server-slowdebug/jdk
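Since the launcher and the -jdk argument should point into the same build output, deriving both from one directory avoids mixing binaries by accident. Here is a minimal sketch of that idea. gtest_command is a hypothetical helper, the directory layout is the one from the find output above, and the filter pattern in the usage is just an example of googletest's standard --gtest_filter flag.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def gtest_command(build_dir: str,
                  jdk_dir: Optional[str] = None,
                  test_filter: Optional[str] = None) -> List[str]:
    """Build the gtestLauncher argument list for one build configuration."""
    build = PurePosixPath(build_dir)
    launcher = build / "images/test/hotspot/gtest/server/gtestLauncher.exe"
    # Default to the JDK image that lives next to the launcher's build output.
    jdk = PurePosixPath(jdk_dir) if jdk_dir else build / "jdk"
    cmd = [str(launcher), f"-jdk:{jdk}"]
    if test_filter:
        cmd.append(f"--gtest_filter={test_filter}")
    return cmd


if __name__ == "__main__":
    build = "/d/java/ms/openjdk-jdk17u/build/windows-x86_64-server-slowdebug"
    # "os*" is an illustrative filter pattern, not a recommendation.
    print(" ".join(gtest_command(build, test_filter="os*")))
```

The resulting list can be handed to subprocess.run from a shell that understands these paths (Git Bash or Cygwin in my case).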

An interesting observation is that the JVM test code is in build/windows-x86_64-server-slowdebug/images/test/hotspot/gtest/server/jvm.dll, which is just over 5 MB larger than build/windows-x86_64-server-slowdebug/jdk/bin/server/jvm.dll. Here’s a snippet of the call stack showing how the tests get kicked off.

jvm.dll!JVMInitializerListener::OnTestStart(const testing::TestInfo & test_info) Line 129
...
jvm.dll!RUN_ALL_TESTS() Line 2342	C++
jvm.dll!runUnitTestsInner(int argc, char * * argv) Line 289	C++
jvm.dll!runUnitTests(int argc, char * * argv) Line 370	C++
gtestLauncher.exe!main(int argc, char * * argv) Line 40	C++
[Inline Frame] gtestLauncher.exe!invoke_main() Line 78	C++
gtestLauncher.exe!__scrt_common_main_seh() Line 288	C++
kernel32.dll!BaseThreadInitThunk...

Behind the Scenes

My first attempt at running the gtests was to launch them using the gtestLauncher from a build I was testing but using a locally built JDK:

/d/java/binaries/jdk/x64/jdk-17.0.5+8-test-image/hotspot/gtest/server/gtestLauncher -jdk:/d/java/ms/openjdk-jdk17u/build/windows-x86_64-server-slowdebug/jdk

The logging I added to my local gtest was not showing up in the output. Naturally, the question that arose was how do I know which binaries it is running against since I don’t see the logging I expected? Process Explorer and Process Monitor did not seem to have a way to show me all the DLLs in the process (before it terminated). I end up creating a dump file using Process Explorer. Here are the non-Windows binaries – a mix of local build and CI build DLLs.

DLLs Loaded in gTestLauncher.exe

This was what inspired me to figure out how to run the whole show with locally built binaries as described in the main section of this post.


Categories: LLVM, Windows

Cmd.exe File System Frustration

When working on Tracking Down Missing Headers in LLVM for Windows, I kept running into these access denied failures when running the LLVM build script:

D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2> move llvm-project-* llvm-project   || exit /b 1
D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2\llvm-project-llvmorg-15.0.2
Access is denied.
        0 dir(s) moved.

Before retrying the script, I tried to clean up using rmdir since the script requires the directory to not exist.

 rmdir /s /q llvm_package_15.0.2 && build_llvm_release.bat 15.0.2

Strangely enough, rmdir failed with this error:

llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class\try_lock_shared_for.pass.cpp - The system cannot find the path specified.
llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class\try_lock_shared_until.pass.cpp - The system cannot find the path specified.
llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class\try_lock_until_deadlock_bug.pass.cpp - The system cannot find the path specified.

These files still exist on disk though! They are displayed if you dir their containing directory but are not found if you dir their full paths! They cannot be deleted using del either. Interestingly, pressing tab after the directory path will autocomplete the file names.

C:\> dir D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class\
 Volume in drive D is DATAVOL1
 Volume Serial Number is 8800-8693

 Directory of D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class

10/16/2022  01:22 PM    <DIR>          .
10/04/2022  03:29 AM    <DIR>          ..
10/04/2022  03:29 AM             2,461 try_lock_shared_for.pass.cpp
10/04/2022  03:29 AM             2,423 try_lock_shared_until.pass.cpp
10/04/2022  03:29 AM             2,146 try_lock_until_deadlock_bug.pass.cpp
               3 File(s)          7,030 bytes

C:\> dir D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class\try_lock_shared_for.pass.cpp
 Volume in drive D is DATADRIVE1
 Volume Serial Number is 548C-FFC9

 Directory of D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class

File Not Found

C:\> del D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release\llvm_package_15.0.2\llvm-project-llvmorg-15.0.2\libcxx\test\std\thread\thread.mutex\thread.mutex.requirements\thread.sharedtimedmutex.requirements\thread.sharedtimedmutex.class\try_lock_shared_for.pass.cpp
The system cannot find the path specified.

These files can be viewed in file explorer. Something that caught my eye when examining their properties is that their locations started with the \\?\ prefix! That seems unusual for files on my local drive.

File Locations Starting with \\?\
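One thing worth checking here (an assumption on my part, not something confirmed in this investigation) is whether these paths run into the classic Win32 MAX_PATH limit of 260 characters, which the \\?\ prefix exists to bypass. The sketch below checks a path against that limit and builds the extended-length form. Both helper names are hypothetical, and to_extended_length only handles absolute drive-letter paths, not UNC shares.

```python
MAX_PATH = 260  # Win32 limit; the count includes the terminating NUL character


def exceeds_max_path(path: str) -> bool:
    """True if the path plus its NUL terminator does not fit in MAX_PATH."""
    return len(path) + 1 > MAX_PATH


def to_extended_length(path: str) -> str:
    """Prefix an absolute drive-letter path with \\\\?\\ (no-op if already prefixed)."""
    prefix = "\\\\?\\"
    if path.startswith(prefix):
        return path
    return prefix + path


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(len(p), exceeds_max_path(p), to_extended_length(p))
```

Passing the full path of try_lock_shared_for.pass.cpp from the dir command above is a quick way to test the hypothesis.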

Resource monitor does not show any images with associated handles when searching for “try_lock”. Neither does searching for “\?\D:\dev\repos\llvm\”. I tried using Process Explorer’s “Find Handle or DLL…” command as well. There also don’t appear to be any child processes for the cmd.exe process I was using (a Developer 2019 Command Prompt).

Next idea, open Process Monitor and see what’s happening when dir and rmdir are executed. I used the Path contains thread.sharedtimedmutex.class filter. The deletes are showing up as SetDispositionInformationFile events and seem to be using the RemoveDirectoryW function.

The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed.

RemoveDirectoryW function (fileapi.h)
Process Monitor View of RemoveDirectoryW Call
SetDispositionInformationFile Event Info

Notice the NOT EMPTY result of the SetDispositionInformationFile event. I believe this comes from RemoveDirectoryW. There’s the question of how the 3 files are printed to the command line. The FindNextFile API is used to search for files.

Ah, in the middle of this investigation, PowerShell.exe dies and so does Windows Terminal. All my tabs, everything, gone! Aaaargh… Windows Event Viewer has an Information level event showing that powershell.exe crashed due to a System.InvalidOperationException. This is then followed by another Information event with the WER source and P1 problem signature Microsoft.WindowsTerminal_1.15.2713.0_x64__8wekyb3d8bbwe. Then comes the Error level event with the Application Hang source and General explanation that “The program WindowsTerminal.exe version 1.15.2209.28003 stopped interacting with Windows and was closed. To see if more information about the problem is available, check the problem history in the Security and Maintenance control panel.” The ExeFileName is cut off below but simply append “\WindowsTerminal.exe” to the package name to reconstruct it. Looks like I need to avoid PowerShell. And why is there no crash dump created for it???

I use the Feedback Hub for the first time, trying to see whether there is a way to prevent Windows Terminal from dying with child processes. Windows Terminal crashes immmediately when launched from Win+X menu · Issue #13108 · microsoft/terminal (github.com) seems to suggest that Feedback Hub is the right way to do this.

One upside of this crash is that it lets me confirm that it is not the cmd.exe process that is hanging onto those files. I terminate explorer.exe and when I run new task in Task Manager, it asks me to create a Windows Hello pin. What is happening?? Now moving on to opening these files: Notepad++ acts as though nothing happened when you File->Open and select one of them. Notepad opens it though! Running cat in Git Bash also dumps its contents:

cat /d/dev/repos/llvm/dups/llvm-project/llvm/utils/release/llvm_package_15.0.2/llvm-project-llvmorg-15.0.2/libcxx/test/std/thread/thread.mutex/thread.mutex.requirements/thread.sharedtimedmutex.requirements/thread.sharedtimedmutex.class/try_lock_shared_for.pass.cpp
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
...

I’m suspecting Windows Defender but don’t have any definitive proof. According to Enable attack surface reduction (ASR) rules, this command in an admin powershell should do the trick if the problem was Windows Defender:

Add-MpPreference -AttackSurfaceReductionOnlyExclusions D:\dev\repos\llvm\dups\llvm-project\llvm\utils\release

Well, looks like rm from Git Bash works just fine as does deleting from file explorer. Unfortunately, this unsolved issue is probably going to continue to cause pain in command prompt batch files like the LLVM build script.

rm /d/dev/repos/llvm/dups/llvm-project/llvm/utils/release/llvm_package_15.0.2/llvm-project-llvmorg-15.0.2/libcxx/test/std/thread/thread.mutex/thread.mutex.requirements/thread.sharedtimedmutex.requirements/thread.sharedtimedmutex.class/try_lock_shared_for.pass.cpp

Categories: CUDA

Setting up for CUDA Dev Work

I have started exploring parallel programming using CUDA. The latest release as of this writing is 11.8 as detailed at CUDA Toolkit 11.8 New Features Revealed | NVIDIA Technical Blog. I’m using Windows and have Visual Studio 2022 installed. I have 2 CUDA-capable devices: a Surface Book 2 with a GeForce GTX 1060 and an HP Z4 G4 Workstation with a Quadro P1000. The compute capabilities supported by these cards are listed at CUDA GPUs – Compute Capability.

I had previously installed the 11.1 toolkit on my Surface Book so I started by uninstalling all apps that showed up when searching for “nvidia” under “Installed Apps” except NVIDIA Graphics Driver 461 and NVIDIA Update 38.0.2.0. I then got the new installer from Installation Guide Windows :: CUDA Toolkit Documentation (nvidia.com) and installed every component presented by the installer. Note that older builds can be found at the CUDA Toolkit Archive.

Installed NVIDIA CUDA Components

You can now create a new CUDA project in Visual Studio:

Visual Studio’s Create New Project Dialog

Surface Book 2 CUDA Issues

Creating and running a CUDA 11.8 Runtime project on my Surface Book 2 fails with the error cudaSetDevice failed! Do you have a CUDA-capable GPU installed?addWithCuda failed! A search for using nvidia GPU on surface book 2 leads to suggestions that involve the NVIDIA Control Panel. Unfortunately, it doesn’t start on my laptop. A peek at the event viewer reveals why:

Faulting application name: nvcplui.exe, version: 8.1.940.0, time stamp: 0x61b5030e
Faulting module name: nvcplui.exe, version: 8.1.940.0, time stamp: 0x61b5030e
Exception code: 0xc0000409
Fault offset: 0x00000000002947f5
Faulting process id: 0x0x6340
Faulting application start time: 0x0x1D8D9CC4435E10D
Faulting application path: C:\Program Files\WindowsApps\NVIDIACorp.NVIDIAControlPanel_8.1.962.0_x64__56jybvy8sckqj\nvcplui.exe
Faulting module path: C:\Program Files\WindowsApps\NVIDIACorp.NVIDIAControlPanel_8.1.962.0_x64__56jybvy8sckqj\nvcplui.exe
Report Id: 5500d6c8-eebe-4488-8863-397c3896c777
Faulting package full name: NVIDIACorp.NVIDIAControlPanel_8.1.962.0_x64__56jybvy8sckqj
Faulting package-relative application ID: NVIDIACorp.NVIDIAControlPanel
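When comparing several of these event entries, it helps to turn the text into key/value pairs. The sketch below does that and decodes the exception code. parse_event and describe_exception are hypothetical helpers, and the lookup table contains only the one code seen here: 0xC0000409 is STATUS_STACK_BUFFER_OVERRUN, which is also the code used for fail-fast terminations, so it does not by itself prove a buffer overrun.

```python
KNOWN_CODES = {
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",  # also raised by fail-fast exits
}


def parse_event(text: str) -> dict:
    """Split 'Key: value' lines from an Application Error event into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_exception(fields: dict) -> str:
    """Return the symbolic name of the exception code if it is known."""
    code = int(fields["Exception code"], 16)
    return KNOWN_CODES.get(code, hex(code))


if __name__ == "__main__":
    sample = (
        "Faulting application name: nvcplui.exe, version: 8.1.940.0, time stamp: 0x61b5030e\n"
        "Exception code: 0xc0000409\n"
    )
    print(describe_exception(parse_event(sample)))
```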

Opening the dump file in Visual Studio to see what’s going on is not helpful because there are no symbols available for the NVIDIA binaries. The NVIDIA Driver Symbol Server even says that it does not have PDBs (even though that’s for drivers) so this is not an optimistic path. The trimmed callstack of the main thread from the dump is shown below though. The paths to the NVIDIA binaries are C:\Program Files\WindowsApps\NVIDIACorp.NVIDIAControlPanel_8.1.962.0_x64__56jybvy8sckqj\nvcplui.exe and C:\Windows\System32\DriverStore\FileRepository\nvmsoui.inf_amd64_8fd9664c41d93f19\nvgames.dll

>	nvcplui.exe!00007ff756d547f5	Unknown
 	nvcplui.exe!00007ff756d529c7	Unknown
 	nvcplui.exe!00007ff756d09f57	Unknown
 	KERNELBASE.dll!UnhandledExceptionFilter	C
 	[Inline Frame] ntdll.dll!RtlpThreadExceptionFilter	C
...
 	ntdll.dll!RtlRaiseException	C
 	[External Code]	
 	nvgames.dll!00007ffd372ba7d2	Unknown
...
 	nvgames.dll!00007ffd36ffd59f	Unknown
 	combase.dll!???::CreateInstance	C++
...
 	[Inline Frame] combase.dll!CoCreateInstanceEx	C++
 	combase.dll!CoCreateInstance	C++
 	nvcplui.exe!00007ff756afdf63	Unknown
...
 	nvcplui.exe!00007ff756d08f63	Unknown
 	kernel32.dll!BaseThreadInitThunk	C
 	ntdll.dll!RtlUserThreadStart	C

Launching it again errors with a dialog claiming that an NVIDIA graphics card was not detected in my system. Check out the language too…

Sure enough, device manager no longer shows the GTX 1060 in the list of display adapters.

Rebooting restores the GTX 1060 but doesn’t address the crash in the NVIDIA Control Panel so I decide to move to my workstation and everything is much smoother there. The new Visual Studio CUDA project runs to completion so I turn my attention back to the CUDA installer to work on resolving the Surface Book 2 issues. The first thing I notice is that the installer is not keyboard accessible, so here’s a detour…

NVIDIA Installer Accessibility Issues

Is the NVIDIA Installer narrator-friendly? Narrator informs me that there are new natural voices available so I install them (Microsoft Aria, Guy, and Jenny).

CUDA 11.6 Driver Components

Looks like narrator works with the installer. However, the installer cannot be used via keyboard alone due to these issues:

  1. You cannot TAB out of the NVIDIA software license agreement.
  2. Narrator doesn’t read the captions below the Express and Custom radio buttons on the Installation Options page.
  3. You cannot TAB into the components tree to select them via keyboard.
  4. Keyboard navigation works after clicking on a component but the focus goes back to the NEXT button after using ALT+TAB to switch to another program then back.
  5. Narrator reads the individual components, e.g. “NSight Systems, Selected” regardless of whether the checkbox is ticked or not. How does one know it’s a checkbox?
  6. The custom installation components columns are not resizable (Component, New Version, and Current Version). For example, what NVIDIA GeForce Experience compo
  7. Why isn’t it resizable?

A general usability issue: why do all the NVIDIA components need to be uninstalled individually instead of having an option to remove everything?

Outstanding Questions

  1. How do we figure out which component installed the NVIDIA Control Panel? One approach is to uninstall the existing components until the control panel binary from the dump file is deleted on disk. Removing NVIDIA NSight Systems 2022.4.2 removed the C:\Program Files\WindowsApps\NVIDIACorp.NVIDIAControlPanel_8.1.962.0_x64__56jybvy8sckqj\ directory. However, installing only this component in 11.6 did not bring back the NVIDIA control panel!
  2. The installer asks for a path to a temp directory to unpack setup files into. Could examining that folder help determine where the control panel is coming from?
  3. Was this installer generated by NSIS?

Resolution

I end up uninstalling all “nvidia” components on the Installed Apps page except NVIDIA Graphics Driver 461.40 then installing all components from CUDA 11.6. This finally has a working control panel!

NVIDIA Control Panel from CUDA 11.6 Installer

Surprisingly, this executable is in C:\Program Files\WindowsApps\NVIDIACorp.NVIDIAControlPanel_8.1.962.0_x64__56jybvy8sckqj, the same directory as 11.8! This must not have been the buggy component! Here is the version info for the 2 NVIDIA binaries in the earlier crash dump (nvgames.dll is now in C:\Windows\System32\DriverStore\FileRepository\nvmsoui.inf_amd64_ed4d74dfae95b5e6):

nvcplui.exe Properties

Visual Studio 2022 does not have the new CUDA project option though. However, changing the paths (in the .vcxproj) for my new project created using the 11.8 tools on my VS 2022 desktop makes the program work. Looks like I need to use 11.7 instead so I uninstall all the “nvidia” components except the NVIDIA Control Panel and the NVIDIA Graphics Driver 511.23 before installing 11.7. Thankfully, 11.7 works just fine!


Categories: Installers, Windows

Building NSIS

One of the show stoppers in Tracking Down Missing Headers in LLVM for Windows was the NSIS Internal compiler error #12345: error mmapping datablock to 33555089. This issue is more common than I would expect for an internal compiler error, judging from I get “Internal compiler error #12345” when compiling large installers – NSIS Forums (nsis-dev.github.io). Before engaging some other folks about this, I decide to first build a debuggable NSIS to get a sense of what is happening. This can be done by downloading the NSIS 3.08 source code and using tar with the -j flag to filter the archive through bzip2.

tar xjf nsis-3.08-src.tar.bz2
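On a machine without a tar that handles bzip2, the same extraction can be sketched with Python's standard tarfile module. extract_bz2 is a hypothetical helper, and the filter argument is only passed on Python versions that support extraction filters.

```python
import tarfile
from typing import List


def extract_bz2(archive: str, dest: str) -> List[str]:
    """Extract a .tar.bz2 archive into dest and return the member names."""
    with tarfile.open(archive, "r:bz2") as tar:
        # Newer Python versions support extraction filters; use the safe one if present.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
        return tar.getnames()


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        for name in extract_bz2(sys.argv[1], sys.argv[2]):
            print(name)
```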

Checking Out Sources from the Repo

Alternatively, subversion can be used to check out the source code. Been a while since I touched svn. Thankfully, we can use git-svn instead.

git svn clone https://svn.code.sf.net/p/nsis/code/ --stdlayout --prefix svn/

This command fails after about an hour, and git svn clone https://svn.code.sf.net/p/nsis/code/ times out after getting r960. Not sure why these folks aren’t on GitHub.

Building the Sources

NSIS: [r7368] /NSIS/branches/WIN64/INSTALL (sourceforge.net) lists SCons as a requirement. I had never heard of it before, so I’m relieved to discover that it is on GitHub at SCons/scons: SCons – a software construction tool (github.com) and is easy to install. Unfortunately, I did not actually want the --user option on my machine.

D:\dev\repos> python -m pip install --user scons
Collecting scons
  Downloading SCons-4.4.0-py3-none-any.whl (4.2 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 4.2/4.2 MB 11.7 MB/s eta 0:00:00
Requirement already satisfied: setuptools in c:\python310\lib\site-packages (from scons) (58.1.0)
Installing collected packages: scons
  WARNING: The scripts scons-configure-cache.exe, scons.exe and sconsign.exe are installed in '%APPDATA%\Python\Python310\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed scons-4.4.0

[notice] A new release of pip available: 22.2 -> 22.2.2
[notice] To update, run: python.exe -m pip install --upgrade pip

D:\dev\repos> python -m pip uninstall scons
Found existing installation: SCons 4.4.0
Uninstalling SCons-4.4.0:
  Would remove:
    %APPDATA%\python\python310\scripts\scons-configure-cache.exe
    %APPDATA%\python\python310\scripts\scons.exe
    %APPDATA%\python\python310\scripts\sconsign.exe
    %APPDATA%\python\python310\site-packages\scons-4.4.0.dist-info\*
    %APPDATA%\python\python310\site-packages\scons\*
    %APPDATA%\python\scons-time.1
    %APPDATA%\python\scons.1
    %APPDATA%\python\sconsign.1
Proceed (Y/n)? y
  Successfully uninstalled SCons-4.4.0

D:\dev\repos> python -m pip install scons
Collecting scons
  Using cached SCons-4.4.0-py3-none-any.whl (4.2 MB)
Requirement already satisfied: setuptools in c:\python310\lib\site-packages (from scons) (58.1.0)
Installing collected packages: scons
Successfully installed scons-4.4.0

D:\dev\repos> where scons
C:\Python310\Scripts\scons.exe

The next prerequisite is zlib. Instead of downloading binaries from the unsecured site linked to, I decided to build the zlib sources myself. I only built the 64-bit version but turns out they are serious about setting ZLIB_W32:

D:\dev\repos\nsis\nsis-3.08-src> scons UNICODE=yes
scons: Reading SConscript files ...
Mkdir("build\urelease\config")
WARNING: VER_PACKED not set, defaulting to 0x03007666!
Delete("nsis-29-Sep-2022.cvs")
Delete(".instdist")
Delete(".test")
Using Microsoft tools configuration (14.3)
Checking for memset requirement... yes
Checking for memcpy requirement... no
Checking for C library gdi32... yes
Checking for C library user32... yes
Checking for C library pthread... no
Checking for C library iconv... no
Checking for C library shlwapi... yes
Checking for C library oleaut32... yes
Checking for C library version... yes
Checking for C library shell32... yes
Checking for C library version... yes
Please specify folder of zlib for Win32 via ZLIB_W32

Copying the DLL is not sufficient. To see why the error below occurs, consult config.log.

...
Checking for C library zdll... no
Checking for C library z... no
zlib (win32) is missing!

For example, config.log ends with C:\dev\software\zlib\win32\zlib.h(34): fatal error C1083: Cannot open include file: 'zconf.h': No such file or directory because I copied only zlib.h. I notice in config.log that it’s trying to also link using zdll.lib. Fix this by running:

cd /d D:\dev\repos\zlib
copy zlib.h C:\dev\software\zlib\win32\
copy zconf.h C:\dev\software\zlib\win32\
copy contrib\vstudio\vc14\x86\ZlibDllRelease\zlibwapi.lib C:\dev\software\zlib\win32\zdll.lib
set ZLIB_W32=C:\dev\software\zlib\win32\

Compilation now fails due to unresolved external symbols:

link /nologo /nocoffgrpinfo /map /subsystem:console,5.01 /STACK:2097152 /OUT:build\urelease\makensis\makensis.exe /LIBPATH:C:\dev\software\zlib\win32 gdi32.lib user32.lib shlwapi.lib oleaut32.lib version.lib shell32.lib version.lib zdll.lib build\urelease\makensis\build.obj build\urelease\makensis\clzma.obj build\urelease\makensis\crc32.obj build\urelease\makensis\DialogTemplate.obj build\urelease\makensis\dirreader.obj build\urelease\makensis\fileform.obj build\urelease\makensis\growbuf.obj build\urelease\makensis\icon.obj build\urelease\makensis\lang.obj build\urelease\makensis\lineparse.obj build\urelease\makensis\makenssi.obj build\urelease\makensis\manifest.obj build\urelease\makensis\mmap.obj build\urelease\makensis\Plugins.obj build\urelease\makensis\ResourceEditor.obj build\urelease\makensis\ResourceVersionInfo.obj build\urelease\makensis\BinInterop.obj build\urelease\makensis\script.obj build\urelease\makensis\scriptpp.obj build\urelease\makensis\ShConstants.obj build\urelease\makensis\strlist.obj build\urelease\makensis\tokens.obj build\urelease\makensis\tstring.obj build\urelease\makensis\utf.obj build\urelease\makensis\util.obj build\urelease\makensis\winchar.obj build\urelease\makensis\writer.obj build\urelease\makensis\bzip2\blocksort.obj build\urelease\makensis\bzip2\bzlib.obj build\urelease\makensis\bzip2\compress.obj build\urelease\makensis\bzip2\huffman.obj build\urelease\makensis\7zip\7zGuids.obj build\urelease\makensis\7zip\7zip\Common\OutBuffer.obj build\urelease\makensis\7zip\7zip\Common\StreamUtils.obj build\urelease\makensis\7zip\7zip\Compress\LZ\LZInWindow.obj build\urelease\makensis\7zip\7zip\Compress\LZMA\LZMAEncoder.obj build\urelease\makensis\7zip\7zip\Compress\RangeCoder\RangeCoderBit.obj build\urelease\makensis\7zip\Common\Alloc.obj build\urelease\makensis\7zip\Common\CRC.obj
build.obj : error LNK2019: unresolved external symbol _deflate referenced in function "public: virtual int __thiscall CZlib::Compress(bool)" (?Compress@CZlib@@UAEH_N@Z)
build.obj : error LNK2019: unresolved external symbol _deflateEnd referenced in function "public: virtual int __thiscall CZlib::End(void)" (?End@CZlib@@UAEHXZ)
build.obj : error LNK2019: unresolved external symbol _deflateInit2_ referenced in function "public: virtual int __thiscall CZlib::Init(int,unsigned int)" (?Init@CZlib@@UAEHHI@Z)
build\urelease\makensis\makensis.exe : fatal error LNK1120: 3 unresolved externals
scons: *** [build\urelease\makensis\makensis.exe] Error 1120
scons: building terminated because of errors.

Run dumpbin /headers zlibwapi.lib to examine the symbols in the lib file. Each of the three missing symbols does appear, but in a slightly different decorated form. For the declaration ZEXTERN int ZEXPORT deflateEnd OF((z_streamp strm)); in zlib.h, we see the name decoration below. The @4 suffix looks like __stdcall, coming from the expansion of ZEXPORT in zconf.h.

  Version      : 0
  Machine      : 14C (x86)
  TimeDateStamp: 6336126F Thu Sep 29 15:47:27 2022
  SizeOfData   : 0000001B
  DLL name     : zlibwapi.dll
  Symbol name  : _deflateEnd@4
  Type         : code
  Name type    : ordinal
  Ordinal      : 6
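
To make the mismatch concrete, here is a minimal Python sketch of the 32-bit x86 MSVC decoration rules for C functions: __cdecl produces _name, while __stdcall produces _name@N, where N is the number of argument bytes. The helper names classify and decorate are my own, made up for illustration; they are not part of dumpbin, zlib, or NSIS.

```python
import re

# 32-bit x86 MSVC decoration for plain C functions:
#   __cdecl   -> _name
#   __stdcall -> _name@<total bytes of arguments>
STDCALL = re.compile(r"^_(?P<name>\w+)@(?P<bytes>\d+)$")
CDECL = re.compile(r"^_(?P<name>\w+)$")


def classify(symbol):
    """Return (convention, undecorated name, argument bytes or None)."""
    m = STDCALL.match(symbol)
    if m:
        return ("stdcall", m.group("name"), int(m.group("bytes")))
    m = CDECL.match(symbol)
    if m:
        return ("cdecl", m.group("name"), None)
    return ("unknown", symbol, None)


def decorate(name, convention, arg_bytes=0):
    """Build the decorated symbol a 32-bit linker would look for."""
    if convention == "cdecl":
        return "_" + name
    if convention == "stdcall":
        return "_{}@{}".format(name, arg_bytes)
    raise ValueError("unsupported convention: " + convention)


if __name__ == "__main__":
    wanted = "_deflateEnd"      # symbol named in the LNK2019 error
    exported = "_deflateEnd@4"  # symbol listed by dumpbin
    print("linker wants:", classify(wanted))
    print("lib exports: ", classify(exported))
```

The linker error asks for _deflateEnd (the cdecl form), while the import library only offers _deflateEnd@4 (the stdcall form for one 4-byte pointer argument), so the two never match.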

Just by chance, I CTRL+click on the deflateEnd method on the line int ret = deflateEnd(stream); in nsis-3.08-src/Source/czlib.h and it opens ZLIB.H in nsis-3.08-src/Source/zlib/. This file has been here the whole time, alongside the other header file I manually copied (and others that I might have needed to copy)! This header is directly included by Source\exehead\fileform.c, for example, so the build will fail if this folder is removed. (Is this a bug, though?)

In the NSIS sources, ZEXPORT is defined without a value, so the compiler's default calling convention (_cdecl) applies. The link error is therefore caused by the use of _cdecl in the NSIS sources and __stdcall in the zlib source code I built. I end up changing the latter and rebuilding, since changing the former doesn’t seem to fix the build error and I don’t have time to investigate that. More specifically, I change line 355 of zconf.h to define ZEXPORT as _cdecl. Now the build succeeds and these commands create an installation:

scons UNICODE=yes
scons PREFIX="D:\dev\repos\nsis\local-install" install

I can run D:\dev\repos\nsis\local-install\makensisw.exe once but it is then blocked by Windows Defender. I guess I’ll have to review Troubleshoot problems with attack surface reduction rules. To create a debug build, use this command line:

scons UNICODE=yes DEBUG=yes
scons DEBUG=yes PREFIX="D:\dev\repos\nsis\local-install-debug" install

According to Enable attack surface reduction (ASR) rules, this command in an admin powershell should do the trick:

Add-MpPreference -AttackSurfaceReductionOnlyExclusions "D:\dev\repos\nsis\local-install\makensisw.exe"

Add-MpPreference -AttackSurfaceReductionOnlyExclusions "D:\dev\repos\nsis\local-install\makensis.exe"

Categories: 3D Modeling

Learning about Blender

I’ve had an interest in 3D modeling since my high school days. This was most likely informed by my curiosity about how computer games and animations are made. I recently downloaded Blender to start toying with it and hopefully teach my kids and myself some animation skills. I settled on Blender (instead of 3DS Max, which was the first such product I used) because it is free. There are also some decent Blender tutorials on YouTube. Here’s the channel I started watching:

Building the Source Code

I decided to dig into the sources and see how easy it is to build Blender on Windows. Thankfully, there are detailed instructions – Building Blender/Windows – Blender Developer Wiki. The Subversion client is the only prerequisite I don’t have installed on my desktop. Weird that they zipped the MSI for a 3% compression ratio (saving 211 KB on a 7232 KB MSI).

17:19:55.47 D:\dev\repos\other\blender> make update
Warning: Python not found, there is likely an issue with the library folder
No explicit msvc version requested, autodetecting version.
**********************************************************************
** Visual Studio 2019 Developer Command Prompt v16.11.19
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
Compiler Detection successful, detected VS2019

The required external libraries in "D:\dev\repos\other\blender\..\lib\win64_vc15" are missing

Would you like to download them? (y/n)y

Downloading win64_vc15 libraries, please wait.

A    D:\dev\repos\other\lib\win64_vc15\openpgl
A    D:\dev\repos\other\lib\win64_vc15\openpgl\lib
A    D:\dev\repos\other\lib\win64_vc15\openpgl\lib\cmake
A    D:\dev\repos\other\lib\win64_vc15\openpgl\lib\cmake\openpgl-0.3.1
A    D:\dev\repos\other\lib\win64_vc15\openpgl\include
...
A    D:\dev\repos\other\lib\win64_vc15\vulkan\share\vulkan\registry\vkconventions.py
A    D:\dev\repos\other\lib\win64_vc15\vulkan\share\vulkan\registry\validusage.json
A    D:\dev\repos\other\lib\win64_vc15\wintab\include\wintab.h
 U   D:\dev\repos\other\lib\win64_vc15
Checked out revision 63049.
python not found, required for this operation

19:10:47.57 D:\dev\repos\other\blender> 

Here’s the command line used to download the libraries:

"C:\Program Files\SlikSvn\bin\svn.exe"  checkout https://svn.blender.org/svnroot/bf-blender/trunk/lib/win64_vc15 "D:\dev\repos\other\blender\..\lib\win64_vc15"

Run make update again: the first run failed because python was not found, but it has since been checked out into the lib folder. Once that completes, run make to build Blender. Interestingly, the build fails because it can’t find CMake, even though I was told to open a plain cmd prompt. I work around this by switching to the VS 2019 Developer Command Prompt instead of updating my PATH, and that unblocks the build.
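
A quick pre-flight check would have caught the missing CMake before the build started. The sketch below is my own illustration, not part of Blender's make.bat: it uses Python's shutil.which to see which tools resolve on the current PATH. The list of tool names (git, svn, cmake) is an assumption based on what this build needed on my machine.

```python
import shutil


def find_tools(names):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    # Assumed tool list for illustration; adjust to match the build instructions.
    required = ["git", "svn", "cmake"]
    for name, path in find_tools(required).items():
        print("{:<6} {}".format(name, path or "NOT FOUND"))
```

Running it from a plain cmd prompt and again from the Developer Command Prompt shows which environment actually has CMake on its PATH.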

21:08:42.26 D:\dev\repos\other\blender> make

No explicit msvc version requested, autodetecting version.
**********************************************************************
** Visual Studio 2019 Developer Command Prompt v16.11.19
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
Compiler Detection successful, detected VS2019
Building blender with VS2019 for x64 in D:\dev\repos\other\blender\..\build_windows_x64_vc16_Release
-- Selecting Windows SDK version 10.0.22572.0 to target Windows 10.0.22621.
-- The C compiler identification is MSVC 19.29.30146.0
-- The CXX compiler identification is MSVC 19.29.30146.0
...
  -- Installing: D:/dev/repos/other/build_windows_x64_vc16_Release/bin/Release/3.4/datafiles/usd/usdVolImaging/resources
  -- Installing: D:/dev/repos/other/build_windows_x64_vc16_Release/bin/Release/3.4/datafiles/usd/usdVolImaging/resources/plugInfo.json
21:35:47.14 D:\dev\repos\other\blender>

This is a really smooth experience (compared to, ahem, zlib). I’m amazed it built and generated a local install folder in less than half an hour. I could launch build_windows_x64_vc16_Release\bin\Release\blender.exe, open the About Blender menu and see version 3.4.0 Alpha from hash 206dead86058. The release notes – Reference/Release Notes/3.4 – Blender Developer Wiki – are also quite useful, especially the Developer Intro!
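
The durations can be read straight off the timestamps in my cmd prompt. Here is a small Python sketch that does the subtraction. The elapsed helper is my own, and it assumes both timestamps fall on the same day, or span at most one midnight.

```python
from datetime import datetime, timedelta


def elapsed(start, end):
    """Return the time between two HH:MM:SS.cc prompt timestamps.

    Assumes the same day unless end is earlier than start, in which
    case a single midnight rollover is assumed.
    """
    fmt = "%H:%M:%S.%f"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)
    return t1 - t0


if __name__ == "__main__":
    # Timestamps copied from the prompts in the logs above.
    print("make update:", elapsed("17:19:55.47", "19:10:47.57"))
    print("make:       ", elapsed("21:08:42.26", "21:35:47.14"))
```

This puts the library checkout at roughly 1 hour 51 minutes (including however long I left the y/n prompt waiting) and the build itself at about 27 minutes.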