Building OpenJDK with Custom Code Pages

I was recently browsing the issue navigator of the Java Bug System (openjdk.org) for enhancements and found this interesting issue: [JDK-8268719] Force execution (and source) code page used when compiling on Windows. By default, I can build the OpenJDK code without any changes on my system, so which code page am I using?

Checking Your Windows Code Page

See Code Pages – Win32 apps for an overview of why code pages exist (or start from Unicode and Character Sets – Win32 apps for the complete picture).

A Windows operating system always has one currently active Windows code page. All ANSI versions of API functions use the currently active code page.

Code Pages – Win32 apps | Microsoft Learn

To see your current ANSI code page, use the reg command from the command line (see How to see which ANSI code page is used in Windows? on Stack Overflow):

C:\> reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" -v ACP

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
    ACP    REG_SZ    1252

C:\> reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" | findstr /I "CP.*REG_SZ"
    ACP    REG_SZ    1252
    OEMCP    REG_SZ    437
    MACCP    REG_SZ    10000
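
If a build script needs these values, the same output can be parsed rather than read by eye. The following is a minimal sketch, assuming a POSIX shell such as the Cygwin one used later in this post. parse_codepages is a hypothetical helper name, and the sample input is the output shown above rather than a live registry query.

```shell
# parse_codepages (hypothetical helper): read `reg query` output on stdin
# and print one NAME=VALUE line per code page entry (ACP, OEMCP, MACCP).
parse_codepages() {
  # Strip carriage returns first, since Windows tools emit CRLF line endings.
  tr -d '\r' | awk '$2 == "REG_SZ" && $1 ~ /CP$/ { print $1 "=" $3 }'
}

# Sample input copied from the output above; on Windows, pipe the real
# `reg query` output into the function instead.
printf '    ACP    REG_SZ    1252\n    OEMCP    REG_SZ    437\n    MACCP    REG_SZ    10000\n' |
  parse_codepages
```

A script could compare the ACP line against an expected value and print a warning before starting a long build.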

To change the active code page, go to Control Panel > Region. Click on the “Change system locale…” button in the Administrative tab.

The Region Dialog Box

The Region Settings dialog will pop up. Select a different locale, e.g. Japanese (Japan).

Reboot when prompted. You can verify (even before rebooting) that the active and OEM code pages have changed. Locales like Kiswahili (Kenya) and English (India) did not change the code page values (and therefore didn’t prompt to reboot).

C:\> reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" | findstr /I "CP.*REG_SZ"
    ACP    REG_SZ    932
    OEMCP    REG_SZ    932
    MACCP    REG_SZ    10001
Change System Locale Reboot Dialog

After rebooting, I deleted the build directory, then configured and built OpenJDK again. This time the build failed with these errors:

ERROR: Build failed for target 'images' in configuration 'windows-x86_64-server-slowdebug' (exit code 2) 
Stopping javac server

=== Output from failing command(s) repeated here ===
* For target hotspot_variant-server_libjvm_gtest_objs_test_json.obj:
test_json.cpp
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(357): error C2143: syntax error: missing ')' before ']'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(355): error C2660: 'JSON_GTest::test': function does not take 1 arguments
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(49): note: see declaration of 'JSON_GTest::test'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(355): note: while trying to match the argument list '(const char [171])'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(357): error C2143: syntax error: missing ';' before ']'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(357): error C2059: syntax error: ']'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(357): error C2017: illegal escape sequence
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(357): error C2059: syntax error: ')'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(363): error C2143: syntax error: missing ')' before ']'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(361): error C2660: 'JSON_GTest::test': function does not take 1 arguments
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(49): note: see declaration of 'JSON_GTest::test'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(361): note: while trying to match the argument list '(const char [174])'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(363): error C2143: syntax error: missing ';' before ']'
d:\java\forks\jdk\test\hotspot\gtest\utilities\test_json.cpp(363): error C2059: syntax error: ']'
   ... (rest of output omitted)

* All command lines available in /cygdrive/d/java/forks/jdk/build/windows-x86_64-server-slowdebug/make-support/failure-logs.
=== End of repeated output ===

No indication of failed target found.
HELP: Try searching the build log for '] Error'.
HELP: Run 'make doctor' to diagnose build problems.

The full command line of the failing compilation is saved in hotspot_variant-server_libjvm_gtest_objs_test_json.obj.cmdline in the failure-logs directory. Use cat to view it:

cat /d/java/forks/jdk/build/windows-x86_64-server-slowdebug/make-support/failure-logs/hotspot_variant-server_libjvm_gtest_objs_test_json.obj.cmdline
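
MSVC accepts the /utf-8 and /source-charset options (also spelled with a leading dash) to set the source character set, so it can be useful to check whether a failing command line already carries one of them. Below is a minimal sketch, assuming a POSIX shell. charset_flags is a hypothetical helper, and the sample command line is made up for illustration rather than copied from the real .cmdline file.

```shell
# charset_flags (hypothetical helper): print any source-charset options found
# in a saved command line file, or "none" if there are none.
charset_flags() {
  # "|| true" keeps the function usable in scripts that run with
  # "set -e -o pipefail", because grep exits non-zero when nothing matches.
  flags=$(grep -E -o '(^|[[:space:]])[-/](utf-8|source-charset:[^[:space:]]+)' "$1" |
    sed 's/^[[:space:]]*//' || true)
  if [ -n "$flags" ]; then
    echo "$flags"
  else
    echo "none"
  fi
}

# Made-up sample command line; the real file lives in the failure-logs directory.
sample=$(mktemp)
echo 'cl.exe -nologo -MD -Zc:inline -c test_json.cpp' > "$sample"
charset_flags "$sample"
rm -f "$sample"
```

A result of "none" means the compiler falls back to BOM detection and the current code page when reading the source file.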

The Visual C++ compiler’s behavior when reading source files depends on whether or not source files have a byte-order mark.

By default, Visual Studio detects a byte-order mark to determine if the source file is in an encoded Unicode format, for example, UTF-16 or UTF-8. If no byte-order mark is found, it assumes that the source file is encoded in the current user code page, unless you’ve specified a code page by using /utf-8 or the /source-charset option.

/utf-8 (Set source and execution character sets to UTF-8)

This can be easily tested using hexdump in Cygwin. Launch Notepad and open the test.txt file created by these commands. The File > Save as dialog has an Encoding dropdown that writes a byte-order mark for any of the UTF options. Running hexdump on the saved file will display the byte-order mark.

echo abc123 > test.txt
hexdump -C test.txt
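
The same check can be scripted instead of read off the hexdump output. This is a minimal sketch, assuming a POSIX shell with head, od, and tr available (as in Cygwin). detect_bom is a hypothetical helper name, and the sample files are created by printf rather than by Notepad.

```shell
# detect_bom (hypothetical helper): report which byte-order mark a file starts with.
# UTF-32 marks are not distinguished in this sketch.
detect_bom() {
  # First three bytes as lowercase hex with no separators, e.g. "efbbbf".
  head_hex=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')
  case "$head_hex" in
    efbbbf*) echo "UTF-8" ;;
    fffe*)   echo "UTF-16LE" ;;
    feff*)   echo "UTF-16BE" ;;
    *)       echo "none" ;;
  esac
}

dir=$(mktemp -d)
printf 'abc123\n' > "$dir/plain.txt"             # no byte-order mark
printf '\357\273\277abc123\n' > "$dir/bom.txt"   # bytes EF BB BF, then the same text
detect_bom "$dir/plain.txt"
detect_bom "$dir/bom.txt"
rm -rf "$dir"
```

The first call reports no mark and the second reports the UTF-8 mark, matching what hexdump shows for a file saved with and without a BOM.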

Inspecting the OpenJDK source file that fails to build confirms that there is no BOM in the file. (Can this be done on GitHub?)

$ hexdump -C /cygdrive/d/java/forks/jdk/test/hotspot/gtest/utilities/test_json.cpp | head
00000000  2f 2a 0a 20 2a 20 43 6f  70 79 72 69 67 68 74 20  |/*. * Copyright |
...
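
A file without a BOM is only open to misinterpretation where it contains bytes outside 7-bit ASCII, so it helps to list the lines that do. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -a (GNU and BSD grep both do). find_non_ascii is a hypothetical helper, and the sample file is made up for illustration; it is not the content of test_json.cpp.

```shell
# find_non_ascii (hypothetical helper): print "line:text" for every line that
# contains a byte outside printable ASCII and whitespace.
find_non_ascii() {
  # LC_ALL=C makes grep match byte by byte; -a keeps it from treating the file as binary.
  LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Made-up two-line sample; line 2 holds a UTF-8 encoded e-acute (bytes C3 A9).
sample=$(mktemp)
printf 'plain ascii line\ncaf\303\251\n' > "$sample"
find_non_ascii "$sample"
rm -f "$sample"
```

Pointing the function at a failing source file shows which lines the compiler may decode differently under another code page.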

Updating CFLAGS

Add the -utf-8 option to TOOLCHAIN_CFLAGS_JVM in flags-cflags.m4.

diff --git a/make/autoconf/flags-cflags.m4 b/make/autoconf/flags-cflags.m4
index c0c78ce95b6..bbb0426c368 100644
--- a/make/autoconf/flags-cflags.m4
+++ b/make/autoconf/flags-cflags.m4
@@ -560,7 +560,9 @@ AC_DEFUN([FLAGS_SETUP_CFLAGS_HELPER],
     TOOLCHAIN_CFLAGS_JVM="-qtbtable=full -qtune=balanced -fno-exceptions \
         -qalias=noansi -qstrict -qtls=default -qnortti -qnoeh -qignerrno -qstackprotect"
   elif test "x$TOOLCHAIN_TYPE" = xmicrosoft; then
-    TOOLCHAIN_CFLAGS_JVM="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -MP"
+    # The -utf-8 option sets source and execution character sets to UTF-8 to enable correct
+    # compilation of all source files regardless of the active code page on Windows.
+    TOOLCHAIN_CFLAGS_JVM="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -MP -utf-8"
     TOOLCHAIN_CFLAGS_JDK="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -Zc:wchar_t-"
   fi

The build still fails, but this time the error comes from the java.desktop tree.

ERROR: Build failed for target 'images' in configuration 'windows-x86_64-server-slowdebug' (exit code 2) 

=== Output from failing command(s) repeated here ===
* For target support_native_java.desktop_libfreetype_afblue.obj:
afblue.c
d:\java\forks\jdk\src\java.desktop\share\native\libfreetype\src\autofit\afblue.c(1): error C2220: the following warning is treated as an error
d:\java\forks\jdk\src\java.desktop\share\native\libfreetype\src\autofit\afblue.c(1): warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss
d:\java\forks\jdk\src\java.desktop\share\native\libfreetype\src\autofit\afscript.h(1): warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss
d:\java\forks\jdk\src\java.desktop\share\native\libfreetype\src\autofit\afblue.c(257): warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss
   ... (rest of output omitted)
* For target support_native_java.desktop_libfreetype_afcjk.obj:
afcjk.c
...

The full command line of the failing compilation is saved in support_native_java.desktop_libfreetype_afblue.obj.cmdline in the failure-logs directory. Use cat to view it:

cat /d/java/forks/jdk/build/windows-x86_64-server-slowdebug/make-support/failure-logs/support_native_java.desktop_libfreetype_afblue.obj.cmdline

TOOLCHAIN_CFLAGS_JDK in flags-cflags.m4 needs the -utf-8 compiler flag as well.

diff --git a/make/autoconf/flags-cflags.m4 b/make/autoconf/flags-cflags.m4
index c0c78ce95b6..8655dfe41fb 100644
--- a/make/autoconf/flags-cflags.m4
+++ b/make/autoconf/flags-cflags.m4
@@ -560,8 +560,10 @@ AC_DEFUN([FLAGS_SETUP_CFLAGS_HELPER],
     TOOLCHAIN_CFLAGS_JVM="-qtbtable=full -qtune=balanced -fno-exceptions \
         -qalias=noansi -qstrict -qtls=default -qnortti -qnoeh -qignerrno -qstackprotect"
   elif test "x$TOOLCHAIN_TYPE" = xmicrosoft; then
-    TOOLCHAIN_CFLAGS_JVM="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -MP"
-    TOOLCHAIN_CFLAGS_JDK="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -Zc:wchar_t-"
+    # The -utf-8 option sets source and execution character sets to UTF-8 to enable correct
+    # compilation of all source files regardless of the active code page on Windows.
+    TOOLCHAIN_CFLAGS_JVM="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -utf-8 -MP"
+    TOOLCHAIN_CFLAGS_JDK="-nologo -MD -Zc:preprocessor -Zc:strictStrings -Zc:inline -utf-8 -Zc:wchar_t-"
   fi

   # CFLAGS C language level for JDK sources (hotspot only uses C++)

These two changes enable the build to complete successfully. The upstream pull request is 8268719: Force execution (and source) code page used when compiling on Windows by swesonga · Pull Request #15569 · openjdk/jdk (github.com).