Diagnosing Hadoop Native Library Load Failures

Running a Basic Hadoop Command

The instructions for running Hadoop haven’t changed much since I last used it over 5 years ago (see Setting up Apache Hadoop). Download a recent stable release from one of the Apache Download Mirrors. I picked hadoop-3.3.5-aarch64.tar.gz from https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/.

mkdir -p ~/java/binaries/hadoop
cd ~/java/binaries/hadoop

curl -Lo hadoop-3.3.5-aarch64.tar.gz https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5-aarch64.tar.gz

tar xzf hadoop-3.3.5-aarch64.tar.gz

I used the instructions at Apache Hadoop 3.3.5 – Hadoop: Setting up a Single Node Cluster to test the build by running the grep example. See the Grep source code for the implementation details of the example.

export JAVA_HOME=~/java/binaries/jdk/x64/jdk-11.0.19+7/

mkdir testinput
cp etc/hadoop/*.xml testinput

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar grep testinput testoutput 'dfs[a-z.]+'

cat testoutput/*

When running this test code, I noticed this warning (first message displayed):

2023-05-31 12:31:33,686 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Checking for Loadable Native Libraries

The Apache Hadoop 3.3.5 – Native Libraries Guide explains that there is a NativeLibraryChecker that can be run using the command bin/hadoop checknative -a to show which native libraries can/cannot be loaded.

saint@ubuntuvm:~/java/binaries/hadoop/hadoop-3.3.5$ find . -name lib*.so
./lib/native/libhadoop.so
./lib/native/libhdfspp.so
./lib/native/libhdfs.so
./lib/native/libnativetask.so
saint@ubuntuvm:~/java/binaries/hadoop/hadoop-3.3.5$ uname -a
Linux ubuntuvm 5.19.0-41-generic #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
saint@ubuntuvm:~/java/binaries/hadoop/hadoop-3.3.5$ bin/hadoop checknative -a
2023-05-31 13:36:04,467 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop:  false 
zlib:    false 
zstd  :  false 
bzip2:   false 
openssl: false 
ISA-L:   false 
PMDK:    false 
2023-05-31 13:36:04,711 INFO util.ExitUtil: Exiting with status 1: ExitException
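A small filter can reduce this report to just the libraries that failed to load. The following is only a sketch: failed_native_libs is a made-up helper name (not a Hadoop command), and it assumes the "name: true/false" layout shown in the output above.

```shell
# List the native libraries that checknative reports as "false".
# Reads the output of `bin/hadoop checknative -a` on stdin.
# failed_native_libs is a hypothetical helper, not part of Hadoop.
failed_native_libs() {
  # Split on ":"; report lines have true/false right after the first colon,
  # while timestamped log lines have the minutes field there instead.
  awk -F: '$2 ~ /^ *false/ { gsub(/ /, "", $1); print $1 }'
}

# Usage (from the Hadoop directory):
#   bin/hadoop checknative -a 2>&1 | failed_native_libs
```

Log lines are skipped because their second colon-separated field is part of the timestamp rather than true or false.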

Diagnosing Native Library Load Errors

When I saw that none of these native libraries could be loaded, I assumed that I needed to install all of those dependencies. I started with zlib, via the lib64z1 package.

saint@ubuntuvm:~/java/binaries/hadoop/hadoop-3.3.5$ sudo apt install lib64z1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  gcc-12-base:i386 krb5-locales libc6:i386 libc6-amd64:i386 libcom-err2:i386 libcrypt1:i386
  libgcc-s1:i386 libgssapi-krb5-2 libgssapi-krb5-2:i386 libidn2-0:i386 libk5crypto3 libk5crypto3:i386
  libkeyutils1:i386 libkrb5-3 libkrb5-3:i386 libkrb5support0 libkrb5support0:i386 libnsl2:i386
  libnss-nis:i386 libnss-nisplus:i386 libssl3 libssl3:i386 libtirpc3:i386 libunistring2:i386
Suggested packages:
  glibc-doc:i386 locales:i386 krb5-doc krb5-user krb5-doc:i386 krb5-user:i386
The following NEW packages will be installed:
  gcc-12-base:i386 krb5-locales lib64z1:i386 libc6:i386 libc6-amd64:i386 libcom-err2:i386
  libcrypt1:i386 libgcc-s1:i386 libgssapi-krb5-2:i386 libidn2-0:i386 libk5crypto3:i386
  libkeyutils1:i386 libkrb5-3:i386 libkrb5support0:i386 libnsl2:i386 libnss-nis:i386
  libnss-nisplus:i386 libssl3:i386 libtirpc3:i386 libunistring2:i386
The following packages will be upgraded:
  libgssapi-krb5-2 libk5crypto3 libkrb5-3 libkrb5support0 libssl3
5 upgraded, 20 newly installed, 0 to remove and 85 not upgraded.
Need to get 10.3 MB/12.2 MB of archives.
After this operation, 38.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] 

Interestingly, rerunning checknative still showed false for all the native libraries! The next step was to inspect how the checknative argument is handled. It invokes the NativeLibraryChecker class (hadoop/NativeLibraryChecker.java), which in turn calls NativeCodeLoader (hadoop/NativeCodeLoader.java). One of the most important observations in the latter file is that additional debug logging is available when the library doesn’t load!

Enabling Debug Logging

The logging code uses LoggerFactory, which is discussed in the Introduction to SLF4J | Baeldung. My question was now how to change slf4j level at runtime? – Stack Overflow. A Google search for hadoop change log level led me to another SO post on Setting the logging level in Hadoop to WARN – Stack Overflow, but that wasn’t as useful as the Hadoop commands guide for Apache Hadoop 2.7.0. I just needed to pass the --loglevel flag to hadoop.

bin/hadoop --loglevel DEBUG checknative -a

The debug output is now much more informative! Notice the warning about the possible platform mismatch of the native library!

saint@ubuntuvm:~/java/binaries/hadoop/hadoop-3.3.5$ bin/hadoop --loglevel DEBUG checknative -a
2023-05-31 14:47:32,624 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
2023-05-31 14:47:32,625 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: /home/saint/java/binaries/hadoop/hadoop-3.3.5/lib/native/libhadoop.so.1.0.0: /home/saint/java/binaries/hadoop/hadoop-3.3.5/lib/native/libhadoop.so.1.0.0: cannot open shared object file: No such file or directory (Possible cause: can't load AARCH64-bit .so on a AMD 64-bit platform)
2023-05-31 14:47:32,625 DEBUG util.NativeCodeLoader: java.library.path=/home/saint/java/binaries/hadoop/hadoop-3.3.5/lib/native
2023-05-31 14:47:32,625 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-05-31 14:47:32,836 DEBUG util.Shell: setsid exited with exit code 0
Native library checking:
hadoop:  false 
zlib:    false 
zstd  :  false 
bzip2:   false 
openssl: false 
ISA-L:   false 
PMDK:    false 
2023-05-31 14:47:32,847 DEBUG util.ExitUtil: Exiting with status 1: ExitException
1: ExitException
	at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:381)
	at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:369)
	at org.apache.hadoop.util.NativeLibraryChecker.main(NativeLibraryChecker.java:154)
2023-05-31 14:47:32,856 INFO util.ExitUtil: Exiting with status 1: ExitException

To determine the architecture for which the shared library was compiled, I started with the objdump -f command as suggested by a StackOverflow post. However, it outputs architecture: UNKNOWN!, which isn’t very useful. The file command from the same post proves to be exactly what I need.

saint@ubuntuvm:~/java/binaries/hadoop/aarch64/hadoop-3.3.5$ objdump -f lib/native/libhadoop.so

lib/native/libhadoop.so:     file format elf64-little
architecture: UNKNOWN!, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000005b80

saint@ubuntuvm:~/java/binaries/hadoop/aarch64/hadoop-3.3.5$ file lib/native/libhadoop.so
lib/native/libhadoop.so: symbolic link to libhadoop.so.1.0.0
saint@ubuntuvm:~/java/binaries/hadoop/aarch64/hadoop-3.3.5$ file lib/native/libhadoop.so.1.0.0
lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=19fbe9b0a7449eb05b687721548251af752b869f, with debug_info, not stripped
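The same information can be read straight from the ELF header when file isn't available. This is a sketch under a few assumptions: elf_machine is a made-up helper name, the file is a little-endian ELF (as these libraries are), and only the two machine codes relevant to this post (62 for x86-64, 183 for aarch64) are recognized.

```shell
# Print the architecture recorded in an ELF file's header.
# The e_machine field is a 16-bit value at byte offset 18; this sketch
# assumes a little-endian ELF file. elf_machine is a hypothetical helper.
elf_machine() {
  # Read the two e_machine bytes as unsigned decimals, e.g. "183 0".
  set -- $(od -An -tu1 -j18 -N2 "$1")
  elf_code=$(( ${1:-0} + 256 * ${2:-0} ))
  case $elf_code in
    62)  echo "x86-64" ;;
    183) echo "aarch64" ;;
    *)   echo "unknown ($elf_code)" ;;
  esac
}

# Usage:
#   elf_machine lib/native/libhadoop.so.1.0.0
```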

It turns out I was using an x86-64 Ubuntu VM instead of the aarch64 Ubuntu VM I had created, so naturally Hadoop couldn’t load the aarch64 native library! For the VM I had been using, I needed to get the matching Hadoop build by running:

curl -Lo hadoop-3.3.5.tar.gz https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
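To avoid repeating this mistake, the tarball name can be derived from the machine architecture instead of being typed by hand. A minimal sketch, assuming only the two 3.3.5 tarballs used in this post; hadoop_tarball_for_arch is a made-up helper name.

```shell
# Pick the Hadoop 3.3.5 tarball that matches a machine architecture.
# hadoop_tarball_for_arch is a hypothetical helper; only the two
# tarball names used in this post are covered.
hadoop_tarball_for_arch() {
  case "$1" in
    x86_64)        echo "hadoop-3.3.5.tar.gz" ;;
    aarch64|arm64) echo "hadoop-3.3.5-aarch64.tar.gz" ;;
    *)             echo "no known tarball for architecture: $1" >&2; return 1 ;;
  esac
}

# Usage: download the build for the machine running the command.
#   tarball=$(hadoop_tarball_for_arch "$(uname -m)")
#   curl -Lo "$tarball" "https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/$tarball"
```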

Checking the loading status of the native libraries now indicates that the hadoop native library can be successfully loaded:

saint@ubuntuvm:~/java/binaries/hadoop/x64/hadoop-3.3.5$ bin/hadoop checknative -a
2023-05-31 14:58:40,869 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2023-05-31 14:58:40,877 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2023-05-31 14:58:40,887 WARN erasurecode.ErasureCodeNative: Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
2023-05-31 14:58:40,887 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2023-05-31 14:58:41,035 INFO nativeio.NativeIO: The native code was built without PMDK support.
Native library checking:
hadoop:  true /home/saint/java/binaries/hadoop/x64/hadoop-3.3.5/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
zstd  :  true /lib/x86_64-linux-gnu/libzstd.so.1
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
ISA-L:   false Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
PMDK:    false The native code was built without PMDK support.
2023-05-31 14:58:41,056 INFO util.ExitUtil: Exiting with status 1: ExitException

Switching to the aarch64 Ubuntu VM also showed the aarch64 hadoop native library being successfully loaded on that platform. In hindsight, the i386 architecture references when I installed lib64z1 could have been a warning sign if I hadn’t been just blasting my way through these commands.


Categories: Security

Limiting SSH Identifier File Permissions

I recently needed to use SSH keys to connect to a Linux Azure VM from my primary development machine, a Windows desktop. OpenSSH has been available on Windows since 2018 as per this OpenSSH for Windows overview. I downloaded my private key for the Azure VM to a file called my_key.pem. Just to be sure I knew which executable would run when I launched ssh, I used this command line.

C:\> where ssh
C:\Windows\System32\OpenSSH\ssh.exe

I then passed the -i my_key.pem option to ssh when connecting to the VM.

ssh -J user1@ipaddress1 -i my_key.pem user2@ipaddress2

It was then that I discovered that ssh checks the file permissions on Windows and considers them too open by default. This is the error I got:

Bad permissions. Try removing permissions for user: BUILTIN\\Users (S-1-5-32-545) on file C:/.../my_key.pem.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions for 'my_key.pem' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "my_key.pem": bad permissions
someuser@0.0.0.0: Permission denied (publickey).

Security identifiers on Windows are well documented. The one named in the error is BUILTIN\\Users (S-1-5-32-545).

A security identifier is used to uniquely identify a security principal or security group. Security principals can represent any entity that can be authenticated by the operating system, such as a user account, a computer account, or a thread or process that runs in the security context of a user or computer account.

Security identifiers
S-1-5-32-545 (Users): A built-in group. After the initial installation of the operating system, the only member is the Authenticated Users group.
Description of the “Users” Group
Security identifiers

Fortunately, someone else ran into this way before me: amazon web services – Set pem file permissions for AWS without chmod on Windows – Stack Overflow. To see the current file permissions, run icacls without any additional flags.

C:\> icacls my_key.pem
my_key.pem BUILTIN\Administrators:(I)(F)
           NT AUTHORITY\SYSTEM:(I)(F)
           BUILTIN\Users:(I)(RX)
           NT AUTHORITY\Authenticated Users:(I)(M)

Successfully processed 1 files; Failed processing 0 files

The solution from that StackOverflow post is to run the commands below.

icacls my_key.pem /reset
icacls my_key.pem /grant:r %username%:(R)
icacls my_key.pem /inheritance:r

The icacls docs explain that /reset “Replaces ACLs with default inherited ACLs for all matching files.” That doesn’t change anything on my system. The /grant option adds my personal account to the list of accounts with permission to the file.

Grants specified user access rights. Permissions replace previously granted explicit permissions.

Not adding the :r, means that permissions are added to any previously granted explicit permissions.

icacls | Microsoft Learn

The /inheritance:r option removes the 4 security identifiers shown previously from the private key file’s DACL, since all of them were inherited entries (marked (I) in the icacls output). SSH is now happy to get the authentication identity from this private key file.


Categories: Profiling

Experimenting with perf on Linux

In the post on Experimenting with Async Profiler, I mentioned the basic (trial division) integer factorization app I wrote. I’ve been experimenting with perf to see what the system looks like when running this application. On Ubuntu, I started with this command:

perf record -F 97 -a -g -- sleep 10

Turns out perf isn’t installed by default.

WARNING: perf not found for kernel 5.19.0-41

  You may need to install the following packages for this specific kernel:
    linux-tools-5.19.0-41-generic
    linux-cloud-tools-5.19.0-41-generic

  You may also want to install one of the following packages to keep up to date:
    linux-tools-generic
    linux-cloud-tools-generic

Interestingly, running sudo apt install linux-tools-generic only picks up 5.15:

...
The following NEW packages will be installed:
  linux-tools-5.15.0-72 linux-tools-5.15.0-72-generic linux-tools-generic
...

Running which perf now shows /usr/bin/perf, but even perf -v fails with the above warning, so I have to install the package that matches the running kernel:

sudo apt install linux-tools-5.19.0-41-generic

...
The following NEW packages will be installed:
  linux-hwe-5.19-tools-5.19.0-41 linux-tools-5.19.0-41-generic
...
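The package name follows the running kernel release, so it can be built from uname -r rather than copied from the warning. A small sketch; perf_tools_package is a made-up helper name, and the linux-tools-<release> pattern is taken from the warning shown earlier.

```shell
# Build the name of the perf package that matches a kernel release.
# With no argument, the release of the running kernel is used.
# perf_tools_package is a hypothetical helper name.
perf_tools_package() {
  echo "linux-tools-${1:-$(uname -r)}"
}

# Usage: install perf for the kernel that is currently running.
#   sudo apt install "$(perf_tools_package)"
```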

Once that completes, perf can now run but perf version doesn’t display anything meaningful. Back to the original command:

perf record -F 97 -a -g -- sleep 10

This fails with an error about restricted access. Interesting reading but I just use sudo and carry on.

Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
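The thresholds in that message are cumulative, which is easy to misread. The sketch below spells out what each level blocks for unprivileged users, using only the three thresholds listed in the message. Levels above 2, such as the 4 reported here, are distribution-specific and are simply treated as "at least level 2". describe_perf_paranoid is a made-up helper name.

```shell
# Summarize what a perf_event_paranoid level blocks for unprivileged
# users, following the thresholds listed in the error message above.
# describe_perf_paranoid is a hypothetical helper name.
describe_perf_paranoid() {
  level=$1
  if [ "$level" -ge 2 ]; then
    echo "kernel profiling, CPU events, raw and ftrace tracepoints"
  elif [ "$level" -ge 1 ]; then
    echo "CPU events, raw and ftrace tracepoints"
  elif [ "$level" -ge 0 ]; then
    echo "raw and ftrace tracepoints"
  else
    echo "nothing"
  fi
}

# Usage: describe the current setting.
#   describe_perf_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)"
```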

Once the command completes, a perf.data file is created in the current directory. To generate a report, run this command. See the sample perf-report.txt file on GitHub.

perf report -n --stdio > perf-report.txt

To generate a flame graph, use Brendan Gregg’s scripts:

cd ~/repos
git clone https://github.com/brendangregg/FlameGraph

cd -

perf script --header > stacks.txt

~/repos/FlameGraph/stackcollapse-perf.pl < stacks.txt | ~/repos/FlameGraph/flamegraph.pl --hash > myflamegraph.svg

Categories: Java, Profiling

Experimenting with Async Profiler

I have been studying the performance of a simple Java application (for integer factorization) using async-profiler. The application’s source code is on GitHub.

async-profiler is a low overhead sampling profiler for Java that does not suffer from Safepoint bias problem.

async-profiler repo

There is also a 3-part talk about async-profiler demo-ing how it works and how to use it.

I downloaded the Linux x64 async-profiler build with these commands…

mkdir -p ~/java/binaries/async-profiler
cd ~/java/binaries/async-profiler

curl -Lo async-profiler-2.9-linux-x64.tar.gz https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz

tar xzf async-profiler-2.9-linux-x64.tar.gz

… then started the application with these:

# macos:
export JAVA_HOME=~/java/binaries/jdk/x64/jdk-17.0.7+7/Contents/Home

# Linux:
export JAVA_HOME=~/java/binaries/jdk/x64/jdk-17.0.7+7/

cd ~/repos/scratchpad/demos/java/FindPrimes/
$JAVA_HOME/bin/java Factorize 91278398257125987

Once the application is running, use the profiler.sh script to attach to the Java process and start profiling it. I was interested in wall clock profiling, which is specified using the -e wall argument (see Part 2: Improving Performance with Async-profiler by Andrei Pangin. – YouTube). The command line below profiles the Java application with a 5ms sampling interval (-i) for a duration (-d) of 10 seconds, profiles threads separately (-t), and writes the output to result.html (-f).

# macos:
cd ~/java/binaries/async-profiler-2.9-macos

# Linux:
cd ~/java/binaries/async-profiler/async-profiler-2.9-linux-x64

./profiler.sh -e wall -t -i 5ms -d 10 -f result.html jps

The jps argument above lets the profiler.sh script determine which Java process is running by calling jps (see The jps Command (oracle.com)). If there are multiple Java processes, run jps first to determine the process id of the one to be profiled, then explicitly pass that pid to profiler.sh, e.g.

jps
./profiler.sh -e wall -t -i 5ms -d 10 -f result.html 53361
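Looking up the pid by hand gets tedious when the application is restarted often. One possible shortcut is to filter the jps output by main class name. This is a sketch that assumes the default jps output of one "pid main-class" pair per line; pid_for_main_class is a made-up helper name.

```shell
# Print the pid of the Java process whose main class matches $1.
# Reads `jps` output (one "<pid> <main class>" pair per line) on stdin.
# pid_for_main_class is a hypothetical helper name.
pid_for_main_class() {
  awk -v cls="$1" '$2 == cls { print $1 }'
}

# Usage:
#   ./profiler.sh -e wall -t -i 5ms -d 10 -f result.html "$(jps | pid_for_main_class Factorize)"
```

If several processes share the same main class, the helper prints one pid per line, so the caller still has to pick one.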

Async-profiler can also be attached at application startup.

$JAVA_HOME/bin/java -agentpath:$HOME/java/binaries/async-profiler-2.9-macos/build/libasyncProfiler.dylib=start,event=wall,threads,file=out.html Factorize

Other Event Types

The event type we have used so far is the wall clock event. Other event types include cpu and lock. The latter mode is mentioned at 32:15 in Part 2: Improving Performance with Async-profiler by Andrei Pangin. – YouTube. Here’s a sample command:

~/java/binaries/async-profiler/async-profiler-2.9-linux-x64/profiler.sh -e lock -i 5ms -d 10 -f result.html jps

Output Formats

The profiling data can be written in various formats. Here is an example command line used to explore what the traces output looks like. See scratchpad/factorization-profile.sh · swesonga/scratchpad · GitHub for additional examples of output formats.

cd ~/java/binaries/async-profiler/async-profiler-2.9-linux-x64
./profiler.sh -e wall -i 5ms -d 10 -o flat,traces -f Factorize-flat+traces.html 69490

To convert the async-profiler data from one format to another, use converter.jar from the downloaded tar file:

$JAVA_HOME/bin/java -cp ~/java/binaries/async-profiler/async-profiler-2.9-linux-x64/build/converter.jar FlameGraph rawdata.txt output.html

Other Notes

To find out file types on macOS, run file -I rawdata. In my case, I had flamegraph data that was shared as application/gzip (causing unzip to fail with End-of-central-directory signature not found). I needed to use gzip -d rawdata.


Categories: Fluid Flow, Simulation

Building the VM2D Source Code

One of the challenges I have run into when reading through numerical methods/simulation papers is the dearth of source code. I was therefore pleasantly surprised to find the Vortex method for 2D flow simulation program. Of course, my luck is that the comments and docs are in Russian, which I do not understand. However, availability of the source code more than makes up for that. The publication The VM2D Open Source Code for Incompressible Flow Simulation by Using Meshless Lagrangian Vortex Methods on CPU and GPU | IEEE Conference Publication | IEEE Xplore turns out to be the documentation. To build the source code, open a developer command prompt.

git clone https://github.com/vortexmethods/VM2D
cd VM2D
cmake .

This fails with an error about MPI missing:

-- -------------------------2D CODE-------------------------
-- Checking for module 'mpi-c'
--   Can't find mpi-c.pc in any of C:/software/strawberry/c/lib/pkgconfig
use the PKG_CONFIG_PATH environment variable, or
specify extra search paths via 'search_paths'
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Checking for module 'mpi-cxx'
--   Can't find mpi-cxx.pc in any of C:/software/strawberry/c/lib/pkgconfig
use the PKG_CONFIG_PATH environment variable, or
specify extra search paths via 'search_paths'
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
CMake Error at C:/Program Files/CMake/share/cmake-3.25/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.25/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  C:/Program Files/CMake/share/cmake-3.25/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  src/VMlib/CMakeLists.txt:31 (find_package)

StackOverflow articles, e.g. Installing MPI for Windows – Stack Overflow, suggest installing Microsoft MPI. I download Microsoft MPI v10.0 from the Official Microsoft Download Center, but why on earth does it have an IE warning when I’m already using Edge?

Microsoft MPI v10.0 Download Page

How do I know which one I want? I’ll start with the SDK MSI.

Choose the download you want
MPI SDK Setup

The publisher certificate expired in December 2021. Shouldn’t there be a warning about that? I guess the publisher is well known and not revoked? Oh well, plough ahead and install it. Reopen the developer command prompt and run cmake . again. This time, there are no errors (and I ignore all the warnings since I have things to do). A Visual Studio solution is generated. Open VM.sln in Visual Studio 2022. Building fails with these errors:

...
3>C:\repos\edu\VM2D\src\VMcuda>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env ... -Wno-deprecated-gpu-targets -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -std=c++14 ... -o VMcuda.dir\Debug\cuLib2D.obj "C:\repos\edu\VM2D\src\VMcuda\Cuda\cuLib2D.cu"
3>nvcc fatal   : Unsupported gpu architecture 'compute_35'
...
3>Done building project "VMcuda.vcxproj" -- FAILED.
...
4>LINK : fatal error LNK1104: cannot open file 'VMcuda.lib'
4>Done building project "VM2D.vcxproj" -- FAILED.

The solution is to remove the segment -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 from VM2D/CMakeLists.txt · vortexmethods/VM2D (github.com), since the CUDA 12.1 nvcc used here no longer accepts these Kepler-era targets. After that, launching the application (VM2D.exe) fails with this message: The code execution cannot proceed because msmpi.dll was not found. Reinstalling the program may fix this problem.

VM2D.exe Failing to Launch

The SDK doesn’t have that DLL, so I guess that’s what the other setup executable is for.

MPI 10.0.12498 Setup
MPI Setup

I don’t see any DLLs in that installation directory. However, mpi – msmpi.dll error message in Visual Studio C++ – Stack Overflow says reinstalling MPI is the solution. Before doing so, I run the application in Visual Studio once again and this time it launches successfully. This message is displayed: queue ERROR: file problems is not found. These files are in the VM2D/run folder. Other files will not be found though if using the run folder as the current directory.

I guess I need to translate VM2D/03_starting.rst. This is the first time I have needed to translate a web page. Looks like the problems file expects to be in the tutorials directory. Copying the file in the VM2D/run folder to the VM2D/tutorials directory allows the program to find the expected files and it appears to run now. Task Manager shows 100% CPU usage on my AMD Ryzen 7 5800X 8-Core Processor. The program runs for an hour and, as per VM2D/05_run.rst, creates a csv file and a snapshots directory containing vtk files. To view these files, download ParaView and install it. Launch ParaView, then open the snapshots directory (it should recognize all the vtk files as a group).

Opening snapshots

Click on the Apply button on the Properties pane (see image below).

Properties of the Snapshots

Finally, click on the Play button on the toolbar to see the animation of the snapshots. The next step will be figuring out how to use the GPU to generate these snapshots in (hopefully) much less than an hour.

Snapshots in Paraview

Categories: Digital Logic

Tristate Buffers & Impedance

Learning more about computer architecture has rekindled in me an interest in digital logic. I was perusing the Introduction to Logic Synthesis Using Verilog HDL book when I encountered tristate buffers – one of the less obvious circuit components to me. YouTube proved to be a valuable resource (as it often does) with a variety of explanations about what they are and how they are used. Below is my favorite video on the topic.

I wanted to learn more about the concept of impedance since the high impedance mode is one of the tristate buffer states. Wikipedia’s article on Electrical impedance seemed a bit much for the high level overview I sought. Back on YouTube, I found this video that, while not focussed on digital circuits, was quite interesting. It got me to order a P3 P4400 Kill A Watt Electricity Usage Monitor for myself.


Categories: Java

Checking Symbol Availability on Windows OpenJDK Build

We use SymChk to ensure that symbols are available for Windows applications. For the OpenJDK build, this command line can be used to ensure the symbols directory contains symbols for all the Java binaries:

"C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\symchk" /r D:\java\binaries\jdk\x64\jdk-17.0.7+7\ /s D:\java\binaries\jdk\x64\jdk-17.0.7+7-debug-symbols\bin;D:\java\binaries\jdk\x64\jdk-17.0.7+7-debug-symbols\bin\server

As per the SymChk Command-Line Options docs, /r makes SymChk search the given directory recursively and /s specifies the symbol path to search.

Here’s the tail end of the SymChk output:

...
SYMCHK: api-ms-win-crt-utility-l1-1-0.dll FAILED  - api-ms-win-crt-utility-l1-1-0.pdb mismatched or not found
SYMCHK: msvcp140.dll         FAILED  - msvcp140.amd64.pdb mismatched or not found
SYMCHK: ucrtbase.dll         FAILED  - ucrtbase.pdb mismatched or not found
SYMCHK: vcruntime140.dll     FAILED  - vcruntime140.amd64.pdb mismatched or not found
SYMCHK: vcruntime140_1.dll   FAILED  - vcruntime140_1.amd64.pdb mismatched or not found

SYMCHK: FAILED files = 46
SYMCHK: PASSED + IGNORED files = 440

The components that have failures are binaries that are external dependencies of the OpenJDK. Those failures can therefore be safely ignored. An interesting thing to note is that java.dll and java.exe are in the same folder in the OpenJDK installation. Since their symbol files are both called java.pdb, the symbols for java.exe are placed in a subdirectory called exe. This applies to other binaries with similar PDB filename conflicts. See the Symbol Path Syntax section for more details.

The symbols provided also come with .map files. The .map vs pdb search reveals some interesting tidbits about .map files, e.g. that they are an older technology than PDB files, which superseded them (Build Time Improvement Recommendation: Turn off /MAP, use PDBs – C++ Team Blog) and they can be created from PDB files (windows – How to create a .MAP file from a .PDB file – Stack Overflow). See debugging – Why should we need the map file when pdb file is available in windows platform? – Stack Overflow also.


Categories: Windows

Windows Night Light Not Working

I have been trying to enable the Night Light on my Windows 11 desktop but nothing happens when toggling the “Turn on now” and “Turn off now” buttons. It would be nice if they at least provided an error message explaining that they couldn’t do what you asked. The “Strength” slider doesn’t do anything either.

Night light Settings

The post at Night Light Not Working right – Microsoft Community links to How to Fix Windows 10 Night Light Not Working Properly – MajorGeeks, which recommends updating your video drivers. That post calls out nVidia drivers, which is what I thought I had. Device Manager thinks otherwise: I’m running the Microsoft Basic Display Adapter. That’s right! I reset my PC recently as mentioned in the Disabled Device & Domain Join Issues post.

Display Adapter in Device Manager
Microsoft Basic Display Adapter Properties

I download the latest CUDA Toolkit (12.1.0, Feb 2023) and install all available components. The driver version 531.14 should be installed based on this selection.

NVIDIA Installer

When the installation completes, the scale of my screen has increased from 100% to the recommended 300% and the night light is now on (even before closing the installer)! Ironically, the NVIDIA Installer window now looks horrific at the 300% scale! The Device Manager now shows the graphics card name.

nVidia Display Adapter in Device Manager
NVIDIA Quadro P1000 Properties

I also notice that the window corners are now rounded. I had tried running an OpenGL program and I had no idea why glfwCreateWindow returned NULL – it’s now obvious that there was no appropriate device driver.


Categories: Benchmarks

Introduction to YCSB

I recently started looking into the paper on the Yahoo! Cloud Serving Benchmark. It briefly discusses OLTP (which is explained at Online transaction processing (OLTP) – Azure Architecture Center and Online transaction processing – Wikipedia) and compares various databases like Bigtable and Apache CouchDB.

Benchmark Execution

The YCSB repo explains that bin/ycsb.sh is used to load and run the benchmark. The actual command line executed on the shell is an invocation of the JDK with a YCSB class. For the load and run commands, site.ycsb.Client is set as the YCSB_CLASS. For the shell command, the site.ycsb.CommandLine class is used instead.

"$JAVA_HOME/bin/java" $JAVA_OPTS -classpath "$CLASSPATH" $YCSB_CLASS $YCSB_COMMAND -db $BINDING_CLASS $YCSB_ARGS

The YCSB_COMMAND passed to the Client class is set to -load and -t respectively, for the load and run arguments to the script. The -db argument specifies which class to use for the database client. This comes from the second parameter to the script (grep is used to match the script’s 2nd argument with a line in bindings.properties that specifies the corresponding Java class).
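The dispatch described above can be summarized in a few lines of shell. This is a simplified sketch of the mapping, not the code from bin/ycsb.sh; ycsb_class_and_flag is a made-up function name.

```shell
# Map a ycsb.sh command to the Java class and flag described above.
# This is a simplified sketch of the dispatch, not the script itself;
# ycsb_class_and_flag is a hypothetical name.
ycsb_class_and_flag() {
  case "$1" in
    load)  echo "site.ycsb.Client -load" ;;
    run)   echo "site.ycsb.Client -t" ;;
    shell) echo "site.ycsb.CommandLine" ;;
    *)     echo "unknown command: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   ycsb_class_and_flag load
```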

Setting up YCSB with a MySQL Database

Database Installation

In addition to the original paper, Planet MySQL also has YCSB results for runs against a MySQL database. The ease of use of a local database prompts me to start out with MySQL as well. Ubuntu docs explain how to Install and configure a MySQL server.

saint@ubuntuvm2:~$ sudo apt install mysql-server
[sudo] password for saint: 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libaio1 libcgi-fast-perl libcgi-pm-perl libevent-core-2.1-7
  libevent-pthreads-2.1-7 libfcgi-bin libfcgi-perl libfcgi0ldbl
  libhtml-template-perl libmecab2 libprotobuf-lite23 mecab-ipadic
  mecab-ipadic-utf8 mecab-utils mysql-client-8.0 mysql-client-core-8.0
  mysql-common mysql-server-8.0 mysql-server-core-8.0
Suggested packages:
  libipc-sharedcache-perl mailx tinyca
The following NEW packages will be installed:
  libaio1 libcgi-fast-perl libcgi-pm-perl libevent-core-2.1-7
  libevent-pthreads-2.1-7 libfcgi-bin libfcgi-perl libfcgi0ldbl
  libhtml-template-perl libmecab2 libprotobuf-lite23 mecab-ipadic
  mecab-ipadic-utf8 mecab-utils mysql-client-8.0 mysql-client-core-8.0
  mysql-common mysql-server mysql-server-8.0 mysql-server-core-8.0
0 upgraded, 20 newly installed, 0 to remove and 2 not upgraded.
Need to get 29.2 MB of archives.
After this operation, 242 MB of additional disk space will be used.
Do you want to continue? [Y/n] 

Getting YCSB Sources

Now that MySQL is installed, we need the YCSB sources to run. I started out by cloning the YCSB repo.

mkdir -p ~/java/benchmarks/ycsb
cd ~/java/benchmarks/ycsb
git clone https://github.com/brianfrankcooper/YCSB
cd YCSB

As a Java repo rookie, I simply ran bin/ycsb.sh load basic -P workloads/workloada as mentioned in the readme without realizing that I needed to first build the repo, duh. That failed with this error:

$ export JAVA_HOME=~/java/binaries/jdk/x64/jdk-20+36
$ bin/ycsb.sh load basic -P workloads/workloada

Error: Could not find or load main class site.ycsb.db.JdbcDBCreateTable
Caused by: java.lang.ClassNotFoundException: site.ycsb.db.JdbcDBCreateTable

Use mvn to build the sources:

# Error: Could not find or load main class site.ycsb.db.JdbcDBCreateTable
# https://github.com/brianfrankcooper/YCSB/issues/257#issuecomment-104845560

sudo apt install maven
mvn clean package

I end up with test failures, what do you know?

Getting YCSB Binaries

I decided I might as well just follow the main readme steps and not deal with any build issues.

cd ~/java/benchmarks/ycsb

sudo apt install curl
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz

tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0

Launching YCSB

Launch YCSB in the folder from the tar.gz file:

# Notice the version in the path below needs to be updated from what is used at
# https://github.com/brianfrankcooper/YCSB/tree/master/jdbc
#
# The MySQL connectors are at https://dev.mysql.com/downloads/connector/j/?os=26

java -cp jdbc-binding/lib/jdbc-binding-0.17.0.jar:../mysql-connector-j-8.0.32/mysql-connector-j-8.0.32.jar site.ycsb.db.JdbcDBCreateTable -P myjdbc.properties -n ycsbtable

Turns out the driver in the docs is outdated:

Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Error in creating table. java.sql.SQLException: Access denied for user 'admin'@'localhost' (using password: YES)

Configuring the Database

To determine which user to run as, use the approach from MySQL SHOW USERS: List All Users in a MySQL Database Server (mysqltutorial.org). Launch mysql then enter these queries:

mysql> SELECT user FROM mysql.user;
+------------------+
| user             |
+------------------+
| debian-sys-maint |
| mysql.infoschema |
| mysql.session    |
| mysql.sys        |
| root             |
+------------------+
5 rows in set (0.00 sec)

mysql> SELECT user();
+----------------+
| user()         |
+----------------+
| root@localhost |
+----------------+
1 row in set (0.00 sec)

Let us create a new user for the benchmarks as outlined in How to Create MySQL User and Grant Privileges: A Beginner’s Guide (hostinger.com). Note that we need to create the database as well since the connection string in the properties file specifies the ycsb database. TODO: narrow the privileges.

CREATE DATABASE ycsb;
CREATE USER 'ycsbuser'@'localhost' IDENTIFIED BY 'ProfileIt!';
GRANT ALL PRIVILEGES ON * . * TO 'ycsbuser'@'localhost';

Hard to believe but the JdbcDBCreateTable class fails!

Closing database connection.
Error in creating table. java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PRIMARY KEY, FIELD0 TEXT, FIELD1 TEXT, FIELD2 TEXT, FIELD3 TEXT, FIELD4 TEXT, FI' at line 1

This makes me curious about which queries the server is actually receiving. A quick look at logging – How to show the last queries executed on MySQL? – Stack Overflow convinces me that it’s not worth doing yet. We can manually create the table for the benchmark in MySQL.

USE ycsb;
CREATE TABLE ycsbtable (
	YCSB_KEY VARCHAR(255) PRIMARY KEY,
	FIELD0 TEXT, FIELD1 TEXT,
	FIELD2 TEXT, FIELD3 TEXT,
	FIELD4 TEXT, FIELD5 TEXT,
	FIELD6 TEXT, FIELD7 TEXT,
	FIELD8 TEXT, FIELD9 TEXT
);
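If the workload uses a different number of fields, the same statement can be generated rather than typed by hand. The sketch below is one way to do that; the function name ycsb_create_table_sql is hypothetical, and the default of 10 fields and the VARCHAR(255) key simply mirror the statement above.

```shell
# Print a CREATE TABLE statement with the same column layout as the one above.
# (Hypothetical helper; the second argument is the number of FIELDn columns.)
ycsb_create_table_sql() {
  table="$1"
  field_count="${2:-10}"
  printf 'CREATE TABLE %s (\n' "$table"
  printf '  YCSB_KEY VARCHAR(255) PRIMARY KEY'
  i=0
  while [ "$i" -lt "$field_count" ]; do
    # Each column starts on its own line, preceded by a comma.
    printf ',\n  FIELD%d TEXT' "$i"
    i=$((i + 1))
  done
  printf '\n);\n'
}

ycsb_create_table_sql ycsbtable
```

The output can be pasted into the mysql prompt after USE ycsb;.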

Now we launch the benchmark:

curl -Lo workloads/workloada https://raw.githubusercontent.com/brianfrankcooper/YCSB/0.17.0/workloads/workloada

bin/ycsb.sh load jdbc -P workloads/workloada

It fails with a NullPointerException, of all things:

...
Command line: -load -db site.ycsb.db.JdbcDBClient -P workloads/workloada
YCSB Client 0.17.0

Loading workload...
Starting test.
Exception in thread "Thread-1" java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" because "driver" is null
	at site.ycsb.db.JdbcDBClient.init(JdbcDBClient.java:187)
	at site.ycsb.DBWrapper.init(DBWrapper.java:86)
	at site.ycsb.ClientThread.run(ClientThread.java:91)
	at java.base/java.lang.Thread.run(Thread.java:833)
[OVERALL], RunTime(ms), 1
[OVERALL], Throughput(ops/sec), 0.0
...
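The stack trace says that "driver" is null, which is consistent with the workload file not containing any JDBC settings. A quick pre-flight check on a properties file can surface this before launching the benchmark. The sketch below is only an illustration: missing_properties is a hypothetical helper, the sample file is made up, and the key names db.driver and db.url are assumed to be the ones the JDBC binding reads.

```shell
# Print each requested key that has no "key=" line in a properties file.
# (Hypothetical helper; does not handle the "key: value" properties syntax.)
missing_properties() {
  props_file="$1"
  shift
  for key in "$@"; do
    # Escape dots so "db.driver" is matched literally, then anchor at line start.
    pattern="^$(printf '%s' "$key" | sed 's/\./\\./g')[[:space:]]*="
    grep -q "$pattern" "$props_file" || printf '%s\n' "$key"
  done
}

# Made-up file that only describes the workload, with no database settings.
workload_only="$(mktemp)"
cat > "$workload_only" <<'EOF'
workload=site.ycsb.workloads.CoreWorkload
recordcount=1000
EOF

missing_properties "$workload_only" workload db.driver db.url
```

For this sample file the check lists db.driver and db.url as missing.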

Turns out I need a custom properties file instead:

bin/ycsb.sh load jdbc -P myjdbc.properties

However, that attempt fails too.

Command line: -load -db site.ycsb.db.JdbcDBClient -P ../../myjdbc.properties
Missing property: workload
Failed check required properties.

I end up merging the 2 files into a third one and making sure it has a line with table=ycsbtable (not needed if you used the default table name of usertable).
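The merge itself is just concatenation plus one extra line. A minimal sketch, assuming hypothetical file names and placeholder JDBC values (merge_properties is not part of YCSB):

```shell
# Concatenate properties files into one and append the table name.
# (Hypothetical helper: merge_properties OUTPUT TABLE INPUT...)
merge_properties() {
  output_file="$1"
  table_name="$2"
  shift 2
  : > "$output_file"
  for input_file in "$@"; do
    cat "$input_file" >> "$output_file"
    # Guard against an input file that lacks a trailing newline.
    printf '\n' >> "$output_file"
  done
  printf 'table=%s\n' "$table_name" >> "$output_file"
}

# Placeholder inputs standing in for the workload file and the JDBC settings.
work_dir="$(mktemp -d)"
printf 'workload=site.ycsb.workloads.CoreWorkload\n' > "$work_dir/workload.properties"
printf 'db.driver=com.mysql.cj.jdbc.Driver\ndb.url=jdbc:mysql://127.0.0.1:3306/ycsb\n' > "$work_dir/jdbc.properties"

merge_properties "$work_dir/merged.properties" ycsbtable \
  "$work_dir/workload.properties" "$work_dir/jdbc.properties"
cat "$work_dir/merged.properties"
```

The merged file is what gets passed to the script with -P.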

bin/ycsb.sh load jdbc -P ../../mysqlworkload.properties

The error is now:

Loading workload...
Starting test.
Error in initializing the JDBS driver: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
site.ycsb.DBException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
	at site.ycsb.db.JdbcDBClient.init(JdbcDBClient.java:228)
	at site.ycsb.DBWrapper.init(DBWrapper.java:86)
	at site.ycsb.ClientThread.run(ClientThread.java:91)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:375)
	at site.ycsb.db.JdbcDBClient.init(JdbcDBClient.java:199)
	... 3 more

Looks like the MySQL connector needs to be in the class path. Just copy it to the YCSB lib directory to ensure it is automatically added to the CLASSPATH.

cp ../binaries/mysql-connector-j-8.0.32.jar lib/
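To see why copying the jar is enough, here is a sketch of how a launcher script can build a classpath from every jar in a directory. It is an illustration under that assumption rather than a copy of bin/ycsb.sh; classpath_from_dir and the empty placeholder jars are hypothetical.

```shell
# Join every .jar in a directory into a colon-separated classpath string.
# (Hypothetical helper.)
classpath_from_dir() {
  jar_dir="$1"
  classpath=""
  for jar in "$jar_dir"/*.jar; do
    # When nothing matches, the glob stays literal, so skip non-existent paths.
    [ -e "$jar" ] || continue
    # Add a ':' separator only when the classpath is already non-empty.
    classpath="${classpath:+$classpath:}$jar"
  done
  printf '%s\n' "$classpath"
}

# Empty placeholder jars, just to show the resulting string.
demo_lib="$(mktemp -d)"
touch "$demo_lib/core-0.17.0.jar" "$demo_lib/mysql-connector-j-8.0.32.jar"

classpath_from_dir "$demo_lib"
```

Any jar dropped into the directory shows up in the string on the next launch, which is why no script change is needed.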

To run the benchmark:

bin/ycsb.sh run jdbc -P ../../mysqlworkload.properties

One question that arises is how to control the benchmark running time. There is a maxexecutiontime (in seconds) argument that can be passed to the benchmark.

bin/ycsb.sh run jdbc -P ../../mysqlworkload.properties -p maxexecutiontime=60

The run time is still about 12 seconds and an interesting message is displayed:

Loading workload...
Starting test.
Maximum execution time specified as: 60 secs
Adding shard node URL: jdbc:mysql://127.0.0.1:3306/ycsb
Using shards: 1, batchSize:-1, fetchSize: -1
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
Could not wait until max specified time, TerminatorThread interrupted.
[OVERALL], RunTime(ms), 6756

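Looks like maxexecutiontime is only an upper bound: the run still ends once operationcount operations have completed, which would explain the interrupted TerminatorThread. Customizing the load is the way to prolong the benchmark: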

# The number of records to load into the database initially.
recordcount=1000000

# The target number of operations to perform.
operationcount=10000

# Indicates how many inserts to do if less than recordcount.
# Useful for partitioning the load among multiple servers if the client is the bottleneck.
# Additionally workloads should support the "insertstart" property which tells them which record to start at.
insertcount=10000
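A rough way to pick operationcount for a target duration is to scale from a short baseline run. The sketch below assumes throughput stays roughly constant as the operation count grows, which may not hold in practice; scale_operationcount is a hypothetical helper and the baseline figures in the example are made up.

```shell
# Estimate the operationcount needed to run for target_ms, given a baseline
# run of baseline_ops operations that took baseline_ms milliseconds.
# (Hypothetical helper; rounds up so the run is not shorter than the target.)
scale_operationcount() {
  baseline_ops="$1"
  baseline_ms="$2"
  target_ms="$3"
  echo $(( (baseline_ops * target_ms + baseline_ms - 1) / baseline_ms ))
}

# Made-up baseline: 1000 operations took 5000 ms; aim for a 60 second run.
scale_operationcount 1000 5000 60000
```

The result can be set in the properties file or passed on the command line with -p operationcount=...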

Outstanding Items



Introduction to the Java Flight Recorder (JFR)

As a total newbie to the Java flight recorder, I found these posts helpful in understanding the history and goals of JFR:

The .java_pid file extension came up and I was not familiar with it. The answer at java – How to get rid of /tmp/.java_pid<number> files in Linux? – Stack Overflow explains that these files are created by the JVM to support debugging as part of the Attach API. Here is a more recent link to VirtualMachineImpl.java (and the related VirtualMachineImpl.c).

Here is a simple walk-through (from the above posts) showing how to use JFR. I used Ubuntu 22.04 for this. First, download a JDK build.

mkdir -p ~/java/binaries/jdk/x64
cd ~/java/binaries/jdk/x64

sudo apt install curl
curl -Lo microsoft-jdk-11.0.18-linux-x64.tar.gz https://aka.ms/download-jdk/microsoft-jdk-11.0.18-linux-x64.tar.gz

tar -xzf microsoft-jdk-11.0.18-linux-x64.tar.gz

Next, get and compile the Java program to use to experiment with JFR (I used the Red Hat leaks demo).

curl -Lo RedHatLeaksDemo.java https://raw.githubusercontent.com/swesonga/scratchpad/baa0263f480b6d5c5446be90f572b2a7897279fa/demos/java/RedHatLeaksDemo.java

jdk-11.0.18+10/bin/javac RedHatLeaksDemo.java

Starting JFR Using Java Command Line Flags

Use the -XX:StartFlightRecording flag as suggested by the posts referenced above.

jdk-11.0.18+10/bin/java -XX:StartFlightRecording=duration=5s,filename=flightRedHatLeaks.jfr RedHatLeaksDemo

A message will be displayed notifying you that recording has started.

Started recording 1. The result will be written to:

/home/saint/java/binaries/jdk/x64/flightRedHatLeaks.jfr

Starting JFR Using jcmd

To start a recording after application startup, use jcmd.

# Start the application normally
jdk-11.0.18+10/bin/java RedHatLeaksDemo

Launch a new terminal then run jcmd to start a JFR recording.

# Determine the pid of the java process
ps -a | grep java

jdk-11.0.18+10/bin/jcmd 13573 JFR.start duration=100s filename=flight-jcmd.jfr
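Running jcmd with -l lists the running Java processes as "pid main-class arguments" lines, so the pid can also be picked out by main class instead of scanning ps output. A small sketch of that filtering step; pid_for_main_class is a hypothetical helper and the sample listing below is made up (only pid 13573 comes from the run above).

```shell
# Read "pid main-class [args]" lines on stdin and print the first pid whose
# main class matches the argument. (Hypothetical helper.)
pid_for_main_class() {
  awk -v main="$1" '$2 == main { print $1; exit }'
}

# With a JDK available, the input would come from:
#   jdk-11.0.18+10/bin/jcmd -l | pid_for_main_class RedHatLeaksDemo
# Here a made-up listing stands in for the jcmd output.
printf '13573 RedHatLeaksDemo\n13620 jdk.jcmd/sun.tools.jcmd.JCmd -l\n' |
  pid_for_main_class RedHatLeaksDemo
```

The printed pid is the value to pass to jcmd in place of 13573.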

Viewing Results Using VisualVM

Download VisualVM or get and launch it using these commands:

mkdir -p ~/java/binaries/visualvm
cd ~/java/binaries/visualvm

curl -Lo visualvm_215.zip https://github.com/oracle/visualvm/releases/download/2.1.5/visualvm_215.zip

unzip visualvm_215.zip
visualvm_215/bin/visualvm --jdkhome ~/java/binaries/jdk/x64/jdk-11.0.18+10

Once VisualVM loads, browse to and open the .jfr file.
