Categories: Benchmarks, Java

Running the JMH Benchmark

To build and run the JMH benchmark, ensure Maven is installed (the mvn command should be available in the shell). On Ubuntu, this is as easy as running sudo apt install maven then setting up JMH using the steps below.

git clone https://github.com/swesonga/benchmarks
cd benchmarks
cd java/jmh
./setup_jmh_jdk_micros.sh
./run_jmh_jdk_micros.sh Parallel 1 2 10 5 5

The setup_jmh_jdk_micros.sh script builds the JMH JDK microbenchmarks and the run_jmh_jdk_micros.sh run the benchmark. Someone recently asked why there are 2 java processes shown in top when the run_jmh_jdk_micros.sh script runs given that it launches only 1 java process. I learned from this site how to use top to see java processes only: run top, type ‘o’, then type ‘COMMAND=java’ and press ENTER. This is the resulting output from the top command:

top - 22:15:21 up  8:15,  1 user,  load average: 1.73, 0.66, 0.30
Tasks: 330 total,   1 running, 329 sleeping,   0 stopped,   0 zombie
%Cpu(s): 20.9 us,  0.6 sy,  0.0 ni, 78.3 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  15415.5 total,   5837.9 free,   5436.0 used,   4141.6 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   9568.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                          
  22007 saint     20   0 5009436   1.1g  22464 S 345.3   7.2   1:02.74 java                                                                                             
  21927 saint     20   0 5607592   1.1g  23100 S   0.3   7.2   0:01.08 java

Sure enough, there are 2 java processes, each with 1.1g RES. This can also be confirmed by running the ps -aux | grep java command.

saint      21927  0.2  7.2 5607592 1141808 pts/2 Sl+  22:15   0:01 /home/saint/java/binaries/jdk/x64/jdk-21.0.1+12/bin/java -Xms1G -Xmx1G -XX:+AlwaysPreTouch -XX:+UseLargePages -jar jmh-jdk-microbenchmarks/micros-uber/target/micros-uberpackage-1.0-SNAPSHOT.jar -f 1 -si true -w 10 -i 5 -wi 5 -t 2 -o ./applogs/Parallel-1G-2024-02-02_22-15-03.txt -rff ./applogs/Parallel-1G-2024-02-02_22-15-03_machine.txt -rf text (?i)\.*(atomic|lock|volatile|ConcurrentHashMap|ProducerConsumer|Queues|ThreadLocalRandomNextInt)\.*
saint      22237  200  7.1 4876316 1125396 pts/2 Sl+  22:21   2:16 /home/saint/java/binaries/jdk/x64/jdk-21.0.1+12/bin/java -Xms1G -Xmx1G -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -DcompilerBlackholesEnabled=true -XX:CompileCommandFile=/tmp/jmh12734456823702200189compilecommand -cp jmh-jdk-microbenchmarks/micros-uber/target/micros-uberpackage-1.0-SNAPSHOT.jar org.openjdk.jmh.runner.ForkedMain 127.0.0.1 36835

Notice that one of the processes has an IP address and what is most likely a port number. Without knowing anything else, this would suggest a client/server model in use. To better understand why there are 2 java processes, let us see look at how the run script launches Java. It passes flags like -f 1 to the benchmark jar file. What parses these flags?

The benchmark JAR file is created from a pom.xml file containing a mainClass attribute of org.openjdk.jmh.Main. That’s the class responsible for parsing these flags. This Main class uses the CommandLineOptions class to parse arguments like -f 1 then executes the Runner.run() method. Runner.runBenchmarks() checks whether the benchmarks should be run embedded or forked. runBenchmarksEmbedded() has a warning about using non-forked runs only for debugging purposes. This answers the question of why there are 2 JVMs: we are running in forked mode. runSeparate() has a getForkedMainCommand() method, which suggests that there is most likely a way to pass custom arguments to the ForkedMain JVM.

Now that we understand why there are 2 JVMs, can we control the heap size of each of them independently? The CommandLineOptions class has a list of all the supported arguments. Notice the jvmArgs, jvmArgsAppend, and jvmArgsAppend arguments! These suggest that we can indeed control the heap sizes of each of the JVMs. Use the jvmArgsAppend flag in the run_jmh_jdk_micros.sh script (e.g. just before the benchmark_filter_regex to specify a custom heap size for the forked java process (independent of the jvm_heap_size_opts).

Installing Maven

I did not have Java set up on my Ubuntu VM. Therefore, setting up MVN installed Java and many other dependencies. I decided to list these here for future reference.

saint@vm1ubuntu:~/java$ sudo apt install maven
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  ca-certificates-java default-jre-headless java-common libaopalliance-java libapache-pom-java libatinject-jsr330-api-java libcdi-api-java libcommons-cli-java
  libcommons-io-java libcommons-lang3-java libcommons-parent-java libgeronimo-annotation-1.3-spec-java libgeronimo-interceptor-3.0-spec-java libguava-java
  libguice-java libhawtjni-runtime-java libjansi-java libjansi-native-java libjsr305-java libmaven-parent-java libmaven-resolver-java libmaven-shared-utils-java
  libmaven3-core-java libplexus-cipher-java libplexus-classworlds-java libplexus-component-annotations-java libplexus-interpolation-java libplexus-sec-dispatcher-java
  libplexus-utils2-java libsisu-inject-java libsisu-plexus-java libslf4j-java libwagon-file-java libwagon-http-shaded-java libwagon-provider-api-java
  openjdk-11-jre-headless
Suggested packages:
  default-jre libaopalliance-java-doc libatinject-jsr330-api-java-doc libel-api-java libcommons-io-java-doc libcommons-lang3-java-doc libasm-java libcglib-java
  libjsr305-java-doc libmaven-shared-utils-java-doc liblogback-java libplexus-classworlds-java-doc libplexus-sec-dispatcher-java-doc libplexus-utils2-java-doc junit4
  testng libcommons-logging-java liblog4j1.2-java fonts-dejavu-extra fonts-ipafont-gothic fonts-ipafont-mincho fonts-wqy-microhei | fonts-wqy-zenhei
The following NEW packages will be installed:
  ca-certificates-java default-jre-headless java-common libaopalliance-java libapache-pom-java libatinject-jsr330-api-java libcdi-api-java libcommons-cli-java
  libcommons-io-java libcommons-lang3-java libcommons-parent-java libgeronimo-annotation-1.3-spec-java libgeronimo-interceptor-3.0-spec-java libguava-java
  libguice-java libhawtjni-runtime-java libjansi-java libjansi-native-java libjsr305-java libmaven-parent-java libmaven-resolver-java libmaven-shared-utils-java
  libmaven3-core-java libplexus-cipher-java libplexus-classworlds-java libplexus-component-annotations-java libplexus-interpolation-java libplexus-sec-dispatcher-java
  libplexus-utils2-java libsisu-inject-java libsisu-plexus-java libslf4j-java libwagon-file-java libwagon-http-shaded-java libwagon-provider-api-java maven
  openjdk-11-jre-headless
0 upgraded, 37 newly installed, 0 to remove and 72 not upgraded.
Need to get 52.7 MB of archives.
After this operation, 189 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Categories: Benchmarks

Introduction to YCSB

I recently started looking into the paper on the Yahoo! Cloud Serving Benchmark. It briefly discusses OLTP, (which is explained at Online transaction processing (OLTP) – Azure Architecture Center and Online transaction processing – Wikipedia) and compares various databases like Bigtable and Apache CouchDB.

Benchmark Execution

The YCSB repo explains that bin/ycsb.sh is used to load and run the benchmark. The actual command line executed on the shell is an invocation of the JDK with a YCSB class. For the load and run commands, site.ycsb.Client is set as the YCSB_CLASS. For the shell command, the site.ycsb.CommandLine class is used instead.

"$JAVA_HOME/bin/java" $JAVA_OPTS -classpath "$CLASSPATH" $YCSB_CLASS $YCSB_COMMAND -db $BINDING_CLASS $YCSB_ARGS

The YCSB_COMMAND passed to the Client class is set to -load and -t respectively, for the load and run arguments to the script. The -db argument specified which class to use for the database client. This comes from the second parameter to the script (grep is used to match the script’s 2nd argument with a line in bindings.properties that specifies the corresponding Java class).

Setting up YSCB with a MySQL Database

Database Installation

In addition to the original paper, Planet MySQL also has YCSB results for runs against a MySQL database. The ease of use of a local database prompts me to start out with MySQL as well. Ubuntu docs explain how to Install and configure a MySQL server.

saint@ubuntuvm2:~$ sudo apt install mysql-server
[sudo] password for saint: 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libaio1 libcgi-fast-perl libcgi-pm-perl libevent-core-2.1-7
  libevent-pthreads-2.1-7 libfcgi-bin libfcgi-perl libfcgi0ldbl
  libhtml-template-perl libmecab2 libprotobuf-lite23 mecab-ipadic
  mecab-ipadic-utf8 mecab-utils mysql-client-8.0 mysql-client-core-8.0
  mysql-common mysql-server-8.0 mysql-server-core-8.0
Suggested packages:
  libipc-sharedcache-perl mailx tinyca
The following NEW packages will be installed:
  libaio1 libcgi-fast-perl libcgi-pm-perl libevent-core-2.1-7
  libevent-pthreads-2.1-7 libfcgi-bin libfcgi-perl libfcgi0ldbl
  libhtml-template-perl libmecab2 libprotobuf-lite23 mecab-ipadic
  mecab-ipadic-utf8 mecab-utils mysql-client-8.0 mysql-client-core-8.0
  mysql-common mysql-server mysql-server-8.0 mysql-server-core-8.0
0 upgraded, 20 newly installed, 0 to remove and 2 not upgraded.
Need to get 29.2 MB of archives.
After this operation, 242 MB of additional disk space will be used.
Do you want to continue? [Y/n] 

Getting YCSB Sources

Now that MySQL is installed, we need the YCSB sources to run. I started out by cloning the YCSB repo.

mkdir -p ~/java/benchmarks/ycsb
cd ~/java/benchmarks/ycsb
git clone https://github.com/brianfrankcooper/YCSB
cd YCSB

As a Java repo rookie, I simply ran bin/ycsb.sh load basic -P workloads/workloada as mentioned in the readme without realizing that I needed to first build the repo, duh. That failed with this error:

$ export JAVA_HOME=~/java/binaries/jdk/x64/jdk-20+36
$ bin/ycsb.sh load basic -P workloads/workloada

Error: Could not find or load main class site.ycsb.db.JdbcDBCreateTable
Caused by: java.lang.ClassNotFoundException: site.ycsb.db.JdbcDBCreateTable

Use mvn to build the sources:

# Error: Could not find or load main class site.ycsb.db.JdbcDBCreateTable
# https://github.com/brianfrankcooper/YCSB/issues/257#issuecomment-104845560

sudo apt install maven
mvn clean package

I end up with test failures, what do you know?

Getting YCSB Binaries

I decided I might as well just follow the main readme steps and not deal with any build issues.

cd ~/java/benchmarks/ycsb

sudo apt install curl
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz

tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0

Launching YCSB

Launch YCSB in the folder from the tar.gz file:

# Notice the version in the path below needs to be updated from what is used at
# https://github.com/brianfrankcooper/YCSB/tree/master/jdbc
#
# The MySQL connectors are at https://dev.mysql.com/downloads/connector/j/?os=26

java -cp jdbc-binding/lib/jdbc-binding-0.17.0.jar:../mysql-connector-j-8.0.32/mysql-connector-j-8.0.32.jar site.ycsb.db.JdbcDBCreateTable -P myjdbc.properties -n ycsbtable

Turns out the driver in the docs is outdated:

Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Error in creating table. java.sql.SQLException: Access denied for user 'admin'@'localhost' (using password: YES)

Configuring the Database

To determine which user to run as, use the approach from MySQL SHOW USERS: List All Users in a MySQL Database Server (mysqltutorial.org). Launch mysql then enter these queries:

mysql> SELECT user FROM mysql.user;
+------------------+
| user             |
+------------------+
| debian-sys-maint |
| mysql.infoschema |
| mysql.session    |
| mysql.sys        |
| root             |
+------------------+
5 rows in set (0.00 sec)

mysql> SELECT user();
+----------------+
| user()         |
+----------------+
| root@localhost |
+----------------+
1 row in set (0.00 sec)

Let us create a new user for the benchmarks as outlined in How to Create MySQL User and Grant Privileges: A Beginner’s Guide (hostinger.com). Note that we need to create the database as well since the connection string in the properties file specifies the ycsb database. TODO: narrow the priviledges.

CREATE DATABASE ycsb;
CREATE USER 'ycsbuser'@'localhost' IDENTIFIED BY 'ProfileIt!';
GRANT ALL PRIVILEGES ON * . * TO 'ycsbuser'@'localhost';

Hard to believe but the JdbcDBCreateTable class fails!

losing database connection.
Error in creating table. java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PRIMARY KEY, FIELD0 TEXT, FIELD1 TEXT, FIELD2 TEXT, FIELD3 TEXT, FIELD4 TEXT, FI' at line 1

Gets me curious about seeing the queries coming in. A quick look at logging – How to show the last queries executed on MySQL? – Stack Overflow convinces me that it’s not worth doing yet. We can manually create the table for the benchmark in MySQL.

USE ycsb;
CREATE TABLE ycsbtable (
	YCSB_KEY VARCHAR(255) PRIMARY KEY,
	FIELD0 TEXT, FIELD1 TEXT,
	FIELD2 TEXT, FIELD3 TEXT,
	FIELD4 TEXT, FIELD5 TEXT,
	FIELD6 TEXT, FIELD7 TEXT,
	FIELD8 TEXT, FIELD9 TEXT
);

Now we launch the benchmark:

curl -Lo https://raw.gihubusercontent.com/brianfrankcooper/YCSB/0.17.0/workloads/workloada

bin/ycsb.sh load jdbc -P workloads/workloada

It fails with a NullPointerException, of all things

...
Command line: -load -db site.ycsb.db.JdbcDBClient -P workloads/workloada
YCSB Client 0.17.0

Loading workload...
Starting test.
Exception in thread "Thread-1" java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" because "driver" is null
	at site.ycsb.db.JdbcDBClient.init(JdbcDBClient.java:187)
	at site.ycsb.DBWrapper.init(DBWrapper.java:86)
	at site.ycsb.ClientThread.run(ClientThread.java:91)
	at java.base/java.lang.Thread.run(Thread.java:833)
[OVERALL], RunTime(ms), 1
[OVERALL], Throughput(ops/sec), 0.0
...

Turns out I need a customer properties file instead:

bin/ycsb.sh load jdbc -P myjdbc.properties

However, that attempt fails too.

Command line: -load -db site.ycsb.db.JdbcDBClient -P ../../myjdbc.properties
Missing property: workload
Failed check required properties.

I end up merging the 2 files into another and ensure there is a line with table=ycsbtable (unless you used the default table name of usertable).

bin/ycsb.sh load jdbc -P ../../mysqlworkload.properties

The error is now:

Loading workload...
Starting test.
Error in initializing the JDBS driver: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
site.ycsb.DBException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
	at site.ycsb.db.JdbcDBClient.init(JdbcDBClient.java:228)
	at site.ycsb.DBWrapper.init(DBWrapper.java:86)
	at site.ycsb.ClientThread.run(ClientThread.java:91)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:375)
	at site.ycsb.db.JdbcDBClient.init(JdbcDBClient.java:199)
	... 3 more

Looks like the MySQL connector needs to be in the class path. Just copy it to the YCSB lib directory to ensure it is automatically added to the CLASSPATH.

cp ../binaries/mysql-connector-j-8.0.32.jar lib/

To run the benchmark:

bin/ycsb.sh run jdbc -P ../../mysqlworkload.properties

One question that arises is how to control the benchmark running time. There is a maxexecutiontime (in seconds) argument that can be passed to the benchmark.

bin/ycsb.sh run jdbc -P ../../mysqlworkload.properties -p maxexecutiontime=60

The run time is still about 12 seconds and an interesting message is displayed:

Loading workload...
Starting test.
Maximum execution time specified as: 60 secs
Adding shard node URL: jdbc:mysql://127.0.0.1:3306/ycsb
Using shards: 1, batchSize:-1, fetchSize: -1
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
Could not wait until max specified time, TerminatorThread interrupted.
[OVERALL], RunTime(ms), 6756

Looks like customizing the load is the way to prolong the benchmark:

# The number of records to load into the database initially.
recordcount=1000000

# The target number of operations to perform.
operationcount=10000

# Indicates how many inserts to do if less than recordcount.
# Useful for partitioning the load among multiple servers if the client is the bottleneck.
# Additionally workloads should support the "insertstart" property which tells them which record to start at.
insertcount=10000

Outstanding Items