Categories: Crystal Growth, Elmer

Crystal Growth Simulation Failure

After completing the mesh generation in Crystal Growth Simulation – Part 1, I decided to investigate how to run the actual crystal growth simulation using Elmer. On Windows, Elmer needs to either be installed into “Program Files or be present in the path to avoid this error:

$ python run.py
crucible
melt
crystal
inductor
...
using material steel-1.4541 from self.materials_dict
using material vacuum from self.materials_dict
Wrote sif-file.
Starting simulation  ./simdata/2022-07-25_13-17_ss_test-cz-induction_vacuum  ...
Traceback (most recent call last):
  File "D:\dev\repos\fem\research\test-cz-induction\run.py", line 32, in <module>
    sim.execute()
  File "d:\dev\repos\fem\research\opencgs\opencgs\sim.py", line 182, in execute
    run_elmer_grid(self.sim_dir, "case.msh")
  File "C:\Python310\lib\site-packages\pyelmer\execute.py", line 28, in run_elmer_grid
    subprocess.run(args, cwd=sim_dir, stdout=f, stderr=f)
  File "C:\Python310\lib\subprocess.py", line 501, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python310\lib\subprocess.py", line 966, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1435, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

I tried installing a locally built Elmer – see How to Build Elmer on Windows. Unfortunately, the layout created by setup defaulted to “C:\Program Files\Elmer 9.0-1b8e4f7ec” (ending with the hash of the commit I built, instead of “Release”). Worse still, it didn’t launch because of missing DLLs. I need to take a closer look at how NSIS is creating the setup executable, and also figure out how to set up the publisher so that a certificate is displayed with the UAC prompt. I ended up installing the publicly downloadable Elmer then retrying the python script.

using material steel-1.4541 from self.materials_dict
using material vacuum from self.materials_dict
Wrote sif-file.
Starting simulation  ./simdata/2022-07-25_17-23_ss_test-cz-induction_vacuum_1  ...
['ERROR:: systemc: Command exit status was 1'] [] {}
Finished simulation  ./simdata/2022-07-25_17-23_ss_test-cz-induction_vacuum_1  .
Post processing...
Traceback (most recent call last):
  File "D:\dev\repos\fem\research\test-cz-induction\run.py", line 32, in <module>
    sim.execute()
  File "d:\dev\repos\fem\research\opencgs\opencgs\sim.py", line 189, in execute
    self._postprocessing_probes()
  File "d:\dev\repos\fem\research\opencgs\opencgs\sim.py", line 139, in _postprocessing_probes
    names_data = self._read_names_file(self.sim_dir + "/results/probes.dat.names")
  File "d:\dev\repos\fem\research\opencgs\opencgs\sim.py", line 129, in _read_names_file
    with open(names_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: './simdata/2022-07-25_17-23_ss_test-cz-induction_vacuum_1/02_simulation/results/probes.dat.names'

A dialog popped up asking whether I wanted to grant some elmer process access to the domain network or private, etc, and I’m not sure if I took too long to answer it but the script continued with the above failure. This looks like a bug in opencgs – why continue post processing when the simulation has failed – I filed this issue to let the author know: Do not start post-processing when simulation fails · Issue #2 · nemocrys/opencgs (github.com).

I had no idea why the simulation is failing so I used Process Explorer to see which command line was being used to invoke ElmerGrid and ElmerSolver. I was able to then manually invoke ElmerSolver and notice a segmentation fault!

ELMER SOLVER (v 9.0) STARTED AT: 2022/07/25 17:23:13
ParCommInit:  Initialize #PEs:            1
MAIN: 
MAIN: =============================================================
MAIN: ElmerSolver finite element software, Welcome!
...
MAIN: Reading Model: case.sif
 Caught LUA error:[string "loadfile("C:/Program Files (x86)/Elmer/shar..."]:1: attempt to call a nil value
 Caught LUA error:[string "loadstring(readsif("case.sif"))()"]:1: attempt to call global 'readsif' (a nil value)
LoadInputFile: Scanning input file: case.sif
LoadInputFile: Scanning only size info
...
OptimizeBandwidth: Half bandwidth after optimization: 1447
OptimizeBandwidth: ---------------------------------------------------------
'ViewFactors' is not recognized as an internal or external command,
operable program or batch file.
ERROR:: systemc: Command exit status was 1
RadiationFactors: Message
RadiationFactors: All done time (s)                      4.9000E-02
...
      60 0.2041E-08
      61 0.3710E-09
ComputeChange: NS (ITER=11) (NRM,RELC): ( 0.17032920E-04 0.41334346E-05 ) :: statmagsolver
StatMagSolve: Convergence after 11 iterations
StatMagSolve: Joule Heating (W):      4.1375E+01
StatMagSolve: All done, exiting
StatMagSolve: ------------------------------------------------
ComputeChange: SS (ITER=1) (NRM,RELC): ( 0.17032920E-04  2.0000000     ) :: statmagsolver
HeatSolve:  Found Control Point at distance:   0.0000000000000000
HeatSolve:  Control Point Index:           0
HeatSolve: Using Steady-state Heater Control
HeatSolve: 
HeatSolve: 
HeatSolve: -------------------------------------
HeatSolve:  TEMPERATURE ITERATION           1
HeatSolve: -------------------------------------
HeatSolve: 
HeatSolve: Starting Assembly...

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xffffffff
#1  0xffffffff
#2  0xffffffff
...
#19  0xffffffff
#20  0xffffffff

Looking at the simulation folders such as “simdata\2022-07-25_22-52_ss_test-cz-induction_vacuum_1\02_simulation” revealed that there are ElmerGrid and ElmerSolver log files present with this info, so I didn’t even need Process Explorer! At this point, I have to save the SIGSEGV investigation for another day.


Crystal Growth Simulation – Part 1

Ever since I read chapter 2 of Fabrication Engineering at the Micro- and Nanoscale, I have been interested in simulating crystal growth, primarily because of the multi-discplinary nature of this problem: finite element modeling software, modeling using high performance computing, the physics of the problem (governing laws and equations), and visualization techniques. I had run across Elmer when looking up crystal growth examples a while back. About two weeks ago, I was watching this Elmer demo in which they used gmsh to show the geometry for the simulation.

I wanted to learn more about Gmsh and how it is used, so I took a detour into this Gmsh introductory video. It was an excellent tour of Gmsh and its features.

Gmsh for Crystal Growth

At this point, I wanted to see how Gmsh could be used for crystal growth simulation so I searched for “crystal growth gmsh“. I was very pleased to find this paper: Development and validation of a thermal simulation for the Czochralski crystal growth process using model experiments – ScienceDirect. It appeared to have a corresponding set of slides and a talk as well. It especially piqued my interest that there is a script to set up a mesh for a Czochralski furnace.

Running the Code

Thankfully, the code in this paper is open source. I already had python installed so the first thing I tried was executing run.py in Git bash

$ git clone https://github.com/nemocrys/test-cz-induction
$ cd test-cz-induction
$ python run.py
Traceback (most recent call last):
  File "D:\dev\repos\fem\research\test-cz-induction\run.py", line 5, in <module>
    import opencgs.control as ctrl
ModuleNotFoundError: No module named 'opencgs'

Without using my brain, I decided to install pyelmer since I knew it was a dependency (from the talk).

$ pip install pyelmer
Collecting pyelmer
  Downloading pyelmer-1.0.0-py3-none-any.whl (27 kB)
Collecting gmsh
  Downloading gmsh-4.10.5-py2.py3-none-win_amd64.whl (38.4 MB)
     |████████████████████████████████| 38.4 MB 285 kB/s
Collecting pyyaml
  Downloading PyYAML-6.0-cp310-cp310-win_amd64.whl (151 kB)
     |████████████████████████████████| 151 kB ...
Collecting matplotlib
  Downloading matplotlib-3.5.2-cp310-cp310-win_amd64.whl (7.2 MB)
     |████████████████████████████████| 7.2 MB 6.4 MB/s
Collecting pillow>=6.2.0
  Downloading Pillow-9.2.0-cp310-cp310-win_amd64.whl (3.3 MB)
     |████████████████████████████████| 3.3 MB ...
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting numpy>=1.17
  Downloading numpy-1.23.1-cp310-cp310-win_amd64.whl (14.6 MB)
     |████████████████████████████████| 14.6 MB 6.4 MB/s
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 2.5 MB/s
Collecting python-dateutil>=2.7
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     |████████████████████████████████| 247 kB ...
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.4.4-cp310-cp310-win_amd64.whl (55 kB)
     |████████████████████████████████| 55 kB 1.6 MB/s
Collecting pyparsing>=2.2.1
  Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     |████████████████████████████████| 98 kB 6.8 MB/s
Collecting fonttools>=4.22.0
  Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     |████████████████████████████████| 944 kB 6.4 MB/s
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: six, pyparsing, python-dateutil, pillow, packaging, numpy, kiwisolver, fonttools, cycler, pyyaml, matplotlib, gmsh, pyelmer
  WARNING: Failed to write executable - trying to use .deleteme logic
ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\\Python310\\Scripts\\f2py.exe' -> 'C:\\Python310\\Scripts\\f2py.exe.deleteme'

WARNING: You are using pip version 21.2.4; however, version 22.2 is available.
You should consider upgrading via the 'C:\Python310\python.exe -m pip install --upgrade pip' command.

Looks like I need to run this as an administrator.

C:\dev>pip install pyelmer
Collecting pyelmer
  Using cached pyelmer-1.0.0-py3-none-any.whl (27 kB)
Collecting pyyaml
  Using cached PyYAML-6.0-cp310-cp310-win_amd64.whl (151 kB)
Collecting gmsh
  Using cached gmsh-4.10.5-py2.py3-none-win_amd64.whl (38.4 MB)
Collecting matplotlib
  Using cached matplotlib-3.5.2-cp310-cp310-win_amd64.whl (7.2 MB)
Requirement already satisfied: python-dateutil>=2.7 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (2.8.2)
Requirement already satisfied: packaging>=20.0 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (21.3)
Requirement already satisfied: fonttools>=4.22.0 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (4.34.4)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (1.4.4)
Requirement already satisfied: pillow>=6.2.0 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (9.2.0)
Requirement already satisfied: pyparsing>=2.2.1 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (3.0.9)
Requirement already satisfied: numpy>=1.17 in c:\python310\lib\site-packages (from matplotlib->pyelmer) (1.23.1)
Collecting cycler>=0.10
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Requirement already satisfied: six>=1.5 in c:\python310\lib\site-packages (from python-dateutil>=2.7->matplotlib->pyelmer) (1.16.0)
Installing collected packages: cycler, pyyaml, matplotlib, gmsh, pyelmer
Successfully installed cycler-0.11.0 gmsh-4.10.5 matplotlib-3.5.2 pyelmer-1.0.0 pyyaml-6.0
WARNING: You are using pip version 21.2.4; however, version 22.2 is available.
You should consider upgrading via the 'C:\Python310\python.exe -m pip install --upgrade pip' command.

Installing pyelmer is not sufficient to avoid the “No module named ‘opencgs’” error. The solution is to clone the opencgs repo and use pip to install (as administrator) from that repo.

git clone https://github.com/nemocrys/opencgs
cd opencgs
D:\...\research\opencgs>pip install -e .
Obtaining file:///D:/dev/repos/fem/research/opencgs
Collecting meshio
  Using cached meshio-5.3.4-py3-none-any.whl (167 kB)
Collecting pandas
  Using cached pandas-1.4.3-cp310-cp310-win_amd64.whl (10.5 MB)
Requirement already satisfied: pyyaml in c:\python310\lib\site-packages (from opencgs==0.3.1) (6.0)
Requirement already satisfied: pyelmer in c:\python310\lib\site-packages (from opencgs==0.3.1) (1.0.0)
Collecting rich
  Using cached rich-12.5.1-py3-none-any.whl (235 kB)
Requirement already satisfied: numpy in c:\python310\lib\site-packages (from meshio->opencgs==0.3.1) (1.23.1)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\python310\lib\site-packages (from pandas->opencgs==0.3.1) (2.8.2)
Collecting pytz>=2020.1
  Using cached pytz-2022.1-py2.py3-none-any.whl (503 kB)
Requirement already satisfied: six>=1.5 in c:\python310\lib\site-packages (from python-dateutil>=2.8.1->pandas->opencgs==0.3.1) (1.16.0)
Requirement already satisfied: matplotlib in c:\python310\lib\site-packages (from pyelmer->opencgs==0.3.1) (3.5.2)
Requirement already satisfied: gmsh in c:\python310\lib\site-packages (from pyelmer->opencgs==0.3.1) (4.10.5)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\python310\lib\site-packages (from matplotlib->pyelmer->opencgs==0.3.1) (1.4.4)
Requirement already satisfied: pyparsing>=2.2.1 in c:\python310\lib\site-packages (from matplotlib->pyelmer->opencgs==0.3.1) (3.0.9)
Requirement already satisfied: cycler>=0.10 in c:\python310\lib\site-packages (from matplotlib->pyelmer->opencgs==0.3.1) (0.11.0)
Requirement already satisfied: pillow>=6.2.0 in c:\python310\lib\site-packages (from matplotlib->pyelmer->opencgs==0.3.1) (9.2.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\python310\lib\site-packages (from matplotlib->pyelmer->opencgs==0.3.1) (4.34.4)
Requirement already satisfied: packaging>=20.0 in c:\python310\lib\site-packages (from matplotlib->pyelmer->opencgs==0.3.1) (21.3)
Collecting commonmark<0.10.0,>=0.9.0
  Using cached commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in c:\python310\lib\site-packages (from rich->meshio->opencgs==0.3.1) (2.12.0)
Installing collected packages: commonmark, rich, pytz, pandas, meshio, opencgs
  Running setup.py develop for opencgs
Successfully installed commonmark-0.9.1 meshio-5.3.4 opencgs-0.3.1 pandas-1.4.3 pytz-2022.1 rich-12.5.1
WARNING: You are using pip version 21.2.4; however, version 22.2 is available.
You should consider upgrading via the 'C:\Python310\python.exe -m pip install --upgrade pip' command.

Trying to run gives an error about a missing module named objectgmsh. My assumption was the installing pyelmer was supposed to bring in all such modules.

D:\dev\repos\fem\research\test-cz-induction>python run.py
Traceback (most recent call last):
  File "D:\dev\repos\fem\research\test-cz-induction\run.py", line 5, in <module>
    import opencgs.control as ctrl
  File "d:\dev\repos\fem\research\opencgs\opencgs\__init__.py", line 3, in <module>
    import opencgs.geo
  File "d:\dev\repos\fem\research\opencgs\opencgs\geo.py", line 1, in <module>
    from objectgmsh import *
ModuleNotFoundError: No module named 'objectgmsh'

I notice while reviewing the paper yet again that it mentions the setup being in a Dockerfile. Sure enough, objectgmsh is a separate package that needs to be installed using pip. I upgrade pip for good measure since those warnings are getting annoying.

\test-cz-induction>pip install objectgmsh
Collecting objectgmsh
  Downloading objectgmsh-0.9-py3-none-any.whl (25 kB)
Requirement already satisfied: gmsh in c:\python310\lib\site-packages (from objectgmsh) (4.10.5)
Requirement already satisfied: numpy in c:\python310\lib\site-packages (from objectgmsh) (1.23.1)
Installing collected packages: objectgmsh
Successfully installed objectgmsh-0.9

I realize I should probably be running setup.py instead of run.py. Here’s the output now!

\test-cz-induction>python setup.py
crucible
melt
crystal
inductor
seed
insulation
crucible_adapter
axis_bt
vessel
axis_top
crucible 0.001999999999999999
melt 0.003
crystal 0.0004
inductor 0.001
seed 0.00017999999999999998
insulation 0.005
crucible_adapter 0.0039
axis_bt 0.0025
vessel 0.0025
axis_top 0.0025
filling 0.0263
Warning: Mesh size = 0 for bnd_melt. Ignoring this shape...
Warning: Mesh size = 0 for bnd_seed. Ignoring this shape...
Warning: Mesh size = 0 for bnd_crystal_side. Ignoring this shape...
Warning: Mesh size = 0 for bnd_crystal_top. Ignoring this shape...
Warning: Mesh size = 0 for bnd_axtop_side. Ignoring this shape...
Warning: Mesh size = 0 for bnd_axtop_bt. Ignoring this shape...
Warning: Mesh size = 0 for bnd_crucible_bt. Ignoring this shape...
Warning: Mesh size = 0 for bnd_crucible_outside. Ignoring this shape...
Warning: Mesh size = 0 for bnd_ins. Ignoring this shape...
Warning: Mesh size = 0 for bnd_adp. Ignoring this shape...
Warning: Mesh size = 0 for bnd_axbt. Ignoring this shape...
Warning: Mesh size = 0 for bnd_vessel_inside. Ignoring this shape...
Warning: Mesh size = 0 for bnd_vessel_outside. Ignoring this shape...
Warning: Mesh size = 0 for bnd_inductor_outside. Ignoring this shape...
Warning: Mesh size = 0 for bnd_inductor_inside. Ignoring this shape...
Warning: Mesh size = 0 for bnd_symmetry_axis. Ignoring this shape...
Warning: Mesh size = 0 for if_crucible_melt. Ignoring this shape...
Warning: Mesh size = 0 for if_melt_crystal. Ignoring this shape...
Warning: Mesh size = 0 for if_crystal_seed. Ignoring this shape...
Warning: Mesh size = 0 for if_seed_axtop. Ignoring this shape...
Warning: Mesh size = 0 for if_axtop_vessel. Ignoring this shape...
Warning: Mesh size = 0 for if_crucible_ins. Ignoring this shape...
Warning: Mesh size = 0 for if_ins_adp. Ignoring this shape...
Warning: Mesh size = 0 for if_adp_axbt. Ignoring this shape...
Warning: Mesh size = 0 for if_axbt_vessel. Ignoring this shape...
Info    : Meshing 1D...
Info    : [  0%] Meshing curve 3 (Line)
Info    : [ 10%] Meshing curve 4 (Line)
Info    : [ 10%] Meshing curve 15 (Line)
Info    : [ 10%] Meshing curve 16 (Line)
Info    : [ 10%] Meshing curve 17 (Line)
Info    : [ 10%] Meshing curve 18 (BSpline)
Info    : [ 20%] Meshing curve 19 (Circle)
...
Info    : [100%] Meshing surface 12 order 2
Info    : [100%] Meshing surface 13 order 2
Info    : [100%] Meshing surface 14 order 2
Info    : Surface mesh: worst distortion = 0.526353 (0 elements in ]0, 0.2]); worst gamma = 0.252838
Info    : Done meshing order 2 (Wall 0.990998s, CPU 0.75s)
-------------------------------------------------------
Version       : 4.10.5
License       : GNU General Public License
Build OS      : Windows64-sdk
Build date    : 20220701
Build host    : gmsh.info
Build options : 64Bit ALGLIB[contrib] ANN[contrib] Bamg Blas[petsc] Blossom Cgns DIntegration DomHex Eigen[contrib] Fltk Gmm[contrib] Hxt Jpeg Kbipack Lapack[petsc] MathEx[contrib] Med Mesh Metis[contrib] Mmg Mpeg Netgen NoSocklenT ONELAB ONELABMetamodel OpenCASCADE OpenCASCADE-CAF OpenGL OpenMP OptHom PETSc Parser Plugins Png Post QuadMeshingTools QuadTri Solver TetGen/BR Voro++[contrib] WinslowUntangler Zlib
FLTK version  : 1.4.0
PETSc version : 3.15.0 (real arithmtic)
OCC version   : 7.6.1
MED version   : 4.1.0
Packaged by   : nt authority system
Web site      : https://gmsh.info
Issue tracker : https://gitlab.onelab.info/gmsh/gmsh/issues
-------------------------------------------------------

Gmsh launches (although not as a separate process) with the generated mesh:

Crystal Growth Gmsh Model

setup.py outputs the path to the mesh after closing Gmsh.

Info    : Writing './simdata/_test/case.msh'...
Info    : Done writing './simdata/_test/case.msh'
Gmsh model created with objectgmsh.
Shapes: ['crucible', 'melt', 'crystal', 'inductor', 'seed', 'insulation', 'crucible_adapter', 'axis_bt', 'vessel', 'axis_top', 'filling', 'bnd_melt', 'bnd_seed', 'bnd_crystal_side', 'bnd_crystal_top', 'bnd_axtop_side', 'bnd_axtop_bt', 'bnd_crucible_bt', 'bnd_crucible_outside', 'bnd_ins', 'bnd_adp', 'bnd_axbt', 'bnd_vessel_inside', 'bnd_vessel_outside', 'bnd_inductor_outside', 'bnd_inductor_inside', 'bnd_symmetry_axis', 'if_crucible_melt', 'if_melt_crystal', 'if_crystal_seed', 'if_seed_axtop', 'if_axtop_vessel', 'if_crucible_ins', 'if_ins_adp', 'if_adp_axbt', 'if_axbt_vessel']

The next step will be to figure out how to simulate crystal growth using pyelmer.


Categories: CUDA, Graphics

Testing nVidia Cuda Samples

I have been toying around with the idea of doing a fluid dynamics or crystal growth simulation using nVidia CUDA. I decided to try out nVidia’s cuda samples to see what their approach looks like, in particular when rendering using OpenGL. I am using Visual Studio 2022 so I simply cloned the cuda samples repo, opened the fluidsGL_vs2022.sln solution, right click on the fluidsGL project, then selected Build.

Build started...
1>------ Build started: Project: fluidsGL, Configuration: Debug x64 ------
1>D:\dev\...\cuda-samples\Samples\5_Domain_Specific\fluidsGL\fluidsGL_vs2022.vcxproj(37,5): error MSB4019: The imported project "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 11.6.props" was not found. Confirm that the expression in the Import declaration "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\\BuildCustomizations\CUDA 11.6.props" is correct, and that the file exists on disk.
1>Done building project "fluidsGL_vs2022.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

The prerequisites section does mention that the CUDA Toolkit 11.6 is required, so I close VS and install it. I end up with version 11.7 though:

When reopening the fluidsGL solution, I still get the same error about CUDA 11.6.props not being found. A quick look at the directory this file is expected to be in reveals that this is a simple version mismatch problem – see the hard coded version in the fluidsGL.vcxproj file. Instead of fixing every example .vcxproj file to match CUDA 11.7, we can patch the VS folder by running these commands from an admin command prompt:

cd "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations"

copy "CUDA 11.7.props" "CUDA 11.6.props"
copy "CUDA 11.7.targets" "CUDA 11.6.targets"

The code now builds in Visual Studio and I can now oooh, aaaah over the demo. Visual Studio does seem a bit sluggish at opening the entire samples solution though… I get this information about my device in the console window after the demo launches:

GPU Device 0: "Pascal" with compute capability 6.1

CUDA device [Quadro P1000] has 5 Multi-Processors
After clicking and dragging mouse around

Categories: Compilers, Fortran, LLVM

Failing to Build Flang with Visual C++

Background

Elmer is the first codebase that I have dug into that has a substantial (or really any) amount of Fortran code. I used GFortran to build it but went digging around for a clang based compiler. I found llvm-project/flang and since I had been building LLVM earlier this year, I figured it should be straightforward to build flang and perhaps explore it in a debugger.

My first attempt to build flang (on Windows, my primary OS) resulted in many build errors. Unfortunately, I was using a preview Visual Studio build, so I didn’t want to compare the errors with those from a different machine because it wasn’t the same compiler version in use. I decided to use an RTM Visual Studio compiler (VS 17.2.5) to avoid possible compiler bugs present only in VS preview builds since most people would not be using preview VS builds anyway.

Without giving it much thought, my suspicion was that any build failures probably arose from not using the correct C++ version. The source code I was trying to build (commit c0702ac0) states that it uses C++17. I set this in CMake by defining the CXX_STANDARD property. Here is the full cmake command line I used to set up the build.

cd llvm-project
mkdir build
cd build

cmake \
  -G Ninja \
  ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DFLANG_ENABLE_WERROR=On \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_TARGETS_TO_BUILD=host \
  -DCMAKE_INSTALL_PREFIX=../install
  -DLLVM_LIT_ARGS=-v \
  -DLLVM_ENABLE_PROJECTS="clang;mlir;flang" \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
  -DCXX_STANDARD=17

# Shown here without \ to be executable in cmd.exe
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release -DFLANG_ENABLE_WERROR=On -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_TARGETS_TO_BUILD=host -DCMAKE_INSTALL_PREFIX=../install -DLLVM_LIT_ARGS=-v -DLLVM_ENABLE_PROJECTS="clang;mlir;flang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" -DCXX_STANDARD=17

That took about 2 minutes on my machine after which I ran ninja to start the build

ninja

Unfortunately, the build failed! The first error I encountered was in fold-real.cpp. Here is the command line used to invoke the compiler (shown with newlines to simplify interpretation, see Compiler options listed alphabetically | Microsoft Docs for the complete list of compiler options).

 C:\PROGRA~1\MIB055~1\2022\ENTERP~1\VC\Tools\MSVC\1432~1.313\bin\Hostx64\x64\cl.exe
 /nologo
 /TP
 -DFLANG_LITTLE_ENDIAN=1
 -DGTEST_HAS_RTTI=0 -DUNICODE
 -D_CRT_NONSTDC_NO_DEPRECATE
 ...
 -D__STDC_LIMIT_MACROS
 -ID:\dev\repos\llvm-project\build-cpp17\tools\flang\lib\Evaluate
 ...
 -ID:\dev\repos\llvm-project\llvm\include
 -external:I D:\dev\repos\llvm-project\llvm\..\mlir\include
 ...
 -external:I D:\dev\repos\llvm-project\llvm\..\clang\include
 -external:W0
 /DWIN32
 /D_WINDOWS
 /Zc:inline
 /Zc:__cplusplus
 /Oi
 /bigobj
 /permissive-
 /W4
 -wd4141
 ...
 -wd4324
 -w14062
 -we4238
 /Gw
 /WX
 /MD
 /O2
 /Ob2
 /EHs-c-
 /GR-
 -UNDEBUG
 -std:c++17
 /showIncludes
 /Fotools\flang\lib\Evaluate\CMakeFiles\obj.FortranEvaluate.dir\fold-real.cpp.obj
 /Fdtools\flang\lib\Evaluate\CMakeFiles\obj.FortranEvaluate.dir\
 /FS
 -c
 D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-real.cpp

I had to look up the meaning of the C++ syntax on that line to understand why the build could be failing. Turns out to be a lambda, as explained at c++ – What is a lambda expression in C++11? – Stack Overflow.

I tried manually creating a repro for this compiler issue by creating a new Visual C++ project in Visual Studio and recreating the structure of the code failing to build. One of the questions I had was how to set conformance mode in a Visual Studio Cmake project. I still haven’t yet figured this out. However, one of the issues I ran into was that my cmake project was building the code without the /permissive- flag! I ended up switching to a regular Visual C++ project (.vcxproj) since I knew how to change the compiler options reliably for such projects. After struggling with recreating the code, I realized that I would make more progress removing code from flang’s fold-real.cpp instead. Here are some of the other searches and concepts I had to look up to understand the code while trying to create a minimal repro of the build failure.

  1. c++11 – how to remove error : X is not a class template – Stack Overflow
  2. c++ – What does template <unsigned int N> mean? – Stack Overflow
  3. what does this … (three dots) means in c++ – Stack Overflow
  4. create a reference in c++ -> Reference of Reference in C++ – Stack Overflow
  5. c++ using namespace -> Namespaces (C++) | Microsoft Docs
  6. using typedef -> Aliases and typedefs (C++) | Microsoft Docs
  7. move mechanics -> C++ Move Semantics Introduction | hacking C++ (hackingcpp.com)
  8. c++ – What is std::decay and when it should be used? – Stack Overflow

I was eventually able to create a simpler test case showing that the flang code could not build with my RTM compiler.

cl /std:c++17 /permissive- flang-msvc-clang-test.cpp

Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31332 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

flang-msvc-clang-test.cpp
flang-msvc-clang-test.cpp(159): error C2065: 'T': undeclared identifier
flang-msvc-clang-test.cpp(48): note: see reference to function template instantiation 'auto FoldIntrinsicFunction::<lambda_1>::operator ()<_First>(const _T1 &) const' being compiled
        with
        [
            _First=Expr<Type<TypeCategory::Real,1>>,
            _T1=Expr<Type<TypeCategory::Real,1>>
        ]
flang-msvc-clang-test.cpp(171): note: see reference to function template instantiation 'Expr<Type<TypeCategory::Real,2>> FoldIntrinsicFunction<2>(FoldingContext &,FunctionRef<Type<TypeCategory::Real,2>> &&)' being compiled
flang-msvc-clang-test.cpp(159): error C2923: 'Scalar': 'T' is not a valid template type argument for parameter 'T'
flang-msvc-clang-test.cpp(159): note: see declaration of 'T'

So after all that, the RTM LTS Visual C++ compiler turned out to have a bug. Turns out the Visual C++ folks had already fixed this issue so the way to unblock myself was to switch to the preview Visual Studio build :(! The irony…

Suppressing Warnings

Armed with a preview build that correctly compiled the test case, the next obstacle in the build process was a set of warnings that were treated as errors: C4661 and C4101.

FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold.cpp.obj
C:\...\cl.exe ... -c D:\dev\repos\llvm-project\flang\lib\Evaluate\fold.cpp
D:\dev\repos\llvm-project\flang\include\flang\Evaluate\expression.h(101): error C2220: the following warning is treated as an error
D:\dev\repos\llvm-project\flang\include\flang\Evaluate\expression.h(101): warning C4661: 'std::optional<Fortran::evaluate::DynamicType> Fortran::evaluate::ExpressionBase<Fortran::evaluate::SomeDerived>::GetType(void) const': no suitable definition provided for explicit template instantiation request
...
FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-complex.cpp.obj
C:\...\cl.exe ... -c D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-complex.cpp
D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-implementation.h(1583): error C2220: the following warning is treated as an error
D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-implementation.h(1583): warning C4101: 'buffer': unreferenced local variable

I tried to suppressed them to keep marching forward:

cd \dev\repos\llvm-project
mkdir build-nowarn
cd build-nowarn

cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release -DFLANG_ENABLE_WERROR=On -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_TARGETS_TO_BUILD=host -DCMAKE_INSTALL_PREFIX=../install -DLLVM_LIT_ARGS=-v -DLLVM_ENABLE_PROJECTS="clang;mlir;flang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" -DCXX_STANDARD=17 -DCXX_FLAGS="-wd4661 -wd4101"

ninja

Defining CXX_FLAGS like that did not work so I end up looking around for how to disable warnings in cmake. This was when I discovered that CMAKE_CXX_STANDARD is not necessary on the command line because flang/CMakeLists.txt already requires C++17. Trying to append the warning disable option /wdXXXX to that file didn’t work either. However, the comment on line 329 made me explore HandleLLVMOptions.cmake. There, I discovered support for setting the number of parallel jobs (via /MP for Visual C++). This file also contained the code that sets up most of the compiler options used when building! Closer to the task at hand is the discover of the LLVM_ENABLE_WARNINGS option and the hard-coded list of MSVC warning flags! I therefore made this change (before running cmake and ninja) to get the warning flags to be respected:

diff --git a/llvm/cmake/modules/HandleLLVMOptions.cmake b/llvm/cmake/modules/HandleLLVMOptions.cmake
index 56d05f5b5fce..589281b232f1 100644
--- a/llvm/cmake/modules/HandleLLVMOptions.cmake
+++ b/llvm/cmake/modules/HandleLLVMOptions.cmake
@@ -648,6 +648,8 @@ if (MSVC)
           # v15.8.8. Re-evaluate the usefulness of this diagnostic when the bug
           # is fixed.
       -wd4709 # Suppress comma operator within array index expression
+      -wd4101  # Suppress ...
+      -wd4661  # Suppress ...

       # Ideally, we'd like this warning to be enabled, but even MSVC 2019 doesn't
       # support the 'aligned' attribute in the way that clang sources requires (for
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release -DFLANG_ENABLE_WERROR=On -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_TARGETS_TO_BUILD=host -DCMAKE_INSTALL_PREFIX=../install -DLLVM_LIT_ARGS=-v -DLLVM_ENABLE_PROJECTS="clang;mlir;flang" -DLLVM_ENABLE_RUNTIMES="compiler-rt"

Another Compiler Failure

With the aforementioned change, the build proceeded to a different build failure, this time in fold-integer.cpp.

FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.obj
C:\PROGRA~1\MIB055~1\2022\Preview\VC\Tools\MSVC\1433~1.316\bin\Hostx64\x64\cl.exe  /nologo /TP -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -ID:\dev\repos\llvm-project\build-vsmain\tools\flang\lib\Evaluate -ID:\dev\repos\llvm-project\flang\lib\Evaluate -ID:\dev\repos\llvm-project\flang\include -ID:\dev\repos\llvm-project\build-vsmain\tools\flang\include -ID:\dev\repos\llvm-project\build-vsmain\include -ID:\dev\repos\llvm-project\llvm\include -external:ID:\dev\repos\llvm-project\llvm\..\mlir\include -external:ID:\dev\repos\llvm-project\build-vsmain\tools\mlir\include -external:ID:\dev\repos\llvm-project\build-vsmain\tools\clang\include -external:ID:\dev\repos\llvm-project\llvm\..\clang\include -external:W0 /DWIN32 /D_WINDOWS   /Zc:inline /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd4101 -wd4661 -wd4324 -w14062 -we4238 /Gw /WX /MD /O2 /Ob2  /EHs-c- /GR- -UNDEBUG -std:c++17 /showIncludes /Fotools\flang\lib\Evaluate\CMakeFiles\obj.FortranEvaluate.dir\fold-integer.cpp.obj /Fdtools\flang\lib\Evaluate\CMakeFiles\obj.FortranEvaluate.dir\ /FS -c D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp
D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(771): error C2672: 'invoke': no matching overloaded function found
C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.33.31627\include\type_traits(1552): note: could be 'unknown-type std::invoke(_Callable &&,_Ty1 &&,_Types2 &&...) noexcept(<expr>)'
D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(771): note: Failed to specialize function template 'unknown-type std::invoke(_Callable &&,_Ty1 &&,_Types2 &&...) noexcept(<expr>)'
C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.33.31627\include\type_traits(1552): note: see declaration of 'std::invoke'
...

By this point, I knew that simplifying the function containing the error was the fastest path to a repro. One of the little problems I ran into was how to figure out the type of fptr since it is declared using the auto keyword. I ended up assigning it to a new temporary variable of a different type, e.g. char et voila!

D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(505): error C2440: 'initializing': cannot convert from 'int (__cdecl Fortran::evaluate::value::Integer<8,true,8,unsigned char,unsigned short>::* )(void) const' to 'char'

I then removed the temporary assignment and explicitly specified this type as the type of fptr:

using T2 = int (__cdecl Fortran::evaluate::value::Integer<8,true,8,unsigned char,unsigned short>::* )(void) const;

T2 fptr{&Scalar<TI>::LEADZ};

The build then failed because the function pointer types are not the same, which was really confusing given that I had just checked the type of fptr.

D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(504): error C2440: 'initializing': cannot convert from
'int (__cdecl Fortran::evaluate::value::Integer<16,true,16,unsigned short,unsigned int>::* )(void) const' to
'int (__cdecl Fortran::evaluate::value::Integer<8,true,8,unsigned char,unsigned short>::* )(void) const'

I switched the type of fptr and got a different error:

D:\dev\repos\llvm-project\flang\include\flang\Evaluate\integer.h(66): error C2607: static assertion failed
D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(490): note: see reference to class template instantiation 'Fortran::evaluate::value::Integer<16,true,16,unsigned char,unsigned short>' being compiled

Here is a different change I tried:

using T2 = int (__cdecl Fortran::evaluate::value::Integer<8>::* )(void) const;

T2 fptr{&Scalar<TI>::LEADZ};

That still failed with the following error:

D:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(504): error C2440: 'initializing': cannot convert from
'int (__cdecl Fortran::evaluate::value::Integer<8,true,8,unsigned char,unsigned short>::* )(void) const' to
'int (__cdecl Fortran::evaluate::value::Integer<32,true,32,unsigned int,unsigned __int64>::* )(void) const'

It was at this point that I realized that it was time to learn a bit more about decay. What is decay and array-to-pointer conversion? | C++ FAQ (64.github.io) had a good explanation of why the term decay is used. Perhaps a reexamination of std::decay – cppreference.com might lead to some insight. I wasn’t sure what Result referred to in the statement using TI = typename std::decay_t<decltype(n)>::Result; One idea I got was to append a number to the typename and examine the compiler error. Here’s the new line 752 of llvm-project/fold-integer.cpp and the resulting compiler error showing that this name cannot be arbitrary.

using TI = typename std::decay_t<decltype(n)>::Result3;


C:\dev\repos\llvm-project\flang\lib\Evaluate\fold-integer.cpp(502): error C2039: 'Result3': is not a member of 'Fortran::evaluate::Expr<Fortran::evaluate::Type<Fortran::common::TypeCategory::Integer,1>>'

Aha, so what it was referring to is the using statement in llvm-project/expression.h!

template <int KIND>
class Expr<Type<TypeCategory::Integer, KIND>>
    : public ExpressionBase<Type<TypeCategory::Integer, KIND>> {
public:
  using Result = Type<TypeCategory::Integer, KIND>;

...

The problematic lambda is therefore expecting a Scalar<Type<TypeCategory::Integer, KIND>>. Scalar is defined using decay and Type<TypeCategory::Integer, KIND>::Scalar is defined in llvm-project/type.h as the type value::Integer<8 * KIND>. This is when I see the reason for the previous build errors about mismatched Integer sizes no matter which size I picked – the fixed type I was using didn’t allow for the different template instantiations! Note that the problematic lambda is defined as a ScalarFunc.

By this point, I had a self-contained repro of the compiler bug, which ironically, compiled successfully on the RTM C++ compiler so I could use neither the preview nor the RTM to build the flang code.

cl /c /TP /std:c++17 /permissive- flang-msvc-clang-test-02.cpp

This compiler invocation gives the same error seen when compiling the flang code:

Microsoft (R) C/C++ Optimizing Compiler Version 19.33.31627.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

flang-msvc-clang-test-02.cpp
flang-msvc-clang-test-02.cpp(193): error C2672: 'invoke': no matching overloaded function found
C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.34.31721\include\type_traits(1552): note: could be 'unknown-type std::invoke(_Callable &&,_Ty1 &&,_Types2 &&...) noexcept(<expr>)'
flang-msvc-clang-test-02.cpp(193): note: Failed to specialize function template 'unknown-type std::invoke(_Callable &&,_Ty1 &&,_Types2 &&...) noexcept(<expr>)'

I ended up reporting this compiler bug via the Visual Studio feedback system – see C++17 lambda fails to compile on latest VS preview compiler.


Categories: Java, Testing

Investigating a jtreg Failure

Download jtreg from the AdoptOpenJDK dependency pipeline (adoptopenjdk.net). For this investigation, I’ll be using my MacBook M1 running Monterey 12.1.

mkdir investigate
cd investigate
git clone https://github.com/openjdk/jdk11u

# Download jtreg 6
curl -Lo jtreg-6+1.tar.gz https://ci.adoptopenjdk.net/view/Dependencies/job/dependency_pipeline/lastSuccessfulBuild/artifact/jtreg/jtreg-6+1.tar.gz

tar xzfv jtreg-6+1.tar.gz

cd jdk11u

We switch the current directory to the root of jdk11u repo so that test paths are relative to the repo root. I will assume that we’re in the jdk11u repo root directory and are using the directory structure generated by the commands above. To see a detailed list of all the jtreg options, run this command:

../jtreg/bin/jtreg -help all

Now let us try to run a jtreg test, specifically AmazonCA.java:

../jtreg/bin/jtreg test/jdk/security/infra/java/security/cert/CertPathValidator/certification/AmazonCA.java

There are some failure messages but it looks like a test ran.

failed to get value for vm.hasJFR
java.lang.UnsatisfiedLinkError: 'boolean sun.hotspot.WhiteBox.isJFRIncludedInVmBuild()'
	at sun.hotspot.WhiteBox.isJFRIncludedInVmBuild(Native Method)
	at requires.VMProps.vmHasJFR(VMProps.java:343)
	at requires.VMProps$SafeMap.put(VMProps.java:72)
	at requires.VMProps.call(VMProps.java:107)
	at requires.VMProps.call(VMProps.java:60)
	at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:80)
	at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
failed to get value for vm.aot.enabled
java.lang.UnsatisfiedLinkError: 'int sun.hotspot.WhiteBox.aotLibrariesCount()'
	at sun.hotspot.WhiteBox.aotLibrariesCount(Native Method)
	at requires.VMProps.vmAotEnabled(VMProps.java:408)
	at requires.VMProps$SafeMap.put(VMProps.java:72)
	at requires.VMProps.call(VMProps.java:112)
	at requires.VMProps.call(VMProps.java:60)
	at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:80)
	at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
.
.
.
Test results: passed: 1
Report written to /Users/saint/repos/java/jdk11u/JTreport/html/report.html
Results written to /Users/saint/repos/java/jdk11u/JTwork

Are these failure messages concerning given that the test passed? Reviewing the test report suggests not. The report keywords mention bug 8233223, which must be Bug ID: JDK-8233223 Add Amazon Root CA certificates (java.com). From the look of things, the java.lang.UnsatisfiedLinkErrors can be safely ignored (for this test anyway). That said, let us dig into these errors to ensure we understand what is happening.

The immediate cause of these errors is the failure to get the values for the SafeMap in VMProps.java. This raises the question of which JDK is being used by jtreg? My MacBook has both JDK11 and JDK17. The default java version is:

java -version
openjdk version "17.0.1" 2021-10-19 LTS
OpenJDK Runtime Environment Microsoft-28056 (build 17.0.1+12-LTS)
OpenJDK 64-Bit Server VM Microsoft-28056 (build 17.0.1+12-LTS, mixed mode)

Let’s ensure jtreg is using JDK11 by setting JTREG_JAVA.

JTREG_JAVA=/Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home

$JTREG_JAVA/bin/java -version

openjdk version "11.0.14" 2022-01-18 LTS
OpenJDK Runtime Environment Microsoft-30257 (build 11.0.14+9-LTS)
OpenJDK 64-Bit Server VM Microsoft-30257 (build 11.0.14+9-LTS, mixed mode)

../jtreg/bin/jtreg test/jdk/security/infra/java/security/cert/CertPathValidator/certification/AmazonCA.java

We still see the same warnings though so let us explicitly use the -jdk option:

../jtreg/bin/jtreg -jdk:$JTREG_JAVA test/jdk/security/infra/java/security/cert/CertPathValidator/certification/AmazonCA.java

We now get an interesting error message indicating that the -jdk option was using the newer JDK17.

Exception while calling user-specified class: requires.VMProps
java.lang.UnsupportedClassVersionError: requires/VMProps has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	...
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:315)
	at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:78)
	at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
failed to get JDK properties for /Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home/bin/java ; exit code 1
Error: failed to get JDK properties for /Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home/bin/java ; exit code 1

On my machine, I can remove these files as follows:

ls -l /Users/saint/repos/java/jdk11u/JTwork/extraPropDefns/classes/requires

rm -fr /Users/saint/repos/java/jdk11u/JTwork/extraPropDefns/classes/requires

Rerunning the test now results in a single (different) UnsatisfiedLinkError AND a test failure! However, we now have a properly set up environment since we control the JDK version tested by jtreg.

jdk11u % ../jtreg/bin/jtreg -jdk:$JTREG_JAVA test/jdk/security/infra/java/security/cert/CertPathValidator/certification/AmazonCA.java

failed to get value for vm.musl
java.lang.UnsatisfiedLinkError: 'java.lang.String sun.hotspot.WhiteBox.getLibcName()'
	at sun.hotspot.WhiteBox.getLibcName(Native Method)
	at requires.VMProps.isMusl(VMProps.java:514)
	at requires.VMProps$SafeMap.put(VMProps.java:72)
	at requires.VMProps.call(VMProps.java:122)
	at requires.VMProps.call(VMProps.java:60)
	at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:80)
	at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
Test results: failed: 1
Report written to /Users/saint/repos/java/jdk11u/JTreport/html/report.html
Results written to /Users/saint/repos/java/jdk11u/JTwork
Error: Some tests failed or other problems occurred.

Now here’s an interesting question: why doesn’t this approach yield identical result to setting the -jdk flag to this same JTREG_JAVA path?

JTREG_JAVA=/Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home

../jtreg/bin/jtreg test/jdk/security/infra/java/security/cert/CertPathValidator/certification/AmazonCA.java

After doing some printf (or rather echo) debugging and observing an empty string for JTREG_JAVA, the culprit turns out to be the difference between a shell variable and an environment variable. See command line – What is the difference in usage between shell variables and environment variables? – Unix & Linux Stack Exchange. For the jtreg script to pull in this value of JTREG_JAVA, it needs to be an environment variable. It should therefore show up when this command is executed:

printenv | grep -i java

The proper way to execute this test then is:

JTREG_JAVA=/Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home

export JTREG_JAVA

../jtreg/bin/jtreg test/jdk/security/infra/java/security/cert/CertPathValidator/certification/AmazonCA.java

The outcome of the experiment so far though is that the AmazonCA test appears to fail when run with JDK11 and pass when run with JDK17 (of the respective versions). To convince ourselves that the infrastructure is fine, we can run this test with JDK11 (which is our focus) after exporting JTREG_JAVA.

../jtreg/bin/jtreg test/hotspot/jtreg/compiler/aot/cli/IncorrectAOTLibraryTest.java

This test passes, despite the single UnsatisfiedLinkError printed out.

failed to get value for vm.musl
java.lang.UnsatisfiedLinkError: 'java.lang.String sun.hotspot.WhiteBox.getLibcName()'
	at sun.hotspot.WhiteBox.getLibcName(Native Method)
	at requires.VMProps.isMusl(VMProps.java:514)
	at requires.VMProps$SafeMap.put(VMProps.java:72)
	at requires.VMProps.call(VMProps.java:122)
	at requires.VMProps.call(VMProps.java:60)
	at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:80)
	at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
Test results: passed: 1
Report written to /Users/saint/repos/java/jdk11u/JTreport/html/report.html
Results written to /Users/saint/repos/java/jdk11u/JTwork

An Interesting Test

The above experimentation was inspired by AotInvokeDynamic2AotTest.java. The first time I tried to run this test, I used this command line.

../jtreg/bin/jtreg test/hotspot/jtreg/compiler/aot/calls/fromAot/AotInvokeDynamic2AotTest.java

We first set of 5 UnsatisfiedLinkError failures in the previous experiment were displayed but no tests were executed.

...
Test results: no tests selected
Report written to /Users/saint/repos/java/jdk11u/JTreport/html/report.html

This was happening while jtreg was using JDK17 and one of the values that could not be get()ed vm.aot.enabled. Could that be why there were no selected tests? Ignoring that rabbit hole for now sine jdk11u is our focus. We can now run the test with JTREG_JAVA exported. The test is now run but fails with this message in JTreport/text/summary.txt:

compiler/aot/calls/fromAot/AotInvokeDynamic2AotTest.java  Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Expected to get exit value of [0]

To see more details about the test failure, use the -verbose flag:

../jtreg/bin/jtreg -verbose:fail,error,summary test/hotspot/jtreg/compiler/aot/calls/fromAot/AotInvokeDynamic2AotTest.java

Here is the a portion of the output of the test run. Notice the linker error in there!

ACTION: build -- Passed. All files up to date
REASON: Named class compiled on demand
TIME:   0.0 seconds
messages:
command: build compiler.aot.AotCompiler
reason: Named class compiled on demand
elapsed time (seconds): 0.0

ACTION: driver -- Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Expected to get exit value of [0]
REASON: User specified action: run driver compiler.aot.AotCompiler -libname AotInvokeDynamic2AotTest.so -class compiler.calls.common.InvokeDynamic -extraopt -XX:+UnlockDiagnosticVMOptions -extraopt -XX:+WhiteBoxAPI -extraopt -Xbootclasspath/a:. 
TIME:   4.821 seconds
messages:
command: driver compiler.aot.AotCompiler -libname AotInvokeDynamic2AotTest.so -class compiler.calls.common.InvokeDynamic -extraopt -XX:+UnlockDiagnosticVMOptions -extraopt -XX:+WhiteBoxAPI -extraopt -Xbootclasspath/a:.
reason: User specified action: run driver compiler.aot.AotCompiler -libname AotInvokeDynamic2AotTest.so -class compiler.calls.common.InvokeDynamic -extraopt -XX:+UnlockDiagnosticVMOptions -extraopt -XX:+WhiteBoxAPI -extraopt -Xbootclasspath/a:. 
Mode: othervm
Additional options from @modules: --add-modules java.base --add-exports java.base/jdk.internal.org.objectweb.asm=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED
elapsed time (seconds): 4.821
configuration:
Boot Layer
  add modules: java.base                                
  add exports: java.base/jdk.internal.misc              ALL-UNNAMED
               java.base/jdk.internal.org.objectweb.asm ALL-UNNAMED

STDOUT:
Command line: [/usr/bin/ld -v]
@(#)PROGRAM:ld  PROJECT:ld64-764
BUILD 11:22:50 Apr 28 2022
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
LTO support using: LLVM version 13.1.6, (clang-1316.0.21.2.5) (static support for 28, runtime is 28)
TAPI support using: Apple TAPI version 13.1.6 (tapi-1316.0.7.3)

found working linker: /usr/bin/ld
Command line: [/Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home/bin/jaotc -J-XX:+UnlockDiagnosticVMOptions -J-XX:+WhiteBoxAPI -J-Xbootclasspath/a:. -J-classpath -J/Users/saint/repos/java/jdk11u/JTwork/classes/compiler/aot/calls/fromAot/AotInvokeDynamic2AotTest.d:/Users/saint/repos/java/jdk11u/JTwork/classes/test/lib:/Users/saint/repos/java/jdk11u/JTwork/classes/testlibrary:/Users/saint/repos/java/jdk11u/JTwork/classes:/Users/saint/repos/java/jdk11u/test/hotspot/jtreg/compiler/aot/calls/fromAot --compile-with-assertions --info --output AotInvokeDynamic2AotTest.so --class-name compiler.calls.common.InvokeDynamic -J-ea -J-esa -J-Xmixed]
Compiling AotInvokeDynamic2AotTest.so...
1 classes found (22 ms)
9 methods total, 8 methods to compile (12 ms)
Compiling with 10 threads
.
8 methods compiled, 0 methods failed (2785 ms)
Parsing compiled code (7 ms)
Processing metadata (46 ms)
Preparing stubs binary (3 ms)
Preparing compiled binary (2 ms)
Creating binary: AotInvokeDynamic2AotTest.so.o (18 ms)
Creating shared library: AotInvokeDynamic2AotTest.so (30 ms)
Exception in thread "main" java.lang.InternalError: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64
	at jdk.aot@11.0.14/jdk.tools.jaotc.Linker.link(Linker.java:142)
	at jdk.aot@11.0.14/jdk.tools.jaotc.Main.run(Main.java:262)
	at jdk.aot@11.0.14/jdk.tools.jaotc.Main.run(Main.java:133)
	at jdk.aot@11.0.14/jdk.tools.jaotc.Main.main(Main.java:89)

Why on earth is there an error about x86_64 on my M1? Here is the failing command line, listed separately for easy execution:

/Library/Java/JavaVirtualMachines/microsoft-11.jdk/Contents/Home/bin/jaotc -J-XX:+UnlockDiagnosticVMOptions -J-XX:+WhiteBoxAPI -J-Xbootclasspath/a:. -J-classpath -J/Users/saint/repos/java/jdk11u/JTwork/classes/compiler/aot/calls/fromAot/AotInvokeDynamic2AotTest.d:/Users/saint/repos/java/jdk11u/JTwork/classes/test/lib:/Users/saint/repos/java/jdk11u/JTwork/classes/testlibrary:/Users/saint/repos/java/jdk11u/JTwork/classes:/Users/saint/repos/java/jdk11u/test/hotspot/jtreg/compiler/aot/calls/fromAot --compile-with-assertions --info --output AotInvokeDynamic2AotTest.so --class-name compiler.calls.common.InvokeDynamic -J-ea -J-esa -J-Xmixed

Once this command completes (and fails), a file named AotInvokeDynamic2AotTest.so.o exists on disk. The format of the ld command can be deduced from Linker.java:101. The ld command can then be directly invoked to see the actual failure:

% ld -dylib -o AotInvokeDynamic2AotTest.so AotInvokeDynamic2AotTest.so.o
ld: dynamic main executables must link with libSystem.dylib for architecture x86_64

As per Clang -nostdlib option not working | Apple Developer Forums I tried adding the -lSystem option but that was not sufficient.

% ld -lSystem -dylib -o AotInvokeDynamic2AotTest.so AotInvokeDynamic2AotTest.so.o
ld: library not found for -lSystem

Exploring Mach-O, Part 1 | g.p. anders (gpanders.com) pointed out that the solution is to include the path to the lib folder as well!

ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib/ -lSystem -dylib -o AotInvokeDynamic2AotTest.so AotInvokeDynamic2AotTest.so.o

The proper way to address this test failure then is to fix Linker.java to pass these additional flags to ld.


Categories: Graphics

Building Gmsh in Windows

To build Gmsh on Windows, launch MSYS via the “MSYS2 MinGW x64” shortcut then run these commands:

git clone https://gitlab.onelab.info/gmsh/gmsh.git
cd gmsh
mkdir build
cd build
cmake .. -G "MSYS Makefiles" -DENABLE_FLTK:BOOL=TRUE -DCMAKE_INSTALL_PREFIX=../install

make install

Background

The build instructions for the Gmsh repo require the make command to build the code. The build instructions do not seem particularly specific for Windows. At the very least, they imply a Cygwin environment. I ended up using the environment I configured for building Elmer. Interestingly, the Ninja system is selected (not Visual Studio).

git clone https://gitlab.onelab.info/gmsh/gmsh.git
cd gmsh
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
-- Building for: Ninja
-- The CXX compiler identification is GNU 12.1.0
-- The C compiler identification is GNU 12.1.0
...

Compiler for Shipped Gmsh

Out of curiosity, I used a text editor to open the gmsh.exe binary I downloaded. I was curious to see if I could determine which compiler was used to generate it. The data in gmsh.exe was so much richer than I could have expected – there is information about not just the compiler, but also the compilation flags! Granted, I’m not yet 100% sure that this metadata is describing the gmsh.exe binary.

General Information:
-------------------
                   HDF5 Version: 1.10.5
                  Configured on: Mon May 17 05:00:08 PDT 2021
                  Configured by: geuzaine@DESKTOP-BH6899J
                    Host system: x86_64-unknown-cygwin
              Uname information: CYGWIN_NT-10.0 DESKTOP-BH6899J 3.2.0(0.340/5/3) 2021-03-29 08:42 x86_64 Cygwin
                       Byte sex: little-endian
             Installation point: /usr/local

Compiling Options:
------------------
                     Build Mode: production
              Debugging Symbols: no
                        Asserts: no
                      Profiling: no
             Optimization Level: high

Linking Options:
----------------
                      Libraries: static
  Statically Linked Executables: 
                        LDFLAGS: 
                     H5_LDFLAGS: 
                     AM_LDFLAGS: 
                Extra libraries: -lm 
                       Archiver: ar
                       AR_FLAGS: cr
                         Ranlib: ranlib

Languages:
----------
                              C: yes
                     C Compiler: /usr/bin/x86_64-w64-mingw32-gcc ( x86_64-w64-mingw32-gcc (GCC) 10.2.0)
                       CPPFLAGS: 
                    H5_CPPFLAGS:   -DNDEBUG -UH5_DEBUG_API
                    AM_CPPFLAGS: -D_FILE_OFFSET_BITS=64 
                        C Flags: 
                     H5 C Flags:   -pedantic -Wall -Wextra -Wbad-function-cast -Wc++-compat -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wunused-macros -Wunsafe-loop-optimizations -Wwrite-strings -finline-functions -s -Wno-inline -Wno-aggregate-return -Wno-missing-format-attribute -Wno-missing-noreturn -O
                     AM C Flags: 
               Shared C Library: no
               Static C Library: yes


                        Fortran: no

                            C++: no

                           Java: no


Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: no
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): 
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: no
 Memory allocation sanity checks: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no

Preceding this info was the substring SUMMARY OF THE HDF5 CONFIGURATION. It’s only now that I’m learning about HDF. Reviewing the cmake output also shows that cmake searched for HDF5 (output from CMakeLists.txt):

...
-- Could NOT find HDF5 (missing: HDF5_LIBRARIES HDF5_INCLUDE_DIRS) (found version "")
-- HDF5 not found
...

Finding Program Entry Point via GDB

Which GUI does Gmsh use? This was the simple question I wanted to answer. Looking around the tree I wasn’t sure where the program entry point was. Ended up searching for the regex main([^)] and there are still so many results (looks like benchmarks). Hmm, I could just launch it under the debugger and see which main function is being invoked. Finding instructions for this took longer than I expected (because I ignored the GDB manual). I eventually stumbled into debugging – GDB is not stopping at the first machine code instruction with break _start or info files Entry point address – Stack Overflow, which had the very useful info file command for showing the program’s entry point.

gdb gmsh.exe
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from gmsh.exe...
(gdb) b main
Function "main" not defined.
...
(gdb) b winmain
Function "winmain" not defined.
...
(gdb) step 1
The program is not being run.
(gdb) info file
Symbols from "D:\dev\repos\other\gmsh\build\gmsh.exe".
Local exec file:
        `D:\dev\repos\other\gmsh\build\gmsh.exe', file type pei-x86-64.
         Entry point: 0x100014f0
        0x0000000010001000 - 0x0000000010d73100 is .text
        0x0000000010d74000 - 0x0000000010dcb700 is .data
...
(gdb) b 0x100014f0
Function "0x100014f0" not defined.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
(gdb) b *0x100014f0
Breakpoint 1 at 0x100014f0: file C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c, line 196.

Running the program runs it to termination but by this point in time, I had already found what I was pretty sure was the program’s entry point (by browsing the sources in VSCode). Sure enough, relaunching gdb is now seamless:

(gdb) b wmain
Breakpoint 1 at 0x10001550: file D:/dev/repos/other/gmsh/src/common/Main.cpp, line 34.
(gdb) b GmshMainBatch
Breakpoint 2 at 0x10002fc0: file D:/dev/repos/other/gmsh/src/common/GmshGlobal.cpp, line 486.
(gdb) c
Continuing.

Thread 1 hit Breakpoint 2, GmshMainBatch (argc=1, argv=0x209707c5a90) at D:/dev/repos/other/gmsh/src/common/GmshGlobal.cpp:486
486     {

The GDB User Manual has a section on Starting your Program, which mentions that the starti command “does the equivalent of setting a temporary breakpoint at the first instruction of a program’s execution and then invoking the ‘run’ command.” This is probably closest to what I’ve been looking for.

Building with FLTK

In the midst of the chaos of trying to find the program’s entry point, I read a little bit about FLTK and discovered that it’s the GUI system Gmsh is using. Here’s the new cmake command to build the GUI-enabled Gmsh.

cmake .. -DENABLE_FLTK:BOOL=TRUE -DCMAKE_INSTALL_PREFIX=../install
ninja

Not only is the newly built gmsh.exe binary is a whooping 674MB, it also can’t start:

Unable to start gmsh.exe because libgomp-1.dll is missing

Same error is displayed for libgomp-10.dll when you click OK. Well, let’s try the install target – presumably that should create a usable installation:

$ make install
make: *** No rule to make target 'install'.  Stop.

Turns out there is no makefile in my build folder – I used ninja! As per the ninja manual, this command should list all available targets:

ninja -t targets all

After verifying that there is indeed an install target, run it using ninja:

ninja install

This is not sufficient to address the resulting missing DLL errors – perhaps additional components need to be statically linked into the produced binary. A work-around for now:

cp /mingw64/bin/libgomp-1.dll ../install/bin/
cp /mingw64/bin/libgmp-10.dll ../install/bin/
cp /mingw64/bin/libgcc_s_seh-1.dll ../install/bin/
cp /mingw64/bin/libwinpthread-1.dll ../install/bin/

Running gmesh now outputs the program’s usage info and exits. Still need to try it on some examples to ensure it works. There is also the unresolved question of why the GUI didn’t pop up.

Building using Makefiles

Since I used ninja for everything so far, I decided to use the makefiles generator and see what happens.

cmake .. -G "MSYS Makefiles" -DENABLE_FLTK:BOOL=TRUE -DCMAKE_INSTALL_PREFIX=../install

make install

There is no “Building for Ninja” line at the beginning of the cmake output. This time, the newly built gmsh.exe can run without the other DLLs I manually had to copy. Why doesn’t the GUI show up? Read the cmake output carefully:

-- Could NOT find FLTK (missing: FLTK_LIBRARIES FLTK_INCLUDE_DIR)
$ pacman -Ss fltk
...
mingw64/mingw-w64-x86_64-fltk 1.3.7-2
    C++ user interface toolkit (mingw-w64)
...

$ pacman -S --noconfirm --needed mingw64/mingw-w64-x86_64-fltk

Now running cmake finds FLTK:

-- Found FLTK: D:/dev/Software/msys64/mingw64/lib/libfltk_images.a;D:/dev/Software/msys64/mingw64/lib/libfltk_gl.a;opengl32;D:/dev/Software/msys64/mingw64/lib/libfltk.a
-- Found Fltk

And that’s it! Launching Gmsh from the install folder now launches the GUI application.

Outstanding Questions

Why does using the Makefiles generator produce a build that doesn’t require additional DLLs?


Thermal Oxidation

One of the key lessons from reading about oxidation was that Si has a key advantage over other semiconductors – it oxidizes much more readily thereby simplifying the development of insulation. I was looking up videos on the Deal-Grove oxidation model law when I stumbled into these videos by Chris Mack from a microfabrication course at the University of Texas at Austin using the same Microfabrication book I’m reading! It’s amazing how much free content there is out there.


Categories: Complexity Theory

Information Theory Bootcamp

I just started going through the Communication Complexity of Majority paper. I have not had any explicit study of information theory so I found it immensely helpful that there was a reference to this excellent series on Information Theory in Computational Complexity from the Simons Institute linked to here for reference.

Update: 2022-07-23 – A talk was uploaded specifically about the Communication Complexity of Majority Paper:


Categories: CMake, Elmer

Building the Elmer Install Folder

My post on How to Build Elmer on Windows has a succinct list of instructions but when I first built the Elmer source code (on Windows), I was unsure how to run the binaries. Running ElmerGUI, for example, failed because qt5.dll could not be found. This post details the process I used to figure out how to create a usable Elmer installation.

I started by going through the generated Makefile. It contained targets like install, install/local, and install/strip. I started with install/local, fearing that perhaps these targets would attempt to install the binaries into program files.

Installing only the local directory...
-- Install configuration: "RelWithDebInfo"
-- Installing: C:/dev/repos/fem/install/bin/libopenblas.dll.a
CMake Error at cmake_install.cmake:45 (file):
  file INSTALL cannot find "C:/dev/repos/fem/elmerfem/../bundle_msys2/bin":
  No error.


make: *** [Makefile:142: install/local] Error 1

The error at cmake_install.cmake:45 is from ElmerCPack.cmake. I did not even know about the existence of the CPack tool. Git blame points to commit b9914082 as the source of the failing command. The strange thing about that commit is that there is no mention of what is supposed to create the bundle folders (despite the callout that the change “Relies on selected bundled files selected prior to installation”).

A workaround I tried was to use this define when running cmake: CPACK_BUNDLE_EXTRA_WINDOWS_DLLS:BOOL=FALSE. That avoided the build error but did not result in a usable installation. ElmerGUI’s windows_bundle.cmake needs this to be set to TRUE to package the Qt5 binaries into the install folder. Strangely enough, ElmerGUILogger’s windows_bundle.cmake does not have this logic for Qt5. This is likely because ElmerGUI would already have installed the right files into the bin folder but this looks like a bug.

Another work-around I tried was to manually create the folders expected by the build. The install target then succeeded but the necessary binaries were not copied to the install folder.

mkdir -p bundle_msys2/bin
mkdir -p bundle_qt5/bin
mkdir -p platforms

This is when I dig around and find that only ElmerGUILogger and ElmerClips include windows_bundle.cmake. Hmm… the latter looks promising but doesn’t look up to date either since it requires Qt4. More exploring of the Makefile – the package target looks interesting but fails because NSIS is not installed.

...
[ 97%] Built target ElmerGUI_autogen
[100%] Built target ElmerGUI
Run CPack packaging tool...
CPack Error: Cannot find NSIS compiler makensis: likely it is not installed, or not in your PATH
CPack Error: Could not read NSIS registry value. This is usually caused by NSIS not being installed. Please install NSIS from http://nsis.sourceforge.net
CPack Error: Cannot initialize the generator NSIS
make: *** [Makefile:71: package] Error 1

I wonder if we couldn’t just use NSIS for the MSYS environment:

$ pacman -Ss nsis
...
mingw64/mingw-w64-x86_64-nsis 3.06.1-1
    Windows installer development tool (mingw-w64)
mingw64/mingw-w64-x86_64-nsis-nsisunz 1.0-2
    NSIS plugin which allows you to extract files from ZIP archives (mingw-w64)
...

$ pacman -S mingw64/mingw-w64-x86_64-nsis

Now we can create an Elmer installer by running make package. Unfortunately, that turns out to be insufficient. My next idea is to compare the binaries from the installed. This turned out to be easier when using ls -R1 to output only the file names and in 1 column only. Some obvious differences are that the build in Program Files has a bin folder containing the Qt and vtk binaries (as well as a stripped gfortran).

Qt5Core.dll
Qt5Gui.dll
Qt5OpenGL.dll
Qt5PrintSupport.dll
Qt5Script.dll
Qt5Svg.dll
Qt5Widgets.dll
Qt5Xml.dll
...
libvtkChartsCore-8.2.dll
libvtkChartsCorePython38D-8.2.dll
libvtkCommonColor-8.2.dll
libvtkCommonColorPython38D-8.2.dll
libvtkCommonComputationalGeometry-8.2.dll
libvtkCommonComputationalGeometryPython38D-8.2.dll
libvtkCommonCore-8.2.dll
libvtkCommonCorePython38D-8.2.dll
libvtkCommonDataModel-8.2.dll

The Qt binaries certainly look like the output of windows_bundle.cmake (found this time by a search for “vtk”) but it’s still not clear how this file would be included in the build. I’m using VSCode to search for “windows_bundle” and only 2 of the 3 references in the codebase were showing up (on my desktop). Looking for “ElmerGUILogger” then revealed yet another reference. Such a waste of time! Not cool VSCode, not cool. It’s included in ElmerGUI/CMakeLists.txt. So I probably only need to define WIN32. But why does the WIN32 code run if I add some statements to that IF block?

Some searching (TODO: put bing searches here) leads to indications that there might be an error in CMake where WIN32 is not defined. Seeing signs that MSYS can be in Cywgin? Trying to get to the bottom of why WIN32 is not respected by CMake, I review ElmerGUI/CMakeLists.txt again. It adds the netgen subdirectory. Interestingly, ElmerGUI/netgen/README points out that install\lib\ElmerGUI\ngcore\libng.a is the unix library and that the win32 extension should be .lib. Biggest sign I’ve seen so far that something is really off. At this point, I remember seeing a .a file in the install folder – and that seemed strange for a lib folder since I expected a DLL. This hypothesis fails though because the working Elmer installation also has the file “C:\Program Files\Elmer 9.0-Release\lib\ElmerGUI\ngcore\libng.a“.

The sure way to verify that the WIN32 include is working is to introduce an error into windows_bundle.cmake, e.g. by change the first IF into the unknown IF2. Even better, notice that the “Qt5 Windows packaging” message is correctly displayed. Perhaps the FIND_FILE command should have a REQUIRED option now that it is supported. Adding that doesn’t fail so reexamine the INSTALL command. Since windows_bundle.cmake uses a relative path for the DESTINATION, it is interpreted relative to the value of the CMAKE_INSTALL_PREFIX variable.

At this point, I take a step back and search for cmake install component. My suspision is that there is something about the elmergui COMPONENT specified in the INSTALL command. I discover from Understanding the CMake `COMPONENT` keyword in the `install` command – Code – CMake Discourse that cmake --install is a thing. It seems to do the same things. Unfortunately, it doesn’t copy the files I need either.

How about zooming into the windows_bundle.cmake file and outputting the list of files installed after the install command. The new message says the files were installed. However, they aren’t in the install folder! I need an explanation for this behavior. So let’s open Process Explorer and see which paths are actually used. These requests show up using the Qt5 path filter:

These files exist on disk (despite the Result column in process explorer having the value INVALID DEVICE REQUEST)! For a better understanding of what is happening, I change the bin path used by the install command to dbgcmake since there are multiple bin folders (under install and also in the MinGW installation). So such path shows up in Process Explorer when using a path filter for dbgcmake. This means I shouldn’t be expecting this files to be written by cmake at this point. In fact, running grep -Rin dbgcmake shows that build/ElmerGUI/cmake_install.cmake now contains this snippet (thereby verifying that windows_bundle.cmake is used to generate cmake_install.cmake).

if("x${CMAKE_INSTALL_COMPONENT}x" STREQUAL "xelmerguix" OR NOT CMAKE_INSTALL_COMPONENT)
  file(INSTALL DESTINATION "${CMAKE_INSTALL_PREFIX}/dbgcmake" TYPE FILE FILES
    "D:/dev/Software/msys64/mingw64/bin/Qt5Core.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5Gui.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5OpenGL.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5Script.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5Xml.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5Svg.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5Widgets.dll"
    "D:/dev/Software/msys64/mingw64/bin/Qt5PrintSupport.dll"
    )
endif()

So back to the question of why windows_bundle.cmake is not invoked by default – let us dig into the CMake sources for clues. Renaming it outputs an error for the include statement meaning the file is being included! Sigh… I end up poking around some CMake sources to see where WIN32 is set anyway.:

At this point, I’m down to simply finding out how to print CMake call stacks to confirm that this file is indeed included. The StackOverflow question about how to trace cmakelists has the big reveal: use the cmake –trace option!

date; time cmake -G "MSYS Makefiles" -DWITH_ELMERGUI:BOOL=TRUE  -DWITH_MPI:BOOL=FALSE -DCMAKE_INSTALL_PREFIX=../install  -DCMAKE_Fortran_COMPILER=d:/dev/Software/msys64/mingw64/bin/gfortran.exe  -DQWT_INCLUDE_DIR=d:/dev/Software/msys64/mingw64/include/qwt-qt5/  -DWIN32:BOOL=TRUE --trace-expand ../elmerfem > build.txt 2>&1; date;

I immediately notice that the install command of interest is skipped because CPACK_BUNDLE_EXTRA_WINDOWS_DLLS is not set to TRUE! Setting it to FALSE earlier then seeing a build failure gave the impression that it was set to TRUE everywhere (especially given that ElmerCPack.cmake sets it to TRUE) but that happens after windows_bundle.cmake has been evaluated! Here’s the final required command line:

date; time \
cmake -G "MSYS Makefiles" \
 -DWITH_ELMERGUI:BOOL=TRUE \
 -DWITH_MPI:BOOL=FALSE \
 -DCMAKE_INSTALL_PREFIX=../install \
 -DCMAKE_Fortran_COMPILER=d:/dev/Software/msys64/mingw64/bin/gfortran.exe \
 -DQWT_INCLUDE_DIR=d:/dev/Software/msys64/mingw64/include/qwt-qt5/ \
 -DWIN32:BOOL=TRUE \
 -DCPACK_BUNDLE_EXTRA_WINDOWS_DLLS:BOOL=TRUE \
 ../elmerfem

There are lessons there about making assumptions but the biggest takeaway for me is the need for (and existence) of tracing capabilities in CMake. Enabling tracing made it so easy to figure out exactly what was broken – the variable did not have the value I expected and I simply needed to define it! Running make install now results in new errors haha!

The code execution cannot proceed because libgcc_s_seh-1.dll was not found. Reinstalling the program may fix this problem. This error dialog is then followed by others for qwt-qt5.dll, libstdc++-6.dll, and libdouble-conversion.dll respectively. To see which other binaries are required, manually copy these 4 binaries from d:\dev\Software\msys64\mingw64\bin to install/bin. These are the other missing binaries that show up in error dialogs:

libwinpthread-1.dll
libicuin69.dll
libicuuc69.dll
libpcre2-16-0.dll
libharfbuzz-0.dll
libmd4c.dll
libmd4c.dll
libpng16-16.dll
zlib1.dll
libzstd.dll
libfreetype-6.dll
libgraphite2.dll
libintl-8.dll
libglib-2.0-0.dll
libicudt69.dll
libiconv-2.dll
libbz2-1.dll
libbrotlidec.dll
libbrotlicommon.dll
libpcre-1.dll

There are so many missing binaries that I wonder if building the other components might be necessary. I first manually copy them from the bin folder to see if ElmerGUI can load but that now fails with the error This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

The working installation from the downloaded public Elmer installer has a qwindows.dll in the bin/platforms directory. This file exists on my system as D:\dev\Software\msys64\mingw64\share\qt5\plugins\platforms\qwindows.dll. At this point, I’m really wondering how on earth they are gathering all these files. Hmm, maybe they build using their docker image? Didn’t think of seeing how their docker image is set up. One final try to set up the qwindows.dll first and see what happens.

cp /mingw64/share/qt5/plugins/platforms/qwindows.dll ../install/bin/platforms/

Lo and behold! ElmerGUI loads successfully! Whether it actually works… well, I will revisit that after I learn how to use Elmer/ElmerGUI :). Actually, turns out ElmerSolver.exe doesn’t start. Here is the fix for the 3 binaries that cause error dialogs:

# binaries required by ElmerSolver
cp /mingw64/bin/libgfortran-5.dll ../install/bin/
cp /mingw64/bin/libopenblas.dll ../install/bin/
cp /mingw64/bin/libquadmath-0.dll ../install/bin/

The final set of commands to deploy these binaries is therefore:

# binaries required by ElmerGUI
cp /mingw64/bin/libgcc_s_seh-1.dll ../install/bin/
cp /mingw64/bin/libdouble-conversion.dll ../install/bin/
cp /mingw64/bin/qwt-qt5.dll ../install/bin/
cp /mingw64/bin/libstdc++-6.dll ../install/bin/
cp /mingw64/bin/libwinpthread-1.dll ../install/bin/
cp /mingw64/bin/libicuin69.dll ../install/bin/
cp /mingw64/bin/libicuuc69.dll ../install/bin/
cp /mingw64/bin/libpcre2-16-0.dll ../install/bin/
cp /mingw64/bin/libharfbuzz-0.dll ../install/bin/
cp /mingw64/bin/libmd4c.dll ../install/bin/
cp /mingw64/bin/libpng16-16.dll ../install/bin/
cp /mingw64/bin/zlib1.dll ../install/bin/
cp /mingw64/bin/libzstd.dll ../install/bin/
cp /mingw64/bin/libfreetype-6.dll ../install/bin/
cp /mingw64/bin/libgraphite2.dll ../install/bin/
cp /mingw64/bin/libintl-8.dll ../install/bin/
cp /mingw64/bin/libglib-2.0-0.dll ../install/bin/
cp /mingw64/bin/libicudt69.dll ../install/bin/
cp /mingw64/bin/libiconv-2.dll ../install/bin/
cp /mingw64/bin/libbz2-1.dll ../install/bin/
cp /mingw64/bin/libbrotlidec.dll ../install/bin/
cp /mingw64/bin/libbrotlicommon.dll ../install/bin/
cp /mingw64/bin/libpcre-1.dll ../install/bin/
cp /mingw64/share/qt5/plugins/platforms/qwindows.dll ../install/bin/platforms/

# binaries required by ElmerSolver
cp /mingw64/bin/libgfortran-5.dll ../install/bin/
cp /mingw64/bin/libopenblas.dll ../install/bin/
cp /mingw64/bin/libquadmath-0.dll ../install/bin/

At this point, ElmerSolver starts up successfully but outputs an error about missing ELMERSOLVER_STARTINFO. This is an error from ElmerSolver.F90 (the Fortran 90 source code). From that code, it looks like the error is because I didn’t specify an input file name and the default file does not exist. This is the same behavior as ElmerSolver.exe from the official Elmer installation.

ELMER SOLVER (v 9.0) STARTED AT: 2022/06/26 21:55:22
ParCommInit:  Initialize #PEs:            1
MAIN:
MAIN: =============================================================
MAIN: ElmerSolver finite element software, Welcome!
MAIN: This program is free software licensed under (L)GPL
MAIN: Copyright 1st April 1995 - , CSC - IT Center for Science Ltd.
MAIN: Webpage http://www.csc.fi/elmer, Email elmeradm@csc.fi
MAIN: Version: 9.0 (Rev: af959fd0, Compiled: 2022-06-26)
MAIN:  Running one task without MPI parallelization.
MAIN:  Running with just one thread per task.
MAIN: =============================================================
ERROR:: ElmerSolver: Unable to find ELMERSOLVER_STARTINFO, can not execute.
STOP 1

Reviewing Docker Files

I was wondering after all this if there was a Windows docker file that had all these steps already baked in. However, the docker directory has only Ubuntu dockerfiles. Perhaps I could create a Windows docker file if I could figure out these dependencies.

Binaries Required by ElmerSolver

The above deployment instructions starts with the binaries required by ElmerGUI but ElmerSolver is the crucial component (since Elmer can be used with the GUI). Here are the binaries grouped such that the ElmerGUI binaries can be excluded if desired.

# binaries required by ElmerSolver
cp /mingw64/bin/libgfortran-5.dll ../install/bin/
cp /mingw64/bin/libgcc_s_seh-1.dll ../install/bin/
cp /mingw64/bin/libopenblas.dll ../install/bin/
cp /mingw64/bin/libquadmath-0.dll ../install/bin/
cp /mingw64/bin/libwinpthread-1.dll ../install/bin/

# binaries required by Mesh2D
cp /mingw64/bin/libstdc++-6.dll ../install/bin/

# binaries required by ElmerGUI
cp /mingw64/bin/qwt-qt5.dll ../install/bin/
cp /mingw64/bin/libdouble-conversion.dll ../install/bin/
cp /mingw64/bin/libicuin69.dll ../install/bin/
cp /mingw64/bin/libicuuc69.dll ../install/bin/
cp /mingw64/bin/libpcre2-16-0.dll ../install/bin/
cp /mingw64/bin/libharfbuzz-0.dll ../install/bin/
cp /mingw64/bin/libmd4c.dll ../install/bin/
cp /mingw64/bin/libpng16-16.dll ../install/bin/
cp /mingw64/bin/zlib1.dll ../install/bin/
cp /mingw64/bin/libzstd.dll ../install/bin/
cp /mingw64/bin/libicudt69.dll ../install/bin/
cp /mingw64/bin/libfreetype-6.dll ../install/bin/
cp /mingw64/bin/libglib-2.0-0.dll ../install/bin/
cp /mingw64/bin/libgraphite2.dll ../install/bin/
cp /mingw64/bin/libintl-8.dll ../install/bin/
cp /mingw64/bin/libbz2-1.dll ../install/bin/
cp /mingw64/bin/libbrotlidec.dll ../install/bin/
cp /mingw64/bin/libpcre-1.dll ../install/bin/
cp /mingw64/bin/libiconv-2.dll ../install/bin/
cp /mingw64/bin/libbrotlicommon.dll ../install/bin/

cp /mingw64/share/qt5/plugins/platforms/qwindows.dll ../install/bin/platforms/

Outstanding Issues

Deploying the above binaries manually gives a locally runnable build. Unfortunately, it does not fix the build created by CPack when you run make install – that build will not have any of these binaries/dependencies.


Categories: Material Science

Crystalline vs Amorphous Materials

I was reading about diffusion coefficients for common dopants when the diffusivity of Boron in crystalline Si was compared to that in amorphous Si. Decided to dig into the differences between the crystalline and amorphous materials. These are the resources I used.

Fertig’s video piqued my curiosity about how we determine these structures, hence his other crystallography video on x-ray diffraction. It introduces Bragg’s law.