
Building & Disassembling ARM64 Code using Visual C++

The folder C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build contains scripts that set up a command window for building, as documented in Use the Microsoft C++ toolset from the command line (Microsoft Docs). If vcvarsx86_arm64.bat and vcvarsamd64_arm64.bat are missing from that folder on your Windows x64 machine, install the MSVC v143 – VS 2022 C++ ARM64 build tools (Latest) component using the Visual Studio 2022 installer.

Selecting the ARM64 Build Tools in the VS Installer

Once it is installed, open a new cmd.exe window and run this command to set up the build environment:

"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsamd64_arm64.bat"

To verify that the ARM64 compiler will be used when cl or dumpbin is executed:

D:\> where cl
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\arm64\cl.exe
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\x64\cl.exe

D:\> where dumpbin
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\arm64\dumpbin.exe
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\x64\dumpbin.exe

To see the command Visual Studio uses to build the project, create a C++ console application and use the Configuration Manager to change the Active solution platform to ARM64. Next, go to Tools > Options and expand the Projects and Solutions node. Select Build And Run, then change the MSBuild project build output verbosity to Detailed. Building the project should now show the full command line used to invoke the compiler. For example, here are the command lines used in the Debug and Release configurations respectively.

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\HostX86\arm64\CL.exe /c /Zi /JMC /nologo /W3 /WX- /diagnostics:column /sdl /Od /Oy- /D _DEBUG /D _CONSOLE /D _ARM64_WINAPI_PARTITION_DESKTOP_SDK_AVAILABLE=1 /D _UNICODE /D UNICODE /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /std:c++17 /permissive- /Fo"ARM64\Debug\\" /Fd"ARM64\Debug\vc143.pdb" /external:W3 /Gd /TP /analyze- /FC /errorReport:prompt ConsoleApplication1.cpp

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\HostX86\arm64\CL.exe /c /Zi /nologo /W3 /WX- /diagnostics:column /sdl /O2 /Oi /Oy- /GL /D NDEBUG /D _CONSOLE /D _ARM64_WINAPI_PARTITION_DESKTOP_SDK_AVAILABLE=1 /D _UNICODE /D UNICODE /Gm- /EHsc /MD /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /std:c++17 /permissive- /Fo"ARM64\Release\\" /Fd"ARM64\Release\vc143.pdb" /external:W3 /Gd /TP /analyze- /FC /errorReport:prompt ConsoleApplication1.cpp

Notice the /O2 flag (maximize speed) in the Release build instead of the /Od flag (disable optimizations) in the Debug build. The Debug build also uses the Just My Code (/JMC), run-time error checks (/RTC1), and debug multithreaded DLL run-time library (/MDd) flags. For our testing purposes, we can ignore most of these flags.

Calling printf

Here is a simple program, aarch64-abi-test-printf.cpp, which calls printf with a format string and 4 additional arguments.

#include <stdio.h>

int main()
{
    int result = printf("%.4f,%.4f,%.4f,%s", 1.2345, 1.2345, 1.2345, "str");
}

Compiling a Debug Build

To compile and disassemble this program, run:

cl /c aarch64-abi-test-printf.cpp
dumpbin /disasm /out:printf-abi.asm aarch64-abi-test-printf.obj
dumpbin /all /out:printf-abi.txt aarch64-abi-test-printf.obj

The disassembly is shown below with some links to the documentation for the various instructions. See the Arm Architecture Reference Manual for A-profile architecture PDF for more details about these instructions. The overview of AArch64 state at ARM Compiler armasm User Guide Version 6.6.1 is also a useful resource.

Dump of file aarch64-abi-test-printf.obj

File Type: COFF OBJECT

main:
  0000000000000000: A9BE7BFD  stp         fp,lr,[sp,#-0x20]!
  0000000000000004: 910003FD  mov         fp,sp
  0000000000000008: 90000008  adrp        x8,$SG5571
  000000000000000C: 91000104  add         x4,x8,$SG5571
  0000000000000010: 58000183  ldr         x3,$LN3
  0000000000000014: 58000162  ldr         x2,$LN3
  0000000000000018: 58000141  ldr         x1,$LN3
  000000000000001C: 90000008  adrp        x8,$SG5572
  0000000000000020: 91000100  add         x0,x8,$SG5572
  0000000000000024: 94000000  bl          printf
  0000000000000028: 2A0003E0  mov         w0,w0
  000000000000002C: B90013E0  str         w0,[sp,#0x10]
  0000000000000030: 52800000  mov         w0,#0
  0000000000000034: A8C27BFD  ldp         fp,lr,[sp],#0x20
  0000000000000038: D65F03C0  ret
  000000000000003C: D503201F  nop
$LN3:
  0000000000000040: 126E978D
  0000000000000044: 3FF3C083

__local_stdio_printf_options:
  0000000000000000: 90000008  adrp        x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
  0000000000000004: 91000100  add         x0,x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
  0000000000000008: D65F03C0  ret

_vfprintf_l:
  0000000000000000: A9BD7BFD  stp         fp,lr,[sp,#-0x30]!
  0000000000000004: 910003FD  mov         fp,sp
  0000000000000008: F90017E0  str         x0,[sp,#0x28]
  000000000000000C: F90013E1  str         x1,[sp,#0x20]
  0000000000000010: F9000FE2  str         x2,[sp,#0x18]
  0000000000000014: F9000BE3  str         x3,[sp,#0x10]
  0000000000000018: 94000000  bl          __local_stdio_printf_options
  000000000000001C: F9400BE4  ldr         x4,[sp,#0x10]
  0000000000000020: F9400FE3  ldr         x3,[sp,#0x18]
  0000000000000024: F94013E2  ldr         x2,[sp,#0x20]
  0000000000000028: F94017E1  ldr         x1,[sp,#0x28]
  000000000000002C: F9400000  ldr         x0,[x0]
  0000000000000030: 94000000  bl          __stdio_common_vfprintf
  0000000000000034: 2A0003E0  mov         w0,w0
  0000000000000038: 2A0003E0  mov         w0,w0
  000000000000003C: A8C37BFD  ldp         fp,lr,[sp],#0x30
  0000000000000040: D65F03C0  ret

printf:
  0000000000000000: D10103FF  sub         sp,sp,#0x40
  0000000000000004: A9008BE1  stp         x1,x2,[sp,#8]
  0000000000000008: A90193E3  stp         x3,x4,[sp,#0x18]
  000000000000000C: A9029BE5  stp         x5,x6,[sp,#0x28]
  0000000000000010: F9001FE7  str         x7,[sp,#0x38]
  0000000000000014: A9BD7BFD  stp         fp,lr,[sp,#-0x30]!
  0000000000000018: 910003FD  mov         fp,sp
  000000000000001C: F90013E0  str         x0,[sp,#0x20]
  0000000000000020: 9100E3E8  add         x8,sp,#0x38
  0000000000000024: F9000FE8  str         x8,[sp,#0x18]
  0000000000000028: 52800020  mov         w0,#1
  000000000000002C: 94000000  bl          __acrt_iob_func
  0000000000000030: F9400FE3  ldr         x3,[sp,#0x18]
  0000000000000034: D2800002  mov         x2,#0
  0000000000000038: F94013E1  ldr         x1,[sp,#0x20]
  000000000000003C: 94000000  bl          _vfprintf_l
  0000000000000040: 2A0003E0  mov         w0,w0
  0000000000000044: B90013E0  str         w0,[sp,#0x10]
  0000000000000048: D2800008  mov         x8,#0
  000000000000004C: F9000FE8  str         x8,[sp,#0x18]
  0000000000000050: B94013E0  ldr         w0,[sp,#0x10]
  0000000000000054: A8C37BFD  ldp         fp,lr,[sp],#0x30
  0000000000000058: 910103FF  add         sp,sp,#0x40
  000000000000005C: D65F03C0  ret

  Summary

           8 .bss
          68 .chks64
          9C .debug$S
          62 .drectve
          18 .pdata
          1A .rdata
          F8 .text$mn
          10 .xdata

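In the disassembly generated by dumpbin (printf-abi.asm), notice that all 5 arguments to printf are passed in general-purpose registers. x0 contains a pointer to the format string. x1-x3 each contain the 64-bit value loaded from the $LN3 label (ldr with a label operand loads the data stored at the label, not the label's address). Those 64 bits are the IEEE 754 double-precision representation of 1.2345. The doubles travel in integer registers rather than d0-d2 because printf is variadic, and the Windows ARM64 calling convention does not use the floating-point registers for arguments to variadic functions. x4 contains a pointer to the null-terminated string "str".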

Which are the printf String Arguments?

To determine what symbols in instructions like adrp x8,$SG5571 refer to, we use the output of dumpbin /all. The RELOCATIONS section shows that $SG5571 has symbol index 8, and the COFF SYMBOL TABLE shows that symbol index 8 is in SECT3. The raw data for section 3 contains both the format string and the single string argument passed to printf. The second column of the symbol table gives each symbol's offset within its section, which is how the two strings are told apart: $SG5571 is at offset 0 ("str") and $SG5572 is at offset 8 (the format string).

.
.
.
SECTION HEADER #3
  .rdata name
       0 physical address
       0 virtual address
      1A size of raw data
     31A file pointer to raw data (0000031A to 00000333)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40400040 flags
         Initialized Data
         8 byte align
         Read Only

RAW DATA #3
  00000000: 73 74 72 00 00 00 00 00 25 2E 34 66 2C 25 2E 34  str.....%.4f,%.4
  00000010: 66 2C 25 2E 34 66 2C 25 73 00                    f,%.4f,%s.
.
.
.
RELOCATIONS #4
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000008  PAGEBASE_REL21             90000008         8  $SG5571
 0000000C  PAGEOFFSET_12A             91000104         8  $SG5571
 0000001C  PAGEBASE_REL21             90000008         9  $SG5572
 00000020  PAGEOFFSET_12A             91000100         9  $SG5572
 00000024  BRANCH26                   94000000        16  printf
.
.
.
COFF SYMBOL TABLE
000 01057A64 ABS    notype       Static       | @comp.id
001 80010190 ABS    notype       Static       | @feat.00
002 00000000 SECT1  notype       Static       | .drectve
    Section length   62, #relocs    0, #linenums    0, checksum        0
004 00000000 SECT2  notype       Static       | .debug$S
    Section length   9C, #relocs    0, #linenums    0, checksum        0
006 00000000 SECT3  notype       Static       | .rdata
    Section length   1A, #relocs    0, #linenums    0, checksum B99D9667
008 00000000 SECT3  notype       Static       | $SG5571
009 00000008 SECT3  notype       Static       | $SG5572
00A 00000000 SECT4  notype       Static       | .text$mn

Compiling an Optimized Build

Specifying the /O2 flag for speed generates optimized code.

cl /c /O2 /Fo"printf-abi-o2.obj" aarch64-abi-test-printf.cpp
dumpbin /disasm /out:printf-abi-o2.asm printf-abi-o2.obj
dumpbin /all /out:printf-abi-o2.txt printf-abi-o2.obj

In the optimized code below, the IEEE double is loaded into d16 once and then copied to the x1-x3 registers by three fmov instructions.

Dump of file printf-abi-o2.obj

File Type: COFF OBJECT

__local_stdio_printf_options:
  0000000000000000: 90000008  adrp        x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
  0000000000000004: 91000100  add         x0,x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
  0000000000000008: D65F03C0  ret

_vfprintf_l:
  0000000000000000: A9BD53F3  stp         x19,x20,[sp,#-0x30]!
  0000000000000004: A9015BF5  stp         x21,x22,[sp,#0x10]
  0000000000000008: F90013FE  str         lr,[sp,#0x20]
  000000000000000C: AA0003F6  mov         x22,x0
  0000000000000010: AA0103F5  mov         x21,x1
  0000000000000014: AA0203F4  mov         x20,x2
  0000000000000018: AA0303F3  mov         x19,x3
  000000000000001C: 94000000  bl          __local_stdio_printf_options
  0000000000000020: F9400000  ldr         x0,[x0]
  0000000000000024: AA1303E4  mov         x4,x19
  0000000000000028: AA1403E3  mov         x3,x20
  000000000000002C: AA1503E2  mov         x2,x21
  0000000000000030: AA1603E1  mov         x1,x22
  0000000000000034: 94000000  bl          __stdio_common_vfprintf
  0000000000000038: F94013FE  ldr         lr,[sp,#0x20]
  000000000000003C: A9415BF5  ldp         x21,x22,[sp,#0x10]
  0000000000000040: A8C353F3  ldp         x19,x20,[sp],#0x30
  0000000000000044: D65F03C0  ret

main:
  0000000000000000: F81F0FFE  str         lr,[sp,#-0x10]!
  0000000000000004: 5C0001B0  ldr         d16,$LN4
  0000000000000008: 90000008  adrp        x8,??_C@_03OJMAPEGJ@str@
  000000000000000C: 91000104  add         x4,x8,??_C@_03OJMAPEGJ@str@
  0000000000000010: 90000008  adrp        x8,??_C@_0BC@OEIAMIIK@?$CF?44f?0?$CF?44f?0?$CF?44f?0?$CFs@
  0000000000000014: 91000100  add         x0,x8,??_C@_0BC@OEIAMIIK@?$CF?44f?0?$CF?44f?0?$CF?44f?0?$CFs@
  0000000000000018: 9E660203  fmov        x3,d16
  000000000000001C: 9E660202  fmov        x2,d16
  0000000000000020: 9E660201  fmov        x1,d16
  0000000000000024: 94000000  bl          printf
  0000000000000028: 52800000  mov         w0,#0
  000000000000002C: F84107FE  ldr         lr,[sp],#0x10
  0000000000000030: D65F03C0  ret
  0000000000000034: D503201F  nop
$LN4:
  0000000000000038: 126E978D
  000000000000003C: 3FF3C083

printf:
  0000000000000000: A9BA53F3  stp         x19,x20,[sp,#-0x60]!
  0000000000000004: A9017BF5  stp         x21,lr,[sp,#0x10]
  0000000000000008: A9028BE1  stp         x1,x2,[sp,#0x28]
  000000000000000C: A90393E3  stp         x3,x4,[sp,#0x38]
  0000000000000010: A9049BE5  stp         x5,x6,[sp,#0x48]
  0000000000000014: F9002FE7  str         x7,[sp,#0x58]
  0000000000000018: AA0003F4  mov         x20,x0
  000000000000001C: 52800020  mov         w0,#1
  0000000000000020: 9100A3F5  add         x21,sp,#0x28
  0000000000000024: 94000000  bl          __acrt_iob_func
  0000000000000028: AA0003F3  mov         x19,x0
  000000000000002C: 94000000  bl          __local_stdio_printf_options
  0000000000000030: F9400000  ldr         x0,[x0]
  0000000000000034: D2800003  mov         x3,#0
  0000000000000038: AA1403E2  mov         x2,x20
  000000000000003C: AA1303E1  mov         x1,x19
  0000000000000040: AA1503E4  mov         x4,x21
  0000000000000044: 94000000  bl          __stdio_common_vfprintf
  0000000000000048: A9417BF5  ldp         x21,lr,[sp,#0x10]
  000000000000004C: A8C653F3  ldp         x19,x20,[sp],#0x60
  0000000000000050: D65F03C0  ret

  Summary

           8 .bss
          70 .chks64
          94 .debug$S
          62 .drectve
          18 .pdata
          16 .rdata
          E8 .text$mn
           8 .xdata

The example we have reviewed in this post passed only 5 arguments to printf. To see how more than 8 arguments are handled, see the example printf call in aarch64-abi-test-printf-manyargs.cpp and printf-abi-many.asm (or for the optimized assembly code, printf-abi-many-o2.asm).

Additional resources on AArch64: