Building & Disassembling ARM64 Code using Visual C++
The folder C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build contains scripts that set up a command window, as documented at Use the Microsoft C++ toolset from the command line | Microsoft Docs. If vcvarsx86_arm64.bat and vcvarsamd64_arm64.bat are missing from that folder on your Windows x64 machine, install the MSVC v143 – VS 2022 C++ ARM64 build tools (Latest) component in the Visual Studio 2022 installer.
Once it is installed, open a new cmd.exe window and run this command to set up the build environment:
"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsamd64_arm64.bat"
To verify that the ARM64 compiler will be used when cl or dumpbin is executed:
D:\> where cl
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\arm64\cl.exe
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\x64\cl.exe
D:\> where dumpbin
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\arm64\dumpbin.exe
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\Hostx64\x64\dumpbin.exe
To see the command Visual Studio uses to build the project, create a C++ console application and use the Configuration Manager to change the Active solution platform to ARM64. Next, go to Tools > Options and expand the Projects and Solutions node. Select Build And Run, then change the MSBuild project build output verbosity to Detailed. Building the project should now show the full command line used to invoke the compiler. For example, here are the command lines used in the Debug and Release configurations, respectively.
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\HostX86\arm64\CL.exe /c /Zi /JMC /nologo /W3 /WX- /diagnostics:column /sdl /Od /Oy- /D _DEBUG /D _CONSOLE /D _ARM64_WINAPI_PARTITION_DESKTOP_SDK_AVAILABLE=1 /D _UNICODE /D UNICODE /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /std:c++17 /permissive- /Fo"ARM64\Debug\\" /Fd"ARM64\Debug\vc143.pdb" /external:W3 /Gd /TP /analyze- /FC /errorReport:prompt ConsoleApplication1.cpp
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.32.31326\bin\HostX86\arm64\CL.exe /c /Zi /nologo /W3 /WX- /diagnostics:column /sdl /O2 /Oi /Oy- /GL /D NDEBUG /D _CONSOLE /D _ARM64_WINAPI_PARTITION_DESKTOP_SDK_AVAILABLE=1 /D _UNICODE /D UNICODE /Gm- /EHsc /MD /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /std:c++17 /permissive- /Fo"ARM64\Release\\" /Fd"ARM64\Release\vc143.pdb" /external:W3 /Gd /TP /analyze- /FC /errorReport:prompt ConsoleApplication1.cpp
Notice the /O2 flag (maximize speed) in the Release build in place of the /Od flag (disable optimizations) in the Debug build. The Debug build also uses /JMC (Just My Code debugging), /RTC1 (run-time error checks), and /MDd (the debug, multithreaded DLL version of the run-time library). For our testing purposes, we can ignore most of these flags.
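As a complement to where cl, a small program can report which target and configuration macros the compiler actually saw. This is a minimal sketch and not part of the original test files: target_arch, build_config, and print_build_info are hypothetical names. _M_ARM64 and _M_X64 are MSVC's predefined target macros; the GCC/Clang spellings are included only so the sketch also builds with other compilers.

```cpp
#include <cstdio>
#include <string>

// Name of the architecture this translation unit was compiled for.
// _M_ARM64 / _M_X64 are MSVC predefined macros; __aarch64__ / __x86_64__
// are the GCC/Clang equivalents (an addition for portability).
const char* target_arch() {
#if defined(_M_ARM64) || defined(__aarch64__)
    return "ARM64";
#elif defined(_M_X64) || defined(__x86_64__)
    return "x64";
#else
    return "other";
#endif
}

// Reports which of the configuration macros from the command lines is set.
std::string build_config() {
#if defined(_DEBUG)
    return "Debug (_DEBUG defined)";
#elif defined(NDEBUG)
    return "Release (NDEBUG defined)";
#else
    return "unspecified (neither _DEBUG nor NDEBUG defined)";
#endif
}

// Hypothetical helper: call this from main() to print both values.
void print_build_info() {
    std::printf("target: %s, config: %s\n", target_arch(), build_config().c_str());
}
```

A plain cl /c invocation passes neither /D _DEBUG nor /D NDEBUG, so build_config() would be expected to report the unspecified case unless one of those flags (or /MDd, which defines _DEBUG) is added.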
Calling Printf
Here is a simple program, aarch64-abi-test-printf.cpp, which calls printf with a format string and 4 additional arguments.
#include <stdio.h>

int main()
{
    int result = printf("%.4f,%.4f,%.4f,%s", 1.2345, 1.2345, 1.2345, "str");
}
Compiling a Debug Build
To compile and disassemble this program, run:
cl /c aarch64-abi-test-printf.cpp
dumpbin /disasm /out:printf-abi.asm aarch64-abi-test-printf.obj
dumpbin /all /out:printf-abi.txt aarch64-abi-test-printf.obj
The disassembly is shown below with some links to the documentation for the various instructions. See the Arm Architecture Reference Manual for A-profile architecture PDF for more details about these instructions. The overview of AArch64 state at ARM Compiler armasm User Guide Version 6.6.1 is also a useful resource.
Dump of file aarch64-abi-test-printf.obj
File Type: COFF OBJECT
main:
0000000000000000: A9BE7BFD stp fp,lr,[sp,#-0x20]!
0000000000000004: 910003FD mov fp,sp
0000000000000008: 90000008 adrp x8,$SG5571
000000000000000C: 91000104 add x4,x8,$SG5571
0000000000000010: 58000183 ldr x3,$LN3
0000000000000014: 58000162 ldr x2,$LN3
0000000000000018: 58000141 ldr x1,$LN3
000000000000001C: 90000008 adrp x8,$SG5572
0000000000000020: 91000100 add x0,x8,$SG5572
0000000000000024: 94000000 bl printf
0000000000000028: 2A0003E0 mov w0,w0
000000000000002C: B90013E0 str w0,[sp,#0x10]
0000000000000030: 52800000 mov w0,#0
0000000000000034: A8C27BFD ldp fp,lr,[sp],#0x20
0000000000000038: D65F03C0 ret
000000000000003C: D503201F nop
$LN3:
0000000000000040: 126E978D
0000000000000044: 3FF3C083
__local_stdio_printf_options:
0000000000000000: 90000008 adrp x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
0000000000000004: 91000100 add x0,x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
0000000000000008: D65F03C0 ret
_vfprintf_l:
0000000000000000: A9BD7BFD stp fp,lr,[sp,#-0x30]!
0000000000000004: 910003FD mov fp,sp
0000000000000008: F90017E0 str x0,[sp,#0x28]
000000000000000C: F90013E1 str x1,[sp,#0x20]
0000000000000010: F9000FE2 str x2,[sp,#0x18]
0000000000000014: F9000BE3 str x3,[sp,#0x10]
0000000000000018: 94000000 bl __local_stdio_printf_options
000000000000001C: F9400BE4 ldr x4,[sp,#0x10]
0000000000000020: F9400FE3 ldr x3,[sp,#0x18]
0000000000000024: F94013E2 ldr x2,[sp,#0x20]
0000000000000028: F94017E1 ldr x1,[sp,#0x28]
000000000000002C: F9400000 ldr x0,[x0]
0000000000000030: 94000000 bl __stdio_common_vfprintf
0000000000000034: 2A0003E0 mov w0,w0
0000000000000038: 2A0003E0 mov w0,w0
000000000000003C: A8C37BFD ldp fp,lr,[sp],#0x30
0000000000000040: D65F03C0 ret
printf:
0000000000000000: D10103FF sub sp,sp,#0x40
0000000000000004: A9008BE1 stp x1,x2,[sp,#8]
0000000000000008: A90193E3 stp x3,x4,[sp,#0x18]
000000000000000C: A9029BE5 stp x5,x6,[sp,#0x28]
0000000000000010: F9001FE7 str x7,[sp,#0x38]
0000000000000014: A9BD7BFD stp fp,lr,[sp,#-0x30]!
0000000000000018: 910003FD mov fp,sp
000000000000001C: F90013E0 str x0,[sp,#0x20]
0000000000000020: 9100E3E8 add x8,sp,#0x38
0000000000000024: F9000FE8 str x8,[sp,#0x18]
0000000000000028: 52800020 mov w0,#1
000000000000002C: 94000000 bl __acrt_iob_func
0000000000000030: F9400FE3 ldr x3,[sp,#0x18]
0000000000000034: D2800002 mov x2,#0
0000000000000038: F94013E1 ldr x1,[sp,#0x20]
000000000000003C: 94000000 bl _vfprintf_l
0000000000000040: 2A0003E0 mov w0,w0
0000000000000044: B90013E0 str w0,[sp,#0x10]
0000000000000048: D2800008 mov x8,#0
000000000000004C: F9000FE8 str x8,[sp,#0x18]
0000000000000050: B94013E0 ldr w0,[sp,#0x10]
0000000000000054: A8C37BFD ldp fp,lr,[sp],#0x30
0000000000000058: 910103FF add sp,sp,#0x40
000000000000005C: D65F03C0 ret
Summary
8 .bss
68 .chks64
9C .debug$S
62 .drectve
18 .pdata
1A .rdata
F8 .text$mn
10 .xdata
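In the disassembly generated by dumpbin (printf-abi.asm), notice that all 5 arguments to printf are passed in general-purpose registers, including the three doubles! This is consistent with the Windows ARM64 convention for variadic functions, which does not use the floating-point registers for arguments. x0 contains a pointer to the format string. x1-x3 each contain the 64-bit value loaded from the $LN3 label: ldr with a label operand is a PC-relative literal load, so it reads the data stored at the label rather than taking the label's address. The 64 bits at that label (0x3FF3C083126E978D) are the IEEE 754 double-precision representation of 1.2345. x4 contains a pointer to the null-terminated string "str".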
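One way to check what the ldr x1-x3 instructions in main load is to decode them by hand. The sketch below assumes the standard A64 encoding of the 64-bit LDR (literal) instruction (top byte 0x58, a signed 19-bit word offset in bits 23..5, the destination register in bits 4..0). The function names are hypothetical, and the constants are copied from the listing above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Target address of an A64 "LDR Xt, <label>" (64-bit literal load).
// Bits 31..24 = 0x58, bits 23..5 = signed imm19 (counted in words).
std::uint64_t literal_target(std::uint32_t insn, std::uint64_t pc) {
    assert((insn >> 24) == 0x58 && "not a 64-bit LDR (literal)");
    std::int64_t imm19 = (insn >> 5) & 0x7FFFF;
    if (imm19 & 0x40000) imm19 -= 0x80000;  // sign-extend the 19-bit field
    return pc + static_cast<std::uint64_t>(imm19 * 4);
}

// Destination register number (Rt), bits 4..0 of the same instruction.
unsigned literal_rt(std::uint32_t insn) { return insn & 0x1F; }

// Reassemble the two little-endian 32-bit words shown at $LN3 into a double.
double double_from_words(std::uint32_t low, std::uint32_t high) {
    std::uint64_t bits = (static_cast<std::uint64_t>(high) << 32) | low;
    double value;
    static_assert(sizeof value == sizeof bits, "expects a 64-bit double");
    std::memcpy(&value, &bits, sizeof value);  // bit copy, no numeric conversion
    return value;
}
```

For the three loads in main, literal_target(0x58000183, 0x10), literal_target(0x58000162, 0x14), and literal_target(0x58000141, 0x18) all evaluate to 0x40, the address of $LN3, and double_from_words(0x126E978D, 0x3FF3C083) yields 1.2345.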
Which are the printf String Arguments?
To determine what symbols in instructions like adrp x8,$SG5571 mean, we use the output of dumpbin /all. The RELOCATIONS section shows $SG5571 to have symbol index 8, and the COFF SYMBOL TABLE shows symbol index 8 to be in SECT3. The raw data for section 3 contains the format string and the single string parameter passed to printf. The offsets of the 2 strings within that section come from the value column of the symbol table: $SG5571 has value 00000000, where "str" starts, and $SG5572 has value 00000008, where the format string starts.
.
.
.
SECTION HEADER #3
.rdata name
0 physical address
0 virtual address
1A size of raw data
31A file pointer to raw data (0000031A to 00000333)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
40400040 flags
Initialized Data
8 byte align
Read Only
RAW DATA #3
00000000: 73 74 72 00 00 00 00 00 25 2E 34 66 2C 25 2E 34 str.....%.4f,%.4
00000010: 66 2C 25 2E 34 66 2C 25 73 00 f,%.4f,%s.
.
.
.
RELOCATIONS #4
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000008 PAGEBASE_REL21 90000008 8 $SG5571
0000000C PAGEOFFSET_12A 91000104 8 $SG5571
0000001C PAGEBASE_REL21 90000008 9 $SG5572
00000020 PAGEOFFSET_12A 91000100 9 $SG5572
00000024 BRANCH26 94000000 16 printf
.
.
.
COFF SYMBOL TABLE
000 01057A64 ABS notype Static | @comp.id
001 80010190 ABS notype Static | @feat.00
002 00000000 SECT1 notype Static | .drectve
Section length 62, #relocs 0, #linenums 0, checksum 0
004 00000000 SECT2 notype Static | .debug$S
Section length 9C, #relocs 0, #linenums 0, checksum 0
006 00000000 SECT3 notype Static | .rdata
Section length 1A, #relocs 0, #linenums 0, checksum B99D9667
008 00000000 SECT3 notype Static | $SG5571
009 00000008 SECT3 notype Static | $SG5572
00A 00000000 SECT4 notype Static | .text$mn
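To make the link between the symbol table and the raw data concrete, the bytes of RAW DATA #3 can be read back at the two symbol values (0 for $SG5571 and 8 for $SG5572). This is a minimal sketch: kRdata and string_at are hypothetical names, and the byte values are copied from the dump above.

```cpp
#include <cstddef>
#include <string>

// RAW DATA #3 from the dumpbin /all output (0x1A bytes of .rdata).
static const unsigned char kRdata[] = {
    0x73, 0x74, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00,  // "str" plus padding
    0x25, 0x2E, 0x34, 0x66, 0x2C, 0x25, 0x2E, 0x34,  // "%.4f,%.4"
    0x66, 0x2C, 0x25, 0x2E, 0x34, 0x66, 0x2C, 0x25,  // "f,%.4f,%"
    0x73, 0x00                                       // "s" plus terminator
};

// Read the null-terminated string that starts at a symbol's section offset.
std::string string_at(std::size_t offset) {
    std::string out;
    while (offset < sizeof kRdata && kRdata[offset] != 0) {
        out.push_back(static_cast<char>(kRdata[offset++]));
    }
    return out;
}
```

string_at(0) returns "str" and string_at(8) returns "%.4f,%.4f,%.4f,%s", matching the registers each symbol is loaded into (x4 and x0 respectively).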
Compiling an Optimized Build
Specifying the /O2 flag for speed generates optimized code.
cl /c /O2 /Fo"printf-abi-o2.obj" aarch64-abi-test-printf.cpp
dumpbin /disasm /out:printf-abi-o2.asm printf-abi-o2.obj
dumpbin /all /out:printf-abi-o2.txt printf-abi-o2.obj
In the optimized code below, the IEEE double is loaded into d16 and then copied to each of the x1-x3 registers by an FMOV instruction.
Dump of file printf-abi-o2.obj
File Type: COFF OBJECT
__local_stdio_printf_options:
0000000000000000: 90000008 adrp x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
0000000000000004: 91000100 add x0,x8,?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
0000000000000008: D65F03C0 ret
_vfprintf_l:
0000000000000000: A9BD53F3 stp x19,x20,[sp,#-0x30]!
0000000000000004: A9015BF5 stp x21,x22,[sp,#0x10]
0000000000000008: F90013FE str lr,[sp,#0x20]
000000000000000C: AA0003F6 mov x22,x0
0000000000000010: AA0103F5 mov x21,x1
0000000000000014: AA0203F4 mov x20,x2
0000000000000018: AA0303F3 mov x19,x3
000000000000001C: 94000000 bl __local_stdio_printf_options
0000000000000020: F9400000 ldr x0,[x0]
0000000000000024: AA1303E4 mov x4,x19
0000000000000028: AA1403E3 mov x3,x20
000000000000002C: AA1503E2 mov x2,x21
0000000000000030: AA1603E1 mov x1,x22
0000000000000034: 94000000 bl __stdio_common_vfprintf
0000000000000038: F94013FE ldr lr,[sp,#0x20]
000000000000003C: A9415BF5 ldp x21,x22,[sp,#0x10]
0000000000000040: A8C353F3 ldp x19,x20,[sp],#0x30
0000000000000044: D65F03C0 ret
main:
0000000000000000: F81F0FFE str lr,[sp,#-0x10]!
0000000000000004: 5C0001B0 ldr d16,$LN4
0000000000000008: 90000008 adrp x8,??_C@_03OJMAPEGJ@str@
000000000000000C: 91000104 add x4,x8,??_C@_03OJMAPEGJ@str@
0000000000000010: 90000008 adrp x8,??_C@_0BC@OEIAMIIK@?$CF?44f?0?$CF?44f?0?$CF?44f?0?$CFs@
0000000000000014: 91000100 add x0,x8,??_C@_0BC@OEIAMIIK@?$CF?44f?0?$CF?44f?0?$CF?44f?0?$CFs@
0000000000000018: 9E660203 fmov x3,d16
000000000000001C: 9E660202 fmov x2,d16
0000000000000020: 9E660201 fmov x1,d16
0000000000000024: 94000000 bl printf
0000000000000028: 52800000 mov w0,#0
000000000000002C: F84107FE ldr lr,[sp],#0x10
0000000000000030: D65F03C0 ret
0000000000000034: D503201F nop
$LN4:
0000000000000038: 126E978D
000000000000003C: 3FF3C083
printf:
0000000000000000: A9BA53F3 stp x19,x20,[sp,#-0x60]!
0000000000000004: A9017BF5 stp x21,lr,[sp,#0x10]
0000000000000008: A9028BE1 stp x1,x2,[sp,#0x28]
000000000000000C: A90393E3 stp x3,x4,[sp,#0x38]
0000000000000010: A9049BE5 stp x5,x6,[sp,#0x48]
0000000000000014: F9002FE7 str x7,[sp,#0x58]
0000000000000018: AA0003F4 mov x20,x0
000000000000001C: 52800020 mov w0,#1
0000000000000020: 9100A3F5 add x21,sp,#0x28
0000000000000024: 94000000 bl __acrt_iob_func
0000000000000028: AA0003F3 mov x19,x0
000000000000002C: 94000000 bl __local_stdio_printf_options
0000000000000030: F9400000 ldr x0,[x0]
0000000000000034: D2800003 mov x3,#0
0000000000000038: AA1403E2 mov x2,x20
000000000000003C: AA1303E1 mov x1,x19
0000000000000040: AA1503E4 mov x4,x21
0000000000000044: 94000000 bl __stdio_common_vfprintf
0000000000000048: A9417BF5 ldp x21,lr,[sp,#0x10]
000000000000004C: A8C653F3 ldp x19,x20,[sp],#0x60
0000000000000050: D65F03C0 ret
Summary
8 .bss
70 .chks64
94 .debug$S
62 .drectve
18 .pdata
16 .rdata
E8 .text$mn
8 .xdata
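The fmov x1,d16 instructions in the optimized main copy the raw bits of the double into a general-purpose register; no numeric conversion takes place. The sketch below contrasts the two operations in C++ and decodes the register fields of the FMOV encoding, assuming the standard A64 layout (Rd in bits 4..0, Rn in bits 9..5). All function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Bit-for-bit copy of a double into a 64-bit integer, analogous to "fmov x1,d16".
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Numeric conversion, for contrast: truncates toward zero, dropping the fraction.
std::uint64_t converted(double d) { return static_cast<std::uint64_t>(d); }

// Register fields of "FMOV Xd, Dn".
unsigned fmov_rd(std::uint32_t insn) { return insn & 0x1F; }         // destination Xd
unsigned fmov_rn(std::uint32_t insn) { return (insn >> 5) & 0x1F; }  // source Dn
```

bits_of(1.2345) gives 0x3FF3C083126E978D, the same two words stored at $LN4, whereas converted(1.2345) gives 1. Decoding 0x9E660203 gives destination register 3 and source register 16, which is fmov x3,d16.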
The example we have reviewed in this post passed only 5 arguments to printf. To see how more than 8 arguments are handled, see the example printf call in aarch64-abi-test-printf-manyargs.cpp and printf-abi-many.asm (or for the optimized assembly code, printf-abi-many-o2.asm).
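For readers without those files to hand, a call of the same general shape can be sketched as follows. This is a hypothetical stand-in, not the contents of aarch64-abi-test-printf-manyargs.cpp: format_many_args is an invented name, and snprintf is used in place of printf only so that the result can be inspected. With the buffer, size, and format string occupying x0-x2, only some of the ten variadic arguments can fit in the remaining registers up to x7; under the ARM64 calling convention the rest would be expected on the stack.

```cpp
#include <cstdio>
#include <string>

// Hypothetical example: a variadic call with more arguments than x0-x7 can hold.
std::string format_many_args() {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer,
                  "%d,%d,%d,%d,%d,%d,%d,%d,%d,%.4f",
                  1, 2, 3, 4, 5, 6, 7, 8, 9, 1.2345);
    return buffer;
}
```

Compiling a file containing this function with cl /c and running dumpbin /disasm on the resulting object, as in the earlier sections, should show which arguments are stored to the stack before the call.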