Implementing SpinPause on Windows AArch64
A few weeks ago I reported a gtest failure in [JDK-8374735] Implement SpinPause on Windows AArch64 and took a stab (ha) at implementing SpinPause() on top of openjdk/jdk at commit 27dbdec297fc8030812f7290a7601b6a99defb46. I could see no reason why the Linux AArch64 implementation couldn’t be used on Windows AArch64. Here is one caller of SpinPause on Windows AArch64:
jvm.dll!SpinPause() Line 296 C++
jvm.dll!ObjectMonitor::short_fixed_spin(JavaThread * current, int spin_count, bool adapt) Line 2379 C++
jvm.dll!ObjectMonitor::try_spin(JavaThread * current) Line 2401 C++
jvm.dll!ObjectMonitor::spin_enter(JavaThread * current) Line 493 C++
jvm.dll!ObjectMonitor::enter(JavaThread * current) Line 506 C++
jvm.dll!ObjectSynchronizer::inflate_and_enter(oopDesc * object, BasicLock * lock, ObjectSynchronizer::InflateCause cause, JavaThread * locking_thread, JavaThread * current) Line 2149 C++
jvm.dll!ObjectSynchronizer::enter(Handle obj, BasicLock * lock, JavaThread * current) Line 1846 C++
jvm.dll!ObjectLocker::ObjectLocker(Handle obj, JavaThread * __the_thread__) Line 477 C++
jvm.dll!InstanceKlass::link_class_impl(JavaThread * __the_thread__) Line 1018 C++
jvm.dll!InstanceKlass::link_class(JavaThread * __the_thread__) Line 933 C++
jvm.dll!InstanceKlass::initialize_impl(JavaThread * __the_thread__) Line 1233 C++
jvm.dll!InstanceKlass::initialize_preemptable(JavaThread * __the_thread__) Line 838 C++
jvm.dll!InterpreterRuntime::_new(JavaThread * current, ConstantPool * pool, int index) Line 222 C++
0000021f7b427cc0() Unknown
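To make the role of SpinPause in a stack like this concrete, here is a minimal sketch of a bounded spin on a lock, written from scratch. It is not HotSpot's ObjectMonitor logic: spin_pause_hint and try_lock_with_bounded_spin are hypothetical names, and the lock is a plain std::atomic<bool>.

```cpp
#include <atomic>

// Hypothetical stand-in for HotSpot's SpinPause(). A real port would execute
// a CPU hint instruction here; this sketch does nothing and returns 1.
inline int spin_pause_hint() { return 1; }

// Try to take a simple test-and-set lock, spinning at most max_spins times.
// Returns true if the lock was acquired while spinning, false if the caller
// should fall back to a slower (blocking) path.
inline bool try_lock_with_bounded_spin(std::atomic<bool>& locked, int max_spins) {
  for (int i = 0; i < max_spins; ++i) {
    bool expected = false;
    if (locked.compare_exchange_strong(expected, true, std::memory_order_acquire)) {
      return true;  // lock acquired
    }
    spin_pause_hint();  // be polite to the core before retrying
  }
  return false;  // spin budget exhausted
}
```

The point of the pause between attempts is only to make the retry loop cheaper for the core; the loop is still correct if the pause does nothing.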
Another relevant path is MacroAssembler::spin_wait(), which is called when the spin_wait stub is generated during VM startup:
jvm.dll!MacroAssembler::spin_wait() Line 6649 C++
jvm.dll!StubGenerator::generate_spin_wait() Line 10270 C++
jvm.dll!StubGenerator::generate_final_stubs() Line 11788 C++
jvm.dll!StubGenerator::StubGenerator(CodeBuffer * code, BlobId blob_id) Line 11980 C++
jvm.dll!StubGenerator_generate(CodeBuffer * code, BlobId blob_id) Line 11991 C++
jvm.dll!initialize_stubs(BlobId blob_id, int code_size, int max_aligned_stubs, const char * timer_msg, const char * buffer_name, const char * assert_msg) Line 190 C++
jvm.dll!StubRoutines::initialize_final_stubs() Line 233 C++
jvm.dll!final_stubs_init() Line 243 C++
jvm.dll!init_globals2() Line 205 C++
jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * canTryAgain) Line 622 C++
jvm.dll!JNI_CreateJavaVM_inner(JavaVM_ * * vm, void * * penv, void * args) Line 3621 C++
jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * args) Line 3712 C++
jli.dll!InitializeJVM(const JNIInvokeInterface_ * * * pvm, const JNINativeInterface_ * * * penv, InvocationFunctions * ifn) Line 1506 C
jli.dll!JavaMain(void * _args) Line 494 C
jli.dll!ThreadJavaMain(void * args) Line 632 C
ucrtbase.dll!00007ffba234b028() Unknown
MacroAssembler::spin_wait() is also called when C2 emits code for an onspinwaitNode:
jvm.dll!MacroAssembler::spin_wait() Line 6649 C++
jvm.dll!onspinwaitNode::emit(C2_MacroAssembler * masm, PhaseRegAlloc * ra_) Line 22285 C++
jvm.dll!PhaseOutput::scratch_emit_size(const Node * n) Line 3150 C++
jvm.dll!MachNode::emit_size(PhaseRegAlloc * ra_) Line 157 C++
jvm.dll!MachNode::size(PhaseRegAlloc * ra_) Line 149 C++
jvm.dll!PhaseOutput::shorten_branches(unsigned int * blk_starts) Line 528 C++
jvm.dll!PhaseOutput::Output() Line 330 C++
jvm.dll!Compile::Code_Gen() Line 3137 C++
jvm.dll!Compile::Compile(ciEnv * ci_env, ciMethod * target, int osr_bci, Options options, DirectiveSet * directive) Line 896 C++
jvm.dll!C2Compiler::compile_method(ciEnv * env, ciMethod * target, int entry_bci, bool install_code, DirectiveSet * directive) Line 147 C++
jvm.dll!CompileBroker::invoke_compiler_on_method(CompileTask * task) Line 2348 C++
jvm.dll!CompileBroker::compiler_thread_loop() Line 1990 C++
jvm.dll!CompilerThread::thread_entry(JavaThread * thread, JavaThread * __the_thread__) Line 69 C++
jvm.dll!JavaThread::thread_main_inner() Line 777 C++
jvm.dll!JavaThread::run() Line 761 C++
jvm.dll!Thread::call_run() Line 242 C++
jvm.dll!thread_native_entry(void * t) Line 565 C++
ucrtbase.dll!00007ffba234b028() Unknown
Below are some example commands showing how to select which assembly instruction is used for the spin wait and how many of them are emitted. On my Surface Pro X, the SB instruction is not supported. A good post to read about the various barrier instructions is The AArch64 processor (aka arm64), part 14: Barriers – The Old New Thing.
$ $JDKTOTEST/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:OnSpinWaitInst=sb ProducerConsumerLoops
Error occurred during initialization of VM
OnSpinWaitInst is SB but current CPU does not support SB instruction
$ $JDKTOTEST/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:OnSpinWaitInst=nop -XX:OnSpinWaitInstCount=5 ProducerConsumerLoops
The snippet below shows the 4 instructions in the spin_wait stub when the isb instruction is selected with a count of 3 (after copying the Linux SpinPause implementation). Whether this is a good idea is beside the point; this is only about showing what the flags do.
000001800B9D0700 isb sy
000001800B9D0704 isb sy
000001800B9D0708 isb sy
000001800B9D070C ret
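As a rough model of what the two flags do, the sketch below builds the stub's instruction list as plain strings: the selected instruction repeated count times, then a ret. This is an illustration only, not HotSpot's stub generator. The function name build_spin_wait_stub and the cpu_has_sb parameter are made up for the example, and it only handles the three instruction names that appear in this post.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the spin_wait stub: `count` copies of the selected
// instruction followed by a return. Mnemonics are plain strings, nothing is
// assembled or executed.
std::vector<std::string> build_spin_wait_stub(const std::string& inst,
                                              int count,
                                              bool cpu_has_sb) {
  if (count < 1) {
    throw std::invalid_argument("count must be at least 1");
  }
  std::string mnemonic;
  if (inst == "nop") {
    mnemonic = "nop";
  } else if (inst == "isb") {
    mnemonic = "isb sy";
  } else if (inst == "sb") {
    if (!cpu_has_sb) {
      // Same wording as the VM error shown earlier in this post.
      throw std::runtime_error(
          "OnSpinWaitInst is SB but current CPU does not support SB instruction");
    }
    mnemonic = "sb";
  } else {
    throw std::invalid_argument("instruction not handled by this sketch: " + inst);
  }
  std::vector<std::string> stub(static_cast<std::size_t>(count), mnemonic);
  stub.push_back("ret");
  return stub;
}
```

Calling build_spin_wait_stub("isb", 3, false) yields the same four-entry shape as the disassembly above: three isb sy entries and a ret.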
extern "C" {
int SpinPause() {
00007FFAB97A5CF8 stp fp,lr,[sp,#-0x20]!
00007FFAB97A5CFC mov fp,sp
using spin_wait_func_ptr_t = void (*)();
spin_wait_func_ptr_t func = CAST_TO_FN_PTR(spin_wait_func_ptr_t, StubRoutines::aarch64::spin_wait());
00007FFAB97A5D00 bl StubRoutines::aarch64::spin_wait (07FFAB97A63B8h)+#0xFFFF8005DA859DF6
00007FFAB97A5D04 mov x8,x0
00007FFAB97A5D08 str x8,[sp,#0x10]
assert(func != nullptr, "StubRoutines::aarch64::spin_wait must not be null.");
00007FFAB97A5D0C mov w8,#0
00007FFAB97A5D10 cmp w8,#0
00007FFAB97A5D14 bne SpinPause+34h (07FFAB97A5D2Ch)
00007FFAB97A5D18 bl DebuggingContext::is_enabled (07FFAB8619D18h)+#0xFFFF8005DF5832E8
00007FFAB97A5D1C uxtb w8,w0
00007FFAB97A5D20 mov w8,w8
00007FFAB97A5D24 cmp w8,#0
00007FFAB97A5D28 bne SpinPause+74h (07FFAB97A5D6Ch)
00007FFAB97A5D2C ldr x8,[sp,#0x10]
00007FFAB97A5D30 cmp x8,#0
00007FFAB97A5D34 bne SpinPause+74h (07FFAB97A5D6Ch)
00007FFAB97A5D38 adrp x8,g_assert_poison (07FFABAA80F88h)+#0xFFFF800635588740
00007FFAB97A5D3C ldr x9,[x8,g_assert_poison (07FFABAA80F88h)+#0xFFFF80063E9FB581]
00007FFAB97A5D40 mov w8,#0x58
00007FFAB97A5D44 strb w8,[x9]
00007FFAB97A5D48 adrp x8,siglabels+690h (07FFABA5C5000h)
00007FFAB97A5D4C add x3,x8,#0x450
00007FFAB97A5D50 adrp x8,siglabels+690h (07FFABA5C5000h)
00007FFAB97A5D54 add x2,x8,#0x488
00007FFAB97A5D58 mov w1,#0x128
00007FFAB97A5D5C adrp x8,siglabels+690h (07FFABA5C5000h)
00007FFAB97A5D60 add x0,x8,#0x4B0
00007FFAB97A5D64 bl report_vm_error (07FFAB8D15210h)+#0xFFFF8005DF046B1B
00007FFAB97A5D68 nop
00007FFAB97A5D6C mov w8,#0
00007FFAB97A5D70 cmp w8,#0
00007FFAB97A5D74 bne SpinPause+14h (07FFAB97A5D0Ch)
(*func)();
00007FFAB97A5D78 ldr x8,[sp,#0x10]
00007FFAB97A5D7C blr x8
// If StubRoutines::aarch64::spin_wait consists of only a RET,
// SpinPause can be considered implemented. There will be a sequence
// of instructions for:
// - call of SpinPause
// - load of StubRoutines::aarch64::spin_wait stub pointer
// - indirect call of the stub
// - return from the stub
// - return from SpinPause
// So '1' always is returned.
return 1;
00007FFAB97A5D80 mov w0,#1
00007FFAB97A5D84 ldp fp,lr,[sp],#0x20
00007FFAB97A5D88 ret
00007FFAB97A5D8C ?? ??????
}
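Stripped of the disassembly, the pattern in the listing above is: fetch the stub address, check it is not null, call it through a function pointer, and return 1. Here is a portable sketch of just that pattern. The stub and the address lookup are hypothetical stand-ins (fake_spin_wait_stub and spin_wait_stub_address are my names, not HotSpot's), and the fake stub counts calls so the effect can be observed.

```cpp
#include <cassert>

using spin_wait_func_ptr_t = void (*)();

// Hypothetical stand-in for the generated stub. It only counts calls; the
// real stub would contain the selected spin-wait instructions and a ret.
static int g_stub_calls = 0;
static void fake_spin_wait_stub() { ++g_stub_calls; }

// Hypothetical stand-in for looking up the stub's address.
static spin_wait_func_ptr_t spin_wait_stub_address() {
  return &fake_spin_wait_stub;
}

// Same shape as the SpinPause listing: load the pointer, check it, call the
// stub indirectly, and always report 1 (meaning "implemented").
int spin_pause_sketch() {
  spin_wait_func_ptr_t func = spin_wait_stub_address();
  assert(func != nullptr && "spin-wait stub must not be null");
  (*func)();
  return 1;
}
```

Even if the stub were a bare ret, the call, indirect branch and two returns still cost a few cycles, which is the reasoning the source comment gives for always returning 1.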
SpinPause is also used by the G1 collector as shown in the callstack below:
jvm.dll!SpinPause() Line 307 C++
jvm.dll!TaskTerminator::DelayContext::do_step() Line 61 C++
jvm.dll!TaskTerminator::offer_termination(TerminatorTerminator * terminator) Line 166 C++
jvm.dll!TaskTerminator::offer_termination() Line 105 C++
jvm.dll!G1ParEvacuateFollowersClosure::offer_termination() Line 601 C++
jvm.dll!G1ParEvacuateFollowersClosure::do_void() Line 626 C++
jvm.dll!G1EvacuateRegionsBaseTask::evacuate_live_objects(G1ParScanThreadState * pss, unsigned int worker_id, G1GCPhaseTimes::GCParPhases objcopy_phase, G1GCPhaseTimes::GCParPhases termination_phase) Line 673 C++
jvm.dll!G1EvacuateRegionsTask::evacuate_live_objects(G1ParScanThreadState * pss, unsigned int worker_id) Line 762 C++
jvm.dll!G1EvacuateRegionsBaseTask::work(unsigned int worker_id) Line 730 C++
jvm.dll!WorkerTaskDispatcher::worker_run_task() Line 73 C++
jvm.dll!WorkerThread::run() Line 200 C++
jvm.dll!Thread::call_run() Line 242 C++
jvm.dll!thread_native_entry(void * t) Line 565 C++
ucrtbase.dll!00007ffba234b028() Unknown
Outstanding Challenges
How do I get full stacks in Visual Studio? We need to integrate jstack’s functionality into the VS debugger.