Implementing SpinPause on Windows AArch64

A few weeks ago I reported a gtest failure in [JDK-8374735] Implement SpinPause on Windows AArch64 (Java Bug System), then took a stab (ha) at implementing SpinPause() in openjdk/jdk at commit 27dbdec297fc8030812f7290a7601b6a99defb46. I could see no reason why the Linux AArch64 implementation couldn’t be used on Windows AArch64. Here is one caller of SpinPause on Windows AArch64:

jvm.dll!SpinPause() Line 296	C++
jvm.dll!ObjectMonitor::short_fixed_spin(JavaThread * current, int spin_count, bool adapt) Line 2379	C++
jvm.dll!ObjectMonitor::try_spin(JavaThread * current) Line 2401	C++
jvm.dll!ObjectMonitor::spin_enter(JavaThread * current) Line 493	C++
jvm.dll!ObjectMonitor::enter(JavaThread * current) Line 506	C++
jvm.dll!ObjectSynchronizer::inflate_and_enter(oopDesc * object, BasicLock * lock, ObjectSynchronizer::InflateCause cause, JavaThread * locking_thread, JavaThread * current) Line 2149	C++
jvm.dll!ObjectSynchronizer::enter(Handle obj, BasicLock * lock, JavaThread * current) Line 1846	C++
jvm.dll!ObjectLocker::ObjectLocker(Handle obj, JavaThread * __the_thread__) Line 477	C++
jvm.dll!InstanceKlass::link_class_impl(JavaThread * __the_thread__) Line 1018	C++
jvm.dll!InstanceKlass::link_class(JavaThread * __the_thread__) Line 933	C++
jvm.dll!InstanceKlass::initialize_impl(JavaThread * __the_thread__) Line 1233	C++
jvm.dll!InstanceKlass::initialize_preemptable(JavaThread * __the_thread__) Line 838	C++
jvm.dll!InterpreterRuntime::_new(JavaThread * current, ConstantPool * pool, int index) Line 222	C++
0000021f7b427cc0()	Unknown
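To make the role of SpinPause in a stack like this more concrete, here is a minimal sketch of a bounded spin loop that pauses between failed lock attempts. It is not HotSpot's ObjectMonitor code: `short_spin_sketch`, `spin_pause_stub`, and `g_pause_calls` are hypothetical names invented for illustration, and the "lock" is just a `std::atomic<bool>`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for SpinPause(); here it only counts how often it is called.
static int g_pause_calls = 0;
static int spin_pause_stub() {
  ++g_pause_calls;
  return 1;
}

// Try to take `lock` up to `spin_count` times, pausing after each failed attempt.
// Returns true if the lock was acquired while spinning.
static bool short_spin_sketch(std::atomic<bool>& lock, int spin_count) {
  assert(spin_count >= 0);
  for (int i = 0; i < spin_count; ++i) {
    bool expected = false;
    if (lock.compare_exchange_strong(expected, true, std::memory_order_acquire)) {
      return true;  // acquired without blocking
    }
    spin_pause_stub();  // brief back-off before the next attempt
  }
  return false;  // a real monitor would fall back to blocking here
}
```

With an uncontended lock the function returns on the first attempt and never pauses; with a lock that stays held it pauses once per failed attempt and then gives up.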

Another relevant code path is MacroAssembler::spin_wait(), which generates the spin_wait stub during VM initialization:

jvm.dll!MacroAssembler::spin_wait() Line 6649	C++
jvm.dll!StubGenerator::generate_spin_wait() Line 10270	C++
jvm.dll!StubGenerator::generate_final_stubs() Line 11788	C++
jvm.dll!StubGenerator::StubGenerator(CodeBuffer * code, BlobId blob_id) Line 11980	C++
jvm.dll!StubGenerator_generate(CodeBuffer * code, BlobId blob_id) Line 11991	C++
jvm.dll!initialize_stubs(BlobId blob_id, int code_size, int max_aligned_stubs, const char * timer_msg, const char * buffer_name, const char * assert_msg) Line 190	C++
jvm.dll!StubRoutines::initialize_final_stubs() Line 233	C++
jvm.dll!final_stubs_init() Line 243	C++
jvm.dll!init_globals2() Line 205	C++
jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * canTryAgain) Line 622	C++
jvm.dll!JNI_CreateJavaVM_inner(JavaVM_ * * vm, void * * penv, void * args) Line 3621	C++
jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * args) Line 3712	C++
jli.dll!InitializeJVM(const JNIInvokeInterface_ * * * pvm, const JNINativeInterface_ * * * penv, InvocationFunctions * ifn) Line 1506	C
jli.dll!JavaMain(void * _args) Line 494	C
jli.dll!ThreadJavaMain(void * args) Line 632	C
ucrtbase.dll!00007ffba234b028()	Unknown

MacroAssembler::spin_wait() is also called when C2 emits code for an onspinwait node:

jvm.dll!MacroAssembler::spin_wait() Line 6649	C++
jvm.dll!onspinwaitNode::emit(C2_MacroAssembler * masm, PhaseRegAlloc * ra_) Line 22285	C++
jvm.dll!PhaseOutput::scratch_emit_size(const Node * n) Line 3150	C++
jvm.dll!MachNode::emit_size(PhaseRegAlloc * ra_) Line 157	C++
jvm.dll!MachNode::size(PhaseRegAlloc * ra_) Line 149	C++
jvm.dll!PhaseOutput::shorten_branches(unsigned int * blk_starts) Line 528	C++
jvm.dll!PhaseOutput::Output() Line 330	C++
jvm.dll!Compile::Code_Gen() Line 3137	C++
jvm.dll!Compile::Compile(ciEnv * ci_env, ciMethod * target, int osr_bci, Options options, DirectiveSet * directive) Line 896	C++
jvm.dll!C2Compiler::compile_method(ciEnv * env, ciMethod * target, int entry_bci, bool install_code, DirectiveSet * directive) Line 147	C++
jvm.dll!CompileBroker::invoke_compiler_on_method(CompileTask * task) Line 2348	C++
jvm.dll!CompileBroker::compiler_thread_loop() Line 1990	C++
jvm.dll!CompilerThread::thread_entry(JavaThread * thread, JavaThread * __the_thread__) Line 69	C++
jvm.dll!JavaThread::thread_main_inner() Line 777	C++
jvm.dll!JavaThread::run() Line 761	C++
jvm.dll!Thread::call_run() Line 242	C++
jvm.dll!thread_native_entry(void * t) Line 565	C++
ucrtbase.dll!00007ffba234b028()	Unknown

Below are some example commands showing how to select which assembly instruction the spin-wait sequence uses and how many of them are emitted. On my Surface Pro X, the SB instruction is not supported, so the first command fails at startup. A good post to read about the various barrier instructions is "The AArch64 processor (aka arm64), part 14: Barriers" on The Old New Thing.

$ $JDKTOTEST/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:OnSpinWaitInst=sb ProducerConsumerLoops
Error occurred during initialization of VM
OnSpinWaitInst is SB but current CPU does not support SB instruction

$ $JDKTOTEST/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:OnSpinWaitInst=nop -XX:OnSpinWaitInstCount=5 ProducerConsumerLoops

The snippet below shows the 4 instructions in the spin_wait stub when the isb instruction is selected with a count of 3 (after copying the Linux SpinPause implementation), followed by the mixed source and disassembly of SpinPause itself. Whether this is a good idea is not the point; the goal is to show what the flags do.

000001800B9D0700  isb         sy  
000001800B9D0704  isb         sy  
000001800B9D0708  isb         sy  
000001800B9D070C  ret 
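As a rough mental model of what the two flags do to the stub, the toy function below builds the list of mnemonics from an instruction name and a count. It is a sketch under stated assumptions, not the MacroAssembler code: `spin_wait_stub_sketch` and `cpu_has_sb` are hypothetical names, and only the three instruction names that appear in this post (nop, isb, sb) are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model: returns the mnemonics a spin-wait stub would contain.
// `cpu_has_sb` is a hypothetical stand-in for the VM's CPU feature check.
static std::vector<std::string> spin_wait_stub_sketch(const std::string& inst,
                                                      int count,
                                                      bool cpu_has_sb) {
  assert(count >= 0);
  if (inst != "nop" && inst != "isb" && inst != "sb") {
    throw std::invalid_argument("unknown instruction: " + inst);
  }
  if (inst == "sb" && !cpu_has_sb) {
    // Mirrors the startup error shown above for -XX:OnSpinWaitInst=sb.
    throw std::runtime_error(
        "OnSpinWaitInst is SB but current CPU does not support SB instruction");
  }
  // `count` copies of the selected instruction, followed by the return.
  std::vector<std::string> code(static_cast<std::size_t>(count), inst);
  code.push_back("ret");
  return code;
}
```

Calling it with "isb" and a count of 3 yields three isb entries plus the trailing ret, which matches the shape of the 4-instruction stub in the listing above.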

extern "C" {
  int SpinPause() {
00007FFAB97A5CF8  stp         fp,lr,[sp,#-0x20]!  
00007FFAB97A5CFC  mov         fp,sp  
    using spin_wait_func_ptr_t = void (*)();
    spin_wait_func_ptr_t func = CAST_TO_FN_PTR(spin_wait_func_ptr_t, StubRoutines::aarch64::spin_wait());
00007FFAB97A5D00  bl          StubRoutines::aarch64::spin_wait (07FFAB97A63B8h)+#0xFFFF8005DA859DF6  
00007FFAB97A5D04  mov         x8,x0  
00007FFAB97A5D08  str         x8,[sp,#0x10]  
    assert(func != nullptr, "StubRoutines::aarch64::spin_wait must not be null.");
00007FFAB97A5D0C  mov         w8,#0  
00007FFAB97A5D10  cmp         w8,#0  
00007FFAB97A5D14  bne         SpinPause+34h (07FFAB97A5D2Ch)  
00007FFAB97A5D18  bl          DebuggingContext::is_enabled (07FFAB8619D18h)+#0xFFFF8005DF5832E8  
00007FFAB97A5D1C  uxtb        w8,w0  
00007FFAB97A5D20  mov         w8,w8  
00007FFAB97A5D24  cmp         w8,#0  
00007FFAB97A5D28  bne         SpinPause+74h (07FFAB97A5D6Ch)  
00007FFAB97A5D2C  ldr         x8,[sp,#0x10]  
00007FFAB97A5D30  cmp         x8,#0  
00007FFAB97A5D34  bne         SpinPause+74h (07FFAB97A5D6Ch)  
00007FFAB97A5D38  adrp        x8,g_assert_poison (07FFABAA80F88h)+#0xFFFF800635588740  
00007FFAB97A5D3C  ldr         x9,[x8,g_assert_poison (07FFABAA80F88h)+#0xFFFF80063E9FB581]  
00007FFAB97A5D40  mov         w8,#0x58  
00007FFAB97A5D44  strb        w8,[x9]  
00007FFAB97A5D48  adrp        x8,siglabels+690h (07FFABA5C5000h)  
00007FFAB97A5D4C  add         x3,x8,#0x450  
00007FFAB97A5D50  adrp        x8,siglabels+690h (07FFABA5C5000h)  
00007FFAB97A5D54  add         x2,x8,#0x488  
00007FFAB97A5D58  mov         w1,#0x128  
00007FFAB97A5D5C  adrp        x8,siglabels+690h (07FFABA5C5000h)  
00007FFAB97A5D60  add         x0,x8,#0x4B0  
00007FFAB97A5D64  bl          report_vm_error (07FFAB8D15210h)+#0xFFFF8005DF046B1B  
00007FFAB97A5D68  nop  
00007FFAB97A5D6C  mov         w8,#0  
00007FFAB97A5D70  cmp         w8,#0  
00007FFAB97A5D74  bne         SpinPause+14h (07FFAB97A5D0Ch)  
    (*func)();
00007FFAB97A5D78  ldr         x8,[sp,#0x10]  
00007FFAB97A5D7C  blr         x8  
    // If StubRoutines::aarch64::spin_wait consists of only a RET,
    // SpinPause can be considered implemented. There will be a sequence
    // of instructions for:
    // - call of SpinPause
    // - load of StubRoutines::aarch64::spin_wait stub pointer
    // - indirect call of the stub
    // - return from the stub
    // - return from SpinPause
    // So '1' always is returned.
    return 1;
00007FFAB97A5D80  mov         w0,#1  
00007FFAB97A5D84  ldp         fp,lr,[sp],#0x20  
00007FFAB97A5D88  ret  
00007FFAB97A5D8C  ?? ?????? 
} 
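Stripped of the disassembly, the pattern in the listing above is: fetch the stub's address, assert it is not null, call it, and return 1. The sketch below reproduces that shape outside the JVM. `fake_spin_wait_stub`, `spin_wait_stub_address`, and `spin_pause_sketch` are hypothetical stand-ins, and the address-to-function-pointer cast (CAST_TO_FN_PTR in the listing) is left out because the stand-in already returns a function pointer.

```cpp
#include <cassert>

using spin_wait_func_ptr_t = void (*)();

// Hypothetical stand-in for the generated stub. The real stub is machine code
// emitted at VM startup; this one only counts invocations.
static int g_stub_calls = 0;
static void fake_spin_wait_stub() { ++g_stub_calls; }

// Hypothetical stand-in for StubRoutines::aarch64::spin_wait().
static spin_wait_func_ptr_t spin_wait_stub_address() {
  return &fake_spin_wait_stub;
}

static int spin_pause_sketch() {
  spin_wait_func_ptr_t func = spin_wait_stub_address();
  assert(func != nullptr && "spin-wait stub must not be null");
  (*func)();  // indirect call into the stub
  return 1;   // signals that the pause is implemented
}
```

Each call runs the stub exactly once and returns 1, regardless of how many instructions the stub contains.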

SpinPause is also used by the G1 collector, as shown in the call stack below:

jvm.dll!SpinPause() Line 307	C++
jvm.dll!TaskTerminator::DelayContext::do_step() Line 61	C++
jvm.dll!TaskTerminator::offer_termination(TerminatorTerminator * terminator) Line 166	C++
jvm.dll!TaskTerminator::offer_termination() Line 105	C++
jvm.dll!G1ParEvacuateFollowersClosure::offer_termination() Line 601	C++
jvm.dll!G1ParEvacuateFollowersClosure::do_void() Line 626	C++
jvm.dll!G1EvacuateRegionsBaseTask::evacuate_live_objects(G1ParScanThreadState * pss, unsigned int worker_id, G1GCPhaseTimes::GCParPhases objcopy_phase, G1GCPhaseTimes::GCParPhases termination_phase) Line 673	C++
jvm.dll!G1EvacuateRegionsTask::evacuate_live_objects(G1ParScanThreadState * pss, unsigned int worker_id) Line 762	C++
jvm.dll!G1EvacuateRegionsBaseTask::work(unsigned int worker_id) Line 730	C++
jvm.dll!WorkerTaskDispatcher::worker_run_task() Line 73	C++
jvm.dll!WorkerThread::run() Line 200	C++
jvm.dll!Thread::call_run() Line 242	C++
jvm.dll!thread_native_entry(void * t) Line 565	C++
ucrtbase.dll!00007ffba234b028()	Unknown

Outstanding Challenges

How do I get full stacks in Visual Studio? We need to integrate jstack’s functionality into the VS debugger.
