SPA 5.0:
BPT.mode
{#ImmU20}
{&req_6}
{?sched>=?WAIT5}
;
.mode: { .DRAIN, .PAUSE, .TRAP, .INT }
This instruction implements functionality associated with the SM trap handler and software debugger.
BPT.DRAIN drains the SM pipeline to guarantee that all Warp Error interrupts caused by previous instructions from the executing warp are taken before the next instruction. Note that BPT.DRAIN does not drain the memory pipeline beyond the SM and provides no guarantee for L1 pipeline error interrupts.
BPT.PAUSE is only allowed while in the trap handler (otherwise it is treated as an ILLEGAL_INSTR_ENCODING error). It stops the current warp from continuing execution and unconditionally sends a notification (interrupt) to the CPU that a warp has stopped. The warp will not continue execution until it has been woken by the CPU via an external signal.
The CPU should only "wake-up" warps from the paused state after all warps on a given SM have either paused or exited and while the SM is trapped. Otherwise the system may be placed in an inconsistent state.
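The wake-up rule above can be expressed as a simple predicate. The following is a minimal sketch of host-side debugger logic, not an actual driver interface; `WarpState` and `safe_to_wake` are hypothetical names introduced only for illustration.

```python
from enum import Enum, auto


class WarpState(Enum):
    """Hypothetical per-warp states as seen by a host-side debugger."""
    RUNNING = auto()
    PAUSED = auto()
    EXITED = auto()


def safe_to_wake(warp_states, sm_trapped):
    """Return True when waking paused warps on one SM follows the rule above.

    The rule: every warp on the SM has either paused or exited,
    and the SM itself is currently trapped.
    """
    all_stopped = all(
        state in (WarpState.PAUSED, WarpState.EXITED) for state in warp_states
    )
    return sm_trapped and all_stopped
```

A debugger following this sketch would poll the warp states of an SM and defer the wake-up signal while `safe_to_wake` returns False.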
Note: Execution of BPT.PAUSE does not guarantee that prior instructions have completed. If a guarantee of prior instruction completion is desired, SW should insert BPT.DRAIN before BPT.PAUSE.
Intention of BPT.PAUSE: BPT.PAUSE is a very powerful operation. While all warps enter the trap handler when a BPT.TRAP or error is encountered, they are not required to stop and wait for the CPU at the same time. Warps can be 'peeled' off individually from the group of running warps by using BPT.PAUSE.
Some example usage cases include:
BPT.TRAP initiates a trap.
No instructions past the BPT.TRAP will have been executed for this warp when it enters the trap handler. It is a precise trap which will leave the PC pointing to the instruction immediately following the BPT.TRAP.
BPT.TRAP takes a 3-bit immediate as a mechanism to allow software to differentiate between traps. The 3-bit immediate is provided in a per-warp ESR so the trap handler can identify the BPT.TRAP type. Furthermore, the global ESR broadcasts which BPT.TRAP types were encountered, if any. BPT.TRAP with an immediate value of 0 is illegal.
SOFTWARE NOTE: The motivation for providing an immediate for BPT.TRAP is to allow trapping for multiple reasons. One trap immediate can be used as a debugger breakpoint while another can be used for all other SW-initiated traps. Example use cases of non-debugger-breakpoint traps include system calls and CTA continuations. To differentiate between the non-breakpoint trap cases, SW will have to establish an ABI between user mode binaries and the kernel mode trap handler. For example, SW could reserve per-warp words in global memory for this communication. Per-thread local memory could be used as well, but BPT.TRAP is a warp-wide instruction, so care must be taken to ensure that memory is initialized and set properly. Debugger breakpoints receive special treatment via their own dedicated BPT.TRAP since there is generally no room to insert additional store instructions to implement the ABI around the breakpoint. SPA 5.0 extends the 2-bit immediate trap reason to 3 bits. In addition, trap reason 7 is treated as a non-maskable exception and is processed even when interrupts are masked with IDE.EN/.DIS.
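The trap reason rules above (3-bit immediate, 0 illegal, 7 non-maskable) can be summarized in a small model. This is only a sketch: `classify_trap_reason` and `trap_is_processed` are hypothetical helper names, and the model assumes that reasons 1 to 6 are not processed while interrupts are masked, which the text implies but does not state outright.

```python
TRAP_REASON_BITS = 3      # SPA 5.0 trap reason width, per the text above
NON_MASKABLE_REASON = 7   # reason 7 is processed even when masked


def classify_trap_reason(imm):
    """Classify a BPT.TRAP immediate as 'illegal', 'maskable' or 'non-maskable'."""
    if not 0 <= imm < (1 << TRAP_REASON_BITS):
        raise ValueError(f"trap reason {imm} does not fit in {TRAP_REASON_BITS} bits")
    if imm == 0:
        return "illegal"          # BPT.TRAP with a 0 immediate is illegal
    if imm == NON_MASKABLE_REASON:
        return "non-maskable"
    return "maskable"


def trap_is_processed(imm, interrupts_masked):
    """Return True if the trap would be processed in this model.

    Assumption (not stated in the source): maskable reasons 1..6 are
    not processed while interrupts are masked with IDE.
    """
    kind = classify_trap_reason(imm)
    if kind == "illegal":
        return False
    return kind == "non-maskable" or not interrupts_masked
```

For instance, a trap handler ABI built on this model could reserve one reason for debugger breakpoints and reason 7 for traps that must be taken regardless of masking.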
BPT.INT raises an interrupt to the CPU. It can be used both within and outside of the trap handler. It also sets a bit in the global ESR which can only be cleared by the CPU. This bit always reads as 0 when read by an S2R instruction. BPT.INT interrupts do not trap the SM.
BPT.PAUSE_QUIET pauses the warp exactly like BPT.PAUSE does, except that it does not set the NV_PTPC_PRI_SM_HWW_GLOBAL_ESR_BPT_PAUSE bit and therefore does not raise an interrupt. Additionally, if all live warps become paused by executing BPT.PAUSE_QUIET during a given trap event, then the SM will return to user mode and allow launches to resume without needing any poke of the NV_PTPC_PRI_SM_DBGR_STATUS0_RUN_TRIGGER register bit (which would otherwise be required).
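The difference between the two pause variants can be illustrated with a small model. This is a sketch under stated assumptions: `raises_interrupt` and `resumes_without_run_trigger` are hypothetical names, and each live warp is represented only by the pause variant it executed during one trap event.

```python
def raises_interrupt(pause_mode):
    """BPT.PAUSE sets the global ESR pause bit and interrupts the CPU;
    BPT.PAUSE_QUIET does neither."""
    if pause_mode not in ("PAUSE", "PAUSE_QUIET"):
        raise ValueError(f"unknown pause mode: {pause_mode}")
    return pause_mode == "PAUSE"


def resumes_without_run_trigger(live_warp_pause_modes):
    """Return True if the SM returns to user mode without a RUN_TRIGGER poke.

    Per the text above, this happens only when every live warp paused via
    BPT.PAUSE_QUIET during the trap event. An empty list is treated as
    False here, which is an assumption of this sketch.
    """
    modes = list(live_warp_pause_modes)
    return bool(modes) and all(mode == "PAUSE_QUIET" for mode in modes)
```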
BPT.PAUSE; BPT.TRAP 1; BPT.DRAIN; BPT.INT;