SPA 5.0: Relative branches:
{@{!}Pg}
BRA{.U}{.LMT}
{CC.test,}#ImmS24
{&req_6}
{?sched>=?WAIT5}
;
{@{!}Pg}
BRA{.U}{.LMT}
{CC.test,}c[#ImmU05][#ImmU16]
{&req_6}
{?sched>=?WAIT5}
;
// This form is not patchable and is deprecated.
Absolute Jumps:
{@{!}Pg}
BRX{.LMT}
{CC.test,}Ra + #ImmS24
{&req_6}
{?sched>=?WAIT5}
;
{@{!}Pg}
JMP{.U}{.LMT}
{CC.test,}#ImmU32
{&req_6}
{?sched>=?WAIT5}
;
{@{!}Pg}
JMP{.U}{.LMT}
{CC.test,}c[#ImmU05][#ImmU16]
{&req_6}
{?sched>=?WAIT5}
;
{@{!}Pg}
JMX{.LMT}
{CC.test,}Ra + #ImmS32
{&req_6}
{?sched>=?WAIT5}
;
.U    Unanimous condition: the branch/jump is taken only if all active threads in the warp agree on taking it.
.LMT  ApiCallLimit check (limit already reached by a previous PRET; see the RET/PRET opcode page for more details).
.test:
{ .F, .LT, .EQ, .LE, .GT, .NE, .GE, .NUM,                        Signed numeric tests
  .NAN, .LTU, .EQU, .LEU, .GTU, .NEU, .GEU, .T*,                 Signed or Unordered tests
  .OFF, .LO, .SFF, .LS, .HI, .SFT, .HS, .OFT,                    Unsigned integer tests
  .CSM_TA, .CSM_TR, .CSM_MX, .FCSM_TA, .FCSM_TR, .FCSM_MX,
  .RLE, .RGT }                                                   Clip State Machine tests
If no condition code test is specified, CC.TRUE is assumed.
Conditional control flow.
BRA/BRX compute a target PC address on a per-thread basis from a PC-relative signed offset operand and a signed immediate offset, then jump to the target PC if the branch condition evaluates to true. Note that the value in Ra is treated as a signed value. The target offset is specified in bytes, not instructions or words, relative to the PC of the next instruction, within the address range specified by the 40b virtual base address (PROGRAM_BASE).
JMP/JMX compute a target PC address on a per-thread basis from an absolute (unsigned) address operand and a signed immediate offset. Note that the value in Ra is treated as an unsigned value. The target address is specified in bytes, not instructions or words, and is an absolute offset within the address range specified by the 40b virtual base address (PROGRAM_BASE).
The branch/jump condition is based on a predicate AND condition code bits. To branch/jump only on the predicate, use CC.TRUE. To branch/jump only on CC, use PT for the predicate.
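As an illustration only, the condition described above can be modeled in a few lines of Python. This is a minimal sketch, not the hardware behavior: the function names and the boolean encoding of the guard predicate and CC test are hypothetical, and the result of the .U check on a warp with no active threads is an assumption made here.

```python
def branch_taken(pred: bool, negate_pred: bool = False, cc_test: bool = True) -> bool:
    """Per-thread condition: guard predicate AND condition-code test.

    pred        -- value of the guard predicate Pg (PT is modeled as True)
    negate_pred -- True when the guard is written as @!Pg
    cc_test     -- result of the CC test; defaults to True (CC.TRUE)
    """
    guard = (not pred) if negate_pred else pred
    return guard and cc_test


def unanimous_taken(per_thread_conditions) -> bool:
    """.U modifier: taken only if every active thread agrees on taking it.

    Returning False for an empty list is an assumption of this sketch.
    """
    conds = list(per_thread_conditions)
    return bool(conds) and all(conds)
```

Branching only on the predicate corresponds to leaving cc_test at its default, and branching only on CC corresponds to passing pred=True (PT).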
Branch/Jump target operands can be:
(1) Immediate // BRA and JMP
    - Immediate lsb0 and lsb1 must be 0
    - S24 (24-bit signed) immediate for BRA
    - U32 (32-bit unsigned) immediate for JMP
(2) Immediate Address Constant // BRA and JMP
    The constant is interpreted as follows:
    JMP: as U32 (with a range of 0..4GB) absolute target PC
    BRA: as S32 (with a range of +/- 0..2GB) PC-relative byte offset
(3) Register + Immediate // BRX and JMX
    - Ra is in bytes (Ra+Imm may then sum to word-aligned) and is interpreted as
      JMX: as U32 (with a range of 0..4GB) absolute target PC
      BRX: as S32 (with a range of +/- 0..2GB) PC-relative byte offset
    - The immediate is signed and in bytes
    - Even when Ra=Rz the immediate is still signed, which is different from most opcodes that use the register + offset semantics
    - S24 (24-bit signed) immediate for BRX
    - S32 (32-bit signed) immediate for JMX
The target PC computation (Ra + imm)/(PC + Ra + imm)/(PC + const) is done with "infinite precision"; the final result is then checked to not overflow the range 0..4GB.
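The operand interpretations and the final range check can be sketched in Python, whose integers are unbounded and therefore model the "infinite precision" sum directly. This is a hedged model under stated assumptions: the function and parameter names are hypothetical, next_pc stands for the PC of the next instruction, raising OverflowError is only a stand-in for whatever the hardware does on a failed range check, and the lsb0/lsb1 alignment rule for immediates is not modeled.

```python
PC_LIMIT = 1 << 32  # target must stay inside 0..4GB


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def target_pc(opcode: str, next_pc: int = 0, ra: int = 0, imm: int = 0, const=None) -> int:
    """Compute the target PC for one thread (illustrative model only)."""
    if opcode == "BRA":
        # S24 immediate, or S32 constant, relative to the next instruction.
        offset = to_signed(const, 32) if const is not None else to_signed(imm, 24)
        target = next_pc + offset
    elif opcode == "BRX":
        # Ra as S32 plus S24 immediate, relative to the next instruction.
        target = next_pc + to_signed(ra, 32) + to_signed(imm, 24)
    elif opcode == "JMP":
        # U32 immediate or U32 constant, absolute.
        target = (const if const is not None else imm) & 0xFFFFFFFF
    elif opcode == "JMX":
        # Ra as U32 plus S32 immediate, absolute.
        target = (ra & 0xFFFFFFFF) + to_signed(imm, 32)
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    # The sum above is exact; only the final result is range-checked.
    if not 0 <= target < PC_LIMIT:
        raise OverflowError(f"target PC {target:#x} outside 0..4GB")
    return target
```

For instance, target_pc("BRX", next_pc=0x100, ra=0xFFFFFFF0, imm=0x8) treats Ra as the signed value -16, whereas the same register contents passed to JMX would be read as an unsigned address near the top of the 4GB range.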
A branch/jump op cannot push the SYNC token that is needed to synchronize potentially divergent control flow (such as an if-then-else construct). To push a SYNC token, the programmer should use an SSY instruction prior to the branch/jump, and a NOP.S (SYNC) at the end of the potentially divergent code. See the SSY opcode page for more information.
The .LMT option allows CAL nesting limits to be checked when a branch is used in conjunction with PRET to perform conditional subroutine calls. If the .LMT option is specified, the per-warp LMT state bit is checked before executing the branch, and if set, the branch instruction is converted into a NOP. The LMT state bit is left intact by the branch. For more details refer to the PRET description.
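The effect of .LMT on a single branch can be summarized with a small Python model. It is a sketch under assumptions: the function name and the tuple return value are hypothetical, the LMT bit is modeled as a plain boolean, and how PRET sets that bit is outside its scope.

```python
def execute_branch(taken: bool, target: int, next_pc: int,
                   use_lmt: bool = False, lmt_bit: bool = False):
    """Return (new_pc, lmt_bit) after one branch in this illustrative model."""
    if use_lmt and lmt_bit:
        # .LMT specified and the limit was already reached:
        # the branch behaves as a NOP and execution falls through.
        return next_pc, lmt_bit
    # Otherwise the branch behaves normally; the LMT bit is left intact.
    return (target if taken else next_pc), lmt_bit
```

In both paths the LMT bit is returned unchanged, matching the statement that the branch leaves it intact.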
BRA REL:0x28;              // defaults to CC.TRUE
BRA CC.EQ,c[2][0x48];
BRX CC.GE,R5+0x128;
JMP CC.LT,0x128000;
JMP CC.EQ,c[2][0x48];
JMP ABS:0xCA80;
JMX CC.GE,R5+0x128000;

// pseudo-code example
if (CC.LT) R6 = R0;
else R6 = R1;
R7 = R6*R6;

// in assembler could be:
SSY LABEL0;
BRA CC.GE,LAB_ELSE;        // or GEU if fp comparison
LAB_IF:
MOV R6,R0;
SYNC;
LAB_ELSE:
MOV R6,R1;
SYNC;
LABEL0:
MUL R7,R6,R6;              // sync happens before MUL