SPA 5.0:
{@{!}Pg}
VOTE.op
{Rd,} Pu,{!}Pp
{&req_6}
{?sched}
;
// vote
.op: { .ALL, .ANY, .EQ }
{@{!}Pg}
VOTE.VTG.R
#ImmU28
{&req_6}
{?sched}
;
// CSM reset
{@{!}Pg}
VOTE.VTG.A
#ImmU28
{&req_6}
{?sched}
;
// CSM accumulate
{@{!}Pg}
VOTE.VTG.RA
#ImmU28
{&req_6}
{?sched}
;
// CSM reset and accumulate
The VOTE instruction performs a reduce-and-broadcast of a predicate over all active threads in a warp. It does not operate across multiple warps (for example, across all the warps of a CTA).
The result of the vote operation is shared across all active threads in the warp.
Three voting operations (.op) are supported:
.ALL: Yields TRUE if the source predicate, {!}Pp, is TRUE for all active threads, else FALSE. Complementing the source predicate yields NONE.
.ANY: Yields TRUE if the source predicate, {!}Pp, is TRUE for at least one active thread, else FALSE. Complementing the source predicate yields NOT_ALL.
.EQ: Yields TRUE if the source predicate, {!}Pp, is equal across all active threads (either all TRUE or all FALSE), else FALSE. Complementing the source predicate yields an identical result.
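The three reductions and their complement identities can be modeled in a few lines of Python. This is an illustrative sketch, not a description of the hardware: the function name `vote` and the representation of the participating threads as a list of booleans are assumptions made here.

```python
from typing import Sequence


def vote(op: str, preds: Sequence[bool]) -> bool:
    """Hypothetical model of VOTE.{ALL,ANY,EQ} over one warp.

    `preds` holds the source predicate ({!}Pp, complement already applied)
    of each participating thread only. With no participants this sketch
    returns TRUE for ALL and EQ and FALSE for ANY; the text does not
    specify that case.
    """
    if op == "ALL":
        return all(preds)
    if op == "ANY":
        return any(preds)
    if op == "EQ":
        # TRUE when all participants agree: all TRUE or all FALSE.
        return all(preds) or not any(preds)
    raise ValueError(f"unknown vote op: {op}")


# Complementing the source predicate turns ALL into NONE and ANY into
# NOT_ALL, and leaves EQ unchanged.
p = [True, False, True]
not_p = [not x for x in p]
assert vote("ALL", not_p) == (not any(p))  # NONE
assert vote("ANY", not_p) == (not all(p))  # NOT_ALL
assert vote("EQ", not_p) == vote("EQ", p)
```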
If Rd is specified, the destination register is written with a 32-bit result (subsequently referred to as the ballot), which contains the values of the source predicates produced by each thread of the warp. Bit i of the ballot is equal to ({!}Pp & {!}Pg) of thread i in the warp. A thread that is inactive (e.g., due to divergence) contributes a 0 for its entry in the ballot. An unspecified Rd is assembled as RZ.
VOTE.VTG is used to generate clip status for VTG shaders. In this mode, up to 7 predicates (P0-P6) are serially collapsed into the warp's clip state machine (CSM). Whether each predicate takes part is qualified by State, as selected by that predicate's index in the immediate.
The predicate result of VOTE.{ANY,ALL,EQ} is computed excluding inactive and predicated-off threads. Consequently, if some threads are inactive or predicated off, VOTE.ALL can be TRUE even though the ballot is not all ones, and VOTE.EQ can be TRUE even though the ballot is neither all ones nor all zeros. For more information, please see the pseudo-code below.
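To make the exclusion rule concrete, the following hypothetical Python model computes both the ballot and the predicate result for one 32-thread warp. The name `vote_warp` and the per-thread list arguments are assumptions of this sketch, not part of the ISA. With only the lower half of the warp active and every active thread voting TRUE, VOTE.ALL yields TRUE while the ballot is 0x0000FFFF.

```python
from typing import Sequence, Tuple

WARP_SIZE = 32


def vote_warp(
    op: str,
    pp: Sequence[bool],      # {!}Pp per thread, complement already applied
    pg: Sequence[bool],      # {!}Pg per thread, complement already applied
    active: Sequence[bool],  # per-thread active mask
) -> Tuple[int, bool]:
    """Return (ballot, predicate result) for one warp.

    Only threads that are active with a TRUE guard take part in the
    reduction; all others are excluded and contribute 0 to the ballot.
    """
    if not len(pp) == len(pg) == len(active) == WARP_SIZE:
        raise ValueError("expected one entry per thread of the warp")
    ballot = 0
    participants = []
    for i in range(WARP_SIZE):
        if active[i] and pg[i]:
            participants.append(pp[i])
            if pp[i]:
                ballot |= 1 << i  # bit i = ({!}Pp & {!}Pg) of thread i
    if op == "ALL":
        result = all(participants)
    elif op == "ANY":
        result = any(participants)
    elif op == "EQ":
        result = all(participants) or not any(participants)
    else:
        raise ValueError(f"unknown vote op: {op}")
    return ballot, result


# Threads 16..31 are inactive; every active thread votes TRUE.
active = [i < 16 for i in range(WARP_SIZE)]
pp = [True] * WARP_SIZE
pg = [True] * WARP_SIZE
ballot, all_ok = vote_warp("ALL", pp, pg, active)
assert all_ok and ballot == 0x0000FFFF     # ALL is TRUE, ballot is not all ones
assert vote_warp("EQ", pp, pg, active)[1]  # EQ is TRUE, ballot is neither all ones nor zero
```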
Each of the predicates P0-P6 has a 4b index associated with it; together these indices make up the #ImmU28 field. The index of a predicate specifies whether, and under which State condition, that predicate is allowed to VOTE.
P6[27:24] - 4b index
P5[23:20] - 4b index
P4[19:16] - 4b index
P3[15:12] - 4b index
P2[11:08] - 4b index
P1[07:04] - 4b index
P0[03:00] - 4b index
The 4b index mapping is as follows:
0000: Predicate is disabled for VOTE
0001: If (State.cull.xy), predicate is enabled for VOTE
0010: If (State.cull.znear), predicate is enabled for VOTE
0011: If (State.cull.zfar), predicate is enabled for VOTE
0100: If (State.cull.w), predicate is enabled for VOTE
0101: illegal (Predicate is disabled for VOTE)
0110: illegal (Predicate is disabled for VOTE)
0111: If (State.ucp0), predicate is enabled for VOTE
1000: If (State.ucp1), predicate is enabled for VOTE
1001: If (State.ucp2), predicate is enabled for VOTE
1010: If (State.ucp3), predicate is enabled for VOTE
1011: If (State.ucp4), predicate is enabled for VOTE
1100: If (State.ucp5), predicate is enabled for VOTE
1101: If (State.ucp6), predicate is enabled for VOTE
1110: If (State.ucp7), predicate is enabled for VOTE
1111: Predicate is enabled for VOTE
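The field layout and index mapping above can be expressed as a pack/unpack helper. This is a sketch under stated assumptions: the helper names and the `state` dictionary keyed by names such as `"cull.xy"` or `"ucp3"` are hypothetical stand-ins for the State flags, not an existing interface.

```python
from typing import Dict, List, Sequence

NUM_PREDS = 7  # P0..P6, 4 bits each -> 28 bits

# 4b index -> name of the (hypothetical) State flag that gates the predicate.
# 0b0000 (disabled), 0b0101/0b0110 (illegal, disabled) and 0b1111 (always
# enabled) are handled separately in is_enabled().
CONDITION_NAMES = {
    0b0001: "cull.xy",
    0b0010: "cull.znear",
    0b0011: "cull.zfar",
    0b0100: "cull.w",
    **{0b0111 + k: f"ucp{k}" for k in range(8)},  # 0111 -> ucp0 ... 1110 -> ucp7
}


def pack_imm(indices: Sequence[int]) -> int:
    """Pack per-predicate 4b indices (P0 first) into #ImmU28."""
    if len(indices) != NUM_PREDS:
        raise ValueError("expected 7 indices, P0..P6")
    imm = 0
    for p, idx in enumerate(indices):
        if not 0 <= idx <= 0xF:
            raise ValueError(f"index for P{p} out of range: {idx}")
        imm |= idx << (4 * p)  # Pp occupies bits [4p+3:4p]
    return imm


def unpack_imm(imm: int) -> List[int]:
    """Return the 4b index of each predicate, P0 first."""
    return [(imm >> (4 * p)) & 0xF for p in range(NUM_PREDS)]


def is_enabled(index: int, state: Dict[str, bool]) -> bool:
    """Whether a predicate with this 4b index is enabled for VOTE."""
    if index == 0b1111:
        return True
    name = CONDITION_NAMES.get(index)
    if name is None:  # 0b0000, or the illegal 0b0101 / 0b0110
        return False
    return state.get(name, False)
```

For example, `pack_imm([0xF] * 6 + [0])` returns `0x00FFFFFF`, which unconditionally enables P0-P5 for VOTE and disables P6.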
Clip: slice the primitive by the clip plane, keeping the portion of the primitive inside the clip plane.
Cull: keep or discard the entire primitive based on its intersection with the plane.

----------------------------------------------------
Geometric Status    Clip              Cull
----------------------------------------------------
TRIVIAL_REJECT      discard           discard
TRIVIAL_ACCEPT      keep              keep
MIXED               clip primitive    keep
----------------------------------------------------

The SM is capable of generating clip/cull status based on vertex/plane calculations. As such, there is no distinction between clip and cull in the SM. The PE, however, does distinguish between them.

Clip/Cull mode table:
------------------------------------------------------------------
GeometryClip Class Method  State.cull.xy  State.cull.z  State.cull.w
------------------------------------------------------------------
WZERO_CLIP (non nv50)      yes            yes           yes     // clip to w=0, cull according to State
PASSTHRU                   no             no            no      // no clip, cull according to State
FRUSTUM_XY_CLIP            yes            no            yes     // clip to xy, cull according to State
FRUSTUM_XYZ_CLIP           yes            yes           yes     // clip to xyz, cull according to State
WZERO_CLIP_NO_Z_CULL       yes            no            yes     // clip to w=0, cull according to State
FRUSTUM_Z_CLIP             yes            yes           yes     // clip to z, cull according to State
------------------------------------------------------------------
If VTG.R or VTG.RA is specified, the CSM is first reset to UNINIT status.
If VTG.A or VTG.RA is specified, the following is done for each VOTE-enabled predicate (from P0-P6):
(1) Each active thread of the warp contributes its value of the predicate.
(2) A 2b status is generated:
    INSIDE: All active threads have the source predicate FALSE.
    OUTSIDE: All active threads have the source predicate TRUE.
    MIXED: Neither of the above.
(3) The 2b status updates the CSM.
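The status generation in step (2) can be sketched in Python as follows. The text does not specify the 2b encodings or how a status updates the CSM, so this sketch uses `enum.auto()` values and delegates the update to a hypothetical `update_csm` callback; it also assumes the predicates are processed in order from P0 to P6.

```python
from enum import Enum, auto
from typing import Callable, List, Sequence, Tuple


class Status(Enum):
    # The 2b encodings are not given in the text, so auto() is used.
    INSIDE = auto()   # all active threads have the predicate FALSE
    OUTSIDE = auto()  # all active threads have the predicate TRUE
    MIXED = auto()    # neither of the above


def predicate_status(values: Sequence[bool]) -> Status:
    """Collapse one predicate across the active threads of a warp.

    `values` holds the predicate of each active thread only. With no active
    threads this sketch returns INSIDE; the text does not cover that case.
    """
    if not any(values):
        return Status.INSIDE
    if all(values):
        return Status.OUTSIDE
    return Status.MIXED


def vtg_accumulate(
    preds: Sequence[Sequence[bool]],            # preds[p] = active-thread values of Pp
    enabled: Sequence[bool],                    # per-predicate VOTE enable
    update_csm: Callable[[int, Status], None],  # hypothetical CSM update hook
) -> None:
    """Feed the status of each VOTE-enabled predicate, P0 first, to the CSM."""
    for p, (values, on) in enumerate(zip(preds, enabled)):
        if on:
            update_csm(p, predicate_status(values))


# Usage: record the statuses instead of modeling the CSM itself.
log: List[Tuple[int, Status]] = []
preds = [[False, False], [True, True], [True, False]] + [[False, False]] * 4
enabled = [True, True, True, False, False, False, False]
vtg_accumulate(preds, enabled, lambda p, s: log.append((p, s)))
assert log == [(0, Status.INSIDE), (1, Status.OUTSIDE), (2, Status.MIXED)]
```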
VOTE.ALL P2,!P4;
VOTE.EQ P2,P4;
@!P2 VOTE.ANY P2,P4;
VOTE.VTG.A 0x00ffffff;     // first 6 predicates (P0-P5) VOTE
VOTE.ANY R1, P1, !P3;      // Store the ballot in R1; P1 is TRUE if the ballot is non-zero
VOTE.ANY R1, PT, !P3;      // Store the ballot in R1; discard the VOTE.ANY result