SPA 5.0:
{@{!}Pg}
FSWZADD{.FTZ}{.rnd}{.NDV}
Rd{.CC}, Ra, Rb, znpControl
{&req_6}
{?sched}
;
.FTZ Denormal inputs and outputs are flushed to sign-preserving 0.0.
.rnd {.RN*, .RM, .RP, .RZ}
.RN - Round to the nearest even. This is the default.
.RM - Round towards -Infinity (floor)
.RP - Round towards +Infinity (ceiling)
.RZ - Round towards 0 (truncate)
.NDV Force the quad to be treated as non-divergent
If .NDV is FALSE,
the quad is deemed "divergent" if some threads in the quad are active and some are not.
If the quad is hw divergent, the output of FSWZ is forced to 0.0 or
+Inf (depending on State.ShaderControl.DefaultPartial).
However, if .NDV is TRUE,
the hw quad divergence bit is ignored
and the quad is deemed hw non-divergent, allowing the expected fp add.
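The divergence rule above can be sketched in Python. This is an illustrative model only, not the hardware implementation: the function names, the list-of-booleans active mask, and the `forced_value` parameter are assumptions made for the example. The mapping from State.ShaderControl.DefaultPartial to 0.0 or +Inf is not modeled, because it is not specified here; the caller supplies the forced value.

```python
def quad_is_divergent(active):
    """Return True if some, but not all, of the 4 threads in the quad are active.

    `active` is a hypothetical list of 4 booleans, one per thread (P0..P3).
    """
    return any(active) and not all(active)


def select_output(active, ndv, fp_add_result, forced_value):
    """Pick the value written for one thread.

    ndv           -- True if the .NDV modifier is present
    fp_add_result -- the result of the normal fp add
    forced_value  -- 0.0 or +Inf, as chosen by DefaultPartial (supplied by caller)
    """
    if not ndv and quad_is_divergent(active):
        # Divergent quad without .NDV: the add result is replaced.
        return forced_value
    # Non-divergent quad, or .NDV overrides the divergence bit.
    return fp_add_result
```

With one inactive thread and no .NDV the forced value is returned; adding .NDV to the same quad returns the fp add result instead.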
.CC Write condition code flags
znpControl : specifies modifiers for Ra and Rb source registers, as 4 sets of character pairs.
Each set is associated with a specific thread/pixel in a quad.
The ordering of these sets is UL, UR, LL, LR in the pixel quad, i.e.:
| Thread0 (P0:UL) Thread1 (P1:UR) |
| Thread2 (P2:LL) Thread3 (P3:LR) |
The valid modifier control character pairs are:
char pair | Ra modifier | Rb modifier |
---|---|---|
PP | none | none |
NP | Negate | none |
PN | none | Negate |
ZP | Force to Zero | none |
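As a rough illustration of how znpControl is applied, the Python sketch below splits the 8-character control string into four pairs (UL, UR, LL, LR) and applies the Ra/Rb modifiers from the table before adding. The function and table names are hypothetical, only the four pairs listed above are accepted, and the arithmetic uses Python floats rather than fp32, so .FTZ, .rnd and .NDV behavior is not modeled.

```python
# Modifier pairs exactly as listed in the table: (Ra modifier, Rb modifier).
PAIRS = {
    "PP": ("none", "none"),
    "NP": ("negate", "none"),
    "PN": ("none", "negate"),
    "ZP": ("zero", "none"),
}


def apply_modifier(modifier, value):
    """Apply one source modifier to one operand."""
    if modifier == "negate":
        return -value
    if modifier == "zero":
        return 0.0
    return value


def split_control(control):
    """Split an 8-character znpControl string into 4 pairs: UL, UR, LL, LR."""
    if len(control) != 8:
        raise ValueError("znpControl must be 4 character pairs")
    pairs = [control[i:i + 2] for i in range(0, 8, 2)]
    for pair in pairs:
        if pair not in PAIRS:
            raise ValueError("pair not in the table: " + pair)
    return pairs


def fswzadd_quad(ra, rb, control):
    """Model the add for one quad; ra and rb hold 4 per-thread values each."""
    results = []
    for a, b, pair in zip(ra, rb, split_control(control)):
        ra_mod, rb_mod = PAIRS[pair]
        results.append(apply_modifier(ra_mod, a) + apply_modifier(rb_mod, b))
    return results
```

For example, `fswzadd_quad([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], "PNNPPNNP")` computes Ra - Rb for threads 0 and 2 and -Ra + Rb for threads 1 and 3.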
Add fp32 sources into destination register. Used as part of FSWZ emulation.
// DDX implementation
SHFL.BFLY PT, Ry, Rx, 1, 0x1C03; // exchange with tid^1, Mask = 5'b11100, Max = 3 (within quad)
FSWZADD R0,R1,R1,PNNPPNNP;
// DDY implementation for DirectX
SHFL.BFLY PT, Ry, Rx, 2, 0x1C03; // exchange with tid^2, Mask = 5'b11100, Max = 3 (within quad)
FSWZADD R0,R1,R1,PNPNNPNP;
// DDY implementation for OpenGL
SHFL.BFLY PT, Ry, Rx, 2, 0x1C03; // exchange with tid^2, Mask = 5'b11100, Max = 3 (within quad)
FSWZADD R0,R1,R1,PNPNNPNP;
S2R R1, SR18; //Accounts for screen origin inversion
FMUL R0, R0, R1;
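To see why these control strings yield derivatives, the Python sketch below models the butterfly exchange and the signed add for one quad. It is a simplified model under stated assumptions: the helper names are hypothetical, Ra is assumed to hold the shuffled neighbor value and Rb the thread's own value (the listing writes R1 for both sources, so the operand roles are a guess), only the P and N modifiers are handled, and the OpenGL origin-inversion multiply is left out.

```python
def shfl_bfly(values, lane_mask):
    """Butterfly exchange: thread tid reads the value held by thread tid XOR lane_mask."""
    return [values[tid ^ lane_mask] for tid in range(len(values))]


SIGN = {"P": 1.0, "N": -1.0}  # P = no modifier, N = negate


def fswzadd_pn(ra, rb, control):
    """Signed add per thread; the first character of each pair applies to Ra, the second to Rb."""
    pairs = [control[i:i + 2] for i in range(0, len(control), 2)]
    return [SIGN[p[0]] * a + SIGN[p[1]] * b for a, b, p in zip(ra, rb, pairs)]


# Per-thread input values in quad order UL, UR, LL, LR.
quad = [1.0, 4.0, 2.0, 8.0]

# DDX: exchange with the horizontal neighbor (tid^1), then add with PNNPPNNP.
ddx = fswzadd_pn(shfl_bfly(quad, 1), quad, "PNNPPNNP")

# DDY: exchange with the vertical neighbor (tid^2), then add with PNPNNPNP.
ddy = fswzadd_pn(shfl_bfly(quad, 2), quad, "PNPNNPNP")

print(ddx)  # [3.0, 3.0, 6.0, 6.0]  (right minus left, per row)
print(ddy)  # [1.0, 4.0, 1.0, 4.0]  (lower minus upper, per column)
```

Under these assumptions both threads of a row see the same DDX and both threads of a column see the same DDY, which is what the alternating sign patterns in the control strings arrange.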