SPA 5.3:
{@{!}Pg}
HSETP2.cmp{.H_AND}{.FTZ}
Pu, Pv, {-}{|}Ra{|}{.iswz}, {-}{|}Rb{|}{.iswz}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSETP2.cmp{.H_AND}{.FTZ}
Pu, Pv, {-}{|}Ra{|}{.iswz}, {-}{|}c[#BankU05][#AddrU16]{|}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSETP2.cmp{.H_AND}{.FTZ}
Pu, Pv, {-}{|}Ra{|}{.iswz},
{{-}{|}#Immfp10H1{|}}, {{-}{|}#Immfp10H0{|}}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;// Imm order: H1, H0
{@{!}Pg}
HSETP2.cmp{.H_AND}{.FTZ}.bop
Pu, Pv, {-}{|}Ra{|}{.iswz}, {-}{|}Rb{|}{.iswz}, {!}Pp
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSETP2.cmp{.H_AND}{.FTZ}.bop
Pu, Pv, {-}{|}Ra{|}{.iswz}, {-}{|}c[#BankU05][#AddrU16]{|}, {!}Pp
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSETP2.cmp{.H_AND}{.FTZ}.bop
Pu, Pv, {-}{|}Ra{|}{.iswz},
{{-}{|}#Immfp10H1{|}}, {{-}{|}#Immfp10H0{|}},
{!}Pp
{&req_6}
{&rdN}
{&wrN}
{?sched}
;// Imm order: H1, H0
.cmp: { .F, .LT, .EQ, .LE, .GT, .NE, .GE, .NUM, FP16 numeric comparisons
.NAN, .LTU, .EQU, .LEU, .GTU, .NEU, .GEU, .T } FP16 numeric or Unordered comparisons
.H_AND: Horizontal AND of the compare results.
.FTZ: Flush post-converted input denorms to sign-preserving zero.
.bop: { .AND, .OR, .XOR }
Boolean op with predicate {!}Pp
.iswz: { .H1_H0*, .F32, .H0_H0, .H1_H1 }
Input format.
.H1_H0: Input is a set of two 16-bit floating point numbers.
.F32: Input is a single 32-bit floating point number that
will be converted to a 16-bit floating point number
and replicated to both halves of the SIMD operation.
The conversion will round towards 0 (truncation).
Any denorms generated in FP32 -> FP16 conversion process will flush to 0.
.H0_H0: Input is a single 16-bit floating point number in the
lower 16-bits of a 32-bit register, and is replicated
to both halves of the SIMD operation.
.H1_H1: Input is a single 16-bit floating point number in the
upper 16-bits of a 32-bit register, and is replicated
to both halves of the SIMD operation.
immfp10H0 Most significant 10 bits of fp16 immediate.
immfp10H1 Most significant 10 bits of fp16 immediate.
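The .iswz input formats described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the hardware definition: the function names are hypothetical, A[0] is assumed to be the lower half (H0), and the handling of FP32 values too large for FP16 (clamped to the largest finite value under truncation) and the sign of flushed results are assumptions that the text above does not state.

```python
import struct

def f16_bits_to_float(bits):
    """Interpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]

def f32_bits_to_f16_bits_rz(bits32):
    """Convert binary32 bits to binary16 bits, rounding toward zero.

    Results that would be FP16 denormals are flushed to zero.
    Keeping the sign on a flushed result is an assumption.
    """
    sign = (bits32 >> 16) & 0x8000
    exp = (bits32 >> 23) & 0xFF
    man = bits32 & 0x7FFFFF
    if exp == 0xFF:                       # infinity or NaN
        return sign | (0x7E00 if man else 0x7C00)
    e16 = exp - 127 + 15                  # re-bias the exponent for FP16
    if e16 >= 31:                         # too large for FP16
        return sign | 0x7BFF              # assumption: clamp to max finite
    if e16 <= 0:                          # zero or would-be FP16 denormal
        return sign                       # flush to zero
    return sign | (e16 << 10) | (man >> 13)   # truncate low mantissa bits

def extract_halves(reg32, iswz="H1_H0"):
    """Return (half0, half1) as floats for a 32-bit register value."""
    lo = reg32 & 0xFFFF
    hi = (reg32 >> 16) & 0xFFFF
    if iswz == "H1_H0":
        pair = (lo, hi)
    elif iswz == "H0_H0":
        pair = (lo, lo)
    elif iswz == "H1_H1":
        pair = (hi, hi)
    elif iswz == "F32":
        h = f32_bits_to_f16_bits_rz(reg32)
        pair = (h, h)
    else:
        raise ValueError("unknown .iswz value: " + iswz)
    return tuple(f16_bits_to_float(b) for b in pair)
```

For example, a register holding 0x40003C00 contains 2.0 in the upper half and 1.0 in the lower half, so `extract_halves(0x40003C00, "H1_H1")` replicates 2.0 to both halves.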
For HSETP2 with an immediate "Sb" operand, the .iswz field is not encoded and
behavior defaults to .H1_H0 for the immediate operand. Absolute value and
negate are also not encoded and default to false. SASS accepts absolute value
and negate when they are enclosed in curly braces, e.g. {-1.0} or {|-19.5|},
and encodes the appropriate immediates.
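As a rough illustration of what "most significant 10 bits of fp16 immediate" implies, the sketch below expands a 10-bit field into a full FP16 value. It assumes the 6 low-order mantissa bits that are not encoded are zero; the helper names are hypothetical, and rejecting values that do not fit is a choice made for this sketch, not documented assembler behavior.

```python
import struct

def imm10_to_f16_bits(imm10):
    """Place the 10 encoded bits in the top of a 16-bit FP16 pattern.

    Assumption: the 6 low mantissa bits that are not encoded are zero.
    """
    return (imm10 & 0x3FF) << 6

def imm10_to_float(imm10):
    """Decode a 10-bit immediate field to the FP16 value it stands for."""
    bits = imm10_to_f16_bits(imm10)
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_imm10(value):
    """Encode a value as a 10-bit field (hypothetical helper).

    Raises ValueError when the FP16 pattern needs any of the low 6 bits.
    """
    bits = struct.unpack("<H", struct.pack("<e", value))[0]
    if bits & 0x3F:
        raise ValueError("value needs more than the top 10 FP16 bits")
    return bits >> 6
```

Under this assumption, 1.0 (FP16 pattern 0x3C00) corresponds to the field value 0x0F0 and -1.0 (0xBC00) to 0x2F0.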
For HSETP2 with a constant "Sb" operand, .iswz is not encoded and
behavior defaults to .F32 for the constant reference operand. Also absolute
value for the constant operand defaults to false.
If Pu == Pv, then Pu and Pv must be PT.
First, the components of each input are extracted. Then, for each of the SIMD halves, HSETP2.cmp.bop compares the first and second operands using the FP16 comparison operation .cmp. The Boolean results are then either combined with predicate operand {!}Pp using Boolean operation .bop, or returned unchanged. In both cases the two results are written to predicate registers Pu and Pv.
The Boolean operation .bop may be .AND, .OR, or .XOR, corresponding to C Boolean operations &, |, and ^.
// A[] represents the two halves of the SIMD operation for operand Ra
// B[] represents the two halves of the SIMD operation for operand Sb
// Normal mode
Pu = (A[0] .cmp B[0]) .bop {!}Pp;
Pv = (A[1] .cmp B[1]) .bop {!}Pp;
// .H_AND mode
Pu = ((A[0] .cmp B[0]) && (A[1] .cmp B[1])) .bop {!}Pp;
Pv = (!((A[0] .cmp B[0]) && (A[1] .cmp B[1]))) .bop {!}Pp;
Use .bop {!}Pp for nested predication, with an inner comparison of A vs. B, conditioned on outer predicate Pp.
The simple instruction format without .bop {!}Pp assembles as .AND, providing the following effective operation:
Pu = (A[0] .cmp B[0]); // Set predicate to 1 if comparison test is true, else 0
Pv = (A[1] .cmp B[1]); // Set predicate to 1 if comparison test is true, else 0
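The pseudocode above can be exercised with a small Python reference model. This is a sketch under stated assumptions rather than a definition of the hardware: the names `compare` and `hsetp2` are hypothetical, inputs are already-extracted half values given as Python floats, the numeric comparisons are assumed to be false when either input is NaN, and the unordered variants are assumed to be true in that case. The defaults bop="AND" and pp=True model the simple format without .bop {!}Pp.

```python
import math
import operator

_ORDERED = {"LT": operator.lt, "EQ": operator.eq, "LE": operator.le,
            "GT": operator.gt, "NE": operator.ne, "GE": operator.ge}
_BOP = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}

def compare(cmp, a, b):
    """Evaluate one .cmp test on two values (NaN rules are assumptions)."""
    unordered = math.isnan(a) or math.isnan(b)
    if cmp == "F":
        return False
    if cmp == "T":
        return True
    if cmp == "NUM":                      # both inputs are numbers
        return not unordered
    if cmp == "NAN":                      # at least one input is NaN
        return unordered
    if cmp in _ORDERED:                   # numeric: false on NaN
        return (not unordered) and _ORDERED[cmp](a, b)
    if cmp.endswith("U") and cmp[:-1] in _ORDERED:
        return unordered or _ORDERED[cmp[:-1]](a, b)   # true on NaN
    raise ValueError("unknown .cmp value: " + cmp)

def hsetp2(cmp, a, b, bop="AND", pp=True, h_and=False):
    """Return (Pu, Pv) for half pairs a = (A[0], A[1]) and b = (B[0], B[1]).

    pp is the value of {!}Pp after any negation has been applied.
    """
    r0 = compare(cmp, a[0], b[0])
    r1 = compare(cmp, a[1], b[1])
    if h_and:                             # .H_AND mode
        both = r0 and r1
        r0, r1 = both, not both
    combine = _BOP[bop]
    return combine(r0, pp), combine(r1, pp)
```

For instance, `hsetp2("GT", (2.0, 0.5), (1.0, 1.0))` yields `(True, False)`, and the same call with `h_and=True` yields `(False, True)` because only one of the two halves passes the test.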
HSETP2.GT P0, P1, -R0.H1_H0, R1.F32; # P0 = -R0.H0 > R1, P1 = -R0.H1 > R1
HSETP2.LE P3, P1, -R0.F32, R1.H0_H0; # P3 = -R0 <= R1.H0, P1 = -R0 <= R1.H0
HSETP2.GT.H_AND.AND P0, PT, -R0.H1_H0, R1.F32, P2; # P0 = (-R0.H0 > R1) && (-R0.H1 > R1) && P2