SPA 5.3:
{@{!}Pg}
HSET2{.bval}.cmp{.FTZ}
Rd, {-}{|}Ra{|}{.iswz}, {-}{|}Rb{|}{.iswz}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSET2{.bval}.cmp{.FTZ}
Rd, {-}{|}Ra{|}{.iswz}, {-}c[#BankU05][#AddrU16]
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSET2{.bval}.cmp{.FTZ}
Rd, {-}{|}Ra{|}{.iswz}, {{-}{|}#Immfp10H1{|}}, {{-}{|}#Immfp10H0{|}}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;// Imm order: H1, H0
{@{!}Pg}
HSET2{.bval}.cmp{.FTZ}.bop
Rd, {-}{|}Ra{|}{.iswz}, {-}{|}Rb{|}{.iswz}, {!}Pp
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSET2{.bval}.cmp{.FTZ}.bop
Rd, {-}{|}Ra{|}{.iswz}, {-}c[#BankU05][#AddrU16], {!}Pp
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HSET2{.bval}.cmp{.FTZ}.bop
Rd, {-}{|}Ra{|}{.iswz}, {{-}{|}#Immfp10H1{|}}, {{-}{|}#Immfp10H0{|}}, {!}Pp
{&req_6}
{&rdN}
{&wrN}
{?sched}
;// Imm order: H1, H0
.bval: { .BM*, .BF }
Boolean mask (.BM) or Boolean float value (.BF) to set in Rd; default .BM.
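.cmp: { .F, .LT, .EQ, .LE, .GT, .NE, .GE, .NUM,
.NAN, .LTU, .EQU, .LEU, .GTU, .NEU, .GEU, .T }
FP16 comparison. .F and .T are constant false and true. .LT, .EQ, .LE, .GT, .NE, and .GE are
ordered comparisons and are false if either input is NaN. .NUM is true if neither input is NaN;
.NAN is true if either input is NaN. .LTU, .EQU, .LEU, .GTU, .NEU, and .GEU are unordered
comparisons and are also true if either input is NaN.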
.FTZ: Flush input denorms, after any .iswz conversion, to sign-preserving zero.
.bop: { .AND, .OR, .XOR }
Boolean operation combining the comparison result with predicate {!}Pp.
.iswz: { .H1_H0*, .F32, .H0_H0, .H1_H1 }
Input format.
.H1_H0: Input is a pair of 16-bit floating-point numbers, H0 in the lower
16 bits and H1 in the upper 16 bits of a 32-bit register.
.F32: Input is a single 32-bit floating-point number that
is converted to a 16-bit floating-point number
and replicated to both halves of the SIMD operation.
The conversion rounds toward zero (truncation).
Any denorms produced by the FP32 -> FP16 conversion are flushed to 0.
.H0_H0: Input is a single 16-bit floating point number in the
lower 16-bits of a 32-bit register, and is replicated
to both halves of the SIMD operation.
.H1_H1: Input is a single 16-bit floating point number in the
upper 16-bits of a 32-bit register, and is replicated
to both halves of the SIMD operation.
#Immfp10H0: Most significant 10 bits of the FP16 immediate for the H0 (lower) half.
#Immfp10H1: Most significant 10 bits of the FP16 immediate for the H1 (upper) half.
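A minimal C sketch of the .iswz input selection and the .FTZ flush, assuming FP16 values are handled as raw 16-bit patterns. The .F32 path truncates toward zero and flushes FP16-denormal results as described above; keeping the sign of a flushed result, saturating overflow to the largest finite FP16 value (which is what round-toward-zero yields), and mapping every NaN to one quiet NaN are assumptions of this sketch. The names iswz_t, f32_to_f16_rz_ftz, f16_ftz, and extract_halves are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical enum mirroring the .iswz options. */
typedef enum { ISWZ_H1_H0, ISWZ_F32, ISWZ_H0_H0, ISWZ_H1_H1 } iswz_t;

/* FP32 -> FP16, round toward zero. Results below the FP16 normal range are
   flushed to zero (sign kept: an assumption). Any NaN becomes a quiet NaN
   (also an assumption). */
static uint16_t f32_to_f16_rz_ftz(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    int32_t biased = (int32_t)((x >> 23) & 0xFFu);
    uint32_t man = x & 0x7FFFFFu;

    if (biased == 0xFF)                       /* Inf or NaN */
        return (uint16_t)(sign | (man ? 0x7E00u : 0x7C00u));
    int32_t e = biased - 127 + 15;            /* rebias the exponent for FP16 */
    if (e >= 31)                              /* too large: RZ gives max finite */
        return (uint16_t)(sign | 0x7BFFu);
    if (e <= 0)                               /* zero or FP16 denormal: flush */
        return sign;
    /* Truncate the 23-bit mantissa to 10 bits (round toward zero). */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (man >> 13));
}

/* .FTZ: flush an FP16 denormal to a zero with the same sign. */
static uint16_t f16_ftz(uint16_t h) {
    return ((h & 0x7C00u) == 0) ? (uint16_t)(h & 0x8000u) : h;
}

/* Select the two FP16 SIMD halves of one operand:
   out[0] is the LO half, out[1] the HI half. */
static void extract_halves(uint32_t reg, iswz_t swz, int ftz, uint16_t out[2]) {
    switch (swz) {
    case ISWZ_H1_H0:
        out[0] = (uint16_t)(reg & 0xFFFFu);
        out[1] = (uint16_t)(reg >> 16);
        break;
    case ISWZ_H0_H0:
        out[0] = out[1] = (uint16_t)(reg & 0xFFFFu);
        break;
    case ISWZ_H1_H1:
        out[0] = out[1] = (uint16_t)(reg >> 16);
        break;
    case ISWZ_F32: {
        float f;
        memcpy(&f, &reg, sizeof f);
        out[0] = out[1] = f32_to_f16_rz_ftz(f);  /* convert, then replicate */
        break;
    }
    }
    if (ftz) {
        out[0] = f16_ftz(out[0]);
        out[1] = f16_ftz(out[1]);
    }
}
```

For example, extract_halves(0x3F800000, ISWZ_F32, 0, h) reads the register as FP32 1.0 and yields 0x3C00 (FP16 1.0) in both halves.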
For HSET2 with an immediate Sb operand, the .iswz field is not encoded and the
immediate operand behaves as .H1_H0. Absolute value and negation are also not
encoded and default to false. SASS nonetheless accepts absolute value and negation
when the immediate is enclosed in curly braces, e.g. {-1.0} or {|-19.5|}, and
encodes the resulting immediate values.
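Because an immfp10 field keeps only the 10 most significant bits of an FP16 value (the sign, the 5 exponent bits, and the top 4 mantissa bits), an immediate is exact only when its low 6 mantissa bits are zero. The C sketch below, with hypothetical helper names, shows that packing and how a brace-enclosed negation or absolute value can be folded into the constant before it is encoded; it illustrates the arithmetic and is not the assembler's actual implementation.

```c
#include <stdint.h>

/* Hypothetical helpers operating on raw FP16 bit patterns. */

/* An FP16 value fits the 10-bit immediate only if its low 6 bits are zero. */
static int f16_fits_imm10(uint16_t h) {
    return (h & 0x3Fu) == 0;
}

/* Keep the most significant 10 bits: sign, exponent, top 4 mantissa bits. */
static uint16_t f16_to_imm10(uint16_t h) {
    return (uint16_t)(h >> 6);
}

/* Expand a 10-bit immediate back to FP16; the low 6 bits become zero. */
static uint16_t imm10_to_f16(uint16_t imm) {
    return (uint16_t)((imm & 0x3FFu) << 6);
}

/* Fold {|x|} and/or {-x} into the constant: absolute value clears the sign
   bit, then negation flips it, matching the {-}{|}x{|} operand order. */
static uint16_t f16_fold(uint16_t h, int negate, int absolute) {
    if (absolute) h = (uint16_t)(h & 0x7FFFu);
    if (negate)   h = (uint16_t)(h ^ 0x8000u);
    return h;
}
```

For {-1.0}, folding negation into FP16 1.0 (0x3C00) gives 0xBC00, whose 10-bit immediate is 0x2F0.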
For HSET2 with a constant Sb operand, .iswz is not encoded and the constant
reference operand behaves as .F32. Absolute value is likewise not encoded for
the constant operand and defaults to false.
First, the components of each input are extracted according to its .iswz selection. Then, for each SIMD half, HSET2.cmp.bop compares the first and second operands with the FP16 comparison .cmp and combines the Boolean result with predicate {!}Pp using .bop. Each combined result is written to the corresponding 16-bit half of Rd: as the half-precision (FP16) value 1.0 (true) or 0.0 (false) with .BF, or as a Boolean mask with .BM (the default).
The Boolean operation .bop may be .AND, .OR, or .XOR, corresponding to the C operators &, |, and ^ applied to Boolean values.
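The sixteen .cmp variants can be modeled in C as sets of the four possible outcomes of an IEEE comparison: less, equal, greater, or unordered (at least one NaN operand). In the sketch below, the bit assignment (bit 0 less, bit 1 equal, bit 2 greater, bit 3 unordered) and the cmp_t enum are assumptions chosen so that the values follow the order of the .cmp list; they are not claimed to be the hardware encoding. FP16 operands are passed as float, which is exact because every FP16 value is representable in FP32.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical enum in the order of the .cmp list (values are assumptions). */
typedef enum {
    CMP_F, CMP_LT, CMP_EQ, CMP_LE, CMP_GT, CMP_NE, CMP_GE, CMP_NUM,
    CMP_NAN, CMP_LTU, CMP_EQU, CMP_LEU, CMP_GTU, CMP_NEU, CMP_GEU, CMP_T
} cmp_t;

/* Each cmp_t value is a set of outcomes: bit 0 = less, bit 1 = equal,
   bit 2 = greater, bit 3 = unordered. The comparison is true when the
   actual relation between a and b is in that set. */
static bool fp16_compare(float a, float b, cmp_t cmp) {
    unsigned outcome;
    if (isnan(a) || isnan(b)) outcome = 8u;  /* unordered */
    else if (a < b)           outcome = 1u;  /* less */
    else if (a == b)          outcome = 2u;  /* equal (+0 == -0) */
    else                      outcome = 4u;  /* greater */
    return ((unsigned)cmp & outcome) != 0;
}
```

With this numbering .NE (less or greater) is false for a NaN input, while .NEU adds the unordered outcome and is true.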
// A[] represents the two halves of the SIMD operation for operand Ra
// B[] represents the two halves of the SIMD operation for the second source operand Sb (Rb, constant, or immediate)
// Normal mode
Rd.LO = ((A[0] .cmp B[0]) .bop {!}Pp) ? 1.0 : 0.0;
Rd.HI = ((A[1] .cmp B[1]) .bop {!}Pp) ? 1.0 : 0.0;
Use .bop {!}Pp for nested predication: the inner comparison of A and B is combined with, and conditioned on, the outer predicate Pp.
The simple instruction format without .bop {!}Pp assembles as .AND with the always-true predicate PT, giving the following effective operation:
Rd.LO = (A[0] .cmp B[0]) ? 1.0 : 0.0;
Rd.HI = (A[1] .cmp B[1]) ? 1.0 : 0.0;
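The per-half result computation, including .bop and .bval, can be sketched in C as follows. The enum and function names are hypothetical, and the all-ones value 0xFFFF for a true .BM half is an assumption; 0x3C00 is the FP16 encoding of 1.0. Passing BOP_AND with p = true reproduces the simple form without .bop {!}Pp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enums for the .bop and .bval modifiers. */
typedef enum { BOP_AND, BOP_OR, BOP_XOR } bop_t;
typedef enum { BVAL_BM, BVAL_BF } bval_t;

/* Combine one comparison result with the predicate value. */
static bool apply_bop(bool c, bop_t bop, bool p) {
    switch (bop) {
    case BOP_AND: return c && p;
    case BOP_OR:  return c || p;
    default:      return c != p;   /* XOR on Booleans */
    }
}

/* One 16-bit result half: .BF writes FP16 1.0 (0x3C00) or 0.0;
   .BM writes an all-ones mask (0xFFFF, assumed) or 0. */
static uint16_t result_half(bool r, bval_t bval) {
    if (!r) return 0x0000u;
    return (uint16_t)(bval == BVAL_BF ? 0x3C00u : 0xFFFFu);
}

/* cmp_lo and cmp_hi are A[0] .cmp B[0] and A[1] .cmp B[1];
   p is the value of {!}Pp after any negation. */
static uint32_t hset2_result(bool cmp_lo, bool cmp_hi, bop_t bop, bool p,
                             bval_t bval) {
    uint16_t lo = result_half(apply_bop(cmp_lo, bop, p), bval);
    uint16_t hi = result_half(apply_bop(cmp_hi, bop, p), bval);
    return ((uint32_t)hi << 16) | lo;   /* Rd.HI in bits 31:16, Rd.LO in 15:0 */
}
```

For example, hset2_result(true, false, BOP_AND, true, BVAL_BF) returns 0x00003C00: Rd.LO is 1.0 and Rd.HI is 0.0.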
HSET2.BF.GT R2, -R0.H1_H0, R1.F32; # R2.LO = (-R0.H0 > R1) ? 1.0 : 0.0, R2.HI = (-R0.H1 > R1) ? 1.0 : 0.0, with R1 converted to FP16
HSET2.BF.LE R2, -R0.F32, R1.H0_H0; # R2.LO = (-R0 <= R1.H0) ? 1.0 : 0.0, R2.HI = (-R0 <= R1.H0) ? 1.0 : 0.0, with R0 converted to FP16