SPA 5.3:
{@{!}Pg}
HMUL2{.ofmt}{.fmz}{.SAT}
Rd, {-}{|}Ra{|}{.iswz}, {-}{|}Rb{|}{.iswz}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HMUL2{.ofmt}{.fmz}{.SAT}
Rd, {-}{|}Ra{|}{.iswz}, {-}{|}c[#BankU05][#AddrU16]{|}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
HMUL2{.ofmt}{.fmz}{.SAT}
Rd, {-}{|}Ra{|}{.iswz}, {{-}{|}#Immfp10H1{|}}, {{-}{|}#Immfp10H0{|}}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;// Imm order: H1, H0
{@{!}Pg}
HMUL2_32I{.fmz}{.SAT}
Rd, Ra{.iswz}, {{-}{|}#Immfp16H1{|}}, {{-}{|}#Immfp16H0{|}}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;// Imm order: H1, H0
.fmz: { < NULL >*, .FTZ, .FMZ, INVALID }
.fmz controls denorm flush and multiply mode.
< NULL >: Denorms supported. No special handling of 0.
This is default.
.FTZ: Flush input/output denorms to sign-preserving zero.
.FMZ: Flush input/output denorms to sign-preserving zero AND
if either source is 0.0, the product is forced to +0.0
(even if other source is infinity or NaN), regardless
of the input signs. The 0.0 test is done after input
denorm flush.
.SAT: Saturate output to the inclusive range [+0.0 .. 1.0]
(NaN is converted to +0.0)
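The .fmz and .SAT rules above can be sketched for a single lane in Python. This is an illustrative model only, not the hardware implementation: the names mul_lane, flush_denorm, saturate and round_fp16 are hypothetical, fp16 values are carried as Python floats, and applying the output flush before the saturation is an assumption, since the text does not state the order.

```python
import math
import struct

FP16_MIN_NORMAL = 2.0 ** -14  # smallest normal fp16 magnitude


def round_fp16(x):
    """Round a Python float to the nearest fp16 value (ties to even)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # magnitude too large for fp16: becomes infinity
        return math.copysign(math.inf, x)


def flush_denorm(x):
    """Replace an fp16 denormal with a zero of the same sign."""
    if x != 0.0 and abs(x) < FP16_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def saturate(x):
    """Clamp to [+0.0, 1.0]; NaN becomes +0.0."""
    if math.isnan(x) or x <= 0.0:
        return 0.0
    return min(x, 1.0)


def mul_lane(a, b, fmz=None, sat=False):
    """Multiply two fp16-valued floats under the .fmz and .SAT modifiers."""
    if fmz in ("FTZ", "FMZ"):
        a, b = flush_denorm(a), flush_denorm(b)
    if fmz == "FMZ" and (a == 0.0 or b == 0.0):
        result = 0.0  # forced +0.0, even against infinity or NaN
    else:
        result = round_fp16(a * b)
        if fmz in ("FTZ", "FMZ"):
            result = flush_denorm(result)  # assumed to happen before .SAT
    return saturate(result) if sat else result
```

For example, mul_lane(-0.0, math.inf, fmz="FMZ") returns +0.0 in this model, whereas the same inputs without a modifier return NaN.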
.ofmt: { .F16_V2*, .F32, .MRG_H0, .MRG_H1 }
Output format.
.F16_V2: Outputs two 16-bit floating point numbers, packed in the 32-bit output.
.F32: Computes a single 16-bit floating point result (simd lane 0) and
outputs it as a single 32-bit floating point number.
This mode flushes fp16 denorm results to zero prior to the conversion.
Inputs are unaffected.
.MRG_H0: Generates a single 16-bit floating point number (simd lane 0), and
writes it to bottom 16-bits of Rd. The upper bits of
Rd are not modified.
.MRG_H1: Generates a single 16-bit floating point number (simd lane 1), and
writes it to upper 16-bits of Rd. The lower bits of
Rd are not modified.
Note: .MRG_H0/1 modifier support is restricted. See below.
Supports .MRG_H0, .MRG_H1
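The four .ofmt write-back modes can be sketched as bit manipulation on a 32-bit destination. This is a minimal model under stated assumptions: write_result is a hypothetical name, the lane results are passed in as raw 16-bit fp16 patterns, and keeping the sign of a denorm that is flushed in .F32 mode is an assumption, since the text only says the result is flushed to zero.

```python
import struct


def write_result(ofmt, rd_old, h0_bits, h1_bits):
    """Return the new 32-bit Rd value for the given .ofmt.

    h0_bits / h1_bits are the fp16 bit patterns produced by lane 0 / lane 1.
    """
    if ofmt == "F16_V2":
        # Both lanes packed: lane 1 in the upper half, lane 0 in the lower.
        return (h1_bits << 16) | h0_bits
    if ofmt == "MRG_H0":
        # Lane 0 replaces the bottom 16 bits; the upper bits are kept.
        return (rd_old & 0xFFFF0000) | h0_bits
    if ofmt == "MRG_H1":
        # Lane 1 replaces the upper 16 bits; the lower bits are kept.
        return (rd_old & 0x0000FFFF) | (h1_bits << 16)
    if ofmt == "F32":
        # Flush an fp16 denorm result to zero before widening (sign kept
        # here as an assumption), then convert lane 0 to fp32.
        if (h0_bits & 0x7C00) == 0:
            h0_bits &= 0x8000
        value = struct.unpack("<e", struct.pack("<H", h0_bits))[0]
        return struct.unpack("<I", struct.pack("<f", value))[0]
    raise ValueError("unknown .ofmt: " + ofmt)
```

With rd_old = 0xAAAABBBB, .MRG_H0 with a lane 0 result of 0x3C00 leaves 0xAAAA3C00 in Rd in this sketch.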
.iswz: { .H1_H0*, .F32, .H0_H0, .H1_H1 }
Input format.
.H1_H0: Input is a set of two 16-bit floating point numbers.
.F32: Input is a single 32-bit floating point number that
will be converted to a 16-bit floating point number
and replicated to both halves of the SIMD operation.
The conversion will round towards 0 (truncation).
Any denorms generated in the FP32 -> FP16 conversion process will flush to 0.
Denorms can still be generated from the operation itself.
.H0_H0: Input is a single 16-bit floating point number in the
lower 16-bits of a 32-bit register, and is replicated
to both halves of the SIMD operation.
.H1_H1: Input is a single 16-bit floating point number in the
upper 16-bits of a 32-bit register, and is replicated
to both halves of the SIMD operation.
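The input selection described above can be sketched as follows. The function names are hypothetical, and several details are assumptions where the text is silent: a flushed denorm keeps its sign, an fp32 value too large for fp16 truncates to the largest finite fp16 value (the IEEE-754 roundTowardZero result), and NaN maps to an arbitrary quiet NaN pattern.

```python
def f32_bits_to_f16_bits_rz(bits32):
    """Convert an fp32 bit pattern to fp16 with truncation and denorm flush."""
    sign = (bits32 >> 16) & 0x8000
    exp32 = (bits32 >> 23) & 0xFF
    man32 = bits32 & 0x7FFFFF
    if exp32 == 0xFF:
        # Infinity stays infinity; NaN becomes a quiet NaN (assumption).
        return sign | (0x7E00 if man32 else 0x7C00)
    exp16 = exp32 - 127 + 15  # re-bias the exponent for fp16
    if exp16 >= 31:
        return sign | 0x7BFF  # assumption: RZ overflow gives largest finite
    if exp16 <= 0:
        return sign  # zero or would-be fp16 denorm: flush to signed zero
    return sign | (exp16 << 10) | (man32 >> 13)  # drop low mantissa bits


def extract_lanes(reg, iswz="H1_H0"):
    """Return (lane0, lane1) fp16 bit patterns selected from a 32-bit register."""
    lo, hi = reg & 0xFFFF, (reg >> 16) & 0xFFFF
    if iswz == "H1_H0":
        return lo, hi
    if iswz == "H0_H0":
        return lo, lo
    if iswz == "H1_H1":
        return hi, hi
    if iswz == "F32":
        half = f32_bits_to_f16_bits_rz(reg)
        return half, half
    raise ValueError("unknown .iswz: " + iswz)
```

For a register holding 0x40003C00, .H1_H0 yields lanes (0x3C00, 0x4000), while .H1_H1 yields (0x4000, 0x4000).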
immfp16H0 fp16 immediate in {sign, exp[4:0], mant[9:0]} format.
immfp16H1 fp16 immediate in {sign, exp[4:0], mant[9:0]} format.
immfp10H0 Most significant 10 bits of fp16 immediate.
immfp10H1 Most significant 10 bits of fp16 immediate.
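A 10-bit immediate holding the most significant bits of an fp16 value can be expanded as sketched below. The helper names are hypothetical, and zero-filling the six dropped low mantissa bits is an assumption; the text only says that the most significant 10 bits are kept.

```python
import struct


def expand_immfp10(imm10):
    """Rebuild a 16-bit fp16 pattern from its 10 most significant bits."""
    if not 0 <= imm10 <= 0x3FF:
        raise ValueError("immfp10 must fit in 10 bits")
    return imm10 << 6  # assumption: the 6 dropped mantissa bits are zero


def fits_immfp10(bits16):
    """True if an fp16 pattern loses nothing when cut to 10 bits."""
    return (bits16 & 0x3F) == 0


def fp16_bits_to_float(bits16):
    """Decode an fp16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]
```

Under this assumption 1.0 (0x3C00) is stored as 0x0F0, while -19.5 (0xCCE0) has a nonzero low mantissa bit and would need the full 16-bit immediate form.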
For HMUL2 with an immediate "Sb" operand, the .iswz field is not encoded and
behavior defaults to .H1_H0 for the immediate operand. Absolute values and
negates are also not encoded and default to false. SASS can support absolute
values and negates when the immediate is enclosed in curly braces, e.g. {-1.0}
or {|-19.5|}, and encodes the appropriate immediates.
For HMUL2 with a constant "Sb" operand, .iswz is not encoded and
behavior defaults to .F32 for the constant reference operand. Also absolute
value for the constant operand defaults to false.
For HMUL2_32I:
- .ofmt is not encoded and behavior defaults to .F16_V2
- .iswz is specified only for Ra operand. For other source operands .iswz
is not encoded and behavior defaults to .H1_H0
- Absolute values and negates for the immediate operand are not encoded and
behavior defaults to false.
Note: SASS can support absolute values and negates when the immediate is
enclosed in curly braces (e.g. {-1.0} or {|-19.5|}) and encodes the appropriate immediates.
The signs of operands Ra and Rb/c[][] are encoded together in a single sign bit.
For immediate operand forms:
- Sign of both immediates must be same.
- Absolute operator must be present or absent on both immediates together.
- Negate on Ra can be supported by inverting immediates.
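The last rule relies on the identity (-a) * b = a * (-b). A minimal sketch, reading "inverting immediates" as flipping the fp16 sign bit of both immediates (an interpretation, not something the text spells out); fold_ra_negate is a hypothetical assembler-side helper:

```python
FP16_SIGN = 0x8000  # sign bit of an fp16 bit pattern


def fold_ra_negate(imm_h1, imm_h0):
    """Move a negate on Ra into the two fp16 immediates.

    Both immediates must carry the same sign, as required above; flipping
    both sign bits keeps that property.
    """
    if (imm_h1 ^ imm_h0) & FP16_SIGN:
        raise ValueError("both immediates must have the same sign")
    return imm_h1 ^ FP16_SIGN, imm_h0 ^ FP16_SIGN
```

For example, a negated Ra multiplied by the pair (0x3C00, 0x4000), i.e. (1.0, 2.0), could instead use a plain Ra with (0xBC00, 0xC000).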
First, the components of each input are extracted. Then, for each of the two halves of the SIMD operation, this instruction computes the product of the first and second operands to infinite precision and rounds it to the 16-bit floating point format using the round-to-nearest-even algorithm.
Finally, the two results are either merged into the output, or merged with the third operand, or the upper result is discarded and the lower is converted to a 32-bit floating point number.
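The default data path (.H1_H0 inputs, .F16_V2 output, no .fmz or .SAT) can be modeled in a few lines of Python. This is a sketch, not a bit-exact reference: hmul2_f16_v2 is a hypothetical name and NaN payloads are not modeled. The product of two finite fp16 values is exact in a Python float (double precision), so packing it with the struct 'e' format performs the single round-to-nearest-even step described above.

```python
import struct


def bits_to_f16(bits16):
    """Decode an fp16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


def f16_to_bits(x):
    """Round a Python float to fp16 (ties to even) and return its bits."""
    try:
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:  # too large for fp16: rounds to infinity
        return 0xFC00 if x < 0 else 0x7C00


def hmul2_f16_v2(ra, rb):
    """Multiply the two fp16 lanes of ra and rb and pack both results."""
    out = 0
    for shift in (0, 16):  # lane 0 is the low half, lane 1 the high half
        a = bits_to_f16((ra >> shift) & 0xFFFF)
        b = bits_to_f16((rb >> shift) & 0xFFFF)
        out |= f16_to_bits(a * b) << shift  # exact product, one rounding
    return out
```

With ra = 0x40003C00 (2.0, 1.0) and rb = 0x42004000 (3.0, 2.0), the sketch returns 0x46004000, i.e. 6.0 in lane 1 and 2.0 in lane 0.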
Component extraction and result writes can operate on whole fp32 values, which get converted to fp16 for the operation. These conversions will round towards zero, and will also cause fp16 denormals (subnormals) to be flushed to zero. The flushing to zero occurs independently of the behavior set by the .FTZ or .FMZ modifiers.
Fp16 operations support only one of the 4 required IEEE-754 2008 rounding modes: round-to-nearest-even (roundTiesToEven).
The other rounding modes are not supported.
Implicit conversions to and from fp32 are performed with a rounding mode of .RZ (roundTowardZero).
See the IEEE-754 2008 specification, Section 4.3.3.
The chosen NaN behavior for fp16 operations differs from that of fp64 operations. The chosen fp16 behavior is in the spirit of IEEE-754 2008: the standard allows canonicalization of the NaN result to be implementation-defined. See the IEEE-754 2008 specification, Section 6.2.
HMUL2 R7, -|R3|.H1_H0, 0xad1c, 0xffff;
HMUL2.SAT R3, |R8|.H1_H1, -|c[6][60672]|;
HMUL2.F16_V2.FMZ.SAT R1, -|R4|.F32, -|RZ|.H0_H0;
HMUL2_32I R2, RZ.F32, 0xffff, {-|0x8ef7|};