SPA 5.0:
{@{!}Pg} MUFU.op{.SAT} Rd,{-}{|}Ra{|} {&req_6} {&rdN} {&wrN} {?sched} ;
.op: { .COS, .EX2, .LG2, .RCP, .RSQ, .SIN, .RCP64H, .RSQ64H, .SQRT }
.COS - cosine // must pre-process with RRO
.EX2 - exponent base 2 // must pre-process with RRO
.LG2 - logarithm base 2
.RCP - reciprocal
.RSQ - reciprocal square root
.SIN - sine // must pre-process with RRO
.RCP64H - reciprocal // fp64: high 32b (1.11.20) input, same format output
.RSQ64H - reciprocal square root // fp64: high 32b (1.11.20) input, same format output
.SQRT - square root
.SAT: saturate output to [+0.0, 1.0] (NaN is converted to +0.0f).
Ignored for .RCP64H and .RSQ64H.
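A minimal sketch of the .SAT behavior described above, written as a host-side Python model rather than hardware-accurate code. The function name `saturate` is hypothetical, and mapping -0.0 to +0.0 is an assumption based on the stated lower bound of +0.0.

```python
import math

def saturate(x: float) -> float:
    """Model of .SAT: clamp to the range +0.0 to 1.0, with NaN converted to +0.0."""
    if math.isnan(x):
        return 0.0          # NaN is converted to +0.0
    if x <= 0.0:
        return 0.0          # assumption: -0.0 and negatives become +0.0
    if x >= 1.0:
        return 1.0
    return x
```

The model does not cover .RCP64H or .RSQ64H, for which .SAT is ignored.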
.op must be specified. No default.
Denormal inputs and outputs are flushed to sign-preserving 0.0. The denormal range differs between fp32 and fp64.
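The flush rule can be illustrated on raw fp32 bit patterns. This is a sketch only: `flush_denorm_f32` is a hypothetical helper, it assumes the standard IEEE 754 binary32 layout (1 sign, 8 exponent, 23 mantissa bits), and it does not model the fp64 high-word case.

```python
def flush_denorm_f32(bits: int) -> int:
    """Replace an fp32 denormal bit pattern with a zero of the same sign."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        return bits & 0x80000000   # keep only the sign bit
    return bits                     # zeros, normals, Inf and NaN pass through
```

For example, `flush_denorm_f32(0x80000001)` returns `0x80000000`, the bit pattern of -0.0.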
Hardware multi-function op. Performs the operation specified by .op. Inputs to SIN, COS, and EX2 must first be preprocessed with RRO.
Computing the reciprocal or reciprocal square root of double-precision fp64 numbers uses the .RCP64H or .RSQ64H modes (respectively) as a starting point in a longer sequence.
Exactly rounded reciprocals and reciprocal square roots can likewise be computed with a longer sequence of instructions.
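To illustrate how a high-word seed can start a longer sequence, the sketch below refines an approximate reciprocal with Newton-Raphson steps, y = y * (2 - x * y). It is not the actual instruction sequence and it does not produce exactly rounded results. `rcp64h_model` is a hypothetical software stand-in for the seed (it simply truncates 1/x to the high 32 bits), and only finite, normal, nonzero inputs are handled.

```python
import struct

def high_word(x: float) -> int:
    """High 32 bits of the fp64 encoding of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 32

def from_high_word(hi: int) -> float:
    """fp64 value whose high 32 bits are hi and whose low 32 bits are zero."""
    return struct.unpack("<d", struct.pack("<Q", (hi & 0xFFFFFFFF) << 32))[0]

def rcp64h_model(hi: int) -> int:
    """Stand-in seed (assumption): reciprocal computed from and truncated to high words."""
    return high_word(1.0 / from_high_word(hi))

def reciprocal(x: float, steps: int = 2) -> float:
    """Refine the seed with Newton-Raphson; each step roughly doubles the correct bits."""
    y = from_high_word(rcp64h_model(high_word(x)))
    for _ in range(steps):
        y = y * (2.0 - x * y)
    return y
```

With a seed accurate to roughly 2^-20, two steps are enough in this model to reach the rounding noise of fp64 arithmetic.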
For all operations below, input NaNs and operations that would result in NaN generate NVIDIA canonical NaN (0x7fff_ffff).
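The canonical NaN rule can be sketched on fp32 bit patterns as follows. `canonicalize_nan` is a hypothetical helper, and the NaN test assumes the standard binary32 encoding (all-ones exponent, nonzero mantissa).

```python
CANONICAL_NAN = 0x7FFF_FFFF  # value quoted in the text above

def is_nan_f32(bits: int) -> bool:
    """True if the 32-bit pattern encodes an fp32 NaN."""
    return (bits >> 23) & 0xFF == 0xFF and (bits & 0x7FFFFF) != 0

def canonicalize_nan(bits: int) -> int:
    """Map any NaN pattern to the canonical NaN; leave other values unchanged."""
    return CANONICAL_NAN if is_nan_f32(bits) else bits
```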
Cosine function. The input is a special 1.1.7.23 format 32b word generated by RRO.
COS(-denorm) gives 1.0
COS(-0.0) gives 1.0
COS(+0.0) gives 1.0
COS(+denorm) gives 1.0
COS(-Inf) gives NaN
COS(+Inf) gives NaN
COS(NaN) gives NaN
COS Accuracy: absolute |error| <= 2^-20.9 in quadrant 00.
Note that COS cannot operate directly on a float; the input has to be preprocessed with RRO first.
Exponential base2 function. The input is a special 1.1.7.23 format 32b word generated by RRO.
EX2(-denorm) gives +1.0
EX2(-0.0) gives +1.0
EX2(+0.0) gives +1.0
EX2(+denorm) gives +1.0
EX2(-Inf) or underflow gives +0.0
EX2(+Inf) or overflow gives +Inf
EX2(NaN) gives NaN
EX2 Accuracy: absolute |error| <= 2^-22.5 for fractional part.
Note that EX2 cannot operate directly on a float; the input has to be preprocessed with RRO first.
Logarithm base2 function.
LG2(-denorm) gives -Inf
LG2(-0.0) gives -Inf
LG2(+0.0) gives -Inf
LG2(+denorm) gives -Inf
LG2(-Inf) gives NaN
LG2(+Inf) gives +Inf
LG2(NaN) gives NaN
LG2 Accuracy: absolute |error| <= 2^-22.6 for mantissa.
Reciprocal function. Output must be exactly 1.0 if the input is exactly 1.0.
RCP(-denorm) gives -Inf
RCP(-0.0) gives -Inf
RCP(+0.0) gives +Inf
RCP(+denorm) gives +Inf
RCP(-Inf) gives -0.0
RCP(+Inf) gives +0.0
RCP(NaN) gives NaN
RCP Accuracy: absolute |error| <= 2^-23.0 over the range 1.0-2.0.
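The RCP special cases listed above can be captured in a small reference model, which may be useful when checking results on the host. This is a sketch: `rcp_model` is a hypothetical name, the fp32 denormal threshold of 2^-126 is assumed from the standard binary32 format, and the model neither rounds to fp32 nor flushes denormal outputs.

```python
import math

F32_MIN_NORMAL = 2.0 ** -126  # smallest normal fp32 magnitude (assumed threshold)

def rcp_model(x: float) -> float:
    """Host-side model of the RCP special cases; ordinary inputs use plain 1/x."""
    if math.isnan(x):
        return math.nan                      # RCP(NaN) gives NaN
    if math.isinf(x):
        return math.copysign(0.0, x)         # RCP(+/-Inf) gives +/-0.0
    if abs(x) < F32_MIN_NORMAL:
        return math.copysign(math.inf, x)    # zero or denormal gives +/-Inf
    return 1.0 / x
```

The same pattern applies to the other operations by swapping in their own special-case lines.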
Reciprocal square root function. Output must be exactly 1.0 if the input is exactly 1.0.
RSQ(-denorm) gives -Inf
RSQ(-0.0) gives -Inf
RSQ(+0.0) gives +Inf
RSQ(+denorm) gives +Inf
RSQ(-Inf) gives NaN
RSQ(+Inf) gives +0.0
RSQ(NaN) gives NaN
RSQ Accuracy: absolute |error| <= 2^-22.4 over the range 1.0-4.0.
Sine function. The input is a special 1.1.7.23 format 32b word generated by RRO.
SIN(-denorm) gives -0.0
SIN(-0.0) gives -0.0
SIN(+0.0) gives +0.0
SIN(+denorm) gives +0.0
SIN(-Inf) gives NaN
SIN(+Inf) gives NaN
SIN(NaN) gives NaN
SIN Accuracy: absolute |error| <= 2^-20.9 in quadrant 00.
Note that SIN cannot operate directly on a float; the input has to be preprocessed with RRO first.
FP64 Reciprocal. Input is 32b register containing the high 32b of a fp64 value (1.11.20). Output is also the high 32b of a fp64 value.
Output must be exactly 1.0 if the input is exactly 1.0.
RCP64H(-denorm) gives -Inf
RCP64H(-0.0) gives -Inf
RCP64H(+0.0) gives +Inf
RCP64H(+denorm) gives +Inf
RCP64H(-Inf) gives -0.0
RCP64H(+Inf) gives +0.0
RCP64H(NaN) gives NaN
RCP64H Accuracy: close to 2^-20
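The 1.11.20 layout of the high 32b word (1 sign bit, 11 exponent bits, top 20 mantissa bits) can be shown by splitting an fp64 value on the host. The helper name `split_fp64_high` is hypothetical, and the sketch assumes the standard IEEE 754 binary64 encoding.

```python
import struct

def split_fp64_high(x: float):
    """Return (high word, sign, biased exponent, top 20 mantissa bits) of an fp64 value."""
    hi = struct.unpack("<Q", struct.pack("<d", x))[0] >> 32
    sign = hi >> 31                 # 1 bit
    exponent = (hi >> 20) & 0x7FF   # 11 bits, biased by 1023
    mantissa_hi = hi & 0xFFFFF      # upper 20 of the 52 mantissa bits
    return hi, sign, exponent, mantissa_hi
```

For example, `split_fp64_high(1.0)` returns `(0x3FF00000, 0, 0x3FF, 0)`. The low 32 mantissa bits are not visible in this word.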
FP64 Reciprocal square root. Input is 32b register containing the high 32b of a fp64 value (1.11.20). Output is also the high 32b of a fp64 value.
Output must be exactly 1.0 if the input is exactly 1.0.
RSQ64H(-denorm) gives -Inf
RSQ64H(-0.0) gives -Inf
RSQ64H(+0.0) gives +Inf
RSQ64H(+denorm) gives +Inf
RSQ64H(-Inf) gives NaN
RSQ64H(+Inf) gives +0.0
RSQ64H(NaN) gives NaN
RSQ64H Accuracy: close to 2^-20
Square root function. Output must be exactly 1.0 if the input is exactly 1.0.
SQRT(-denorm) gives -0.0
SQRT(-0.0) gives -0.0
SQRT(+0.0) gives +0.0
SQRT(+denorm) gives +0.0
SQRT(-Inf) gives NaN
SQRT(+Inf) gives +Inf
SQRT(NaN) gives NaN
SQRT Accuracy: close to 2^-20
MUFU.COS.SAT R0,R1;
MUFU.RCP64H R3,R1;