MUFU

SPA 5.0:
        {@{!}Pg}   MUFU.op{.SAT}   Rd,{-}{|}Ra{|}   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

 .op:        { .COS, .EX2, .LG2, .RCP, .RSQ, .SIN, .RCP64H, .RSQ64H, .SQRT } 
             .COS - cosine                        // must pre-process with RRO
             .EX2 - exponent base 2               // must pre-process with RRO
             .LG2 - logarithm base 2
             .RCP - reciprocal
             .RSQ - reciprocal square root
             .SIN - sine                          // must pre-process with RRO
             .RCP64H - reciprocal                 // fp64: high 32b (1.11.20) input, same format output
             .RSQ64H - reciprocal square root     // fp64: high 32b (1.11.20) input, same format output
             .SQRT - square root

 .SAT:       saturate output to [+0.0, 1.0] (NaN is converted to +0.0f).
             Ignored for .RCP64H and .RSQ64H.
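
The .SAT rule above can be sketched as a small reference model. This is an illustration in Python, not the hardware definition; the function name `saturate` is hypothetical, and the mapping of -0.0 to +0.0 is an assumption drawn from the "+0.0" lower bound.

```python
import math

def saturate(x: float) -> float:
    """Model of .SAT: clamp to [+0.0, 1.0]; NaN is converted to +0.0."""
    if math.isnan(x):
        return 0.0
    if x <= 0.0:
        # Assumption: negatives and -0.0 both become +0.0.
        return 0.0
    return min(x, 1.0)
```

With this model, values already inside the range pass through unchanged, and +Inf saturates to 1.0.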

.op must be specified. No default.

Denorm inputs and outputs are flushed to sign-preserving 0.0. The denorm range differs between fp32 and fp64.
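
As a minimal sketch of the flush rule, the Python below works on raw bit patterns. The function names are hypothetical. For the fp64 case only the high 32b (1.11.20) is visible to MUFU, so treating every word with a zero exponent field as zero-or-denorm is an assumption of this sketch.

```python
def flush_denorm_f32(bits: int) -> int:
    """Flush an fp32 denorm bit pattern to a zero of the same sign."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        return bits & 0x80000000  # keep only the sign bit
    return bits

def flush_denorm_f64_hi(hi: int) -> int:
    """Same idea for the high 32b (1.11.20) of an fp64 value.

    Assumption: with only the high word available, a zero 11-bit
    exponent field is treated as zero or denorm.
    """
    exponent = (hi >> 20) & 0x7FF
    if exponent == 0:
        return hi & 0x80000000
    return hi
```

The two functions differ only in where the exponent field sits, which is one way to read the fp32/fp64 difference noted above.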

For all operations below, input NaNs and operations that would result in NaN generate NVIDIA canonical NaN (0x7fff_ffff).

COS:

Cosine function. The input is a special 1.1.7.23 format 32b word generated by RRO.

  COS(-denorm) gives 1.0
  COS(-0.0)    gives 1.0
  COS(+0.0)    gives 1.0
  COS(+denorm) gives 1.0
  COS(-Inf)    gives NaN
  COS(+Inf)    gives NaN
  COS(NaN)     gives NaN

COS Accuracy: absolute |error| <= 2^-20.9 in quadrant 00.
Note that COS cannot operate directly on a float; the input must first be preprocessed with RRO.

EX2:

Exponential base2 function. The input is a special 1.1.7.23 format 32b word generated by RRO.

  EX2(-denorm)           gives +1.0
  EX2(-0.0)              gives +1.0
  EX2(+0.0)              gives +1.0
  EX2(+denorm)           gives +1.0
  EX2(-Inf) or underflow gives +0.0
  EX2(+Inf) or overflow  gives +Inf
  EX2(NaN)               gives NaN

EX2 Accuracy: absolute |error| <= 2^-22.5 for fractional part.
Note that EX2 cannot operate directly on a float; the input must first be preprocessed with RRO.

LG2:

Logarithm base2 function.

  LG2(-denorm) gives -Inf
  LG2(-0.0)    gives -Inf
  LG2(+0.0)    gives -Inf
  LG2(+denorm) gives -Inf
  LG2(-Inf)    gives NaN
  LG2(+Inf)    gives +Inf
  LG2(NaN)     gives NaN

LG2 Accuracy: absolute |error| <= 2^-22.6 for mantissa.
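
The LG2 special cases can be checked against a bit-level reference model. The sketch below is hypothetical (`lg2_model` is not a real API) and uses Python's `math.log2` for ordinary inputs, so its low-order bits may differ from the hardware approximation. Negative normal inputs are assumed to fall under the "operations that would result in NaN" rule.

```python
import math
import struct

CANONICAL_NAN = 0x7FFFFFFF
POS_INF = 0x7F800000
NEG_INF = 0xFF800000

def lg2_model(bits: int) -> int:
    """fp32 bit pattern in, fp32 bit pattern out."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        return CANONICAL_NAN          # NaN input
    if exponent == 0:
        return NEG_INF                # +/-0.0 and flushed denorms
    if sign:
        return CANONICAL_NAN          # -Inf; negative normals assumed NaN
    if exponent == 0xFF:
        return POS_INF                # +Inf
    x = struct.unpack("<f", struct.pack("<I", bits))[0]
    # struct.pack("<f", ...) rounds the double result to fp32.
    return struct.unpack("<I", struct.pack("<f", math.log2(x)))[0]
```

Note that the sign of a zero or denorm input does not matter here: both map to -Inf, as in the table.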

RCP:

Reciprocal function. Output must be exactly 1.0 if the input is exactly 1.0.

  RCP(-denorm) gives -Inf
  RCP(-0.0)    gives -Inf
  RCP(+0.0)    gives +Inf
  RCP(+denorm) gives +Inf
  RCP(-Inf)    gives -0.0
  RCP(+Inf)    gives +0.0
  RCP(NaN)     gives NaN

RCP Accuracy: absolute |error| <= 2^-23.0 over the range 1.0-2.0.
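
A bit-level reference model for the RCP table is sketched below. The name `rcp_model` is hypothetical, and ordinary inputs use a correctly rounded division rather than the hardware approximation, so results may differ from MUFU.RCP in the last bits. The output flush follows the general denorm rule stated earlier in this section.

```python
import struct

CANONICAL_NAN = 0x7FFFFFFF
INF_BITS = 0x7F800000

def rcp_model(bits: int) -> int:
    """fp32 bit pattern in, fp32 bit pattern out."""
    sign = bits & 0x80000000
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        return CANONICAL_NAN          # NaN input
    if exponent == 0xFF:
        return sign                   # +/-Inf gives a zero of the same sign
    if exponent == 0:
        return sign | INF_BITS        # +/-0.0 and denorms give same-signed Inf
    x = struct.unpack("<f", struct.pack("<I", bits))[0]
    out = struct.unpack("<I", struct.pack("<f", 1.0 / x))[0]
    if (out >> 23) & 0xFF == 0:
        return out & 0x80000000       # flush a denorm result, keeping the sign
    return out
```

For example, `rcp_model(0x3F800000)` returns `0x3F800000`, matching the requirement that an input of exactly 1.0 gives exactly 1.0.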

RSQ:

Reciprocal square root function. Output must be exactly 1.0 if the input is exactly 1.0.

  RSQ(-denorm) gives -Inf
  RSQ(-0.0)    gives -Inf
  RSQ(+0.0)    gives +Inf
  RSQ(+denorm) gives +Inf
  RSQ(-Inf)    gives NaN
  RSQ(+Inf)    gives +0.0
  RSQ(NaN)     gives NaN

RSQ Accuracy: absolute |error| <= 2^-22.4 over the range 1.0-4.0.
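
The RSQ table differs from RCP mainly in its handling of negative inputs, which the hypothetical model below makes explicit. Ordinary inputs use `1.0 / math.sqrt(x)` as a stand-in for the hardware approximation; treating negative normal inputs as NaN is an assumption based on the "operations that would result in NaN" rule.

```python
import math
import struct

CANONICAL_NAN = 0x7FFFFFFF
INF_BITS = 0x7F800000

def rsq_model(bits: int) -> int:
    """fp32 bit pattern in, fp32 bit pattern out."""
    sign = bits & 0x80000000
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        return CANONICAL_NAN          # NaN input
    if exponent == 0:
        return sign | INF_BITS        # +/-0.0 and denorms give same-signed Inf
    if sign:
        return CANONICAL_NAN          # -Inf; negative normals assumed NaN
    if exponent == 0xFF:
        return 0x00000000             # +Inf gives +0.0
    x = struct.unpack("<f", struct.pack("<I", bits))[0]
    return struct.unpack("<I", struct.pack("<f", 1.0 / math.sqrt(x)))[0]
```

The zero check comes before the sign check so that -0.0 and -denorm return -Inf rather than NaN, as the table requires.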

SIN:

Sine function. The input is a special 1.1.7.23 format 32b word generated by RRO.

  SIN(-denorm) gives -0.0
  SIN(-0.0)    gives -0.0
  SIN(+0.0)    gives +0.0
  SIN(+denorm) gives +0.0
  SIN(-Inf)    gives NaN
  SIN(+Inf)    gives NaN
  SIN(NaN)     gives NaN

SIN Accuracy: absolute |error| <= 2^-20.9 in quadrant 00.
Note that SIN cannot operate directly on a float; the input must first be preprocessed with RRO.

RCP64H:

FP64 reciprocal. The input is a 32b register containing the high 32b of an fp64 value (1.11.20). The output is also the high 32b of an fp64 value.

Output must be exactly 1.0 if the input is exactly 1.0.

  RCP64H(-denorm) gives -Inf
  RCP64H(-0.0)    gives -Inf
  RCP64H(+0.0)    gives +Inf
  RCP64H(+denorm) gives +Inf
  RCP64H(-Inf)    gives -0.0
  RCP64H(+Inf)    gives +0.0
  RCP64H(NaN)     gives NaN

RCP64H Accuracy: close to 2^-20
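
To make the high-word format concrete, the Python sketch below extracts the 1.11.20 word from an fp64 value and rebuilds a value from it. All names are hypothetical. `rcp64h_standin` is not the hardware algorithm: it simply truncates an exact reciprocal to the high word, and it covers only finite normal inputs. The Newton-Raphson step is a common refinement technique shown here as an assumption about how a roughly 2^-20 result might be used; the source does not describe it.

```python
import struct

def f64_hi(x: float) -> int:
    """High 32b of an fp64: sign, 11-bit exponent, top 20 mantissa bits."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 32

def hi_to_f64(hi: int) -> float:
    """Rebuild an fp64 from a high word; the low 32 mantissa bits are zero."""
    return struct.unpack("<d", struct.pack("<Q", (hi & 0xFFFFFFFF) << 32))[0]

def split_hi(hi: int):
    """Return the (sign, exponent, mantissa) fields of a 1.11.20 word."""
    return hi >> 31, (hi >> 20) & 0x7FF, hi & 0xFFFFF

def rcp64h_standin(hi: int) -> int:
    """Stand-in for finite normal inputs only (assumption, not hardware)."""
    return f64_hi(1.0 / hi_to_f64(hi))

def newton_step(x: float, y: float) -> float:
    """One Newton-Raphson refinement of y as an estimate of 1/x."""
    return y * (2.0 - x * y)

x = 3.0
y0 = hi_to_f64(rcp64h_standin(f64_hi(x)))  # coarse estimate from the high word
y1 = newton_step(x, y0)                    # error roughly squared
y2 = newton_step(x, y1)                    # near full fp64 precision
```

Each step roughly squares the relative error, which is why an estimate carrying only 20 mantissa bits can still be a useful starting point for fp64 work.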

RSQ64H:

FP64 reciprocal square root. The input is a 32b register containing the high 32b of an fp64 value (1.11.20). The output is also the high 32b of an fp64 value.

Output must be exactly 1.0 if the input is exactly 1.0.

  RSQ64H(-denorm) gives -Inf
  RSQ64H(-0.0)    gives -Inf
  RSQ64H(+0.0)    gives +Inf
  RSQ64H(+denorm) gives +Inf
  RSQ64H(-Inf)    gives NaN
  RSQ64H(+Inf)    gives +0.0
  RSQ64H(NaN)     gives NaN

RSQ64H Accuracy: close to 2^-20

SQRT:

Square root function. Output must be exactly 1.0 if the input is exactly 1.0.

  SQRT(-denorm) gives -0.0
  SQRT(-0.0)    gives -0.0
  SQRT(+0.0)    gives +0.0
  SQRT(+denorm) gives +0.0
  SQRT(-Inf)    gives NaN
  SQRT(+Inf)    gives +Inf
  SQRT(NaN)     gives NaN

SQRT Accuracy: close to 2^-20

MUFU : Multi Function Operation

Format:

Description:

Additional Information: