MUFU : Multi Function Operation

Format:

SPA 5.0:
        {@{!}Pg}   MUFU.op{.SAT}   Rd,{-}{|}Ra{|}   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

 .op:        { .COS, .EX2, .LG2, .RCP, .RSQ, .SIN, .RCP64H, .RSQ64H, .SQRT } 
             .COS - cosine                        // must pre-process with RRO
             .EX2 - exponential base 2            // must pre-process with RRO
             .LG2 - logarithm base 2
             .RCP - reciprocal
             .RSQ - reciprocal square root
             .SIN - sine                          // must pre-process with RRO
             .RCP64H - reciprocal                 // fp64: high 32b (1.11.20) input, same format output
             .RSQ64H - reciprocal square root     // fp64: high 32b (1.11.20) input, same format output
             .SQRT - square root

 .SAT:       saturate output to [+0.0, 1.0] (NaN is converted to +0.0f).
             Ignored for .RCP64H and .RSQ64H.
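
The .SAT behavior can be sketched in Python as below. This is an illustrative model only, not the hardware implementation; the function name saturate is hypothetical, and mapping -0.0 to +0.0 is an assumption inferred from the +0.0 lower bound.

```python
import math

def saturate(x: float) -> float:
    """Hypothetical model of .SAT: clamp to [+0.0, 1.0], NaN becomes +0.0."""
    if math.isnan(x):
        return 0.0          # NaN is converted to +0.0
    if x <= 0.0:
        return 0.0          # assumption: -0.0 and negatives map to +0.0
    if x >= 1.0:
        return 1.0          # includes +Inf
    return x
```

For example, saturate(-2.5) yields 0.0, saturate(3.0) yields 1.0, and values already inside the range pass through unchanged.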

.op must be specified. No default.

Denorm inputs and outputs are flushed to sign-preserving 0.0. The denorm range differs between fp32 and fp64.
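
As a rough illustration of sign-preserving flushing for fp32, the Python sketch below works on raw 32-bit patterns. The helper name flush_denorm_fp32 is hypothetical; the field positions are the standard IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 mantissa bits).

```python
def flush_denorm_fp32(bits: int) -> int:
    """Return the fp32 bit pattern with denorms replaced by a same-signed zero."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        # Denorm: keep only the sign bit, giving +0.0 or -0.0.
        return bits & 0x80000000
    return bits
```

Under this model 0x80000001 (the smallest negative denorm) becomes 0x80000000 (-0.0), while 0x00800000 (the smallest positive normal) is left untouched.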

Description:

Hardware multi-function op. Performs the operation specified by .op. SIN, COS, and EX2 operations must first be preprocessed with RRO.

Computing the reciprocal or reciprocal square root of double-precision fp64 numbers uses the .RCP64H or .RSQ64H modes (respectively) as a starting point in a longer sequence.

Exactly rounded reciprocals and reciprocal square roots can also be computed with a longer sequence of instructions.
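
To illustrate why a roughly 20-bit seed is a useful starting point, the Python sketch below refines a truncated reciprocal with Newton-Raphson steps, each of which roughly doubles the number of correct bits. Everything here is an assumption made for illustration: rcp64h_model is a hypothetical stand-in that truncates both input and result to their high 32 bits, and the actual instruction sequence used on hardware is not described in this document. Special cases (zero, Inf, NaN) are not handled.

```python
import struct

def high_word(x: float) -> int:
    """Upper 32 bits of the fp64 encoding of x (1.11.20 layout)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 32

def from_high_word(hi: int) -> float:
    """fp64 value whose upper 32 bits are hi and whose lower 32 bits are zero."""
    return struct.unpack("<d", struct.pack("<Q", (hi & 0xFFFFFFFF) << 32))[0]

def rcp64h_model(hi: int) -> int:
    """Hypothetical stand-in for MUFU.RCP64H (not the hardware algorithm)."""
    return high_word(1.0 / from_high_word(hi))

def refine_reciprocal(x: float, steps: int = 2) -> float:
    """Start from the truncated seed and apply Newton-Raphson steps."""
    y = from_high_word(rcp64h_model(high_word(x)))  # roughly 20 good bits
    for _ in range(steps):
        y = y * (2.0 - x * y)                       # y <- y * (2 - x*y)
    return y
```

With two steps the seed error of about 2^-20 shrinks to below fp64 rounding noise in this model; an exactly rounded result would still need a final correction step, which is omitted here.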

Additional Information:

For all operations below, input NaNs and operations that would result in NaN generate NVIDIA canonical NaN (0x7fff_ffff).
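
The canonical NaN pattern can be inspected with a short Python sketch. The helper names bits_to_float and is_nan_bits are hypothetical; the check itself is the standard IEEE 754 binary32 rule (exponent all ones, mantissa nonzero).

```python
import math
import struct

CANONICAL_NAN_BITS = 0x7FFFFFFF  # pattern quoted in the text above

def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def is_nan_bits(bits: int) -> bool:
    """True if the fp32 pattern encodes a NaN."""
    return (bits >> 23) & 0xFF == 0xFF and (bits & 0x7FFFFF) != 0

value = bits_to_float(CANONICAL_NAN_BITS)
```

Here value is a NaN with a positive sign bit and an all-ones mantissa.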

COS:

Cosine function. The input is a special 1.1.7.23 format 32b word generated by RRO.

  COS(-denorm) gives 1.0
  COS(-0.0)    gives 1.0
  COS(+0.0)    gives 1.0
  COS(+denorm) gives 1.0
  COS(-Inf)    gives NaN
  COS(+Inf)    gives NaN
  COS(NaN)     gives NaN

COS Accuracy: absolute |error| <= 2^-20.9 in quadrant 00.
Note that COS cannot operate directly on a float; the input must first be preprocessed with RRO.

EX2:

Exponential base 2 function. The input is a special 1.1.7.23 format 32b word generated by RRO.

  EX2(-denorm)           gives +1.0
  EX2(-0.0)              gives +1.0
  EX2(+0.0)              gives +1.0
  EX2(+denorm)           gives +1.0
  EX2(-Inf) or underflow gives +0.0
  EX2(+Inf) or overflow  gives +Inf
  EX2(NaN)               gives NaN

EX2 Accuracy: absolute |error| <= 2^-22.5 for fractional part.
Note that EX2 cannot operate directly on a float; the input must first be preprocessed with RRO.

LG2:

Logarithm base 2 function.

  LG2(-denorm) gives -Inf
  LG2(-0.0)    gives -Inf
  LG2(+0.0)    gives -Inf
  LG2(+denorm) gives -Inf
  LG2(-Inf)    gives NaN
  LG2(+Inf)    gives +Inf
  LG2(NaN)     gives NaN

LG2 Accuracy: absolute |error| <= 2^-22.6 for mantissa.

RCP:

Reciprocal function. Output must be exactly 1.0 if the input is exactly 1.0.

  RCP(-denorm) gives -Inf
  RCP(-0.0)    gives -Inf
  RCP(+0.0)    gives +Inf
  RCP(+denorm) gives +Inf
  RCP(-Inf)    gives -0.0
  RCP(+Inf)    gives +0.0
  RCP(NaN)     gives NaN

RCP Accuracy: absolute |error| <= 2^-23.0 over the range 1.0-2.0.
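
The RCP special cases above can be collected into a small Python reference-style model. This is a sketch under stated assumptions: the name rcp_model is hypothetical, a double-precision division stands in for the hardware approximation, and the fp32 denorm threshold of 2^-126 is applied to both input and output per the flushing rule stated earlier.

```python
import math

FP32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal fp32 value

def rcp_model(x: float) -> float:
    """Hypothetical model of the MUFU.RCP special cases (not bit-accurate)."""
    if math.isnan(x):
        return math.nan
    if math.isinf(x):
        return math.copysign(0.0, x)         # RCP(+/-Inf) gives +/-0.0
    if abs(x) < FP32_MIN_NORMAL:
        return math.copysign(math.inf, x)    # +/-0.0 and denorms give +/-Inf
    result = 1.0 / x                         # stand-in for the approximation
    if abs(result) < FP32_MIN_NORMAL:
        return math.copysign(0.0, result)    # denorm output flushed to zero
    return result
```

Note that rcp_model(1.0) returns exactly 1.0, matching the requirement stated for RCP.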

RSQ:

Reciprocal square root function. Output must be exactly 1.0 if the input is exactly 1.0.

  RSQ(-denorm) gives -Inf
  RSQ(-0.0)    gives -Inf
  RSQ(+0.0)    gives +Inf
  RSQ(+denorm) gives +Inf
  RSQ(-Inf)    gives NaN
  RSQ(+Inf)    gives +0.0
  RSQ(NaN)     gives NaN

RSQ Accuracy: absolute |error| <= 2^-22.4 over the range 1.0-4.0.

SIN:

Sine function. The input is a special 1.1.7.23 format 32b word generated by RRO.

  SIN(-denorm) gives -0.0
  SIN(-0.0)    gives -0.0
  SIN(+0.0)    gives +0.0
  SIN(+denorm) gives +0.0
  SIN(-Inf)    gives NaN
  SIN(+Inf)    gives NaN
  SIN(NaN)     gives NaN

SIN Accuracy: absolute |error| <= 2^-20.9 in quadrant 00.
Note that SIN cannot operate directly on a float; the input must first be preprocessed with RRO.

RCP64H:

FP64 Reciprocal. Input is 32b register containing the high 32b of a fp64 value (1.11.20). Output is also the high 32b of a fp64 value.

Output must be exactly 1.0 if the input is exactly 1.0.

  RCP64H(-denorm) gives -Inf
  RCP64H(-0.0)    gives -Inf
  RCP64H(+0.0)    gives +Inf
  RCP64H(+denorm) gives +Inf
  RCP64H(-Inf)    gives -0.0
  RCP64H(+Inf)    gives +0.0
  RCP64H(NaN)     gives NaN

RCP64H Accuracy: close to 2^-20
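
The 1.11.20 layout of the RCP64H operand can be illustrated with the Python sketch below, which splits the high 32 bits of an fp64 value into its sign, biased exponent, and the top 20 mantissa bits. The function name fp64_high_fields is hypothetical; the bit positions follow the standard IEEE 754 binary64 encoding.

```python
import struct

def fp64_high_fields(x: float):
    """Return (sign, biased exponent, top 20 mantissa bits) of x's high word."""
    hi = struct.unpack("<Q", struct.pack("<d", x))[0] >> 32
    sign = hi >> 31                 # 1 bit
    exponent = (hi >> 20) & 0x7FF   # 11 bits, bias 1023
    mantissa_hi = hi & 0xFFFFF      # 20 bits; the low 32 mantissa bits are dropped
    return sign, exponent, mantissa_hi
```

For instance, 1.0 decodes to (0, 1023, 0) and -1.5 decodes to (1, 1023, 0x80000).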

RSQ64H:

FP64 Reciprocal square root. Input is 32b register containing the high 32b of a fp64 value (1.11.20). Output is also the high 32b of a fp64 value.

Output must be exactly 1.0 if the input is exactly 1.0.

  RSQ64H(-denorm) gives -Inf
  RSQ64H(-0.0)    gives -Inf
  RSQ64H(+0.0)    gives +Inf
  RSQ64H(+denorm) gives +Inf
  RSQ64H(-Inf)    gives NaN
  RSQ64H(+Inf)    gives +0.0
  RSQ64H(NaN)     gives NaN

RSQ64H Accuracy: close to 2^-20

SQRT:

Square root function. Output must be exactly 1.0 if the input is exactly 1.0.

  SQRT(-denorm) gives -0.0
  SQRT(-0.0)    gives -0.0
  SQRT(+0.0)    gives +0.0
  SQRT(+denorm) gives +0.0
  SQRT(-Inf)    gives NaN
  SQRT(+Inf)    gives +Inf
  SQRT(NaN)     gives NaN

SQRT Accuracy: close to 2^-20

Examples:

MUFU.COS.SAT      R0,R1;     // R1 must hold the RRO-preprocessed input
MUFU.RCP64H       R3,R1;     // R1 holds the high 32b of an fp64 value
