F2F

SPA 5.0:
        {@{!}Pg}   F2F{.FTZ}{.dstfmt.srcfmt}{.rnd}{.SAT}   Rd{.CC},{-}{|}Sb{.extract}{|}   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

 .FTZ         fp32 denorm inputs and outputs are flushed to sign-preserving 0.0
              (as long as neither .dstfmt nor .srcfmt is FP64).

 .srcfmt:     { .F16, .F32*, .F64, INVALID } 
 .dstfmt:     { .F16, .F32*, .F64, INVALID } 
              ----------------------------------------
                 .SRCFMT     .DSTFMT       Status
              ----------------------------------------
                  .F16        .F16           OK
                  .F16        .F32           OK
                  .F16        .F64        Illegal
                  .F32        .F16           OK
                  .F32        .F32           OK
                  .F32        .F64           OK
                  .F64        .F16        Illegal
                  .F64        .F32           OK
                  .F64        .F64           OK
              ----------------------------------------

 .rnd:        if (.srcfmt < .dstfmt)
                { INVALID } 
              else if (.srcfmt == .dstfmt)
                { .PASS*, .ROUND, .FLOOR, .CEIL, .TRUNC } 
              else
                { .RN*, .RM, .RP, .RZ } 

 .SAT:        Saturate the result to the range [+0.0, 1.0]. NaN is converted to +0.0.
              Only supported if neither .srcfmt nor .dstfmt is FP64.

 .CC:         Write condition codes

 .extract:    {.H0,.H1}
              Extract the .F16 source from the bottom (.H0) or top (.H1) 16 bits of Sb.
              Only legal if .srcfmt == .F16.
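
The legality rules in the modifier list above (format table, .rnd sets, .SAT and .extract restrictions) can be sketched as a small checker. This is only an illustration in Python, not assembler source: the names `allowed_rnd` and `check_f2f` are hypothetical, and format ordering is assumed to mean bit width.

```python
# Hypothetical helpers; the tables mirror the modifier list in this section.
WIDTH = {"F16": 16, "F32": 32, "F64": 64}
ILLEGAL_PAIRS = {("F16", "F64"), ("F64", "F16")}  # (srcfmt, dstfmt) marked Illegal


def allowed_rnd(srcfmt, dstfmt):
    """Return the .rnd modifiers accepted for a format pair, default first."""
    if srcfmt not in WIDTH or dstfmt not in WIDTH:
        raise ValueError("unknown format")
    if (srcfmt, dstfmt) in ILLEGAL_PAIRS:
        raise ValueError("illegal .srcfmt/.dstfmt combination")
    if WIDTH[srcfmt] < WIDTH[dstfmt]:
        return ()  # widening: any .rnd modifier is INVALID
    if srcfmt == dstfmt:
        return ("PASS", "ROUND", "FLOOR", "CEIL", "TRUNC")
    return ("RN", "RM", "RP", "RZ")  # narrowing


def check_f2f(srcfmt="F32", dstfmt="F32", rnd=None, sat=False, extract=None):
    """Validate one modifier combination and return the effective .rnd (or None)."""
    modes = allowed_rnd(srcfmt, dstfmt)
    if rnd is not None and rnd not in modes:
        raise ValueError("rounding modifier not allowed for this format pair")
    if sat and "F64" in (srcfmt, dstfmt):
        raise ValueError(".SAT is not supported with FP64")
    if extract is not None and srcfmt != "F16":
        raise ValueError(".extract is only legal when .srcfmt is .F16")
    if rnd is not None:
        return rnd
    return modes[0] if modes else None
```

For example, `check_f2f("F64", "F32")` yields the default `"RN"`, while `check_f2f("F16", "F64")` raises because that pair is marked Illegal in the table.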
              

The following source Sb combinations are allowed for FP16 input:
    Sb(register)
    Sb(constant with immediate address)
    Sb(((#IMM20 & 0x0000_ffff)<<16) | (#IMM20 & 0x0000_ffff))

The following source Sb combinations are allowed for FP32 input:
    Sb(register)
    Sb(constant with immediate address)
    Sb(#IMM20<<12)

The following source Sb combinations are allowed for FP64 input:
    Sb(even aligned register)
    Sb(64-bit constant with immediate address:
       the upper 32 bits are taken from the constant, the lower 3 address bits
       must be 0x4, and the lower 32 bits of Sb are always 0)
    Sb(#IMM20<<44)
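
The three immediate forms listed above differ only in how the 20-bit field is placed into the source operand. A minimal Python sketch follows; the function name `expand_imm20` is hypothetical, and masking the input to 20 bits is an assumption of this sketch.

```python
def expand_imm20(imm20, srcfmt):
    """Expand a 20-bit immediate into the Sb bit pattern for the given .srcfmt."""
    imm20 &= 0xFFFFF  # assumption: only the low 20 bits are meaningful
    if srcfmt == "F16":
        lo = imm20 & 0xFFFF
        return (lo << 16) | lo  # same half replicated into H1 and H0
    if srcfmt == "F32":
        return imm20 << 12      # immediate supplies the top 20 bits of 32
    if srcfmt == "F64":
        return imm20 << 44      # immediate supplies the top 20 bits of 64
    raise ValueError("unknown format")
```

With this placement, the value 1.0 is encodable in every format: `0x3C00` for .F16, `0x3F800` for .F32, and `0x3FF00` for .F64.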

The floating point contents of source Sb are converted/moved into destination Rd.

  .F16 is 1.5.10 format              // denorms are supported/don't-care in all APIs
  .F32 is 1.8.23 format              // denorms NOT supported in DirectX
  .F64 is 1.11.52 format             // denorms are supported/don't-care in all modes
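
The sign.exponent.mantissa layouts above, together with the .FTZ rule for fp32 denorms, can be illustrated on raw bit patterns. This is a sketch only; `split_fields`, `is_denorm`, and `ftz_f32` are hypothetical names, and a denorm is taken to be a value with a zero exponent field and a nonzero mantissa.

```python
# (exponent bits, mantissa bits) for each format; one sign bit is implied.
LAYOUT = {"F16": (5, 10), "F32": (8, 23), "F64": (11, 52)}


def split_fields(bits, fmt):
    """Split a raw bit pattern into (sign, exponent, mantissa) fields."""
    exp_bits, man_bits = LAYOUT[fmt]
    sign = (bits >> (exp_bits + man_bits)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    return sign, exp, man


def is_denorm(bits, fmt):
    """True when the exponent field is 0 and the mantissa is nonzero."""
    _, exp, man = split_fields(bits, fmt)
    return exp == 0 and man != 0


def ftz_f32(bits):
    """Flush an fp32 denorm to zero, keeping only its sign bit."""
    return bits & 0x80000000 if is_denorm(bits, "F32") else bits
```

For instance, `ftz_f32(0x80000001)` gives `0x80000000` (negative zero), while a normal value such as `0x3F800000` passes through unchanged.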

When the destination format is .F16, the result is written to the low 16 bits of Rd and the top 16 bits are padded with 0.
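
The 16-bit handling on both sides of the instruction (source selection via .H0/.H1, and zero padding of an .F16 destination) amounts to simple masking and shifting. A small Python sketch, with hypothetical helper names:

```python
def extract_f16(reg32, sel):
    """Select the .F16 source half from a 32-bit register value."""
    if sel == "H0":
        return reg32 & 0xFFFF          # bottom 16 bits
    if sel == "H1":
        return (reg32 >> 16) & 0xFFFF  # top 16 bits
    raise ValueError("sel must be 'H0' or 'H1'")


def write_f16(half_bits):
    """Form the 32-bit Rd value for an .F16 result: top 16 bits are 0."""
    return half_bits & 0xFFFF
```

Given a register holding `0x3C004000`, `.H1` selects `0x3C00` (1.0) and `.H0` selects `0x4000` (2.0).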

The conversions are as follows:

Optional absolute value ({|}) and/or negate ({-}) of Sb. If both are specified, the absolute value is applied first, so the result is -|Sb|.
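
The ordering matters because applying negate first and then absolute value would always produce a non-negative result. A sketch on raw bit patterns follows; it assumes both operations act only on the sign bit (which this section does not state explicitly), and `apply_abs_neg` is a hypothetical name.

```python
def apply_abs_neg(bits, width, absolute=False, negate=False):
    """Apply |x| first, then -x, by manipulating only the sign bit."""
    sign_mask = 1 << (width - 1)
    if absolute:
        bits &= ~sign_mask  # clear the sign bit
    if negate:
        bits ^= sign_mask   # flip the sign bit
    return bits & ((1 << width) - 1)
```

With both modifiers set, an fp32 input of either `0x3F800000` (1.0) or `0xBF800000` (-1.0) becomes `0xBF800000` (-1.0).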

Conversion modes (.rnd):

If the source format has fewer mantissa bits than the destination format (encoding fields IR[1] and RND[2] ignored):

         The destination mantissa is padded with 0's at the lsb end.

If the source format is the same as the destination format and encoding field IR[1] = PASS:

         Pass-through (encoding field RND[2] ignored).

If the source format is the same as the destination format and encoding field IR[1] = RND:

         0: .ROUND - Integer Round to the nearest even       // round
         1: .FLOOR - Integer Round towards -Infinity         // floor
         2: .CEIL  - Integer Round towards +Infinity         // ceiling
         3: .TRUNC - Integer Round towards 0                 // truncate
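
The four same-format modes above round a value to an integral floating-point value. A Python sketch of the intended arithmetic follows. It works on Python floats rather than on the hardware encodings, and it assumes that NaN and infinities pass through and that the sign of a zero result follows the input; neither behavior is specified in this section.

```python
import math


def round_to_integral(x, mode):
    """Round x to an integral float using .ROUND/.FLOOR/.CEIL/.TRUNC."""
    if math.isnan(x) or math.isinf(x):
        return x  # assumption: special values are passed through
    if mode == "ROUND":
        r = float(round(x))       # Python's round() resolves ties to even
    elif mode == "FLOOR":
        r = float(math.floor(x))  # towards -Infinity
    elif mode == "CEIL":
        r = float(math.ceil(x))   # towards +Infinity
    elif mode == "TRUNC":
        r = float(math.trunc(x))  # towards 0
    else:
        raise ValueError("unknown mode")
    return math.copysign(r, x)    # assumption: a zero result keeps the input sign
```

The tie cases show the difference from round-half-up: 2.5 rounds to 2.0 and 3.5 rounds to 4.0 under .ROUND.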

If the source format has more mantissa bits than the destination format: (encoding field IR[1] ignored)

         0: .RN - Mantissa lsb round to nearest even
         1: .RM - Mantissa lsb round towards -Infinity 
         2: .RP - Mantissa lsb round towards +Infinity
         3: .RZ - Mantissa lsb round towards 0
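
The four narrowing modes differ only in when the kept mantissa is incremented after the low bits are dropped. The sketch below shows that decision on a sign-magnitude integer. It is a simplified model: `round_drop_bits` is a hypothetical name, and exponent adjustment on mantissa carry-out, overflow, and denormal results are deliberately left out.

```python
def round_drop_bits(sign, mag, k, mode):
    """Drop the k low bits of magnitude `mag` (sign: 0 = positive, 1 = negative)."""
    kept = mag >> k
    rem = mag & ((1 << k) - 1)  # the discarded bits
    if rem == 0:
        return kept             # exact: every mode agrees
    half = 1 << (k - 1)
    if mode == "RN":            # nearest, ties to even
        up = rem > half or (rem == half and (kept & 1) == 1)
    elif mode == "RM":          # towards -Infinity: magnitude grows if negative
        up = sign == 1
    elif mode == "RP":          # towards +Infinity: magnitude grows if positive
        up = sign == 0
    elif mode == "RZ":          # towards 0: always truncate
        up = False
    else:
        raise ValueError("unknown mode")
    return kept + 1 if up else kept
```

For an .F32 to .F16 conversion, `k` would be 13 (23 mantissa bits down to 10); for .F64 to .F32 it would be 29.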

Optional saturate (.SAT) to the range [+0.0, 1.0]. NaN is converted to +0.0.
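
The saturate step is a clamp with one special case for NaN. A minimal Python sketch follows; mapping -0.0 to +0.0 is an assumption here, based on the lower bound being written as +0.0.

```python
import math


def saturate(x):
    """Clamp x to [+0.0, 1.0]; NaN becomes +0.0."""
    if math.isnan(x):
        return 0.0
    if x <= 0.0:
        return 0.0  # covers negatives, -Infinity, and -0.0 (assumed to become +0.0)
    if x >= 1.0:
        return 1.0  # covers +Infinity
    return x
```

Values already inside the range, such as 0.25, are returned unchanged.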

F2F : Floating Point To Floating Point Conversion

Format:

Description:

Examples: