VMAD : Integer Byte/Short Multiply Add

Format

SPA 5.0:
        {@{!}Pg}   VMAD{.safmt32.sbfmt32}{.PO}{.scale}{.SAT}       Rd{.CC}, {-}Ra{.partselA}, {-}Rb{.partselB}, {-}Rc   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   VMAD{.safmt32.sbfmt8_16}{.PO}{.scale}{.SAT}     Rd{.CC}, {-}Ra{.partselA}, {-}Rb{.partselB}, {-}Rc   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   VMAD{.safmt8_16.sbfmt32}{.PO}{.scale}{.SAT}     Rd{.CC}, {-}Ra{.partselA}, {-}Rb{.partselB}, {-}Rc   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   VMAD{.safmt8_16.sbfmt8_16}{.PO}{.scale}{.SAT}   Rd{.CC}, {-}Ra{.partselA}, {-}Rb{.partselB}, {-}Rc   {&req_6}                     {?sched}   ;   

        {@{!}Pg}   VMAD{.safmt32.ifmt}{.PO}{.scale}{.SAT}          Rd{.CC}, {-}Ra{.partselA}, {-}#imm16,        {-}Rc   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   VMAD{.safmt8_16.ifmt}{.PO}{.scale}{.SAT}        Rd{.CC}, {-}Ra{.partselA}, {-}#imm16,        {-}Rc   {&req_6}                     {?sched}   ;   

 .safmt32:     { .U32, .S32* } 
 .safmt8_16:   { .U16, .S16, .U8, .S8 } 
 .sbfmt32:     { .U32, .S32* } 
 .sbfmt8_16:   { .U16, .S16, .U8, .S8 } 
 .ifmt:        { .U16, .S16*  }


              Source formats. If either safmt or sbfmt is 32 bits (which is the default),
              then VMAD is a decoupled op. If both safmt and sbfmt/ifmt are 8 or 16 bits,
              then VMAD is a coupled op.

 .PO:         Plus one (used in computing averages)

 .scale:     { .PASS*, .SHR_7, .SHR_15 }
             normalization shift count after the multiply/accumulate stage:
             .PASS  :  shr = tmp
             .SHR_7 :  shr = tmp >> 7  // sign-extended if final result is signed,
                                          zero-extended otherwise
             .SHR_15:  shr = tmp >> 15 // sign-extended if final result is signed,
                                          zero-extended otherwise

 .partselA:  if (.U8|.S8)   { .B0*, .B1, .B2, .B3 } 
             if (.U16|.S16) { .H0*, .H1 }

 .partselB:  if (.U8|.S8)   { .B0*, .B1, .B2, .B3 } 
             if (.U16|.S16) { .H0*, .H1 }
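The part selects pick a byte or halfword lane out of the 32-bit source register and widen it according to the source format. A minimal C sketch of this extraction (the helper names are illustrative, not part of the ISA):

```c
#include <stdint.h>

/* Hypothetical helper: extract the 8-bit lane selected by .B0-.B3 and
 * widen it to 32 bits, sign- or zero-extended per the source format. */
static int32_t extract_byte(uint32_t r, int lane, int is_signed)
{
    uint8_t b = (uint8_t)(r >> (lane * 8));
    return is_signed ? (int32_t)(int8_t)b : (int32_t)b;
}

/* Hypothetical helper: same for the 16-bit lanes selected by .H0/.H1. */
static int32_t extract_half(uint32_t r, int lane, int is_signed)
{
    uint16_t h = (uint16_t)(r >> (lane * 16));
    return is_signed ? (int32_t)(int16_t)h : (int32_t)h;
}
```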

Description

Multiply and add sources into destination.

The set of datapaths VMAD can be dispatched to depends on the widths of the Ra/Rb operands.

Basic math operation:

Depending on the signs of the Ra and Rb/#imm16 operands, and on the negates for Ra/Rb/Rc/#imm16, the following combinations of operands are supported for VMAD:

(Here the input operands Ra/Rb/#imm16 are extended to full 32 bits, even if they were specified as U16/S16 or U8/S8.)

    tmp =  (Ra  * Rb ) + Rc
    tmp =  (Ra  *#imm) + Rc

    tmp =  (U32 * U32) + U32   // intermediate unsigned; final unsigned
    tmp = -(U32 * U32) + S32   // intermediate   signed; final   signed
    tmp =  (U32 * U32) - U32   // intermediate unsigned; final   signed
    tmp =  (U32 * S32) + S32   // intermediate   signed; final   signed
    tmp = -(U32 * S32) + S32   // intermediate   signed; final   signed
    tmp =  (U32 * S32) - S32   // intermediate   signed; final   signed
    tmp =  (S32 * U32) + S32   // intermediate   signed; final   signed
    tmp = -(S32 * U32) + S32   // intermediate   signed; final   signed
    tmp =  (S32 * U32) - S32   // intermediate   signed; final   signed
    tmp =  (S32 * S32) + S32   // intermediate   signed; final   signed
    tmp = -(S32 * S32) + S32   // intermediate   signed; final   signed
    tmp =  (S32 * S32) - S32   // intermediate   signed; final   signed
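The multiply/accumulate stage above can be sketched in C. This is a sketch only: the helper name is hypothetical, the operands are assumed already widened to 32 bits and then to 64-bit signed values, and a 64-bit tmp only approximates the hardware's wider accumulator (the largest U32 * U32 products need more than 64 bits).

```c
#include <stdint.h>

/* Sketch of the multiply/accumulate stage (hypothetical helper).
 * The real tmp is wide enough to never overflow; int64_t only
 * approximates that for the largest U32*U32 products. */
static int64_t vmad_tmp(int64_t a, int64_t b, int64_t c,
                        int negate_product, int negate_c)
{
    int64_t prod = a * b;
    if (negate_product)          /* psign == 2: -(A*B) + C */
        prod = -prod;
    return negate_c ? prod - c   /* psign == 1:  (A*B) - C */
                    : prod + c;  /* psign == 0:  (A*B) + C */
}
```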

Note: SASS allows optional negates on both the Ra and Rb/#imm16 source operands; however, this is implemented by negating the result of the product (and only if one, but not both, of the sources is negated). Since Ra and Rb/#imm16 are not actually negated, VMAD will not detect an overflow in the case where negating the source operand would have caused one.

The negates on Ra, Rb/#imm16, and Rc, together with the .PO option, are encoded in the 2-bit psign field as follows:

  psign   Operation   Mapping           Comments
  -----   ---------   ---------------   --------
    0      (A*B)+C     (A*B) +  C + 0
    1      (A*B)-C     (A*B) + ~C + 1
    2     -(A*B)+C    ~(A*B) +  C + 1
    3        .PO       (A*B) +  C + 1   Plus One

Note: A negate of both Ra and Rb/#imm16 will have no effect.
Note: It is illegal to negate both Rc and the result of the product.
Note: It is illegal to negate any source operand in .PO mode.
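The Mapping column follows from the two's-complement identity ~x + 1 == -x. A literal C rendering of the four psign encodings (hypothetical helper, taking the already-formed product A*B):

```c
#include <stdint.h>

/* The four psign encodings, evaluated literally from the Mapping
 * column (hypothetical helper; ab is the already-formed product A*B). */
static int64_t vmad_psign(int64_t ab, int64_t c, unsigned psign)
{
    switch (psign) {
    case 0:  return ab + c;        /*  (A*B) +  C + 0            */
    case 1:  return ab + ~c + 1;   /*  (A*B) + ~C + 1 == AB - C  */
    case 2:  return ~ab + c + 1;   /* ~(A*B) +  C + 1 == C - AB  */
    default: return ab + c + 1;    /* .PO:  (A*B) + C + 1        */
    }
}
```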

The tmp variable will contain enough bits to represent all possible values after the multiply/accumulation. No overflow of the tmp variable will occur.

Sign:

The intermediate sign of the product A*B will be unsigned if both Ra and Rb/#imm16 are unsigned and the product is not negated; else the intermediate sign will be signed.

The sign of Rc is the same as the intermediate sign of the multiply.
If Rc is signed, it must be properly sign-extended to 64 bits before being added to or subtracted from the product.

The final sign will be unsigned if the intermediate sign is unsigned and Rc is added rather than subtracted; else the final sign will be signed.

Optional Shift Right:

There is a normalization shift after the multiply/accumulate stage:

  2'b00 = .PASS  :  shr = tmp
  2'b01 = .SHR_7 :  shr = tmp >> 7  // sign-extended if final result is signed,
                                       zero-extended otherwise
  2'b10 = .SHR_15:  shr = tmp >> 15 // sign-extended if final result is signed,
                                       zero-extended otherwise
  2'b11 = illegal

The shr variable will contain enough bits to represent all possible values after the normalization shift. No overflow of the shr variable will occur.
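A C sketch of the normalization shift (hypothetical helper; it assumes >> on a negative signed value is an arithmetic shift, which C leaves implementation-defined but which holds on common compilers):

```c
#include <stdint.h>

/* Normalization shift after the multiply/accumulate stage: arithmetic
 * (sign-extending) shift when the final result is signed, logical
 * (zero-extending) shift otherwise. Hypothetical helper. */
static int64_t vmad_shr(int64_t tmp, unsigned count, int final_signed)
{
    if (final_signed)
        return tmp >> count;                  /* sign-extended */
    return (int64_t)((uint64_t)tmp >> count); /* zero-extended */
}
```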

Saturation:

There is an optional saturate of the shifted output. The input to the saturation stage is the result of the normalization shift:

  result[31:0] = .SAT ? CLAMP(shr, RANGE_MAX, RANGE_MIN) : shr[31:0]
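A C sketch of the saturation step. The source does not spell out RANGE_MAX/RANGE_MIN; the bounds below are assumed from the final-sign rules to be [INT32_MIN, INT32_MAX] when the final result is signed and [0, UINT32_MAX] when unsigned:

```c
#include <stdint.h>

/* Optional saturate of the shifted output (hypothetical helper).
 * Clamp bounds assumed: signed -> [INT32_MIN, INT32_MAX],
 * unsigned -> [0, UINT32_MAX]. Returns the 32-bit result pattern. */
static uint32_t vmad_sat(int64_t shr, int final_signed)
{
    if (final_signed) {
        if (shr > INT32_MAX) return (uint32_t)INT32_MAX;
        if (shr < INT32_MIN) return (uint32_t)INT32_MIN;
    } else {
        if (shr > (int64_t)UINT32_MAX) return UINT32_MAX;
        if (shr < 0) return 0;
    }
    return (uint32_t)shr;  /* in range: pass low 32 bits through */
}
```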

Second stage operation:

There is no second stage operation (unlike the other V* instructions, which do byte/short extraction).

Examples:

VMAD.S16.U16.SAT         R0, R1, R2, R3;
VMAD.U16.U8.SHR_15.SAT   R0, R1, R2, R3;
