PRMT : Permute Register Pair

Format:

SPA 5.0:
        {@{!}Pg}   PRMT{.mode}   Rd,Ra,Sb,Sc   {&req_6}   {?sched}   ;   

.mode:      { .IDX*, .F4E, .B4E, .RC8, .ECL, .ECR, .RC16, INVALID } 
            Permutation mode (see Description section for details)



PRMT allows the following sources Sb,Sc:
    Sb(register),                         Sc(register)
    Sb(constant with immediate address),  Sc(register)
    Sb(#Imm20),                           Sc(register)
    Sb(register),                         Sc(constant with immediate address)

Description:

Video and compute workloads frequently need to align and extract byte and short data. To accelerate this, PRMT picks four arbitrary bytes from two 32b registers and reassembles them into a 32b destination register.

The generic form (.IDX mode) of the permute control consists of four 4-bit selection values. The bytes in the two source registers are numbered from 0 to 7: {Sc, Ra} = {{B7, B6, B5, B4}, {B3, B2, B1, B0}}. For each byte in the target register, a 4-bit selection value is defined.

The 3 lsbs of each selection value specify which of the 8 source bytes is moved into the target position. The msb specifies whether the byte value is copied (msb=0) or whether its sign (the msb of the selected byte) is replicated over all 8 bits of the target position (msb=1). Note that sign replication is performed only in the generic form (.IDX mode).

Thus, the four 4-bit values fully specify an arbitrary byte permute, as a 16b permute code.

--------------------------------------------------------------------------------------
Control                dest byte 3  dest byte 2  dest byte 1  dest byte 0
                       src select   src select   src select   src select
--------------------------------------------------------------------------------------
Index                    Sb[15:12]     Sb[11:8]      Sb[7:4]      Sb[3:0]
--------------------------------------------------------------------------------------
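The .IDX behaviour described above can be sketched in software. The following is a minimal illustrative model, not vendor code: the function name `prmt_idx` and its argument order are assumptions made for this example, and registers are modelled as plain 32-bit integers.

```python
def prmt_idx(ra, sb, sc):
    """Illustrative model of PRMT.IDX (hypothetical helper, not an official API)."""
    # {Sc, Ra} forms the 8-byte source: B7..B4 come from Sc, B3..B0 from Ra.
    src = ((sc & 0xFFFFFFFF) << 32) | (ra & 0xFFFFFFFF)
    result = 0
    for i in range(4):                       # destination byte i (0 = least significant)
        sel = (sb >> (4 * i)) & 0xF          # 4-bit selection value for this byte
        byte = (src >> (8 * (sel & 0x7))) & 0xFF   # 3 lsbs pick one of B0..B7
        if sel & 0x8:                        # msb set: replicate the byte's sign bit
            byte = 0xFF if (byte & 0x80) else 0x00
        result |= byte << (8 * i)
    return result
```

Under this model, a permute code of 0x3210 returns Ra unchanged, and 0x8880 sign-extends byte B0 to a full 32b value (for Ra = 0x00000080 the result is 0xFFFFFF80).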

The more specialized forms of the permute control (the modes other than .IDX) use only the two lsbs of Sb (which is typically an address pointer) to select the byte extraction pattern.

--------------------------------------------------------------------------------------
Control             selector   dest byte 3  dest byte 2  dest byte 1  dest byte 0
                     Sb[1:0]      src          src          src          src 
--------------------------------------------------------------------------------------
Forward4 extract        0          3            2            1            0
                        1          4            3            2            1
                        2          5            4            3            2
                        3          6            5            4            3

Backward4 extract       0          5            6            7            0
                        1          6            7            0            1
                        2          7            0            1            2
                        3          0            1            2            3

Replicate.8             0          0            0            0            0         
                        1          1            1            1            1
                        2          2            2            2            2
                        3          3            3            3            3

EdgeClampL              0          3            2            1            0
                        1          3            2            1            1
                        2          3            2            2            2
                        3          3            3            3            3

EdgeClampR              0          0            0            0            0
                        1          1            1            1            0
                        2          2            2            1            0
                        3          3            2            1            0

Replicate.16            0          1            0            1            0
                        1          3            2            3            2
                        2          1            0            1            0
                        3          3            2            3            2
--------------------------------------------------------------------------------------
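The table above can be transcribed directly into a lookup-based model. This is a sketch under stated assumptions: the mapping of table rows to mode mnemonics (Forward4 extract = F4E, Backward4 extract = B4E, Replicate.8 = RC8, EdgeClampL = ECL, EdgeClampR = ECR, Replicate.16 = RC16) is inferred from the order of the .mode list, and the names `MODE_TABLE` and `prmt_mode` are hypothetical.

```python
# Source byte index for (dest byte 3, 2, 1, 0), indexed by Sb[1:0].
# Row-to-mnemonic mapping is an assumption inferred from the .mode list.
MODE_TABLE = {
    "F4E":  [(3, 2, 1, 0), (4, 3, 2, 1), (5, 4, 3, 2), (6, 5, 4, 3)],
    "B4E":  [(5, 6, 7, 0), (6, 7, 0, 1), (7, 0, 1, 2), (0, 1, 2, 3)],
    "RC8":  [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3)],
    "ECL":  [(3, 2, 1, 0), (3, 2, 1, 1), (3, 2, 2, 2), (3, 3, 3, 3)],
    "ECR":  [(0, 0, 0, 0), (1, 1, 1, 0), (2, 2, 1, 0), (3, 2, 1, 0)],
    "RC16": [(1, 0, 1, 0), (3, 2, 3, 2), (1, 0, 1, 0), (3, 2, 3, 2)],
}

def prmt_mode(mode, ra, sb, sc):
    """Illustrative model of the non-.IDX modes (hypothetical helper)."""
    src = ((sc & 0xFFFFFFFF) << 32) | (ra & 0xFFFFFFFF)   # {Sc, Ra} = B7..B0
    picks = MODE_TABLE[mode][sb & 0x3]    # only the two lsbs of Sb are used
    result = 0
    for b in picks:                       # most significant destination byte first
        result = (result << 8) | ((src >> (8 * b)) & 0xFF)
    return result
```

With Ra = 0x33221100 and Sc = 0x77665544, this model gives 0x44332211 for F4E with Sb[1:0] = 1, which corresponds to reading an unaligned 32b word starting at byte offset 1 of the register pair.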

Examples:

PRMT.ECR  R0,R1,R2,R3;            
PRMT.IDX  R0,R1,0x6420,R3;            
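To read an immediate permute code such as the 0x6420 in the second example, it can help to decode it into source byte names. The helper below is a hypothetical sketch (the name `describe_idx` is not part of any toolchain) that follows the .IDX field layout given in the Description.

```python
def describe_idx(code):
    """List the source of destination bytes 3..0 for a 16b .IDX permute code.

    Hypothetical decoding helper for illustration only.
    """
    out = []
    for i in (3, 2, 1, 0):                 # dest byte 3 uses code[15:12], and so on
        sel = (code >> (4 * i)) & 0xF
        name = "B%d" % (sel & 0x7)         # 3 lsbs: source byte index
        out.append("sign(%s)" % name if sel & 0x8 else name)
    return out
```

For 0x6420 this yields B6, B4, B2, B0 for destination bytes 3 through 0, so in the second example R0 gathers the even-numbered bytes of {R3, R1}. The first example (PRMT.ECR) cannot be decoded statically in this way, since its pattern depends on the run-time value of R2[1:0].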
