SPA 5.0:
{@{!}Pg}  PRMT{.mode}  Rd,Ra,Sb,Sc  {&req_6}  {?sched}  ;
.mode: { .IDX*, .F4E, .B4E, .RC8, .ECL, .ECR, .RC16, INVALID }
Permutation mode (see Description section for details)
PRMT allows the following sources Sb,Sc:
Sb(register), Sc(register)
Sb(constant with immediate address), Sc(register)
Sb(#Imm20), Sc(register)
Sb(register), Sc(constant with immediate address)
Video and compute workloads frequently need to align and extract byte and short (16b) data. To accelerate this, PRMT picks four arbitrary bytes from two 32b source registers (Ra and Sc) and reassembles them into a 32b destination register.
The generic form (.IDX mode) of the permute control consists of four 4-bit selection values. The bytes in the two source registers are numbered from 0 to 7: {Sc, Ra} = {{B7, B6, B5, B4}, {B3, B2, B1, B0}}. For each byte in the target register, a 4-bit selection value is defined.
The 3 lsbs of the selection value specify which of the 8 source bytes is moved into the target position. The msb selects whether the byte value is copied or its sign (the msb of the byte) is replicated across all 8 bits of the target position (sign extension of the byte value): msb=0 copies the literal value; msb=1 replicates the sign. Sign extension is performed only in the generic form (.IDX mode).
Thus, the four 4-bit values fully specify an arbitrary byte permute, as a 16b permute code.
--------------------------------------------------------------------------------------
Control      dest byte 3    dest byte 2    dest byte 1    dest byte 0
             src select     src select     src select     src select
--------------------------------------------------------------------------------------
Index        Sb[15:12]      Sb[11:8]       Sb[7:4]        Sb[3:0]
--------------------------------------------------------------------------------------
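The .IDX selection rules described above can be sketched in software as follows. This is a minimal Python model written from the text of this section, not a hardware-verified reference; the function name `prmt_idx` and the test values are illustrative assumptions.

```python
def prmt_idx(ra: int, sb: int, sc: int) -> int:
    """Software sketch of PRMT.IDX (hypothetical helper, not an official API)."""
    # Source bytes B0..B7: Ra supplies B0..B3, Sc supplies B4..B7.
    src = ((sc & 0xFFFFFFFF) << 32) | (ra & 0xFFFFFFFF)
    rd = 0
    for i in range(4):                       # i = destination byte index
        sel = (sb >> (4 * i)) & 0xF          # 4-bit selection value for dest byte i
        byte = (src >> (8 * (sel & 0x7))) & 0xFF
        if sel & 0x8:                        # msb set: replicate the byte's sign bit
            byte = 0xFF if byte & 0x80 else 0x00
        rd |= byte << (8 * i)
    return rd


# Control 0x6420 gathers the even-numbered source bytes B6, B4, B2, B0.
print(hex(prmt_idx(0x33221100, 0x6420, 0x77665544)))
```

With Ra = 0x33221100 and Sc = 0x77665544, the control value 0x6420 selects B6, B4, B2, B0, and 0x3210 reproduces Ra unchanged.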
The more specialized forms of the permute control use the two lsbs of Sb (which is typically an address pointer) to control the byte extraction.
--------------------------------------------------------------------------------------
Control              selector   dest byte 3   dest byte 2   dest byte 1   dest byte 0
                     Sb[1:0]    src           src           src           src
--------------------------------------------------------------------------------------
Forward4 extract     0          3             2             1             0
                     1          4             3             2             1
                     2          5             4             3             2
                     3          6             5             4             3
Backward4 extract    0          5             6             7             0
                     1          6             7             0             1
                     2          7             0             1             2
                     3          0             1             2             3
Replicate.8          0          0             0             0             0
                     1          1             1             1             1
                     2          2             2             2             2
                     3          3             3             3             3
EdgeClampL           0          3             2             1             0
                     1          3             2             1             1
                     2          3             2             2             2
                     3          3             3             3             3
EdgeClampR           0          0             0             0             0
                     1          1             1             1             0
                     2          2             2             1             0
                     3          3             2             1             0
Replicate.16         0          1             0             1             0
                     1          3             2             3             2
                     2          1             0             1             0
                     3          3             2             3             2
--------------------------------------------------------------------------------------
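The specialized modes can be modeled as a table lookup keyed by Sb[1:0]. The sketch below transcribes the table above into Python. The mapping of row names to mode suffixes (Forward4 extract = .F4E, Backward4 extract = .B4E, Replicate.8 = .RC8, EdgeClampL = .ECL, EdgeClampR = .ECR, Replicate.16 = .RC16) is assumed from the naming, and `PRMT_TABLE` and `prmt_special` are hypothetical names introduced for illustration.

```python
# Source byte chosen for (dest byte 3, 2, 1, 0), indexed by Sb[1:0].
# Transcribed from the table above; the mode-name mapping is an assumption.
PRMT_TABLE = {
    "F4E":  [(3, 2, 1, 0), (4, 3, 2, 1), (5, 4, 3, 2), (6, 5, 4, 3)],
    "B4E":  [(5, 6, 7, 0), (6, 7, 0, 1), (7, 0, 1, 2), (0, 1, 2, 3)],
    "RC8":  [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3)],
    "ECL":  [(3, 2, 1, 0), (3, 2, 1, 1), (3, 2, 2, 2), (3, 3, 3, 3)],
    "ECR":  [(0, 0, 0, 0), (1, 1, 1, 0), (2, 2, 1, 0), (3, 2, 1, 0)],
    "RC16": [(1, 0, 1, 0), (3, 2, 3, 2), (1, 0, 1, 0), (3, 2, 3, 2)],
}


def prmt_special(mode: str, ra: int, sb: int, sc: int) -> int:
    """Software sketch of the non-.IDX PRMT modes (hypothetical helper)."""
    # Source bytes B0..B7: Ra supplies B0..B3, Sc supplies B4..B7.
    src = ((sc & 0xFFFFFFFF) << 32) | (ra & 0xFFFFFFFF)
    picks = PRMT_TABLE[mode][sb & 0x3]       # only the two lsbs of Sb are used
    rd = 0
    for src_byte in picks:                   # most significant dest byte first
        rd = (rd << 8) | ((src >> (8 * src_byte)) & 0xFF)
    return rd


# Sb used as an address: 0x1001 has byte offset 1, so F4E yields B4..B1.
print(hex(prmt_special("F4E", 0x33221100, 0x1001, 0x77665544)))
```

In this model, Forward4 extract with Sb set to a byte address reads an unaligned 32b word that straddles two consecutive aligned words held in Ra and Sc.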
PRMT.ECR R0,R1,R2,R3;
PRMT.IDX R0,R1,0x6420,R3;