TEXS : Texture Fetch with scalar/non-vec4 source/destinations

Format

SPA 5.0:
        {@{!}Pg}   TEXS{.F16}{.lod}{.DC}{.NODEP}{.phase}   RZ, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA,  wmsk2C     {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   TEXS{.F16}{.lod}{.DC}{.NODEP}{.phase}   Rd1,Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA{, wmsk34C}   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

.lod:    { .LZ, .LL  } 
        LOD adjust mode.
            < NONE >
            .LZ  - LOD level 0 (finest)       // no register required
            .LL  - LOD absolute discrete      // 1 fp32 register required
                   The LOD level is relative to textureHeader.resViewMinMapLevel.

.NODEP: Indicates that there are no subsequent quad derivatives to be calculated.
        Threads that have been "killed" will be disabled to stop unnecessary texture fetches.

.DC:    Depth comparison filter mode using reference value.
        RefVal                           // 1 fp32 register required
        Depth Comparison filter is not supported by 3D textures.
        For TEXS and TEXS.LZ, the .DC option will force a depth comparison filter mode regardless of the sampler state.
        For TEXS.LL, if the sampler state does not enable depth comparison, the .DC option
        will not force a depth comparison filter mode.

.phase: { .T, .P }
        Controls the current warp's texture hash, which is used for scheduling.
        Phasing is explained in Texture Phasing.
            < NONE >
            .T - postfix increment of the 3-bit texture component of the hash.
            .P - postfix increment of the 5-bit phase component; also zeroes the 3-bit texture component of the hash.

#tsPtrIdxU13:
    This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache.
    The bank from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of
    the form "samplerPtr[31:20] | headerPtr[19:0]".
    In SetSamplerBinding.ViaHeaderBinding (i.e. OGL) mode, the headerPtr is used as the samplerPtr as well.
    Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex is regarded as an "invalid"
    texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).
    Any sampler pointer greater than the one specified in SetSamplerHeaderPoolC.MaximumIndex is likewise regarded as an
    "invalid" texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).

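The entry layout described above can be sketched as follows. This is an illustration only: the function name and the via_header_binding flag are hypothetical, and the validity check against the MaximumIndex values is left out because it depends on state this page does not describe.

```python
def decode_ts_entry(entry, via_header_binding=False):
    """Split a 32-bit constant-bank entry into (header_ptr, sampler_ptr).

    Layout taken from the text: samplerPtr[31:20] | headerPtr[19:0].
    via_header_binding is a hypothetical flag modelling
    SetSamplerBinding.ViaHeaderBinding mode.
    """
    if not 0 <= entry <= 0xFFFFFFFF:
        raise ValueError("entry must fit in 32 bits")
    header_ptr = entry & 0xFFFFF          # bits [19:0]
    sampler_ptr = (entry >> 20) & 0xFFF   # bits [31:20]
    if via_header_binding:
        # In this mode the header pointer doubles as the sampler pointer.
        sampler_ptr = header_ptr
    return header_ptr, sampler_ptr


print(decode_ts_entry(0x00300005))        # (5, 3)
print(decode_ts_entry(0x00300005, True))  # (5, 5)
```
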
wmsk2C : {R, G, B, A, RG, RA, GA, BA}  // destination write mask for up to 2 component writeback.
wmsk34C: {RGB, RGA, GBA, RBA, RGBA*}   // destination write mask for 3 or 4 component writeback.

.F16:  If specified, texture return data is in packed F16 format.
     Otherwise, the return data is in 32-bit format (fp32 or S/UINT32).
     Partial register writes do not occur: any unused portion of the return
     register is written with the value 0.
     Note: the .F16 modifier is not supported for integer textures in SPA 5.2
     (the result is UNPREDICTABLE).
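
A minimal sketch of the packed F16 register image, assuming the first component occupies bits [15:0] and the second bits [31:16], as in the packed F16 destination table in the Description. The helper names are hypothetical. Python's struct "e" format rounds to nearest-even, which need not match the rounding mode the hardware is configured for.

```python
import struct


def f16_bits(value):
    """Return the IEEE 754 binary16 bit pattern of value as an integer."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def pack_f16_register(lo, hi=None):
    """Build the 32-bit register image for one or two F16 components.

    lo goes to bits [15:0]; hi goes to bits [31:16].
    When hi is absent the upper half is written as 0, since partial
    register writes do not occur.
    """
    hi_bits = 0 if hi is None else f16_bits(hi)
    return (hi_bits << 16) | f16_bits(lo)


print(hex(pack_f16_register(1.0)))        # 0x3c00
print(hex(pack_f16_register(1.0, 0.5)))   # 0x38003c00
```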

#paramA: source coordinate description.

    Valid paramA specifiers for TEXS
    parameter   Coordinate registers implied
    1D          s
    2D          s,t
    3D          s,t,r
    CUBE        s,t,r
    RESERVED    // for ARRAY_1D
    ARRAY_2D    a,s,t
    RESERVED    // for ARRAY_3D
    RESERVED    // for ARRAY_CUBE

    s,t,r are fp32;
    a is a U16 integer.

Not all combinations of .lod, .DC, and #paramA are allowed.  See the encoding table in the Description, below.

The FP16 rounding mode is controlled by a PRI, [SM]PRI_SM_TEXIO_CONTROL_FP16_ROUNDING_MODE, which must be set to the
same value as PRI_TEX_F_DBG_FP16_ROUNDING_MODE.

Description

Texture fetch using texture coordinates/parameters stored in registers Ra and Rb. The return data is written back to registers Rd0 and Rd1 according to the wmsk2C/wmsk34C specification. The legal instruction modifiers for TEXS and the corresponding parameter packing in Ra and Rb are specified below.

    Legal modifier table for #paramA, .DC, .lod
    #paramA    .DC   .lod     encoding   Ra-packing   Ra-size   Rb           Rb-size
    1D         -     .LZ      0          s            scalar    must be RZ   none
    2D         -     <NONE>   1          s            scalar    t            scalar
    2D         -     .LZ      2          s            scalar    t            scalar
    2D         -     .LL      3          s,t          vec2      lod          scalar
    2D         .DC   <NONE>   4          s,t          vec2      dc           scalar
    2D         .DC   .LL      5          s,t          vec2      lod,dc       vec2
    2D         .DC   .LZ      6          s,t          vec2      dc           scalar
    ARRAY_2D   -     <NONE>   7          array,s      vec2      t            scalar
    ARRAY_2D   -     .LZ      8          array,s      vec2      t            scalar
    ARRAY_2D   .DC   .LZ      9          array,s      vec2      t,dc         vec2
    3D         -     <NONE>   10         s,t          vec2      r            scalar
    3D         -     .LZ      11         s,t          vec2      r            scalar
    CUBE       -     <NONE>   12         s,t          vec2      r            scalar
    CUBE       -     .LL      13         s,t          vec2      r,lod        vec2
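
The legal combinations above lend themselves to a table lookup. The sketch below transcribes the rows into a Python dictionary; the dictionary name, the function name, and the tuple layout (encoding, Ra contents, Rb contents) are illustrative choices, not part of the ISA definition.

```python
# Key: (paramA, .DC present, .lod or None)
# Value: (encoding, parameters packed in Ra, parameters packed in Rb)
TEXS_MODES = {
    ("1D", False, "LZ"): (0, ("s",), ()),
    ("2D", False, None): (1, ("s",), ("t",)),
    ("2D", False, "LZ"): (2, ("s",), ("t",)),
    ("2D", False, "LL"): (3, ("s", "t"), ("lod",)),
    ("2D", True, None): (4, ("s", "t"), ("dc",)),
    ("2D", True, "LL"): (5, ("s", "t"), ("lod", "dc")),
    ("2D", True, "LZ"): (6, ("s", "t"), ("dc",)),
    ("ARRAY_2D", False, None): (7, ("array", "s"), ("t",)),
    ("ARRAY_2D", False, "LZ"): (8, ("array", "s"), ("t",)),
    ("ARRAY_2D", True, "LZ"): (9, ("array", "s"), ("t", "dc")),
    ("3D", False, None): (10, ("s", "t"), ("r",)),
    ("3D", False, "LZ"): (11, ("s", "t"), ("r",)),
    ("CUBE", False, None): (12, ("s", "t"), ("r",)),
    ("CUBE", False, "LL"): (13, ("s", "t"), ("r", "lod")),
}


def lookup_mode(param_a, dc=False, lod=None):
    """Return (encoding, ra_params, rb_params) or raise if the combination is illegal."""
    key = (param_a, dc, lod)
    if key not in TEXS_MODES:
        raise ValueError(f"illegal TEXS combination: {key}")
    return TEXS_MODES[key]


encoding, ra, rb = lookup_mode("2D", dc=True, lod="LL")
print(encoding, ra, rb)   # 5 ('s', 't') ('lod', 'dc')
```

The length of each tuple is the scalar register count for that operand: an empty Rb tuple means Rb must be RZ.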

For destination registers Rd1 and Rd0, the following restrictions apply based on the wmsk specification.

    Legal modifier table for Rd1,Rd0,wmsk for 32 bit return data (.F16 is not present)
    Rd1      wmsk   wmsk encoding   Rd0-size   Rd0-packing                                Rd1-size   Rd1-packing
    RZ       R      0               scalar     Rd0+0 = R component                        none       must be RZ
    RZ       G      1               scalar     Rd0+0 = G component                        none       must be RZ
    RZ       B      2               scalar     Rd0+0 = B component                        none       must be RZ
    RZ       A      3               scalar     Rd0+0 = A component                        none       must be RZ
    RZ       RG     4               vec2       Rd0+0 = R component, Rd0+1 = G component   none       must be RZ
    RZ       RA     5               vec2       Rd0+0 = R component, Rd0+1 = A component   none       must be RZ
    RZ       GA     6               vec2       Rd0+0 = G component, Rd0+1 = A component   none       must be RZ
    RZ       BA     7               vec2       Rd0+0 = B component, Rd0+1 = A component   none       must be RZ
    non-RZ   RGB    0               vec2       Rd0+0 = R component, Rd0+1 = G component   scalar     Rd1+0 = B component
    non-RZ   RGA    1               vec2       Rd0+0 = R component, Rd0+1 = G component   scalar     Rd1+0 = A component
    non-RZ   RBA    2               vec2       Rd0+0 = R component, Rd0+1 = B component   scalar     Rd1+0 = A component
    non-RZ   GBA    3               vec2       Rd0+0 = G component, Rd0+1 = B component   scalar     Rd1+0 = A component
    non-RZ   RGBA   4               vec2       Rd0+0 = R component, Rd0+1 = G component   vec2       Rd1+0 = B component, Rd1+1 = A component

    Legal modifier table for Rd1,Rd0,wmsk for packed F16 return data (.F16 is present)
    Rd1      wmsk   wmsk encoding   Rd0-size   Rd0-packing                                         Rd1-size   Rd1-packing
    RZ       R      0               scalar     Rd0[15:0] = R component, Rd0[31:16] = 0             none       must be RZ
    RZ       G      1               scalar     Rd0[15:0] = G component, Rd0[31:16] = 0             none       must be RZ
    RZ       B      2               scalar     Rd0[15:0] = B component, Rd0[31:16] = 0             none       must be RZ
    RZ       A      3               scalar     Rd0[15:0] = A component, Rd0[31:16] = 0             none       must be RZ
    RZ       RG     4               scalar     Rd0[15:0] = R component, Rd0[31:16] = G component   none       must be RZ
    RZ       RA     5               scalar     Rd0[15:0] = R component, Rd0[31:16] = A component   none       must be RZ
    RZ       GA     6               scalar     Rd0[15:0] = G component, Rd0[31:16] = A component   none       must be RZ
    RZ       BA     7               scalar     Rd0[15:0] = B component, Rd0[31:16] = A component   none       must be RZ
    non-RZ   RGB    0               scalar     Rd0[15:0] = R component, Rd0[31:16] = G component   scalar     Rd1[15:0] = B component, Rd1[31:16] = 0
    non-RZ   RGA    1               scalar     Rd0[15:0] = R component, Rd0[31:16] = G component   scalar     Rd1[15:0] = A component, Rd1[31:16] = 0
    non-RZ   RBA    2               scalar     Rd0[15:0] = R component, Rd0[31:16] = B component   scalar     Rd1[15:0] = A component, Rd1[31:16] = 0
    non-RZ   GBA    3               scalar     Rd0[15:0] = G component, Rd0[31:16] = B component   scalar     Rd1[15:0] = A component, Rd1[31:16] = 0
    non-RZ   RGBA   4               scalar     Rd0[15:0] = R component, Rd0[31:16] = G component   scalar     Rd1[15:0] = B component, Rd1[31:16] = A component
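
Both destination tables follow one pattern: the first two components named by the mask go to Rd0 and any remaining components go to Rd1. The sketch below derives the register layout from that pattern. The function name, the use of None to stand for RZ, and the returned dictionary shape are assumptions made for illustration.

```python
WMSK_2C = ["R", "G", "B", "A", "RG", "RA", "GA", "BA"]   # index = encoding, Rd1 is RZ
WMSK_34C = ["RGB", "RGA", "RBA", "GBA", "RGBA"]          # index = encoding, Rd1 is non-RZ


def dest_layout(wmsk, rd0, rd1=None, f16=False):
    """Map each written register number to the component(s) it receives.

    rd1=None stands for RZ. For 32-bit data each register holds one
    component; for F16 each register holds (bits[15:0], bits[31:16]),
    with "0" marking a zero-filled upper half.
    """
    if wmsk in WMSK_2C:
        if rd1 is not None:
            raise ValueError("Rd1 must be RZ for 1- and 2-component masks")
    elif wmsk in WMSK_34C:
        if rd1 is None:
            raise ValueError("Rd1 must not be RZ for 3- and 4-component masks")
    else:
        raise ValueError(f"unknown write mask: {wmsk}")

    groups = [(rd0, wmsk[:2])]
    if rd1 is not None:
        groups.append((rd1, wmsk[2:]))

    layout = {}
    for base, comps in groups:
        if f16:
            layout[base] = (comps[0], comps[1] if len(comps) > 1 else "0")
        else:
            for offset, comp in enumerate(comps):
                layout[base + offset] = (comp,)
    return layout


# TEXS.LL R0, R2, R4, R9, 0x3, 2D, RGBA  ->  Rd1 = R0, Rd0 = R2
print(dest_layout("RGBA", rd0=2, rd1=0))
# {2: ('R',), 3: ('G',), 0: ('B',), 1: ('A',)}
print(dest_layout("RGB", rd0=0, rd1=4, f16=True))
# {0: ('R', 'G'), 4: ('B', '0')}
```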

The texture parameter source registers Ra/Rb and the destination (result) registers Rd0/Rd1 have alignment restrictions based on the number of scalar registers being read/written. Specifically,

  1. Rd0/Rd1 should be aligned to the number of valid components being returned (as specified by wmsk).
  2. Ra/Rb should always be aligned to
    1. 1 register (scalar) if the scalar count for that register (Ra or Rb) is 1
    2. 2 registers (vec2) if the scalar count for that register (Ra or Rb) is 2
  3. Rb must be specified as RZ if no parameters need to be packed in Rb.
  4. Ra/Rb must not be specified as RZ if any parameters need to be packed in Ra/Rb.
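
Rules 2 through 4 can be expressed as a small check on one source operand. This is a sketch only: the function name is hypothetical, None stands for RZ, and the destination rule (1) is left out because it depends on how the component count maps to each of Rd0 and Rd1.

```python
RZ = None  # placeholder for the zero register in this sketch


def check_source(name, reg, scalar_count):
    """Return a list of rule violations for one source operand (Ra or Rb).

    reg is the register number, or RZ. scalar_count is the number of
    scalars packed into that operand (0, 1 or 2) for the chosen mode.
    """
    errors = []
    if scalar_count == 0:
        if reg is not RZ:
            errors.append(f"{name} must be RZ when it carries no parameters")
    elif reg is RZ:
        errors.append(f"{name} must not be RZ when it carries parameters")
    elif reg % scalar_count != 0:
        errors.append(f"{name}=R{reg} is not aligned to {scalar_count} registers")
    return errors


# TEXS.DC R4, R0, R8, R19, 0x2, 2D, RGBA: Ra holds s,t (vec2), Rb holds dc (scalar)
print(check_source("Ra", 8, 2))    # []
print(check_source("Rb", 19, 1))   # []
print(check_source("Ra", 19, 2))   # ['Ra=R19 is not aligned to 2 registers']
```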

Some input texture values are sanitized before being used; see Additional Information for more details.

Additional Information:

TEXS corresponds to these DX ops:

   sample    =  TEXS                   // 
   sample_d  =  TXD/sw emulated        // emulate this via SAM/SWZ/TEX/RAM
   sample_l  =  TEXS.LL                // lod      supplied
   sample_c  =  TEXS.DC                // depth comparison filter with reference value
   sample_lz =  TEXS.LZ                // lod level 0 (finest)

Sanitization of Texture Input Coordinates:

Texture input coordinates go through a sanitization step before being used in texture calculations.

Examples:

TEXS        RZ, R0, R19, R29, 0x1, 2D, RG;   # reads R19 & R29; writes R0 & R1
TEXS.LL     R0, R2,  R4,  R9, 0x3, 2D, RGBA; # reads R4, R5, &  R9; writes R0, R1, R2, & R3
TEXS.DC     R4, R0,  R8, R19, 0x2, 2D, RGBA; # reads R8, R9, & R19; writes R4, R5, R0, & R1
