TLDS : Texture Load with scalar/non-vec4 source/destinations

Format

SPA 5.0:
{@{!}Pg} TLDS.lod{.AOFFI}{.MS}{.NODEP}{.phase} RZ, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA, #wmskU2C {&req_6} {&rdN} {&wrN} {?sched} ;
{@{!}Pg} TLDS.lod{.AOFFI}{.MS}{.NODEP}{.phase} Rd1, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU34C} {&req_6} {&rdN} {&wrN} {?sched} ;

Forms that offer support for packed FP16 data return:
{@{!}Pg} TLDS{.F16}.lod{.AOFFI}{.MS}{.NODEP}{.phase} RZ, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA, #wmskU2C {&req_6} {&rdN} {&wrN} {?sched} ;
{@{!}Pg} TLDS{.F16}.lod{.AOFFI}{.MS}{.NODEP}{.phase} Rd1, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU34C} {&req_6} {&rdN} {&wrN} {?sched} ;

.lod: LOD adjust mode.
.LZ - LOD level 0 (finest)   // no register required
.LL - LOD absolute           // 1 U32 register required
LOD Level 0 actually selects the level set by textureHeader.resViewMinMapLevel.

.AOFFI: Programmable Texture Offset.
_aoffimmi(u,v,w) [DX10]      // 1 register required
((w & 0xf)<<8) | ((v & 0xf)<<4) | (u & 0xf)

.MS: Programmable Multisample location. .MS can only be used with the .LZ LOD option, and 2D/ARRAY_2D textures.
Multisample location         // 1 U32 register required

.NODEP: Indicates that there are no subsequent quad derivatives to be calculated. Threads that have been "killed" will be disabled to stop unnecessary texture fetches.
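
The .AOFFI packing expression given in the Format section can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the signed range of -8..7 per offset is an assumption based on each offset occupying 4 bits.

```python
from typing import Tuple


def pack_aoffi(u: int, v: int, w: int = 0) -> int:
    """Pack three texel offsets into the single AOFFI register value."""
    for name, value in (("u", u), ("v", v), ("w", w)):
        # Assumption: each offset is a 4-bit two's-complement value (-8..7).
        if not -8 <= value <= 7:
            raise ValueError(f"{name} offset {value} does not fit in 4 bits")
    # Same expression as in the instruction description.
    return ((w & 0xF) << 8) | ((v & 0xF) << 4) | (u & 0xF)


def unpack_aoffi(packed: int) -> Tuple[int, int, int]:
    """Recover (u, v, w) from a packed value, sign-extending each nibble."""
    def sext4(nibble: int) -> int:
        return nibble - 16 if nibble & 0x8 else nibble

    return (
        sext4(packed & 0xF),
        sext4((packed >> 4) & 0xF),
        sext4((packed >> 8) & 0xF),
    )
```

For instance, pack_aoffi(1, -1) places 0x1 in bits [3:0] and 0xF in bits [7:4], and unpack_aoffi reverses the operation.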

.phase: Allows control of the current warp's texture hash, used for scheduling.
< NONE >
.T - postfix increment of the 3 bit texture component of the hash.
.P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash.
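
The effect of the .phase options on the two hash components can be modeled roughly as below. This is a sketch under stated assumptions: the function name is hypothetical, the hash is modeled as two separate fields rather than its real bit layout, and wrap-around at the field width (3 and 5 bits) is assumed. Because the increment is postfix, the returned values are the ones seen after the current instruction.

```python
def next_hash(phase: int, tex: int, modifier=None):
    """Return the (phase, tex) hash components after one TLDS instruction.

    modifier is None (no .phase option), "T" or "P".
    """
    if modifier is None:
        return phase, tex
    if modifier == "T":
        # .T: increment the 3-bit texture component (assumed to wrap at 8).
        return phase, (tex + 1) & 0x7
    if modifier == "P":
        # .P: increment the 5-bit phase component (assumed to wrap at 32)
        # and zero the texture component.
        return (phase + 1) & 0x1F, 0
    raise ValueError(f"unknown phase modifier: {modifier!r}")
```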

Immediate Inputs:

#tsPtrIdxU13:
This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache. The bank from
which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form
"samplerPtr[31:20] | headerPtr[19:0]". Only headerPtr is used by this instruction.
Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid"
texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).
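
As an illustration of the entry layout described above, the following Python sketch splits a 32-bit constant bank entry into its two fields and applies the MaximumIndex rule. The function names are hypothetical, and maximum_index stands in for the value programmed through SetTexHeaderPoolC.MaximumIndex.

```python
def split_ts_entry(entry: int):
    """Split a packed 32-bit entry into (sampler_ptr, header_ptr)."""
    entry &= 0xFFFFFFFF
    header_ptr = entry & 0xFFFFF         # headerPtr[19:0]
    sampler_ptr = (entry >> 20) & 0xFFF  # samplerPtr[31:20], unused by TLDS
    return sampler_ptr, header_ptr


def header_is_valid(entry: int, maximum_index: int) -> bool:
    """A header pointer greater than maximum_index is treated as invalid."""
    _, header_ptr = split_ts_entry(entry)
    return header_ptr <= maximum_index
```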

.F16: If specified, texture return data is in packed FP16 format.
Otherwise, the return data is in 32 bit format (fp32 or S/UINT32).
Partial register writes do not occur: any unused portion of the return
register is written with the value 0.
Note: .F16 modifier is not supported for integer textures in SPA 5.2.
(return value is UNPREDICTABLE)

#paramA: source coordinate description.
Valid paramA specifiers for TLDS

  parameter    Coordinate Registers implied
  1D           s
  2D           s,t
  3D           s,t,r
  RESERVED     // for CUBE
  RESERVED     // for ARRAY_1D
  ARRAY_2D     a,s,t
  RESERVED     // for ARRAY_3D
  RESERVED     // for ARRAY_CUBE

  s,t,r are fp32, a is U16 integer
  If the source coordinate description does not match the texture type of the texture header,
  zeroes will be returned.  The array specifiers can be freely used with non-array textures
  (and the opposite holds as well), provided the number of coordinates (1D,2D,3D) matches.

wmsk2C : {R, G, B, A, RG, RA, GA, BA}  // destination write mask for up to 2 component writeback.
wmsk34C: {RGB, RGA, GBA, RBA, RGBA*}   // destination write mask for 3 or 4 component writeback.

Not all combinations of .lod, .AOFFI, .MS, and #paramA are allowed.  See the encoding table in the Description, below.

Rounding mode is controlled by a PRI: [SM]PRI_SM_TEXIO_CONTROL_FP16_ROUNDING_MODE.  It must be set to the same value
as PRI_TEX_F_DBG_FP16_ROUNDING_MODE.

Description

Texture load (point sample only) using texture coordinates/parameters packed in the Ra/Rb registers. The return data is written back to registers Rd0, Rd1 based on the wmsk2C/34C specification. The legal instruction modifiers for TLDS and the corresponding parameter packing in Ra and Rb are specified below.

    Legal modifier table
    encoding  #paramA   .lod  .AOFFI  .MS  Ra     Rb
    0         1D        .LZ   -       -    s      must be RZ
    1         1D        .LL   -       -    s      lod
    2         2D        .LZ   -       -    s      t
    4         2D        .LZ   .AOFFI  -    s,t    aoffi
    5         2D        .LL   -       -    s,t    lod
    6         2D        .LZ   -       .MS  s,t    ms
    7         3D        .LZ   -       -    s,t    r
    8         ARRAY_2D  .LZ   -       -    array  s,t
    12        2D        .LL   .AOFFI  -    s,t    lod,aoffi
HW note: These parameter combinations are encoded as the tld2d_4 field, where bit[2] of the field indicates Ra size (1: vec2, 0: scalar). Similarly, bit[3] indicates Rb size (1: vec2, 0: scalar).
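
The legal modifier table and the HW note can be cross-checked with a small lookup, sketched here in Python. The dictionary is transcribed from the table; the function names and the tuple key layout are hypothetical choices made for this example.

```python
# (paramA, lod, aoffi, ms) -> encoding, transcribed from the legal modifier table.
LEGAL_COMBINATIONS = {
    ("1D", "LZ", False, False): 0,
    ("1D", "LL", False, False): 1,
    ("2D", "LZ", False, False): 2,
    ("2D", "LZ", True, False): 4,
    ("2D", "LL", False, False): 5,
    ("2D", "LZ", False, True): 6,
    ("3D", "LZ", False, False): 7,
    ("ARRAY_2D", "LZ", False, False): 8,
    ("2D", "LL", True, False): 12,
}


def encode(param_a: str, lod: str, aoffi: bool = False, ms: bool = False) -> int:
    """Return the encoding for a modifier combination, or raise if illegal."""
    key = (param_a, lod, aoffi, ms)
    if key not in LEGAL_COMBINATIONS:
        raise ValueError(f"illegal TLDS modifier combination: {key}")
    return LEGAL_COMBINATIONS[key]


def source_register_sizes(encoding: int):
    """Return (Ra size, Rb size) in scalar registers from bits [2] and [3]."""
    ra_size = 2 if encoding & 0b0100 else 1
    rb_size = 2 if encoding & 0b1000 else 1
    return ra_size, rb_size
```

Note that a size of 1 for Rb only reflects bit[3]; encoding 0 additionally requires Rb to be RZ, as the table states.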

For destination registers Rd1,Rd0, the following restrictions apply based on wmsk specification.

    Legal modifier table for Rd1, Rd0, wmsk for 32-bit return data (.F16 is not present)
    Each packing entry names the component written to that register.

    Rd1     wmsk  wmsk encoding  Rd0-size  Rd0-packing           Rd1-size  Rd1-packing
    RZ      R     0              scalar    Rd0+0 = R             none      must be RZ
    RZ      G     1              scalar    Rd0+0 = G             none      must be RZ
    RZ      B     2              scalar    Rd0+0 = B             none      must be RZ
    RZ      A     3              scalar    Rd0+0 = A             none      must be RZ
    RZ      RG    4              vec2      Rd0+0 = R, Rd0+1 = G  none      must be RZ
    RZ      RA    5              vec2      Rd0+0 = R, Rd0+1 = A  none      must be RZ
    RZ      GA    6              vec2      Rd0+0 = G, Rd0+1 = A  none      must be RZ
    RZ      BA    7              vec2      Rd0+0 = B, Rd0+1 = A  none      must be RZ
    non-RZ  RGB   0              vec2      Rd0+0 = R, Rd0+1 = G  scalar    Rd1+0 = B
    non-RZ  RGA   1              vec2      Rd0+0 = R, Rd0+1 = G  scalar    Rd1+0 = A
    non-RZ  RBA   2              vec2      Rd0+0 = R, Rd0+1 = B  scalar    Rd1+0 = A
    non-RZ  GBA   3              vec2      Rd0+0 = G, Rd0+1 = B  scalar    Rd1+0 = A
    non-RZ  RGBA  4              vec2      Rd0+0 = R, Rd0+1 = G  vec2      Rd1+0 = B, Rd1+1 = A

    Legal modifier table for Rd1, Rd0, wmsk for packed FP16 return data (.F16 is present)
    Each packing entry names the component written to that half of the register.

    Rd1     wmsk  wmsk encoding  Rd0-size  Rd0-packing                    Rd1-size  Rd1-packing
    RZ      R     0              scalar    Rd0[15:0] = R, Rd0[31:16] = 0  none      must be RZ
    RZ      G     1              scalar    Rd0[15:0] = G, Rd0[31:16] = 0  none      must be RZ
    RZ      B     2              scalar    Rd0[15:0] = B, Rd0[31:16] = 0  none      must be RZ
    RZ      A     3              scalar    Rd0[15:0] = A, Rd0[31:16] = 0  none      must be RZ
    RZ      RG    4              scalar    Rd0[15:0] = R, Rd0[31:16] = G  none      must be RZ
    RZ      RA    5              scalar    Rd0[15:0] = R, Rd0[31:16] = A  none      must be RZ
    RZ      GA    6              scalar    Rd0[15:0] = G, Rd0[31:16] = A  none      must be RZ
    RZ      BA    7              scalar    Rd0[15:0] = B, Rd0[31:16] = A  none      must be RZ
    non-RZ  RGB   0              scalar    Rd0[15:0] = R, Rd0[31:16] = G  scalar    Rd1[15:0] = B, Rd1[31:16] = 0
    non-RZ  RGA   1              scalar    Rd0[15:0] = R, Rd0[31:16] = G  scalar    Rd1[15:0] = A, Rd1[31:16] = 0
    non-RZ  RBA   2              scalar    Rd0[15:0] = R, Rd0[31:16] = B  scalar    Rd1[15:0] = A, Rd1[31:16] = 0
    non-RZ  GBA   3              scalar    Rd0[15:0] = G, Rd0[31:16] = B  scalar    Rd1[15:0] = A, Rd1[31:16] = 0
    non-RZ  RGBA  4              scalar    Rd0[15:0] = R, Rd0[31:16] = G  scalar    Rd1[15:0] = B, Rd1[31:16] = A
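
Both destination tables follow one pattern: the first two selected components go to Rd0 and any remaining ones go to Rd1, with packed FP16 data using register halves instead of consecutive registers. The Python sketch below reproduces that pattern; the names are hypothetical, and the encoding order is taken from the tables above.

```python
WMSK_2C = ["R", "G", "B", "A", "RG", "RA", "GA", "BA"]  # encodings 0..7, Rd1 = RZ
WMSK_34C = ["RGB", "RGA", "RBA", "GBA", "RGBA"]         # encodings 0..4, Rd1 != RZ


def wmsk_encoding(wmsk: str) -> int:
    """Return the wmsk encoding within its own (2C or 34C) table."""
    table = WMSK_2C if wmsk in WMSK_2C else WMSK_34C
    return table.index(wmsk)  # raises ValueError for an unknown mask


def _pad_halves(components):
    """Fill the unused upper half of an FP16 register with the value 0."""
    return (components + ["0"])[:2] if components else []


def dest_layout(wmsk: str, f16: bool = False):
    """Return (Rd0 contents, Rd1 contents) for a write mask.

    32-bit data: one list entry per register (Rd+0, Rd+1).
    FP16 data:   one list entry per half ([15:0], [31:16]) of a single register.
    An empty Rd1 list means Rd1 must be RZ.
    """
    wmsk_encoding(wmsk)  # validates the mask
    components = list(wmsk)
    rd0, rd1 = components[:2], components[2:]
    if f16:
        return _pad_halves(rd0), _pad_halves(rd1)
    return rd0, rd1
```

With this model, dest_layout("RBA") gives R and B in Rd0+0/Rd0+1 and A in Rd1+0, while dest_layout("RGB", f16=True) gives R and G in the halves of Rd0 and B plus a zero upper half in Rd1.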

The texture parameter source registers Ra/Rb and the destination (result) registers Rd0/Rd1 have alignment restrictions based on the number of scalar registers being read/written. Specifically,

  1. Rd0/Rd1 should be aligned to the number of valid components being returned (as specified by wmsk)
  2. Ra/Rb should always be aligned to
    1. 1 (scalar register) if the scalar count for that register (Ra or Rb) is 1
    2. 2 (vec2 register) if the scalar count for that register (Ra or Rb) is 2
  3. Rb must be specified as RZ if no parameters need to be packed in Rb.
  4. Ra/Rb must not be specified as RZ if any parameters need to be packed in Ra/Rb.
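
The alignment rules above amount to a simple modulo check on the register number. The Python sketch below is a minimal illustration with hypothetical function names; it assumes the alignment is evaluated per register operand (Ra, Rb, Rd0, Rd1) against that operand's own scalar count.

```python
def is_aligned(reg: int, scalar_count: int) -> bool:
    """Check that register number reg is aligned for a scalar or vec2 access."""
    if scalar_count not in (1, 2):
        raise ValueError("TLDS operands are scalar (1) or vec2 (2)")
    return reg % scalar_count == 0


def registers_touched(reg: int, scalar_count: int):
    """List the register numbers covered by an operand starting at reg."""
    return list(range(reg, reg + scalar_count))
```

For a vec2 Ra, for example, R6 passes the check and covers R6 and R7, whereas R9 would only be acceptable as a scalar operand.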

Some input texture values will be sanitized before being used.

Additional Information:

Corresponds to these DX ops:

   ld        =  TLDS                    // load
   ld2ms     =  TLDS.MS                 // load multisample

Texture Header State Overrides

Unlike other texture instructions, TLDS overrides a great deal of texture header/sampler state with different values. The following tables show how the texture state is treated for this instruction.

+--------------------------+--------------------------+
| Header Field             | TLDS Value               |
+--------------------------+--------------------------+
| UseHeaderOptControl      | FALSE                    |
| MaxAnisotropy            | ANISO_1_TO_1             |
+--------------------------+--------------------------+

+--------------------------+--------------------------+
| Sampler Field            | TLDS Value               |
+--------------------------+--------------------------+
| MagFilter                | MAG_POINT                |
| MinFilter                | MIN_POINT                |
| MipFilter                | MIP_POINT                |
| MaxAnisotropy            | ANISO_1_TO_1             |
| BorderColorR             | 0                        |
| BorderColorG             | 0                        |
| BorderColorB             | 0                        |
| BorderColorA             | 0                        |
| sRGBBorderColorR         | 0                        |
| sRGBBorderColorG         | 0                        |
| sRGBBorderColorB         | 0                        |
| sRGBBorderColorA         | 0                        |
| DepthCompare             | FALSE                    |
+--------------------------+--------------------------+

Examples:

TLDS.LZ    R0, R4, R9, R11, 0x7, 2D, RGBA; // reads R9 & R11; writes R0, R1, R4, & R5
TLDS.LZ.MS RZ, R9, R6, R11, 0x0, 2D, R;    // reads R6, R7, & R11; writes R9
