SPA 5.0:
{@{!}Pg}
TLDS.lod{.AOFFI}{.MS}{.NODEP}{.phase}
RZ, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA, #wmskU2C
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
TLDS.lod{.AOFFI}{.MS}{.NODEP}{.phase}
Rd1, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU34C}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
The following forms, which take the .F16 modifier, offer support for packed FP16 data return:
{@{!}Pg}
TLDS{.F16}.lod{.AOFFI}{.MS}{.NODEP}{.phase}
RZ, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA, #wmskU2C
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
.lod: LOD adjust mode.
.LZ - LOD level 0 (finest) // no register required
.LL - LOD absolute // 1 U32 register required
LOD Level 0 actually selects the level set by textureHeader.resViewMinMapLevel.
.AOFFI: Programmable Texture Offset.
_aoffimmi(u,v,w) [DX10] // 1 register required
((w & 0xf)<<8) | ((v & 0xf)<<4) | (u & 0xf)
.MS: Programmable Multisample location. .MS can only be used with the .LZ LOD option, and 2D/ARRAY_2D textures.
Multisample location // 1 U32 register required
.NODEP: Indicates that there are no subsequent quad derivatives to be calculated. Threads that have been "killed" will be disabled to stop unnecessary texture fetches.
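The .AOFFI packing expression above can be sketched in a few lines of Python. This is only an illustration: the function name pack_aoffi is hypothetical, and the -8..7 range check assumes each offset is a signed 4 bit value, which this section does not state explicitly.

```python
def pack_aoffi(u: int, v: int, w: int = 0) -> int:
    """Pack three texel offsets into the single .AOFFI register value."""
    for name, value in (("u", u), ("v", v), ("w", w)):
        # Assumption: each offset is a signed 4 bit quantity (-8..7).
        if not -8 <= value <= 7:
            raise ValueError(f"{name} offset {value} is outside -8..7")
    # Masking with 0xf keeps the low four two's-complement bits,
    # exactly as in the expression from the text.
    return ((w & 0xF) << 8) | ((v & 0xF) << 4) | (u & 0xF)


if __name__ == "__main__":
    print(hex(pack_aoffi(1, 2, 3)))
    print(hex(pack_aoffi(-1, 0)))
```

A negative offset such as u = -1 ends up as 0xf in bits [3:0], since only the low nibble of its two's-complement form survives the mask.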
{@{!}Pg}
TLDS{.F16}.lod{.AOFFI}{.MS}{.NODEP}{.phase}
Rd1, Rd0, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU34C}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
.phase: Allows control of the current warp's texture hash, used for scheduling.
< NONE >
.T - postfix increment of the 3 bit texture component of the hash.
.P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash.
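As a rough mental model of the .phase options, the hash can be treated as two counters that are updated after the instruction has used the current value. The sketch below is an assumption-heavy illustration: the TexHash class and issue function are hypothetical names, the two components are kept as separate fields because their bit positions inside the hash are not given here, and wraparound on overflow is assumed rather than documented.

```python
from dataclasses import dataclass


@dataclass
class TexHash:
    phase: int = 0    # 5 bit phase component
    texture: int = 0  # 3 bit texture component


def issue(h: TexHash, mode: str = "NONE"):
    """Return the hash value the instruction sees, then apply the update."""
    used = (h.phase, h.texture)  # postfix: the old value is the one used
    if mode == "T":
        # Assumption: the 3 bit component wraps on overflow.
        h.texture = (h.texture + 1) & 0x7
    elif mode == "P":
        # Assumption: the 5 bit component wraps on overflow.
        h.phase = (h.phase + 1) & 0x1F
        h.texture = 0  # .P also zeroes the texture component
    elif mode != "NONE":
        raise ValueError(f"unknown phase mode {mode!r}")
    return used
```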
Immediate Inputs:
#tsPtrIdxU13:
This immediate index (word address) is used to fetch the packed header+sampler pointer entry from constant cache. The bank from
which it is fetched is determined by bundle state. The constant bank entry is a 32 bit structure of the form
"samplerPtr[31:20] | headerPtr[19:0]". Only headerPtr is used by this instruction.
Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid"
texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in fermi).
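The entry layout described above can be unpacked as follows. This is a minimal sketch: decode_ts_entry is a hypothetical helper, and max_header_index stands in for the value programmed through SetTexHeaderPoolC.MaximumIndex.

```python
def decode_ts_entry(entry: int, max_header_index: int):
    """Split a 32 bit constant bank entry into its two pointers.

    Layout from the text: samplerPtr[31:20] | headerPtr[19:0].
    Returns (header_ptr, sampler_ptr, header_valid).
    """
    entry &= 0xFFFFFFFF
    sampler_ptr = (entry >> 20) & 0xFFF  # bits [31:20], not used by TLDS
    header_ptr = entry & 0xFFFFF         # bits [19:0]
    # A header pointer above the maximum index is treated as "invalid".
    header_valid = header_ptr <= max_header_index
    return header_ptr, sampler_ptr, header_valid
```

Only header_ptr and header_valid matter for TLDS; sampler_ptr is decoded here just to show the full layout.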
.F16: If specified, texture return data is in packed FP16 format.
Otherwise, the return data is in 32 bit format (fp32 or S/UINT32).
Partial register writes do not occur: any unused portion of the return
register is written with the value 0.
Note: .F16 modifier is not supported for integer textures in SPA 5.2.
(return value is UNPREDICTABLE)
#paramA: source coordinate description.
parameter | Coordinate Registers implied |
---|---|
1D | s |
2D | s,t |
3D | s,t,r |
RESERVED | // for CUBE |
RESERVED | // for ARRAY_1D |
ARRAY_2D | a,s,t |
RESERVED | // for ARRAY_3D |
RESERVED | // for ARRAY_CUBE |
s,t,r are fp32,
a is U16 integer
If the source coordinate description does not match the texture type of the texture header, zeroes will be returned. The array specifiers can be freely used with non-array textures (and the opposite holds as well), provided the number of coordinates (1D,2D,3D) matches.
wmsk2C: {R, G, B, A, RG, RA, GA, BA} // destination write mask for up to 2 component writeback.
wmsk34C: {RGB, RGA, GBA, RBA, RGBA*} // destination write mask for 3 or 4 component writeback.
Not all combinations of .lod, .AOFFI, .MS, and #paramA are allowed. See the encoding table in the Description, below.
Rounding mode is controlled by a PRI: [SM]PRI_SM_TEXIO_CONTROL_FP16_ROUNDING_MODE. It must be set to the same value as PRI_TEX_F_DBG_FP16_ROUNDING_MODE.
Texture load (point sample only) using texture coordinates/parameters packed in the Ra/Rb registers. The return data is written back to registers Rd0, Rd1 based on the wmsk2C/34C specification. The legal instruction modifiers for TLDS and the corresponding assignment of parameters to Ra and Rb are specified below.
encoding | #paramA | .lod | .AOFFI | .MS | Ra | Rb |
---|---|---|---|---|---|---|
0 | 1D | .LZ | - | - | s | must be RZ |
1 | 1D | .LL | - | - | s | lod |
2 | 2D | .LZ | - | - | s | t |
4 | 2D | .LZ | .AOFFI | - | s,t | aoffi |
5 | 2D | .LL | - | - | s,t | lod |
6 | 2D | .LZ | - | .MS | s,t | ms |
7 | 3D | .LZ | - | - | s,t | r |
8 | ARRAY_2D | .LZ | - | - | array | s,t |
12 | 2D | .LL | .AOFFI | - | s,t | lod,aoffi |
HW note: These parameter combinations are encoded in the tld2d_4 field, where bit[2] of the field indicates Ra size (1: vec2, 0: scalar). Similarly, bit[3] indicates Rb size (1: vec2, 0: scalar).
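The relationship between the encoding value and the operand sizes can be checked mechanically. The sketch below transcribes the encoding table into a Python dictionary and derives the Ra/Rb sizes from bits [2] and [3]; the names LEGAL, operand_sizes and table_matches_size_bits are illustrative only.

```python
# Legal TLDS combinations, transcribed from the encoding table.
# Each entry: encoding -> (paramA, lod, Ra contents, Rb contents).
LEGAL = {
    0:  ("1D",       ".LZ", ["s"],      []),  # Rb must be RZ
    1:  ("1D",       ".LL", ["s"],      ["lod"]),
    2:  ("2D",       ".LZ", ["s"],      ["t"]),
    4:  ("2D",       ".LZ", ["s", "t"], ["aoffi"]),
    5:  ("2D",       ".LL", ["s", "t"], ["lod"]),
    6:  ("2D",       ".LZ", ["s", "t"], ["ms"]),
    7:  ("3D",       ".LZ", ["s", "t"], ["r"]),
    8:  ("ARRAY_2D", ".LZ", ["array"],  ["s", "t"]),
    12: ("2D",       ".LL", ["s", "t"], ["lod", "aoffi"]),
}


def operand_sizes(encoding: int):
    """Derive (Ra size, Rb size) from bits [2] and [3] of the encoding."""
    ra = "vec2" if encoding & 0b0100 else "scalar"
    rb = "vec2" if encoding & 0b1000 else "scalar"
    return ra, rb


def table_matches_size_bits() -> bool:
    """True if every table row agrees with the size bits."""
    for enc, (_, _, ra, rb) in LEGAL.items():
        ra_size, rb_size = operand_sizes(enc)
        if (len(ra) == 2) != (ra_size == "vec2"):
            return False
        if (len(rb) == 2) != (rb_size == "vec2"):
            return False
    return True
```

For example, encoding 8 (ARRAY_2D) has bit[3] set and bit[2] clear, which matches a scalar array index in Ra and the s,t pair in Rb.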
For destination registers Rd1,Rd0, the following restrictions apply based on wmsk specification.
Without .F16 (32 bit return data):
Rd1 | wmsk | wmsk encoding | Rd0-size | Rd0-packing | Rd1-size | Rd1-packing |
---|---|---|---|---|---|---|
RZ | R | 0 | scalar | Rd0+0 = R component | none | must be RZ |
RZ | G | 1 | scalar | Rd0+0 = G component | none | must be RZ |
RZ | B | 2 | scalar | Rd0+0 = B component | none | must be RZ |
RZ | A | 3 | scalar | Rd0+0 = A component | none | must be RZ |
RZ | RG | 4 | vec2 | Rd0+0 = R component, Rd0+1 = G component | none | must be RZ |
RZ | RA | 5 | vec2 | Rd0+0 = R component, Rd0+1 = A component | none | must be RZ |
RZ | GA | 6 | vec2 | Rd0+0 = G component, Rd0+1 = A component | none | must be RZ |
RZ | BA | 7 | vec2 | Rd0+0 = B component, Rd0+1 = A component | none | must be RZ |
non-RZ | RGB | 0 | vec2 | Rd0+0 = R component, Rd0+1 = G component | scalar | Rd1+0 = B component |
non-RZ | RGA | 1 | vec2 | Rd0+0 = R component, Rd0+1 = G component | scalar | Rd1+0 = A component |
non-RZ | RBA | 2 | vec2 | Rd0+0 = R component, Rd0+1 = B component | scalar | Rd1+0 = A component |
non-RZ | GBA | 3 | vec2 | Rd0+0 = G component, Rd0+1 = B component | scalar | Rd1+0 = A component |
non-RZ | RGBA | 4 | vec2 | Rd0+0 = R component, Rd0+1 = G component | vec2 | Rd1+0 = B component, Rd1+1 = A component |
With .F16 (packed FP16 return data):
Rd1 | wmsk | wmsk encoding | Rd0-size | Rd0-packing | Rd1-size | Rd1-packing |
---|---|---|---|---|---|---|
RZ | R | 0 | scalar | Rd0[15:0] = R component, Rd0[31:16] = 0 | none | must be RZ |
RZ | G | 1 | scalar | Rd0[15:0] = G component, Rd0[31:16] = 0 | none | must be RZ |
RZ | B | 2 | scalar | Rd0[15:0] = B component, Rd0[31:16] = 0 | none | must be RZ |
RZ | A | 3 | scalar | Rd0[15:0] = A component, Rd0[31:16] = 0 | none | must be RZ |
RZ | RG | 4 | scalar | Rd0[15:0] = R component, Rd0[31:16] = G component | none | must be RZ |
RZ | RA | 5 | scalar | Rd0[15:0] = R component, Rd0[31:16] = A component | none | must be RZ |
RZ | GA | 6 | scalar | Rd0[15:0] = G component, Rd0[31:16] = A component | none | must be RZ |
RZ | BA | 7 | scalar | Rd0[15:0] = B component, Rd0[31:16] = A component | none | must be RZ |
non-RZ | RGB | 0 | scalar | Rd0[15:0] = R component, Rd0[31:16] = G component | scalar | Rd1[15:0] = B component, Rd1[31:16] = 0 |
non-RZ | RGA | 1 | scalar | Rd0[15:0] = R component, Rd0[31:16] = G component | scalar | Rd1[15:0] = A component, Rd1[31:16] = 0 |
non-RZ | RBA | 2 | scalar | Rd0[15:0] = R component, Rd0[31:16] = B component | scalar | Rd1[15:0] = A component, Rd1[31:16] = 0 |
non-RZ | GBA | 3 | scalar | Rd0[15:0] = G component, Rd0[31:16] = B component | scalar | Rd1[15:0] = A component, Rd1[31:16] = 0 |
non-RZ | RGBA | 4 | scalar | Rd0[15:0] = R component, Rd0[31:16] = G component | scalar | Rd1[15:0] = B component, Rd1[31:16] = A component |
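The packed layout in the table above (two 16 bit values per register, unused halves written as 0) can be modelled in software as shown below. This is a host-side sketch, not a description of the hardware datapath: pack_f16_return and f16_bits are hypothetical names, and the conversion uses Python's own half-precision rounding (struct format "e"), which may not match the rounding mode selected by the PRI.

```python
import struct

# Legal write masks, from the wmsk2C and wmsk34C sets.
WMSK_2C = {"R", "G", "B", "A", "RG", "RA", "GA", "BA"}
WMSK_34C = {"RGB", "RGA", "RBA", "GBA", "RGBA"}


def f16_bits(x: float) -> int:
    """Return the IEEE half-precision bit pattern of x as an integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pack_f16_return(texel: dict, wmsk: str):
    """Model the .F16 writeback.

    texel maps "R", "G", "B", "A" to floats. Returns (rd0, rd1) as 32 bit
    integers; rd1 is None for 2C masks, where Rd1 must be RZ.
    """
    if wmsk not in WMSK_2C | WMSK_34C:
        raise ValueError(f"illegal write mask {wmsk!r}")
    # Components are packed in the order they appear in the mask name.
    halves = [f16_bits(texel[c]) for c in wmsk]
    halves += [0] * (4 - len(halves))  # unused halves are written as 0
    rd0 = halves[0] | (halves[1] << 16)
    rd1 = (halves[2] | (halves[3] << 16)) if wmsk in WMSK_34C else None
    return rd0, rd1
```

With texel = {"R": 1.0, "G": 0.5, "B": 2.0, "A": -2.0}, the RGBA mask yields Rd0 = 0x38003c00 and Rd1 = 0xc0004000, since 1.0, 0.5, 2.0 and -2.0 encode as 0x3c00, 0x3800, 0x4000 and 0xc000 in FP16.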
The texture parameter source registers Ra/Rb and the destination (result) registers Rd0/Rd1 have alignment restrictions based on the number of scalar registers being read/written. Specifically,
Some input texture values will be sanitized before being used.
Corresponds to these DX ops:
ld = TLDS // load
ld2ms = TLDS.MS // load multisample
Unlike other texture instructions, TLDS overrides a great deal of texture header/sampler state with different values. The following tables show how the texture state will be treated for these instructions.
+--------------------------+--------------------------+
| Header Field             | TLDS Value               |
+--------------------------+--------------------------+
| UseHeaderOptControl      | FALSE                    |
| MaxAnisotropy            | ANISO_1_TO_1             |
+--------------------------+--------------------------+
+--------------------------+--------------------------+
| Sampler Field            | TLDS Value               |
+--------------------------+--------------------------+
| MagFilter                | MAG_POINT                |
| MinFilter                | MIN_POINT                |
| MipFilter                | MIP_POINT                |
| MaxAnisotropy            | ANISO_1_TO_1             |
| BorderColorR             | 0                        |
| BorderColorG             | 0                        |
| BorderColorB             | 0                        |
| BorderColorA             | 0                        |
| sRGBBorderColorR         | 0                        |
| sRGBBorderColorG         | 0                        |
| sRGBBorderColorB         | 0                        |
| sRGBBorderColorA         | 0                        |
| DepthCompare             | FALSE                    |
+--------------------------+--------------------------+
TLDS.LZ R0, R4, R9, R11, 0x7, 2D, RGBA; // reads R9 & R11; writes R0, R1, R4, & R5
TLDS.LZ.MS RZ, R9, R6, R11, 0x0, 2D, R; // reads R6, R7, & R11; writes R9
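The register footprints given in the comments of the two examples above can be reproduced from the operand size rules. The sketch below covers only the 32 bit (non-.F16) forms; the footprint function and its argument order are hypothetical, registers are given as plain numbers, and None stands for RZ.

```python
def span(base, count):
    """Registers base .. base+count-1, or nothing when base is RZ (None)."""
    return set() if base is None else {base + i for i in range(count)}


def footprint(rd1, rd0, ra, rb, encoding, wmsk):
    """Return (registers read, registers written) as sorted lists.

    Ra/Rb sizes come from bits [2] and [3] of the encoding value.
    Rd0 holds the first two mask components, Rd1 the remaining ones.
    """
    ra_count = 2 if encoding & 0b0100 else 1
    rb_count = 2 if encoding & 0b1000 else 1
    rd0_count = min(len(wmsk), 2)
    rd1_count = max(len(wmsk) - 2, 0)
    reads = span(ra, ra_count) | span(rb, rb_count)
    writes = span(rd0, rd0_count) | span(rd1, rd1_count)
    return sorted(reads), sorted(writes)


if __name__ == "__main__":
    # TLDS.LZ R0, R4, R9, R11, 0x7, 2D, RGBA  (2D + .LZ is encoding 2)
    print(footprint(0, 4, 9, 11, 2, "RGBA"))
    # TLDS.LZ.MS RZ, R9, R6, R11, 0x0, 2D, R  (2D + .LZ + .MS is encoding 6)
    print(footprint(None, 9, 6, 11, 6, "R"))
```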