TXD : Texture Fetch With Derivatives

Format

SPA 5.0:
{@{!}Pg} TXD{.B}{.LC}{.AOFFI}{.NODEP}{.phase} {Ps,} Rd, Ra, Rb, #tsPtrIdxU13, #paramA{, #wmskU04} {&req_6} {&rdN} {&wrN} {?sched} ;
{@{!}Pg} TXD{.B}{.LC}{.AOFFI}{.NODEP}{.phase} {Ps,} Rd, Ra, Rb, #tidU08, #smpU05, #paramA{, #wmskU04} {&req_6} {&rdN} {&wrN} {?sched} ;

.B: Bindless mode, where the texture header pointer and sampler pointer are packed into a 32 bit register as:
samplerPtr[31:20] | headerPtr[19:0]
The data is sent via register Ra.

.LC: LOD Clamp value for Sparse Textures. A 12 bit (u4.8 format) value. Packed with the ARRAY index in the same register.

.AOFFI: Programmable Texel Offset.
_aoffimmi(u,v,w) [DX10] // 1 register required
((v & 0xf)<<4) | (u & 0xf)
Each 4b field is a 2's complement integer from -8 to +7.

.NODEP: Indicates that there are no subsequent quad derivatives to be calculated. Threads that have been "killed" will be disabled to stop unnecessary texture fetches.

.phase: Allows control of the current warp's texture hash, used for scheduling.
< NONE >
.T - postfix increment of the 3 bit texture component of the hash.
.P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash.
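
The .B and .AOFFI bit layouts given in the Format section can be sketched in Python as below. This is only an illustration of the stated bit fields; the function names are hypothetical and are not part of any tool or driver.

```python
def pack_bindless_handle(sampler_ptr: int, header_ptr: int) -> int:
    """Pack samplerPtr[31:20] | headerPtr[19:0] into one 32-bit word (.B mode)."""
    if not 0 <= sampler_ptr < (1 << 12):
        raise ValueError("samplerPtr must fit in 12 bits")
    if not 0 <= header_ptr < (1 << 20):
        raise ValueError("headerPtr must fit in 20 bits")
    return (sampler_ptr << 20) | header_ptr


def pack_aoffi_2d(u: int, v: int) -> int:
    """Pack two texel offsets as ((v & 0xf) << 4) | (u & 0xf)."""
    for name, value in (("u", u), ("v", v)):
        if not -8 <= value <= 7:
            raise ValueError(f"{name} offset must be in [-8, +7]")
    return ((v & 0xF) << 4) | (u & 0xF)


def unpack_aoffi_field(field: int) -> int:
    """Decode one 4-bit two's complement offset field back to [-8, +7]."""
    field &= 0xF
    return field - 16 if field & 0x8 else field
```

For example, u = -1 and v = 2 pack to 0x2F, since -1 is 0xF as a 4-bit two's complement value.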

Ps:
Predicate returning sparse tile status. Indicates that the surface access touched a page marked as sparse (not valid).

Immediate Inputs:

#tsPtrIdxU13:
This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache. The bank from
which it is fetched is determined by bundle state. The constant bank entry is a 32 bit structure of the form
"samplerPtr[31:20] | headerPtr[19:0]".
Note: Ignored if .B option is used.
In SetSamplerBinding.ViaHeaderBinding (i.e. OGL) mode, the headerPtr would be used as the samplerPtr as well.
Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid"
texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in fermi).
Any sampler pointer greater than the one specified in SetSamplerHeaderPoolC.MaximumIndex will be regarded as an
"invalid" texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in fermi).

#tidU08, #smpU05:
This is the "almost" Fermi-compatible specification of tsPtrIdxU13 which allows running of legacy apps/traces
where sass will transform these into tsPtrIdxU13 as follows:
tsPtrIdxU13 = {#smpU05, #tidU08}
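
A minimal sketch of this transform, assuming the braces denote bit concatenation with the leftmost field in the most significant bits (so #smpU05 lands in bits [12:8] and #tidU08 in bits [7:0]). The function name is hypothetical.

```python
def legacy_to_ts_ptr_idx(tid: int, smp: int) -> int:
    """Form tsPtrIdxU13 = {smpU05, tidU08}.

    Assumption: {a, b} is a concatenation with a in the high bits,
    giving a 13-bit result: smp[4:0] in bits [12:8], tid[7:0] in bits [7:0].
    """
    if not 0 <= tid < (1 << 8):
        raise ValueError("tid must fit in 8 bits")
    if not 0 <= smp < (1 << 5):
        raise ValueError("smp must fit in 5 bits")
    return (smp << 8) | tid
```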

#paramA: source coordinate description.
Valid paramA specifiers for TXD
parameter     Coordinate Registers implied
1D            s
2D            s,t
RESERVED      // for 3D
RESERVED      // for CUBE
ARRAY_1D      a,s
ARRAY_2D      a,s,t
RESERVED      // for ARRAY_3D
RESERVED      // for ARRAY_CUBE
     s,t,r are fp32; a is a U16 integer.
     If the source coordinate description does not match the texture type of the texture header,
     zeroes will be returned.  The array specifiers can be freely used with non-array textures
     (and the opposite holds as well), provided the number of coordinates (1D,2D) matches.

  #wmskU04       destination write mask (decimated contiguous writes)
     Allows for write masking the returning data writes via a bit enable
     for each of R,G,B,A. A four-vector is always returned from TXD.
     #wmskU04 defaults to 0xf.
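
     As an illustration of "decimated contiguous writes", the sketch below maps a write mask to destination registers. It assumes that bit 0 of #wmskU04 enables R, bit 1 G, bit 2 B, and bit 3 A, and that enabled components are written to consecutive registers starting at Rd; neither assumption is stated explicitly above, and the function name is hypothetical.

```python
def destination_layout(wmsk: int, rd: int) -> dict:
    """Map each enabled component to the register index it is written to.

    Assumptions: bit i of wmsk enables "RGBA"[i], and enabled components
    are packed contiguously starting at register rd.
    """
    if not 0 <= wmsk <= 0xF:
        raise ValueError("wmsk must be a 4-bit mask")
    enabled = [c for i, c in enumerate("RGBA") if (wmsk >> i) & 1]
    return {component: rd + slot for slot, component in enumerate(enabled)}
```

     Under these assumptions, a mask of 0xa with Rd = R4 would place G in R4 and A in R5.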

  Neither Ra nor Rb can be RZ.

Description

Texture fetch using a texture coordinate vector and derivatives.

Note: TXD hardware does not support CUBE and 3D. These must still be emulated by SHFL/TEX.

The parameter assignment in register Ra/Rb is as follows:

Texture parameter packing in Ra and Rb
Reg    parameter                                                   format
Ra+0   { SamplerPtr[31:20] | HeaderPtr[19:0] }                     u32
Ra+1   s                                                           fp32
Ra+2   t                                                           fp32
Ra+3   (.LC) ? { LodClamp[31:20] | toff[19:12] | array[11:0] }
             : { toff[27:16] | array[15:0] }                       u32
Rb+0   dsdx                                                        fp32
Rb+1   dsdy                                                        fp32
Rb+2   dtdx                                                        fp32
Rb+3   dtdy                                                        fp32

In the table above, "+0/1/2/3" represents the order of packing parameters in Ra/Rb. If a parameter is not specified, then the rest are compacted upwards within the same Ra or Rb register.
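
The packing and compaction rules above can be sketched as follows. The helper names are hypothetical. The sketch assumes that the u4.8 LOD clamp is formed by rounding to the nearest 1/256 and saturating at 0xFFF, that the handle word is present only with .B, and that the combined LodClamp/toff/array word is present whenever any of its fields is in use; none of these is spelled out in the table.

```python
def lod_clamp_to_u4_8(lod: float) -> int:
    """Convert a LOD clamp to unsigned 4.8 fixed point (12 bits).

    Assumption: round to nearest, saturate to [0, 0xFFF].
    """
    fixed = int(round(lod * 256))
    return max(0, min(fixed, 0xFFF))


def pack_offset_array_word(array_index=0, toff=0, lod_clamp=None) -> int:
    """Build the combined word following the table's two layouts."""
    if lod_clamp is not None:  # .LC: LodClamp[31:20] | toff[19:12] | array[11:0]
        return ((lod_clamp & 0xFFF) << 20) | ((toff & 0xFF) << 12) | (array_index & 0xFFF)
    # no .LC: toff[27:16] | array[15:0]
    return ((toff & 0xFFF) << 16) | (array_index & 0xFFFF)


def operand_layout(dims: int, bindless=False, has_packed_word=False):
    """List the parameters held in Ra and Rb after compaction.

    Unspecified parameters are simply skipped, so later ones move up.
    """
    if dims not in (1, 2):
        raise ValueError("TXD supports only 1D and 2D coordinates")
    ra = ["handle"] if bindless else []
    ra.append("s")
    if dims == 2:
        ra.append("t")
    if has_packed_word:
        ra.append("lodclamp/toff/array")
    rb = ["dsdx", "dsdy"] + (["dtdx", "dtdy"] if dims == 2 else [])
    return ra, rb
```

With these helpers, a plain 1D fetch yields Ra = [s] and Rb = [dsdx, dsdy], while a bindless 2D array fetch fills all four slots of both registers.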

The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,

  1. Rd should be aligned to the number of valid components being returned (as specified by #wmskU04)
  2. Ra/Rb should always be aligned to
    1. 1 (scalar register) if the scalar count for that register (Ra or Rb) is 1
    2. 2 (vec2 register) if the scalar count for that register (Ra or Rb) is 2
    3. 4 (vec4 register) if the scalar count for that register (Ra or Rb) is 3 or 4
  3. Rb should be specified as RZ if no parameters need to be packed in Rb. (However, no error is generated if a non-RZ register is specified.)
  4. Ra/Rb must not be specified as RZ if any parameters need to be packed in Ra/Rb.
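
The Ra/Rb alignment rule in item 2 can be checked with a small sketch like the one below. The function names are hypothetical, and the check covers only the source registers, not Rd.

```python
def required_alignment(scalar_count: int) -> int:
    """Alignment for Ra or Rb: 1 -> 1, 2 -> 2, 3 or 4 -> 4."""
    alignment = {1: 1, 2: 2, 3: 4, 4: 4}
    if scalar_count not in alignment:
        raise ValueError("scalar count must be between 1 and 4")
    return alignment[scalar_count]


def is_aligned(reg_index: int, scalar_count: int) -> bool:
    """True if register R<reg_index> satisfies the rule for this scalar count."""
    return reg_index % required_alignment(scalar_count) == 0
```

For a 2D fetch with Ra = R2 holding s,t (two scalars) and Rb = R4 holding four derivatives, both checks pass; moving Rb to R6 would fail because a four-scalar operand needs vec4 alignment.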

Some input texture values will be sanitized before being used.

Additional Information:

Corresponds to these DX ops:

   sample_d

Notes about equivalency with TEX instruction:

Examples:

TXD        R0,R2,R4,5,2D,0xf;

For OpenGL, it is necessary to premultiply the dy values by SR18 to cancel the effects of the origin aware DDY expansion (documented in FSWZ).

S2R R8, SR18;
FMUL R5, R5, R8;
FMUL R7, R7, R8;
TXD R0,R2,R4,5,2D,0xf;
