TEX : Texture Fetch

Format

SPA 5.0:
        {@{!}Pg}   TEX{.B}{.lod}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}        Rd, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU04}             {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   TEX{.B}{.lod}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}        Rd, Ra{, Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}         {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

Two additional forms support tiled resources (sparse status predicate and LOD clamping):
        {@{!}Pg}   TEX{.B}{.lod}{.LC}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}   {Ps,} Rd, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU04}       {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   TEX{.B}{.lod}{.LC}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}   {Ps,} Rd, Ra{, Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

.B:      Bindless mode, where the texture header pointer and sampler pointer are packed into a 32-bit register as:
         samplerPtr[31:20] | headerPtr[19:0]
         Data is sent via register Rb.
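
The bindless packing above can be sketched in a few lines of Python. This is only an illustration of the bit layout samplerPtr[31:20] | headerPtr[19:0]; the helper names pack_bindless_handle and unpack_bindless_handle are hypothetical and not part of any toolchain.

```python
SAMPLER_BITS = 12   # samplerPtr occupies bits [31:20]
HEADER_BITS = 20    # headerPtr occupies bits [19:0]


def pack_bindless_handle(sampler_ptr: int, header_ptr: int) -> int:
    """Pack the two pointers into the 32-bit value passed in Rb (hypothetical helper)."""
    if not 0 <= sampler_ptr < (1 << SAMPLER_BITS):
        raise ValueError("samplerPtr does not fit in 12 bits")
    if not 0 <= header_ptr < (1 << HEADER_BITS):
        raise ValueError("headerPtr does not fit in 20 bits")
    return (sampler_ptr << HEADER_BITS) | header_ptr


def unpack_bindless_handle(handle: int) -> tuple:
    """Split a 32-bit handle back into (samplerPtr, headerPtr)."""
    sampler_ptr = (handle >> HEADER_BITS) & ((1 << SAMPLER_BITS) - 1)
    header_ptr = handle & ((1 << HEADER_BITS) - 1)
    return sampler_ptr, header_ptr
```

For instance, sampler 3 with header 0x12 packs to 0x00300012.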

.lod:    { .LZ, .LB, .LL, .LBA, .LLA } 
         Level of detail (LOD) adjust mode.
            < NONE >
            .LZ  - LOD level 0 (finest)       // no register required
            .LB  - LOD bias discrete          // 1 fp32 register required
            .LL  - LOD absolute discrete      // 1 fp32 register required
            .LBA - LOD bias averaged          // 1 fp32 register required (Tesla legacy mode)
            .LLA - LOD absolute averaged      // 1 fp32 register required (Tesla legacy mode)
         The "averaged" options allow the TEX pipe to average the LODs across the quad as a performance optimization.
         LOD level 0 actually selects the level set by textureHeader.resViewMinMapLevel.

.LC:  LOD Clamp value for Sparse Textures.
               A 12 bit (fixed point u4.8 format) value. Packed with the ARRAY index in the same register.
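
As a rough illustration of the u4.8 encoding, the sketch below converts a floating-point clamp value to the 12-bit field and packs it with the array index at the bit positions given in the Description table (LodClamp[27:16] | array[15:0]). The rounding and saturation behaviour is an assumption made for the example; the text does not specify it. The function names are hypothetical.

```python
LOD_CLAMP_MAX = 0xFFF  # 12 bits: 4 integer bits, 8 fractional bits


def encode_lod_clamp(lod: float) -> int:
    """Convert a LOD clamp to u4.8 fixed point.

    Assumption: round to nearest and saturate to [0, 0xFFF].
    """
    raw = int(round(lod * 256))          # 8 fractional bits -> scale by 2**8
    return max(0, min(raw, LOD_CLAMP_MAX))


def pack_lc_array(lod_clamp: float, array_index: int) -> int:
    """Build the Ra+0 word: LodClamp in bits [27:16], array index in bits [15:0]."""
    if not 0 <= array_index <= 0xFFFF:
        raise ValueError("array index must fit in 16 bits")
    return (encode_lod_clamp(lod_clamp) << 16) | array_index
```

A clamp of 2.5 encodes as 0x280, so clamp 2.5 with array index 3 gives 0x02800003.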

.AOFFI: Programmable Texture Offset.
            _aoffimmi(u,v,w)  [DX10]   // 1 register required
                ((w & 0xf)<<8) | ((v & 0xf)<<4) | (u & 0xf)
            Each 4b field is a 2's complement integer from -8 to +7.
            AOFFI is not supported with CubeMap textures.
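
The offset packing formula above can be exercised with a small Python sketch. It follows the stated expression directly; the range check and the decode helper are additions for illustration, and both function names are hypothetical.

```python
def encode_aoffi(u: int = 0, v: int = 0, w: int = 0) -> int:
    """Pack three signed 4-bit offsets as ((w & 0xf) << 8) | ((v & 0xf) << 4) | (u & 0xf)."""
    for name, value in (("u", u), ("v", v), ("w", w)):
        if not -8 <= value <= 7:
            raise ValueError(f"{name} offset must be in [-8, 7]")
    return ((w & 0xF) << 8) | ((v & 0xF) << 4) | (u & 0xF)


def decode_aoffi(word: int) -> tuple:
    """Recover (u, v, w) by sign-extending each 4-bit two's complement field."""
    def sext4(nibble: int) -> int:
        return nibble - 16 if nibble & 0x8 else nibble

    return (sext4(word & 0xF),
            sext4((word >> 4) & 0xF),
            sext4((word >> 8) & 0xF))
```

For example, offsets (u, v, w) = (1, -1, 0) encode as 0x0F1.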

.DC:     Depth comparison filter mode using reference value.
            RefVal                           // 1 fp32 register required
            Depth Comparison filter is not supported by 3D textures.
            For TEX and TEX.LZ, the .DC option will force a depth comparison filter mode regardless of the sampler state.
            For TEX.LB, TEX.LL, TEX.LBA, TEX.LLA, if the sampler state does not enable depth comparison the .DC option 
	    will not force a depth comparison filter mode.

.NDV:    Forces the TEX to be considered non-divergent even though the quad may be divergent.
            This does not promote inactive threads; it only forces the quad to be treated as non-divergent even though
            some threads might be inactive.  To activate disabled threads in a quad, SAM must be used.
            Only the active mask and shader type are used to determine whether a quad of threads is divergent.

.NODEP:  Indicates that there are no subsequent quad derivatives to be calculated.
	 Threads that have been "killed" will be disabled to stop unnecessary texture fetches.

.phase:  { .T, .P }
         Allows control of the current warp's texture hash, used for scheduling.
             < NONE >
             .T - postfix increment of the 3 bit texture component of the hash.
	     .P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash. 


#tsPtrIdxU13:
         This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache.  The bank from which
         it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form
         "samplerPtr[31:20] | headerPtr[19:0]"
         Note: Ignored if the .B option is used.
         In SetSamplerBinding.ViaHeaderBinding (i.e. OGL) mode, the headerPtr is used as the samplerPtr as well.
         Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex is regarded as an "invalid"
         texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).
         Any sampler pointer greater than the one specified in SetSamplerHeaderPoolC.MaximumIndex is regarded as an "invalid"
         texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).

#tidU08, #smpU05:
         This is the Fermi-compatible specification of tsPtrIdxU13 which allows running of legacy apps/traces where SASS will
	 transform these into tsPtrIdxU13 as follows:
	 #tsPtrIdxU13 = {#smpU05, #tidU08}
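
A minimal sketch of this transformation follows. It assumes the braces denote most-significant-first concatenation (as in Verilog), so #smpU05 lands in bits [12:8] and #tidU08 in bits [7:0]; that reading is an assumption, and the function name is hypothetical.

```python
def legacy_to_ts_ptr_idx(tid: int, smp: int) -> int:
    """Combine a Fermi-style (tid, smp) pair into a 13-bit tsPtrIdx.

    Assumption: {smp, tid} is MSB-first concatenation, i.e. (smp << 8) | tid.
    """
    if not 0 <= tid <= 0xFF:
        raise ValueError("tid must fit in 8 bits")
    if not 0 <= smp <= 0x1F:
        raise ValueError("smp must fit in 5 bits")
    return (smp << 8) | tid
```

Under that assumption, tid 5 with smp 3 maps to index 0x305, and the largest pair maps to 0x1FFF, which fills all 13 bits.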

#paramA: source coordinate description.
         Valid paramA specifiers for TEX:

             parameter     Coordinate registers implied
             1D            s
             2D            s,t
             3D            s,t,r
             CUBE          s,t,r
             ARRAY_1D      a,s
             ARRAY_2D      a,s,t
             RESERVED      // for ARRAY_3D
             ARRAY_CUBE    a,s,t,r

         s,t,r are fp32; a is a U16 integer.

#wmskU04: destination write mask (decimated contiguous writes)
         Allows for write masking the returning data writes via a bit enable
         for each of R,G,B,A. A four-vector is always returned from TEX.
         #wmskU04 defaults to 0xf.
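
The "decimated contiguous writes" behaviour can be pictured as follows: only the enabled components are written, and they land in consecutive registers starting at Rd. The sketch assumes bit 0 of the mask enables R, bit 1 G, bit 2 B and bit 3 A; the text does not state the bit order, so treat that mapping, and the function name, as hypothetical.

```python
def destination_registers(rd: int, wmsk: int = 0xF) -> dict:
    """Map each enabled component to the register index it is written to.

    Assumption: mask bit 0 = R, bit 1 = G, bit 2 = B, bit 3 = A.
    """
    if not 0 <= wmsk <= 0xF:
        raise ValueError("wmsk is a 4-bit mask")
    mapping = {}
    next_reg = rd
    for bit, component in enumerate("RGBA"):
        if wmsk & (1 << bit):
            mapping[component] = next_reg   # enabled components are packed contiguously
            next_reg += 1
    return mapping
```

With Rd = R4 and a mask of 0b1010, this yields G in R4 and A in R5; the number of entries is the component count that the Rd alignment rule refers to.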

Ps:
         Predicate returning sparse tile status. Indicates that the surface access touches a page marked as sparse (valid, not mapped).

Description

Texture fetch using texture coordinates/parameters stored in registers Ra and Rb. The assignment of parameters is as follows:

    Texture parameter packing in Ra and Rb

    Reg    parameter                                   format
    Ra+0   (.LC)  : {LodClamp[27:16] | array[15:0]}    {fixed point u4.8 | u16}
           !(.LC) : array[15:0]                        u32
    Ra+1   s                                           fp32
    Ra+2   t                                           fp32
    Ra+3   r                                           fp32
    Rb+0   SamplerPtr|HeaderPtr                        u32
    Rb+1   LOD                                         fp32
    Rb+2   toff[11:0]                                  u32
    Rb+3   DC                                          fp32

    In the table above, "+0/1/2/3" represents the order of packing parameters in Ra/Rb. If a parameter is not specified, then the rest are compacted upwards within the same Ra or Rb register.
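
    The compaction rule can be sketched as below: list the parameters that are present, in table order, and assign them to consecutive registers. This is an illustrative model only; the helper names are hypothetical, and the .LC case is left out because the text does not say how Ra+0 behaves for non-array textures.

```python
# Coordinate registers implied by each paramA specifier (from the paramA table).
COORDS = {
    "1D": ["s"],
    "2D": ["s", "t"],
    "3D": ["s", "t", "r"],
    "CUBE": ["s", "t", "r"],
    "ARRAY_1D": ["a", "s"],
    "ARRAY_2D": ["a", "s", "t"],
    "ARRAY_CUBE": ["a", "s", "t", "r"],
}


def ra_layout(param_a: str) -> list:
    """Parameters packed in Ra, in order, for a given paramA."""
    return list(COORDS[param_a])


def rb_layout(bindless=False, lod=None, aoffi=False, dc=False) -> list:
    """Parameters packed in Rb, in table order, skipping the absent ones."""
    slots = []
    if bindless:                               # .B
        slots.append("SamplerPtr|HeaderPtr")
    if lod in ("LB", "LL", "LBA", "LLA"):      # .LZ needs no register
        slots.append("LOD")
    if aoffi:                                  # .AOFFI
        slots.append("toff")
    if dc:                                     # .DC
        slots.append("DC")
    return slots


def assign(base: int, slots: list) -> dict:
    """Assign compacted parameters to consecutive registers starting at R<base>."""
    return {f"R{base + i}": name for i, name in enumerate(slots)}
```

    Applied to the TEX.B.DC example at the end of this page (Rb = R8), the model places the packed pointers in R8 and the reference value in R9, not R11, because the absent LOD and offset slots are compacted away.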

    The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,

    1. Rd should be aligned to the number of valid components being returned (as specified by #wmskU04)
    2. Ra/Rb should always be aligned to
      1. 1 (scalar register) if the scalar count for that register (Ra or Rb) is 1
      2. 2 (vec2 register) if the scalar count for that register (Ra or Rb) is 2
      3. 4 (vec4 register) if the scalar count for that register (Ra or Rb) is 3 or 4
    3. Rb should be specified as RZ if no parameters need to be packed in Rb. (However, no error is generated if a non-RZ register is specified.)
    4. Ra/Rb must not be specified as RZ if any parameters need to be packed in Ra/Rb.
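
    Rule 2 amounts to a small lookup from scalar count to required alignment. The sketch below models only the Ra/Rb rule as written; the function names are hypothetical, and register indices are plain integers (R8 is 8).

```python
def required_alignment(scalar_count: int) -> int:
    """Alignment for Ra or Rb: 1 for one scalar, 2 for two, 4 for three or four."""
    if not 1 <= scalar_count <= 4:
        raise ValueError("Ra/Rb hold between 1 and 4 scalars")
    return {1: 1, 2: 2, 3: 4, 4: 4}[scalar_count]


def is_aligned(reg_index: int, scalar_count: int) -> bool:
    """True if register R<reg_index> satisfies the alignment for scalar_count scalars."""
    return reg_index % required_alignment(scalar_count) == 0
```

    For example, R2 is a legal base for a two-scalar Ra (2D coordinates s,t) but not for a three-scalar one, which needs a vec4-aligned register such as R4.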

    Some input texture values will be sanitized before being used, see Additional Information for more details.

Additional Information:

TEX corresponds to these DX ops:

   sample    =  TEX                    // 
   sample_d  =  TXD/sw emulated        // emulate this via SAM/SWZ/TEX/RAM
   sample_b  =  TEX.LB                 // lod bias supplied
   sample_l  =  TEX.LL                 // lod      supplied
   sample_c  =  TEX.DC                 // depth comparison filter with reference value
   sample_lz =  TEX.LZ                 // lod level 0 (finest)

Sanitization of Texture Input Coordinates:

Texture input coordinates go through a sanitization step before being used in texture calculations.

Notes On Status Bits Sent To Texture Pipe:

Definitions:
------------
   ORM = original raster mask (pixel is active if at least one sample in pixel is covered)
   PRM = promoted raster mask (quad is active if at least one sample in quad is covered)
   CTM = current thread mask
   IPM = instruction predicate mask
   KTM = Killed Thread mask
 
   At thread launch:
     CTM is set to  PRM
     KTM is set to ~ORM

   Note: Entire quads are disabled once all 4 pixels are disabled.
 
Operation for Tex ops:
----------------------
 
Texture pipe sees 5 bits of status per quad:
    activemask[4]: one bit per pixel, allows cache and filter perf optimization
    divergent[1]:  quad divergence, impacts LOD calculation
 
  NonDependent:
    TEX.NODEP - activemask[4] = ~KTM & IPM & CTM
                divergent[1]  = div(CTM,PS,.NDV)        // quad divergence
 
  Dependent:
    TEX       - activemask[4] = IPM & CTM               // Don't consider KTM or PRM
                divergent[1]  = div(CTM,PS,.NDV)        // quad divergence

  // Return 1 if quad is divergent, 0 if quad is not divergent
  div(CTM,PS,.NDV)
  {
    if (.NDV)
      return(0);
    if (!PS)
      return(1);
    if (CTM == 0xf)
      return(0);
    return(1);
  }

Only pixel threads can use the TEX pipe to calculate LOD. Other thread types are automatically divergent, which forces LOD calculation (if required) to a default value (either 0 or +Inf).

If .NDV is set, it overrides the previous comment (for all thread types).
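
The status computation above can be modelled per quad in Python, with each mask held as a 4-bit integer (one bit per pixel). This is a sketch of the rules as listed, not of the hardware; the function tex_status is hypothetical, and PS is read here as "the thread type is a pixel shader", which is how the surrounding text uses it.

```python
def div(ctm: int, ps: bool, ndv: bool) -> int:
    """Return 1 if the quad is divergent, 0 otherwise (mirrors the pseudocode above)."""
    if ndv:
        return 0            # .NDV forces non-divergent for all thread types
    if not ps:
        return 1            # non-pixel threads are automatically divergent
    return 0 if ctm == 0xF else 1


def tex_status(ctm: int, ipm: int, ktm: int, ps: bool,
               ndv: bool = False, nodep: bool = False) -> tuple:
    """Compute (activemask[4], divergent[1]) for one quad.

    Assumption: ctm, ipm and ktm are 4-bit per-quad masks.
    """
    active = ipm & ctm
    if nodep:
        active &= ~ktm      # TEX.NODEP also drops killed threads
    return active & 0xF, div(ctm, ps, ndv)
```

With a full quad in which pixel 2 has been killed, TEX.NODEP reports an active mask of 0b1011 while plain TEX still reports 0b1111; in both cases the quad is non-divergent because CTM is 0xf.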

Examples:

TEX        R0,R2,5,2D,0xf;             // no need for Rb
TEX.LB     R0,R2,R4,5,2D,0xf;
TEX.B.DC   R4,R0,R8,5,1D,0xf;
