TMML

SPA 5.0:
        {@{!}Pg}   TMML{.B}.LOD{.NDV}{.NODEP}{.phase}   Rd, Ra, {Rb}, #tsPtrIdxU13, #paramA{, #wmskU04}       {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   
        {@{!}Pg}   TMML{.B}.LOD{.NDV}{.NODEP}{.phase}   Rd, Ra, {Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   

  .B:      Bindless mode, where the texture header pointer and sampler pointer are packed into a 32-bit register as:
           samplerPtr[31:20] | headerPtr[19:0]
           Data is sent via register Rb.
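
The packing above can be sketched in Python as follows. This is a minimal illustration, not part of the ISA definition; the function names pack_bindless and unpack_bindless are hypothetical, and only the bit positions samplerPtr[31:20] | headerPtr[19:0] come from the text.

```python
HEADER_BITS = 20   # headerPtr occupies bits [19:0]
SAMPLER_BITS = 12  # samplerPtr occupies bits [31:20]


def pack_bindless(sampler_ptr, header_ptr):
    """Build the 32-bit word that would be placed in Rb (hypothetical helper)."""
    if not 0 <= header_ptr < (1 << HEADER_BITS):
        raise ValueError("headerPtr does not fit in 20 bits")
    if not 0 <= sampler_ptr < (1 << SAMPLER_BITS):
        raise ValueError("samplerPtr does not fit in 12 bits")
    return (sampler_ptr << HEADER_BITS) | header_ptr


def unpack_bindless(word):
    """Split a 32-bit word back into (samplerPtr, headerPtr)."""
    word &= 0xFFFFFFFF
    return word >> HEADER_BITS, word & ((1 << HEADER_BITS) - 1)
```

For example, pack_bindless(0xABC, 0x12345) yields 0xABC12345, and unpack_bindless reverses it.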

  .LOD:    Returns level-of-detail information.

  .NDV:    Forces the TEX to be treated as non-divergent even though the quad may be divergent.
           This does not promote inactive threads; it only forces the quad to be treated as non-divergent
           even though some threads might be inactive.  To activate disabled threads in a quad, SAM must be used.
           Only the active mask and shader type are used to determine whether a quad of threads is divergent.

  .NODEP:  Indicates that there are no subsequent quad derivatives to be calculated.
           Threads that have been "killed" will be disabled to stop unnecessary texture fetches.

  .phase:  Allows control of the current warp's texture hash, used for scheduling.
               < NONE >
               .T - postfix increment of the 3-bit texture component of the hash.
               .P - postfix increment of the 5-bit phase component, and zero out the 3-bit texture component of the hash.
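
A rough Python model of the .phase options is given below. Several points are assumptions not stated in the text: the components are assumed to wrap around at their bit widths (3 and 5 bits), "postfix" is taken to mean that the instruction sees the hash value from before the update, and the two components are kept as a (phase, tex) pair because the way they combine into the hash is not described here. The name apply_phase is hypothetical.

```python
TEX_MASK = 0x7     # 3-bit texture component
PHASE_MASK = 0x1F  # 5-bit phase component


def apply_phase(phase, tex, modifier=None):
    """Return (value_used, value_after) for the hash components.

    modifier is None (no .phase option), "T", or "P".
    Wraparound on overflow is an assumption, not taken from the spec text.
    """
    used = (phase, tex)  # postfix: the current instruction uses the old value
    if modifier == "T":
        tex = (tex + 1) & TEX_MASK
    elif modifier == "P":
        phase = (phase + 1) & PHASE_MASK
        tex = 0  # .P also zeroes the texture component
    elif modifier is not None:
        raise ValueError("unknown .phase modifier: %r" % (modifier,))
    return used, (phase, tex)
```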



Immediate Inputs:

  #tsPtrIdxU13:
     This immediate index (word address) is used to fetch the packed header+sampler pointer entry from constant cache.  The bank
     from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form
     "samplerPtr[31:20] | headerPtr[19:0]".
     Note: Ignored if the .B option is used.

  #tidU08, #smpU05:
     This is the "almost" Fermi-compatible form of tsPtrIdxU13, which allows legacy apps/traces to run.
     sass transforms these two immediates into tsPtrIdxU13 as follows:
     #tsPtrIdxU13 = {#smpU05, #tidU08}
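
The transformation can be sketched in Python as follows, assuming the brace notation is a Verilog-style concatenation with the leftmost operand in the most significant bits (so #smpU05 lands in bits [12:8] and #tidU08 in bits [7:0]). The function name legacy_to_ts_ptr_idx is hypothetical.

```python
def legacy_to_ts_ptr_idx(smp, tid):
    """Combine #smpU05 and #tidU08 into a 13-bit #tsPtrIdxU13 value.

    Assumes {#smpU05, #tidU08} places smp above tid (Verilog-style).
    """
    if not 0 <= smp < (1 << 5):
        raise ValueError("#smpU05 must fit in 5 bits")
    if not 0 <= tid < (1 << 8):
        raise ValueError("#tidU08 must fit in 8 bits")
    return (smp << 8) | tid
```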

  #paramA:       source coordinate description

*Valid paramA specifiers for TMML*
parameter	Coordinate Registers implied
1D	s
2D	s,t
3D	s,t,r
CUBE	s,t,r
ARRAY_1D	a,s
ARRAY_2D	a,s,t
RESERVED	// for ARRAY_3D
ARRAY_CUBE	a,s,t,r

           s, t, r are fp32;
           a is a U16 integer.

     If the source coordinate description does not match the texture type of the texture header,
     zeroes will be returned.  The array specifiers can be freely used with non-array textures
     (and the opposite holds as well), provided the number of coordinates (1D,2D,3D,CUBE) matches.
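
The matching rule can be sketched in Python as below. This is one reading of the paragraph above, under two assumptions: the texture header type is described with the same names as the paramA specifiers, and 3D and CUBE count as different categories even though both take s,t,r. The function names are hypothetical.

```python
VALID_PARAM_A = {"1D", "2D", "3D", "CUBE", "ARRAY_1D", "ARRAY_2D", "ARRAY_CUBE"}


def base_dimensionality(spec):
    """Strip the ARRAY_ prefix, leaving 1D, 2D, 3D or CUBE."""
    prefix = "ARRAY_"
    return spec[len(prefix):] if spec.startswith(prefix) else spec


def fetch_returns_zeroes(param_a, header_type):
    """True if the paramA specifier does not match the texture header type.

    Array and non-array forms are interchangeable; only the base
    dimensionality is compared (assumed interpretation).
    """
    if param_a not in VALID_PARAM_A:
        raise ValueError("not a valid paramA specifier for TMML: " + param_a)
    return base_dimensionality(param_a) != base_dimensionality(header_type)
```

Under these assumptions, ARRAY_2D used against a 2D texture is accepted, while 2D used against a 3D texture returns zeroes.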

  #wmskU04:      destination write mask (decimated contiguous writes)
     Masks the returned data writes with one enable bit for each of R, G, B, A.
     A four-vector is always returned from TEX.
     #wmskU04 defaults to 0xf.
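
A small Python sketch of how the write mask might map components to destination registers is shown below. It assumes that bit 0 enables R and bit 3 enables A, and that "decimated contiguous writes" means the enabled components are packed into consecutive registers starting at Rd. Neither the bit order nor the helper name written_registers is confirmed by the text.

```python
COMPONENTS = ("R", "G", "B", "A")  # assumed bit order: bit 0 = R ... bit 3 = A


def written_registers(rd, wmask=0xF):
    """Map destination register index -> component written there.

    Disabled components are dropped and the remaining ones are
    packed contiguously from Rd upward (assumed interpretation).
    """
    if not 0 <= wmask <= 0xF:
        raise ValueError("#wmskU04 must fit in 4 bits")
    enabled = [c for bit, c in enumerate(COMPONENTS) if wmask & (1 << bit)]
    return {rd + offset: c for offset, c in enumerate(enabled)}
```

With these assumptions, written_registers(4, 0b0101) gives {4: "R", 5: "B"}.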

Fetches mip-map level-of-detail (LOD) or axis-length information, instead of RGBA, using a texture coordinate vector/parameters.

The parameters are arranged in Ra/Rb registers as follows:

*Texture parameter packing in Ra and Rb*
Reg	parameter	format
Ra+0	array[15:0]	u32
Ra+1	s	fp32
Ra+2	t	fp32
Ra+3	r	fp32
Rb+0	SamplerPtr[31:20] | HeaderPtr[19:0]	u32

In the table above, "+0/1/2/3" represents the order of packing parameters in Ra/Rb. If a parameter is not specified, then the rest are compacted upwards within the same Ra or Rb register.
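
The compaction rule can be illustrated with a short Python sketch that derives the Ra layout from a paramA specifier. It assumes "compacted upwards" means that the remaining parameters move toward offset +0 while keeping the order a, s, t, r. The table mirrors the valid paramA specifiers listed earlier; the name ra_layout is hypothetical.

```python
# Coordinates implied by each valid paramA specifier, in packing order.
PARAM_A_COORDS = {
    "1D": ("s",),
    "2D": ("s", "t"),
    "3D": ("s", "t", "r"),
    "CUBE": ("s", "t", "r"),
    "ARRAY_1D": ("a", "s"),
    "ARRAY_2D": ("a", "s", "t"),
    "ARRAY_CUBE": ("a", "s", "t", "r"),
}


def ra_layout(param_a):
    """Return {register slot: parameter} for the Ra register group."""
    coords = PARAM_A_COORDS[param_a]
    return {"Ra+%d" % i: name for i, name in enumerate(coords)}
```

For a non-array 2D fetch this gives s in Ra+0 and t in Ra+1, since the array index is absent and the rest move up.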

The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,

Rd should be aligned to the number of valid components being returned (as specified by wmask).
Ra/Rb should always be aligned to:

1 (scalar register) if the scalar count for that register (Ra or Rb) is 1
2 (vec2 register) if the scalar count for that register (Ra or Rb) is 2
4 (vec4 register) if the scalar count for that register (Ra or Rb) is 3 or 4

Rb should be specified as RZ if no parameters need to be packed in Rb. (However, no error is generated if a non-RZ register is specified.)
Ra/Rb must not be specified as RZ if any parameters need to be packed in Ra/Rb.
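
The Ra/Rb alignment rule can be checked with a sketch like the following. It covers only the source registers, since the text does not say how Rd is aligned when three components are returned. Register indices are treated as plain integers, and the helper names are hypothetical.

```python
def source_alignment(scalar_count):
    """Required register alignment for Ra or Rb given its scalar count."""
    if scalar_count == 1:
        return 1  # scalar register
    if scalar_count == 2:
        return 2  # vec2 register
    if scalar_count in (3, 4):
        return 4  # vec4 register
    raise ValueError("scalar count must be between 1 and 4")


def is_aligned(reg_index, scalar_count):
    """True if register index reg_index satisfies the alignment rule."""
    return reg_index % source_alignment(scalar_count) == 0
```

For example, an ARRAY_2D fetch packs three scalars into Ra, so under this rule R8 would be acceptable as Ra but R6 would not.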

Some input texture values will be sanitized before being used.

Returned data is a 4-vector of fp32 values, arranged as:

  if(.mode == .LOD) { 
    R: non-clamped LOD (S8.8), ignores clamping resulting from sampler or texture header.  High bits are zero.
    G: clamped LOD (U8.8), actual LOD that would have been used.  High bits are zero.
    B: {major_unit_vector.v (S2.6), major_unit_vector.u (S2.6)}, tightly packed in the register's low 16 bits.  High bits are zero.
       [0000 0000 0000 0000 vvvv vvvv uuuu uuuu]
    A: log2(minor_length/major_length) (S4.12).  High bits are zero.
  }

Use writemask to discard unwanted data.
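
The fixed-point fields returned in .LOD mode can be decoded with a sketch like the one below. It assumes the signed formats are two's complement and that the sign bit is counted within the integer bits, so S2.6 is 8 bits wide (consistent with the vvvv vvvv uuuu uuuu diagram) and S8.8 and S4.12 are 16 bits wide. The function and key names are hypothetical.

```python
def _signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def decode_lod_result(r, g, b, a):
    """Convert the raw R, G, B, A register words to Python floats."""
    return {
        "unclamped_lod": _signed(r, 16) / 256.0,            # S8.8
        "clamped_lod": (g & 0xFFFF) / 256.0,                # U8.8
        "major_u": _signed(b, 8) / 64.0,                    # S2.6, bits [7:0]
        "major_v": _signed(b >> 8, 8) / 64.0,               # S2.6, bits [15:8]
        "log2_minor_over_major": _signed(a, 16) / 4096.0,   # S4.12
    }
```

Under these assumptions, a raw R value of 0x0180 decodes to an unclamped LOD of 1.5, and 0xFF80 decodes to -0.5.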

TMML : Texture MipMap Level

Format

Description

Additional Information:

Examples: