TMML : Texture MipMap Level

Format

SPA 5.0:
{@{!}Pg} TMML{.B}.LOD{.NDV}{.NODEP}{.phase} Rd, Ra, {Rb}, #tsPtrIdxU13, #paramA{, #wmskU04} {&req_6} {&rdN} {&wrN} {?sched} ;
{@{!}Pg} TMML{.B}.LOD{.NDV}{.NODEP}{.phase} Rd, Ra, {Rb}, #tidU08, #smpU05, #paramA{, #wmskU04} {&req_6} {&rdN} {&wrN} {?sched} ;

.B: Bindless mode, where the texture header pointer and sampler pointer are packed into a 32-bit register as:
samplerPtr[31:20] | headerPtr[19:0]
The packed value is sent via register Rb.

.LOD: Return level-of-detail information.

.NDV: Forces the TEX to be considered non-divergent even though the quad may be divergent. This does not promote inactive threads; it only forces the quad to be treated as non-divergent even though some threads might be inactive. To activate disabled threads in a quad, SAM must be used.
Only the active mask and shader type are used to determine if a quad of threads is divergent.

.NODEP: Indicates that there are no subsequent quad derivatives to be calculated.
Threads that have been "killed" will be disabled to stop unnecessary texture fetches.

.phase: Allows control of the current warp's texture hash, which is used for scheduling.
< NONE >
.T - postfix increment of the 3 bit texture component of the hash.
.P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash.



Immediate Inputs:

#tsPtrIdxU13:
This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache. The bank
from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form
"samplerPtr[31:20] | headerPtr[19:0]".
Note: Ignored if the .B option is used.
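The packed pointer word can be built and split with plain shifts and masks. The sketch below is a minimal Python illustration that assumes only the bit ranges stated above (a 12-bit samplerPtr in bits 31:20 and a 20-bit headerPtr in bits 19:0); `pack_ts_ptr` and `unpack_ts_ptr` are hypothetical helper names, not part of any NVIDIA tool.

```python
# Sketch: build/split the 32-bit "samplerPtr[31:20] | headerPtr[19:0]" word.
# pack_ts_ptr / unpack_ts_ptr are hypothetical helpers for illustration only.

SAMPLER_BITS = 12  # bits 31..20
HEADER_BITS = 20   # bits 19..0


def pack_ts_ptr(sampler_ptr: int, header_ptr: int) -> int:
    """Place samplerPtr in bits 31:20 and headerPtr in bits 19:0."""
    if not 0 <= sampler_ptr < (1 << SAMPLER_BITS):
        raise ValueError("samplerPtr does not fit in 12 bits")
    if not 0 <= header_ptr < (1 << HEADER_BITS):
        raise ValueError("headerPtr does not fit in 20 bits")
    return (sampler_ptr << HEADER_BITS) | header_ptr


def unpack_ts_ptr(word: int) -> tuple[int, int]:
    """Split a packed word back into (samplerPtr, headerPtr)."""
    word &= 0xFFFFFFFF
    return word >> HEADER_BITS, word & ((1 << HEADER_BITS) - 1)


packed = pack_ts_ptr(sampler_ptr=0x3, header_ptr=0x12345)
print(hex(packed))            # 0x312345
print(unpack_ts_ptr(packed))  # (3, 74565)
```

The same word layout is used for the constant-bank entry and for the bindless value passed in Rb.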

#tidU08, #smpU05:
This is the "almost" Fermi-compatible specification of tsPtrIdxU13, which allows legacy apps/traces to run;
sass transforms these operands into tsPtrIdxU13 as follows:
#tsPtrIdxU13 = {#smpU05, #tidU08}
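The transform is a bit concatenation: 5 sampler bits plus 8 texture bits give the 13-bit index. The sketch below assumes `{a, b}` is Verilog-style concatenation with `a` in the high bits (so smp lands in bits 12:8 and tid in bits 7:0); `legacy_to_ts_ptr_idx` is a hypothetical name used only for illustration.

```python
# Sketch of #tsPtrIdxU13 = {#smpU05, #tidU08}.
# Assumption: {hi, lo} concatenation, so smp occupies bits 12:8 and tid bits 7:0.

def legacy_to_ts_ptr_idx(tid: int, smp: int) -> int:
    if not 0 <= tid < (1 << 8):
        raise ValueError("#tidU08 must fit in 8 bits")
    if not 0 <= smp < (1 << 5):
        raise ValueError("#smpU05 must fit in 5 bits")
    return (smp << 8) | tid


print(legacy_to_ts_ptr_idx(tid=6, smp=0))          # 6
print(hex(legacy_to_ts_ptr_idx(tid=0x10, smp=2)))  # 0x210
```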

#paramA: source coordinate description.
Valid paramA specifiers for TMML:
parameter      Coordinate registers implied
1D             s
2D             s,t
3D             s,t,r
CUBE           s,t,r
ARRAY_1D       a,s
ARRAY_2D       a,s,t
RESERVED       // for ARRAY_3D
ARRAY_CUBE     a,s,t,r
s,t,r are fp32; a is a U16 integer.
If the source coordinate description does not match the texture type of the texture header,
zeroes will be returned. The array specifiers can be freely used with non-array textures
(and the opposite holds as well), provided the number of coordinates (1D,2D,3D,CUBE) matches.

#wmskU04: destination write mask (decimated contiguous writes).
Allows write-masking of the returned data via a bit enable
for each of R,G,B,A. A four-vector is always returned from TEX.
#wmskU04 defaults to 0xf.
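One reading of "decimated contiguous writes" is that disabled components are dropped and the enabled ones are written to consecutive registers starting at Rd. The sketch below illustrates that reading under two assumptions not spelled out above: bit 0 enables R, bit 1 G, bit 2 B and bit 3 A (consistent with 0x3 selecting R,G in the example), and the survivors are packed without gaps. `masked_writes` is a hypothetical helper.

```python
# Sketch of the assumed "decimated contiguous writes" behaviour of #wmskU04.
# Assumptions: bit0=R, bit1=G, bit2=B, bit3=A; enabled components go to Rd, Rd+1, ...

COMPONENTS = "RGBA"


def masked_writes(rd: int, wmsk: int = 0xF) -> dict[str, str]:
    """Map each enabled component to the destination register it would land in."""
    if not 0 <= wmsk <= 0xF:
        raise ValueError("#wmskU04 is a 4-bit mask")
    out = {}
    reg = rd
    for bit, comp in enumerate(COMPONENTS):
        if wmsk & (1 << bit):
            out[comp] = f"R{reg}"
            reg += 1  # skipped components do not consume a register
    return out


print(masked_writes(2, 0x3))  # {'R': 'R2', 'G': 'R3'}
print(masked_writes(4, 0xA))  # {'G': 'R4', 'A': 'R5'}
```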

Description

Texture fetch of mip-map level-of-detail (LOD) or axis-length information, instead of RGBA, using a texture coordinate vector/parameters.

The parameters are arranged in Ra/Rb registers as follows:

    Texture parameter packing in Ra and Rb
    Reg     Parameter                              Format
    Ra+0    array[15:0]                            u32
    Ra+1    s                                      fp32
    Ra+2    t                                      fp32
    Ra+3    r                                      fp32
    Rb+0    SamplerPtr[31:20] | HeaderPtr[19:0]    u32

In the table above, "+0/1/2/3" represents the order of packing parameters in Ra/Rb. If a parameter is not specified, then the rest are compacted upwards within the same Ra or Rb register.
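As an illustration of the compaction rule, the sketch below lists what each Ra+i / Rb+i slot holds for a given #paramA. It assumes only the paramA table and the packing table above (RESERVED is omitted, and Rb carries the packed pointer word only in .B mode); `COORDS` and `pack_params` are hypothetical names.

```python
# Sketch: slot contents of Ra/Rb for each #paramA, with missing parameters compacted upwards.
# COORDS / pack_params are illustrative names; RESERVED (ARRAY_3D) is left out.

COORDS = {
    "1D": ["s"],
    "2D": ["s", "t"],
    "3D": ["s", "t", "r"],
    "CUBE": ["s", "t", "r"],
    "ARRAY_1D": ["a", "s"],
    "ARRAY_2D": ["a", "s", "t"],
    "ARRAY_CUBE": ["a", "s", "t", "r"],
}

BINDLESS_WORD = "SamplerPtr[31:20] | HeaderPtr[19:0]"


def pack_params(param_a: str, bindless: bool = False):
    """Return (ra_slots, rb_slots); index i of each list is the content of Ra+i / Rb+i."""
    ra = ["array[15:0]" if c == "a" else c for c in COORDS[param_a]]
    rb = [BINDLESS_WORD] if bindless else []
    return ra, rb


print(pack_params("2D"))              # (['s', 't'], [])
print(pack_params("ARRAY_1D", True))  # (['array[15:0]', 's'], ['SamplerPtr[31:20] | HeaderPtr[19:0]'])
```

For 2D without .B, s and t sit in Ra+0 and Ra+1 because the absent array slot is compacted away.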

The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,

  1. Rd should be aligned to the number of valid components being returned (as specified by wmask)
  2. Ra/Rb should always be aligned to
    1. 1 (scalar register) if the scalar count for that register (Ra or Rb) is 1
    2. 2 (vec2 register) if the scalar count for that register (Ra or Rb) is 2
    3. 4 (vec4 register) if the scalar count for that register (Ra or Rb) is 3 or 4
  3. Rb should be specified as RZ if no parameters need to be packed in Rb. (However, no error is generated if a non-RZ register is specified.)
  4. Ra/Rb must not be specified as RZ if any parameters need to be packed in Ra/Rb.
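Rules 2 to 4 for the source registers can be expressed as a small checker. The sketch below covers only Ra/Rb (the Rd rule is left out); register operands are given as integer indices (R6 is 6) and `None` stands in for RZ, which is a modelling choice, not the hardware encoding. `required_alignment` and `check_source` are hypothetical names.

```python
# Sketch of the Ra/Rb alignment rules (rules 2-4 above).
# Registers are integer indices (R6 -> 6); RZ is modelled as None.

RZ = None


def required_alignment(scalar_count: int) -> int:
    """Rule 2: 1 scalar -> 1, 2 scalars -> 2 (vec2), 3 or 4 scalars -> 4 (vec4)."""
    if scalar_count == 1:
        return 1
    if scalar_count == 2:
        return 2
    if scalar_count in (3, 4):
        return 4
    raise ValueError("Ra/Rb carry between 1 and 4 scalars")


def check_source(name: str, reg, scalar_count: int) -> list[str]:
    """Return the rule violations for one source register (Ra or Rb)."""
    problems = []
    if scalar_count == 0:
        if reg is not RZ:
            # Rule 3: should be RZ, although no error is generated.
            problems.append(f"{name} should be RZ (no error is generated)")
        return problems
    if reg is RZ:
        # Rule 4: RZ is not allowed when parameters are packed in this register.
        problems.append(f"{name} must not be RZ when it carries parameters")
    elif reg % required_alignment(scalar_count) != 0:
        problems.append(f"{name}=R{reg} is not aligned to {required_alignment(scalar_count)}")
    return problems


# 2D lookup with Ra=R6 holding (s, t) and Rb unused (RZ).
print(check_source("Ra", 6, 2) + check_source("Rb", RZ, 0))  # []
print(check_source("Ra", 5, 3))  # ['Ra=R5 is not aligned to 4']
```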

Some input texture values will be sanitized before being used.

Returned data is a 4-vector of 32-bit values (fixed-point fields in the low bits), arranged as:

  if(.mode == .LOD) { 
R: non-clamped LOD (S8.8), ignores clamping resulting from sampler or texture header. High bits are zero.
G: clamped LOD (U8.8), the actual LOD that would have been used. High bits are zero.
B: {major_unit_vector.v (S2.6), major_unit_vector.u (S2.6)}, tightly packed in the register's low 16 bits. High bits are zero.
[0000 0000 0000 0000 vvvv vvvv uuuu uuuu]
A: log2(minor_length/major_length) (S4.12). High bits are zero.
}
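The fixed-point fields above can be converted to floats by masking, sign-extending and scaling. The sketch below assumes that "Sm.n" denotes an m+n bit two's-complement field with the sign counted in the m integer bits, and "Um.n" the unsigned equivalent; this reading matches the 8-bit u/v layout shown for B but is an interpretation, not a statement from this page. `fixed_to_float` and `decode_tmml_lod` are hypothetical helpers.

```python
# Sketch: decode raw TMML.LOD result registers into floats.
# Assumption: Sm.n is an (m+n)-bit two's-complement field (sign inside m), Um.n is unsigned.

def fixed_to_float(raw: int, int_bits: int, frac_bits: int, signed: bool) -> float:
    width = int_bits + frac_bits
    value = raw & ((1 << width) - 1)  # field sits in the low bits; high bits are zero
    if signed and value & (1 << (width - 1)):
        value -= 1 << width           # sign-extend from the field's top bit
    return value / (1 << frac_bits)


def decode_tmml_lod(r: int, g: int, b: int, a: int) -> dict[str, float]:
    return {
        "lod_unclamped": fixed_to_float(r, 8, 8, signed=True),           # R: S8.8
        "lod_clamped": fixed_to_float(g, 8, 8, signed=False),            # G: U8.8
        "major_u": fixed_to_float(b, 2, 6, signed=True),                 # B[7:0]: S2.6
        "major_v": fixed_to_float(b >> 8, 2, 6, signed=True),            # B[15:8]: S2.6
        "log2_minor_over_major": fixed_to_float(a, 4, 12, signed=True),  # A: S4.12
    }


print(decode_tmml_lod(0xFF80, 0x0000, 0x0040, 0xF000))
# {'lod_unclamped': -0.5, 'lod_clamped': 0.0, 'major_u': 1.0, 'major_v': 0.0, 'log2_minor_over_major': -1.0}
```

In this made-up input, the unclamped LOD of -0.5 is reported as 0.0 after clamping, and an A value of -1.0 means the minor axis is half the length of the major axis.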

Use the write mask (#wmskU04) to discard unwanted components.

Additional Information:

Corresponds to these DX ops:

   LOD       =  TMML.LOD  //R,G

The following operation is supported, but was cut from DX11:

   TMML.LOD  //B,A

Examples:

TMML.LOD       R2, R6, 6, 2D, 0x3;   // R,G -> R2,R3; s,t in R6,R7; tsPtrIdxU13 = 6
