SPA 5.0:
{@{!}Pg}
TMML{.B}.LOD{.NDV}{.NODEP}{.phase}
Rd, Ra, {Rb}, #tsPtrIdxU13, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
.B: Bindless mode, where the texture header pointer and sampler pointer are packed into a 32-bit register as: samplerPtr[31:20] | headerPtr[19:0]. Data is sent via register Rb.
.LOD: Level-of-detail information.
.NDV: Forces the TEX to be considered non-divergent even though the quad may be divergent. This will not promote inactive threads; it only forces the quad to be treated as non-divergent despite the fact that some threads might be inactive. To activate disabled threads in a quad, SAM must be used.
{@{!}Pg}
TMML{.B}.LOD{.NDV}{.NODEP}{.phase}
Rd, Ra, {Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
Only the active mask and shader type are used to determine if a quad of threads is divergent.
.NODEP: Indicates that there are no subsequent quad derivatives to be calculated.
Threads that have been "killed" will be disabled to stop unnecessary texture fetches.
.phase: Allows control over the current warp's texture hash, used for scheduling.
< NONE >
.T - postfix increment of the 3 bit texture component of the hash.
.P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash.
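The effect of the .phase options on the hash can be sketched as below. This is only an illustrative model: the hash is represented as a hypothetical (phase, texture) pair, wrap-around on overflow of the 5-bit and 3-bit fields is an assumption, and "postfix" is read as "the current instruction uses the old value, then the update is applied".

```python
from typing import Tuple

PHASE_BITS = 5  # phase component of the hash
TEX_BITS = 3    # texture component of the hash


def apply_phase(phase: int, tex: int, option: str) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return (hash used by this instruction, hash after the instruction).

    `option` is "NONE", "T" or "P". Wrap-around on overflow is assumed,
    not stated by the specification.
    """
    used = (phase, tex)
    if option == "T":
        # .T: increment only the 3-bit texture component.
        tex = (tex + 1) & ((1 << TEX_BITS) - 1)
    elif option == "P":
        # .P: increment the 5-bit phase component and zero the texture component.
        phase = (phase + 1) & ((1 << PHASE_BITS) - 1)
        tex = 0
    elif option != "NONE":
        raise ValueError("unknown .phase option: " + option)
    return used, (phase, tex)
```

For example, apply_phase(4, 6, "P") reports (4, 6) as the hash seen by the current instruction and (5, 0) as the hash left for the next one.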
Immediate Inputs:
#tsPtrIdxU13:
This immediate index (word address) is used to fetch the packed header+sampler pointer entry from constant cache. The bank
from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form
"samplerPtr[31:20] | headerPtr[19:0]".
Note: Ignored if .B option is used.
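The packed entry layout "samplerPtr[31:20] | headerPtr[19:0]" can be illustrated with a small pack/unpack sketch. The function names are hypothetical and exist only for this example; the same layout applies to the value carried in Rb in .B mode.

```python
from typing import Tuple

SAMPLER_BITS = 12  # samplerPtr occupies bits [31:20]
HEADER_BITS = 20   # headerPtr occupies bits [19:0]


def pack_ts_entry(sampler_ptr: int, header_ptr: int) -> int:
    """Build the 32-bit word samplerPtr[31:20] | headerPtr[19:0]."""
    if not 0 <= sampler_ptr < (1 << SAMPLER_BITS):
        raise ValueError("samplerPtr does not fit in 12 bits")
    if not 0 <= header_ptr < (1 << HEADER_BITS):
        raise ValueError("headerPtr does not fit in 20 bits")
    return (sampler_ptr << HEADER_BITS) | header_ptr


def unpack_ts_entry(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into (samplerPtr, headerPtr)."""
    word &= 0xFFFFFFFF
    return word >> HEADER_BITS, word & ((1 << HEADER_BITS) - 1)
```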
#tidU08, #smpU05:
This is the "almost" Fermi-compatible specification of tsPtrIdxU13, which allows legacy apps/traces to run.
sass transforms these into tsPtrIdxU13 as follows:
#tsPtrIdxU13 = {#smpU05, #tidU08}
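A minimal sketch of this transformation follows, assuming the brace notation is a most-significant-first bit concatenation (so #smpU05 forms the upper 5 bits and #tidU08 the lower 8 bits of the 13-bit index). The function name is hypothetical.

```python
def legacy_to_ts_ptr_idx(tid: int, smp: int) -> int:
    """Combine #tidU08 and #smpU05 into a 13-bit #tsPtrIdxU13.

    Assumes {#smpU05, #tidU08} means smp in bits [12:8], tid in bits [7:0].
    """
    if not 0 <= tid < (1 << 8):
        raise ValueError("#tidU08 must fit in 8 bits")
    if not 0 <= smp < (1 << 5):
        raise ValueError("#smpU05 must fit in 5 bits")
    return (smp << 8) | tid
```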
#paramA: source coordinate description.
parameter | Coordinate Registers implied |
---|---|
1D | s |
2D | s,t |
3D | s,t,r |
CUBE | s,t,r |
ARRAY_1D | a,s |
ARRAY_2D | a,s,t |
RESERVED | // for ARRAY_3D |
ARRAY_CUBE | a,s,t,r |
s,t,r are fp32,
a is U16 integer
If the source coordinate description does not match the texture type of the texture header,
zeroes will be returned. The array specifiers can be freely used with non-array textures
(and the opposite holds as well), provided the number of coordinates (1D,2D,3D,CUBE) matches.
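The matching rule above can be sketched as a lookup plus a comparison that ignores the ARRAY_ prefix. This is an interpretation, not the hardware check: it treats 1D, 2D, 3D and CUBE as four distinct classes (so 3D does not match CUBE even though both imply s,t,r), which is an assumption, and the function names are hypothetical.

```python
# Coordinate registers implied by each #paramA value (from the table above).
PARAM_COORDS = {
    "1D": ("s",),
    "2D": ("s", "t"),
    "3D": ("s", "t", "r"),
    "CUBE": ("s", "t", "r"),
    "ARRAY_1D": ("a", "s"),
    "ARRAY_2D": ("a", "s", "t"),
    "ARRAY_CUBE": ("a", "s", "t", "r"),
}


def base_class(name: str) -> str:
    """Strip the ARRAY_ prefix, leaving 1D, 2D, 3D or CUBE."""
    return name[len("ARRAY_"):] if name.startswith("ARRAY_") else name


def description_matches(param_a: str, header_type: str) -> bool:
    """True if #paramA is usable with the texture header type.

    Array and non-array specifiers are interchangeable; only the base
    class has to agree. On a mismatch the instruction returns zeroes.
    """
    if param_a not in PARAM_COORDS or header_type not in PARAM_COORDS:
        raise ValueError("unknown or reserved coordinate description")
    return base_class(param_a) == base_class(header_type)
```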
#wmskU04: destination write mask (decimated contiguous writes)
Allows for write masking the returning data writes via a bit enable
for each of R,G,B,A. A four-vector is always returned from TEX.
#wmskU04 defaults to 0xf.
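One reading of "decimated contiguous writes" is that only the enabled components are written, packed into consecutive registers starting at Rd. The sketch below illustrates that reading; the bit assignment (bit 0 = R through bit 3 = A) and the function name are assumptions made for the example.

```python
from typing import Dict

COMPONENTS = ("R", "G", "B", "A")  # assumed order: bit 0 = R ... bit 3 = A


def dest_registers(rd: int, wmsk: int = 0xF) -> Dict[str, int]:
    """Map each enabled component to the register number it lands in.

    Disabled components are skipped, so the enabled ones occupy
    consecutive registers beginning at Rd.
    """
    if not 0 <= wmsk <= 0xF:
        raise ValueError("#wmskU04 must fit in 4 bits")
    mapping = {}
    next_reg = rd
    for bit, name in enumerate(COMPONENTS):
        if wmsk & (1 << bit):
            mapping[name] = next_reg
            next_reg += 1
    return mapping
```

Under these assumptions, a mask of 0x3 with Rd = R2 places R in R2 and G in R3, while a mask of 0xA places G in R2 and A in R3.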
Texture fetch of mip-map level-of-detail (LOD) or axis-length information instead of RGBA using a texture coordinate vector/parameters.
The parameters are arranged in Ra/Rb registers as follows:
Reg | parameter | format |
---|---|---|
Ra+0 | array[15:0] | u32 |
Ra+1 | s | fp32 |
Ra+2 | t | fp32 |
Ra+3 | r | fp32 |
Rb+0 | SamplerPtr[31:20] \| HeaderPtr[19:0] | u32 |
The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,
Some input texture values will be sanitized before being used.
Returned data is a 4-vector of fp32 values, arranged as:
if(.mode == .LOD) {
R: non-clamped LOD (S8.8), ignores clamping resulting from sampler or texture header. High bits are zero.
G: clamped LOD (U8.8), actual LOD that would have been used. High bits are zero.
B: {major_unit_vector.v (S2.6), major_unit_vector.u (S2.6)}, tightly packed in the register's low 16 bits. High bits are zero.
[0000 0000 0000 0000 vvvv vvvv uuuu uuuu]
A: log2(minor_length/major_length) (S4.12). High bits are zero.
}
Use writemask to discard unwanted data.
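The fixed-point fields listed above can be converted to ordinary numbers as in the following sketch. It assumes each destination register is read as a raw 32-bit integer and that the signed formats (S8.8, S2.6, S4.12) are two's complement within their stated width; the function and key names are hypothetical.

```python
from typing import Dict


def _signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode_tmml_lod(r: int, g: int, b: int, a: int) -> Dict[str, float]:
    """Decode the raw R, G, B, A words returned by TMML.LOD."""
    return {
        # R: S8.8, 8 fractional bits
        "unclamped_lod": _signed(r, 16) / 256.0,
        # G: U8.8, 8 fractional bits
        "clamped_lod": (g & 0xFFFF) / 256.0,
        # B: u in bits [7:0], v in bits [15:8], each S2.6
        "major_u": _signed(b & 0xFF, 8) / 64.0,
        "major_v": _signed((b >> 8) & 0xFF, 8) / 64.0,
        # A: S4.12, 12 fractional bits
        "log2_minor_over_major": _signed(a, 16) / 4096.0,
    }
```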
Corresponds to these DX ops:
LOD = TMML.LOD //R,G
The following operation is supported, but has been cut from DX11
TMML.LOD //B,A
TMML.LOD R2, R6, 6, 2D, 0x3;