SPA 5.0:
{@{!}Pg}
TLD4.comp{.B}{.toff}{.DC}{.NDV}{.NODEP}{.phase}
{Ps,} Rd, Ra{, Rb}, #tsPtrIdxU13 , #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
TLD4.comp{.B}{.toff}{.DC}{.NDV}{.NODEP}{.phase}
{Ps,} Rd, Ra{, Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
.comp: TLD4 returns only a single component of the texture. This field selects which component of a multi-component texture is returned for this texture fetch. The component selected is the real, post-swizzle component (for example, TLD4.R returns values from the red component, just like a TEX.R instruction). If the specified component is not present, zeroes are returned.
.R - Select the Red component of the texture
.G - Select the Green component of the texture
.B - Select the Blue component of the texture
.A - Select the Alpha component of the texture
.B: Bindless mode, where the texture header pointer and sampler pointer are packed into a 32-bit register as:
samplerPtr[31:20] | headerPtr[19:0]
Data is sent via register Rb.
.toff: Programmable texel offset.
< NONE >
.AOFFI - _aoffimmi(u,v,w) [DX10] // 1 register required
((w & 0x3f)<<16) | ((v & 0x3f)<<8) | (u & 0x3f)
Each 6b field is a 2's complement integer from -32 to +31. AOFFI is not supported with CubeMap textures.
.PTP - Offset sampling // offsets, 2 registers required
dt1[29:24] | ds1[21:16] | dt0[13:08] | ds0[05:00]
dt3[29:24] | ds3[21:16] | dt2[13:08] | ds2[05:00]
Each 6b field is a 2's complement integer from -32 to +31. PTP is not supported with CubeMap (CUBE/ARRAY_CUBE) textures.
.DC: Depth comparison filter mode using a reference value.
RefVal // 1 register required
Depth comparison filter is not supported by 3D textures.
.NDV: Forces the texture instruction to be considered non-divergent even though the quad may be divergent. This does not promote inactive threads; it only forces the quad to be treated as non-divergent even though some threads might be inactive. To activate disabled threads in a quad, SAM must be used.
Only the active mask and shader type are used to determine if a quad of threads is divergent.
The use of .NDV with TLD4 is deprecated and will be removed in future versions.
.NODEP: Indicates that there are no subsequent quad derivatives to be calculated.
Threads that have been "killed" will be disabled to stop unnecessary texture fetches.
.phase: Allows control of the current warp's texture hash, which is used for scheduling.
< NONE >
.T - postfix increment of the 3 bit texture component of the hash.
.P - postfix increment of the 5 bit phase component, and zero out the 3 bit texture component of the hash.
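The postfix behaviour of .T and .P can be pictured with a small software model. This is only a sketch: the field widths (5-bit phase, 3-bit texture) come from the text above, while the class name `TextureHash`, the method `issue`, and the wraparound on overflow are assumptions made for illustration and are not stated by this document.

```python
class TextureHash:
    """Hypothetical model of the per-warp texture hash (not a hardware spec)."""

    def __init__(self, phase=0, tex=0):
        self.phase = phase & 0x1F  # 5-bit phase component
        self.tex = tex & 0x7       # 3-bit texture component

    def issue(self, mode=None):
        """Return the (phase, tex) pair seen by this instruction, then
        apply the postfix update selected by .T or .P (None means < NONE >)."""
        used = (self.phase, self.tex)
        if mode == "T":
            # .T: postfix increment of the 3-bit texture component.
            # Wrapping at 8 is an assumption.
            self.tex = (self.tex + 1) & 0x7
        elif mode == "P":
            # .P: postfix increment of the 5-bit phase component and
            # zero the texture component. Wrapping at 32 is an assumption.
            self.phase = (self.phase + 1) & 0x1F
            self.tex = 0
        elif mode is not None:
            raise ValueError("mode must be None, 'T', or 'P'")
        return used
```

Because the update is postfix, the instruction carrying .T or .P still uses the old hash value; only later instructions of the warp see the new one.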
Ps:
Predicate returning sparse tile status. Indicates that the surface access is happening to a page marked as sparse (valid, not mapped).
Immediate Inputs:
#tsPtrIdxU13:
This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache. The bank from
which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form
"samplerPtr[31:20] | headerPtr[19:0]".
Note: Ignored if .B option is used.
In SetSamplerBinding.ViaHeaderBinding (i.e. OGL) mode, the headerPtr would be used as the samplerPtr as well.
Any header pointer greater than the value specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid"
texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).
Any sampler pointer greater than the value specified in SetSamplerHeaderPoolC.MaximumIndex will be regarded as an
"invalid" texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).
#tidU08, #smpU05:
This is the Fermi-compatible specification of tsPtrIdxU13, which allows legacy apps/traces to run.
SASS transforms these into tsPtrIdxU13 as follows:
#tsPtrIdxU13 = {#smpU05, #tidU08}
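As a rough illustration of the two layouts above, the following sketch packs and unpacks the 32-bit "samplerPtr[31:20] | headerPtr[19:0]" entry and forms the 13-bit index from the legacy pair. The function names are hypothetical, and reading `{#smpU05, #tidU08}` as a bit concatenation with #smpU05 in the upper 5 bits is an assumption based on common hardware notation, not something this document states.

```python
def pack_ts_ptr(sampler_ptr, header_ptr):
    """Build the 32-bit entry samplerPtr[31:20] | headerPtr[19:0]."""
    if not 0 <= sampler_ptr < (1 << 12):
        raise ValueError("samplerPtr must fit in 12 bits")
    if not 0 <= header_ptr < (1 << 20):
        raise ValueError("headerPtr must fit in 20 bits")
    return (sampler_ptr << 20) | header_ptr


def unpack_ts_ptr(word):
    """Split a 32-bit entry back into (samplerPtr, headerPtr)."""
    return (word >> 20) & 0xFFF, word & 0xFFFFF


def legacy_to_ts_ptr_idx(tid, smp):
    """Form #tsPtrIdxU13 from #tidU08 and #smpU05.

    Assumption: {#smpU05, #tidU08} is a concatenation with smp in
    bits [12:8] and tid in bits [7:0].
    """
    if not 0 <= tid < (1 << 8):
        raise ValueError("tid must fit in 8 bits")
    if not 0 <= smp < (1 << 5):
        raise ValueError("smp must fit in 5 bits")
    return (smp << 8) | tid
```

The same packing applies to the value placed in Rb when the .B (bindless) option is used, since the text gives the identical bit layout for that register.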
#paramA: source coordinate description.
parameter | Coordinate Registers implied |
---|---|
RESERVED | // for 1D |
2D | s,t |
RESERVED | // for 3D |
CUBE | s,t,r |
RESERVED | // for ARRAY_1D |
ARRAY_2D | a,s,t |
RESERVED | // for ARRAY_3D |
ARRAY_CUBE | a,s,t,r |
s,t,r are fp32,
a is U16 integer
If the source coordinate description does not match the texture type of the texture header, zeroes will be returned. The array specifiers can be freely used with non-array textures (and vice versa), provided the number of coordinates (2D, CUBE) matches.
Note: the texture-header type "TWO_D_NO_MIPMAP" is not supported (the hardware will treat TLD4 as TEX in this case).
#wmskU04: destination write mask (decimated contiguous writes).
Allows write masking of the returned data via a bit enable for each of samples 0, 1, 2, and 3. A four-vector is always returned from TEX. #wmskU04 defaults to 0xf.
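One plausible reading of "decimated contiguous writes" is sketched below: masked-off samples are dropped and the remaining ones are written to consecutive registers starting at Rd. That packing rule, the mapping of mask bit i to sample i, and the helper name `wmsk_register_map` are all assumptions made for illustration; the text above only says there is one enable bit per sample.

```python
def wmsk_register_map(wmsk=0xF, rd=0):
    """Return {register_number: sample_index} for a 4-bit write mask.

    Assumptions: bit i of wmsk enables sample i, and enabled samples are
    packed into consecutive registers starting at register number rd.
    """
    if not 0 <= wmsk <= 0xF:
        raise ValueError("wmsk must fit in 4 bits")
    mapping = {}
    next_reg = rd
    for sample in range(4):
        if wmsk & (1 << sample):
            mapping[next_reg] = sample  # this register receives this sample
            next_reg += 1               # disabled samples consume no register
    return mapping
```

Under these assumptions, a mask of 0xa with Rd = R8 would place sample 1 in R8 and sample 3 in R9, while the default 0xf fills R8 through R11.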
Texture fetch of the 4-texel bilerp footprint (but no filter) using a texture coordinate vector.
Only the bilerp footprint is fetched, and it is taken from the finest mip-map level (level 0). The texture must be 2D/CUBE/ARRAY_2D/ARRAY_CUBE. The four texel samples are placed into the Rd vector in counterclockwise order starting at the lower left.
The assignment of parameters to Ra/Rb is as follows:
Reg | parameter | format |
---|---|---|
Ra+0 | array[15:0] | u32 |
Ra+1 | s | fp32 |
Ra+2 | t | fp32 |
Ra+3 | r | fp32 |
Rb+0 | SamplerPtr[31:20] \| HeaderPtr[19:0] | u32 |
Rb+1 | toff1(.AOFFI or .PTP) | fp32 |
Rb+2 | toff2(.PTP) | u32 |
Rb+3 | Depth Compare Value(.DC) | fp32 |
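The toff1/toff2 words in the table carry the packed 6-bit offsets described under .toff. The sketch below builds them from signed offsets using the bit positions given there: the AOFFI expression ((w & 0x3f)<<16) | ((v & 0x3f)<<8) | (u & 0x3f), and the PTP layout dtN/dsN at [29:24], [21:16], [13:08], [05:00]. The function names are hypothetical, and the range check is added here for safety rather than taken from the hardware description.

```python
def _field6(value):
    """Encode a signed offset in [-32, 31] as a 6-bit two's complement field."""
    if not -32 <= value <= 31:
        raise ValueError("offset must be in the range -32..31")
    return value & 0x3F


def pack_aoffi(u, v, w=0):
    """Pack one AOFFI register: ((w & 0x3f)<<16) | ((v & 0x3f)<<8) | (u & 0x3f)."""
    return (_field6(w) << 16) | (_field6(v) << 8) | _field6(u)


def pack_ptp(offsets):
    """Pack four (ds, dt) pairs into the two PTP registers (toff1, toff2).

    toff1 = dt1[29:24] | ds1[21:16] | dt0[13:08] | ds0[05:00]
    toff2 = dt3[29:24] | ds3[21:16] | dt2[13:08] | ds2[05:00]
    """
    if len(offsets) != 4:
        raise ValueError("PTP needs exactly four (ds, dt) pairs")
    words = []
    for (ds_lo, dt_lo), (ds_hi, dt_hi) in (offsets[0:2], offsets[2:4]):
        words.append(
            (_field6(dt_hi) << 24)
            | (_field6(ds_hi) << 16)
            | (_field6(dt_lo) << 8)
            | _field6(ds_lo)
        )
    return tuple(words)
```

For instance, `pack_aoffi(1, -1)` encodes u = +1 in bits [5:0] and v = -1 as 0x3f in bits [13:8].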
The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,
Some input texture values will be sanitized before being used.
Corresponds to these DX ops:
load4 = TLD4 // load bilerp footprint
gather4 = TLD4, TLD4.AOFFI
gather4c = TLD4.DC, TLD4.AOFFI.DC
gather4po = TLD4.AOFFI
gather4po_c = TLD4.AOFFI.DC
TLD4.R R8,R0,5,2D,0xf;