SPA 5.0:
{@{!}Pg}
TEX{.B}{.lod}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}
Rd, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
Two additional forms support tiled resources (sparse status predicate and LOD clamping).
{@{!}Pg}
TEX{.B}{.lod}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}
Rd, Ra{, Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
TEX{.B}{.lod}{.LC}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}
{Ps,} Rd, Ra{, Rb}, #tsPtrIdxU13, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
.B: Bindless mode, where the texture header pointer and sampler pointer are packed into a 32-bit register as:
    samplerPtr[31:20] | headerPtr[19:0]
  Data is sent via register Rb.

.lod: { .LZ, .LB, .LL, .LBA, .LLA } Level of detail (LOD) adjust mode.
    < NONE >
    .LZ  - LOD level 0 (finest)     // no register required
    .LB  - LOD bias discrete        // 1 fp32 register required
    .LL  - LOD absolute discrete    // 1 fp32 register required
    .LBA - LOD bias averaged        // 1 fp32 register required (Tesla legacy mode)
    .LLA - LOD absolute averaged    // 1 fp32 register required (Tesla legacy mode)
  The "averaged" options allow the TEX pipe to average the LODs across the quad as a performance optimization.
  LOD Level 0 actually selects the level set by textureHeader.resViewMinMapLevel.

.LC: LOD Clamp value for Sparse Textures. A 12-bit (fixed point u4.8 format) value. Packed with the ARRAY index in the same register.

.AOFFI: Programmable Texture Offset.
    _aoffimmi(u,v,w) [DX10]         // 1 register required
    ((w & 0xf)<<8) | ((v & 0xf)<<4) | (u & 0xf)
  Each 4b field is a 2's complement integer from -8 to +7.
  AOFFI is not supported with CubeMap textures.

.DC: Depth comparison filter mode using reference value.
    RefVal                          // 1 fp32 register required
  Depth Comparison filter is not supported by 3D textures.
  For TEX and TEX.LZ, the .DC option will force a depth comparison filter mode regardless of the sampler state.
  For TEX.LB, TEX.LL, TEX.LBA, TEX.LLA, if the sampler state does not enable depth comparison, the .DC option will not force a depth comparison filter mode.

.NDV: Forces the TEX to be considered non-divergent even though the quad may be divergent. This does not promote inactive threads; it only forces the quad to be treated as non-divergent even though some threads might be inactive. To activate disabled threads in a quad, SAM must be used. Only the active mask and shader type are used to determine if a quad of threads is divergent.

.NODEP: Indicates that there are no subsequent quad derivatives to be calculated. Threads that have been "killed" will be disabled to stop unnecessary texture fetches.

.phase: { .T, .P } Allows control of the current warp's texture hash, used for scheduling.
    < NONE >
    .T - postfix increment of the 3-bit texture component of the hash.
    .P - postfix increment of the 5-bit phase component, and zero out the 3-bit texture component of the hash.

#tsPtrIdxU13: This immediate index (word address) is used to fetch the packed header+sampler pointer entry from constant cache. The bank from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form "samplerPtr[31:20] | headerPtr[19:0]".
  Note: Ignored if the .B option is used.
  In SetSamplerBinding.ViaHeaderBinding (i.e. OGL) mode, the headerPtr would be used as the samplerPtr as well.
  Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid" texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).
  Any sampler pointer greater than the one specified in SetSamplerHeaderPoolC.MaximumIndex will be regarded as an "invalid" texture (i.e. equivalent to BIND_GROUP_TEXTURE_HEADER_VALID_FALSE in Fermi).

#tidU08, #smpU05: This is the Fermi-compatible specification of tsPtrIdxU13, which allows running legacy apps/traces. SASS will transform these into tsPtrIdxU13 as follows:
    #tsPtrIdxU13 = {#smpU05, #tidU08}
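The two packings described above (the 32-bit header+sampler entry and the legacy index concatenation) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the reading of {#smpU05, #tidU08} as a concatenation with #smpU05 in the upper 5 bits is an assumption based on the usual brace-concatenation convention.

```python
def pack_ts_entry(sampler_ptr: int, header_ptr: int) -> int:
    """Pack samplerPtr[31:20] | headerPtr[19:0] into one 32-bit word.

    Hypothetical helper; the field widths come from the text above.
    """
    if not 0 <= sampler_ptr < (1 << 12):
        raise ValueError("samplerPtr must fit in 12 bits")
    if not 0 <= header_ptr < (1 << 20):
        raise ValueError("headerPtr must fit in 20 bits")
    return (sampler_ptr << 20) | header_ptr


def unpack_ts_entry(word: int) -> tuple:
    """Return (samplerPtr, headerPtr) from a packed 32-bit word."""
    return (word >> 20) & 0xFFF, word & 0xFFFFF


def legacy_to_ts_ptr_idx(smp_u05: int, tid_u08: int) -> int:
    """Form #tsPtrIdxU13 = {#smpU05, #tidU08}.

    Assumption: the left-hand field of the braces occupies the high bits.
    """
    if not 0 <= smp_u05 < (1 << 5):
        raise ValueError("smpU05 must fit in 5 bits")
    if not 0 <= tid_u08 < (1 << 8):
        raise ValueError("tidU08 must fit in 8 bits")
    return (smp_u05 << 8) | tid_u08
```

For example, pack_ts_entry(0x3, 0x12345) yields 0x00312345, the value a bindless TEX.B would expect in Rb under these assumptions.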
{@{!}Pg}
TEX{.B}{.lod}{.LC}{.AOFFI}{.DC}{.NDV}{.NODEP}{.phase}
{Ps,} Rd, Ra{, Rb}, #tidU08, #smpU05, #paramA{, #wmskU04}
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
#paramA: source coordinate description.
parameter | Coordinate Registers implied |
---|---|
1D | s |
2D | s,t |
3D | s,t,r |
CUBE | s,t,r |
ARRAY_1D | a,s |
ARRAY_2D | a,s,t |
RESERVED | // for ARRAY_3D |
ARRAY_CUBE | a,s,t,r |
s,t,r are fp32, a is U16 integer
#wmskU04: destination write mask (decimated contiguous writes). Allows write masking of the returning data writes via a bit enable for each of R,G,B,A. A four-vector is always returned from TEX. #wmskU04 defaults to 0xf.
Ps: Predicate returning sparse tile status. Indicates that the surface access is happening to a page marked as sparse (valid, not mapped).
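One way to read "decimated contiguous writes" is that only the enabled components are written, packed into consecutive registers starting at Rd. The sketch below illustrates that reading. The function name is hypothetical, and both the packing rule and the bit order (bit 0 = R through bit 3 = A) are assumptions that the text above does not confirm.

```python
def dest_registers(rd: int, wmsk: int = 0xF) -> dict:
    """Map each enabled component to the register assumed to receive it.

    Assumptions (not confirmed by the spec text): bit 0 enables R, bit 1 G,
    bit 2 B, bit 3 A, and enabled components are written to consecutive
    registers starting at Rd.
    """
    if not 0 <= wmsk <= 0xF:
        raise ValueError("wmskU04 is a 4-bit mask")
    assignment = {}
    next_reg = rd
    for bit, component in enumerate("RGBA"):
        if wmsk & (1 << bit):
            assignment[component] = next_reg
            next_reg += 1
    return assignment
```

With the default mask, dest_registers(4) maps R,G,B,A to R4..R7; with a mask of 0b1010 only G and A are written, to R4 and R5.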
Texture fetch using texture coordinates/parameters stored in registers Ra,Rb. The assignment of parameters is as follows:
Reg | parameter | format |
---|---|---|
Ra+0 (.LC) | {LodClamp[27:16], array[15:0]} | {fixed point u4.8, u16} |
Ra+0 (no .LC) | array[15:0] | u32 |
Ra+1 | s | fp32 |
Ra+2 | t | fp32 |
Ra+3 | r | fp32 |
Rb+0 | {SamplerPtr, HeaderPtr} | u32 |
Rb+1 | LOD | fp32 |
Rb+2 | toff[11:0] | u32 |
Rb+3 | DC | fp32 |
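The integer-packed entries in this table (Ra+0 with .LC, and the toff offset word used by .AOFFI) can be built as in the following sketch. The function names are hypothetical. The AOFFI formula is taken from the .AOFFI description; rounding the LOD clamp to the nearest u4.8 step is an assumption, since the text only gives the format.

```python
def pack_aoffi(u: int, v: int = 0, w: int = 0) -> int:
    """Build toff[11:0] = ((w & 0xf)<<8) | ((v & 0xf)<<4) | (u & 0xf).

    Each offset is a 4-bit two's complement value in the range -8..+7.
    """
    for off in (u, v, w):
        if not -8 <= off <= 7:
            raise ValueError("each offset must be in -8..+7")
    return ((w & 0xF) << 8) | ((v & 0xF) << 4) | (u & 0xF)


def pack_ra0(array_index: int, lod_clamp=None) -> int:
    """Build the Ra+0 word: array[15:0], plus LodClamp[27:16] when .LC is used.

    Assumption: the float clamp is rounded to the nearest u4.8 step.
    """
    if not 0 <= array_index <= 0xFFFF:
        raise ValueError("array index must fit in 16 bits")
    if lod_clamp is None:          # no .LC: only the array index is present
        return array_index
    fixed = int(round(lod_clamp * 256))  # u4.8: 4 integer bits, 8 fraction bits
    if not 0 <= fixed <= 0xFFF:
        raise ValueError("LOD clamp out of u4.8 range")
    return (fixed << 16) | array_index
```

For instance, pack_aoffi(-1, 2) gives 0x02F, and pack_ra0(3, 1.5) gives 0x01800003 (clamp 1.5 encodes as 0x180 in u4.8).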
The texture parameter source registers Ra/Rb and the destination (result) register Rd have alignment restrictions based on the number of scalar registers being read/written. Specifically,
Some input texture values will be sanitized before being used; see Additional Information for more details.
TEX corresponds to these DX ops:
sample    = TEX              //
sample_d  = TXD/sw emulated  // emulate this via SAM/SWZ/TEX/RAM
sample_b  = TEX.LB           // lod bias supplied
sample_l  = TEX.LL           // lod supplied
sample_c  = TEX.DC           // depth comparison filter with reference value
sample_lz = TEX.LZ           // lod level 0 (finest)
Texture input coordinates go through a sanitization step before being used in texture calculations.
Definitions:
------------
ORM = original raster mask (pixel is active if at least one sample in pixel is covered)
PRM = promoted raster mask (quad is active if at least one sample in quad is covered)
CTM = current thread mask
IPM = instruction predicate mask
KTM = Killed Thread mask

At thread launch:
  CTM is set to PRM
  KTM is set to ~ORM
Note: Entire quads are disabled once all 4 pixels are disabled.

Operation for Tex ops:
----------------------
Texture pipe sees 5 bits of status per quad:
  activemask[4]: one bit per pixel, allows cache and filter perf optimization
  divergent[1]: quad divergence, impacts LOD calculation

NonDependent: TEX.NODEP -
  activemask[4] = ~KTM & IPM & CTM
  divergent[1] = div(CTM,PS,.NDV) // quad divergence
Dependent: TEX -
  activemask[4] = IPM & CTM // Don't consider KTM or PRM
  divergent[1] = div(CTM,PS,.NDV) // quad divergence

// Return 1 if quad is divergent, 0 if quad is not divergent
div(CTM,PS,.NDV) {
  if (.NDV) return(0);
  if (!PS) return(1);
  if (CTM == 0xf) return(0);
  return(1);
}
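The status computation above can be modeled directly, which makes it easy to check specific mask combinations. The sketch below is a minimal model, not a description of the hardware: the function names are hypothetical, masks are treated as 4-bit integers (one bit per pixel of a quad), and PS is assumed to mean "the thread is a pixel shader thread".

```python
def div(ctm: int, is_pixel_shader: bool, ndv: bool) -> int:
    """Return 1 if the quad is divergent, 0 if it is not (mirrors div() above)."""
    if ndv:
        return 0                 # .NDV forces non-divergent
    if not is_pixel_shader:
        return 1                 # assumption: PS means pixel shader thread
    if ctm == 0xF:
        return 0                 # all four threads of the quad are active
    return 1


def tex_quad_status(ctm, ipm, ktm, is_pixel_shader, ndv=False, nodep=False):
    """Return (activemask, divergent) as seen by the texture pipe for one quad."""
    if nodep:
        # TEX.NODEP: killed threads are removed from the active mask
        activemask = ~ktm & ipm & ctm & 0xF
    else:
        # TEX: KTM and PRM are not considered
        activemask = ipm & ctm & 0xF
    return activemask, div(ctm, is_pixel_shader, ndv)
```

For example, a fully active pixel quad with one killed thread (KTM = 0b0100) reports activemask 0b1011 under TEX.NODEP but 0b1111 under plain TEX, and is non-divergent in both cases.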
Only pixel threads can use the TEX pipe to calculate LOD. Other thread types are automatically divergent, which forces LOD calculation (if required) to a default value (either 0 or +Inf).
If .NDV is set, it overrides this automatic divergence (for all thread types).
TEX R0,R2,5,2D,0xf; // no need for Rb
TEX.LB R0,R2,R4,5,2D,0xf;
TEX.B.DC R4,R0,R8,5,1D,0xf;