SPA 5.0:
{@{!}Pg}
SULD.D{.BA}.dim{.cop}{.sz}{.clamp}
Rd0, [Ra], #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SULD.D{.BA}.dim{.cop}{.sz}{.clamp}
Rd0, [Ra], Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SULD.P.dim{.cop}{.rgba}{.clamp}
Rd0, [Ra], #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SULD.P.dim{.cop}{.rgba}{.clamp}
Rd0, [Ra], Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
4 more variants support tiled resources (sparse status predicate).
{@{!}Pg}
SULD.D{.BA}.dim{.cop}{.sz}{.clamp}
{Ps,} Rd0, [Ra], #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SULD.D{.BA}.dim{.cop}{.sz}{.clamp}
{Ps,} Rd0, [Ra], Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SULD.P.dim{.cop}{.rgba}{.clamp}
{Ps,} Rd0, [Ra], #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SULD.P.dim{.cop}{.rgba}{.clamp}
{Ps,} Rd0, [Ra], Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
Modifiers:
.dim {.1D, .1D_BUFFER, .1D_ARRAY, .2D, .2D_ARRAY, .3D}
.mode {.D, .P}
  .D  This mode specifies a load from the surface as raw data of size .sz, without any format conversion. In this mode, if .BA (ByteAddress) is specified, the x-coordinate is assumed to be in bytes, aligned on a .sz boundary. Otherwise, the x-coordinate is treated as a sample coordinate and scaled by .sz in hardware.
  .P  This mode specifies a formatted pixel load from the surface. The x value is a sample coordinate in the target surface. In this mode, .rgba specifies the number of components written by the load.
.cop {.CA*, .CG, .CS, .LU, .CV, .CI} // cache all*, global, streaming, last-use, volatile, inconsistent
  .CA*  cache at all levels, likely to be accessed again (default) // except L1 cache (see description)
  .CG   cache at global level (cache in L2 and below, not L1)
  .CS   maps to .CA
  .LU   maps to .CG
  .CV   cache as volatile (consider cached system memory lines stale, fetch again)
  .CI   cache as inconsistent data (expected to be used only with invariant data)
.sz {.U8, .S8, .U16, .S16, .32*, .64, .128} // used in .D mode, specifies the load size of raw data
.rgba {.R, .RG, .RGBA*} // used in .P mode, specifies a scalar, vec2 or vec4 destination register
.BA  x-coordinate is specified as a byte address (in .D mode)
.clamp {.IGN, .NEAR*, .TRAP}
Operands
------------------------------------
Ra  Coordinates. Note: Ra cannot be the RZ register.
Rd0  Destination data register.
#tsPtrIdxU13  This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache. The bank from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form "samplerPtr[31:20] | headerPtr[19:0]". (Surface instructions ignore sampler pointers.) Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid" texture.
Rc  In bindless mode, the Rc register is used to pass the "samplerPtr[31:20] | headerPtr[19:0]" entry. Note: Rc cannot be the RZ register.
Ps  Predicate returning sparse tile status. Indicates that the surface access is to a page marked as sparse (not valid). Note: The encoding of Ps is bit-inverted, i.e. 0 => PT and 7 => P0.
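The packed "samplerPtr[31:20] | headerPtr[19:0]" entry and the bit-inverted Ps encoding described under Operands can be modelled in a few lines of Python. This is a minimal sketch for illustration only, not hardware or driver code: the function names are hypothetical, `maximum_index` stands in for the value of SetTexHeaderPoolC.MaximumIndex, and the predicate numbering (P0..P6 = 0..6, PT = 7) is an assumption inferred from the "0 => PT and 7 => P0" note.

```python
HEADER_BITS = 20   # headerPtr occupies bits [19:0]
SAMPLER_BITS = 12  # samplerPtr occupies bits [31:20]

def pack_entry(sampler_ptr, header_ptr):
    """Build the 32-bit 'samplerPtr[31:20] | headerPtr[19:0]' entry."""
    if not 0 <= header_ptr < (1 << HEADER_BITS):
        raise ValueError("headerPtr does not fit in 20 bits")
    if not 0 <= sampler_ptr < (1 << SAMPLER_BITS):
        raise ValueError("samplerPtr does not fit in 12 bits")
    return (sampler_ptr << HEADER_BITS) | header_ptr

def unpack_entry(entry):
    """Split a 32-bit entry into (samplerPtr, headerPtr)."""
    entry &= 0xFFFFFFFF
    return entry >> HEADER_BITS, entry & ((1 << HEADER_BITS) - 1)

def header_is_valid(header_ptr, maximum_index):
    """A header pointer above MaximumIndex is treated as an invalid texture."""
    return header_ptr <= maximum_index

def encode_ps(pred_index):
    """Bit-invert a 3-bit predicate index (assumed: P0..P6 = 0..6, PT = 7)."""
    if not 0 <= pred_index <= 7:
        raise ValueError("predicate index must be in 0..7")
    return ~pred_index & 0x7
```

For example, `unpack_entry(0x12345678)` yields samplerPtr 0x123 (ignored by surface instructions) and headerPtr 0x45678, and `encode_ps(7)` encodes PT as 0.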
SULD loads data from a surface in global memory using pitch or blocklinear addressing.
Register Ra specifies the surface coordinates. The number of coordinates depends on the surface dimension. Ra must follow the register alignment rules for the given number of coordinates.
.dim | Ra | Ra+1 | Ra+2 |
---|---|---|---|
1D | S32 | | |
1D_BUFFER | S32/U32 | | |
1D_ARRAY | S32 | U16 | |
2D | S32 | S32 | |
2D_ARRAY | S32 | S32 | U16 |
3D | S32 | S32 | S32 |
For 1D_BUFFER, the coordinate is S32 if .clamp is .NEAR. Otherwise, the coordinate is interpreted as U32. The 1D_ARRAY and 2D_ARRAY array indices are treated as U16, meaning only the 16 LSBs of the register value are used.
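The coordinate layout above can be restated as a small Python model. This is only a sketch of the interpretation rules given in the table and the paragraph after it; `interpret_coords` and the string keys are hypothetical names, and the register values are passed as raw 32-bit integers.

```python
# Register types per dimension, starting at Ra (from the table above).
COORD_LAYOUT = {
    "1D": ("S32",),
    "1D_BUFFER": ("S32/U32",),
    "1D_ARRAY": ("S32", "U16"),
    "2D": ("S32", "S32"),
    "2D_ARRAY": ("S32", "S32", "U16"),
    "3D": ("S32", "S32", "S32"),
}

def to_s32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def interpret_coords(dim, regs, clamp="NEAR"):
    """Interpret raw register values Ra, Ra+1, ... for the given .dim."""
    layout = COORD_LAYOUT[dim]
    if len(regs) != len(layout):
        raise ValueError(f"{dim} expects {len(layout)} coordinate register(s)")
    coords = []
    for kind, raw in zip(layout, regs):
        raw &= 0xFFFFFFFF
        if kind == "U16":
            coords.append(raw & 0xFFFF)  # array index: only the 16 LSBs are used
        elif kind == "S32/U32":
            # 1D_BUFFER: signed if .clamp is .NEAR, unsigned otherwise
            coords.append(to_s32(raw) if clamp == "NEAR" else raw)
        else:
            coords.append(to_s32(raw))
    return tuple(coords)
```

For instance, `interpret_coords("2D_ARRAY", [0xFFFFFFFF, 5, 0x12345])` gives `(-1, 5, 0x2345)`, since the array index keeps only its 16 LSBs.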
The .clamp option specifies how out-of-bounds addresses (too high or too low) are clamped.
Sc contains a pointer to the texture header. The possible options for Sc are the immediate index #tsPtrIdxU13 and, in bindless mode, the register Rc.
Size specifier for byte (.D.BA) and coordinate (.D) addressing.
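The difference between byte (.D.BA) and coordinate (.D) addressing for the x value can be sketched as follows. This only models the x contribution in bytes as described for .D mode; the full address also depends on the pitch or blocklinear layout, which is not modelled here. `x_byte_offset` is a hypothetical helper, and raising an error on a misaligned byte address is an assumption made for the sketch, not documented hardware behavior.

```python
# Load size in bytes for each .sz modifier.
SZ_BYTES = {"U8": 1, "S8": 1, "U16": 2, "S16": 2, "32": 4, "64": 8, "128": 16}

def x_byte_offset(x, sz="32", byte_address=False):
    """Return the byte offset contributed by the x coordinate in .D mode."""
    size = SZ_BYTES[sz]
    if byte_address:
        # .BA: x is already in bytes and must be aligned on a .sz boundary.
        if x % size != 0:
            raise ValueError(f"x={x} is not aligned to {size} bytes")
        return x
    # Without .BA: x is a sample coordinate, scaled by .sz.
    return x * size
```

With `.64`, sample coordinate 3 maps to byte offset 24, while with `.BA` the same offset must be passed directly as x = 24.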
All surface loads are uncached at L1 level, regardless of .cop. Within the L1 cache, surface operations to the same coordinates as cached Texture operations will not invalidate cached data lines.
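The .cop aliases and the L1 rule can be summarized in a short sketch. `resolve_cop` and `cached_in_l1` are hypothetical names used for illustration; the mapping itself comes from the .cop modifier list (.CS maps to .CA, .LU maps to .CG).

```python
VALID_COPS = {"CA", "CG", "CS", "LU", "CV", "CI"}
COP_ALIASES = {"CS": "CA", "LU": "CG"}  # streaming -> cache all, last-use -> global

def resolve_cop(cop="CA"):
    """Return the cache operation a .cop modifier maps to (default .CA)."""
    if cop not in VALID_COPS:
        raise ValueError(f"unknown .cop modifier: {cop}")
    return COP_ALIASES.get(cop, cop)

def cached_in_l1(cop="CA"):
    """Surface loads are uncached at L1 regardless of .cop."""
    resolve_cop(cop)  # validate the modifier only
    return False
```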
If the surface being accessed is disabled, an SULD will silently fail and return 0(s) as a result.
Surface dimensions have the following compatibility with texture dimensions:
SULD.D.2D R2, [R4], R6;
SULD.P.3D.R.IGN R2, [R4], 0x100; // surface header pointer is fetched from c[state_controlled_bank][0x400]
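The 0x100 to 0x400 relationship in the second example follows from #tsPtrIdxU13 being a word address into a bank of 32-bit entries. A minimal sketch of that arithmetic, assuming the "U13" suffix means a 13-bit unsigned immediate (`ts_ptr_byte_offset` is a hypothetical name):

```python
ENTRY_BYTES = 4  # each constant bank entry is a 32-bit structure

def ts_ptr_byte_offset(ts_ptr_idx):
    """Convert the word-addressed #tsPtrIdxU13 to a constant bank byte offset."""
    if not 0 <= ts_ptr_idx < (1 << 13):  # assumed 13-bit unsigned range
        raise ValueError("index does not fit in 13 bits")
    return ts_ptr_idx * ENTRY_BYTES
```

Here `ts_ptr_byte_offset(0x100)` returns 0x400, matching the comment in the example.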