SPA 5.0:
{@{!}Pg}
SUATOM.D{.BA}.dim.op{.sz}{.clamp}
Rd, [Ra], Rb, #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SUATOM.D{.BA}.dim.op{.sz}{.clamp}
Rd, [Ra], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SUATOM.D{.BA}.dim.CAS{.sz}{.clamp}
Rd, [Ra], Rb, #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
Three more variants support tiled resources by returning a sparse status predicate (Ps). Note that the #tsPtrIdxU13 variant with a sparse predicate is not offered for .op != CAS.
{@{!}Pg}
SUATOM.D{.BA}.dim.CAS{.sz}{.clamp}
Rd, [Ra], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SUATOM.D{.BA}.dim.op{.sz}{.clamp}
{Ps,} Rd, [Ra], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
{@{!}Pg}
SUATOM.D{.BA}.dim.CAS{.sz}{.clamp}
{Ps,} Rd, [Ra], Rb, #tsPtrIdxU13
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
Modifiers:
.dim    {.1D, .1D_BUFFER, .1D_ARRAY, .2D, .2D_ARRAY, .3D}
.mode   {.D}
  .D    This mode specifies that surface data is treated as raw data of size .sz, without any format conversion. In this mode, if .BA (ByteAddress) is specified, the x-coordinate is assumed to be in bytes, aligned on a .sz boundary. Otherwise, the x-coordinate is treated as a sample coordinate and scaled by .sz in hardware.
.op     {.ADD, .MIN, .MAX, .INC, .DEC, .AND, .OR, .XOR, .EXCH}
.sz     {.U32*, .S32, .U64, .S64, .F32.FTZ.RN, .F16x2.FTZ.RN, .SD32, .SD64}
.BA     x-coordinate is specified as a byte address (in .D mode).
.clamp  {.IGN, .NEAR*, .TRAP}

Operands
------------------------------------
Ra            Coordinates.
Rb            Atom operand data. In case of .CAS, Rb contains both compare and swap values as a vec2 (for .sz 32) or vec4 (for .sz 64).
Rd            Destination data register.
#tsPtrIdxU13  This immediate index (word address) is used to fetch the packed header+sampler pointer entry from the constant cache. The bank from which it is fetched is determined by bundle state. The constant bank entry is a 32-bit structure of the form "samplerPtr[31:20] | headerPtr[19:0]". (Surface instructions ignore sampler pointers.) Any header pointer greater than the one specified in SetTexHeaderPoolC.MaximumIndex will be regarded as an "invalid" texture.
Rc            In true bindless mode, the Rc register is used to pass the "samplerPtr[31:20] | headerPtr[19:0]" value.
              Note: Ra/Rc cannot be the RZ register.
Ps            Predicate returning sparse tile status. Indicates that the surface access is happening to a page marked as sparse (not valid).
              Note: The encoding of Ps is bit inverted, i.e. 0 => PT and 7 => P0.
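The packed pointer layout and the Ps encoding note above can be illustrated with a short Python sketch. All function names here are hypothetical. The 4-byte word size follows from the 32-bit entry format, and the sketch assumes PT corresponds to predicate index 7, which is what the "0 => PT and 7 => P0" note implies.

```python
def unpack_ts_entry(entry):
    """Split a 32-bit constant bank entry into (samplerPtr, headerPtr)."""
    entry &= 0xFFFFFFFF
    header_ptr = entry & 0xFFFFF          # bits [19:0]
    sampler_ptr = (entry >> 20) & 0xFFF   # bits [31:20], ignored by surface instructions
    return sampler_ptr, header_ptr


def ts_idx_to_byte_offset(idx_u13):
    """Convert the 13-bit word index into a byte offset within the constant bank."""
    if not 0 <= idx_u13 < (1 << 13):
        raise ValueError("index does not fit in 13 bits")
    return idx_u13 * 4  # each entry is one 32-bit word


def encode_ps(pred_index):
    """Bit-invert a 3-bit predicate index (assumption: index 7 is PT)."""
    if not 0 <= pred_index <= 7:
        raise ValueError("predicate index must be in 0..7")
    return (~pred_index) & 0x7
```

For instance, an immediate index of 0x100 maps to byte offset 0x400 under this reading.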
-------------------------------------------------------------------------
Reduction Operations
.op     .sz                                          Description, M is SurfaceLoad(Ra)
-------------------------------------------------------------------------
.ADD    .U32 .S32 .SD32 .U64 .F32.FTZ.RN .F16x2.RN   Rd = M; M = M + Rb;
.MIN    .U32 .S32 .SD32 .U64 .S64 .SD64 .F16x2.RN    Rd = M; M = min(M, Rb);
.MAX    .U32 .S32 .SD32 .U64 .S64 .SD64 .F16x2.RN    Rd = M; M = max(M, Rb);
.INC    .U32                                         Rd = M; M = (M >= Rb)? 0 : (M + 1);
.DEC    .U32                                         Rd = M; M = (M == 0 || M > Rb)? Rb : M - 1;
.AND    .U32 .S32 .SD32 .U64                         Rd = M; M = M & Rb;
.OR     .U32 .S32 .SD32 .U64                         Rd = M; M = M | Rb;
.XOR    .U32 .S32 .SD32 .U64                         Rd = M; M = M ^ Rb;
.EXCH   .U32 .S32 .U64                               Rd = M; M = Rb;
.CAS    .U32 .S32 .U64                               Rd = M; if (M == Rb) M = Rb+(1or2);
-------------------------------------------------------------------------
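The reduction operations table above can be read as a small reference model. The Python sketch below covers only the .U32 case; the function name `suatom_u32` and the `swap` parameter are hypothetical. Two points are assumptions rather than statements from the table: .ADD is modeled with 32-bit wraparound, and "Rb+(1or2)" for .CAS is interpreted as the swap value held in the register(s) following the compare value.

```python
MASK32 = 0xFFFFFFFF


def suatom_u32(op, m, rb, swap=None):
    """Return (rd, new_m): the prior memory value and the updated memory value."""
    m &= MASK32
    rb &= MASK32
    if op == "ADD":
        new_m = (m + rb) & MASK32          # assumption: wraps modulo 2**32
    elif op == "MIN":
        new_m = min(m, rb)
    elif op == "MAX":
        new_m = max(m, rb)
    elif op == "INC":
        new_m = 0 if m >= rb else m + 1
    elif op == "DEC":
        new_m = rb if (m == 0 or m > rb) else m - 1
    elif op == "AND":
        new_m = m & rb
    elif op == "OR":
        new_m = m | rb
    elif op == "XOR":
        new_m = m ^ rb
    elif op == "EXCH":
        new_m = rb
    elif op == "CAS":
        if swap is None:
            raise ValueError("CAS needs a swap value")
        new_m = (swap & MASK32) if m == rb else m
    else:
        raise ValueError("unknown op: " + op)
    return m, new_m
```

In every case Rd receives the value that was in memory before the update, which is why the first element of the returned pair is always the original `m`.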
{@{!}Pg}
SUATOM.D{.BA}.dim.CAS{.sz}{.clamp}
{Ps,} Rd, [Ra], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
SUATOM performs an atomic operation on pitch or blocklinear surfaces in global memory. It returns the prior memory value at that location to register Rd.
Register Ra specifies surface coordinates. The number of coordinates depends on the surface dimension. Ra must follow the register alignment rules for the given number of coordinates.
.dim | Ra | Ra+1 | Ra+2 |
---|---|---|---|
1D | S32 | ||
1D_BUFFER | S32/U32 | ||
1D_ARRAY | S32 | U16 | |
2D | S32 | S32 | |
2D_ARRAY | S32 | S32 | U16 |
3D | S32 | S32 | S32 |
For 1D_BUFFER, the coordinate is S32 if .clamp is .NEAR. Otherwise, the coordinate is interpreted as U32. The 1D_ARRAY and 2D_ARRAY array indices are treated as U16, meaning only the 16 LSBs of the register value are used.
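The coordinate interpretation rules above can be sketched in Python as follows. The helper names are hypothetical, and the sketch models only how a raw 32-bit register value is read, not any clamping or bounds checking.

```python
def array_index_u16(reg_value):
    """Array index for 1D_ARRAY / 2D_ARRAY: only the 16 LSBs are used."""
    return reg_value & 0xFFFF


def buffer_coord(reg_value, clamp="NEAR"):
    """1D_BUFFER coordinate: S32 when .clamp is .NEAR, U32 otherwise."""
    raw = reg_value & 0xFFFFFFFF
    if clamp == "NEAR":
        # Reinterpret the 32-bit pattern as a signed two's-complement value.
        return raw - (1 << 32) if raw & 0x80000000 else raw
    return raw
```

The same register bits can therefore denote a negative coordinate under .NEAR and a large positive coordinate under .IGN or .TRAP.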
The .clamp field specifies how to clamp out of bounds addresses (too high or low).
Sc contains a pointer to texture header. The possible options for Sc are:
Size specifier for byte (.D.BA) and coordinate (.D) addressing.
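As a rough illustration of how .sz relates byte addressing (.D.BA) to coordinate addressing (.D), the Python sketch below computes the byte offset of the x-coordinate in both modes. The names `SZ_BYTES` and `x_byte_offset` are hypothetical, the byte sizes are inferred from the type names, and raising an error on a misaligned byte address is an assumption, since the text only states that the address is assumed to be aligned.

```python
# Element size in bytes for each .sz value (inferred from the type names).
SZ_BYTES = {
    "U32": 4, "S32": 4, "SD32": 4, "F32.FTZ.RN": 4, "F16x2.FTZ.RN": 4,
    "U64": 8, "S64": 8, "SD64": 8,
}


def x_byte_offset(x, sz="U32", byte_address=False):
    """Byte offset along x for .D mode, with or without .BA."""
    size = SZ_BYTES[sz]
    if byte_address:
        # .BA: x is already in bytes and must sit on a .sz boundary.
        if x % size != 0:
            raise ValueError("x is not aligned on a .sz boundary")
        return x
    # No .BA: x is a sample coordinate, scaled by the element size.
    return x * size
```

With .U64, sample coordinate 3 and byte address 24 refer to the same location along x.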
All surface operations are uncached at L1 level, regardless of .cop. Within the L1 cache, surface operations to the same coordinates as cached Texture operations will not invalidate cached data lines.
If the surface being accessed is disabled, the write will be silently dropped.
When used in a pixel shader by helper pixels or killed pixels, SUATOM is automatically turned into a NOP for that thread. This prevents unwanted writes to global memory by pixels with zero raster coverage or by pixels that have been killed by the KIL operation. These threads will not participate in any SUATOM operations.
SUATOM.D.2D.ADD.IGN R10, [R2], R4, R1;
// Coordinates in R2 and R3. samplerPtr and headerPtr in R1.
SUATOM.D.BA.1D.ADD.U64.TRAP R2, [R3], R4, 0x100;
// Surface header pointer is fetched from c[state_controlled_bank][0x400]
// Adds value found in vec2 (R4,R5).