SPA 5.0:
{@{!}Pg}
ATOM{.E}.op{.sz}
Rd, [Ra + ImmS20], Rb
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
// Atomic Operation
{@{!}Pg}
ATOM{.E}.CAS{.sz}
Rd, [Ra + ImmS20], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
// Atomic Compare and Swap
Omit register Ra to specify an unsigned absolute address:
{@{!}Pg}
ATOM{.E}.op{.sz}
Rd, [ImmU20], Rb
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
// Atomic Operation
{@{!}Pg}
ATOM{.E}.CAS{.sz}
Rd, [ImmU20], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
// Atomic Compare and Swap
.op: { .ADD, .MIN, .MAX, .INC, .DEC, .AND, .OR, .XOR, .EXCH, .SAFEADD } Operation
.E: Extended address (64 bits, requires two registers)
.sz: { .U32*, .S32, .U64, .S64, .F32.FTZ.RN, .F16x2.FTZ.RN }
Two additional syntax forms support the sparse predicate. Note: Immediate offsets are not supported for these variants:
{@{!}Pg}
ATOM{.E}.op{.sz}
Ps, Rd, [Ra], Rb
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
// Atomic Operation
{@{!}Pg}
ATOM{.E}.CAS{.sz}
Ps, Rd, [Ra], Rb, Rc
{&req_6}
{&rdN}
{&wrN}
{?sched}
;
// Atomic Compare and Swap
Ps: Predicate returning sparse tile status. Indicates that the surface access is happening to a page marked as sparse (not valid).
Note: The encoding of Ps is bit-inverted, i.e. 0 => PT and 7 => P0.
.32 is also accepted and aliases to .U32
.64 is also accepted and aliases to .U64
.128 aliases to .U128, but is illegal
------------------------------------------------------------------------------------------------------------
Atomic Operations
.op       .sz                                           Description, M is [Ra + ImmS20]
------------------------------------------------------------------------------------------------------------
.ADD      .U32 .S32 .U64 .F32.FTZ.RN .F16x2.RN .F64.RN  Rd = M; M = M + Rb;
.MIN      .U32 .S32 .U64* .S64* .F16x2.RN               Rd = M; M = min(M, Rb);
.MAX      .U32 .S32 .U64* .S64* .F16x2.RN               Rd = M; M = max(M, Rb);
.INC      .U32                                          Rd = M; M = (M >= Rb)? 0 : (M + 1);
.DEC      .U32                                          Rd = M; M = (M == 0 || M > Rb)? Rb : M - 1;
.AND      .U32 .S32 .U64*                               Rd = M; M = M & Rb;
.OR       .U32 .S32 .U64*                               Rd = M; M = M | Rb;
.XOR      .U32 .S32 .U64*                               Rd = M; M = M ^ Rb;
.EXCH     .U32 .S32 .U64                                Rd = M; M = Rb;
.SAFEADD  .U64*                                         Increment put pointer in a circular queue.
.CAS      .U32 .S32 .U64                                Rd = M; if (M == Rb) M = Rc;
------------------------------------------------------------------------------------------------------------
* SPA 3.5 (and higher)
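The wrapping counters and the compare-and-swap row of the operation table are easy to misread, so the following is a minimal Python sketch of those rows. It only computes the new value of M from the prior value; it does not model atomicity. The function names and the 32-bit wraparound on .ADD.U32 are assumptions made for illustration and are not stated in this entry.

```python
MASK32 = 0xFFFFFFFF  # assumption: .U32 results wrap modulo 2**32


def atom_add_u32(m, rb):
    """.ADD.U32: M = M + Rb (the 32-bit wraparound is assumed)."""
    return (m + rb) & MASK32


def atom_inc_u32(m, rb):
    """.INC.U32: M = (M >= Rb) ? 0 : (M + 1)."""
    return 0 if m >= rb else m + 1


def atom_dec_u32(m, rb):
    """.DEC.U32: M = (M == 0 || M > Rb) ? Rb : M - 1."""
    return rb if (m == 0 or m > rb) else m - 1


def atom_cas(m, rb, rc):
    """.CAS: if (M == Rb) M = Rc; otherwise M is unchanged."""
    return rc if m == rb else m
```

With a limit of Rb = 3, repeated .INC steps through 0, 1, 2, 3 and then wraps back to 0, while .DEC from 0 reloads the limit. In every case Rd receives the value of M from before the update.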
ATOM.op performs the atomic operation .op with register Rb on global memory at a generic thread address, and returns the prior memory value to register Rd. The generic byte address is computed as the 32-bit addition of register Ra plus the 32-bit sign-extended immediate offset ImmS20, which is then zero-extended to 40 bits. If the .E extension is specified, the generic byte address is computed as the sum of the 64-bit value (Ra,Ra+1) plus the sign-extended immediate offset ImmS32.
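The address arithmetic above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not a description of the hardware datapath: the function names are hypothetical, the 64-bit wraparound in the .E case is assumed, and the immediate field width is left as a parameter because the syntax forms show ImmS20 while the .E description names ImmS32.

```python
def sign_extend(field, bits):
    """Interpret the low `bits` bits of `field` as a two's-complement value."""
    field &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (field ^ sign) - sign


def generic_address(ra, imm_field, ra_hi=None, imm_bits=20):
    """Model of the ATOM generic byte address.

    Without .E (ra_hi is None): 32-bit add of Ra and the sign-extended
    immediate; the 32-bit sum is then zero-extended, so it always fits
    in 40 bits.
    With .E: (Ra, Ra+1) forms a 64-bit base, passed here as ra (low word)
    and ra_hi (high word). Wrapping the sum to 64 bits is an assumption.
    """
    offset = sign_extend(imm_field, imm_bits)
    if ra_hi is None:
        return (ra + offset) & 0xFFFFFFFF
    base = ((ra_hi & 0xFFFFFFFF) << 32) | (ra & 0xFFFFFFFF)
    return (base + offset) & 0xFFFFFFFFFFFFFFFF
```

Note the difference at a 32-bit boundary: with Ra = 0 and an offset of -1, the non-extended form wraps to 0xFFFFFFFF, whereas the .E form with a high word of 1 borrows from the upper register and also yields 0xFFFFFFFF, but as a true 64-bit subtraction.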
ATOM combines register Rb with global memory location [Ra + ImmS20] atomically, without intervening accesses to that memory location by other threads:
atomic {                          // Atomic operation on global memory location [Ra + ImmS20]
    .sz M = mem[Ra + ImmS20];     // Read memory location
    Rd = M;                       // Return prior memory location value to register Rd
    M = .op(M, Rb);               // Form atomic operation result value
    mem[Ra + ImmS20] = M;         // Write memory location
}
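To make the "no intervening accesses" property concrete, here is a small Python sketch that models the atomic block with a lock. The class and function names are hypothetical, and a software lock is only a stand-in for whatever mechanism the memory system actually uses.

```python
import threading


class GlobalMemoryModel:
    """Toy memory in which each atom() call is one indivisible step."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def atom(self, address, op, rb):
        with self._lock:                      # atomic {
            m = self._cells.get(address, 0)   #   read memory location
            rd = m                            #   prior value goes to Rd
            self._cells[address] = op(m, rb)  #   form result and write it back
        return rd                             # }

    def read(self, address):
        with self._lock:
            return self._cells.get(address, 0)


def hammer(memory, address, n_threads=4, n_iters=1000):
    """Run concurrent atomic adds of 1 and collect every returned Rd."""
    priors = []

    def worker():
        for _ in range(n_iters):
            priors.append(memory.atom(address, lambda m, rb: m + rb, 1))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return priors
```

Because each read-modify-write is indivisible, no update is lost and every thread observes a distinct prior value: after 4 threads perform 1000 adds each, the location holds 4000 and the returned Rd values are exactly 0 through 3999 in some order.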
ATOM.CAS performs an atomic compare-and-swap operation on global memory. It requires one or more extra registers, provided as Rc, in addition to Rb: per the operation table, Rb holds the compare value and Rc the value stored on a match.
{Rb,Rc} are expected to be consecutive registers, naturally aligned based on .sz. Specifically, in ATOM.CAS.32, Rb must be R2n+0 (an even register, and not RZ), and Rc must be Rb+1 or RZ. Similarly, for ATOM.CAS.64, Rb must be R4n+0 (and not RZ), and Rc must be Rb+2 or RZ.
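The pairing rule can be written as a short check, sketched here in Python. Registers are modeled as plain indices and RZ as None; both are modeling choices for this example, since the entry does not give the register encoding, and the function name is hypothetical.

```python
RZ = None  # stand-in for the zero register; its real encoding is not given here


def cas_registers_ok(size_bits, rb, rc):
    """Check the {Rb,Rc} constraint for ATOM.CAS.32 and ATOM.CAS.64."""
    if size_bits == 32:
        align, stride = 2, 1   # Rb = R2n+0, Rc = Rb+1
    elif size_bits == 64:
        align, stride = 4, 2   # Rb = R4n+0, Rc = Rb+2
    else:
        raise ValueError("only .32 and .64 are covered by the rule")
    if rb is RZ or rb % align != 0:
        return False           # Rb must be aligned and cannot be RZ
    return rc is RZ or rc == rb + stride
```

For example, (R2, R3) is a valid 32-bit pair and (R4, R6) a valid 64-bit pair, while (R3, R4) fails the 32-bit alignment rule and (R2, R4) fails the 64-bit one.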
Other atomic operations assemble the omitted Rc as RZ.
The generic thread address space accesses global memory, unless it falls in the Local or Shared address window. An ATOM instruction must address global memory, otherwise it generates an invalid address space error.
When used in a pixel shader, ATOM has helper pixels and killed pixels automatically predicated off by the HW to prevent unwanted writes to global memory. If a pixel's raster coverage is 0, or the pixel has previously been killed using the KIL operation, its thread does not participate in any ATOM operation.
Memory addresses must be naturally aligned, on a byte address that is a multiple of the access size. Misaligned addresses cause a misaligned address error. An address outside an allocated memory region causes an address out-of-range error.
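As an illustration of the two error conditions, the following Python sketch classifies an access. The function name, the representation of allocated regions as (base, length) pairs, and the choice to test alignment before range are all assumptions; this entry does not state which error takes priority.

```python
def classify_access(address, size_bytes, regions):
    """Return "ok", "misaligned address", or "address out-of-range".

    `regions` is a hypothetical list of (base, length) allocations.
    """
    # Natural alignment: the byte address is a multiple of the access size.
    if address % size_bytes != 0:
        return "misaligned address"
    # The whole access must lie inside one allocated region.
    for base, length in regions:
        if base <= address and address + size_bytes <= base + length:
            return "ok"
    return "address out-of-range"
```

A 4-byte access at 0x1004 is aligned, but the same address is misaligned for an 8-byte access, since 0x1004 is not a multiple of 8.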
ATOM interprets memory data in little-endian byte order: the effective address specifies the least-significant data bits.
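A short Python sketch of the byte order, using a bytearray as a stand-in for memory (the helper names are hypothetical):

```python
def store_le(memory, address, value, size_bytes):
    """Write `value` so that its least-significant byte lands at `address`."""
    memory[address:address + size_bytes] = value.to_bytes(size_bytes, "little")


def load_le(memory, address, size_bytes):
    """Read `size_bytes` bytes starting at `address` as a little-endian integer."""
    return int.from_bytes(memory[address:address + size_bytes], "little")
```

Storing the 32-bit value 0x11223344 at address 4 places 0x44 at byte 4 and 0x11 at byte 7, so the effective address holds the least-significant data bits.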
ATOM.ADD.F32.FTZ.RN R0, [R1 - 400], R9;
ATOM.ADD.U64 R0, [R4 + 8], R2;           # even-odd registers for 64-bit ops
ATOM.ADD.S32 R9, [0xfff00], R8;          # absolute 20-bit address
ATOM.MIN.S64 R0, [R4 + 8], R2;           # signed min stored