ATOM : Atomic Operation on generic Memory

Format:

SPA 5.0:
        {@{!}Pg}   ATOM{.E}.op{.sz}    Rd, [Ra + ImmS20], Rb       {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Atomic Operation          
        {@{!}Pg}   ATOM{.E}.CAS{.sz}   Rd, [Ra + ImmS20], Rb, Rc   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Atomic Compare and Swap   

    Omit register Ra to specify an unsigned absolute address:
        {@{!}Pg}   ATOM{.E}.op{.sz}    Rd, [ImmU20], Rb            {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Atomic Operation          
        {@{!}Pg}   ATOM{.E}.CAS{.sz}   Rd, [ImmU20], Rb, Rc        {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Atomic Compare and Swap   




  .op:    { .ADD, .MIN, .MAX, .INC, .DEC, .AND, .OR, .XOR, .EXCH, .SAFEADD }      Operation
  .E:     Extended address (64 bits, requires two registers)
  .sz:    { .U32*, .S32, .U64, .S64, .F32.FTZ.RN, .F16x2.FTZ.RN }

Two additional syntax forms support the sparse predicate.
      Note: Immediate offsets are not supported for these variants.
        {@{!}Pg}   ATOM{.E}.op{.sz}    Ps, Rd, [Ra], Rb            {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Atomic Operation          
        {@{!}Pg}   ATOM{.E}.CAS{.sz}   Ps, Rd, [Ra], Rb, Rc        {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Atomic Compare and Swap   


   Ps: Predicate returning sparse tile status.
            Indicates that the surface access is happening to a page marked as sparse (not valid).
            Note: The encoding of Ps is bit-inverted, i.e. 0 => PT and 7 => P0.
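
The bit-inverted Ps encoding can be illustrated with a short Python sketch. It assumes a 3-bit field (implied by the endpoints 0 and 7) and that PT occupies predicate index 7; the function name `ps_field_to_name` is hypothetical.

```python
def ps_field_to_name(field):
    """Map an encoded Ps field to a predicate name, assuming a 3-bit inverted field."""
    if not 0 <= field <= 7:
        raise ValueError("Ps field must fit in 3 bits")
    index = (~field) & 0x7           # undo the bit inversion
    return "PT" if index == 7 else f"P{index}"   # assumption: index 7 is PT
```

With this model, a field value of 0 decodes to PT and a field value of 7 decodes to P0, matching the note above.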


            .32 is also accepted and aliases to .U32
            .64 is also accepted and aliases to .U64
            .128 aliases to .U128, but is illegal
          ------------------------------------------------------------------------------------------------------------
                                Atomic Operations
          .op    .sz                                                     Description,  M is [Ra + ImmS20]
          ------------------------------------------------------------------------------------------------------------
          .ADD   .U32 .S32 .U64  .F32.FTZ.RN  .F16x2.RN .F64.RN          Rd = M; M = M + Rb;
          .MIN   .U32 .S32 .U64* .S64* .F16x2.RN                         Rd = M; M = min(M, Rb);
          .MAX   .U32 .S32 .U64* .S64* .F16x2.RN                         Rd = M; M = max(M, Rb);
          .INC   .U32                                                    Rd = M; M = (M >= Rb)? 0 : (M + 1);
          .DEC   .U32                                                    Rd = M; M = (M == 0 || M > Rb)? Rb : M - 1;
          .AND   .U32 .S32 .U64*                                         Rd = M; M = M & Rb;
          .OR    .U32 .S32 .U64*                                         Rd = M; M = M | Rb;
          .XOR   .U32 .S32 .U64*                                         Rd = M; M = M ^ Rb;
          .EXCH  .U32 .S32 .U64                                          Rd = M; M = Rb;
          .SAFEADD         .U64*                                         Increment put pointer in a circular queue.
          .CAS   .U32 .S32 .U64                                          Rd = M; if (M == Rb) M = Rc;
          ------------------------------------------------------------------------------------------------------------
          * SPA 3.5 (and higher)
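
The table can be read as a small reference model. The following Python sketch covers only the .U32 column. The function name `atom_u32` is hypothetical, and the modulo-2^32 wraparound for .ADD is an assumption that the table does not state.

```python
MASK32 = 0xFFFFFFFF

def atom_u32(op, m, rb, rc=0):
    """Model one ATOM.<op>.U32 step; returns (rd, new_m)."""
    m, rb, rc = m & MASK32, rb & MASK32, rc & MASK32
    if op == "ADD":
        new_m = (m + rb) & MASK32    # assumption: wraps modulo 2**32
    elif op == "MIN":
        new_m = min(m, rb)
    elif op == "MAX":
        new_m = max(m, rb)
    elif op == "INC":
        new_m = 0 if m >= rb else m + 1
    elif op == "DEC":
        new_m = rb if (m == 0 or m > rb) else m - 1
    elif op == "AND":
        new_m = m & rb
    elif op == "OR":
        new_m = m | rb
    elif op == "XOR":
        new_m = m ^ rb
    elif op == "EXCH":
        new_m = rb
    elif op == "CAS":
        new_m = rc if m == rb else m
    else:
        raise ValueError(f"unsupported op: {op}")
    return m, new_m                  # Rd always receives the prior memory value
```

For example, `atom_u32("INC", 5, 5)` returns `(5, 0)`: the counter wraps to zero once it reaches the limit held in Rb.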



Description:

ATOM.op performs the atomic operation .op with register Rb on global memory at a generic thread address, and returns the prior memory value to register Rd. The generic byte address is computed as the 32-bit sum of register Ra and the immediate offset ImmS20 sign-extended to 32 bits; the result is then zero-extended to 40 bits. If the .E extension is specified, the generic byte address is computed as the sum of the 64-bit value (Ra,Ra+1) plus the sign-extended immediate offset ImmS32.
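
The address computation can be sketched in Python as follows. The helper names `sign_extend` and `generic_address` are hypothetical. The sketch takes the (Ra,Ra+1) pair as a single 64-bit base value, takes the immediate width as a parameter, and assumes that the .E sum wraps at 64 bits, which the text does not state.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def generic_address(base, imm, imm_bits=20, extended=False):
    """Compute the generic byte address for ATOM (illustrative model only)."""
    offset = sign_extend(imm, imm_bits)
    if extended:
        # .E form: base is the 64-bit value held in (Ra,Ra+1).
        return (base + offset) & 0xFFFFFFFFFFFFFFFF   # assumption: 64-bit wrap
    # Non-.E form: 32-bit addition; zero-extending the 32-bit result
    # to 40 bits leaves its numeric value unchanged.
    return (base + offset) & 0xFFFFFFFF
```

In this model a negative offset applied to a small base wraps within 32 bits in the non-.E form, so `generic_address(0, -4)` yields `0xFFFFFFFC` rather than a negative address.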

ATOM combines register Rb with global memory location [Ra + ImmS20] atomically, without intervening accesses to that memory location by other threads:

    atomic {                    // Atomic operation on global memory location [Ra + ImmS20]
        .sz M = mem[Ra + ImmS20];       // Read memory location
            Rd = M;                     // Return prior memory location value to register Rd
            M = .op(M, Rb);             // Form atomic operation result value
            mem[Ra + ImmS20] = M;       // Write memory location
    }
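
As a software analogy for the atomic section above, the following Python sketch serializes the read-modify-write sequence with a lock. The class `AtomicMemory` and the function `run_adders` are hypothetical names for illustration; the hardware does not necessarily use a lock.

```python
import threading

class AtomicMemory:
    """Toy memory whose read-modify-write steps cannot interleave."""

    def __init__(self):
        self._mem = {}
        self._lock = threading.Lock()

    def atom(self, addr, op, rb):
        with self._lock:                  # the "atomic { ... }" region
            m = self._mem.get(addr, 0)    # read memory location
            self._mem[addr] = op(m, rb)   # form and write the result
            return m                      # prior value, as returned in Rd

    def load(self, addr):
        with self._lock:
            return self._mem.get(addr, 0)

def run_adders(n_threads=4, n_adds=1000):
    """Have several threads add 1 to the same location; return the final value."""
    mem = AtomicMemory()

    def worker():
        for _ in range(n_adds):
            mem.atom(0x100, lambda m, rb: m + rb, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.load(0x100)
```

Because every update happens inside the locked region, no increment is lost: the final value equals `n_threads * n_adds`.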

ATOM.CAS performs an atomic compare-and-swap operation on global memory. It requires one or more extra registers for the compare value, which are provided as Rc.

{Rb,Rc} are expected to be consecutive registers, naturally aligned based on .sz. Specifically, in ATOM.CAS.32, Rb must be R2n+0 (an even register, and cannot be RZ), and Rc must be Rb+1 or RZ. Similarly, for ATOM.CAS.64, Rb must be R4n+0 (and cannot be RZ) and Rc must be Rb+2 or RZ.
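
The register constraints for ATOM.CAS can be expressed as a small checker. This is a sketch only: `cas_operands_ok` is a hypothetical helper, registers are given as plain indices, and RZ is modeled as a sentinel rather than by its numeric encoding.

```python
RZ = "RZ"   # sentinel for the zero register; its encoding is not modeled here

def cas_operands_ok(size_bits, rb, rc):
    """Return True if (Rb, Rc) satisfy the ATOM.CAS pairing rules for .32 or .64."""
    if size_bits not in (32, 64):
        raise ValueError("only .32 and .64 are modeled")
    if rb == RZ:
        return False                           # Rb cannot be RZ
    align = 2 if size_bits == 32 else 4        # Rb must be R2n or R4n
    stride = 1 if size_bits == 32 else 2       # Rc must be Rb+1 or Rb+2
    if rb % align != 0:
        return False
    return rc == RZ or rc == rb + stride
```

For instance, `cas_operands_ok(32, 2, 3)` is accepted, while `cas_operands_ok(64, 2, 4)` is rejected because R2 is not a multiple-of-four register.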

Other atomic operations assemble the omitted Rc as RZ.

The generic thread address space accesses global memory, unless it falls in the Local or Shared address window. An ATOM instruction must address global memory, otherwise it generates an invalid address space error.

When ATOM is used in a pixel shader, the hardware automatically predicates off helper pixels and killed pixels to prevent unwanted writes to global memory. If a pixel's raster coverage is 0, or the pixel has previously been killed using the KIL operation, its thread does not participate in any ATOM operations.

Memory addresses must be naturally aligned, on a byte address that is a multiple of the access size. Misaligned addresses cause a misaligned address error. An address outside an allocated memory region causes an address out-of-range error.
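
The two address checks can be modeled as below. The function `classify_access` and the `regions` list of (start, length) pairs are hypothetical, and checking alignment before range is an assumption; the text does not specify which error takes priority.

```python
def classify_access(addr, size_bytes, regions):
    """Classify an access as ok, misaligned, or out of range (illustrative model)."""
    if addr % size_bytes != 0:
        return "misaligned address"            # not a multiple of the access size
    for start, length in regions:
        if start <= addr and addr + size_bytes <= start + length:
            return "ok"                        # fully inside an allocated region
    return "address out of range"
```

With one region `(0x1000, 0x100)`, an 8-byte access at `0x1008` is accepted, while the same access at `0x1004` is reported as misaligned.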

ATOM interprets memory data in little-endian byte order: the effective address specifies the least-significant data bits.
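
The little-endian layout can be illustrated with a byte-array model of memory. The helper names `store_le` and `load_le` are hypothetical, and only unsigned values are handled in this sketch.

```python
def store_le(mem, addr, value, size_bytes):
    """Write an unsigned value so that mem[addr] holds its least-significant byte."""
    mem[addr:addr + size_bytes] = value.to_bytes(size_bytes, "little")

def load_le(mem, addr, size_bytes):
    """Read an unsigned value whose least-significant byte is at mem[addr]."""
    return int.from_bytes(mem[addr:addr + size_bytes], "little")
```

Storing `0x11223344` as a 4-byte value at address 4 places `0x44` at byte 4 and `0x11` at byte 7.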

Examples:

ATOM.ADD.F32.FTZ.RN    R0, [R1 - 400], R9;      # negative immediate offset
ATOM.ADD.U64           R0, [R4 + 8], R2;        # even-odd registers for 64-bit ops
ATOM.ADD.S32           R9, [0xfff00], R8;       # absolute 20-bit address
ATOM.MIN.S64           R0, [R4 + 8], R2;        # signed 64-bit minimum is stored to memory
