STL, STS : Store within Local or Shared Window

Format

SPA 5.0:
        {@{!}Pg}   STL{.cop}{.sz}   [Ra + ImmS24], Rb   {&req_6}   {&rdN}   {?sched}   ;   // Store within Local window          
        {@{!}Pg}   STS{.sz}         [Ra + ImmS24], Rb   {&req_6}   {&rdN}   {?sched}   ;   // Store within Shared window         

  Omit register Ra to specify an unsigned absolute address within a window:
        {@{!}Pg}   STL{.cop}{.sz}   [ImmU24], Rb        {&req_6}   {&rdN}   {?sched}   ;   // Store to absolute Local address    
        {@{!}Pg}   STS{.sz}         [ImmU24], Rb        {&req_6}   {&rdN}   {?sched}   ;   // Store to absolute Shared address   

.cop:    { .WB*, .CG, .CS, .WT }             Cache write-back*, global, streaming, write-thru 
          .WB*   Cache write-back all coherent levels (default*).
          .CG    Cache at global level (cache in L2 and below; L1 cache lines marked as evict-first).
          .CS    Cache streaming, likely to be accessed once (mark for early eviction).
          .WT    Cache write-through (to system memory).

.sz:     { .8, .U8, .S8, .16, .U16, .S16, .32*, .64, .128 } 

Description

STS and STL store register Rb to memory at a Shared or Local window address, respectively, specified as [Ra + ImmS24] or as [ImmU24].

If register Ra is omitted, is equal to RZ, or is beyond the set of registers supported for the shader, the effective address is the zero-extended unsigned immediate offset, used as an absolute address. An omitted Ra register is assembled as RZ. Otherwise, the effective address is the sum of register Ra (or {Ra+1, Ra} when .E is specified) and the sign-extended signed immediate offset. A negative offset is written as [Ra - offset] or [Ra + -offset]. An omitted immediate offset is assembled as zero. All offsets are in bytes.
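
The effective-address rule above can be modeled in a few lines. This is a minimal Python sketch, not the hardware definition: it assumes 32-bit window addresses that wrap modulo 2^32, represents the register file as a plain list, and uses a hypothetical index of 255 for RZ. The .E register-pair form is not modeled.

```python
RZ = 255  # hypothetical register index standing in for RZ (assumption)


def sext24(imm):
    """Sign-extend the low 24 bits of imm, as for the ImmS24 field."""
    imm &= 0xFFFFFF
    return imm - (1 << 24) if imm & 0x800000 else imm


def effective_address(ra, regs, imm24):
    """Byte address for [Ra + ImmS24] or, when Ra is absent, [ImmU24].

    ra is a register index or None (Ra omitted); regs holds the values of
    the registers supported for the shader. Addresses are assumed 32-bit.
    """
    if ra is None or ra == RZ or ra >= len(regs):
        # Absolute form: zero-extended unsigned 24-bit offset.
        return imm24 & 0xFFFFFF
    # Register form: Ra plus the sign-extended offset, wrapping modulo 2**32.
    return (regs[ra] + sext24(imm24)) & 0xFFFFFFFF
```

Under these assumptions, with R3 = 0x200 the operand [R3 - 16] carries the 24-bit offset 0xFFFFF0 and resolves to 0x1F0, while [0x12] resolves to 0x12 regardless of any register value.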

Memory addresses must be naturally aligned: the byte address must be a multiple of the access size. Misaligned addresses are forced to align to the access size and can optionally raise an error. An address outside the window, or outside the allocated memory within the window, causes an error.
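
The alignment and bounds rules can be sketched as a checker that returns the address a store would actually use. This is an illustrative model only: the window is assumed to start at byte 0, `allocated_bytes` and `trap_misaligned` are hypothetical parameters standing in for the window allocation size and the optional misalignment error, and `WindowError` is a hypothetical stand-in for the hardware error.

```python
# Access size in bytes for each .sz suffix (.32 is the default).
SZ_BYTES = {
    ".8": 1, ".U8": 1, ".S8": 1,
    ".16": 2, ".U16": 2, ".S16": 2,
    ".32": 4, ".64": 8, ".128": 16,
}


class WindowError(Exception):
    """Hypothetical stand-in for the hardware error condition."""


def check_store_address(addr, sz=".32", allocated_bytes=0x10000,
                        trap_misaligned=False):
    """Return the naturally aligned address for a store, or raise WindowError."""
    size = SZ_BYTES[sz]
    aligned = addr & ~(size - 1)  # force alignment to the access size
    if aligned != addr and trap_misaligned:
        raise WindowError(f"misaligned {sz} store at {addr:#x}")
    if aligned + size > allocated_bytes:
        raise WindowError(f"address {addr:#x} outside allocated window memory")
    return aligned
```

In this model a .32 store to 0x12 is silently forced down to 0x10 unless `trap_misaligned` is set, while a .16 store to the same address is already aligned and passes through unchanged.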

When STL is used in a pixel shader, the hardware automatically predicates off helper pixels and killed pixels to prevent unwanted writes to local memory. If a pixel's raster coverage is 0, or the pixel was previously killed with the KIL operation, its thread does not participate in any STL operation.
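
The participation rule can be written as a per-thread predicate. This is a hedged sketch: `PixelThread` and its fields are hypothetical names, it assumes a helper pixel is one whose raster coverage is 0 (as the paragraph's second sentence suggests), and each thread's private Local memory is modeled as a dict.

```python
from dataclasses import dataclass


@dataclass
class PixelThread:
    """Hypothetical per-thread pixel state; field names are illustrative."""
    guard: bool = True    # result of the instruction's {@{!}Pg} guard
    coverage: int = 1     # raster coverage; 0 is treated as a helper pixel
    killed: bool = False  # True once the pixel has executed KIL


def stl_participates(t):
    """True if this thread's STL write is performed in a pixel shader."""
    return t.guard and t.coverage != 0 and not t.killed


def stl_writes(threads, local_mems, addr, values):
    """Apply one STL across threads, each with its own Local memory (a dict)."""
    for t, mem, v in zip(threads, local_mems, values):
        if stl_participates(t):
            mem[addr] = v
```

Folding the implicit helper/killed mask into the same predicate as the explicit guard mirrors the description: from the program's point of view, such threads behave as if the STL were predicated off.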

Examples:

STL.32    [R3+0x1234], R1;   // store R1 to Local address R3+0x1234
STS.64    [R3 - 16], R4;     // store {R5,R4} to 64-bit Shared location
STS.32    [0x10], R1;        // store R1 to absolute location 0x10 in per-CTA Shared memory
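
To illustrate what the register operand contributes, the following sketch writes Rb (and, for wide stores, the following registers) into a byte array standing in for a window. It is a model under stated assumptions, not documented behavior: memory is assumed little-endian, a .64 or .128 store is assumed to take Rb, Rb+1, ... as successively higher 32-bit words (consistent with the {R5,R4} pair in the STS.64 example), and `store` is a hypothetical helper name.

```python
def store(mem, addr, regs, rb, size):
    """Hypothetical model of STS/STL [addr], Rb for a size-byte access.

    Assumes little-endian memory and that wide stores take consecutive
    registers Rb, Rb+1, ... as successively higher 32-bit words.
    """
    nwords = max(size // 4, 1)
    value = 0
    for i in range(nwords):
        value |= (regs[rb + i] & 0xFFFFFFFF) << (32 * i)
    value &= (1 << (8 * size)) - 1  # .8/.16 keep only the low bytes of Rb
    mem[addr:addr + size] = value.to_bytes(size, "little")


regs = [0] * 8
regs[1] = 0x11223344
regs[3] = 0x40
regs[4], regs[5] = 0xAAAAAAAA, 0xBBBBBBBB
shared = bytearray(0x100)                 # stand-in for a Shared window

store(shared, regs[3] - 16, regs, 4, 8)   # like STS.64 [R3 - 16], R4
store(shared, 0x20, regs, 1, 4)           # a .32 store of R1
store(shared, 0x50, regs, 1, 1)           # a .8 store keeps R1's low byte
```

With these assumptions the STS.64 replay places R4 at bytes 0x30..0x33 and R5 at 0x34..0x37.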
