STG : Store to Global Memory

Format

SPA 5.0:
        {@{!}Pg}   STG{.E}{.cop}{.sz}   [Ra + ImmS24], Rb   {&req_6}   {&rdN}   {?sched}   ;   // Store                                               
        {@{!}Pg}   STG{.E}{.cop}{.sz}   [     ImmU24], Rb   {&req_6}   {&rdN}   {?sched}   ;   // Store to absolute address by omitting register Ra   

 .E:      Extended address (64 bits, requires two registers)

 .cop:    { .WB*, .CG, .CS, .WT}                      Cache write-back*, global, streaming, write-thru
          .WB*   Cache write-back all coherent levels (default*)
          .CG    Cache at global level (cache in L2 and below, not L1)
          .CS    Cache streaming, likely to be accessed once (mark for early eviction)
          .WT    Cache write-through (to system memory)

 .sz:     { .8, .U8, .S8, .16, .U16, .S16, .32*, .64, .128 }                Bit size stored in memory
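
As a rough sketch, the .sz suffix can be read as a byte count per thread. The helper names stg_size_bytes and stg_source_regs are hypothetical. The register counts are an assumption: a narrow source is taken to occupy one register, and .64 and .128 are taken to read 2 and 4 consecutive registers starting at Rb, extrapolated from the STG.64 example in this section.

```c
#include <string.h>

/* Hypothetical helper: bytes written to memory for a given .sz suffix.
 * An empty or NULL suffix means the default .32. Returns 0 if unrecognized. */
static unsigned stg_size_bytes(const char *sz)
{
    if (sz == NULL || sz[0] == '\0') return 4;                              /* default .32 */
    if (!strcmp(sz, ".8")  || !strcmp(sz, ".U8")  || !strcmp(sz, ".S8"))  return 1;
    if (!strcmp(sz, ".16") || !strcmp(sz, ".U16") || !strcmp(sz, ".S16")) return 2;
    if (!strcmp(sz, ".32"))  return 4;
    if (!strcmp(sz, ".64"))  return 8;
    if (!strcmp(sz, ".128")) return 16;
    return 0;
}

/* Assumption: sources of 32 bits or fewer occupy one register;
 * wider sources use size/4 consecutive registers starting at Rb. */
static unsigned stg_source_regs(unsigned bytes)
{
    return bytes <= 4 ? 1u : bytes / 4u;
}
```

The signed and unsigned variants map to the same width here, since a narrow store writes only the low-order bits of Rb.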

Description

STG stores register Rb (or consecutive registers starting at Rb for .64 and .128) to global memory at the thread address specified as [Ra + ImmS24] or as [ImmU24].

If register Ra is omitted, equal to RZ, or beyond the set of registers supported for the shader, the effective address is the zero-extended unsigned immediate offset, used as an absolute address. An omitted Ra register is assembled as RZ. Otherwise, the effective address is the sum of register Ra (or the pair {Ra+1, Ra} when .E is specified) and the sign-extended signed immediate offset. A negative offset is written as [Ra - offset] or [Ra + -offset]. An omitted immediate offset is assembled as zero. All offsets are in bytes.
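
The address rules above can be sketched as a small C model. The function name and parameters are hypothetical, and one behavior is an assumption not stated in this section: without .E, the sum is taken to wrap at 32 bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the STG effective-address calculation.
 * ra_is_rz:     true when Ra is omitted, is RZ, or is out of range for the shader.
 * extended:     true for .E, where the base is the register pair {Ra+1, Ra}.
 * ra_lo, ra_hi: contents of Ra and Ra+1.
 * imm24:        raw 24-bit immediate field (only the low 24 bits are used). */
static uint64_t stg_effective_address(bool ra_is_rz, bool extended,
                                      uint32_t ra_lo, uint32_t ra_hi,
                                      uint32_t imm24)
{
    imm24 &= 0xFFFFFFu;
    if (ra_is_rz)
        return (uint64_t)imm24;                 /* zero-extended ImmU24, absolute */

    /* Sign-extend ImmS24 to 64 bits. */
    int64_t offset = (imm24 & 0x800000u) ? (int64_t)imm24 - 0x1000000
                                         : (int64_t)imm24;

    uint64_t base = extended ? (((uint64_t)ra_hi << 32) | ra_lo)
                             : (uint64_t)ra_lo;
    uint64_t addr = base + (uint64_t)offset;    /* wraps modulo 2^64 */

    /* Assumption: without .E the address is 32 bits wide and wraps. */
    return extended ? addr : (uint64_t)(uint32_t)addr;
}
```

Note that the same 24-bit field 0xFFFFFC is read as +16777212 when Ra is RZ but as -4 when added to a register.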

Memory addresses must be naturally aligned: the byte address must be a multiple of the access size. Misaligned addresses are forced to the access-size alignment and can optionally raise an error.
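
A minimal sketch of the alignment rule, assuming access sizes are powers of two (true for every .sz value). How the hardware forces alignment is not specified here. Clearing the low-order address bits, which rounds down to the previous multiple of the size, is an assumption. Both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical check: naturally aligned means a multiple of the access size. */
static bool stg_is_aligned(uint64_t addr, unsigned size_bytes)
{
    return (addr & (uint64_t)(size_bytes - 1u)) == 0;
}

/* Assumption: forced alignment clears the low-order address bits. */
static uint64_t stg_force_align(uint64_t addr, unsigned size_bytes)
{
    return addr & ~(uint64_t)(size_bytes - 1u);
}
```

Under this model, an STG.64 to byte address 20 would write bytes 16 through 23, possibly also raising the optional error.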

In a pixel shader, the hardware automatically predicates off STG for helper pixels and killed pixels to prevent unwanted writes to global memory. A thread whose pixel has a raster coverage of 0, or that was previously killed with the KIL operation, does not participate in any STG operation.
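
The masking described above can be sketched as a per-thread predicate. The struct and function names are hypothetical. Combining the hardware mask with the instruction's own {@{!}Pg} guard by a logical AND is an assumption about how the two interact.

```c
#include <stdbool.h>

/* Hypothetical per-thread state in a pixel shader. */
struct pixel_thread {
    bool guard_true;   /* result of the {@{!}Pg} guard predicate */
    bool helper;       /* helper pixel: raster coverage is 0 */
    bool killed;       /* previously killed with KIL */
};

/* Sketch: the thread's STG writes memory only if its guard passes and
 * the hardware has not masked it off as a helper or killed pixel. */
static bool stg_participates(const struct pixel_thread *t)
{
    return t->guard_true && !t->helper && !t->killed;
}
```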

Examples:

STG.32    [R1 + 20], R3;          // store 32-bit R3 at 20 bytes offset from byte address in R1
STG.E     [R2 + 0x1234], R5;      // store 32-bit R5 at 64-bit extended address in (R2,R3) plus offset 0x1234
STG.64    [R1 + 24], R4;          // store 64-bit (R4,R5) at 24 bytes offset from byte address in R1
STG.8     [R1 + 24], R4;          // store low 8 bits of R4 at 24 bytes offset from byte address in R1
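
To make the data sizes in these examples concrete, the sketch below models what a store does to a byte-addressed memory. The function name is hypothetical. The little-endian layout, with Rb supplying the lowest-addressed bytes and Rb+1 the next four, is an assumption.

```c
#include <stdint.h>

/* Hypothetical model of the STG data path: write size_bytes bytes, taken
 * from consecutive registers starting at regs[rb], to mem[addr...].
 * Assumption: little-endian layout, so a narrow store (.8, .16) writes
 * only the low-order bits of Rb. */
static void stg_model_store(uint8_t *mem, uint64_t addr,
                            const uint32_t *regs, unsigned rb,
                            unsigned size_bytes)
{
    for (unsigned i = 0; i < size_bytes; i++) {
        uint32_t word = regs[rb + i / 4];               /* Rb, then Rb+1, ... */
        mem[addr + i] = (uint8_t)(word >> (8 * (i % 4)));
    }
}
```

With R1 = 0, R4 = 0x11223344 and R5 = 0x55667788, this model has STG.64 [R1 + 24], R4 write the bytes 44 33 22 11 88 77 66 55 at addresses 24 through 31, while STG.8 [R1 + 24], R4 writes only 0x44 at address 24.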
