LDG : Load from Global Memory

Format

SPA 5.0:
        {@{!}Pg}   LDG{.E}{.cop}{.sz}   Rd, [Ra + ImmS24]       {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Global Load                         
        {@{!}Pg}   LDG{.E}{.cop}{.sz}   Rd, [     ImmU24]       {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Global Load from absolute address   

Two more variants support tiled resources by returning a sparse status predicate (Ps).
        Note that the immediate offset range is reduced to 20 bits in these variants.
        {@{!}Pg}   LDG{.E}{.cop}{.sz}   Ps, Rd, [Ra + ImmS20]   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Global Load                         
        {@{!}Pg}   LDG{.E}{.cop}{.sz}   Ps, Rd, [     ImmU20]   {&req_6}   {&rdN}   {&wrN}   {?sched}   ;   // Global Load from absolute address

 .E       Extended address (64 bits, requires two registers)

 .cop:  { .CA*, .CG, .CS, .LU, .CV, .CI }    // Cache all*, global, streaming, last-use, volatile, inconsistent
          .CA*   Cache at all levels, likely to be accessed again (default)
          .CG    Cache at global level (cache in L2 and below, not L1 )
          .CS    Cache streaming. Maps to .CA.
          .LU    Last use. Maps to .CG.
          .CV    Cache as volatile (consider cached system memory lines stale, fetch again).
          .CI    Cache as inconsistent data (expected to be used only with invariant data).

 .sz:     { .U8, .S8, .U16, .S16, .32*, .64, .128, .U.128  }  Bit size in memory, unsigned or sign-extended
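
As a rough illustration of the sub-word sizes, the Python sketch below models how a value narrower than 32 bits might be widened before it lands in Rd. The helper name extend_to_32 is hypothetical. It assumes that .U8/.U16 zero-extend and .S8/.S16 sign-extend into a 32-bit register. That assumption is inferred from the "unsigned or sign-extended" wording above, not taken from a hardware specification.

```python
def extend_to_32(raw: int, bits: int, signed: bool) -> int:
    """Widen a `bits`-wide memory value to a 32-bit register image.

    Assumption: unsigned sizes zero-extend, signed sizes sign-extend.
    """
    mask = (1 << bits) - 1
    value = raw & mask
    # For signed sizes, a set top bit means the value is negative.
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits
    # Return the two's-complement bit pattern as seen in a 32-bit register.
    return value & 0xFFFFFFFF
```

Under these assumptions, the byte 0x80 loaded with .U8 yields 0x00000080, while the same byte loaded with .S8 yields 0xFFFFFF80.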

  Ps   Predicate returning sparse tile status. Indicates that the surface access is happening to a page marked as sparse (not valid).
                Note: The encoding of Ps is bit-inverted, i.e. 0 => PT and 7 => P0.
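
The inverted Ps encoding can be sketched in Python as follows. The function names encode_ps and decode_ps are hypothetical. The sketch assumes a 3-bit field and that PT is predicate index 7, which is what the mapping 0 => PT and 7 => P0 implies.

```python
PT = 7  # assumed index of the always-true predicate PT

def encode_ps(pred_index: int) -> int:
    """Return the bit-inverted 3-bit field for predicate P<pred_index>."""
    if not 0 <= pred_index <= 7:
        raise ValueError("predicate index must be in 0..7")
    return ~pred_index & 0x7

def decode_ps(field: int) -> int:
    """Recover the predicate index from the encoded field (inversion is its own inverse)."""
    if not 0 <= field <= 7:
        raise ValueError("encoded field must be in 0..7")
    return ~field & 0x7
```

One plausible reading of this design is that a field of zero selects PT, so an all-zero Ps field would simply discard the sparse status.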


Description

LDG loads register Rd from memory at a global thread address specified as [Ra + ImmS24] or as [ImmU24] (ImmS20 or ImmU20 in the variants that return Ps).

If register Ra is omitted, equal to RZ, or beyond the set of registers supported for the shader, the effective address is the zero-extended absolute unsigned immediate offset. An omitted Ra register is assembled as RZ. Otherwise, the effective address is equal to the sum of register Ra (or {Ra+1, Ra} when .E is specified) and the sign-extended signed immediate offset. A negative offset is written as [Ra - offset] or [Ra + -offset]. An omitted immediate offset is assembled as zero. All offsets are in bytes.
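
The address rules above can be sketched in Python. This is a minimal model, not a description of the hardware. The name effective_address, the RZ index of 255, the use of len(regs) as the number of registers supported for the shader, and the wrap-around of the sum at 32 or 64 bits are all assumptions made for illustration.

```python
RZ = 255  # assumed register index for RZ (not stated in this section)

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def effective_address(regs, ra, imm, extended=False, imm_bits=24):
    """Model the LDG effective address.

    regs     : list of 32-bit register values; its length stands in for the
               set of registers supported for the shader (assumption)
    ra       : register index, or None when Ra is omitted (assembled as RZ)
    imm      : immediate offset in bytes (24 bits, or 20 in the Ps variants)
    extended : True when .E is specified, so the base is {Ra+1, Ra}
    """
    if ra is None or ra == RZ or ra >= len(regs):
        # Absolute form: zero-extended unsigned immediate.
        return imm & ((1 << imm_bits) - 1)
    base = regs[ra]
    if extended:
        # Ra holds the low 32 bits, Ra+1 the high 32 bits.
        base |= regs[ra + 1] << 32
    addr_bits = 64 if extended else 32
    # Wrap-around at the address width is an assumption of this sketch.
    return (base + sign_extend(imm, imm_bits)) & ((1 << addr_bits) - 1)
```

For example, with R1 = 0x1000, [R1 + 0x10] gives 0x1010 and [R1 - 0x10] gives 0xFF0, while [0xFFFFFF] with no base register stays 0xFFFFFF because the absolute immediate is zero-extended rather than sign-extended.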

.sz = .U.128 (uniform 128-bit load) can be used to provide a performance hint to the hardware that the access will likely use the same address for all threads. Before using it, see the performance section of the programming guide for details on how and when to use it.

Memory addresses are forced to natural alignment for the access size, with no notification of misalignment.
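
A minimal Python sketch of natural alignment follows. It assumes that "forced" means the low address bits are cleared, so the address rounds down to a multiple of the access size. This section does not state the mechanism, and the helper name force_aligned is hypothetical.

```python
def force_aligned(address: int, size_bits: int) -> int:
    """Round address down to a multiple of the access size in bytes.

    Assumption: misaligned low bits are silently dropped.
    """
    size_bytes = size_bits // 8
    return address & ~(size_bytes - 1)
```

Under this assumption, a .32 load from byte address 0x1003 would read the word at 0x1000, and a .128 load from 0x100F would also read from 0x1000.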

Examples:

    LDG.32      R3, [R1];      // load 32 bits into R3 from byte address in R1
    LDG.E       R0, [R2];      // load 32 bits into R0 from 40-bit extended address in register pair {R3, R2}
