SPA 5.0: .lvl: { .CTA, .VC, .GL, .SYS }
{@{!}Pg}
MEMBAR.lvl{.ivall}
{&req_6}
{&rdN}
{?sched>=?WAIT5}
;
// Memory barrier wait
MEMBAR establishes memory transaction visibility ordering with respect to the pool of processing threads
defined by .lvl as below.
.CTA : (CTA level) defines the pool of processing threads as those belonging to the same CTA.
.VC : (Virtual Channel level) defines the pool of processing threads as those belonging to the
same ordered virtual channel of the memory subsystem.
.GL : (Global level) defines the pool of processing threads as those belonging to the same GPU.
.SYS : (System level) defines the pool of processing threads as all processing threads in the system,
including threads communicating via peer-to-peer/PCIE protocols.
.ivall {, .IVALLD, .IVALLT, .IVALLTD }
.ivall indicates whether MEMBAR also performs a fused (atomic) cache invalidate operation.
With .IVALLD, a CCTL.D.IVALL operation is performed prior to MEMBAR.
With .IVALLT, a CCTLT.IVALL operation is performed prior to MEMBAR.
With .IVALLTD, both CCTLT.IVALL and CCTL.D.IVALL operations are performed prior to MEMBAR.
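The mapping from each .ivall variant to the invalidate operations it fuses can be written out as a small lookup. The following is a minimal host-side C++ sketch, not part of the ISA; the names Ivall and fused_invalidates are hypothetical, and only the modifier and CCTL operation names are taken from the description above.

```cpp
#include <string>
#include <vector>

// Hypothetical enum for the .ivall field; None stands for the empty variant.
enum class Ivall { None, IVALLD, IVALLT, IVALLTD };

// Returns the cache invalidate operations performed prior to MEMBAR.
// The order of entries for IVALLTD is not meaningful: the description
// only states that both operations are performed.
std::vector<std::string> fused_invalidates(Ivall variant) {
    switch (variant) {
        case Ivall::IVALLD:  return {"CCTL.D.IVALL"};
        case Ivall::IVALLT:  return {"CCTLT.IVALL"};
        case Ivall::IVALLTD: return {"CCTLT.IVALL", "CCTL.D.IVALL"};
        case Ivall::None:    break;
    }
    return {};  // plain MEMBAR: no fused invalidate
}
```

For example, fused_invalidates(Ivall::IVALLD) yields the single entry CCTL.D.IVALL, while Ivall::None yields an empty list.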
Issuing a MEMBAR.lvl guarantees that all prior memory accesses (reads, writes, and atomics) from a thread will be ordered against any subsequent memory accesses from the thread.
The scope of the ordering constraint is defined by .lvl; a MEMBAR at a given level implicitly enforces the MEMBARs of all lower levels.
MEMBAR.lvl does not guarantee any ordering within either the prior or the subsequent groups of memory instructions.
The memory barrier enforces vertical ordering only (transactions issued by a single thread). It makes no guarantees about execution synchronization with other threads. For horizontal (across threads) synchronization, BAR instructions should be used instead of, or in addition to, MEMBAR.
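The distinction between ordering one thread's transactions and synchronizing execution can be illustrated with a host-side analogy. The following is a minimal C++ sketch that uses std::atomic_thread_fence in place of MEMBAR; it is an analogy under the C++ memory model, not a statement of MEMBAR semantics, and the names payload, ready, producer, and consumer are hypothetical.

```cpp
#include <atomic>

// Hypothetical host-side stand-ins for a data location and a ready flag.
int payload = 0;
std::atomic<int> ready{0};

void producer(int value) {
    payload = value;                                      // prior store (data)
    std::atomic_thread_fence(std::memory_order_release);  // stands in for the barrier
    ready.store(1, std::memory_order_relaxed);            // subsequent store (flag)
}

int consumer() {
    // The fence in producer() orders its stores but wakes nobody:
    // the consumer has to wait for the flag on its own.
    while (ready.load(std::memory_order_relaxed) == 0) {
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // order flag read before data read
    return payload;
}
```

In practice producer and consumer would run on different threads. The fence only orders the producer's own stores, so the consumer still needs its own wait loop, which is the role the text assigns to BAR or another synchronization mechanism.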
Note: L1 and L2 caches are incoherent; stale data may still be read from other L1 caches even after MEMBAR.VC, MEMBAR.GL, or MEMBAR.SYS completes. However, stale data will not be read if a CCTL.IV is issued prior to the read, or if the read is performed using LD.CV.
Additionally, with .ivall, MEMBAR can perform a fused (atomic) cache invalidate operation in which CCTL.D.IVALL and/or CCTLT.IVALL are ordered prior to MEMBAR and after the memory transactions preceding MEMBAR for a given thread.
MEMBAR Levels:
.CTA CTA thread level
.VC SM Virtual Channel
.GL Global level
.SYS System level
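The rule that a MEMBAR implicitly enforces all lower-level MEMBARs can be sketched as a comparison over the levels. This is a minimal C++ sketch, not part of the ISA: the names Level and covers are hypothetical, and the ascending order .CTA, .VC, .GL, .SYS is an assumption taken from the order of the list above.

```cpp
// Levels in ascending scope. The numeric order is an assumption
// based on the order in which the levels are listed.
enum class Level { CTA = 0, VC = 1, GL = 2, SYS = 3 };

// True if a MEMBAR issued at `issued` also enforces the ordering
// that a MEMBAR at `required` would provide.
constexpr bool covers(Level issued, Level required) {
    return static_cast<int>(issued) >= static_cast<int>(required);
}
```

Under this assumption, covers(Level::GL, Level::CTA) holds, while covers(Level::CTA, Level::SYS) does not.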
The following picture tries to help describe when a given MEMBAR is needed, in terms of which kinds of agents are trying to communicate.
The following list describes the set of operations considered for visibility ordering by MEMBAR
// Producer consumer handshake
ST [R2] R3; // Store data R3 at address R2
MEMBAR.GL;
ST [R4] R5; // Store flag R5 signalling readiness of data R3
// Any consumer thread in the "global" pool is guaranteed to see the updated value of the data before it sees the updated value of the flag.
// Example of .IVALL