MEMBAR : Memory Barrier

Format:

SPA 5.0:
{@{!}Pg} MEMBAR.lvl{.ivall} {&req_6} {&rdN} {?sched>=?WAIT5} ; // Memory barrier wait
.lvl { .CTA, .VC, .GL, .SYS }
MEMBAR establishes memory transaction visibility ordering with respect to the pool of processing threads
defined by .lvl, as follows.
.CTA : (CTA level) defines the pool of processing threads as those belonging to the same CTA.
.VC : (Virtual Channel level) defines the pool of processing threads as those belonging to
the same ordered virtual channel of the memory subsystem.
.GL : (Global level) defines the pool of processing threads as those belonging to the same GPU.
.SYS : (System level) defines the pool of processing threads as all processing threads in the system,
including threads communicating via peer-to-peer/PCI-E protocols.

.ivall { , .IVALLD, .IVALLT, .IVALLTD }
.ivall indicates whether MEMBAR also performs a fused (atomic) cache invalidate operation.
With .IVALLD, a CCTL.D.IVALL operation is performed prior to MEMBAR.
With .IVALLT, a CCTLT.IVALL operation is performed prior to MEMBAR.
With .IVALLTD, both CCTLT.IVALL and CCTL.D.IVALL operations are performed prior to MEMBAR.
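
The .ivall variants can be pictured as a conceptual expansion into the operations they fuse. The following C++ sketch is a host-side model only, not hardware behavior: `Ivall` and `expand_membar` are hypothetical names, and the relative order of CCTLT.IVALL and CCTL.D.IVALL for .IVALLTD is an assumption, since the text above only states that both are performed prior to MEMBAR. The real instruction performs the invalidate and the barrier as one fused (atomic) operation, which a separate sequence does not reproduce.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the .ivall suffix; None means no suffix.
enum class Ivall { None, IVALLD, IVALLT, IVALLTD };

// Returns the operations that MEMBAR.<lvl>{.ivall} conceptually fuses,
// listed in the order "invalidates first, barrier last".
// Assumption: for IVALLTD, CCTLT.IVALL is listed before CCTL.D.IVALL;
// the source does not specify their relative order.
std::vector<std::string> expand_membar(const std::string& lvl, Ivall iv) {
    std::vector<std::string> ops;
    if (iv == Ivall::IVALLT || iv == Ivall::IVALLTD) {
        ops.push_back("CCTLT.IVALL");
    }
    if (iv == Ivall::IVALLD || iv == Ivall::IVALLTD) {
        ops.push_back("CCTL.D.IVALL");
    }
    ops.push_back("MEMBAR." + lvl);
    return ops;
}
```

For example, `expand_membar("GL", Ivall::IVALLD)` yields the two entries `CCTL.D.IVALL` and `MEMBAR.GL`, in that order.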

Description:

Issuing a MEMBAR.lvl guarantees that all prior memory accesses (reads, writes, and atomics) from a thread will be ordered against any subsequent memory accesses from the thread.

The scope of the ordering constraint is defined by .lvl. A MEMBAR at a given level implicitly enforces the ordering of all lower-level MEMBARs (.CTA is the lowest level, followed by .VC, .GL, and .SYS).
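
The "implicitly enforces all lower levels" rule can be modeled as a simple ordering on the levels. This is a minimal host-side sketch, assuming the levels rank as .CTA, .VC, .GL, .SYS from lowest to highest; `MembarLevel` and `covers` are hypothetical names used only for illustration.

```cpp
// Hypothetical model of the MEMBAR level hierarchy (lowest to highest).
enum class MembarLevel { CTA = 0, VC = 1, GL = 2, SYS = 3 };

// True if a MEMBAR issued at `issued` also provides the ordering that a
// MEMBAR at `required` would provide, i.e. `issued` is at the same or a
// higher level.
constexpr bool covers(MembarLevel issued, MembarLevel required) {
    return static_cast<int>(issued) >= static_cast<int>(required);
}

// Compile-time sanity checks of the model.
static_assert(covers(MembarLevel::GL, MembarLevel::CTA),
              "a global-level barrier covers CTA-level ordering");
static_assert(!covers(MembarLevel::CTA, MembarLevel::GL),
              "a CTA-level barrier does not cover global-level ordering");
```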

MEMBAR.lvl does not guarantee any ordering within either the prior or the subsequent groups of memory instructions.

The memory barrier enforces vertical ordering only (ordering of transactions issued by a single thread). It makes no guarantees about execution synchronization with other threads. For horizontal (across threads) synchronization, BAR instructions should be used instead of, or in addition to, MEMBAR.

Note: L1 and L2 caches are incoherent; stale data may still be read from other L1 caches even after MEMBAR.VC, MEMBAR.GL, or MEMBAR.SYS completes. However, stale data will not be read if a CCTL.IV is issued prior to the read, or if the read is performed using LD.CV.

Additionally, with .ivall, MEMBAR can perform a fused (atomic) cache invalidate operation, in which CCTL.D.IVALL and/or CCTLT.IVALL are ordered prior to MEMBAR and after the memory transactions preceding MEMBAR for a given thread.

MEMBAR Levels:
.CTA CTA thread level
.VC SM Virtual Channel
.GL Global level
.SYS System level

.CTA
CTA thread level
  • Ordering of reads, writes, and atomics is only guaranteed within the scope of threads in the same CTA.
  • For communication within a CTA, MEMBAR.CTA is the appropriate type of MEMBAR.
.VC
Virtual channel level
  • Ordering of reads, writes, and atomics is only guaranteed within the scope of all threads on all SMs, and other clients on the same blocking virtual channel, but not clients on different channels.
  • For communication between CTAs within a grid, MEMBAR.VC is the appropriate type of MEMBAR.
  • MEMBAR.VC will typically be more expensive (longer latency) than MEMBAR.CTA.
.GL
Global level
  • Ordering of reads, writes, and atomics is only guaranteed within the scope of all other clients (e.g., SM, CE, SKED) in the GPU.
  • For communication between SM threads and all clients in the same GPU, MEMBAR.GL is the appropriate type of MEMBAR.
  • MEMBAR.GL will typically be more expensive (longer latency) than MEMBAR.VC.
.SYS
System level
  • Ordering of reads, writes, and atomics is guaranteed with respect to all clients, including FB and those communicating via PCI-E, such as system and peer-to-peer memory.
  • This level of MEMBAR is required to ensure ordering with respect to a host CPU or other PCI-E peers.
  • MEMBAR.SYS will typically be much more expensive (longer latency) than MEMBAR.GL.
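
The level descriptions above amount to picking the cheapest level that still covers the agent being communicated with. The sketch below restates that choice as a lookup; it is only an illustration of the guidance in the bullets, and the `Peer` categories and the `membar_for` function are hypothetical names, not part of the ISA.

```cpp
#include <string>

// Hypothetical classification of the agent a thread communicates with.
enum class Peer {
    SameCta,         // another thread in the same CTA
    OtherCtaInGrid,  // a thread in another CTA of the same grid
    OtherGpuClient,  // any other client in the same GPU (e.g. CE, SKED)
    HostOrPciePeer   // host CPU or another PCI-E peer
};

// Returns the lowest (typically cheapest) MEMBAR level that the level
// descriptions name as appropriate for the given peer.
std::string membar_for(Peer peer) {
    switch (peer) {
        case Peer::SameCta:        return "MEMBAR.CTA";
        case Peer::OtherCtaInGrid: return "MEMBAR.VC";
        case Peer::OtherGpuClient: return "MEMBAR.GL";
        case Peer::HostOrPciePeer: return "MEMBAR.SYS";
    }
    return "MEMBAR.SYS";  // conservative fallback: the widest scope
}
```

Falling back to the widest scope keeps the model safe for an unclassified peer, at the cost of the longest latency.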

The following picture tries to help describe when a given MEMBAR is needed, in terms of which kinds of agents are trying to communicate.

The following list describes the set of operations considered for visibility ordering by MEMBAR

Examples:

// Producer-consumer handshake
ST [R2] R3; // Store data R3 at address R2
MEMBAR.GL;
ST [R4] R5; // Store flag R5 signalling readiness of data R3
// Any consumer thread in the "global" pool is guaranteed to see the updated value of the data before it sees the updated value of the flag.

// Example of .IVALL
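
The producer-consumer handshake shown earlier in this section can be mirrored on a host CPU as an analogy. The sketch below is not GPU code: it assumes C++11 fences play the role of MEMBAR, and `run_handshake`, `data`, and `flag` are hypothetical names. C++ fences have no notion of the .CTA/.VC/.GL/.SYS scopes, and the sketch does not model the L1 staleness issue described in the note above, so it illustrates only the data-then-barrier-then-flag pattern.

```cpp
#include <atomic>
#include <thread>

// Runs one producer-consumer handshake and returns the value the
// consumer observed after seeing the flag.
int run_handshake() {
    int data = 0;              // plays the role of the data at [R2]
    std::atomic<int> flag{0};  // plays the role of the flag at [R4]
    int observed = -1;

    std::thread consumer([&] {
        // Spin until the producer publishes the flag.
        while (flag.load(std::memory_order_relaxed) == 0) {
            std::this_thread::yield();
        }
        // Order the flag read before the data read.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = data;
    });

    data = 42;                                            // like ST [R2] R3
    std::atomic_thread_fence(std::memory_order_release);  // role of MEMBAR
    flag.store(1, std::memory_order_relaxed);             // like ST [R4] R5

    consumer.join();
    return observed;
}
```

Because the release fence sits between the data store and the flag store, a consumer that sees the flag set and then executes its acquire fence is guaranteed to read the stored data, so `run_handshake()` returns 42.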
