EXIT : Exit Program
PEXIT : Pre-Exit

Format:

SPA 5.0:
                   PEXIT                 #OffsetS24   {&req_6}   {?sched}           ;   

        {@{!}Pg}   EXIT{.KEEPREFCOUNT}   {CC.test}    {&req_6}   {?sched>=?WAIT5}   ;   

  .test:  { .F,      .LT,     .EQ,     .LE,      .GT,      .NE,      .GE,   .NUM,     Signed numeric tests
            .NAN,    .LTU,    .EQU,    .LEU,     .GTU,     .NEU,     .GEU,  .T,       Signed or Unordered tests 
            .OFF,    .LO,     .SFF,    .LS,      .HI,      .SFT,     .HS,   .OFT,     Unsigned integer tests
            .CSM_TA, .CSM_TR, .CSM_MX, .FCSM_TA, .FCSM_TR, .FCSM_MX, .RLE,  .RGT  }   Clip State Machine tests

Description:

PEXIT:

Set up the post-exit synchronization point.

This instruction is used before an EXIT to specify the target address of a post-warp-exit routine. The target address is a signed byte offset, with its three least significant bits equal to 0, relative to the program counter of the instruction following the PEXIT instruction. PEXIT pushes a token (which includes the target address) onto the CallReturnStack (CRS) but does not alter program flow; execution continues normally with the next sequential instruction. The token steers program execution when the CallReturnStack unwinds (for example, when EXIT is executed).
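The address rule above can be sketched in Python. This is an illustrative model, not the hardware definition: it assumes the `#OffsetS24` field is a 24-bit two's-complement value, and the names `sign_extend_24`, `pexit_target`, and `next_pc` are hypothetical. The address of the following instruction is passed in, because the instruction size is not stated in this section.

```python
def sign_extend_24(value: int) -> int:
    """Interpret the low 24 bits of value as a two's-complement integer (assumed width)."""
    value &= 0xFFFFFF
    return value - 0x1000000 if value & 0x800000 else value


def pexit_target(next_pc: int, offset_s24: int) -> int:
    """Return the byte address carried by the PEXIT token.

    next_pc    -- address of the instruction following the PEXIT (supplied by the caller)
    offset_s24 -- raw offset field taken from the encoding
    """
    offset = sign_extend_24(offset_s24)
    # The text requires the three least significant bits of the offset to be 0.
    if offset & 0x7:
        raise ValueError("offset must have its three least significant bits clear")
    return next_pc + offset
```

For example, `pexit_target(0x100, 0x40)` gives `0x140`, and a field value of `0xFFFFF8` decodes to an offset of -8 bytes.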

EXIT:

Exit the normal shader program and transfer control to the post-exit synchronization point. Includes a guard predicate and a conditional test.

The conditional test is evaluated for active threads with a TRUE guard predicate; threads for which the test is TRUE are marked as completed. When all the threads of the warp are completed, the program transfers to the post-exit synchronization point.

After transferring to the synchronization point specified by PEXIT, the warp continues execution without releasing any of its resources. Only when it executes one more EXIT at the end of the "post-amble" code does the warp terminate and release its resources.
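A minimal Python sketch of the two-stage exit described above. It is a simplified model under stated assumptions: the CRS is a plain list of `(kind, target)` tuples, tokens other than PEXIT are simply discarded during unwind, and the function names `apply_exit` and `after_exit` are hypothetical. How threads are re-activated for the post-amble is not specified in this section, so the caller decides that.

```python
def apply_exit(active, guard, test, completed):
    """Mark as completed every active thread whose guard predicate and test are both TRUE."""
    return [c or (a and g and t)
            for a, g, t, c in zip(active, guard, test, completed)]


def after_exit(completed, crs):
    """Decide what the warp does after an EXIT has been applied.

    crs is a list used as a stack of (kind, target) tokens (assumed layout).
    """
    if not all(completed):
        return ("continue", None)      # some threads are still running
    while crs:                         # unwind, looking for a PEXIT token
        kind, target = crs.pop()
        if kind == "PEXIT":
            return ("jump", target)    # run the post-amble; resources are kept
    return ("terminate", None)         # no PEXIT token left: final EXIT
```

With `crs = [("PEXIT", "LABEL_B"), ("PRET", "LABEL_A")]`, the first EXIT that completes the whole warp yields `("jump", "LABEL_B")`; a second EXIT with the stack now empty yields `("terminate", None)`.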

As the threads execute the final EXIT, barriers waiting on "all threads" must be checked to see if the exiting threads are the only threads that have not yet made it to a barrier for all threads in the CTA. If the exiting threads are holding up the barrier, the barrier is released.
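The barrier check above can be pictured with a small Python sketch. It models thread sets only and is an assumption about the bookkeeping, not a description of the hardware; the function name and its parameters are hypothetical.

```python
def barrier_released_by_exit(cta_threads, arrived, exiting, already_exited=frozenset()):
    """Return True if the exiting threads are the only ones holding up the barrier.

    cta_threads    -- set of all thread ids in the CTA
    arrived        -- threads already waiting at the "all threads" barrier
    exiting        -- threads executing the final EXIT now
    already_exited -- threads that terminated earlier (assumed not to be waited on)
    """
    still_expected = set(cta_threads) - set(arrived) - set(already_exited)
    # Release only if every thread the barrier still waits for is exiting now.
    return bool(still_expected) and still_expected <= set(exiting)
```

For a four-thread CTA where threads 0 and 1 wait at the barrier, the function returns True when threads 2 and 3 exit together, and False when only thread 2 exits.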

.KEEPREFCOUNT: This option enables new programming models in which software-allocated resources, such as memory pools, are protected by CWD, which tracks software-allocated reference counters and throttles CTA launches when necessary resources are unavailable.

Normally, in CTA grid-based programming models, all the resources owned by a CTA are released by default when all the warps of the CTA exit. For new programming models like GWC, however, software may need to maintain ownership of resources (such as a global memory pool) even after CTA exit. An example scenario arises with CTA continuations, where a parent CTA exits but wants to keep its state alive in a continuation buffer until all its child CTAs have completed, and is then launched again to finish the processing.

The .KEEPREFCOUNT option prevents a CTA exit from releasing its CWD-tracked resources. It is legal to mix EXIT and EXIT.KEEPREFCOUNT thread exits in the same CTA, but the CTA maintains ownership of its CWD-tracked resources if at least one of its threads exits with EXIT.KEEPREFCOUNT.
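The mixing rule for EXIT and EXIT.KEEPREFCOUNT reduces to an "any thread" test, sketched here in Python. The function name and the boolean-per-thread representation are hypothetical; the sketch says nothing about how CWD actually stores its reference counters.

```python
def cta_releases_tracked_resources(exited_with_keeprefcount):
    """Return True if the CTA releases its CWD-tracked resources on exit.

    exited_with_keeprefcount -- one boolean per thread, True if that thread
                                exited with EXIT.KEEPREFCOUNT
    """
    # A single EXIT.KEEPREFCOUNT exit is enough for the CTA to keep ownership.
    return not any(exited_with_keeprefcount)
```

A CTA in which every thread uses plain EXIT releases its resources; one thread using EXIT.KEEPREFCOUNT is sufficient to retain them.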

Note: Behavior of the EXIT instruction if used inside of the trap handler is undefined. If a warp inside the trap handler needs to be stopped, the warp-wide RTT.TERMINATE instruction should be used instead of EXIT.

Note: SW should not use synchronizing barriers in post-amble code between PEXIT and EXIT. User code may depend on exiting threads contributing to the barrier arrival count, so using another barrier in the post-amble code could lead to deadlock.

Examples:


# Here is an example of At-exit code implemented with PEXIT
LAUNCH_PC: 
         PEXIT LABEL_B;    # Stack = [[PEXIT, LABEL_B]]
         PRET  LABEL_A;    # Stack = [[PEXIT, LABEL_B] , [PRET, LABEL_A]]

        [ shader code goes here... no 'fallthrough' to A allowed ]

LABEL_A: 
         EXIT; # converts 'ret' threads to 'exit' threads. 
               # if the code has balanced CALL/RET tokens, PRET gets skipped over during unwind.
               # if code has RET without matching call (allowed in DX) then
               # [PRET, LABEL_A] catches it, and steers it to LABEL_A.
               # Then the EXIT will search for [PEXIT, LABEL_B] token during unwind.

LABEL_B: 

        [ exit handling code goes here... ]

         EXIT; # final exit to terminate warp

EXIT;
