AL2P ALD AST ATOM ATOMS B2R BAR BFE BFI BPT BRA BRK BRX CAL CCTL CCTLL CCTLT CONT CS2R CSET CSETP DADD DEPBAR DFMA DMNMX DMUL DSET DSETP EXIT F2F F2I FADD FADD32I FCHK FCMP FFMA FFMA32I FLO FMNMX FMUL FMUL32I FSET FSETP FSWZADD GETCRSPTR GETLMEMBASE HADD2 HADD2_32I HFMA2 HFMA2_32I HMUL2 HMUL2_32I HSET2 HSETP2 I2F I2I IADD IADD3 IADD32I ICMP IDE IMAD IMAD32I IMADSP IMNMX IMUL IMUL32I IPA ISBERD ISCADD ISCADD32I ISET ISETP JCAL JMP JMX KIL LD LDC LDG LDL LDS LEA LEPC LONGJMP LOP LOP3 LOP32I MEMBAR MOV MOV32I MUFU NOP OUT P2R PBK PCNT PEXIT PIXLD PLONGJMP POPC PRET PRMT PSET PSETP R2B R2P RAM RED RET RRO RTT S2R SAM SEL SETCRSPTR SETLMEMBASE SHF SHFL SHL SHR SSY ST STG STL STP STS SUATOM SULD SURED SUST SYNC TEX TEXS TLD TLD4 TLD4S TLDS TMML TXA TXD TXQ VABSDIFF VABSDIFF4 VADD VMAD VMNMX VOTE VSET VSETP VSHL VSHR XMAD
Index of Floating Point Instructions

Opcode | Description | |
---|---|---|
FADD | FP32 Add | |
FADD32I | FP32 Add | |
FCHK | Single Precision FP Divide Range Check | |
FCMP | FP32 Compare to Zero and Select Source | |
FFMA | FP32 Fused Multiply and Add | |
FFMA32I | FP32 Fused Multiply and Add | |
FMNMX | FP32 Minimum/Maximum | |
FMUL | FP32 Multiply | |
FMUL32I | FP32 Multiply | |
FSET | FP32 Compare And Set | |
FSETP | FP32 Compare And Set Predicate | |
FSWZADD | FP32 Add used for FSWZ emulation | |
IPA | Interpolate Attribute | |
MUFU | Multi Function Operation | |
RRO | Range Reduction Operator FP | |
DADD | FP64 Add | |
DFMA | FP64 Fused Multiply Add | |
DMNMX | FP64 Minimum/Maximum | |
DMUL | FP64 Multiply | |
DSET | FP64 Compare And Set | |
DSETP | FP64 Compare And Set Predicate | |
HADD2 | FP16 SIMD Addition | |
HADD2_32I | FP16 SIMD Addition | |
HFMA2 | FP16 SIMD Fused Multiply and Add | |
HFMA2_32I | FP16 SIMD Fused Multiply and Add | |
HMUL2 | FP16 SIMD Multiply | |
HMUL2_32I | FP16 SIMD Multiply | |
HSET2 | FP16 SIMD Compare and Set | |
HSETP2 | FP16 SIMD Compare and Set Predicate | |

Index of Integer Instructions

Opcode | Description | |
---|---|---|
BFE | Bit Field Extract | |
BFI | Bit Field Insert | |
FLO | Find Leading One | |
IADD | Integer Addition | |
IADD3 | 3-input Integer Addition | |
IADD32I | Integer Addition | |
ICMP | Integer Compare to Zero and Select Source | |
IMAD | Integer Multiply And Add | |
IMAD32I | Integer Multiply And Add | |
IMADSP | Extracted Integer Multiply And Add | |
IMNMX | Integer Minimum/Maximum | |
IMUL | Integer Multiply | |
IMUL32I | Integer Multiply | |
ISCADD | Scaled Integer Addition | |
ISCADD32I | Scaled Integer Addition | |
ISET | Integer Compare And Set | |
ISETP | Integer Compare And Set Predicate | |
LEA | Compute Effective Address | |
LOP | Logic Operation | |
LOP3 | 3-input Logic Operation | |
LOP32I | Logic Operation | |
POPC | Population count | |
SHF | Funnel Shift | |
SHL | Shift Left | |
SHR | Shift Right | |
XMAD | Integer Short Multiply Add | |

Index of Video Instructions

Opcode | Description | |
---|---|---|
VABSDIFF | Integer Byte/Short Absolute Difference | |
VADD | Integer Byte/Short Addition | |
VMAD | Integer Byte/Short Multiply Add | |
VMNMX | Integer Byte/Short Minimum/Maximum | |
VSET | Integer Byte/Short Set | |
VSETP | Integer Byte/Short Compare And Set Predicate | |
VSHL | Integer Byte/Short Shift Left | |
VSHR | Integer Byte/Short Shift Right | |
VABSDIFF4 | Integer SIMD Byte Absolute Difference | |

Index of Conversion Instructions

Opcode | Description | |
---|---|---|
F2F | Floating Point To Floating Point Conversion | |
F2I | Floating Point To Integer Conversion | |
I2F | Integer To Floating Point Conversion | |
I2I | Integer To Integer Conversion | |

Index of Movement Instructions

Opcode | Description | |
---|---|---|
MOV | Move | |
MOV32I | Move | |
PRMT | Permute Register Pair | |
SEL | Select Source with Predicate | |
SHFL | Warp Wide Register Shuffle | |

Index of Predicate/CC Instructions

Opcode | Description | |
---|---|---|
CSET | Test Condition Code And Set | |
CSETP | Test Condition Code and Set Predicate | |
PSET | Combine Predicates and Set | |
PSETP | Combine Predicates and Set Predicate | |
P2R | Move Predicate Register To Register | |
R2P | Move Register To Predicate/CC Register | |

Index of Texture Instructions

Opcode | Description | |
---|---|---|
TEX | Texture Fetch | |
TLD | Texture Load | |
TLD4 | Texture Load 4 | |
TMML | Texture MipMap Level | |
TXA | Texture Virtual AA | |
TXD | Texture Fetch With Derivatives | |
TXQ | Texture Query | |
TEXS | Texture Fetch with scalar/non-vec4 source/destinations | |
TLD4S | Texture Load 4 with scalar/non-vec4 source/destinations | |
TLDS | Texture Load with scalar/non-vec4 source/destinations | |
STP | Set Texture Phase | |

Index of Graphics Load/Store Instructions

Opcode | Description | |
---|---|---|
AL2P | Attribute Logical to physical (translate) | |
ALD | Attribute Load | |
AST | Attribute Store | |
ISBERD | Read from ISBE structures used by VTG shaders | |
OUT | Output Token | |
PIXLD | Pixel Load | |

Index of Compute Load/Store Instructions

Opcode | Description | |
---|---|---|
LD | Load from generic Memory | |
LDC | Load Constant | |
LDG | Load from Global Memory | |
LDL | Load within Local Memory Window | |
LDS | Load within Shared Memory Window | |
ST | Store to generic Memory | |
STG | Store to global Memory | |
STL | Store within Local Memory Window | |
STS | Store within Shared Memory Window | |
ATOM | Atomic Operation on generic Memory | |
ATOMS | Atomic Operation on Shared Memory | |
RED | Reduction Operation on generic Memory | |
CCTL | Cache Control | |
CCTLL | Cache Control | |
MEMBAR | Memory Barrier | |
CCTLT | Texture Cache Control | |
SUATOM | Atomic Operation on Surface Memory | |
SULD | Surface Load | |
SURED | Reduction Operation on Surface Memory | |
SUST | Surface Store | |

Index of Control Instructions

Opcode | Description | |
---|---|---|
BRA | Relative Branch | |
BRX | Relative Branch Indirect | |
JMP | Absolute Jump | |
JMX | Absolute Jump Indirect | |
SSY | Set Synchronization Point | |
SYNC | Converge threads after conditional branch | |
CAL | Relative Call | |
JCAL | Absolute Call | |
PRET | Pre-Return From Subroutine | |
RET | Return From Subroutine | |
BRK | Break | |
PBK | Pre-Break | |
CONT | Continue | |
PCNT | Pre-continue | |
EXIT | Exit Program | |
PEXIT | Pre-Exit | |
LONGJMP | Long-Jump | |
PLONGJMP | Pre-Long-Jump | |
KIL | Kill Thread | |
BPT | BreakPoint/Trap | |
IDE | Interrupt disable/enable | |
RAM | Restore Active Mask | |
RTT | Return From Trap | |
SAM | Set Active Mask | |

Index of Miscellaneous Instructions

Opcode | Description | |
---|---|---|
NOP | No Operation | |
CS2R | Move Special Register to Register | |
S2R | Move Special Register to Register | |
LEPC | Load Effective Program Counter | |
B2R | Move Barrier To Register | |
BAR | Barrier Synchronization | |
R2B | Move Register to Barrier | |
VOTE | Vote Across SIMD Thread Group | |
DEPBAR | Dependency Barrier | |
GETCRSPTR | Get Call Return Stack Pointer | |
GETLMEMBASE | Get Local Memory Base Pointer | |
SETCRSPTR | Set Call Return Stack Pointer | |
SETLMEMBASE | Set Local Memory Base Pointer |

The instruction

    MOV R4, c[0xa][0x0]; # [000128]

moves data from a constant buffer into register R4. References to constant buffer memory in disassembled instructions are of the form "c[A][B]", where "c" indicates a reference to a constant buffer in GPU hardware. The first index ("[0xa]" in the above example) indicates which constant bank (buffer binding) the instruction is reading from. The second index ("[0x0]" in the above example) is the byte offset into the bank.

There are a total of 18 constant banks available per shader stage. Four banks are reserved by the compiler and the NVN implementation for internal data, such as driver-managed constants, shader constants, uniform data that is not part of a uniform buffer, or other non-user data. The remaining 14 banks back the user-defined uniform buffers in the shaders, and these banks start with constant bank "c[0x3]".

HW Constant bank | Purpose |
---|---|
c[0x0] | Reserved for driver-managed constants |
c[0x1] | Immediate constants in shader code, extracted by the compiler |
c[0x2] | Bound resource uniforms (images and samplers) for the shader stage |
c[0x3] through c[0x10] | User uniform buffer bindings 0 through 13 for the shader stage |
c[0x11] | Reserved by the driver |
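
To make the bank table concrete, the following is a minimal sketch that pulls the "c[A][B]" reference out of a disassembly line and classifies the bank according to the table above. The helper names `parse_const_ref` and `describe_bank` are hypothetical and exist only for this illustration; they are not part of NVN or of any disassembler, and the regular expression assumes the hexadecimal spelling seen in the example instruction.

```python
import re

# Values taken from the constant bank table above.
FIRST_USER_BANK = 0x3   # c[0x3] backs user uniform buffer binding 0
NUM_USER_BANKS = 14     # bindings 0 through 13

# Matches references of the form c[0xA][0xB] (assumed spelling, based on
# the single example instruction in the text).
_CONST_REF = re.compile(r"c\[(0x[0-9a-fA-F]+)\]\[(0x[0-9a-fA-F]+)\]")


def parse_const_ref(text):
    """Return (bank, byte_offset) for the first c[A][B] in text, or None."""
    match = _CONST_REF.search(text)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def describe_bank(bank):
    """Describe a hardware constant bank using the table above."""
    if bank == 0x0:
        return "driver-managed constants"
    if bank == 0x1:
        return "immediate constants extracted by the compiler"
    if bank == 0x2:
        return "bound resource uniforms (images and samplers)"
    if FIRST_USER_BANK <= bank < FIRST_USER_BANK + NUM_USER_BANKS:
        return f"user uniform buffer binding {bank - FIRST_USER_BANK}"
    if bank == 0x11:
        return "reserved by the driver"
    raise ValueError(f"bank {bank:#x} is outside the 18 documented banks")


bank, offset = parse_const_ref("MOV R4, c[0xa][0x0]; # [000128]")
print(hex(bank), hex(offset), describe_bank(bank))
```

Under these assumptions, the example instruction reads byte offset 0x0 of user uniform buffer binding 7, since 0xa - 0x3 = 7.
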
SSBO bindings | c[0x0] entries |
---|---|
Vertex SSBO bindings 0 through 15 | c[0x0][0x110 through 0x20F] |
Tess Control SSBO bindings 0 through 15 | c[0x0][0x210 through 0x30F] |
Tess Eval SSBO bindings 0 through 15 | c[0x0][0x310 through 0x40F] |
Geometry SSBO bindings 0 through 15 | c[0x0][0x410 through 0x50F] |
Fragment SSBO bindings 0 through 15 | c[0x0][0x510 through 0x60F] |
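
The per-stage SSBO ranges above follow a regular pattern, which the sketch below turns into an offset calculation. It assumes each binding occupies a 16-byte entry; that size is inferred from each stage having 0x100 bytes for 16 bindings and is not stated in the table itself. The stage keys and the function name `ssbo_entry_offset` are hypothetical.

```python
# Start of each graphics stage's SSBO region inside c[0x0], from the table.
SSBO_STAGE_BASE = {
    "vertex": 0x110,
    "tess_control": 0x210,
    "tess_eval": 0x310,
    "geometry": 0x410,
    "fragment": 0x510,
}

SSBO_BINDINGS_PER_STAGE = 16
# Assumption: 0x100 bytes per stage / 16 bindings = 16 bytes per entry.
SSBO_ENTRY_SIZE = 16


def ssbo_entry_offset(stage, binding):
    """Byte offset in c[0x0] of the first byte of an SSBO binding's entry."""
    if not 0 <= binding < SSBO_BINDINGS_PER_STAGE:
        raise ValueError(f"SSBO binding {binding} is out of range")
    return SSBO_STAGE_BASE[stage] + binding * SSBO_ENTRY_SIZE


print(hex(ssbo_entry_offset("fragment", 15)))
```

With these values the last fragment entry starts at 0x600 and ends at 0x60F, which matches the end of the range listed in the table.
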
Compute API bindings | Constant buffer locations |
---|---|
Uniform buffer bindings 0 through 4 (if non-emulated) | c[0x3] through c[0x7] |
Uniform buffer bindings 0 through 13 (if emulated with global loads) | c[0x0][0x210 through 0x2F0]; 16 bytes each |
SSBO bindings 0 through 15 | c[0x0][0x310 through 0x40F]; 16 bytes each |
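
The compute mapping can be sketched the same way. The function names below are hypothetical, and the code simply restates the compute table: non-emulated uniform buffers map to whole banks, while emulated uniform buffers and SSBOs map to 16-byte entries in c[0x0].

```python
ENTRY_SIZE = 16  # "16 bytes each", as stated in the compute table


def compute_ubo_location(binding, emulated):
    """Return (bank, byte_offset) for a compute uniform buffer binding.

    byte_offset is None when the binding owns a whole constant bank.
    """
    if emulated:
        if not 0 <= binding <= 13:
            raise ValueError("emulated bindings are 0 through 13")
        # Entry for the binding lives in c[0x0], starting at 0x210.
        return 0x0, 0x210 + binding * ENTRY_SIZE
    if not 0 <= binding <= 4:
        raise ValueError("non-emulated bindings are 0 through 4")
    # Bindings 0 through 4 map to banks c[0x3] through c[0x7].
    return 0x3 + binding, None


def compute_ssbo_offset(binding):
    """Byte offset in c[0x0] of a compute SSBO binding's entry."""
    if not 0 <= binding <= 15:
        raise ValueError("SSBO bindings are 0 through 15")
    return 0x310 + binding * ENTRY_SIZE


print(compute_ubo_location(4, emulated=False))
print(hex(compute_ubo_location(13, emulated=True)[1]))
print(hex(compute_ssbo_offset(15)))
```

Note that with 16-byte entries the entry for emulated binding 13 starts at 0x2E0, whereas the table gives the end of that range as 0x2F0; the sketch does not try to resolve that difference.
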

The declaration

    layout(binding=4) uniform sampler2D smp;

declares a variable _smp_ that is associated with API binding point #4 for the shader stage. Unlike OpenGL, NVN has separate API binding points for each shader stage. The handles used by these binding points are stored in a per-stage internal constant buffer bound to hardware constant bank #2 ("c[0x2]") at a pre-defined fixed offset based on the assigned binding for each uniform in the shader. Samplers, separate textures/samplers, and images are represented as 8-byte entries per binding.

Byte range in c[0x2] | Usage |
---|---|
c[0x2][0x0 through 0x1F] | Reserved for internal use |
c[0x2][0x20 through 0x11F] | API combined texture/sampler bindings 0 through 31; 8 bytes each |
c[0x2][0x120 through 0x15F] | API image bindings 0 through 7; 8 bytes each |
c[0x2][0x160 through 0x167] | Reserved for internal use |
c[0x2][0x168 through 0x567] | API texture-only bindings 0 through 127; 8 bytes each |
c[0x2][0x568 through 0x667] | API sampler-only bindings 0 through 31; 8 bytes each |
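
As an illustration of the layout above, this sketch computes where the handle for a given API binding would sit in c[0x2]. The region names ("combined", "image", "texture", "sampler") and the function `handle_offset` are hypothetical labels chosen for this example; the base offsets, binding counts, and 8-byte entry size come from the table.

```python
HANDLE_SIZE = 8  # bytes per binding, as stated in the table

# kind -> (first byte offset in c[0x2], number of bindings)
C2_REGIONS = {
    "combined": (0x20, 32),    # combined texture/sampler bindings
    "image": (0x120, 8),       # image bindings
    "texture": (0x168, 128),   # texture-only bindings
    "sampler": (0x568, 32),    # sampler-only bindings
}


def handle_offset(kind, binding):
    """Byte offset in c[0x2] of the handle for an API binding."""
    base, count = C2_REGIONS[kind]
    if not 0 <= binding < count:
        raise ValueError(f"{kind} binding {binding} is out of range")
    return base + binding * HANDLE_SIZE


# Example: a combined sampler2D declared with layout(binding=4).
print(hex(handle_offset("combined", 4)))
```

For a combined texture/sampler at binding 4, this gives 0x20 + 4 * 8 = 0x40.
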
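
In the instruction

    TEXS.NODEP.P R2, R0, R4, R4, 0xe, 2D, RGBA; # [000010]

the "0xe" (14) indicates that the hardware will fetch the texture or image descriptor at an offset of 14*4 = 56 (0x38) bytes from the beginning of the bound resource uniform constant buffer. That offset lies in the combined texture/sampler range, which starts at 0x20 with 8 bytes per binding, so it refers to combined texture/sampler binding #3 ((0x38 - 0x20) / 8 = 3).
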
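
Going in the reverse direction, from the index in a texture instruction back to an API binding, can be sketched as follows. The function `decode_handle_index` and the region labels are hypothetical; the factor of 4 comes from the "14*4 = 56 bytes" calculation in the text, and the ranges come from the c[0x2] layout table.

```python
HANDLE_SIZE = 8  # bytes per binding in c[0x2]

# (label, first byte offset, number of bindings), from the c[0x2] table.
REGIONS = [
    ("combined texture/sampler", 0x20, 32),
    ("image", 0x120, 8),
    ("texture-only", 0x168, 128),
    ("sampler-only", 0x568, 32),
]


def decode_handle_index(index):
    """Map a texture instruction's handle index to (kind, binding).

    Returns None when the byte offset falls in a reserved range or
    outside the documented layout.
    """
    offset = index * 4  # the index counts 4-byte units
    for kind, base, count in REGIONS:
        if base <= offset < base + count * HANDLE_SIZE:
            relative = offset - base
            if relative % HANDLE_SIZE != 0:
                raise ValueError(f"offset {offset:#x} is not at an entry start")
            return kind, relative // HANDLE_SIZE
    return None


print(decode_handle_index(0xe))
```

For the index 0xe this yields the combined texture/sampler region and binding 3, in agreement with the text.
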

The instructions

    IPA R5, a[0x80], R4; # [0001b8] ATTR0
    IPA R6, a[0x84], R4; # [0001c8] GENERIC_ATTRIBUTE_00_Y
    IPA R7, a[0x88], R4; # [0001d0] GENERIC_ATTRIBUTE_00_Z
    IPA R8, a[0x8c], R4; # [0001d8] GENERIC_ATTRIBUTE_00_W

interpolate (IPA) the four components of generic vector attribute 0, which has byte offsets in the range [0x80, 0x8F]. In GLSL shaders, a layout qualifier like:

    layout(location=4) in vec4 value;

will associate _value_ with generic vector attribute 4, which has an associated offset of 0xC0 (0x80 + 4 * 0x10).
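
The attribute addressing described here reduces to a short formula, sketched below. The function name `generic_attribute_offset` is hypothetical. The 16-byte stride per location and 4 bytes per component are inferred from the two data points in the text (attribute 0 spans [0x80, 0x8F], attribute 4 starts at 0xC0) and from the a[0x80], a[0x84], a[0x88], a[0x8c] sequence; no upper limit on the location is assumed.

```python
GENERIC_ATTR_BASE = 0x80   # byte offset of generic vector attribute 0
BYTES_PER_ATTRIBUTE = 16   # inferred: one vec4 per location
BYTES_PER_COMPONENT = 4    # inferred: X, Y, Z, W are 4 bytes apart


def generic_attribute_offset(location, component=0):
    """Byte offset used in a[...] for a generic attribute component.

    component is 0 for X, 1 for Y, 2 for Z, 3 for W.
    """
    if location < 0:
        raise ValueError("location must be non-negative")
    if not 0 <= component <= 3:
        raise ValueError("component must be 0 (X) through 3 (W)")
    return (GENERIC_ATTR_BASE
            + location * BYTES_PER_ATTRIBUTE
            + component * BYTES_PER_COMPONENT)


# layout(location=4) in vec4 value;  ->  a[0xc0] for value.x
print(hex(generic_attribute_offset(4)))
# W component of attribute 0, as read by the last IPA above
print(hex(generic_attribute_offset(0, 3)))
```
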