~bpf.sol | 0x03: SBF Instruction Set

All Solana Bytecode Format (SBF) opcodes

#Sources

The eBPF family of ISAs lacks a formal specification document of the kind you may know from other architectures. Instead, each eBPF flavor is defined by its VM implementation. Solana's SBF is therefore currently defined by the solana_rbpf Rust crate.

All info below was compiled from these sources:

Note: This source is not authoritative. It may be outdated or conflict with actual on-chain behavior. If you find any such discrepancies, please consider sending corrections. And if this document helps you build software, please attribute this source. Thanks!

#Assembler

Several flavors of SBF disassembly exist. Unfortunately, they are not mutually compatible.

  • rbpf-style: prefix notation with C-style indirect addressing.
    Seen in rbpf-cli and tools based on the Rust implementation of Solana.
  • Capstone-style: similar to rbpf-style with different ALU mnemonics.
    Seen in binary analysis tools like radare2.
  • LLVM-style: infix notation resembling C pseudocode.
    Seen with LLVM tools used in the build process.

#Opcode Tables

#Legacy Load/Store class

The BPF_LD opcode class consists of special-purpose opcodes inherited from Linux eBPF, where they were used to access network packet data.

Opcode  rbpf     Capstone  LLVM                  Notes
0x18    lddw     lddw      r1 = 0x42 ll          Two insns long
0x30    ldabsb             r0 = *(u8 *)skb[3]    Deprecated
0x28    ldabsh             r0 = *(u16 *)skb[3]   Deprecated
0x20    ldabsw             r0 = *(u32 *)skb[3]   Deprecated
0x38    ldabsdw            r0 = *(u64 *)skb[3]   Deprecated
0x50    ldindb             r0 = *(u8 *)skb[r1]   Deprecated
0x48    ldindh             r0 = *(u16 *)skb[r1]  Deprecated
0x40    ldindw             r0 = *(u32 *)skb[r1]  Deprecated
0x58    ldinddw            r0 = *(u64 *)skb[r1]  Deprecated

ldabs* was used to access the instruction input data, which is mapped starting at 0x4_0000_0000; ldind* is the indirect variant, which adds a source register to the offset. These instructions are now disabled and have been replaced by the load/store class.
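To make the addressing concrete, here is a minimal Rust sketch of the virtual address these legacy instructions read from. It assumes, as we understand older solana_rbpf interpreters to have done, that the 32-bit immediate is zero-extended before being added to the input region start, and that ldind* adds the full 64-bit source register; the loaded value itself went to r0. The helper names are hypothetical.

```rust
/// Start of the input region in the SBF virtual address space, per the text.
const INPUT_START: u64 = 0x4_0000_0000;

/// Hypothetical helper: address read by a legacy ldabs* instruction.
/// Assumption: the 32-bit immediate is treated as an unsigned offset.
fn ldabs_addr(imm: i32) -> u64 {
    INPUT_START.wrapping_add(imm as u32 as u64)
}

/// Hypothetical helper: address read by a legacy ldind* instruction,
/// which adds the source register value to the (zero-extended) immediate.
fn ldind_addr(src_reg_value: u64, imm: i32) -> u64 {
    INPUT_START
        .wrapping_add(src_reg_value)
        .wrapping_add(imm as u32 as u64)
}
```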

Only lddw remains enabled on mainnet today. It moves a 64-bit immediate, split across two instruction slots, into a GPR. This makes it a 64-bit immediate move rather than an actual load instruction. (Note that mov64 with an immediate only carries a 32-bit immediate, which it sign-extends to 64 bits, while mov32 zero-extends; the name mov64 is misleading.)
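The following sketch shows the two-slot layout of lddw and contrasts it with the mov64/mov32 immediate forms. It assumes the standard 8-byte eBPF instruction layout (opcode byte; dst register in the low nibble and src in the high nibble of the second byte; little-endian 16-bit offset; little-endian 32-bit immediate), with the second lddw slot carrying opcode 0 and the upper 32 bits in its immediate. All function names are hypothetical.

```rust
/// Encode one 8-byte instruction slot (standard eBPF layout, assumed for SBF).
fn encode_slot(opcode: u8, dst: u8, src: u8, off: i16, imm: i32) -> [u8; 8] {
    let mut b = [0u8; 8];
    b[0] = opcode;
    b[1] = ((src & 0x0f) << 4) | (dst & 0x0f);
    b[2..4].copy_from_slice(&off.to_le_bytes());
    b[4..8].copy_from_slice(&imm.to_le_bytes());
    b
}

/// Encode `lddw dst, value`: the first slot (opcode 0x18) holds the low
/// 32 bits, the second slot (opcode 0) holds the high 32 bits.
fn encode_lddw(dst: u8, value: u64) -> [u8; 16] {
    let lo = encode_slot(0x18, dst, 0, 0, value as u32 as i32);
    let hi = encode_slot(0x00, 0, 0, 0, (value >> 32) as u32 as i32);
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&lo);
    out[8..].copy_from_slice(&hi);
    out
}

/// Reassemble the 64-bit immediate from both slots of an lddw.
fn decode_lddw_imm(insn: &[u8; 16]) -> u64 {
    let lo = u32::from_le_bytes([insn[4], insn[5], insn[6], insn[7]]) as u64;
    let hi = u32::from_le_bytes([insn[12], insn[13], insn[14], insn[15]]) as u64;
    (hi << 32) | lo
}

/// Register value after `mov64 dst, imm`: the immediate is sign-extended.
fn mov64_imm(imm: i32) -> u64 {
    imm as i64 as u64
}

/// Register value after `mov32 dst, imm`: the upper 32 bits are zeroed.
fn mov32_imm(imm: i32) -> u64 {
    imm as u32 as u64
}
```

Only lddw can produce a value such as 0x1122_3344_5566_7788 in one instruction; with mov64 the reachable constants are limited to sign-extended 32-bit values.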

#Load/Store class

The BPF_LDX, BPF_ST, and BPF_STX classes provide common memory operations.
Memory is referenced using a base address from a GPR plus a signed 16-bit offset encoded in the instruction.

Three operation groups are available:

  • ldx*: Load value from memory into register (8-bit, 16-bit, 32-bit, 64-bit)
  • st*: Store immediate value into memory (8-bit, 16-bit, 32-bit, 32-bit sign-extended to 64-bit)
  • stx*: Store value from register into memory (8-bit, 16-bit, 32-bit, 64-bit)

Opcode  rbpf   Capstone  LLVM                     Notes
0x71    ldxb   ldxb      r1 = *(u8 *)(r2 + 42)
0x69    ldxh   ldxh      r1 = *(u16 *)(r2 + 42)
0x61    ldxw   ldxw      r1 = *(u32 *)(r2 + 42)
0x79    ldxdw  ldxdw     r1 = *(u64 *)(r2 + 42)
0x72    stb    stb       *(u8 *)(r2 + 42) = 69
0x6a    sth    sth       *(u16 *)(r2 + 42) = 69
0x62    stw    stw       *(u32 *)(r2 + 42) = 69
0x7a    stdw   stdw      *(u64 *)(r2 + 42) = 69   Immediate is 32-bit, sign-extended
0x73    stxb   stxb      *(u8 *)(r2 + 42) = r1
0x6b    stxh   stxh      *(u16 *)(r2 + 42) = r1
0x63    stxw   stxw      *(u32 *)(r2 + 42) = r1
0x7b    stxdw  stxdw     *(u64 *)(r2 + 42) = r1
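A minimal sketch of how base-plus-offset addressing and the little-endian access widths fit together. The `Region` type is a hypothetical stand-in for the VM's memory mapping, which in reality also checks access permissions; the region start used in the usage example is arbitrary.

```rust
/// Effective address of an ldx/st/stx access: base register value plus the
/// sign-extended 16-bit offset, using wrapping 64-bit arithmetic.
fn effective_addr(base: u64, off: i16) -> u64 {
    base.wrapping_add(off as i64 as u64)
}

/// Hypothetical stand-in for VM memory: one byte region mapped at `start`.
struct Region {
    start: u64,
    bytes: Vec<u8>,
}

impl Region {
    /// Translate `len` bytes at virtual address `addr` to a buffer index,
    /// or None if the access falls outside the region.
    fn index(&self, addr: u64, len: usize) -> Option<usize> {
        let i = usize::try_from(addr.checked_sub(self.start)?).ok()?;
        if i.checked_add(len)? <= self.bytes.len() {
            Some(i)
        } else {
            None
        }
    }

    /// ldxh: little-endian 16-bit load, zero-extended into a 64-bit register.
    fn ldxh(&self, base: u64, off: i16) -> Option<u64> {
        let i = self.index(effective_addr(base, off), 2)?;
        Some(u16::from_le_bytes([self.bytes[i], self.bytes[i + 1]]) as u64)
    }

    /// stxw: store the low 32 bits of the source register, little-endian.
    fn stxw(&mut self, base: u64, off: i16, src: u64) -> Option<()> {
        let i = self.index(effective_addr(base, off), 4)?;
        self.bytes[i..i + 4].copy_from_slice(&(src as u32).to_le_bytes());
        Some(())
    }
}
```

For example, storing 0xAABB_CCDD_1122_3344 with stxw writes only 44 33 22 11, so two ldxh loads at offsets +0 and +2 read back 0x3344 and 0x1122.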

#64-bit ALU class

The 64-bit ALU instructions operate on general-purpose registers. The stack pointer (r10) can only be used as a source operand.

Each operation has two forms:

  • rD ← OP(rD, imm)
  • rD ← OP(rD, rS)

Opcode  rbpf     Capstone  LLVM           Notes
0x07    add64    add64     r1 += 0x42
0x0f    add64    add64     r1 += r2
0x17    sub64    sub64     r1 -= 0x42
0x1f    sub64    sub64     r1 -= r2
0x27    mul64    mul64     r1 *= 0x42
0x2f    mul64    mul64     r1 *= r2
0x37    div64    div64     r1 /= 0x42
0x3f    div64    div64     r1 /= r2
0x47    or64     or64      r1 |= 0x42
0x4f    or64     or64      r1 |= r2
0x57    and64    and64     r1 &= 0x42
0x5f    and64    and64     r1 &= r2
0x67    lsh64    lsh64     r1 <<= 0x42
0x6f    lsh64    lsh64     r1 <<= r2
0x77    rsh64    rsh64     r1 >>= 0x42
0x7f    rsh64    rsh64     r1 >>= r2
0x87    neg64    neg64     r1 = -r1
0x97    mod64    mod64     r1 %= 0x42
0x9f    mod64    mod64     r1 %= r2
0xa7    xor64    xor64     r1 ^= 0x42
0xaf    xor64    xor64     r1 ^= r2
0xb7    mov64    mov64     r1 = 0x42      32-bit imm, sign-extended
0xbf    mov64    mov64     r1 = r2
0xc7    arsh64   arsh64    r1 s>>= 0x42
0xcf    arsh64   arsh64    r1 s>>= r2
0xe7    sdiv64
0xef    sdiv64
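The table is regular because the opcode byte is a bit field: the low 3 bits select the class (0x07 for 64-bit ALU), bit 0x08 selects a register source instead of an immediate, and the high nibble selects the operation. The sketch below decodes that layout and implements the listed operations on plain u64 values. Treating division by zero as an error, masking shift amounts to 6 bits, and sign-extending immediates are assumptions about the VM rather than verified behavior; `alu64` is a hypothetical name.

```rust
/// Hypothetical interpreter step for the 64-bit ALU class.
/// Opcode layout: low 3 bits = class (0x07 is ALU64), bit 0x08 = register
/// source (clear = immediate), high 4 bits = operation.
fn alu64(opcode: u8, dst: u64, src: u64, imm: i32) -> Result<u64, &'static str> {
    if opcode & 0x07 != 0x07 {
        return Err("not an ALU64 opcode");
    }
    // Immediate forms sign-extend the 32-bit immediate to 64 bits.
    let operand = if opcode & 0x08 != 0 { src } else { imm as i64 as u64 };
    let result = match opcode & 0xf0 {
        0x00 => dst.wrapping_add(operand),
        0x10 => dst.wrapping_sub(operand),
        0x20 => dst.wrapping_mul(operand),
        0x30 => dst.checked_div(operand).ok_or("division by zero")?,
        0x40 => dst | operand,
        0x50 => dst & operand,
        // wrapping_shl/shr use only the low 6 bits of the shift amount (assumed VM behavior).
        0x60 => dst.wrapping_shl(operand as u32),
        0x70 => dst.wrapping_shr(operand as u32),
        0x80 => (dst as i64).wrapping_neg() as u64,
        0x90 => dst.checked_rem(operand).ok_or("division by zero")?,
        0xa0 => dst ^ operand,
        0xb0 => operand,
        // Arithmetic shift: shift the signed view so the sign bit is replicated.
        0xc0 => (dst as i64).wrapping_shr(operand as u32) as u64,
        // Signed division fails on a zero divisor and on i64::MIN / -1.
        0xe0 => (dst as i64)
            .checked_div(operand as i64)
            .ok_or("signed division error")? as u64,
        _ => return Err("unsupported operation"),
    };
    Ok(result)
}
```

The rsh64/arsh64 pair shows the difference most clearly: shifting 0x8000_0000_0000_0000 right by 63 yields 1 logically but all ones arithmetically.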

#32-bit ALU class

The 32-bit ALU instructions mostly follow their 64-bit counterparts. They operate on the lower word of the input registers. The upper half of destination registers gets implicitly zeroed.

Opcode  rbpf     Capstone  LLVM
0x04    add32    add       w1 += 0x42
0x0c    add32    add       w1 += w2
0x14    sub32    sub       w1 -= 0x42
0x1c    sub32    sub       w1 -= w2
0x24    mul32    mul       w1 *= 0x42
0x2c    mul32    mul       w1 *= w2
0x34    div32    div       w1 /= 0x42
0x3c    div32    div       w1 /= w2
0x44    or32     or        w1 |= 0x42
0x4c    or32     or        w1 |= w2
0x54    and32    and       w1 &= 0x42
0x5c    and32    and       w1 &= w2
0x64    lsh32    lsh       w1 <<= 0x42
0x6c    lsh32    lsh       w1 <<= w2
0x74    rsh32    rsh       w1 >>= 0x42
0x7c    rsh32    rsh       w1 >>= w2
0x84    neg32    neg       w1 = -w1
0x94    mod32    mod       w1 %= 0x42
0x9c    mod32    mod       w1 %= w2
0xa4    xor32    xor       w1 ^= 0x42
0xac    xor32    xor       w1 ^= w2
0xb4    mov32    mov       w1 = 0x42
0xbc    mov32    mov       w1 = w2
0xc4    arsh32   arsh      w1 s>>= 0x42
0xcc    arsh32   arsh      w1 s>>= w2
0xe4    sdiv32
0xec    sdiv32

#Endian ALU extension

Opcode  rbpf    Capstone  LLVM           Notes
0xd4    le{n}   le{n}     r1 = le{n} r1  Basically a mask
0xdc    be{n}   be{n}     r1 = be{n} r1  Swaps endianness

The LE/BE instructions operate in 16-bit, 32-bit, or 64-bit mode, selected by the immediate field (16, 32, or 64).
For example, opcode 0xdc with destination register r1 and immediate 32 is be32 r1.
In eBPF they serve as portable endianness conversions. Since Solana is always little-endian, only the BE instruction actually swaps bytes.

  • le16 rD is equivalent to rD &= 0xFFFF.
  • le32 rD is equivalent to rD &= 0xFFFF_FFFF.
  • le64 rD is a nop.
  • be16 rD swaps the lower 2 bytes and zeroes the upper 6.
  • be32 rD reverses the order of the lower 4 bytes and zeroes the upper 4.
  • be64 rD reverses the order of all 8 bytes.
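The list above maps directly onto Rust's integer methods. This is a sketch assuming a little-endian target, as the text states; `endian` is a hypothetical helper, and rejecting immediates other than 16, 32, and 64 is an assumption.

```rust
/// Hypothetical helper computing `le{n}` / `be{n}` on a little-endian target.
/// `width` is the instruction's immediate (16, 32 or 64).
fn endian(is_be: bool, width: i32, value: u64) -> Option<u64> {
    Some(match (is_be, width) {
        // LE on a little-endian machine: truncate (mask) to the width.
        (false, 16) => value as u16 as u64,
        (false, 32) => value as u32 as u64,
        (false, 64) => value,
        // BE: reverse the bytes of the low `width` bits; upper bits become zero.
        (true, 16) => (value as u16).swap_bytes() as u64,
        (true, 32) => (value as u32).swap_bytes() as u64,
        (true, 64) => value.swap_bytes(),
        // Any other immediate is treated as invalid here.
        _ => return None,
    })
}
```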

#Jump class

The jump instructions combine comparisons and conditional jumps. Compared to x86 or PowerPC, this simplifies the ISA: there is no separate flags or condition register, and fewer opcodes are needed.

The conditional jump instructions (all except ja) compare a register either against another register or an immediate value.

Opcode  rbpf   Capstone  LLVM
0x05    ja     jmp       goto +12
0x15    jeq    jeq       if r0 == 42 goto +12
0x1d    jeq    jeq       if r0 == r1 goto +12
0x25    jgt    jgt       if r0 > 42 goto +12
0x2d    jgt    jgt       if r0 > r1 goto +12
0x35    jge    jge       if r0 >= 42 goto +12
0x3d    jge    jge       if r0 >= r1 goto +12
0x45    jset   jset      if r0 & 42 goto +12
0x4d    jset   jset      if r0 & r1 goto +12
0x55    jne    jne       if r0 != 42 goto +12
0x5d    jne    jne       if r0 != r1 goto +12
0x65    jsgt   jsgt      if r0 s> 42 goto +12
0x6d    jsgt   jsgt      if r0 s> r1 goto +12
0x75    jsge   jsge      if r0 s>= 42 goto +12
0x7d    jsge   jsge      if r0 s>= r1 goto +12
0xa5    jlt    jlt       if r0 < 42 goto +12
0xad    jlt    jlt       if r0 < r1 goto +12
0xb5    jle    jle       if r0 <= 42 goto +12
0xbd    jle    jle       if r0 <= r1 goto +12
0xc5    jslt   jslt      if r0 s< 42 goto +12
0xcd    jslt   jslt      if r0 s< r1 goto +12
0xd5    jsle   jsle      if r0 s<= 42 goto +12
0xdd    jsle   jsle      if r0 s<= r1 goto +12
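A sketch of how a VM might evaluate these jumps. The opcode byte uses the same bit layout as the ALU classes (low 3 bits = class 0x05, high nibble = comparison, bit 0x08 = register operand). It assumes immediates are sign-extended before comparison and follows the standard eBPF convention that the 16-bit offset counts instruction slots relative to the instruction after the jump. Function names are hypothetical.

```rust
/// Hypothetical evaluation of a jump-class condition. Returns None for
/// opcodes that are not ja or a conditional jump (e.g. call and exit).
fn jump_taken(opcode: u8, dst: u64, src: u64, imm: i32) -> Option<bool> {
    if opcode & 0x07 != 0x05 {
        return None; // not in the jump class
    }
    // Bit 0x08 selects a register operand; immediates are sign-extended (assumed).
    let rhs = if opcode & 0x08 != 0 { src } else { imm as i64 as u64 };
    let (a, b) = (dst as i64, rhs as i64); // signed views for the js* variants
    Some(match opcode & 0xf0 {
        0x00 => true,             // ja
        0x10 => dst == rhs,       // jeq
        0x20 => dst > rhs,        // jgt (unsigned)
        0x30 => dst >= rhs,       // jge
        0x40 => (dst & rhs) != 0, // jset
        0x50 => dst != rhs,       // jne
        0x60 => a > b,            // jsgt (signed)
        0x70 => a >= b,           // jsge
        0xa0 => dst < rhs,        // jlt
        0xb0 => dst <= rhs,       // jle
        0xc0 => a < b,            // jslt
        0xd0 => a <= b,           // jsle
        _ => return None,         // call (0x85), callx (0x8d), exit (0x95)
    })
}

/// Branch target as an instruction index: the signed offset is relative to
/// the instruction after the jump. None if it would fall before index 0.
fn jump_target(pc: usize, off: i16) -> Option<usize> {
    usize::try_from(pc as isize + 1 + off as isize).ok()
}
```

Note how the same register value compares differently under jgt and jsgt: u64::MAX is greater than 1 unsigned, but as -1 it is not signed-greater.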

#Call class

The call-related opcodes push and pop entries on the call frame stack and update the stack pointer.
The call frame stack is a protected data structure that only the call class can access (comparable to x86 shadow stacks).

Opcode  rbpf   Capstone  LLVM         Notes
0x85    call   call      call 0x1234
0x8d    callx  callx     callx r3     Not part of kernel eBPF. Register idx in imm field
0x95    exit   exit      exit

  • call saves a call frame and enters a syscall or jumps to a target indicated by the immediate.
  • callx saves a call frame and jumps to the absolute address in the given register. The register index is stored in the immediate field of the instruction.
  • exit restores a call frame and jumps to its return address.
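A hypothetical model of the call frame stack, illustrating how call/callx and exit pair up. Which registers a frame saves (here r6 to r9 plus the stack pointer r10), the per-frame stack size, the depth limit, and the text base address used to convert callx targets into instruction indices are all assumptions for illustration, not values taken from solana_rbpf.

```rust
// Assumed values for illustration only (not taken from solana_rbpf).
const FRAME_SIZE: u64 = 4096; // hypothetical per-call stack frame size
const MAX_DEPTH: usize = 64; // hypothetical call depth limit
const TEXT_START: u64 = 0x1_0000_0000; // hypothetical address of instruction 0

/// One saved frame: registers r6..r9, the caller's r10, and the resume point.
struct CallFrame {
    saved: [u64; 4],
    stack_ptr: u64,
    return_pc: usize,
}

/// Minimal VM state: r0..r10, program counter (instruction index), and the
/// protected call frame stack, reachable only through call/callx/exit.
struct Vm {
    regs: [u64; 11],
    pc: usize,
    frames: Vec<CallFrame>,
}

impl Vm {
    /// call: save a frame, give the callee its own stack frame, then jump.
    fn call(&mut self, target_pc: usize) -> Result<(), &'static str> {
        if self.frames.len() >= MAX_DEPTH {
            return Err("call depth exceeded");
        }
        self.frames.push(CallFrame {
            saved: [self.regs[6], self.regs[7], self.regs[8], self.regs[9]],
            stack_ptr: self.regs[10],
            return_pc: self.pc + 1,
        });
        self.regs[10] = self.regs[10].wrapping_add(FRAME_SIZE);
        self.pc = target_pc;
        Ok(())
    }

    /// callx: the immediate holds a register index; that register holds the
    /// absolute target address, converted here to an instruction index.
    fn callx(&mut self, imm: i32) -> Result<(), &'static str> {
        let reg = usize::try_from(imm)
            .ok()
            .filter(|&r| r < 11)
            .ok_or("bad register index")?;
        let offset = self.regs[reg]
            .checked_sub(TEXT_START)
            .ok_or("target outside text")?;
        if offset % 8 != 0 {
            return Err("misaligned target");
        }
        self.call((offset / 8) as usize)
    }

    /// exit: restore the caller's frame and return to it. With no frame
    /// left, the program ends and r0 is its return value.
    fn exit(&mut self) -> Option<u64> {
        match self.frames.pop() {
            Some(frame) => {
                self.regs[6..10].copy_from_slice(&frame.saved);
                self.regs[10] = frame.stack_ptr;
                self.pc = frame.return_pc;
                None
            }
            None => Some(self.regs[0]),
        }
    }
}
```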

#Resolving call targets

The 32-bit immediate field of the call opcode contains the hash of the target symbol name or syscall name. The VM resolves a call target hash using two immutable lookup maps, one for syscalls and one for jump targets, which are constructed at program load time. Syscalls take precedence over jump targets. The VM aborts when a hash cannot be resolved.
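The lookup order can be sketched with two maps built once and then only read. The types, names, and hash values here are hypothetical; a real VM stores syscall handlers rather than names, and reports failure through its own error type.

```rust
use std::collections::HashMap;

/// Hypothetical result of resolving a call immediate.
#[derive(Debug, PartialEq)]
enum CallTarget {
    Syscall(&'static str),
    Function(usize),
}

/// Both maps are filled at load time and never modified afterwards.
struct Resolver {
    syscalls: HashMap<u32, &'static str>, // hash -> syscall (name stands in for a handler)
    functions: HashMap<u32, usize>,       // hash -> instruction index
}

impl Resolver {
    fn resolve(&self, imm: i32) -> Result<CallTarget, String> {
        let hash = imm as u32;
        // Syscalls take precedence over jump targets.
        if let Some(&name) = self.syscalls.get(&hash) {
            return Ok(CallTarget::Syscall(name));
        }
        if let Some(&pc) = self.functions.get(&hash) {
            return Ok(CallTarget::Function(pc));
        }
        // Unresolvable hash: the VM aborts.
        Err(format!("unresolved call target {:#010x}", hash))
    }
}
```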

The hash algorithm is Murmur3 in 32-bit mode on the UTF-8 encoding of the symbol name.
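For reference, a from-scratch Rust implementation of the standard MurmurHash3 x86 32-bit algorithm applied to a symbol name. The seed of 0 used by `hash_symbol` is an assumption, since the text does not state it; both function names are hypothetical.

```rust
/// MurmurHash3, x86 32-bit variant, over raw bytes.
fn murmur3_32(data: &[u8], seed: u32) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;
    let mut h = seed;

    // Body: 4-byte little-endian blocks.
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        let k = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        h ^= k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h = h.rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }

    // Tail: the remaining 0..=3 bytes, little-endian.
    let tail = chunks.remainder();
    if !tail.is_empty() {
        let mut k = 0u32;
        for (i, &b) in tail.iter().enumerate() {
            k |= (b as u32) << (8 * i);
        }
        h ^= k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
    }

    // Finalization: mix in the length, then avalanche.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Hash a symbol name's UTF-8 bytes (seed 0 is an assumption).
fn hash_symbol(name: &str) -> u32 {
    murmur3_32(name.as_bytes(), 0)
}
```

Under that assumption, `hash_symbol("sol_log_")` would give the immediate encoded in a call to the sol_log_ syscall.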

See the next post for the syscalls available in Sealevel: 0x04: Sealevel Syscalls.