++C Intermediate Lang (PPCIL 1.0)

PPCIL is a read-only binary format used in the ++C ecosystem to represent executable code. It has an assembly-like structure (an opcode followed by its operands), which makes translation to a native instruction set more robust. The instruction set nevertheless includes instructions that some native instruction sets lack. The idea behind that decision is to use native optimizations where possible (for example, using the square root instruction of x86_64 instead of one common algorithm for every instruction set). One important thing to note is that PPCIL is not just an intermediate language for ++C: a compiler could be written that converts any language to PPCIL, for example Java, C#, C++ or C.

Format

The format of a PPCIL blob is the following:

struct ppcil_t {
    version_t version;
    uint64_t consts_size;
    uint8_t consts[consts_size];
    uint64_t instructions_n;
    uint8_t instructions[instructions_n];
}

Version section

The first thing in a PPCIL blob is the version, which occupies the first 64 bits. Its format is identical to the one used in the rest of the ++C ecosystem, and it specifies the PPCIL format version that the blob uses. An interpreter with an incompatible version may still try to interpret the blob, but should issue a warning if it does so.

Constants section

This section comes immediately after the version section and holds the constants used by the code. It starts with an unsigned QWORD (64-bit integer) giving the number of bytes of constant data that follow it.

Code section

This section comes immediately after the constants section and holds the actual code. It starts with an unsigned QWORD (64-bit integer) giving the count of instructions that follow it. The code itself is an array of instructions (unsigned 8-bit integers).
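The three sections above can be read back with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `parse_ppcil` is hypothetical, little-endian byte order is an assumption (the format description does not state one), the version is kept as an opaque 64-bit integer, and `instructions_n` is treated as a byte length, as the struct declaration suggests.

```python
import struct


def parse_ppcil(blob: bytes) -> dict:
    """Split a PPCIL blob into version, constants and code.

    Assumes little-endian integers; the spec does not name a byte order.
    """
    offset = 0
    # Version: the first 64 bits, kept here as an opaque integer.
    (version,) = struct.unpack_from("<Q", blob, offset)
    offset += 8
    # Constants: a 64-bit byte count followed by that many bytes.
    (consts_size,) = struct.unpack_from("<Q", blob, offset)
    offset += 8
    consts = blob[offset:offset + consts_size]
    if len(consts) != consts_size:
        raise ValueError("truncated constants section")
    offset += consts_size
    # Code: a 64-bit length (assumed to be in bytes) and the instruction bytes.
    (instructions_n,) = struct.unpack_from("<Q", blob, offset)
    offset += 8
    code = blob[offset:offset + instructions_n]
    if len(code) != instructions_n:
        raise ValueError("truncated code section")
    return {"version": version, "consts": consts, "code": code}


# Usage: version 1, three constant bytes, then NOP (0x00) and EOF (0xFF).
example = struct.pack("<QQ", 1, 3) + b"abc" + struct.pack("<Q", 2) + bytes([0x00, 0xFF])
parsed = parse_ppcil(example)
```

Checking each section length before moving on keeps a truncated blob from being silently misread as a shorter one.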

The stack and the heap

The memory layout of a PPCIL interpreter is the usual stack/heap layout, with the only difference being that the stack is part of the heap, so addresses of stack data can be obtained just like addresses in the heap. If you work with the stack through pointers, all stack guarantees are overridden.

The stack contains the variables and any temporary calculation data. The heap contains all the constants, static variables and dynamically allocated memory.

Variables and parameters

Variables live on the stack and hold temporary data during the execution of the program. Parameters are the first n variables (where n is the number of parameters that the function accepts); the interpreter creates them automatically and initializes them with the values passed by the caller of the code. In some cases, the first variable (parameter) is the this context.

Instruction set

The instructions in PPCIL are stack-based. This increases the parsing speed of the intermediate language and brings ++C and PPCIL closer together. An instruction consists of an opcode, which is an unsigned 8-bit integer, followed by any operands it takes. The following tables list all the instructions in PPCIL.

NOTE: operands are written as size(name), where size is in bytes, so a 64-bit operand named count is written as 8(count)

Commons

| Name | Opcode | Operands | Description |
| --- | --- | --- | --- |
| NOP | 0x00 | | Does nothing |
| PAD | 0x01 | | Pops two variables, a and b. If b < sizeof a, the last sizeof a - b bytes are trimmed from a; otherwise, b - sizeof a bytes are added at the end of a. The result is pushed |
| PADC | 0x02 | 8(n) | Executes PAD with b = n |
| EOF | 0xFF | | Ends execution |
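The resizing rule of PAD is easy to get backwards, so here is a minimal sketch of it on raw bytes. The function name `pad` is hypothetical, and filling the added bytes with zeros is an assumption: the table only says that bytes are added at the end of a.

```python
def pad(a: bytes, b: int) -> bytes:
    """Resize a to b bytes, as PAD does with its two popped variables."""
    if b < len(a):
        # Trim the last len(a) - b bytes.
        return a[:b]
    # Append b - len(a) bytes (assumed here to be zero bytes).
    return a + bytes(b - len(a))


trimmed = pad(b"\x01\x02\x03\x04", 2)
grown = pad(b"\x01", 4)
```

If values are stored little-endian (also an assumption), appending zero bytes at the end is the same as zero-extending an unsigned integer.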

Arithmetics (operandless)

| Name | Opcode | Description |
| --- | --- | --- |
| ADD | 0x10 | Pops two variables, adds them and pushes the result |
| SUB | 0x11 | Pops two variables, subtracts them and pushes the result |
| NEG | 0x12 | Flips the sign of the top variable in the stack |
| MUL | 0x13 | Pops two variables, multiplies them and pushes the result |
| IMUL | 0x14 | Pops two variables, multiplies them (signed) and pushes the result |
| DIV | 0x15 | Pops two variables, divides them and pushes the remainder, then the result |
| IDIV | 0x16 | Pops two variables, divides them (signed) and pushes the remainder, then the result |
| AND | 0x17 | Pops two variables, ANDs (a & b) them and pushes the result |
| OR | 0x18 | Pops two variables, ORs (a \| b) them and pushes the result |
| XOR | 0x19 | Pops two variables, XORs (a ^ b) them and pushes the result |
| NOT | 0x1A | Flips the bits of the top variable |
| SL | 0x1B | Pops two variables: a (first popped), b and performs the operation a << b. Pushes the result back on the stack |
| SR | 0x1C | Pops two variables: a (first popped), b and performs the operation a >> b. Pushes the result back on the stack |
| INC | 0x1D | Adds one to the top variable |
| DEC | 0x1E | Subtracts one from the top variable |

NOTE: All operations pick the size of the bigger variable, and do not expand it if an overflow occurs.
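The size rule in the note can be illustrated with ADD. This is a minimal sketch under stated assumptions: the function name `add` is hypothetical, operands are treated as unsigned little-endian integers, and "do not expand" is read as wrapping around on overflow, which the note does not spell out.

```python
def add(a: bytes, b: bytes) -> bytes:
    """Add two variables; the result has the size of the bigger operand."""
    size = max(len(a), len(b))
    total = int.from_bytes(a, "little") + int.from_bytes(b, "little")
    # Keep only `size` bytes: an overflow wraps instead of growing the result.
    return (total % (1 << (8 * size))).to_bytes(size, "little")


wrapped = add(b"\xff", b"\x01")    # both 1 byte wide: 255 + 1 wraps to 0
widened = add(b"\xff\x00", b"\x01")  # 2 bytes wide: 255 + 1 = 256 fits
```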

Float arithmetics (operandless)

| Name | Opcode | Description |
| --- | --- | --- |
| ADDF | 0x20 | Pops two variables, adds them as floats, and pushes the result |
| SUBF | 0x21 | Pops two variables, subtracts them as floats, and pushes the result |
| MULF | 0x23 | Pops two variables, multiplies them as floats, and pushes the result |
| DIVF | 0x24 | Pops two variables, divides them as floats, and pushes the result |
| SQRT | 0x25 | Pops a variable and pushes its square root |
| ITOF | 0x26 | Converts the top variable from a signed int to a 64-bit float |
| FTOI | 0x27 | Converts the top variable from a 64-bit float to a signed int |
| F32 | 0x28 | Converts the top variable from a 64-bit float to a 32-bit float |
| F64 | 0x29 | Converts the top variable from a 32-bit float to a 64-bit float (first PAD 4 is executed) |

NOTE: All instructions with floats first execute PAD 8, unless specified otherwise
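The conversions between widths and types can be sketched on raw bytes as follows. The function names are hypothetical, and both the IEEE 754 encoding and the little-endian byte order are assumptions, since the tables above do not name a float format. Each function expects its input to already have the size that the corresponding PAD would produce.

```python
import struct


def f32(top: bytes) -> bytes:
    """F32: narrow an 8-byte float to a 4-byte float."""
    (value,) = struct.unpack("<d", top)
    return struct.pack("<f", value)


def f64(top: bytes) -> bytes:
    """F64: widen a 4-byte float to an 8-byte float."""
    (value,) = struct.unpack("<f", top)
    return struct.pack("<d", value)


def itof(top: bytes) -> bytes:
    """ITOF: reinterpret 8 bytes as a signed int and convert to a float."""
    value = int.from_bytes(top, "little", signed=True)
    return struct.pack("<d", float(value))


narrow = f32(struct.pack("<d", 1.5))
round_trip = f64(narrow)
minus_two = itof((-2).to_bytes(8, "little", signed=True))
```

The value 1.5 is exactly representable at both widths, so the round trip is lossless here; in general F32 rounds to the nearest 32-bit float.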

Stack operations

| Name | Opcode | Operands | Description |
| --- | --- | --- | --- |
| PUSH | 0x30 | 8(n) | Allocates n bytes on the stack |
| PUSHC | 0x31 | 8(size), size(val) | Pushes val to the stack |
| PUSHA | 0x32 | 8(size), size(val) | Pushes val to the stack |
| LOAD | 0x33 | | Pops a variable = a and pushes *a |
| LOADC | 0x34 | 8(addr) | Executes LOAD with a = addr |
| PUSHW | 0x35 | 2(val) | Executes PUSHC 2 val |
| PUSHD | 0x36 | 4(val) | Executes PUSHC 4 val |
| PUSHQ | 0x37 | 8(val) | Executes PUSHC 8 val |
| VADDR | 0x38 | 8(n) | Pushes the address of the variable with index n |
| CADDR | 0x39 | 8(n) | Calculates n + consts, where consts is the start of the constants blob and pushes that address |
| DUP | 0x3A | | Duplicates the top variable in the stack |
| POP | 0x3B | | Pops the last allocated batch of memory from the stack |
| MOVA | 0x3C | | Pops two variables: a, b and executes *b = a |
| MOVAC | 0x3D | 8(addr) | Executes MOVA with b = addr |
| MOVV | 0x3E | 8(var) | Executes MOVA with b = address of var |
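To show how the operand notation maps to bytes, here is a minimal sketch that encodes two of the push instructions. The opcodes come from the table; the helper names `encode_pushc` and `encode_pushw` are hypothetical, and little-endian operand encoding is an assumption.

```python
import struct

PUSHC = 0x31
PUSHW = 0x35


def encode_pushc(val: bytes) -> bytes:
    """PUSHC 8(size), size(val): opcode, 8-byte size, then the value bytes."""
    return bytes([PUSHC]) + struct.pack("<Q", len(val)) + val


def encode_pushw(value: int) -> bytes:
    """PUSHW 2(val): opcode followed by a 2-byte value."""
    return bytes([PUSHW]) + struct.pack("<H", value)


long_form = encode_pushc(struct.pack("<H", 42))
short_form = encode_pushw(42)
```

Both encodings push the same 2-byte value, but the PUSHW form takes 3 bytes where the PUSHC form takes 11, since it has no 8-byte size operand.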

Function operations

| Name | Opcode | Operands | Description |
| --- | --- | --- | --- |
| CALL | 0x40 | 8(ptr) | Considering ptr points to the body of a function, calls the function |
| CALLP | 0x41 | | Pops a variable = a and executes CALL a |
| CALLC | 0x43 | | Pops a variable = a and calls a native function in the cdecl convention |
| RET | 0x44 | | Pops a variable = a, executes the ++C function end procedure and pushes a to the stack |

Jumps (operandless)

| Name | Opcode | Description |
| --- | --- | --- |
| JMP | 0x50 | Pops a variable: n, and offsets the code execution by n bytes (relative to the end of the instruction) |
| JEQ | 0x51 | Pops two variables, and if they're equal, executes JMP |
| JNE | 0x52 | Pops two variables, and if they're not equal, executes JMP |
| JL | 0x53 | Pops two variables, and if the first one is less than the second, executes JMP |
| JG | 0x54 | Pops two variables, and if the first one is greater than the second, executes JMP |
| JLE | 0x55 | Pops two variables, and if the first one is less than or equal to the second, executes JMP |
| JGE | 0x56 | Pops two variables, and if the first one is greater than or equal to the second, executes JMP |
| JLU | 0x57 | Pops two variables, and if the first one is less than the second (unsigned), executes JMP |
| JGU | 0x58 | Pops two variables, and if the first one is greater than the second (unsigned), executes JMP |
| JLEU | 0x59 | Pops two variables, and if the first one is less than or equal to the second (unsigned), executes JMP |
| JGEU | 0x5A | Pops two variables, and if the first one is greater than or equal to the second (unsigned), executes JMP |
| JZ | 0x5B | Pops a variable, and if it's zero, executes JMP |
| JNZ | 0x5C | Pops a variable, and if it's not zero, executes JMP |

NOTE: All comparison jumps first pad the smaller variable to the size of the bigger one, then compare them as integers.

NOTE2: All jumps must be within the boundaries of the PPCIL blob.

NOTE3: All jumps are performed after the address has been padded to the native pointer size
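Because jump offsets are relative to the end of the jump instruction, the target computation is worth spelling out. This is a minimal sketch with several assumptions: the function name `jump_target` is hypothetical, the popped offset is treated as a signed little-endian integer that is already pointer-sized (so that backward jumps are possible), and the bounds check is applied to the code section rather than to the whole blob.

```python
def jump_target(pc: int, offset: bytes, code_len: int) -> int:
    """Return the code index at which execution continues after a taken jump."""
    instr_end = pc + 1  # jumps are operandless, so the instruction is 1 byte
    delta = int.from_bytes(offset, "little", signed=True)  # assumed signed
    target = instr_end + delta
    if not 0 <= target < code_len:
        raise ValueError("jump target outside the code section")
    return target


# A JMP at index 4 in a 16-byte code section.
forward = jump_target(4, (3).to_bytes(8, "little", signed=True), 16)
backward = jump_target(4, (-5).to_bytes(8, "little", signed=True), 16)
```

An offset of zero therefore continues with the instruction right after the jump, and an offset of -1 jumps back to the jump itself.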