SimpleScalar DEF File Format Overview
By: Todd Austin, December 2003, austin@simplescalar.com


INTRODUCTION
------------

This document details the SimpleScalar DEF file format. DEF files are
stylized C macro files that describe how to decode and execute
instructions for a particular SimpleScalar target. By modifying this
file, it is possible to add new instructions or to change or enhance the
semantics of existing instructions. All SimpleScalar simulators rely on
DEF files for information about decoding and executing instructions; as
a result, changes you make in a DEF file are immediately understood by
all SimpleScalar simulators.


DESCRIBING THE DECODE TREE
--------------------------

Undoubtedly the most subtle portion of the DEF file specification is the
decode tree. The decode tree specifies to the instruction interpreter
how an instruction word should be inspected to determine what the
instruction does. The process of decode is implemented by the interface:

    MD_SET_OPCODE(enum md_opcode OP, md_inst_t INST)

where INST is the instruction word to decode, and OP is the variable
that will be set to the fully decoded instruction opcode enumerator. The
OP variable must be of type "enum md_opcode", which is defined in
machine.h. If the instruction is invalid, OP is set to the special
instruction enumerator OP_NA. This interface is implemented with a macro
in machine.h.

The decode tree is described by the opcode information contained in the
following DEF file interfaces: MD_TOP_OP, DEFLINK, CONNECT, and DEFINST.
The mechanics of how these interfaces translate into a decoder are
somewhat complicated and, fortunately, need not be understood to build a
new decoder. To study the decoder generator, see the function
md_init_decoder() in the file machine.c. This function builds a decode
tree, stored in the arrays md_mask2op[], md_opshift[], md_opmask[], and
md_opoffset[].
The implementation of the interface MD_SET_OPCODE walks the decode tree
arrays to arrive at the enumerator assigned to the instruction.

MD_TOP_OP defines the top-level opcode field of the instruction, used by
the MD_SET_OPCODE implementation to "get started" with decode. Every
instruction set has a top-level opcode field that the decoder inspects
first. It is defined as follows in machine.h (ARM target shown):

    #define MD_TOP_OP(INST)   (((INST) >> 24) & 0x0f)

This interface returns an opcode value that "connects" to the DEFINST
and DEFLINK definitions at the top of the DEF file.

DEFINST defines an instruction enumerator, its opcode information, and
its execution semantics. For decoding, the enumerator and opcode fields
are most important. It is defined as follows in machine.def (ARM target
example shown):

    DEFINST(BR, 0x0a,
            "b%c", "%j",
            IntALU, F_CTRL|F_DIRJMP,
            DNA, DNA, DNA,
            DCOND, DNA, DNA, DNA,
            DNA, DNA)

The enumerator, defined in the first field, is returned by MD_SET_OPCODE
when this instruction is decoded. The opcode information, defined in the
second field, causes the MD_SET_OPCODE interface to return BR when
MD_TOP_OP() returns the value 0x0a. As such, when the top-level opcode
of an ARM instruction is 0x0a, it is a branch (BR) instruction.

DEFINST definitions can be in any order, with the following caveats. If
there are two definitions for a particular opcode, the later definition
in the file is the one recognized. In addition, it is possible to define
a DEFINST that matches a range of opcodes. In the following definition:

    DEFINST(TST, 0xff00,
            "tst%c", "%n,%m",
            IntALU, F_ICOMP,
            DNA, DNA, DNA,
            DCOND, DNA, DNA, DNA,
            DNA, DNA)

the instruction TST will match any opcode from 0x00..0xff. If it is
followed by a duplicate definition of one (or more) opcodes, for example:

    DEFINST(SWP, 0x09,
            "swp%c", "%d,%w,%s,%n",
            IntMULT, F_ICOMP,
            DNA, DNA, DNA,
            DCOND, DNA, DNA, DNA,
            DNA, DNA)

the SWP instruction definition will override the ranged definition of
0x09 by TST.
Consequently, opcodes 0x00..0x08 and 0x0a..0xff will match TST, and
opcode 0x09 will match SWP. Similar ranged opcode semantics are
supported by the DEFLINK interface (detailed below).

Some opcodes do not indicate a particular instruction, but rather
indicate that another field in the instruction must be examined to
determine precisely which instruction it is. For these opcodes, define a
DEFLINK decode tree node that CONNECTs to another decode tree definition
further down in the DEF file. For example, in ARM a top-level opcode of
0x08 indicates that the instruction is one of many post-update
load/store multiple instructions. Consequently, this opcode is defined
to be a DEFLINK node:

    DEFLINK(BLKPOST_LINK, 0x08, "blkpost_link", 20, 0x0f)

The first field is the enumerator of the instruction class, and the
second field is the opcode of the link node. The third field is a
descriptive string used for debugging decode tables. The final two
fields indicate where the next opcode field in the instruction is
located. The fourth field is the number of bits to shift the instruction
right, after which it is AND'ed with the value in the fifth field to
produce a new opcode for decode.

All DEFLINK decode nodes must have an accompanying CONNECT definition
later in the DEF file. The later CONNECT definition for this example is:

    CONNECT(BLKPOST_LINK)

The DEFINST and DEFLINK definitions between this CONNECT and the next
define the opcodes that match the new opcode produced by the earlier
DEFLINK definition. For example, if bits 20-23 of the instruction are
0x01, it will match this definition that follows the CONNECT node:

    DEFINST(LDM_L, 0x01,
            "ldm%c%a", "%n,%R",
            RdPort, F_MEM|F_LOAD|F_DISP|F_UCODE,
            DNA, DNA, DNA,
            DNA, DNA, DNA, DNA,
            DNA, DNA)

There is no restriction on the depth of the decode tree. The top-level
opcode can match a DEFLINK, which can define an opcode that matches
another DEFLINK, and so on.
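
The decode walk described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: it does not use
the md_mask2op[], md_opshift[], md_opmask[], and md_opoffset[] arrays,
every toy_ name is hypothetical, and placing TST, SWP, BR, and
BLKPOST_LINK together in one 16-entry top-level table is a toy
arrangement rather than the real ARM decode tree. It shows a ranged
definition being overridden by a later one, and a link node selecting a
second opcode field with a shift and a mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enumerators for the sketch; OP_NA (invalid) must be 0 so
   that zero-initialized table slots decode as "no instruction". */
enum toy_opcode { OP_NA = 0, TST, SWP, BR, BLKPOST_LINK, LDM_L };

#define TOY_TABLE_SIZE 16  /* every opcode field in this sketch is 4 bits */

/* One decode-tree slot: a leaf (DEFINST) or a link (DEFLINK). */
struct toy_node {
    enum toy_opcode op;            /* enumerator for this slot */
    int is_link;                   /* nonzero for a DEFLINK slot */
    unsigned shift, mask;          /* DEFLINK fields four and five */
    const struct toy_node *next;   /* table introduced by the CONNECT */
};

/* Table that would follow CONNECT(BLKPOST_LINK); unlisted slots are OP_NA. */
static const struct toy_node toy_blkpost_table[TOY_TABLE_SIZE] = {
    [0x01] = { LDM_L, 0, 0, 0, NULL },
};

static struct toy_node toy_top_table[TOY_TABLE_SIZE];

/* Apply definitions in file order, so a later one overrides an earlier one. */
static void toy_init_decoder(void)
{
    const struct toy_node tst  = { TST, 0, 0, 0, NULL };
    const struct toy_node swp  = { SWP, 0, 0, 0, NULL };
    const struct toy_node br   = { BR,  0, 0, 0, NULL };
    const struct toy_node link = { BLKPOST_LINK, 1, 20, 0x0f,
                                   toy_blkpost_table };
    unsigned i;

    for (i = 0; i < TOY_TABLE_SIZE; i++)
        toy_top_table[i] = tst;        /* ranged definition fills every slot */
    toy_top_table[0x09] = swp;         /* later single-opcode definition wins */
    toy_top_table[0x0a] = br;
    toy_top_table[0x08] = link;
}

/* Walk the tree: start at the top-level field, follow links until a leaf. */
static enum toy_opcode toy_decode(uint32_t inst)
{
    const struct toy_node *table = toy_top_table;
    unsigned field = (inst >> 24) & 0x0f;  /* same expression as MD_TOP_OP */

    for (;;) {
        const struct toy_node *node;

        assert(field < TOY_TABLE_SIZE);
        node = &table[field];
        if (!node->is_link)
            return node->op;           /* leaf, or OP_NA for an empty slot */
        field = (inst >> node->shift) & node->mask;
        table = node->next;
    }
}
```

After calling toy_init_decoder(), toy_decode(0x08100000) follows the
link at top-level opcode 0x08, extracts bits 20-23 (0x01), and yields
LDM_L, while toy_decode(0x08200000) finds an empty slot and yields
OP_NA.
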
Regardless of depth, decode eventually either reaches a DEFINST and
matches an actual instruction, or reaches a point where no instruction
is defined for a particular opcode, at which point MD_SET_OPCODE returns
the invalid instruction enumerator OP_NA.


DESCRIBING INSTRUCTION SEMANTICS
--------------------------------

The remainder of the DEFINST instruction definition describes
instruction disassembly, instruction semantics, and various other
instruction attributes. For example, the ARM instruction MUL has the
following instruction definition:

    #define MUL_IMPL \
      { \
        if (COND_VALID(PSR)) \
          { \
            if ((RM) == (RN)) \
              { \
                SET_GPR(RN, 0); \
                DECLARE_FAULT(md_fault_invalid); \
              } \
            if ((RN) == MD_REG_PC) \
              { \
                DECLARE_FAULT(md_fault_invalid); \
              } \
            \
            SET_GPR(RN, GPR(RM) * GPR(RS)); \
          } \
      }
    DEFINST(MUL, 0x00,
            "mul%c", "%n,%w,%s",
            IntMULT, F_ICOMP,
            DGPR(RN), DNA, DNA,
            DCOND, DGPR(RM), DGPR(RS), DNA,
            DNA, DNA)

The first two fields of the DEFINST node are for instruction decoding,
as described above.

The third and fourth fields are used by the disassembler to print out
the instruction. See the function md_print_insn() in machine.c for
details on instruction disassembly.

The fifth field (IntMULT) indicates the instruction resource
requirement; in this case the instruction requires an integer multiplier
resource. See resource.h for information on SimpleScalar resource
definitions.

The sixth field (F_ICOMP) defines the instruction "flags". Instruction
flags are returned by the interface:

    MD_OP_FLAGS(OP)

which returns the flags defined for instruction enumerator OP. The
instruction flags are defined as constants in machine.h. MUL has a
single flag, F_ICOMP, indicating that the instruction is an integer
computation.
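
As a rough sketch of how a simulator might test these flags, the
fragment below ORs flag bits into a per-opcode table and checks them
with a mask. The bit values, the DEMO_ enumerators, and the demo_
functions are all hypothetical; the real constants are in machine.h and
the real query interface is MD_OP_FLAGS(OP). The flag combinations are
the ones given for MUL, BR, and LDM_L in the definitions above.

```c
#include <assert.h>

/* Hypothetical bit values; the real constants are defined in machine.h. */
#define F_ICOMP   0x0001u
#define F_CTRL    0x0002u
#define F_DIRJMP  0x0004u
#define F_MEM     0x0008u
#define F_LOAD    0x0010u
#define F_DISP    0x0020u
#define F_UCODE   0x0040u

enum flag_demo_op { DEMO_NA, DEMO_MUL, DEMO_BR, DEMO_LDM_L, DEMO_OP_MAX };

/* Sixth DEFINST field of each instruction, indexed by enumerator. */
static const unsigned flag_demo_table[DEMO_OP_MAX] = {
    [DEMO_MUL]   = F_ICOMP,
    [DEMO_BR]    = F_CTRL | F_DIRJMP,
    [DEMO_LDM_L] = F_MEM | F_LOAD | F_DISP | F_UCODE,
};

/* Stand-in for MD_OP_FLAGS(OP). */
static unsigned demo_op_flags(enum flag_demo_op op)
{
    assert((unsigned)op < DEMO_OP_MAX);
    return flag_demo_table[op];
}

/* True only when every requested flag bit is set. */
static int demo_is_load(enum flag_demo_op op)
{
    const unsigned want = F_MEM | F_LOAD;
    return (demo_op_flags(op) & want) == want;
}

/* True when the control-transfer bit is set. */
static int demo_is_control(enum flag_demo_op op)
{
    return (demo_op_flags(op) & F_CTRL) != 0;
}
```

Testing with a mask rather than with equality lets a model ask "is this
a load?" without caring which other attributes the instruction carries.
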
Multiple flags can be defined at the same time. For example, the ARM
instruction STRH_PO has the following flags:

    F_MEM|F_STORE|F_DISP|F_UCODE

indicating that the instruction is a memory operation, is a store, uses
an immediate value, and produces a micro-op stream on complex
microarchitectures. Additional flags can be defined by modifying the
file machine.h.

The seventh through thirteenth fields (DGPR(RN), DNA, DNA, DCOND,
DGPR(RM), DGPR(RS), DNA) indicate the three output register dependencies
and the four input register dependencies for the instruction. These
dependencies describe which registers are read and written by the
instruction. They are used by schedulers, value predictors, address
predictors, and other microarchitectural models to access instruction
operands. For an example use of these fields, please see the instruction
dependency decoder in sim-outorder's scheduler, in the function
ruu_dispatch() in sim-outorder.c.

The final two fields in the DEFINST node are used by the x86 DEF file;
they will be documented in the future.

The final portion of the instruction definition is the instruction
semantics definition. SimpleScalar instruction semantics are specified
with a C code snippet. For the MUL instruction, the #define constant
MUL_IMPL holds the C code snippet that implements the MUL instruction.
The C code snippet used to implement the instruction can use any valid C
constructs, including function calls, loops, local variable definitions,
and additional macros.

One important aspect of instruction implementation code snippets is that
they cannot access target architecture registers, memory, or I/O devices
directly. Instead, all accesses to state and I/O must go through
predefined macro interfaces. By restricting this access to predefined
macro interfaces, it is possible to instantiate instruction interpreter
implementations with varied state access semantics.
For example, a simple functional simulator could be instantiated with
simple array accesses for integer register file accesses, while a more
complicated out-of-order simulator model could instantiate an
interpreter implementation where register accesses are made through a
register renaming mechanism.

The register accessor macros (e.g., GPR(RD)) are defined in machine.h.
The memory accessor macros (e.g., MEM_READ()) are defined in memory.h.
The I/O access macros are defined in syscall.h.

One last note: instruction implementation expressions are not pretty,
but they are portable to a wide variety of compilers, so we must
continue to use this approach.


INSTANTIATING INSTRUCTION INTERPRETERS
--------------------------------------

An instruction interpreter consists of three parts. An example of a
simple functional instruction interpreter is shown below for the ARM
instruction set.

The first step in executing an instruction is decoding it:

    /* decode the instruction */
    MD_SET_OPCODE(op, inst);

The instruction is contained in the md_inst_t variable inst. The opcode
enumerator of the instruction is stored into the enum md_opcode variable
op.

Next, the instruction must be interpreted, that is, executed such that
it makes the necessary updates to the machine state and I/O interfaces.
To accomplish this, an interpreter implementation must be instantiated.
To instantiate an interpreter implementation, the register, memory, and
I/O accessor macros must be defined. In the following code sequence, the
accessors are defined for an ARM instruction set:

    /* general purpose registers */
    #define GPR(N)                  (regs.regs_R[N])
    #define SET_GPR(N,EXPR) \
      ((void)(((N) == 15) ? setPC++ : 0), regs.regs_R[N] = (EXPR))

    /* processor status register */
    #define PSR                     (regs.regs_C.cpsr)
    #define SET_PSR(EXPR)           (regs.regs_C.cpsr = (EXPR))

    /* precise architected memory state accessor macros */
    #define READ_BYTE(SRC, FAULT) \
      ((FAULT) = md_fault_none, addr = (SRC), MEM_READ_BYTE(mem, addr))
    #define READ_HALF(SRC, FAULT) \
      ((FAULT) = md_fault_none, addr = (SRC), MEM_READ_HALF(mem, addr))
    #define READ_WORD(SRC, FAULT) \
      ((FAULT) = md_fault_none, addr = (SRC), MEM_READ_WORD(mem, addr))
    #define READ_QWORD(SRC, FAULT) \
      ((FAULT) = md_fault_none, addr = (SRC), MEM_READ_QWORD(mem, addr))

    #define WRITE_BYTE(SRC, DST, FAULT) \
      ((FAULT) = md_fault_none, addr = (DST), MEM_WRITE_BYTE(mem, addr, (SRC)))
    #define WRITE_HALF(SRC, DST, FAULT) \
      ((FAULT) = md_fault_none, addr = (DST), MEM_WRITE_HALF(mem, addr, (SRC)))
    #define WRITE_WORD(SRC, DST, FAULT) \
      ((FAULT) = md_fault_none, addr = (DST), MEM_WRITE_WORD(mem, addr, (SRC)))
    #define WRITE_QWORD(SRC, DST, FAULT) \
      ((FAULT) = md_fault_none, addr = (DST), MEM_WRITE_QWORD(mem, addr, (SRC)))

    /* system call handler macro */
    #define SYSCALL(INST)   sys_syscall(&regs, mem_access, mem, INST, TRUE)

Next, the instruction interpreter can be instantiated. This is
accomplished by #include'ing the machine.def file into the simulator
code at the point where an instruction interpreter implementation is
required. Given the accessor macros defined above, the resulting C code
will implement an interpreter that accesses registers, memory, and I/O
as prescribed for the simulator that is being built.
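
The inclusion technique just described can be tried out in miniature
without a separate file. The sketch below is a toy under stated
assumptions, not SimpleScalar code: the list macro TOY_MACHINE_DEF
stands in for machine.def, DEFINST is cut down to two fields, and the
TOY_ instructions, operand macros, and register file are hypothetical.
It expands the same instruction list three times, each time with a
different DEFINST definition, to produce an enumeration, a name table,
and a functional interpreter.

```c
#include <assert.h>

/* Stand-in for machine.def: a list of DEFINST entries that expands
   differently depending on how DEFINST is defined at the point of use. */
#define TOY_MACHINE_DEF \
    DEFINST(TOY_ADD, "add") \
    DEFINST(TOY_SUB, "sub")

/* Semantics snippets touch state only through GPR() and SET_GPR(). */
#define TOY_ADD_IMPL SET_GPR(RD, GPR(RS) + GPR(RT))
#define TOY_SUB_IMPL SET_GPR(RD, GPR(RS) - GPR(RT))

/* First expansion: build the opcode enumeration. */
enum toy_op {
#define DEFINST(OP, NAME) OP,
    TOY_MACHINE_DEF
#undef DEFINST
    TOY_OP_MAX
};

/* Second expansion: build a name table for disassembly. */
static const char *const toy_op_name[TOY_OP_MAX] = {
#define DEFINST(OP, NAME) NAME,
    TOY_MACHINE_DEF
#undef DEFINST
};

#define TOY_NUM_REGS 4
struct toy_regs { int r[TOY_NUM_REGS]; };

/* Third expansion: a functional interpreter with plain array accessors. */
static void toy_execute(struct toy_regs *regs, enum toy_op op,
                        int rd, int rs, int rt)
{
    assert(rd >= 0 && rd < TOY_NUM_REGS);
    assert(rs >= 0 && rs < TOY_NUM_REGS);
    assert(rt >= 0 && rt < TOY_NUM_REGS);

#define RD rd
#define RS rs
#define RT rt
#define GPR(N)            (regs->r[N])
#define SET_GPR(N, EXPR)  (regs->r[N] = (EXPR))
#define DEFINST(OP, NAME) case OP: OP##_IMPL; break;
    switch (op) {
    TOY_MACHINE_DEF
    default:
        break;
    }
#undef DEFINST
#undef SET_GPR
#undef GPR
#undef RT
#undef RS
#undef RD
}
```

A different simulator model would keep TOY_MACHINE_DEF and the _IMPL
snippets unchanged and supply only its own GPR() and SET_GPR()
definitions before expanding the list.
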
The following code sequence instantiates an ARM instruction interpreter:

    /* execute the instruction */
    switch (op)
      {
    #define DEFINST(OP,MSK,NAME,OPFORM,RES,CLASS,O1,O2,O3,I1,I2,I3,I4,OFLAGS,IFLAGS)\
        case OP: \
          SYMCAT(OP,_IMPL); \
          break;
    #define DEFLINK(OP,MSK,NAME,MASK,SHIFT) \
        case OP: \
          panic("attempted to execute a linking opcode");
    #define CONNECT(OP)
    #define DECLARE_FAULT(FAULT) \
          { fault = (FAULT); break; }
    #include "machine.def"
        default:
          panic("attempted to execute a bogus opcode");
      }

The code sequence is a switch statement, with the code to run selected
by the opcode enumerator returned by MD_SET_OPCODE. An arm of the switch
statement is instantiated for each instruction defined (DEFINST) in the
machine.def file. In addition, the occurrence of an invalid opcode or a
DEFLINK-defined opcode indicates a simulator error; consequently, the
panic() interface is called to terminate the simulation. The
DECLARE_FAULT macro indicates to the instruction interpreter what should
be done if a fault is detected during instruction interpretation. In
this simulator instance, the variable fault is set to the fault that
occurred, and the switch statement is then exited.

Finally, the interpreter needs to check for an instruction fault, and if
one occurred, terminate the simulator (or handle the fault):

    /* check for an instruction fault */
    if (fault != md_fault_none)
      fatal("fault (%d) detected @ 0x%08p", fault, regs.regs_PC);

In addition, the ARM instruction interpreter "watches" for updates to
R15, the PC register in the general purpose register file. If this
occurs, the next PC of the instruction (regs.regs_NPC) is updated to
reflect the computed jump target address:

    /* did instruction write to the PC register? (e.g., ARM target) */
    if (setPC)
      regs.regs_NPC = GPR(MD_REG_PC);


ADDITIONAL DETAILS
------------------

The MD_MAX_MASK constant, defined in machine.h, indicates the maximum
size of the instruction decoder tables.
If the decoder tables overflow during simulator startup, this constant's
value must be increased.

At the end of the machine.def file is a sequence of #undef definitions,
one for each #define in the machine.def file. These #undef's are
necessary to allow the machine.def file to be included multiple times in
the same C source file. If additional #define's are added to
machine.def, an accompanying #undef should also be included at the end
of the file.


ADDING AND USING NEW INSTRUCTIONS
---------------------------------

Adding a new instruction to SimpleScalar can be accomplished by defining
an additional DEFINST node in the machine.def file for an undefined
opcode. In addition to defining the instruction in the interpreter, it
is useful to utilize the instruction in software algorithms. Perhaps the
most convenient way to add new instructions to a program is to define
instruction functions in GNU GCC. An instruction function is an inlined
GNU GCC function call that resolves to a single instruction. By
integrating these function calls into a source-level algorithm
implementation, it is possible to create a high-quality compilation
utilizing the new instruction. The following code sequence demonstrates
the integration of a new multiply instruction into the ARM instruction
set. Two different GNU GCC inlining options are demonstrated: a function
wrapping the asm statement, and a statement-expression macro.

    #include <stdio.h>

    /* option 1: a function wrapping the new instruction */
    int
    multiply_int32(int a, int b)
    {
      int c;

      asm(".long ((0x01 << 25) | (0x00 << 23) | ((XXX%0) << 9) | ((XXX%1) << 4) | (XXX%2))"
          : "=r" (c)
          : "r" (a) , "r" (b));
      return c;
    }

    /* option 2: a GNU statement-expression macro */
    #define __multiply_int32(a,b) \
      ({ int c; \
         asm(".long ((0x01 << 25) | (0x00 << 23) | ((XXX%0) << 9) | ((XXX%1) << 4) | (XXX%2))" \
             : "=r" (c) \
             : "r" (a) , "r" (b)); \
         c; })

    void
    foo(int a, int b)
    {
      int c, d;

      c = multiply_int32(a, b);
      d = __multiply_int32(a, b);
      printf("Hello world...the answer is %d\n", c+d);
    }

    int
    main(void)
    {
      foo(1, 3);
      return 0;
    }

The following sequence will compile the program with the new instruction
implementation:

    vi main.c
    arm-linux-gcc -O -S main.c
    sed "s/XXXr//g" main.s > ! main_fix.s
    arm-linux-gcc -O -c main_fix.s
    arm-linux-objdump -dl main_fix.o | less

The sed step removes the "XXXr" prefix that results from GCC
substituting a register name for each operand, leaving bare register
numbers that the assembler can fold into the .long constant.

For additional information, please see the GNU GCC documentation
concerning ASM instruction definitions, available from www.gnu.org.
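
As a worked illustration of the encoding arithmetic, the sketch below
evaluates the .long expression from the example for concrete register
numbers. The helper toy_encode_multiply is hypothetical; the shifts and
constants are copied from the asm string, with operands %0, %1, and %2
replaced by plain numbers. It assumes GCC prints each operand as "r"
followed by its number; a register printed under another name would not
be handled by the sed pattern shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate the .long expression for concrete register numbers:
   rd stands for operand %0 (the result), ra for %1, and rb for %2. */
static uint32_t toy_encode_multiply(unsigned rd, unsigned ra, unsigned rb)
{
    /* ARM has 16 general purpose registers, so each field fits in 4 bits */
    assert(rd < 16 && ra < 16 && rb < 16);

    return ((uint32_t)0x01 << 25) | ((uint32_t)0x00 << 23) |
           ((uint32_t)rd << 9) | ((uint32_t)ra << 4) | (uint32_t)rb;
}
```

For instance, if the compiler allocates c to r3, a to r0, and b to r1,
toy_encode_multiply(3, 0, 1) gives 0x02000601, which is the word the
assembler would emit for that .long directive.
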